Interesting behaviour !

From: Constantin Teodorescu <teo(at)flex(dot)ro>
To: "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Interesting behaviour !
Date: 1999-07-14 07:16:28
Message-ID: 378C394C.3487DBAC@flex.ro
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Please Cc: to teo(at)flex(dot)ro

I have a table called "note" that looks like this :

create table "note" (id serial,perioada int2,schema int2,explicatie
text,...);

The "note" table has 22.000 records and the record length is about 75
bytes (is has also a "text" field").

Because I am frequently accesing the table with queries like "... where
perioada=12" I was tempted to make also indexes on "perioada" and
"schema" field.

The tables have the following sizes (their file sizes into
/usr/local/pgsql/data/base....)

note 2.890 Kb
note_id 385 Kb
note_perioada 409 Kb
note_schema 466 Kb

I ran previusly "vacuum analyze" on that database ensuring that
statistical tables have been updated.

Trying some selects with explain I got the following results:

contabil=> explain select * from note where id=15;
NOTICE: QUERY PLAN:
Index Scan using note_id on note (cost=2.05 rows=2 width=87)

contabil=> explain select * from note where perioada=15;
NOTICE: QUERY PLAN:
Seq Scan on note (cost=1099.99 rows=1600 width=87)

contabil=> explain select * from note where schema=15;
NOTICE: QUERY PLAN:
Seq Scan on note (cost=1099.99 rows=432 width=87)

That means that searching on "perioada" field don't use "note_perioada"
index!!!

I know that the query optimisation take care of record lengths, table
sizes, index sizes, but I thought that in this case it will use
"note_perioada" index.

The distribution of "perioada" values within "note" records is like that
:

contabil=> select perioada,count(*) from note group by perioada;
perioada|count
--------+-----
4| 2
7| 66
8| 108
9| 135
10| 151
11| 146
12| 4468
13| 3045
14| 3377
15| 3207
16| 3100
17| 3039
18| 1789
19| 1
22| 2
(15 rows)

So, I think that PostgreSQL is doing right when he chooses not to use
"note_perioada" index for that type of query by comparing different
costs (althought it still remains strange at the first look).

Is there any chance to speed up that type of simple query (select * from
note where perioada=N) ?

I dropped the index and try with a "hash" index on the same "perioada"
field. The same result.

In this case, it seems that the "note_perioada" index will never be
used. That means it can be safely dropped without affecting the
application performance, isn't it? It is expected that the database will
grow in the same manner, with approx. the same nr. of records per
"perioada" field every month.

Best regards,

Please Cc: to teo(at)flex(dot)ro

===============================
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nuchanach Klinjun 1999-07-14 07:26:45 upgrade problem
Previous Message Magnus Hagander 1999-07-14 06:06:49 RE: [HACKERS] Updated TODO list