From: | Piotr Sulecki <Piotr(dot)Sulecki(at)sybilla(dot)traxelektronik(dot)pl> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | index being ignored for "limit n" queries |
Date: | 2005-08-09 12:12:49 |
Message-ID: | 42F89DC1.9040303@sybilla.traxelektronik.pl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Ave!
Yesterday I noticed a problem with my PostgreSQL installation. I have
three database clusters, one using version 7.4.6 and the rest using
version 8.0.1. The problem manifests itself in all three installations.
(The three databases are copies of -- more or less -- the same data; now
I'm writing a program to really synchronize these databases, but that's
besides the point. I'll only describe one of them; the same can be said
of the others, too.)
I have two really big tables, the problem is with one of them. The table
looks as follows:
Table "public.pakiety"
Column | Type | Modifiers
-------------+-----------------------------+---------------------------------------------
pktid | integer | not null default
nextval('pktid_seq')
stid | smallint | not null
received | timestamp(6) with time zone | not null
measured | timestamp(0) with time zone | not null
station_time | timestamp(0) with time zone |
bezwzgl | smallint |
full_cycle | boolean |
Indexes:
"pakiety_pkey" PRIMARY KEY, btree (pktid)
"pakiety_stid_received_idx" UNIQUE, btree (stid, received)
"pakiety_measured_idx" btree (measured)
"pakiety_received_idx" btree (received)
"pakiety_stid_measured_idx" btree (stid, measured)
This table holds info about data packets sent by our automated meteo
stations.
The table contains about 15 million rows. Any type of a sequential scan
will be extremely slow on this table. Even an index scan will take
minutes if the index is not fit for the search.
What's unusual is that:
1) there is only a very limited set of used stid (station ID) values -
about 500 or so, and
2) there can be some very long periods of time (measured in months) when
a specific station doesn't send any packets at all.
That's why I created indices pakiety_stid_received_idx and
pakiety_stid_measured_idx. Unfortunately, when I'm looking for the last
data packet from a specific station, I get an index scan using only the
"pakiety_received_idx" index and not the "pakiety_stid_received_idx"
one. While it takes a split second for a recently active station, it
really takes several minutes for a station which has been quiet for
several months.
trax=# explain select * from pakiety where stid = 234::smallint order by
received desc limit 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..6.26 rows=1 width=33)
-> Index Scan Backward using pakiety_received_idx on pakiety
(cost=0.00..193599.37 rows=30915 width=33)
Filter: (stid = 234::smallint)
(3 rows)
What's funny, I get the same plan when using "limit 10000":
trax=# explain select * from pakiety where stid = 234::smallint order by
received desc limit 10000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..62621.32 rows=10000 width=33)
-> Index Scan Backward using pakiety_received_idx on pakiety
(cost=0.00..193756.63 rows=30941 width=33)
Filter: (stid = 234::smallint)
(3 rows)
I have just noticed another problem: when the limit goes even higher,
the planner decides to use another unfit index.
trax=# explain select * from pakiety where stid = 234::smallint order by
received desc limit 100000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Limit (cost=125101.46..125178.81 rows=30940 width=33)
-> Sort (cost=125101.46..125178.81 rows=30940 width=33)
Sort Key: received
-> Index Scan using pakiety_stid_measured_idx on pakiety
(cost=0.00..122288.52 rows=30940 width=33)
Index Cond: (stid = 234::smallint)
(5 rows)
The same index is also used if I remove the "limit n" clause entirely:
trax=# explain select * from pakiety where stid = 234::smallint order by
received desc;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Sort (cost=125293.11..125370.58 rows=30986 width=33)
Sort Key: received
-> Index Scan using pakiety_stid_measured_idx on pakiety
(cost=0.00..122474.14 rows=30986 width=33)
Index Cond: (stid = 234::smallint)
(4 rows)
The "pakiety_stid_received_idx" index never gets used. Any hints about
possible reasons of such behavior?
Regards,
Piotr Sulecki.
From | Date | Subject | |
---|---|---|---|
Next Message | Alban Hertroys | 2005-08-09 12:31:16 | Referencing "less-unique" foreign keys |
Previous Message | Richard Huxton | 2005-08-09 11:38:52 | Re: Case sensitivity |