Bad Query Plan with Range Query

From: Mark Williams <mark(dot)williams(at)jivesoftware(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Bad Query Plan with Range Query
Date: 2011-04-15 17:17:32
Message-ID: 4DA87DAC.30006@jivesoftware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We are experiencing a problem with our query plans when using a range
query in Postgresql 8.3. The query we are executing attempts to select
the minimum primary key id after a certain date. Our date columns are
bigint's holding a unix epoch representation of the date. We have an
index on the primary key and the date column.

For the following query just specified the predicate modificationDate >= ?

explain SELECT min(messageID) FROM Message WHERE modificationDate >=
1302627793988;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Result (cost=2640.96..2640.97 rows=1 width=0)
InitPlan
-> Limit (cost=0.00..2640.96 rows=1 width=8)
-> Index Scan using message_pk on message
(cost=0.00..3298561.09 rows=1249 width=8)
Filter: ((messageid IS NOT NULL) AND (modificationdate
>= 1302627793988::bigint))
(5 rows)

For some reason it is deciding to scan the primary key column of the
table. This results in scanning the entire table which is huge (10
million records).

However, if we specify a fake upper bound then the planner will
correctly use the date column index:

explain SELECT min(messageID) FROM Message WHERE modificationDate >=
1302627793988 and modificationDate < 9999999999999999;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Aggregate (cost=9.64..9.65 rows=1 width=8)
-> Index Scan using jvmssg_mdate_idx on message (cost=0.00..9.64
rows=1 width=8)
Index Cond: ((modificationdate >= 1302627793988::bigint) AND
(modificationdate < 9999999999999999::bigint))
(3 rows)

We have carried out all the usual maintenance tasks. We have increase
the statistics_target on both indexes to the maximum (1000) and
performed a vacuum analyze on the table. Our resource configurations are
very good since this is our production server.

Interestingly this does not appear to happen with exactly the same
database when using 8.4. Instead we get the correct plan without having
to add the upper bound.

Here is the full description of the the table. It contains upwards of 10
million rows.

Table "public.message"
Column | Type | Modifiers
------------------+------------------------+-----------
messageid | bigint | not null
parentmessageid | bigint |
threadid | bigint | not null
containertype | integer | not null
containerid | bigint | not null
userid | bigint |
subject | character varying(255) |
body | text |
modvalue | integer | not null
rewardpoints | integer | not null
creationdate | bigint | not null
modificationdate | bigint | not null
status | integer | not null
Indexes:
"message_pk" PRIMARY KEY, btree (messageid)
"jvmssg_cdate_idx" btree (creationdate)
"jvmssg_cidctmd_idx" btree (containerid, containertype,
modificationdate)
"jvmssg_mdate_idx" btree (modificationdate)
"jvmssg_mdvle_idx" btree (modvalue)
"jvmssg_prntid_idx" btree (parentmessageid)
"jvmssg_thrd_idx" btree (threadid)
"jvmssg_usrid_idx" btree (userid)
Referenced by:
TABLE "answer" CONSTRAINT "answer_mid_fk" FOREIGN KEY (messageid)
REFERENCES message(messageid)
TABLE "messageprop" CONSTRAINT "jmp_msgid_fk" FOREIGN KEY
(messageid) REFERENCES message(messageid)

Any insight into this would be greatly appreciated. We are not able to
upgrade our databases to 8.4. We are reluctant to re-write all our range
queries if possible.

-m

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kenneth Marshall 2011-04-15 17:38:05 Re: Bad Query Plan with Range Query
Previous Message pasman pasmański 2011-04-15 15:15:06 Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?