Slow plan for MAX/MIN or LIMIT 1?

From: "Sam Wong" <sam(at)hellosam(dot)net>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Slow plan for MAX/MIN or LIMIT 1?
Date: 2013-09-24 09:24:07
Message-ID: 007801ceb907$d3706070$7a512150$@hellosam.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi There,

I have hit a query plan issue that I believe is a bug or under-estimation,
and would like to know if there it is known or if there is any workaround.

This event_log table has 4 million rows.

"log_id" is the primary key (bigint),

there is a composite index "event_data_search" over (event::text,
insert_time::datetime).

Query A:

SELECT min(log_id) FROM event_log

WHERE event='S-Create' AND

insert_time>'2013-09-15' and insert_time<'2013-09-16'

Query B:

SELECT log_id FROM event_log

WHERE event='S-Create' AND

insert_time>'2013-09-15' and insert_time<'2013-09-16'

ORDER BY log_id

What I want to achieve is Query A - get the min log_id within a range. But
it is very slow, taking 10 or 20 seconds.

If I don't limit the output to LIMIT 1 - like Query B - then it is
sub-second fast.

Explain of A - take 10~20 seconds to run

Limit (cost=0.00..132.54 rows=1 width=8)

-> Index Scan using event_log_pkey on event_log (cost=0.00..1503484.33
rows=11344 width=8)

Filter: ((insert_time > '2013-09-15 00:00:00'::timestamp without
time zone) AND (insert_time < '2013-09-16 00:00:00'::timestamp without time
zone) AND (event = 'S-Create'::text))

Explain of B - take a few milliseconds to run

Sort (cost=41015.85..41021.52 rows=11344 width=8)

Sort Key: log_id

-> Bitmap Heap Scan on event_log (cost=311.42..40863.05 rows=11344
width=8)

Recheck Cond: ((event = 'S-Create'::text) AND (insert_time >
'2013-09-15 00:00:00'::timestamp without time zone) AND (insert_time <
'2013-09-16 00:00:00'::timestamp without time zone))

-> Bitmap Index Scan on event_data_search (cost=0.00..310.86
rows=11344 width=0)

Index Cond: ((event = 'S-Create'::text) AND (insert_time >
'2013-09-15 00:00:00'::timestamp without time zone) AND (insert_time <
'2013-09-16 00:00:00'::timestamp without time zone))

Plan of A thought that the index scan node will get the first row right at
0.00, and hence the limit node will get all the rows needed within 132.54
(because event_log_pkey are sorted)

I would like to point out that - this 0.00 estimation omits the fact that it
actually takes a much longer time for the index scan node to get the first
row, because 3.99M rows it comes across won't meet the condition filter at
all.

Other background info:

The event_log table has been vacuumed and analyzed.

I have PostgreSQL 9.2.4 (x64) on Windows Server 2008 R2 with me. 8GB ram.
1*Xeon E5606.

Thanks,

Sam

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2013-09-24 10:35:47 Re: Planner performance extremely affected by an hanging transaction (20-30 times)?
Previous Message Kevin Grittner 2013-09-23 14:00:16 Re: Planner performance extremely affected by an hanging transaction (20-30 times)?