Re: SQL select query becomes slow when using limit (with no offset)

From: Kees van Dieren <keesvandieren(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: SQL select query becomes slow when using limit (with no offset)
Date: 2009-08-07 08:00:28
Message-ID: 13bb64e70908070100l30a297a4pa7c2c7d2d39d8c9f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks for your response.

I think your analysis is correct, When there are more than 100 rows that
match this query, limit 100 is fast.

However, we often have less than hundred rows, so this is not sufficient for
us.

This suggestion ('OFFSET 0' trick) did not show differences in response time
(runs in 947ms).

One thing that helps, is limiting the set by adding this to where clause:
and events_events.dateTime > '2009-07-24'.
(query now runs in approx 500ms)

The workaround we implemented, is query caching in our application (Java,
with JPA / Hibernate second level query cache). This actually solves the
problem for us, but I'd prefer to get this query perform in postgres as
well. I'd think that the Postgresql query planner should be smarter in
handling LIMIT statements.

Would it get attention if I submit this to
http://www.postgresql.org/support/submitbug ? (in fact it is not really a
bug, but an improvement request).

Best regards,

Kees

2009/8/5 Russell Smith <mr-russ(at)pws(dot)com(dot)au>

> Kees van Dieren wrote:
> > Hi Folks,
> >
> > Thanks for your response.
> >
> > I have added the following index (suggested by other post):
> >
> > CREATE INDEX events_events_cleared_eventtype
> > ON events_events
> > USING btree
> > (eventtype_id, cleared)
> > WHERE cleared = false;
> >
> > Also with columns in reversed order.
> >
> > No changes in response time noticed.
> >
> > Index on cleared column already is there (indices are in sql file
> > attached to initial post.). eventtype_id has a foreign key constraint,
> > which adds an index automatically I believe?
> >
> > The explain analyze results for both queries:
> > explain analyze select events_events.id <http://events_events.id> FROM
> > events_events
> > left join events_event_types on
> > events_events.eventType_id=events_event_types.id
> > <http://events_event_types.id>
> > where events_event_types.severity=70
> > and not events_events.cleared
> > order by events_events.dateTime DESC LIMIT 100
> > >>>
> > "Limit (cost=0.00..125.03 rows=100 width=16) (actual
> > time=0.046..3897.094 rows=77 loops=1)"
> > " -> Nested Loop (cost=0.00..120361.40 rows=96269 width=16) (actual
> > time=0.042..3896.881 rows=77 loops=1)"
> > " -> Index Scan Backward using events_events_datetime_ind on
> > events_events (cost=0.00..18335.76 rows=361008 width=24) (actual
> > time=0.025..720.345 rows=360637 loops=1)"
> > " Filter: (NOT cleared)"
> > " -> Index Scan using events_event_types_pkey on
> > events_event_types (cost=0.00..0.27 rows=1 width=8) (actual
> > time=0.003..0.003 rows=0 loops=360637)"
> > " Index Cond: (events_event_types.id
> > <http://events_event_types.id> = events_events.eventtype_id)"
> > " Filter: (events_event_types.severity = 70)"
> > "Total runtime: 3897.268 ms"
> >
> The plan here is guessing that we will find the 100 rows we want pretty
> quickly by scanning the dateTime index. As we aren't expecting to have
> to look through many rows to find 100 that match the criteria. With no
> cross column statistics it's more a guess than a good calculation. So
> the guess is bad and we end up scanning 360k rows from the index before
> we find what we want. My skills are not up to giving specific advise
> on how to avert this problem. Maybe somebody else can help there.
> > explain analyze select events_events.id <http://events_events.id> FROM
> > events_events
> > left join events_event_types on
> > events_events.eventType_id=events_event_types.id
> > <http://events_event_types.id>
> > where events_event_types.severity=70
> > and not events_events.cleared
> > order by events_events.dateTime DESC
> > >>>
> > "Sort (cost=20255.18..20495.85 rows=96269 width=16) (actual
> > time=1084.842..1084.951 rows=77 loops=1)"
> > " Sort Key: events_events.datetime"
> > " Sort Method: quicksort Memory: 20kB"
> > " -> Hash Join (cost=2.09..12286.62 rows=96269 width=16) (actual
> > time=1080.789..1084.696 rows=77 loops=1)"
> > " Hash Cond: (events_events.eventtype_id =
> > events_event_types.id <http://events_event_types.id>)"
> > " -> Seq Scan on events_events (cost=0.00..9968.06
> > rows=361008 width=24) (actual time=0.010..542.946 rows=360637 loops=1)"
> > " Filter: (NOT cleared)"
> > " -> Hash (cost=1.89..1.89 rows=16 width=8) (actual
> > time=0.077..0.077 rows=16 loops=1)"
> > " -> Seq Scan on events_event_types (cost=0.00..1.89
> > rows=16 width=8) (actual time=0.010..0.046 rows=16 loops=1)"
> > " Filter: (severity = 70)"
> > "Total runtime: 1085.145 ms"
> >
> > Any suggestions?
> This plan is faster as you avoid the index scan. The planner is
> preferring to do a tablescan to find what it needs. This is much faster
> than the 360k random I/O index lookups. You can force this type of plan
> with a subquery and the OFFSET 0 trick, but I'm not sure it's the best
> solution.
>
> eg
>
> explain analyze SELECT * FROM
> (SELECT events_events.id <http://events_events.id> FROM events_events
> LEFT JOIN events_event_types on
> events_events.eventType_id=events_event_types.id
> <http://events_event_types.id>
> WHERE events_event_types.severity=70
> AND not events_events.cleared
> ORDER BY events_events.dateTime DESC OFFSET 0) AS a LIMIT 100
>
> Regards
>
> Russell
>

--
Squins | IT, Honestly
Oranjestraat 23
2983 HL Ridderkerk
The Netherlands
Phone: +31 (0)180 414520
Mobile: +31 (0)6 30413841
www.squins.com
Chamber of commerce Rotterdam: 22048547

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2009-08-07 12:53:43 Re: SQL select query becomes slow when using limit (with no offset)
Previous Message Euler Taveira de Oliveira 2009-08-07 07:30:34 Re: Best settings to load a fresh database