Re: Performance of query (fwd)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Edmund Dengler <edmundd(at)eSentire(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of query (fwd)
Date: 2003-06-11 03:50:28
Message-ID: 7884.1055303428@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Edmund Dengler <edmundd(at)eSentire(dot)com> writes:
> select * from event
> where timestamp > (select now() - '2 hours'::interval)
> and exists (select 1 from hack_pull_sid where sid = event.sid)

> (note: <hack_pull_sid> is a table of SIDs I am interested in so that I
> avoid the issues with IN)

I think you're creating more issues than you're avoiding. With the
above query, the planner has little chance of guessing how many rows
will be retrieved from "event" ... and indeed the EXPLAIN output shows
that its guess is off by more than a factor of 1000:

> -> Index Scan using timestamp_idx on event (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05 rows=129 loops=1)

With a misestimate of that magnitude at the core of the query, it's
unsurprising that all the other plan choices are bad too.

But actually I suspect the easiest point of attack is not the EXISTS
subquery, but the timestamp comparison. Can you get your application
to supply a simple literal constant to compare to the timestamp, viz
'2003-06-10 21:44' rather than now()-'2 hours'? The former gives the
planner something to compare to its statistics, the latter doesn't.

Oh ... you have done an ANALYZE on event reasonably recently, no?

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron Snyder 2003-06-11 03:50:41 Re: error restoring large objects during pg_restore
Previous Message Edmund Dengler 2003-06-11 03:18:11 Performance of query (fwd)