Quick Links

Re: pg 7.4.rc1, Range query performance

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	ow <oneway_111(at)yahoo(dot)com>
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: pg 7.4.rc1, Range query performance
Date:	2003-11-11 01:54:38
Message-ID:	200311110154.hAB1scW20847@candle.pha.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

ow wrote:
> --- Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> > Strange 30 records takes 30x the time than one record. Can you run
> > ANALYZE and send us an EXPLAIN of the query to make sure it hasn't
> > changed?
> >
>
> explain analyze select * from Test
> where a >= '2002-06-18'
> and a <= '2002-07-18'
> and b = 5
> and c = 255
>
> QUERY PLAN
> Index Scan using ak_abc on test (cost=0.00..121.23 rows=34 width=53) (actual
> time=18.060..10726.387 rows=31 loops=1)
> Index Cond: (((a)::timestamp without time zone >= '2002-06-18
> 00:00:00'::timestamp without time zone) AND ((a)::timestamp without time zone
> <= '2002-07-18 00:00:00'::timestamp without time zone) AND ((b)::integer = 5)
> AND ((c) (..)
> Total runtime: 10726.663 ms

OK, I see now. You must have a lot of rows from '2002-06-18' to
'2002-07-18', but only 33 with the b,c conditions --- not much we can do
to speed this up because the condition on 'a' isn't restrictive enough
--- not sure if b or c is either. It is all those lookups to find the
rows that match a, then b/c that is taking the time. In fact, it now
make sense that it takes 30x time because all the time is spent
traversing the index looking for match #1, then match #2, etc. We would
do this quickly if there were lots of rows matching a specific 'a'
value, e.g.

> explain analyze select * from Test
> where a = '2002-06-18' <---
> and b = 5
> and c = 255

The index/cluster is grouping the rows, but the grouping is by timestamp
value, not by range >= '2002-06-18' and <= '2002-07-18'. Even though
you have index a,b,c, it really is only using the index on 'a' because
the index on b,c only happens when you have multiple duplicate 'a'
values, but in this case you have an entire months worth. The only
quick way would be to create a functional index on 'a', and cluster on
that:

create index ii on x (date_part("month", a), b,c)

or something like that. You can't actually index on a function and then
ordinary columns so you would need a pretty fancy function in plpgsql
that converted the a,b,c value into a nice text string and then index on
that. Then if you used that function call in your WHERE clause, the
index would be used and it would be very fast because all your 'a'
values would be the same, and it could then jump to b and c quickly.

Sorry there isn't a simple solution.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Re: pg 7.4.rc1, Range query performance at 2003-11-11 01:10:39 from ow

Responses

Re: pg 7.4.rc1, Range query performance at 2003-11-11 04:13:33 from Tom Lane

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Tom Lane	2003-11-11 04:13:33	Re: pg 7.4.rc1, Range query performance
Previous Message	Joe Conway	2003-11-11 01:22:03	Re: Query Problem