From: | Jean-Christophe Boggio <cat(at)thefreecat(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: Optimization with dates |
Date: | 2001-11-14 12:02:51 |
Message-ID: | 868508534.20011114130251@thefreecat.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hello,
Thanks for all the answers. A little feedback :
>> I have tried :
>> where dategain>=(select now()-30);
>> and many other, syntactically absurd :-)
TL> dategain >= ago(30)
TL> where "ago" is a function that computes "date(now()) - n" and is
TL> marked "iscachable".
create function ago(interval) returns timestamp as '
select now() - $1
' language 'sql' with (iscachable);
explain select count(*) from gains where dategain>=ago('30 0:00');
Aggregate (cost=180640.90..180640.90 rows=1 width=0)
-> Seq Scan on gains (cost=0.00..179761.71 rows=351676 width=0)
=====
explain select count(*) from gains where dategain>=ago('5 days');
Aggregate (cost=172340.65..172340.65 rows=1 width=0)
-> Index Scan using ix_gains_dategain on gains (cost=0.00..172202.94 rows=55084 width=0)
=====
explain select count(*) from gains where dategain>=ago('6 days');
Aggregate (cost=179929.06..179929.06 rows=1 width=0)
-> Seq Scan on gains (cost=0.00..179761.71 rows=66940 width=0)
TL> Just out of curiosity, do the indexed timestamps correlate closely to
TL> the physical order of the table? I'd expect that to happen if you
TL> are timestamping records by insertion time and there are few or no
TL> updates.
That's right, there are very few updates.
===========================================
Now, for Jason's idea :
set enable_seqscan to off;
SET VARIABLE
explain select count(*) from gains where dategain>=now()-30;
Aggregate (cost=100256770.86..100256770.86 rows=1 width=0)
-> Seq Scan on gains (cost=100000000.00..100250847.08 rows=2369512 width=0)
Strange isn't it ?
Is it possible to do the equivalent of "set enable_seqscan to off"
out of psql (in php or perl code) ?
===========================================
To answer Stephan and Josh :
SS> Is 2367640 a reasonable estimate for the number of
SS> rows that match the condition?
JB> Hmmm... if the number of rows is actually accurate (2M out of 10M in the
JB> last 30 days) then a Seq Scan seems like a good plan to me. If the
JB> numbers aren't accurate, it's time to run a VACUUM ANALYZE.
select avg(cnt) from (select count(*) as cnt from gains group by
date(dategain)) as foo;
avg
------------------
12009.6131756757
If I did it right, this should be the average number of rows per day.
The db exists since April 1st 2000.
select date('now')-date('2000-04-01');
592
select 592*12009;
7109328
select count(*) from gains;
count
---------
7109753
As you see, dategain is *quite* linear !
So to answer your question, a reasonable estimate for the number of
rows that match the condition is :
select 30*12009;
360270
The real answer is :
select count(*) from gains where dategain>=now()-30;
231781
SS> Have you run vacuum analyze?
Every night (and it's a VEERRYYYY long process, even dropping the
indexes before and recreating them afterwards, maybe that's the real
problem ?) Keeping the index makes the VACUUM process several hours.
We'll try 7.2 which should solve part of this problem but since these
are production systems, we wait a little feedback from 7.2 users.
SS> If the estimate is right, you'll probably find that
SS> the sequence scan is actually faster than an index
SS> scan since about 1/4 of the table is being selected.
It should select 1/592 of the table !
Any further advises VERY appreciated. Thanks again everyone for your
help.
--
Jean-Christophe Boggio
cat(at)thefreecat(dot)org -o)
Independant Consultant and Developer /\\
Delphi, Linux, Perl, PostgreSQL, Debian _\_V
From | Date | Subject | |
---|---|---|---|
Next Message | Haller Christoph | 2001-11-14 13:02:08 | ago() |
Previous Message | Eaglet | 2001-11-14 11:51:30 | handling exceptions, really not simple... :-(( |