Re: Optimizing suggestions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Ford <david+cert(at)blue-labs(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Optimizing suggestions
Date: 2002-06-23 17:21:42
Message-ID: 9233.1024852902@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

David Ford <david+cert(at)blue-labs(dot)org> writes:
> bmilter=> select count(*) from history where timestamp::date >
> (now()::date - '7 days'::interval) group by timestamp::date ;

> Now.. :) How can I further improve on this?

The date-restriction clause is poorly phrased: you're taking a timestamp
from now(), coercing to date, coercing back to timestamp (because there
is no date - interval operator, only timestamp - interval), subtracting
an interval to give a timestamp ... and then on the other side of the
comparison, coercing to date and then back to timestamp to do the
comparison (unless there's a date > timestamp operator, which I doubt).

Aside from all the wasted datatype conversion operators, there is no
hope of using this clause with an index on the timestamp column.

What I'd try is

where timestamp >= (current_date - 6)

(which I think has the same boundary-condition behavior as your given
expression; adjust to taste).

As of current development sources (7.3 to be) the above WHERE clause is
directly indexscannable. In existing releases you have to play some
games to get the planner to realize that (current_date - 6) can be
treated as a constant for the purposes of making an indexscan. One
way is to make a function called, say, "ago(int)" that returns
current_date minus the parameter, and mark it isCachable.

> In the near future I'm going to split this table up into many sub tables
>>
>> Why? Aren't you just going to have a need for lots of joins, if you
>> do that?

> No, actually most of this data will be drawn into focused reporting for
> graphing metrics. Only when I drill down to a specific (small) set of
> message serial numbers, will I be doing a join of all this data.

> In my naivety, which is more performance effective? Most of the time I
> will be searching for a count of statistics on a single column.

I'd guess that you're better off with the single table anyway, just on
grounds of simplicity. I really doubt that the split-up could provide
enough performance gain to be worth the trouble...

> p.s. I am assuming count(*) v.s. count(column) is always faster because
> it's doing a generic row count v.s. accumulating stats on a column
> inside a row?

Marginally faster, yes --- the count(column) form has to take the time
to see whether the column is null or not.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Curt Sampson 2002-06-23 17:40:00 Re: SQL server application porting headache
Previous Message Joe Conway 2002-06-23 17:08:37 Re: SELECT problem