Help with query: indexes on timestamps

From: "Keith C(dot) Perry" <netadmin(at)vcsn(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Help with query: indexes on timestamps
Date: 2004-07-07 17:04:05
Message-ID: 1089219845.40ec2d0503fb7@webmail.vcsn.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Ok, I've tried a number of things here and I know I'm missing something but at
this point my head is spinning (i.e. lack of sleep, too much coffee, etc...)

My environment is PG 7.4.3 on Linux with 512Mb of ram and swap. This was just
upgraded from 7.4 (just to make sure I'm current). Some of my settings in
postgresql are giving fatal errors but I don't think my issue is related to my
query problems. I also have a laptop running with the same basic specs (no
raid, slower processor).

I use a recent pgadmin-III as my client.

We're also running this query in MS-SQL.

I have a table with with 1 million records in it. Here is the definition

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

That query I'm running is:

SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour'::text, report.stamp)
ORDER BY date_part('hour'::text, report.stamp);

Here is the plan I get:

QUERY PLAN
----------------------------------------------------------------------------
Sort (cost=47420.64..47421.14 rows=200 width=8)
Sort Key: date_part('hour'::text, stamp)
-> HashAggregate (cost=47412.00..47413.00 rows=200 width=8)
-> Seq Scan on report (cost=0.00..42412.00 rows=1000000 width=8)
(4 rows)

Now from from I understand that, the index I created would not be used since I
would be looking at every row to do the date part. The query under 7.4 ran in
about 8 seconds. In 7.4.3, its taking 37 seconds for the same plan (which is
fine for the system not be tuned yet). On my laptop its taking 6 seconds.
MS-SQL is taking 8 seconds. These runs are after I do vacuum full, vacuum
analyse and reindex on the database and table respectively

My question: How can I get this query to use an index build on the date_part
function. On the MS-SQL side, creating a computed column with the date part and
then don't an index on that column bring the query done to 2 seconds.

I tried creating this function:

CREATE OR REPLACE FUNCTION whathour(timestamptz)
RETURNS int4 AS
'begin
return date_part(\'hour\',$1);
end;'
LANGUAGE 'plpgsql' IMMUTABLE;

and then and index:

CREATE INDEX hour_idx
ON report
USING btree
(stamp)
WHERE whathour(stamp) >= 0 AND whathour(stamp) <= 23;

but I get the same plan- which makes sense to me because I'm again inspect
quiet a few row. I'm sure I'm missing something...

I couldn't see from the docs how to make a column equal a function (like
MS-SQL's computed column) but to me it seems like I should not have to do
something like that since it really is wasting space in the table. I hoping a
partial index or a function index will solve this and be just as efficient.
However, that method **does** work. Is there a better way?

Thanks to all in advance.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2004-07-07 17:28:41 Re: Help with query: indexes on timestamps
Previous Message Bruno Wolff III 2004-07-07 17:01:12 Re: Grant Select privileges for all tables in schema