From: Henry Drexler <alonup8tb(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: query performance, thought it was timestamps, maybe just table size?
Date: 2012-11-30 13:22:07
Message-ID: CAAtgU9TfeMJ==48ZjKAGJ8tptPEjyEE_bOxCWeh=wzbSD_W1TA@mail.gmail.com
Lists: pgsql-general
Hello, and thank you in advance.
Beyond the date vs timestamp troubleshooting I did (described below), I am not sure what else to look for. I know the increase in rows will have some effect, but I don't think the query should go from 4 minutes to over 50.
System:
laptop - Ubuntu 12.04 LTS
"PostgreSQL 9.2.1 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit"
Summary:
I have two tables. I run a function called massive_expansion that takes a customer_id and the_range; it uses these to query the table massive for all rows for that customer_id whose dateof falls within the_range, and inserts those rows into another table.
Problem:
The table massive is now about 65 million rows (about double what it was), and the query takes 50 minutes (it used to take 4 minutes with half the data).
History:
When I first started out I had everything as date, and the query went quite quickly - 4 minutes when the table massive was about 30 million rows.
I then re-did everything, but this time I used the actual timestamps instead of dates (as a result there were a few more rows, but only 2 million more) - the query still went quickly - 5 minutes when the table massive was about 32 million rows.
So I did not see an appreciable difference between the data having date vs timestamp (all other things kept the same), and I suppose I can rule out date vs timestamp as the issue.
the_range is always a tsrange covering an interval of 7 days.
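For illustration, such a 7-day range can be built with the tsrange constructor; the '(]' bounds below match the ranges that show up in the query plans further down (this is just a sketch, not necessarily how critical_visitors is actually populated):
-- build a 7-day range with an exclusive lower and inclusive upper bound
select tsrange('2012-07-22 17:00:00'::timestamp,
               '2012-07-22 17:00:00'::timestamp + interval '7 days',
               '(]') as the_range;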
I have two tables:
First table:
CREATE TABLE massive
(
source character varying,
dateof timestamp without time zone,
customer_id bigint,
count_raw bigint
);
CREATE INDEX customer_id_dateof
ON massive
USING btree
(customer_id, dateof);
Second table:
CREATE TABLE critical_visitors
(
customer_id bigint,
dateof timestamp without time zone,
the_range tsrange
);
CREATE INDEX customer_id_range
ON critical_visitors
USING btree
(customer_id, the_range);
The function:
CREATE or replace FUNCTION massive_expansion(customer_id bigint, the_range tsrange) RETURNS void AS $$
BEGIN
INSERT INTO massive_expansion_from_critical_visitors
(
select
massive.source,
massive.dateof,
massive.customer_id,
massive.count_raw
from
massive
where
massive.customer_id = $1
and
massive.dateof <@ the_range);
END;
$$ LANGUAGE plpgsql;
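The target table massive_expansion_from_critical_visitors isn't shown above; since the insert has no column list, it has to match the selected columns in order and type - something along these lines (a sketch of the shape, not its actual definition):
CREATE TABLE massive_expansion_from_critical_visitors
(
source character varying,
dateof timestamp without time zone,
customer_id bigint,
count_raw bigint
);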
The query:
select
massive_expansion(customer_id,the_range)
from
critical_visitors;
Additional Detail:
I ran this query against the table at the 30-million-row volume:
select
massive.source,
massive.dateof,
massive.customer_id,
massive.count_raw
from
massive
where
massive.customer_id = '<a customer_id goes here>'::bigint--$1
and
massive.dateof <@ '(2012-07-22 17:00:00,2012-07-29 17:00:00]'::tsrange;
With a query plan of:
"Index Scan using customer_id_sourcee on massive_m (cost=0.00..113.98
rows=1 width=28)"
" Index Cond: (customer_id = 9167174563::bigint)"
" Filter: (dateof <@ '("2012-07-22 17:00:00","2012-07-29
17:00:00"]'::tsrange)"
Then on the 65-million-row table I ran the same query and got this plan:
"Index Scan using customer_id_source on massive (cost=0.00..189.38 rows=1 width=28)"
"  Index Cond: (customer_id = 9167174563::bigint)"
"  Filter: (dateof <@ '("2012-07-22 17:00:00","2012-07-29 17:00:00"]'::tsrange)"