From: | PFC <lists(at)peufeu(dot)com> |
---|---|
To: | "samantha mahindrakar" <sam(dot)mahindrakar(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Query running slow |
Date: | 2008-04-16 23:19:38 |
Message-ID: | op.t9p8e0oocigqcu@apollo13.peufeu.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Wed, 16 Apr 2008 23:14:11 +0200, samantha mahindrakar
<sam(dot)mahindrakar(at)gmail(dot)com> wrote:
> Hi....
> Iam finding the following query is working a bit slow:
> EXECUTE '(SELECT ARRAY(SELECT DISTINCT date_part(''day'',
> measurement_start)
> FROM ' || gettablestring(dates)|| '
> WHERE lane_id IN (' || lanesidarr || ')))'
> INTO temparr;
>
> This function is trying to find all the days in a prticular month
> whihc has data for the particular lane and put it in an array...which
> can be used later.
> gettablestring(dates) returns the partition name from which the data
> needs to be extracted. These partitions have index on the
> measurement_start field.
> lanesidarr is a lane number. The partition has an index on this field to.
> Could anyone give me some hints???/
OK so I guess you have one partition per month since there is no month in
your WHERE.
If this is a table which hasn't got much write activity (probably the
case for last month's partition, for instance), CLUSTER it on something
appropriate that you use often in queries, like lane_id here.
And you can use SELECT foo GROUP BY foo, this will use a hash, it is
faster than a sort.
Example :
CREATE TABLE blop AS SELECT '2008-01-01'::TIMESTAMP + ((n%30)*'1
DAY'::INTERVAL) AS t FROM generate_series(1,100000) AS n;
ALTER TABLE blop ADD d DATE NULL;
UPDATE blop SET d=t;
VACUUM FULL ANALYZE blop;
-- Now blop contains 100K timestamps and 100K dates from the month 2008-01
EXPLAIN ANALYZE SELECT DISTINCT EXTRACT( DAY from t ) FROM blop;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Unique (cost=10051.82..10551.82 rows=30 width=8) (actual
time=221.740..289.801 rows=30 loops=1)
-> Sort (cost=10051.82..10301.82 rows=100000 width=8) (actual
time=221.737..250.911 rows=100000 loops=1)
Sort Key: (date_part('day'::text, t))
Sort Method: quicksort Memory: 5955kB
-> Seq Scan on blop (cost=0.00..1747.00 rows=100000 width=8)
(actual time=0.021..115.254 rows=100000 loops=1)
Total runtime: 290.237 ms
(6 lignes)
Temps : 290,768 ms
EXPLAIN ANALYZE SELECT EXTRACT( DAY from t ) AS day FROM blop GROUP BY day;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1997.00..1997.38 rows=30 width=8) (actual
time=198.375..198.390 rows=30 loops=1)
-> Seq Scan on blop (cost=0.00..1747.00 rows=100000 width=8) (actual
time=0.021..129.779 rows=100000 loops=1)
Total runtime: 198.437 ms
(3 lignes)
Temps : 198,894 ms
==> Hash is faster than Sort
EXPLAIN ANALYZE SELECT d FROM blop GROUP BY d;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1747.00..1747.30 rows=30 width=4) (actual
time=101.829..101.842 rows=30 loops=1)
-> Seq Scan on blop (cost=0.00..1497.00 rows=100000 width=4) (actual
time=0.012..33.428 rows=100000 loops=1)
Total runtime: 101.905 ms
(3 lignes)
Temps : 102,516 ms
==> Not computing the EXTRACT is faster obviously
(actually EXPLAIN ANALYZE adds some overhead, the query really takes 60 ms)
If you have an index lane_id, measurement_date, you can always do :
for day in 1..31:
find 1 row with which has this day
reutrn the days you found
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Sullivan | 2008-04-17 03:19:47 | Re: Oddly slow queries |
Previous Message | Craig James | 2008-04-16 22:47:08 | Re: Anybody using the Dell Powervault MD3000 array? |