Re: planner and having clausule

From: Ľubomír Varga <varga(at)plaintext(dot)sk>
To: pgsql-performance(at)postgresql(dot)org
Cc: info <info(at)plaintext(dot)sk>
Subject: Re: planner and having clausule
Date: 2013-09-06 08:44:20
Message-ID: 128353624.931378457060911.JavaMail.root@shiva
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi again, my mistake. I have found why there are not this optimization (thus I have found other one, correct, see bellow). I can have for example:

stocktaking_id | stat_item_start
------------------------------------
abc | 2013-01-01
abc | 2013-08-08

And when applied my "optimization", it will return me abc (minimum for abc is 2013-01-01 and it does not conform having restriction, but I have applied where restriction to date which broke my result...)

Proper optimization should be:

explain
SELECT stocktaking_id
FROM t_weighting
--proper optimization restriction
WHERE stocktaking_id IN (SELECT DISTINCT stocktaking_id FROM t_weighting WHERE stat_item_start BETWEEN '2013-08-01' AND '2013-09-01')
GROUP BY stocktaking_id
HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';

with result:
"HashAggregate (cost=15485.12..15490.08 rows=248 width=32)"
" Filter: ((min(public.t_weighting.stat_item_start) >= '2013-08-01 00:00:00'::timestamp without time zone) AND (min(public.t_weighting.stat_item_start) <= '2013-09-01 00:00:00'::timestamp without time zone))"
" -> Nested Loop (cost=222.05..15441.65 rows=5796 width=32)"
" -> HashAggregate (cost=8.47..8.48 rows=1 width=32)"
" -> Subquery Scan "ANY_subquery" (cost=8.45..8.47 rows=1 width=32)"
" -> HashAggregate (cost=8.45..8.46 rows=1 width=24)"
" -> Index Scan using idx_t_weighting_stat_item_start on t_weighting (cost=0.00..8.44 rows=1 width=24)"
" Index Cond: ((stat_item_start >= '2013-08-01 00:00:00'::timestamp without time zone) AND (stat_item_start <= '2013-09-01 00:00:00'::timestamp without time zone))"
" -> Bitmap Heap Scan on t_weighting (cost=213.58..15360.73 rows=5796 width=32)"
" Recheck Cond: ((public.t_weighting.stocktaking_id)::text = ("ANY_subquery".stocktaking_id)::text)"
" -> Bitmap Index Scan on idx_t_weighting_stocktaking_id_user_id (cost=0.00..212.13 rows=5796 width=0)"
" Index Cond: ((public.t_weighting.stocktaking_id)::text = ("ANY_subquery".stocktaking_id)::text)"

This will be probably a little bit harder to use in planner in general manner.

Best regards,
--
Ing. Ľubomír Varga
+421 (0)908 541 700
varga(at)plaintext(dot)sk
www.plaintext.sk

----- "Ľubomír Varga" <varga(at)plaintext(dot)sk> wrote:

> Hi. I have one query which possibly is not optimized by planner (not
> using index for aggregate having clause restriction):
>
> explain
> SELECT stocktaking_id
> FROM t_weighting
> GROUP BY stocktaking_id
> HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';
>
> with result:
> "HashAggregate (cost=59782.43..59787.39 rows=248 width=32)"
> " Filter: ((min(stat_item_start) >= '2013-08-01 00:00:00'::timestamp
> without time zone) AND (min(stat_item_start) <= '2013-09-01
> 00:00:00'::timestamp without time zone))"
> " -> Seq Scan on t_weighting (cost=0.00..49002.39 rows=1437339
> width=32)"
>
> I have probably an obvious tough, that query will touch only rows with
> stat_item_start values only within given constrains in having clause.
> If (and only if) planner have some info that MIN and MAX aggregate
> functions could return only one of values that comes into them, it can
> search only rows within given constraints in having part of select.
> Something like this:
>
>
> explain
> SELECT stocktaking_id
> FROM t_weighting
> --added restriction by hand:
> WHERE stat_item_start BETWEEN '2013-08-01' AND '2013-09-01'
> GROUP BY stocktaking_id
> HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';
>
> with result:
> "HashAggregate (cost=8.45..8.47 rows=1 width=32)"
> " Filter: ((min(stat_item_start) >= '2013-08-01 00:00:00'::timestamp
> without time zone) AND (min(stat_item_start) <= '2013-09-01
> 00:00:00'::timestamp without time zone))"
> " -> Index Scan using idx_t_weighting_stat_item_start on t_weighting
> (cost=0.00..8.44 rows=1 width=32)"
> " Index Cond: ((stat_item_start >= '2013-08-01
> 00:00:00'::timestamp without time zone) AND (stat_item_start <=
> '2013-09-01 00:00:00'::timestamp without time zone))"
>
> Is this optimization by planner possible, or it is already have been
> done on newer DB version (I am using PostgreSQL 8.4.13)? IMHO it
> should be added into planner if possible for all built in aggregate
> functions.
>
> Best regards,
> --
> Ing. Ľubomír Varga
> +421 (0)908 541 700
> varga(at)plaintext(dot)sk
> www.plaintext.sk

Browse pgsql-performance by date

  From Date Subject
Next Message Hannu Krosing 2013-09-06 11:38:56 Re: [PERFORM] encouraging index-only scans
Previous Message Ľubomír Varga 2013-09-06 08:05:22 planner and having clausule