Re: Extremely slow HashAggregate in simple UNION query

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Felix Geisendörfer <felix(at)felixge(dot)de>, "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Extremely slow HashAggregate in simple UNION query
Date: 2019-08-22 05:08:23
Message-ID: CAFj8pRBsp+8FNEqVz9b76qMGn8P3-O5kFNiLkvu55Q+d+ntx5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

čt 22. 8. 2019 v 3:11 odesílatel Jeff Janes <jeff(dot)janes(at)gmail(dot)com> napsal:

> On Tue, Aug 20, 2019 at 11:12 AM Felix Geisendörfer <felix(at)felixge(dot)de>
> wrote:
> ...
>
>
>> [1] My actual query had bad estimates for other reasons (GIN Index), but
>> that's another story. The query above was of course deliberately designed
>> to have bad estimates.
>>
>
> As noted elsewhere, v12 thwarts your attempts to deliberately design the
> bad estimates. You can still get them, you just have to work a bit harder
> at it:
>
> CREATE FUNCTION j (bigint, bigint) returns setof bigint as $$ select
> generate_series($1,$2) $$ rows 1000 language sql;
>
> EXPLAIN ANALYZE
> SELECT * FROM j(1, 1) a, j(1, 1) b
> UNION
> SELECT * FROM j(1, 1) a, j(1, 1) b;
> QUERY PLAN
>
>
> --------------------------------------------------------------------------------------------------------------------------
> HashAggregate (cost=80021.00..100021.00 rows=2000000 width=16) (actual
> time=11.332..13.241 rows=1 loops=1)
> Group Key: a.a, b.b
> -> Append (cost=0.50..70021.00 rows=2000000 width=16) (actual
> time=0.118..0.163 rows=2 loops=1)
> -> Nested Loop (cost=0.50..20010.50 rows=1000000 width=16)
> (actual time=0.117..0.118 rows=1 loops=1)
> -> Function Scan on j a (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.087..0.088 rows=1 loops=1)
> -> Function Scan on j b (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.027..0.027 rows=1 loops=1)
> -> Nested Loop (cost=0.50..20010.50 rows=1000000 width=16)
> (actual time=0.044..0.044 rows=1 loops=1)
> -> Function Scan on j a_1 (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.022..0.022 rows=1 loops=1)
> -> Function Scan on j b_1 (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.020..0.021 rows=1 loops=1)
> Planning Time: 0.085 ms
> Execution Time: 69.277 ms
> (11 rows)
>
> But the same advance in v12 which makes it harder to fool with your test
> case also opens the possibility of fixing your real case.
>

I think so much more interesting should be long time after query processing
- last row was processed in 13ms, but Execution Time was 69ms .. so some
cleaning is 56ms - that is pretty long.

> I've made an extension which has a function which always returns true, but
> lies about how often it is expected to return true. See the attachment.
> With that, you can fine-tune the planner.
>
> CREATE EXTENSION pg_selectivities ;
>
> EXPLAIN ANALYZE
> SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001)
> UNION
> SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001);
> QUERY PLAN
>
>
> --------------------------------------------------------------------------------------------------------------------------
> HashAggregate (cost=45021.40..45021.60 rows=20 width=16) (actual
> time=0.226..0.227 rows=1 loops=1)
> Group Key: a.a, b.b
> -> Append (cost=0.50..45021.30 rows=20 width=16) (actual
> time=0.105..0.220 rows=2 loops=1)
> -> Nested Loop (cost=0.50..22510.50 rows=10 width=16) (actual
> time=0.104..0.105 rows=1 loops=1)
> Join Filter: pg_always('1e-05'::double precision)
> -> Function Scan on j a (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.066..0.066 rows=1 loops=1)
> -> Function Scan on j b (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.035..0.035 rows=1 loops=1)
> -> Nested Loop (cost=0.50..22510.50 rows=10 width=16) (actual
> time=0.112..0.113 rows=1 loops=1)
> Join Filter: pg_always('1e-05'::double precision)
> -> Function Scan on j a_1 (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.077..0.077 rows=1 loops=1)
> -> Function Scan on j b_1 (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.034..0.034 rows=1 loops=1)
> Planning Time: 0.139 ms
> Execution Time: 0.281 ms
>
> Cheers,
>
> Jeff
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Felix Geisendörfer 2019-08-22 09:06:37 Re: Extremely slow HashAggregate in simple UNION query
Previous Message Jeff Janes 2019-08-21 18:26:33 Re: Extremely slow HashAggregate in simple UNION query