Re: Fast HashJoin only after a cluster/recreate table

From: Alexandre de Arruda Paes <adaldeia(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Fast HashJoin only after a cluster/recreate table
Date: 2016-04-01 17:34:37
Message-ID: CAGewt-u4JVuibCPuRvPwXz8bcgi=SOagS1d=WKUxKdX1k_H5Cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi David,

Thanks for the explanations. But I don't undestand why when I recreate the
table, the planner choose a best mode for sometime...

>I wonder what the planner would do if you pulled out the join to ES08T. If
that generates a better plan, then providing that es08tipdoc is the primary
key of that table, then you could just put a
> subquery in the SELECT clause to lookup the es08desdoc.

We have a problem with this approach. Actually, this querys are generated
by a framework and can't be 'rewrite'.
Can you think in another solution directly in DB (perhaps a partial index,
table partitioning, etc) ???

Best regards,

Alexandre

2016-04-01 10:17 GMT-03:00 David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>:

> On 1 April 2016 at 15:44, Alexandre de Arruda Paes <adaldeia(at)gmail(dot)com>
> wrote:
>
>> In the query below, the planner choose an extreme slow mergejoin(380
>> seconds). 'Vacuum analyze' can't help.
>>
>
> Yeah, it appears that planner is estimating the WHERE clause on es09t
> quite badly, expecting just 1 row, but there's actually 1212 rows. This
> seems to throw the whole plan decision out quite a bit, as, if you notice
> in the merge left join for t1.es09tipdoc = t2.es08tipdoc, it expect just 2
> rows to be present, therefore most likely thinks that it's not worth
> sorting those results on t1.es09tipdoc, t1.es09numdoc in order for it to
> match the known output order of Materialize node on the inner side of that
> join. Instead it assumes the Merge join will be good enough on just the
> es09tipdoc order, and just adds the other two columns as join filters....
>
> ... and this seems to be what's killing the performance. This Merge Join
> constantly has to perform a mark/restore on the Materialize node. This why
> you're getting the insanely high "Rows Removed by Join Filter: 992875295",
> in other words the join filter throw away that many row combinations
> because they didn't match.
>
> This mark/restore is basically the rewind process that the merge join
> algorithm needs to do to match many to many rows. In actual fact, this
> rewind is pretty useless in this case as the GROUP BY subquery ensures that
> no duplicate values will make it into the inner side of that merge join.
> The planner is not currently smart enough to detect this.
>
> There are some patches currently pending for 9.6 which might help fix this
> problem in the future;
>
> 1. multivariate statistics; this might help the poor estimates on es09t.
> If this was available you could add statistics on es09codemp, es09datreq,
> which may well improve the estimate and cause the plan to change.
> https://commitfest.postgresql.org/9/450/
> 2. Unique joins. This tackles the problem a different way and allows the
> Merge Join algorithm to skip the restore with some new smarts that are
> added to the planner to detect when the inner side of the join can only
> produce, at most, a single row for each outer row.
> https://commitfest.postgresql.org/9/129/
>
> If you feel like compiling 9.6 devel from source and applying each of
> these patches independently and seeing if it helps... Of course that does
> not solve your 9.4 production dilemma, but it may help evaluation of each
> of these two patches for 9.6 or beyond.
>
> I wonder what the planner would do if you pulled out the join to ES08T. If
> that generates a better plan, then providing that es08tipdoc is the primary
> key of that table, then you could just put a subquery in the SELECT clause
> to lookup the es08desdoc.
>
>
>>
>> QUERY PLAN
>>
>>
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Sort (cost=289546.93..289546.94 rows=2 width=78) (actual
>> time=380405.796..380405.929 rows=2408 loops=1)
>> Sort Key: t1.es09numdoc, t1.es09tipdoc
>> Sort Method: quicksort Memory: 435kB
>> Buffers: shared hit=82163
>> -> Merge Left Join (cost=47.09..289546.92 rows=2 width=78) (actual
>> time=1133.077..380398.160 rows=2408 loops=1)
>> Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
>> Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND
>> (es09t1.es09numdoc = t1.es09numdoc))
>> Rows Removed by Join Filter: 992875295
>> Buffers: shared hit=82163
>> -> Merge Left Join (cost=46.53..49.29 rows=2 width=70) (actual
>> time=12.206..18.155 rows=2408 loops=1)
>> Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
>> Buffers: shared hit=6821
>> -> Sort (cost=9.19..9.19 rows=2 width=44) (actual
>> time=11.611..12.248 rows=2408 loops=1)
>> Sort Key: t1.es09tipdoc
>> Sort Method: quicksort Memory: 285kB
>> Buffers: shared hit=6814
>> -> Nested Loop (cost=1.11..9.18 rows=2 width=44)
>> (actual time=0.040..10.398 rows=2408 loops=1)
>> Buffers: shared hit=6814
>> -> Index Scan using ad_es09t_1 on es09t t3
>> (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212
>> loops=1)
>> Index Cond: ((es09codemp = 1) AND
>> (es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
>> Filter: (es09usuari ~~
>> '%%%%%%%%%%%%%%%%%%%%'::text)
>> Buffers: shared hit=108
>> -> Index Scan using es09t1_pkey on es09t1 t1
>> (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2
>> loops=1212)
>> Index Cond: ((es09codemp = 1) AND
>> (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
>> Filter: (es09tipdoc ~~ '%%%%%'::text)
>> Buffers: shared hit=6706
>> -> Sort (cost=37.35..38.71 rows=547 width=32) (actual
>> time=0.592..2.206 rows=2919 loops=1)
>> Sort Key: t2.es08tipdoc
>> Sort Method: quicksort Memory: 67kB
>> Buffers: shared hit=7
>> -> Seq Scan on es08t t2 (cost=0.00..12.47 rows=547
>> width=32) (actual time=0.003..0.126 rows=547 loops=1)
>> Buffers: shared hit=7
>> -> Materialize (cost=0.56..287644.85 rows=716126 width=23)
>> (actual time=0.027..68577.800 rows=993087854 loops=1)
>> Buffers: shared hit=75342
>> -> GroupAggregate (cost=0.56..278693.28 rows=716126
>> width=15) (actual time=0.025..4242.453 rows=3607573 loops=1)
>> Group Key: es09t1.es09codemp, es09t1.es09tipdoc,
>> es09t1.es09numdoc
>> Buffers: shared hit=75342
>> -> Index Only Scan using es09t1_pkey on es09t1
>> (cost=0.56..199919.49 rows=7161253 width=15) (actual time=0.016..1625.031
>> rows=7160921 loops=1)
>> Index Cond: (es09codemp = 1)
>> Heap Fetches: 51499
>> Buffers: shared hit=75342
>> Planning time: 50.129 ms
>> Execution time: 380419.435 ms
>>
>
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2016-04-03 02:53:27 Re: Searching GIN-index (FTS) and sort by timestamp-column
Previous Message David Rowley 2016-04-01 13:17:12 Re: Fast HashJoin only after a cluster/recreate table