Re: Merge David and Goliath tables efficiently

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, nicolas paris <nicolas(dot)paris(at)riseup(dot)net>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Merge David and Goliath tables efficiently
Date: 2023-06-19 11:53:50
Message-ID: 27cdb9c5-6dc7-87ff-303e-b0d0d20edbc4@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 6/19/23 09:46, Alvaro Herrera wrote:
> I came here to talk about partitionwise join, but then noticed you have
> already thought of that:
>
> On 2023-Jun-18, nicolas paris wrote:
>
>> Note that both plan acome from the same partitioned by hash table with
>> 100 parts, with a unique index on the list_id + hash_key. For strategy
>> 2.1, I turned on enable_partitionwise_join, since david table has the
>> same partitioning scheme as goliath including unique indexe. In both
>> case the query is:
>
> Hmm, I suppose the reason partitionwise join isn't having any effect is
> that the presence of WHEN NOT MATCHED clauses force an outer join, which
> probably disarms partitionwise joining, since each join pair would
> require to match for nulls, so there would be two matching partitions at
> the other end. A quick test for this hypothesis might be to try the
> MERGE without the WHEN NOT MATCHED clauses and see if partitionwise join
> works better.
>
> Maybe Tom L's new outer-join infrastructure in 16 allows to improve on
> this, not sure.
>

Not sure why would that disarm partitionwise join - attached is a simple
reproducer, generating two tables, loading 10000000 and 10000 rows into
them, and then doing explain on a simple merge.

IMHO the thing that breaks it is the ORDER BY in the merge, which likely
acts as an optimization fence and prevents all sorts of smart things
including the partitionwise join. I'd bet that if Nicolas replaces

MERGE INTO "goliath" ca
USING (SELECT * FROM "david" ORDER BY "list_id") AS t
..

with

MERGE INTO "goliath" ca
USING "david" AS t
...

it'll start doing the working much better.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
repro.sql application/sql 18.0 KB

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message nicolas paris 2023-06-19 12:09:37 Re: Merge David and Goliath tables efficiently
Previous Message Tomas Vondra 2023-06-19 11:34:45 Re: Merge David and Goliath tables efficiently