Re: Merge David and Goliath tables efficiently

From: nicolas paris <nicolas(dot)paris(at)riseup(dot)net>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Merge David and Goliath tables efficiently
Date: 2023-06-17 21:42:48
Message-ID: 516e9f9f56314f3d812615b7e6aadfa0bf45e418.camel@riseup.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> > My interpretation reading the query plan is: well sized small
> > batches of upserts leverage the indexes while the regular join
> > choose the sequential scan, including sorting and hashing which
> > takes forever time and resources including disk.
>
> You may be right, but it's hard to tell without seeing the query
> plan.

Here are part of both plans:

Bad case (strategy 2.1):

-> Merge Left Join (cost=530202629.03..255884257913.32
rows=17023331531230 width=579)
Merge Cond: (david.list_id = ca.list_id)
-> Sort (cost=2019172.91..2024398.82 rows=2090361 width=569)
Sort Key: david.list_id
-> Append (cost=0.00..192152.41 rows=2090361 width=569)
-> Seq Scan on david_0 david_1 (cost=0.00..1812.52
rows=20852 width=569)
-> Seq Scan on david_1 david_2 (cost=0.00..1800.09
rows=20709 width=569)
-> Seq Scan on david_2 david_3 (cost=0.00..1794.44
rows=20644 width=569)

Good case (strategy 3):

Merge on goliath_23 ca (cost=2139.75..11077.17 rows=0 width=0)
-> Nested Loop Left Join (cost=2139.75..11077.17 rows=1000
width=575)
-> Limit (cost=2139.19..2495.67 rows=1000 width=569)
-> Index Scan using david_23_list_id_account_id_idx on
david_23 (cost=0.29..6794.16 rows=19058 width=569)
-> Index Scan using goliath_23_list_id_account_id_idx on
goliath_23 ca (cost=0.56..8.56 rows=1 width=14)
Index Cond: (list_id = david_23.list_id)

>
> Sounds very much like you'd benefit from tuning some cost parameters
> to
> make the index scan look cheaper.
> Not sure what 'batched indexed join' would be, but it very much
> sounds
> like a nested loop with an index scan.

Agreed, a 2M nested loop over index scan would likely work as well.
Would tuning the costs param could lead to get such large nested loop ?

> What PostgreSQL version are you using, what hardware? Did you tune it
> in
> any way, or is everything just default?

It is pg 15.3, on 2 cores / 8GO / 2TO ssds, with defaults cloud
provider parameters (RDS).

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2023-06-17 23:48:51 Re: Merge David and Goliath tables efficiently
Previous Message Tomas Vondra 2023-06-17 19:52:07 Re: Merge David and Goliath tables efficiently