From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: nicolas paris <nicolas(dot)paris(at)riseup(dot)net>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Merge David and Goliath tables efficiently
Date: 2023-06-20 12:45:30
Message-ID: 3efe2d22-504a-a1bc-79fa-cff64c81e76a@enterprisedb.com
Lists: pgsql-performance
On 6/20/23 12:02, nicolas paris wrote:
>...
>
> Definitely this makes sense to add the part_col in the join columns.
> Also it helps the planner to choose a better plan, since now it goes
> with per partition nested loop without having to trick the costs
> (either enable_hashjoin/random_page_cost), with my current workload so
> far.
>
Right. With a non-partitionwise join, the nestloop inner lookup has to do
an indexscan on every partition (it can't decide which of the partitions
will have a match, and for costing we assume there's at least 1 row in
each lookup). That essentially amplifies the amount of random I/O by a
factor of 100x (or whatever the number of partitions is).
That is, instead of doing 100x nested loops like this:

  ->  Nested Loop Left Join  (cost=0.29..33.42 rows=8 width=47)
        ->  Seq Scan on david_98 david_99  (cost=0.00..1.08 ...)
        ->  Index Scan using goliath_98_id_part_col_idx on ...
              Index Cond: ((id = david_99.id) AND ...)

we end up doing one nested loop with an inner lookup like this:

  ->  Append  (cost=0.29..557.63 rows=100 width=14)
        ->  Index Scan using ... goliath_1  (cost=0.29..5.57 ...)
              Index Cond: (id = david.id)
        ...

And this is per-loop, of which there'll be 500 (because the small david
table has 500 rows).
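To illustrate, a minimal sketch of the pattern discussed in this thread: including the partition key in the join condition so the planner can match partitions pairwise. Table and column names (goliath, david, id, part_col, val) are illustrative assumptions based on the thread, not the poster's actual schema; MERGE requires PostgreSQL 15+.

```sql
-- Assumes both tables are partitioned the same way on part_col
-- (the plan above shows david_98 / goliath_98 partition pairs).
SET enable_partitionwise_join = on;

-- Joining on part_col as well as id lets each david partition probe
-- only its matching goliath partition, instead of an Append over all
-- 100 partitions per row:
MERGE INTO goliath g
USING david d
  ON g.id = d.id AND g.part_col = d.part_col
WHEN MATCHED THEN
  UPDATE SET val = d.val
WHEN NOT MATCHED THEN
  INSERT (id, part_col, val) VALUES (d.id, d.part_col, d.val);
```

With only `g.id = d.id` in the ON clause, the planner cannot prove which goliath partition holds a match, which is what produces the Append-over-all-partitions inner lookup shown above.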
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company