Re: Hash Right join and seq scan

From: James Pang <jamespang886(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Hash Right join and seq scan
Date: 2024-07-05 14:43:06
Message-ID: CAHgTRfdnKsB5mgoOQw9Hknki+kMkv-GnRNPTLT=ZSxKhLQPfPQ@mail.gmail.com
Lists: pgsql-performance

On Fri, Jul 5, 2024 at 10:15 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Fri, 5 Jul 2024 at 12:50, James Pang <jamespang886(at)gmail(dot)com> wrote:
> > We have a daily vacuumdb and analyze job; generally it finishes in
> > seconds. Sometimes it suddenly runs for tens of minutes with the same
> > bind variable values, a huge amount of temp space gets used, and at
> > that time EXPLAIN shows a "Hash Anti Join, Hash Right Join" plan with
> > seq scans of two tables.
>
> There was talk about adding costing for run-time partition pruning
> factors but nothing was ever agreed, so nothing was done. It's just
> not that obvious to me how we'd do that. If the Append had 10
> partitions as subnodes, with an equality join condition, you could
> assume we'll only match to 1 of those 10, but we've no idea at plan
> time which one that'll be and the partitions might drastically vary in
> size. The best I think we could do is take the total cost of those 10
> and divide by 10 to get the average cost. It's much harder for range
> conditions as those could match anything from 0 to all partitions. The
> best suggestion I saw for that was to multiply the costs by
> DEFAULT_INEQ_SEL.
>
> I think for now, you might want to lower the random_page_cost or
> increase effective_cache_size to encourage the nested loop -> index
> scan plan. A good range for effective_cache_size is anywhere between
> 50 and 75% of your server's RAM. However, that might not be ideal if your
> server is under memory pressure from other running processes. It also
> depends on how large shared_buffers are as a percentage of total RAM.
>
> David
>
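
For reference, the settings in play can be confirmed per session from
the catalog (a minimal check, nothing server-specific):

    -- What the planner is actually using, and where each value came from.
    SELECT name, setting, unit, source
    FROM pg_settings
    WHERE name IN ('random_page_cost', 'effective_cache_size',
                   'shared_buffers');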

We already have random_page_cost=1.1 and effective_cache_size set to 75%
of physical memory on this database server. For this SQL:
->  Nested Loop Anti Join  (cost=40.32..132168227.57 rows=224338 width=78)
      Join Filter: (lower((p.ctinfo)::text) = lower((w.ctinfo)::text))
      ->  Nested Loop Left Join  (cost=39.63..398917.29 rows=299118 width=78)
            ->  Append  (cost=0.56..22.36 rows=8 width=54)
                  ->  Index Scan using wmdata_p0_llid_hhid_stime_idx on wmdata_p0 m_1  (cost=0.56..2.79 rows=1 width=54)
                  ...
            ->  Append  (cost=39.07..49312.09 rows=54978 width=78)
                  ->  Bitmap Heap Scan on wmvtee_p0 w_1  (cost=39.07..1491.06 rows=1669 width=78)
                        Recheck Cond: ((m.partitionkeyid)::text = (partitionkeyid)::text)
                        ->  Bitmap Index Scan on wmvtee_p0_partitionkeyid_intid_idx  (cost=0.00..38.65 rows=1669 width=0)
                              Index Cond: ((partitionkeyid)::text = (m.partitionkeyid)::text)
                  ...
      ->  Append  (cost=0.69..516.96 rows=4010 width=78)
            ->  Index Only Scan using wmpct_p0_partitionkeyid_ctinfo_idx on wmpct_p0 p_1  (cost=0.69..15.78 rows=124 width=78)
            ...

For the nested loop path: since the first Append is estimated at only 8
rows, and the tables are joined on partitionkeyid with all of them hash
partitioned, would it be better to estimate the cost as 8 (loop
iterations) * 1600 (the average bitmap scan cost of a single hash
partition) = 12800, since each loop iteration maps to only one hash
partition? That is much less than the 398917.29 estimated for scanning
all partitions. Likewise, the outer Nested Loop Anti Join could be
estimated as 299118 rows * 15.78 (the average index scan cost of one
partition) = 4,720,082, which is still much less than 132168227.57.
For the Hash Right Join, is it possible to estimate based on 8 partition
seq scans instead of all 32 hash partitions, since the first Append is
estimated at only 8 rows?
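
In the meantime, a session-local experiment can show how the nested-loop
plan compares against the hash join under the current statistics (a
sketch; the query is the one behind the plan above):

    BEGIN;
    -- Discourage the hash join for this transaction only.
    SET LOCAL enable_hashjoin = off;
    -- Re-run the problem query under EXPLAIN (ANALYZE, BUFFERS) here
    -- and compare estimated cost and actual time with the default plan.
    ROLLBACK;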
Extended statistics may help estimate the distinct count of
partitionkeyid based on the bind variable values of the other columns,
but that did not appear to help the table join case.
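
For reference, what I had in mind is along these lines (a sketch only;
the column list is taken from the wmdata index shown above and may not
match the real dependency):

    -- Sketch: give the planner multi-column statistics tying the
    -- partition key to the columns the bind variables filter on.
    -- Column choice here is illustrative.
    CREATE STATISTICS wmdata_pkey_ext (ndistinct, dependencies)
        ON partitionkeyid, llid, hhid FROM wmdata;
    ANALYZE wmdata;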

Thanks,

James
