Re: Hash Right join and seq scan

From: James Pang <jamespang886(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Hash Right join and seq scan
Date: 2024-07-08 02:21:58
Message-ID: CAHgTRfdVJXARbUdU7KjhRuTEk8tr7=H1pOrWemJ8eOvSW3VExw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Sorry for confusion, it's from attached explain output of the SQL.
please check attached. my questions is : for nestloop of two partition
tables , they use same partition key and equal join on partition key, the
cost could be "outer tables estimated rows" * (average index scan of only
one partition of inner table) , instead of "outer tables estimated rows"
* (index scans of all partitions), is it possible ? or it's still need
running time partition pruning enhancement?
random_page_cost = 1.1, seq_page_cost=1.0,
effective_cache_size=0.75*physical memory size. set random_page_cost=0.9
make optimizer to choose index scan instead of seq scan.

Thanks,

James

David Rowley <dgrowleyml(at)gmail(dot)com> 於 2024年7月6日週六 上午8:33寫道:

> On Sat, 6 Jul 2024 at 02:43, James Pang <jamespang886(at)gmail(dot)com> wrote:
> > for nest loop path, since the first one estimated only "8"
> rows , and they use partitionkeyid as joinkey and all are hash partitions ,
> is it better to estimate cost to 8 (loop times) * 1600 = 12800 (each one
> loop map to only 1 hash partition bitmap scan ,avg one partition cost),
> that's much less than 398917.29 of all partitions ?
>
> I'm not really sure where you're getting the numbers from here. The
> outer side of the deepest nested loop has an 8 row estimate, not the
> nested loop itself. I'm unsure where the 1600 is from. I only see
> 1669.
>
> As of now, we don't do a great job of costing for partition pruning
> that will happen during execution. We won't be inventing anything to
> fix that in existing releases of PostgreSQL, so you'll need to either
> adjust the code yourself, or find a workaround.
>
> You've not shown us your schema, but perhaps enable_partitionwise_join
> = on might help you. Other things that might help are further lowering
> random_page_cost or raising effective_cache_size artificially high.
> It's hard to tell from here how much random I/O is being costed into
> the index scans. You could determine this by checking if the nested
> loop plan costs change as a result of doing further increases to
> effective_cache_size. You could maybe nudge it up enough for it to win
> over the hash join plan. It is possible that this won't work, however.
>
> > for secondary Nest Loop Anti join could be rows 299118 rows *
> 15.78(avg index scan cost of one partition) = 4,720,082 that still much
> less than 132168227.57 ?
> > for Hash Right join, is it possible to estimate by 8 seq
> partition scan instead of all 32 hash partitions since the first query
> estimated 8 rows only ?
> > extend statistics may help estimate count(partitionkeyid) based
> on other columns bind variables, but looks like that did not help table
> join case.
>
> I can't quite follow this. You'll need to better explain where you're
> getting these numbers for me to be able to understand.
>
> David
>

Attachment Content-Type Size
sql_details.txt text/plain 67.2 KB

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2024-07-08 02:26:56 Re: Hash Right join and seq scan
Previous Message Vitalii Tymchyshyn 2024-07-08 00:42:27 Re: Low performance between datacenters