From: | James Pang <jamespang886(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Hash Right join and seq scan |
Date: | 2024-07-08 02:21:58 |
Message-ID: | CAHgTRfdVJXARbUdU7KjhRuTEk8tr7=H1pOrWemJ8eOvSW3VExw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Sorry for confusion, it's from attached explain output of the SQL.
please check attached. my questions is : for nestloop of two partition
tables , they use same partition key and equal join on partition key, the
cost could be "outer tables estimated rows" * (average index scan of only
one partition of inner table) , instead of "outer tables estimated rows"
* (index scans of all partitions), is it possible ? or it's still need
running time partition pruning enhancement?
random_page_cost = 1.1, seq_page_cost=1.0,
effective_cache_size=0.75*physical memory size. set random_page_cost=0.9
make optimizer to choose index scan instead of seq scan.
Thanks,
James
David Rowley <dgrowleyml(at)gmail(dot)com> 於 2024年7月6日週六 上午8:33寫道:
> On Sat, 6 Jul 2024 at 02:43, James Pang <jamespang886(at)gmail(dot)com> wrote:
> > for nest loop path, since the first one estimated only "8"
> rows , and they use partitionkeyid as joinkey and all are hash partitions ,
> is it better to estimate cost to 8 (loop times) * 1600 = 12800 (each one
> loop map to only 1 hash partition bitmap scan ,avg one partition cost),
> that's much less than 398917.29 of all partitions ?
>
> I'm not really sure where you're getting the numbers from here. The
> outer side of the deepest nested loop has an 8 row estimate, not the
> nested loop itself. I'm unsure where the 1600 is from. I only see
> 1669.
>
> As of now, we don't do a great job of costing for partition pruning
> that will happen during execution. We won't be inventing anything to
> fix that in existing releases of PostgreSQL, so you'll need to either
> adjust the code yourself, or find a workaround.
>
> You've not shown us your schema, but perhaps enable_partitionwise_join
> = on might help you. Other things that might help are further lowering
> random_page_cost or raising effective_cache_size artificially high.
> It's hard to tell from here how much random I/O is being costed into
> the index scans. You could determine this by checking if the nested
> loop plan costs change as a result of doing further increases to
> effective_cache_size. You could maybe nudge it up enough for it to win
> over the hash join plan. It is possible that this won't work, however.
>
> > for secondary Nest Loop Anti join could be rows 299118 rows *
> 15.78(avg index scan cost of one partition) = 4,720,082 that still much
> less than 132168227.57 ?
> > for Hash Right join, is it possible to estimate by 8 seq
> partition scan instead of all 32 hash partitions since the first query
> estimated 8 rows only ?
> > extend statistics may help estimate count(partitionkeyid) based
> on other columns bind variables, but looks like that did not help table
> join case.
>
> I can't quite follow this. You'll need to better explain where you're
> getting these numbers for me to be able to understand.
>
> David
>
Attachment | Content-Type | Size |
---|---|---|
sql_details.txt | text/plain | 67.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2024-07-08 02:26:56 | Re: Hash Right join and seq scan |
Previous Message | Vitalii Tymchyshyn | 2024-07-08 00:42:27 | Re: Low performance between datacenters |