From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, tomas(at)vondra(dot)me, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com> |
Subject: | Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning |
Date: | 2025-03-28 03:02:10 |
Message-ID: | CAApHDvrOOz-4UPreW03=S73tN1Vgy2OHWZM58=wAuZUk1iosRw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 27 Mar 2025 at 23:27, Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> Further, I experimented with hash table size. Attached images have
> four graphs for planning time and planner's memory consumption
> measured for a 3-way join for initial has table sizes of 64, 128 and
> 256 respectively.
I put together a benchmarking script so I could learn the performance
of this patch. See attached.
It does not seem that surprising that you're not seeing much
difference in memory consumption. I believe your test case has a
single EquivalenceClass. The hashtable bucket size is 40 bytes on my
machine, so going between 256*40 and 64*40 isn't much memory. My
off-list mention of using 64 buckets as the initial size was because
you're switching to the hashing method at 32 items. If you made the
table 32, then it's guaranteed to need to be enlarged, so that's not
good. If you make it 64, then the worst-case fillfactor is 50% rather
than 12.5% with 256 elements.
Performing lookups on an appropriately sized hash table is going to
perform better than lookups on a sparse table. The reason for this is
that hash table probes rarely ever have a predictable memory access
pattern, and the larger the bucket array is, the more chance of having
a backend stall while fetching cache lines from some higher cache
level or RAM. So, IMO, using 256, you're leaving performance on the
table and paying in RAM for the privilege.
You might not be too concerned about the memory because you've done
the tests, but testing with one EC and calling it good seems naive to
me. I recall one query that Tom posted when I was working on the EC
index stuff for 3373c7155 that had over 1000 EquivalenceClasses. I
don't know how many of those would have had > 32 ec_derives entries,
but check [1] if you want to see.
I experimented by applying your v4 along with 0001-0003 of Yuya's v35
patchset from [2]. See the attached bz2 for my results run on an AMD
Zen2 machine. The CREATE TABLE statement is in the attached script.
If I run: select testname,parts,avg(joins),sum(plan_time) as
plan_time,avg(mem_used) mem_used,avg(mem_alloc) mem_alloc from
bench_results where testname not ilike '%pwj%' and testname ilike
'%yuya%' group by 1,2 order by parts,testname;
There are no results > 32 parts where 256 elements are faster than 64.
64 averages about 1% faster. That could be noise, but slightly less
memory seems quite attractive to me when there's some evidence that
also comes with better performance.
Just to explain the names of the tests in the results:
v4_yuya_v35-0001-0003_list_free = Your v4 patch with Yuya's 0001-0003
with the fix for the pfree on the list.
v4_64buckets_yuya_v35-0001-0003_list_free is the same but with 64
bucket ec_derives_hash table. For the test, the names should be fairly
self-explanatory. If the name has "pwj" in it, then I had
partitionwise-joins enabled, if not, I had it disabled. master is
5d5f41581.
David
[1] https://www.postgresql.org/message-id/6970.1545327857%40sss.pgh.pa.us
[2] https://postgr.es/m/CAJ2pMkZ2soD_99UTGkvg4_fX=PAvd7oDNYUMOksqbEMzpdeJAA@mail.gmail.com
Attachment | Content-Type | Size |
---|---|---|
partbench.sh.txt | text/plain | 1.4 KB |
bench_results_2025-03-28.csv.bz2 | application/x-compressed | 149.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | jian he | 2025-03-28 03:03:29 | Re: speedup COPY TO for partitioned table. |
Previous Message | Fujii.Yuki@df.MitsubishiElectric.co.jp | 2025-03-28 02:00:44 | RE: Partial aggregates pushdown |