Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: tomas(at)vondra(dot)me
Cc: vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Date: 2024-10-10 12:06:10
Message-ID: CAExHW5vZiQtWU6moszLP5iZ8gLX_ZAUbgEX0DxGLx9PGWCtqUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

Here's next set of patch with following changes
1. Addressed some of Alvaro's comments which he gave offlist.
2. One of Alvaro's comments made me realise that there is opportunity
to save planning time. Patch 0003 added for the same.
3. Alvaro suggested to try simplehash.h instead of dynahash for
RestrictInfo hash table. That's patch 0005

Patches
=======
Commit messages in each of the patches describe the code changes in detail. Here
is brief description of each patch to facilitate the results discussion.

0001
----
Implements a hash table to store and retrieve child RestrictInfos. A
pointer to hash table is added in PlannerInfo. When examining the
results, I found a possible regression in planning time for lower
number of partitions and lower join order. To assess whether the
regression is related to an increase in the size of PlannerInfo, I
separated the interface and PlannerInfo addition it their own patch.
Having this as a separate patch might help us in case we decide to go
ahead with either of 0003 or 0004.

0002
----
The new member added by 0001 increases the size of PlannerInfo from
696 bytes to 704 bytes. We had seen a performance regression because
of increase in size of a structure in another thread [2]. This patch
moves around the new member PlannerInfo::child_rinfo_hash and a
related existing member PlannerInfo::last_rinfo_serial so as to keep
the size of PlannerInfo same. If this change doesn't fix the possible
regression, it may be discarded. Otherwise it should be merged into
0001.

0003
----
Alvaro spotted an assymetry in the way my earlier patch handled child
RestrictInfo in create_join_clause. This led me to realize that
there's an optimization opportunity in that function.
create_join_clause() scans the ec_derives and ec_sources lists to find
any existing clause built using the given EC members. When there are
thousands of partitions ec_derives will have thousands of elements,
and will impact planning time adversely. This commit uses the
RestrictInfo hash introduced in patch 0001 to store the child derived
clauses, thus reducing the planning time when thousands of partitions
are involved. Detailed code level analysis of this change can be found
in the commit message of this patch. We will discuss the performance
impact a bit later in this email.

0004
----
This patch uses RestrictInfo hash table to store the translated child
RestrictInfos to avoid memory bloat due to repeated translations of
parent RestrictInfos for the same parent-child pair saving memory.
This is almost the same patch as my last email in this thread.

0005
----
Changes the code to use simplehash instead of dynahash.

Results
=======
I ran the same experiments as previously described [3]. I applied
patches 0001 to 0005 one by one cumulatively and collected planning
time and memory for queries which involved self-join of order 2 to 5,
with table having partitions 0 (unpartitioned), 10, 100, 500 and 1000
respectively.

The spreadsheet file with results can be found at [1]. The first sheet
"README" describes how to read this spreadsheet. It has sheets
containing the raw planning time and planning memory measurements,
their averages and standard deviation. But there are also sheets which
make it easy to compare the effects of the patches (explained below)
on planning time and memory. Those sheets are more useful than the raw
and average numbers.

1. With patches upto 0003, planning time improves by 10%-20%for higher
number of partitions and higher join orders. (rows 24 to 35 and
columns S, Z in planning time sheets). This improvement can be seen
with or without partitionwise join. The improvement increases with the
number of partitions and join order as expected. I have repeated the
experiments a few times and I could reproduce the improvement all the
time.

2. For lower number of partitions and lower join orders, the planning
time shows a regression. In case of unpartitioned tables, the planning
time shows improvement. If we repeat the experiments, some
combinations of number of partitions, join order show improvements and
some show regression. The only steady pattern I see is that with 100
partitions, we see regression most of the time. However, I am not sure
of the reason for this regression. Patch 0001 is playing some role
here.

3. With just patch 0001 applied, planning time usually shows
degradation (column Q and X in planning time sheets) with or without
PWJ enabled. I first thought that it might be because of the increased
size of PlannerInfo. We had seen a similar phenomenon when adding a
new member to WindowAggState [2]. Hence I introduced patch 0002 which
moves two fields around to not increase the size of structure. But
that doesn't fix the regression in the planning time (columns R and
Y). Apart from increasing the PlannerInfo size and may be object file
size, 0002 does not have any other impact. But the regression seen
with just that patch is more than what we saw in [2]. More investigate
is required to decide whether this regression is real or not and if
real, the root cause. Looking at the numbers, it seems that this
regression is causing the planning time regression in rest of the
patches. If we fix regression by 0001, we should not see much
regression in rest of the patches. I am looking for some guidance in
investigating this regression.

4. Patches upto 0003, increase the memory consumed by the planner
because of the hash table, but that increase in memory is minimal when
compared with the total memory used by the planner in case of a large
number of partitions.

5. With 0004, the memory used by the planner reduces drastically in
case of large number of partitions and higher join orders. These
numbers are similar to my previous observations [3] (Columns T and AA
in "planning memory with PWJ enabled" sheet)

6. Using simplehash (patch 0005) does not save any memory or doesn't
improve planning time. I think I have used the simplehash correct, but
someone with more experience with simplehash might find something
wrong with 0005. But otherwise, I will drop 0005 from the next set of
patches.

I believe both 0003 and 0004 are useful if we could fix the regression
in planning time for lower number of partitions and for lower join
orders.

[1] https://docs.google.com/spreadsheets/d/1uLtlkwdFYKLSAUn8-cmavxS_vY9gndNVHtVSPTaB-dw/edit?usp=sharing
[2] https://www.postgresql.org/message-id/CAApHDvqPgFtwme2Zyf75BpMLwYr2mnUstDyPiP%3DEpudYuQTPPQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAExHW5uBhV1wWrXm-V+aGPq_PBv-RbmixU=HeUj-+hSmVCFAQw@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachment Content-Type Size
0004-Avoid-translating-RestrictInfo-repeatedly-20241010.patch text/x-patch 14.6 KB
0001-RestrictInfo-hash-table-interface-20241010.patch text/x-patch 7.2 KB
0002-Compact-PlannerInfo-to-restore-its-previous-20241010.patch text/x-patch 1.9 KB
0003-Use-RestrictInfo-hash-table-for-storing-EC--20241010.patch text/x-patch 12.2 KB
0005-Use-simplehash-instead-of-dynamic-hash-20241010.patch text/x-patch 5.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Moench-Tegeder 2024-10-10 12:27:04 Re: sunsetting md5 password support
Previous Message Hayato Kuroda (Fujitsu) 2024-10-10 12:02:17 RE: incorrect wal removal due to max_slot_wal_keep_size