Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: tomas(at)vondra(dot)me
Cc: vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Date: 2024-08-19 13:45:13
Message-ID: CAExHW5sF9KPm2TM+SbV-_HheGm32iKN4FEtgqUb+5djLAsGz1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 19, 2024 at 6:43 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> Sorry for the delay in my response.
>
> On Wed, May 29, 2024 at 9:34 AM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >
> > Documenting some comments from todays' patch review session
>
> I forgot to mention back then that both of the suggestions below came
> from Tom Lane.
>
> > 1. Instead of a nested hash table, it might be better to use a flat hash table to save more memory.
>
> Done. It indeed saves memory without impacting planning time.
>
> > 2. new comm_rinfo member in RestrictInfo may have problems when copying RestrictInfo or translating it. Instead commuted versions may be tracked outside RestrictInfo
>
> After commit 767c598954bbf72e0535f667e2e0667765604b2a,
> repameterization of parent paths for child relation happen at the time
> of creating the plan. This reduces the number of child paths produced
> by reparameterization and also reduces the number of RestrictInfos
> that get translated during reparameterization. During
> reparameterization commuted parent RestrictInfos are required to be
> translated to child RestrictInfos. Before the commit, this led to
> translating the same commuted parent RestrictInfo multiple times.
> After the commit, only one path gets reparameterized for a given
> parent and child pair. Hence we do not produce multiple copies of the
> same commuted child RestrictInfo. Hence we don't need to keep track of
> commuted child RestrictInfos anymore. Removed that portion of code
> from the patches.
>
> I made detailed memory consumption measurements with this patch for
> number of partitions changed from 0 (unpartitioned) to 1000 and for 2
> to 5-way joins. They are available in the spreadsheet at [1]. The raw
> measurement data is in the first sheet named "pwj_mem_measurements raw
> numbers". The averages over multiple runs are in second sheet named
> "avg_numbers". Rest of the sheet represent the averages in more
> consumable manner. Please note that the averages make sense only for
> planning time since the memory consumption remains same with every
> run. Also note that EXPLAIN now reports planning memory consumption in
> kB. Any changes to memory consumption below 1kB are not reported and
> hence not noticed. Here are some observations.
>
> 1. When partitionwise join is not enabled, no changes to planner's
> memory consumption are observed. See sheet named "pwj disabled,
> planning memory".
>
> 2. When partitionwise join is enabled, upto 17% (206MB) memory is
> saved by the patch in case of 5-way self-join with 1000 partitions.
> This is the maximum memory saving observed. The amount of memory saved
> increases with the number of joins and number of partitions. See sheet
> with name "pwj enabled, planning memory"
>
> 3. After commit 767c598954bbf72e0535f667e2e0667765604b2a, we do not
> translate a parent RestrictInfo multiple times for the same
> parent-child pair in case of a *2-way partitionwise join*. But we
> still consume memory in saving the child RestrictInfo in the hash
> table. Hence in case of 2-way join we see increased memory consumption
> with the patch compared to master. The memory consumption increases by
> 13kb, 23kB, 76kB and 146kB for 10, 100, 500, 1000 partitions
> respectively. This increase is smaller compared to the overall memory
> saving. In order to avoid this memory increase, we will need to avoid
> using hash table for 2-way join. We will need to know whether there
> will be more than one partitionwise join before translating the
> RestrictInfos for the first partitionwise join. This is hard to
> achieve in all the cases since the decision to use partitionwise join
> happens at the time of creating paths for a given join relation, which
> itself is computed on the fly. We may choose some heuristics which
> take into account the number of partitioned tables in the query, their
> partition schemes, and the quals in the query to decide whether or not
> to track the translated child RestrictInfos. But that will make the
> code more complex, but more importantly the heuristics may not be able
> to keep up if we start using partitionwise join as an optimization
> strategy for more cases (e.g. asymmetric partitionwise join [2]). The
> attached patch looks like a good tradeoff to me. But other opinions
> might vary. Suggestions are welcome.
>
> 4. There is no noticeable change in the planning time. I ran the same
> experiment multiple times. The planning time variations from each
> experiment do not show any noticeable pattern suggesting increase or
> decrease in the planning time with the patch.
>
> A note about the code: I have added all the structures and functions
> dealing with the RestrictInfo hash table at the end of restrictinfo.c.
> I have not come across a C file in PostgreSQL code base where private
> structures are defined in the middle the file; usually they are
> defined at the beginning of the file. But I have chosen it that way
> here since it makes it easy to document the hash table and the
> functions at one place at the beginning of this code section. I am
> open to suggestions which make the documentation easy while placing
> the structures at the beginning of the file.
>
> [1] https://docs.google.com/spreadsheets/d/1CEjRWZ02vuR8fSwhYaNugewtX8f0kIm5pLoRA95f3s8/edit?usp=sharing
> [2] https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com
>

Noticed that Tomas's email address has changed. Adding his new email address.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-08-19 14:03:28 Re: gitmaster server problem?
Previous Message Jelte Fennema-Nio 2024-08-19 13:31:17 Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest