Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, tomas(at)vondra(dot)me, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Date: 2025-03-25 06:16:53
Message-ID: CAExHW5tObRq1hcwunx=7GhaJAjjdzx7gmH=h=hEanvRSJQeUcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 24, 2025 at 7:23 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> Ok, thanks for that analysis. I don't think there's anything about
> the patch that makes it particularly less suitable for pwj=on.
>

I agree.

> I read patch 0002 in detail last week and wrote a follow-up patch
> (0003), mostly for cosmetic improvements, which I plan to squash into
> 0002.

For some reason v2-0003 didn't apply cleanly on my local branch.
Possibly v2-0002 is a modified version of my 0002. In order to not
disturb your patchset, I have attached my edits as a diff. If you find
those useful, apply them to 0003 and then squash it into 0002.

Here are some more cosmetic comments on 0003. A bit of discussion
might be needed. Hence did not edit myself.

-/* Hash table entry in ec_derives_hash. */
+/* Hash table entry used in ec_derives_hash. */

I think we don't need "used" here entry in hash table is good enough. But I am
ok even if it stays that way.

- add_derived_clauses(ec1, ec2->ec_derives_list);
+ /* Updates ec1's ec_derives_list and ec_derives_hash if present. */
+ ec_add_derived_clauses(ec1, ec2->ec_derives_list);

Suggestion in the attached diff.

ec1->ec_relids = bms_join(ec1->ec_relids, ec2->ec_relids);
ec1->ec_has_const |= ec2->ec_has_const;
/* can't need to set has_volatile */
@@ -396,7 +400,7 @@ process_equivalence(PlannerInfo *root,
/* just to avoid debugging confusion w/ dangling pointers: */
ec2->ec_members = NIL;
ec2->ec_sources = NIL;
- clear_ec_derived_clauses(ec2);
+ ec_clear_derived_clauses(ec2);

I pondered about this naming convention when naming the functions. But
it seems it's not used everywhere in this file OR I am not able to see
the underlying naming rule if any. So I used a mixed naming. Let's
keep your names though. I think they are better.

@@ -1403,6 +1403,17 @@ typedef struct JoinDomain
* entry: consider SELECT random() AS a, random() AS b ... ORDER BY b,a.
* So we record the SortGroupRef of the originating sort clause.
*
+ * Derived equality clauses between EC members are stored in ec_derives_list.

"clauses between EC members" doesn't read correctly - "clauses
equating EC members" seems better. We actually don't need to mention
"between EC members" at all - it's not relevant to the "size of the
list" which is the subject of this paragraph. What do you think?

+ * For small queries, this list is scanned directly during lookup. For larger
+ * queries -- e.g., with many partitions or joins -- a hash table
+ * (ec_derives_hash) is built for faster lookup. Both structures contain the
+ * same RestrictInfos and are maintained in parallel.

If only the list exists, there is nothing to maintain in parallel. The
sentence seems to indicate that both the hash table and the list are
always present (and maintained in parallel). How about dropping "Both
... parallel" and modifying the previous sentence as " ... faster
lookup when the list grows beyond a threshold" or something similar.
The next sentence anyway mentions that both the structures are
maintained.

+ We retain the list even
+ * when the hash is used to simplify serialization (e.g., in
+ * _outEquivalenceClass()) and support EquivalenceClass merging.

Thanks for the review.

--
Best Wishes,
Ashutosh Bapat

Attachment Content-Type Size
amit-delta-suggestions.diff.txt text/plain 1.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anton A. Melnikov 2025-03-25 06:20:33 Re: Use XLOG_CONTROL_FILE macro everywhere?
Previous Message Jeff Davis 2025-03-25 06:14:43 Re: Reduce TupleHashEntryData struct size by half