Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, tomas(at)vondra(dot)me, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Date: 2025-03-25 09:35:55
Message-ID: CAExHW5v7VqHOQJ_Surhch1SQf6oVWnzR6z5QvKbDyDWTAPrjHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 25, 2025 at 12:58 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> - /* Updates ec1's ec_derives_list and ec_derives_hash if present. */
> + /*
> + * Appends ec2's derived clauses to ec1->ec_derives_list and adds
> + * them to ec1->ec_derives_hash if present.
> + */

WFM.

>
> > @@ -396,7 +400,7 @@ process_equivalence(PlannerInfo *root,
> > /* just to avoid debugging confusion w/ dangling pointers: */
> > ec2->ec_members = NIL;
> > ec2->ec_sources = NIL;
> > - clear_ec_derived_clauses(ec2);
> > + ec_clear_derived_clauses(ec2);
> >
> > I pondered about this naming convention when naming the functions. But
> > it seems it's not used everywhere in this file OR I am not able to see
> > the underlying naming rule if any. So I used a mixed naming. Let's
> > keep your names though. I think they are better.
>
> Got it -- I went with the ec_ prefix mainly to make the new additions
> self-consistent, since the file doesn’t seem to follow a strict naming
> pattern. Glad the names work for you. While at it, I also applied the
> same naming convention to two new functions I hadn’t touched earlier
> for some reason.

WFM.

>
> + * Derived equality clauses are stored in ec_derives_list. For small queries,
> + * this list is scanned directly during lookup. For larger queries -- e.g.,
> + * with many partitions or joins -- a hash table (ec_derives_hash) is built
> + * when the list grows beyond a threshold, for faster lookup. When present,
> + * the hash table contains the same RestrictInfos and is maintained alongside
> + * the list. We retain the list even when the hash is used to simplify
> + * serialization (e.g., in _outEquivalenceClass()) and support
> + * EquivalenceClass merging.
>
> I've merged my delta + your suggested changes as discussed above into 0002.
>

LGTM.

> Btw, about ec_clear_derived_clauses():
>
> @@ -749,7 +749,7 @@ remove_rel_from_eclass(EquivalenceClass *ec,
> SpecialJoinInfo *sjinfo,
> * drop them. (At this point, any such clauses would be base restriction
> * clauses, which we'd not need anymore anyway.)
> */
> - ec->ec_derives = NIL;
> + ec_clear_derived_clauses(ec);
> }
>
> /*
> @@ -1544,8 +1544,7 @@ update_eclasses(EquivalenceClass *ec, int from, int to)
> list_free(ec->ec_members);
> ec->ec_members = new_members;
>
> - list_free(ec->ec_derives);
> - ec->ec_derives = NULL;
> + ec_clear_derived_clauses(ec);
>
> We're losing that list_free() in the second hunk, aren't we?
>
> There's also this comment:
>
> + * XXX: When thousands of partitions are involved, the list can become
> + * sizable. It might be worth freeing it explicitly in such cases.
>
> So maybe ec_clear_derived_clauses() should take a free_list parameter,
> to preserve the original behavior? What do you think?

Well spotted. How about just calling list_free() in
ec_clear_derived_clauses() to simplify things. I mean list_free()
might spend some cycles under remove_rel_from_eclass() and
process_equivalence() freeing the array but that should be ok. Just
setting it to NIL by itself looks fine. If we bundle it in a function
with a flag, we will need to explain why/when to free list and when to
not. That's unnecessary complexity I feel. In other places where the
structures have potential to grow in size, we have resorted to freeing
them rather than just forgetting them. For example, we free appinfos
in try_partitionwise_join() or child_relids.

The list shouldn't be referenced anywhere else, so it should be safe
to free it. Note that I thought list_concat() used by
process_equivalence() would reuse the memory allocated to
ec2->ec_derives_list but it doesn't. I verified that by setting the
threshold to 0, thus forcing the hash table always and running a
regression suite. It runs without any segfaults. I don't see any
change in time required to run regression.

PFA patchset
0001, 0002 are same as your patchset except some of my edits to the
commit message. Please feel free to accept or reject the edits.
0003 adds list_free() to ec_clear_derived_clauses()
--
Best Wishes,
Ashutosh Bapat

Attachment Content-Type Size
0002-Make-derived-clause-lookup-in-EquivalenceCl-20250325.patch text/x-patch 25.3 KB
0001-Add-assertion-to-verify-derived-clause-has--20250325.patch text/x-patch 1.5 KB
0003-Free-ec_derives_list-in-ec_clear_derived_cl-20250325.patch text/x-patch 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2025-03-25 09:42:57 Re: Proposal - Allow extensions to set a Plan Identifier
Previous Message Andrei Lepikhov 2025-03-25 09:32:52 Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment