Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, tomas(at)vondra(dot)me, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Date: 2025-03-25 12:09:19
Message-ID: CAExHW5u+yphTLvxGqVyyAVfdwhsh4mLjqUn29LnQjNpA=UR0Hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 25, 2025 at 5:20 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> On Tue, Mar 25, 2025 at 6:36 PM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > On Tue, Mar 25, 2025 at 12:58 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > > Btw, about ec_clear_derived_clauses():
> > >
> > > @@ -749,7 +749,7 @@ remove_rel_from_eclass(EquivalenceClass *ec,
> > > SpecialJoinInfo *sjinfo,
> > > * drop them. (At this point, any such clauses would be base restriction
> > > * clauses, which we'd not need anymore anyway.)
> > > */
> > > - ec->ec_derives = NIL;
> > > + ec_clear_derived_clauses(ec);
> > > }
> > >
> > > /*
> > > @@ -1544,8 +1544,7 @@ update_eclasses(EquivalenceClass *ec, int from, int to)
> > > list_free(ec->ec_members);
> > > ec->ec_members = new_members;
> > >
> > > - list_free(ec->ec_derives);
> > > - ec->ec_derives = NULL;
> > > + ec_clear_derived_clauses(ec);
> > >
> > > We're losing that list_free() in the second hunk, aren't we?
> > >
> > > There's also this comment:
> > >
> > > + * XXX: When thousands of partitions are involved, the list can become
> > > + * sizable. It might be worth freeing it explicitly in such cases.
> > >
> > > So maybe ec_clear_derived_clauses() should take a free_list parameter,
> > > to preserve the original behavior? What do you think?
> >
> > Well spotted. How about just calling list_free() in
> > ec_clear_derived_clauses() to simplify things. I mean list_free()
> > might spend some cycles under remove_rel_from_eclass() and
> > process_equivalence() freeing the array but that should be ok. Just
> > setting it to NIL by itself looks fine. If we bundle it in a function
> > with a flag, we will need to explain why/when to free list and when to
> > not. That's unnecessary complexity I feel. In other places where the
> > structures have potential to grow in size, we have resorted to freeing
> > them rather than just forgetting them. For example, we free appinfos
> > in try_partitionwise_join() or child_relids.
> >
> > The list shouldn't be referenced anywhere else, so it should be safe
> > to free it. Note that I thought list_concat() used by
> > process_equivalence() would reuse the memory allocated to
> > ec2->ec_derives_list but it doesn't. I verified that by setting the
> > threshold to 0, thus forcing the hash table always and running a
> > regression suite. It runs without any segfaults. I don't see any
> > change in time required to run regression.
> >
> > PFA patchset
> > 0001, 0002 are same as your patchset except some of my edits to the
> > commit message. Please feel free to accept or reject the edits.
>
> Thanks, I've noted your suggestions.
>
> > 0003 adds list_free() to ec_clear_derived_clauses()
>
> Thanks, I've merged it into 0002, with this blurb in its commit
> message to describe it:
>
> The new ec_clear_derived_clauses() always frees ec_derives_list, even
> though some of the original code paths that cleared the old ec_derives
> field did not. This ensures consistent cleanup and avoids leaking
> memory when the list grows large.
>
> I needed to do this though ;)
>
> - ec->ec_derives_list = NIL;
> list_free(ec->ec_derives_list);
> + ec->ec_derives_list = NIL;

Silly me. I reran regression by setting the threshold to 0 and still
didn't get segmentation fault or an increase in regression run time.
So the change is safe.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2025-03-25 12:11:45 Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Previous Message Hayato Kuroda (Fujitsu) 2025-03-25 12:07:59 RE: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.