From: | Yuya Watari <watari(dot)yuya(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Thom Brown <thom(at)linux(dot)com>, Zhang Mingli <zmlpostgres(at)gmail(dot)com> |
Subject: | Re: [PoC] Reducing planning time when tables have many partitions |
Date: | 2025-03-31 05:45:23 |
Message-ID: | CAJ2pMkZmOvE-xKujmttx41Mn3TvdN409_G21uq6F7JhsdPrW7A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello Tom,
Thank you for your detailed review, and apologies for my late response.
On Tue, Mar 25, 2025 at 2:49 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> One thing I don't love is putting the children into RelOptInfos.
> That seems like an unrelated data structure. Have you thought
> about instead having, in each EC that needs it, an array indexed
> by RTI of per-relation child-member lists? I think this might
> net out as less storage because there typically aren't that many
> ECs in a query. But the main thing is to not have so many
> interconnections between ECs and RelOptInfos.
Thank you for your suggestion. Storing EquivalenceMembers in
RelOptInfos indeed complicates the data structures involved. In the
next version, I will explore alternative approaches, including the one
you have suggested.
> Another thing I really don't like is the back-link from EMs to ECs:
>
> + EquivalenceClass *em_ec; /* EquivalenceClass which has this member */
>
> That makes the data structure circular, which will cause pprint to
> recurse infinitely. (The fact that you hadn't noticed that makes
> me wonder how you debugged any of these data structure changes.)
> We could prevent the recursion with suitable annotation on this field,
> but I'd really rather not have the field in the first place. Circular
> pointers are dangerous and best avoided. Also, it's bloating a node
> type that you are concerned about supporting a lot of. Another point
> is that I don't see any code to take care of updating these links
> during an EC merge.
I apologize for missing this critical point. It is clear that avoiding
circular dependencies would be preferable, so I will reconsider this
aspect of the design.
> * setup_eclass_member_iterator_with_children is a carpal-tunnel-inducing
> name. Could we drop the "_with_children" part? It doesn't seem to
> add much, since there's no variant for "without children".
Thank you for this suggestion. I will remove "_with_children" in the
next version.
> * The root parameter should be first; IMO there should be no
> exceptions to that within the planner. Perhaps putting the target
> iterator parameter last would make it read more nicely. Or you could
> rely on struct assignment:
>
> it = setup_eclass_member_iterator(root, ec, relids);
I agree with your point. I will adjust the parameter order in the next
version to match your suggestion.
> * Why did you define the iterator as possibly returning irrelevant
> members? Doesn't that mean that every caller has to double-check?
> Wouldn't it make for less code and fewer bugs for the iterator to
> have that responsibility? If there is a good reason to do it like
> that, the comments should explain why.
This design was chosen for performance reasons. If the iterator always
filtered out irrelevant members, it would need to repeatedly check
each element against "bms_is_subset". However, some callers require
stricter conditions, such as "bms_equals", resulting in redundant
checks. Therefore, the iterator intentionally returns some false
positives, leaving it to callers to perform additional checks for the
exact conditions they require. As you pointed out, I failed to clearly
document this, and I will fix this oversight in the next version.
> I don't really like the concept of 0004 at all. Putting *all*
> the EC-related RelOptInfos into a root-stored list seems to be
> doubling down very hard on the assumption that no performance-critical
> operations will ever need to search that whole list. Is there a good
> reason to do it like that, rather than say using the bitmap-index
> concept separately within each EC? That might also alleviate the
> problem you're having with the bitmapsets getting too big.
Thank you for this suggestion. The patch series indeed has issues with
memory consumption. Your suggestion to manage bitmap indexes
separately within each EC seems worth exploring, and I will
investigate this approach further.
> Given that we've only got a week left, I see little hope of getting
> any of this into v18.
I agree that addressing these issues within the remaining time is
challenging. The design clearly needs reconsideration. Therefore, I
will postpone these changes and submit a fully revised version for
v19. Would this approach be acceptable to you?
--
Best regards,
Yuya Watari
From | Date | Subject | |
---|---|---|---|
Next Message | Yuya Watari | 2025-03-31 05:47:16 | Re: [PoC] Reducing planning time when tables have many partitions |
Previous Message | Yuya Watari | 2025-03-31 05:40:13 | Re: [PoC] Reducing planning time when tables have many partitions |