From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, tomas(at)vondra(dot)me, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com> |
Subject: | Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning |
Date: | 2025-03-31 10:00:40 |
Message-ID: | CAExHW5tOd00=66=fASn=0ntGBnanQuYe5v7rBHymBn7MT6MbcQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Mar 31, 2025 at 8:27 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Sat, 29 Mar 2025 at 05:46, Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > PFA patches. 0001 and 0002 are the same as the previous set. 0003
> > changes the initial hash table size to the length of ec_derives.
>
> I'm just not following the logic in making it the length of the
> ec_derives List. If you have 32 buckets and try to insert 32 elements,
> you're guaranteed to need a resize after inserting 28 elements. See
> the grow_threshold logic SH_UPDATE_PARAMETERS(). The point of making
> it 64 was to ensure the table is never unnecessarily sparse and to
> also ensure we make it at least big enough for the minimum number of
> ec_derives that we're about to insert.
If I am reading SH_CREATE correctly, it will initially set the size to
32/.9 = 36 and in SH_UPDATE_PARAMETERS will set grow_threshold to 36 *
.9 = 32. So it will leave some room even after inserting the initial
elements. But looking at SH_INSERT_HASH_INTERNAL(), it will soon
expand the table even if there's space. We certainly need a size much
more than 32. 32 is an arbitrary/empirical threshold to create a hash
table. Using that threshold as initial hash table size means the table
size would be arbitrary too. Using twice the length of
ec_derives_list seems more reasonable since the length will decide the
initial number of entries.
>
> Looking more closely at the patch's ec_add_clause_to_derives_hash()
> function, I see you're actually making two hash table entries for each
> RestrictInfo, so without any em_is_const members, you'll insert 64
> entries into the hash table with a ec_derives list of 32, in which
> case 64 buckets isn't enough and the table will end up growing to 128
> elements.
>
Yes, that's right.
> I think you'd be better off coming up with some logic like putting the
> lowest pointer value'd EM first in the key and ensure that all lookups
> do that too by wrapping the lookups in some helper function. That'll
> half the number of hash table entries at the cost of some very cheap
> comparisons.
That's a good suggestion. I searched for C standard documentation
which specifies that the pointer comparison, especially inequality, is
stable and safe to use. But I didn't find any. While according to the
C standard, the result
of comparison between pointers within the same array or a struct is
specified, that between pointers from two different objects is
unspecified. The existing code relies on the EM pointers being stable
and also relies on equality between
them to be stable. It has withstood the test of time and a variety of
compilers. Hence I think it should be safe to rely on pointer
comparisons being stable. But since I didn't find any documentation
which confirms it, I have left those changes as a separate patch. Some
internet sources discussing pointer comparison can be found at [1],
[2] (which mentions the C standard but doesn't provide a link).
PFA the next patchset
0001, 0002 are same as in the previous set
0003 changes the initial hash table size to
length(ec->ec_derives_list) * 4 to accomodate for commuted entries.
0004 uses canonical keys per your suggestion and also reduces the
initial hash table size to length(ec->ec_derives_list) * 2.
When the number of initial elements to be added to the hash table is
known, I see users of simplehash.h using that as an estimate rather
than the nearest power of two. Hence following the logic here.
[1] https://stackoverflow.com/questions/59516346/how-does-pointer-comparison-work-in-c-is-it-ok-to-compare-pointers-that-dont-p
[2] https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Pointer-Comparison.html
--
Best Wishes,
Ashutosh Bapat
Attachment | Content-Type | Size |
---|---|---|
0001-Add-assertion-to-verify-derived-clause-has--20250331.patch | text/x-patch | 1.5 KB |
0003-Use-length-of-ec_derives-clause-list-as-ini-20250331.patch | text/x-patch | 1.9 KB |
0002-Make-derived-clause-lookup-in-EquivalenceCl-20250331.patch | text/x-patch | 25.5 KB |
0004-Canonical-keys-for-ec_derives_hash-20250331.patch | text/x-patch | 8.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jasper Smit | 2025-03-31 10:01:46 | Re: Assertion with aborted UPDATE in subtransaction |
Previous Message | Richard Guo | 2025-03-31 09:50:04 | Re: Memoize ANTI and SEMI JOIN inner |