From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Bykov Ivan <i(dot)bykov(at)modernsys(dot)ru>, Sami Imseih <samimseih(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET |
Date: | 2025-03-11 06:50:11 |
Message-ID: | Z8_dI0YyuZnt54eM@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:
> It seems to me that if this fixes the issue, that the next similar one
> is already lurking in the shadows waiting to jump out on us.
For how many years will be have to wait for this threat hiddent in the
shadows? :)
This issue exists since the query jumbling exists in pgss, so it goes
really down the road. I've spent a bit more than a few minutes on
that.
> Couldn't we adjust all this code so that we pass a seed to
> AppendJumble() that's the offsetof(Struct, field) that we're jumbling
> and incorporate that seed into the hash? I don't think we could
> possibly change the offset in a minor version, so I don't think
> there's a risk that could cause jumble value instability. Possibly
> different CPU architectures would come up with different offsets
> through having different struct alignment requirements, but we don't
> make any guarantees about the jumble values matching across different
> CPU architectures anyway.
Yeah, something like that has potential to get rid of such problems
forever, particularly thanks to the fact that we automate the
generation of this code now so it is mostly cost-free. This has a
cost for the custom jumbling functions where one needs some
imagination, but with models being around while we keep the number of
custom functions low, that should be neither complicated nor costly,
at least in my experience.
When I was thinking about the addition of the offset to the jumbling
yesterday, I was wondering about two things:
- if there are some node structures where it could not work.
- if we'd need to pass down some information of the upper node when
doing the jumbling of a node attached to it, which would make the code
generation much more annoying.
But after sleeping on it I think that these two points are kind of
moot: having only the offset passed down to a single _jumbleNode()
with the offset compiled at its beginning should be sufficient. Using
"seed" as a term is perhaps a bit confusing, because this is an offset
in the upper node, but perhaps that's OK as-is. It's just more
entropy addition. If somebody has a better idea for this term, feel
free.
_jumbleList() is an interesting one. If we want an equivalent of the
offset, this could use the item number in the list which would be a
rather close equivalent to the offset used elsewhere. For the int,
oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
each member, apply the offset at the beginning of _jumbleList(), once,
I guess.
_jumbleVariableSetStmt() should be OK with the offset of "arg" in
VariableSetStmt. The addition of hash_combine64() to count for the
offset should be OK.
With that in mind, the cases with DISTINCT/ORDER BY and OFFSET/LIMIT
seem to work fine, without causing noise for the other cases tracked
in the regression tests of PGSS.
What do you think?
--
Michael
Attachment | Content-Type | Size |
---|---|---|
0001-Add-node-offsets-in-query-jumbling-computations.patch | text/x-diff | 11.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | jian he | 2025-03-11 07:00:53 | Re: bogus error message for ALTER TABLE ALTER CONSTRAINT |
Previous Message | Dilip Kumar | 2025-03-11 06:28:06 | Re: change on_exit_nicely_list array to the dynamic array to increase slots at run time for pg_restore |