Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Bykov Ivan <i(dot)bykov(at)modernsys(dot)ru>, Sami Imseih <samimseih(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date: 2025-03-11 07:48:33
Message-ID: CAApHDvq2wkVTh3t2H7c5h+pWLsxQNGap61pwTc2cBRCq2Z_5GQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 11 Mar 2025 at 19:50, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:
> > It seems to me that if this fixes the issue, that the next similar one
> > is already lurking in the shadows waiting to jump out on us.
>
> For how many years will be have to wait for this threat hiddent in the
> shadows? :)
>
> This issue exists since the query jumbling exists in pgss, so it goes
> really down the road. I've spent a bit more than a few minutes on
> that.

I didn't mean to cause offence here. I just wanted to make it clear
that I don't think fixing these types of issues by swapping the order
of the fields is going to be a sustainable practice, and it would be
good to come up with a solution that eliminates the chances of this
class of bug from ever appearing again. Even if we were to trawl the
entire Query struct and everything that could ever be attached to it
and we deem that today it's fine and no more such bugs exist, the
person adding some new SQL feature in the future that adds new data to
store in the Query probably isn't going to give query jumbling much
thought, especially now that the code generation is automatic.

> > Couldn't we adjust all this code so that we pass a seed to
> > AppendJumble() that's the offsetof(Struct, field) that we're jumbling
> > and incorporate that seed into the hash? I don't think we could
> > possibly change the offset in a minor version, so I don't think
> > there's a risk that could cause jumble value instability. Possibly
> > different CPU architectures would come up with different offsets
> > through having different struct alignment requirements, but we don't
> > make any guarantees about the jumble values matching across different
> > CPU architectures anyway.
>
> Yeah, something like that has potential to get rid of such problems
> forever, particularly thanks to the fact that we automate the
> generation of this code now so it is mostly cost-free. This has a
> cost for the custom jumbling functions where one needs some
> imagination, but with models being around while we keep the number of
> custom functions low, that should be neither complicated nor costly,
> at least in my experience.

I hadn't really processed this thread fully when I responded yesterday
and my response was mostly aimed at the latest patch on the thread at
the time, which I had looked at quickly and wanted to put a stop to
changing field orders as a fix for this. Since then, I see that Ivan
has already submitted a patch that accounts for NULL nodes and adds a
byte to the jumble buffer to account for NULLs. This seems quite clean
and simple. However, Sami seems to have concerns about the overhead of
doing this. Is that warranted at all? Potentially, there could be a
decent number of NULL fields. It'll probably be much cheaper than the
offsetof idea I came up with.

> When I was thinking about the addition of the offset to the jumbling
> yesterday, I was wondering about two things:
> - if there are some node structures where it could not work.
> - if we'd need to pass down some information of the upper node when
> doing the jumbling of a node attached to it, which would make the code
> generation much more annoying.
>
> But after sleeping on it I think that these two points are kind of
> moot: having only the offset passed down to a single _jumbleNode()
> with the offset compiled at its beginning should be sufficient. Using
> "seed" as a term is perhaps a bit confusing, because this is an offset
> in the upper node, but perhaps that's OK as-is. It's just more
> entropy addition. If somebody has a better idea for this term, feel
> free.

Can you share your thoughts on Ivan's NULL jumble idea?

> _jumbleList() is an interesting one. If we want an equivalent of the
> offset, this could use the item number in the list which would be a
> rather close equivalent to the offset used elsewhere. For the int,
> oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
> each member, apply the offset at the beginning of _jumbleList(), once,
> I guess.

Is this affected by the same class of bug that we're aiming to fix
here? (This class being not always jumbling all fields because of
NULLs or some other value, resulting in the next jumble field applying
the same bytes to the buffer as the previous field would have, had it
not been ignored)

> _jumbleVariableSetStmt() should be OK with the offset of "arg" in
> VariableSetStmt. The addition of hash_combine64() to count for the
> offset should be OK.
>
> With that in mind, the cases with DISTINCT/ORDER BY and OFFSET/LIMIT
> seem to work fine, without causing noise for the other cases tracked
> in the regression tests of PGSS.
>
> What do you think?

I mostly now think the class of bug can be fixed by ensuring we never
ignore any jumble field for any reason and always put at least 1 byte
onto the buffer with some sort of entropy. I'm keen to understand if
I'm missing some case that you've thought about that I've not.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yuki Seino 2025-03-11 07:50:06 Re: Add “FOR UPDATE NOWAIT” lock details to the log.
Previous Message Andy Fan 2025-03-11 07:38:49 Re: Send multiple statements to pg server at once