Quick Links

RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

From:	Bykov Ivan <i(dot)bykov(at)modernsys(dot)ru>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Sami Imseih <samimseih(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date:	2025-03-11 09:47:18
Message-ID:	5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello!

> Since then, I see that Ivan
> has already submitted a patch that accounts for NULL nodes and adds a
> byte to the jumble buffer to account for NULLs. This seems quite clean
> and simple. However, Sami seems to have concerns about the overhead of
> doing this. Is that warranted at all? Potentially, there could be a
> decent number of NULL fields. It'll probably be much cheaper than the
> offsetof idea I came up with.

To address the issue of it consuming a lot of bytes in the jumble buffer
for NULL marks, I have added a new patch version for Variant B.
(v2-0001-Query-ID-Calculation-Fix-Variant-B.patch).

This patch adds to JumbleState the count of NULL nodes visited before a
non-NULL one appears. The least significant byte of this count is appended
to the jumble buffer every time the count is not null (indicating that we have
visited non-NULL nodes previously), and a new chunk of data is also appended
to the jumble buffer. I intentionally did not add a final append for the NULL
count in the JumbleQuery processing, as NULL tail nodes of the query do not
improve the reliability of query ID collision resolution.

I believe that my first version, Variant B of the patch, is simple and easy to
understand. I would prefer to use the first version of the Variant B patch,
so I have attached an updated version along with tests
(v3-0001-Query-ID-Calculation-Fix-Variant-B.patch).

As you can see, v2-0001-Query-ID-Calculation-Fix-Variant-B.patch is more
complex and invasive than v3-0001-Query-ID-Calculation-Fix-Variant-B.patch.
I don't think that saving a few bytes in a 1024-byte sized buffer is worth
implementing, even for this simple optimization (v2).

> Can you share your thoughts on Ivan's NULL jumble idea?

I would appreciate any feedback. Thank you.

-----Original Message-----
From: David Rowley <dgrowleyml(at)gmail(dot)com>
Sent: Tuesday, March 11, 2025 12:49 PM
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Быков Иван Александрович <i(dot)bykov(at)modernsys(dot)ru>; Sami Imseih <samimseih(at)gmail(dot)com>; pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 11 Mar 2025 at 19:50, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:
> > It seems to me that if this fixes the issue, that the next similar
> > one is already lurking in the shadows waiting to jump out on us.
>
> For how many years will be have to wait for this threat hiddent in the
> shadows? :)
>
> This issue exists since the query jumbling exists in pgss, so it goes
> really down the road. I've spent a bit more than a few minutes on
> that.

I didn't mean to cause offence here. I just wanted to make it clear that I don't think fixing these types of issues by swapping the order of the fields is going to be a sustainable practice, and it would be good to come up with a solution that eliminates the chances of this class of bug from ever appearing again. Even if we were to trawl the entire Query struct and everything that could ever be attached to it and we deem that today it's fine and no more such bugs exist, the person adding some new SQL feature in the future that adds new data to store in the Query probably isn't going to give query jumbling much thought, especially now that the code generation is automatic.

> > Couldn't we adjust all this code so that we pass a seed to
> > AppendJumble() that's the offsetof(Struct, field) that we're
> > jumbling and incorporate that seed into the hash? I don't think we
> > could possibly change the offset in a minor version, so I don't
> > think there's a risk that could cause jumble value instability.
> > Possibly different CPU architectures would come up with different
> > offsets through having different struct alignment requirements, but
> > we don't make any guarantees about the jumble values matching across
> > different CPU architectures anyway.
>
> Yeah, something like that has potential to get rid of such problems
> forever, particularly thanks to the fact that we automate the
> generation of this code now so it is mostly cost-free. This has a
> cost for the custom jumbling functions where one needs some
> imagination, but with models being around while we keep the number of
> custom functions low, that should be neither complicated nor costly,
> at least in my experience.

I hadn't really processed this thread fully when I responded yesterday and my response was mostly aimed at the latest patch on the thread at the time, which I had looked at quickly and wanted to put a stop to changing field orders as a fix for this. Since then, I see that Ivan has already submitted a patch that accounts for NULL nodes and adds a byte to the jumble buffer to account for NULLs. This seems quite clean and simple. However, Sami seems to have concerns about the overhead of doing this. Is that warranted at all? Potentially, there could be a decent number of NULL fields. It'll probably be much cheaper than the offsetof idea I came up with.

> When I was thinking about the addition of the offset to the jumbling
> yesterday, I was wondering about two things:
> - if there are some node structures where it could not work.
> - if we'd need to pass down some information of the upper node when
> doing the jumbling of a node attached to it, which would make the code
> generation much more annoying.
>
> But after sleeping on it I think that these two points are kind of
> moot: having only the offset passed down to a single _jumbleNode()
> with the offset compiled at its beginning should be sufficient. Using
> "seed" as a term is perhaps a bit confusing, because this is an offset
> in the upper node, but perhaps that's OK as-is. It's just more
> entropy addition. If somebody has a better idea for this term, feel
> free.

Can you share your thoughts on Ivan's NULL jumble idea?

> _jumbleList() is an interesting one. If we want an equivalent of the
> offset, this could use the item number in the list which would be a
> rather close equivalent to the offset used elsewhere. For the int,
> oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
> each member, apply the offset at the beginning of _jumbleList(), once,
> I guess.

Is this affected by the same class of bug that we're aiming to fix here? (This class being not always jumbling all fields because of NULLs or some other value, resulting in the next jumble field applying the same bytes to the buffer as the previous field would have, had it not been ignored)

> _jumbleVariableSetStmt() should be OK with the offset of "arg" in
> VariableSetStmt. The addition of hash_combine64() to count for the
> offset should be OK.
>
> With that in mind, the cases with DISTINCT/ORDER BY and OFFSET/LIMIT
> seem to work fine, without causing noise for the other cases tracked
> in the regression tests of PGSS.
>
> What do you think?

I mostly now think the class of bug can be fixed by ensuring we never ignore any jumble field for any reason and always put at least 1 byte onto the buffer with some sort of entropy. I'm keen to understand if I'm missing some case that you've thought about that I've not.

David

Attachment	Content-Type	Size
v2-0001-Query-ID-Calculation-Fix-Variant-B.patch	application/octet-stream	7.2 KB
v3-0001-Query-ID-Calculation-Fix-Variant-B.patch	application/octet-stream	4.3 KB

In response to

Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET at 2025-03-11 07:48:33 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kirill Reshke	2025-03-11 09:48:01	Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
Previous Message	Laurenz Albe	2025-03-11 09:46:26	Re: Reducing the log spam