Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Sami Imseih <samimseih(at)gmail(dot)com>, Ivan Bykov <I(dot)Bykov(at)modernsys(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date: 2025-03-17 00:52:46
Message-ID: Z9dyXt2cuQkcOabX@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 14, 2025 at 04:14:25PM +1300, David Rowley wrote:
> I don't think I'm discarding them... As far as I'm aware, your
> remaining concern is with custom jumble functions and you showed an
> example in [1] of a hypothetical jumble function that could cause the
> same bug as this thread is discussing. My response to that was that
> the custom jumble function is broken and should be fixed, which seems
> to me to be analogous to Ivan's proposal to fix _jumbleNode() to do
> something with NULL pointers.

Okay, that's where our opinions diverge. So, well, please let me
reformulate that with some different words and terms. This is not a
problem of having a NULL node vs a non-NULL node in the jumbling,
because by design is it possible for one to decide if a Node can be
included in the jumbling depending on its internal values,
particularly if it gets used across one than one type of parent node.
You are claiming that such functions are broken, but what I am saying
is that such function can be OK depending on how one wants this code
to behave depending on their node structure and what they expect from
them. So I'm making a claim about keeping this stuff more
flexible, while also addressing the problem of the same node defined
successively more than once in a parent structure.

> I now can't tell which of the following is true: 1) I've missed one of
> your concerns, or; 2) you want to find a way to make badly coded
> custom jumble functions not suffer from this bug, or; 3) you think
> that adding jumble bytes unconditionally regardless of the field's
> value still does not fix the bug in question, or; 4) something else.
> It would be good to know which of these it is.

I hope that my opinion counts, as I've worked on this whole design
in 3db72ebcbe20, 8eba3e3f0208 and other related commits. FWIW, I
don't really see any disadvantage in supporting what I'm claiming is
OK, giving more flexibility to folks to do what they want with this
facility, if it also tackles this thread's problem with the Node
handling.

So, here is attached a counter-proposal, where we can simply added a
counter tracking a node count in _jumbleNode() to add more entropy to
the mix, incrementing it as well for NULL nodes.

By the way, I was looking at the set of tests proposed upthread at
this message, which is as far as I know the latest version proposed:
https://www.postgresql.org/message-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain

The tests do require neither a relation nor a stats reset, so let's
make it cheaper as I've proposed upthread and in the attached,
including checks with multiple queries and different constants to make
sure that these are correctly grouped in the pgss reports. The
null_sequence_number reset each time we run through a non-NULL node
from variant B reduces a bit the entropy, btw.. The argument about
adding a counter in AppendJumble() is the wrong level to use for such
a change.

Anyway, perhaps we should have your extra byte '\0' or something
equivalent added to JUMBLE_STRING() if dealing with a NULL string
rather than ignoring it. There's an argument about simplicity, IMO,
still it is true that ignoring a NULL value reduces the entropy of the
query ID. So I'd be OK with that if you feel strongly about this
point, at it sound to me that you are? This could be better than a
hardcoded '\0' byte as we could use a counter and
JUMBLE_FIELD_SINGLE() in the NULL case of JUMBLE_STRING(). It's not
proved to be a problem yet, and there's value in having a simpler
solution, as well. Anyway, one line I'd like to draw is that
appendJumble() remains as it is written currently. One extra thing
that We could do is to update it so as the size of an item given is
always more than zero, with an Assert(), to enforce a stricter policy.
--
Michael

Attachment Content-Type Size
v4-0001-Add-more-entropy-to-query-jumbling.patch text/x-diff 6.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sungwoo Chang 2025-03-17 00:55:07 Re: dsm_registry: Add detach and destroy features
Previous Message Tom Lane 2025-03-17 00:19:32 Re: Parallel tests publication and subscription might fail due to concurrent tuple update