Quick Links

Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Sami Imseih <samimseih(at)gmail(dot)com>, Ivan Bykov <I(dot)Bykov(at)modernsys(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date:	2025-03-18 07:59:44
Message-ID:	Z9kn8B_8lxp-Y03-@paquier.xyz
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Mar 18, 2025 at 05:24:42PM +1300, David Rowley wrote:
> I had a look at this and it seems the main difference will be that
> this patch will protect against the case that a given node is non-null
> but has a custom jumble function which selects to not jumble anything
> in some cases. Since Ivan's patch only adds jumbling for non-null,
> that could lead to the same bug if a custom jumble function decided to
> not jumble anything at all.

Yeah.

> It's certainly hard to see much slowdown between master and v4, but it
> is visible if I sort the results and put them in a line graph and
> scale the vertical axis a bit. A 0.7% slowdown. See attached.

Thanks for the numbers. I'd like to say that this is within noise,
but that's clearly not if repeatable, as you are pointing out. Ugh.

> I do agree that your v4 fixes the issue at hand and I'm not going to
> stand in your way if you want to proceed with it. However, from my
> point of view, it seems that we could achieve the same goal with much
> less overhead by just insisting that custom jumble functions always
> jumble something. We could provide a jumble function to do that if the
> custom function conditionally opts to jumble nothing. That would save
> having to add this additional jumbling of the node count.

If we make the whole cheaper with only extra entropy added for NULLs
in nodes and strings, I'd rather have an insurance policy for the
custom functions. Something like that:
- Forbid a size of 0 in AppendJumble().
- When dealing with a non-NULL case in _jumbleNode(), save the initial
jumble_len and the jumble contents when starting, then complain if the
jumble_len matches with the initial length at the end and if the
contents are the same in an assert. A check on the length may be
enough, but we'd depend on JUMBLE_SIZE and nodes can get pretty big.

What do you think?

> Also, am I right in thinking that there's no way for an extension to
> add a custom jumble function? If so, then we have full control over
> any custom jumble function. That would make it easier to ensure these
> always jumble something.

Currently, no, that would not be possible through this code path. The
only way would be to fork the code as we rely entirely on the code
generated from the in-core headers.
--
Michael

In response to

Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET at 2025-03-18 04:24:42 from David Rowley

Responses

Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET at 2025-03-18 08:24:06 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2025-03-18 08:11:02	Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits
Previous Message	Hayato Kuroda (Fujitsu)	2025-03-18 07:33:15	RE: pg_recvlogical requires -d but not described on the documentation