Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Bykov Ivan <i(dot)bykov(at)modernsys(dot)ru>, Sami Imseih <samimseih(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date: 2025-03-27 01:50:53
Message-ID: Z-Su_cOAe3SzyAcc@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 27, 2025 at 12:09:35PM +1300, David Rowley wrote:
> I think it's better it's two commits. They're for different purposes.

Fine for me.

> Ok, I won't protest. I just feel the bug is a relic of the old design
> and I don't believe the tests are testing anything useful in the new
> design. It feels unlikely that someone accidentally adjusts the code
> and reverts it to the old design. Anyway, I've included the tests from
> your patch.

One habit that I've found really useful to do when it comes to this
area of the code is to apply the tests first to show what kind of
behavior we had before changing the jumbling, then apply the update to
the query jumbling. This has two benefits:
- If the update of the second phase gets reverted, we still have the
tests.
- It is possible to show in the git history how the behavior changes,
analyzing them in isolation.

So I'd suggest an extra split, with the tests committed first.

>> It is a bit disappointing that we require JumbleQuery() to jumble the
>> NULLs. There are discussions about making this code available not
>> only for Querys, but for Nodes, and the NUL computations would be
>> missed..
>
> I've adjusted the patch to add two static functions. InitJumble() and
> DoJumble(). JumbleQuery() uses these.

That's cleaner, thanks for the split. I don't think that we've
reached a consensus on the other thread about what to provide as
interface and that's likely too late for this release, what you are
proposing here is a good step forward, and will help in not messing up
again with the problem of this thread in the future.

+#ifdef USE_ASSERT_CHECKING
+ Size prev_jumble_len = jstate->total_jumble_len;
+#endif
[...]
+ /* The total number of bytes added to the jumble buffer */
+ Size total_jumble_len;
} JumbleState;

Maybe hide that in the header with one extra USE_ASSERT_CHECKING if we
are only going to increment it when assertions are enabled?

+ * These are flushed out to the jumble buffer before subsequent appends
+ * and before performing the final jumble hash.

This comment is slightly incorrect when 0001 is taken in isolation?
The flush of the NULL counter is done in the patch via
FlushPendingNulls() when jumbling the final value, not before more
appends? It becomes true with 0002, where FlushPendingNulls() is
called before appending any data.

Perhaps removing JUMBLE_FIELD_SINGLE() is better now.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-03-27 01:53:20 Re: making EXPLAIN extensible
Previous Message David G. Johnston 2025-03-27 01:42:11 Re: Possibly hard-to-read message