Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Bykov Ivan <i(dot)bykov(at)modernsys(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date: 2025-03-13 16:46:35
Message-ID: CAA5RZ0to4CdDnEJgKyr93oG+_KTptNXsDhfanX2EhqyRfvCujA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> If we add something for NULLs, we'll always add distinctClause before
> the sortClause. If the distinctClause is NULL we won't end up with the
> same jumble bytes as if the sortClause were NULL, as the order the
> NULL byte(s) are added to the buffer changes.

That is how I understand it as well. By appending a single character null
terminator to the jumble for every NULL expression, we change the composition
of the final jumble. Passing offsets will accomplish the same thing, but I can't
see how it buys us any additional safeguards.

> I suspect that the overhead will be minimal for all the approaches I'm
> seeing on this thread, but it would not hurt to double-check all that.

Perhaps my concern is overblown. Also, it seems the consensus is for a more
comprehensive solution than that of variant A.

Here is an idea I was playing around with. Why don't we just append a null
terminator for every expression (a variant of variant B)?
I describe this as jumbling the expression position. And if we do
that, it no longer
seems necessary to jumble the expression type either. right?

+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -236,6 +236,13 @@ do { \
if (expr->str) \
AppendJumble(jstate, (const unsigned char *)
(expr->str), strlen(expr->str) + 1); \
} while(0)
+#define JUMBLE_POSITION() \
+do { \
+ char nullterm = '\0'; \
+ AppendJumble(jstate, (const unsigned char *) &(nullterm),
sizeof(nullterm)); \
+ if (expr == NULL) \
+ return; \
+} while(0)

#include "queryjumblefuncs.funcs.c"

@@ -244,17 +251,15 @@ _jumbleNode(JumbleState *jstate, Node *node)
{
Node *expr = node;

- if (expr == NULL)
- return;
-
/* Guard against stack overflow due to overly complex expressions */
check_stack_depth();

/*
- * We always emit the node's NodeTag, then any additional
fields that are
- * considered significant, and then we recurse to any child nodes.
+ * We represent a position of a field in the jumble with a null-term.
+ * Doing so ensures that even NULL expressions are accounted for in
+ * the jumble.
*/
- JUMBLE_FIELD(type);
+ JUMBLE_POSITION();

switch (nodeTag(expr))
{

> As the overhead of a single query jumbling is weightless compared to
> the overall query processing

yeah, that is a good point. At least benchmarking the above on a
SELECT only pgbench did not reveal any regression.

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2025-03-13 16:50:32 DROP TABLESPACE waits for checkpoint with lwlock held
Previous Message Tomas Vondra 2025-03-13 16:37:39 Advanced Patch Feedback Session / pgconf.dev 2025