From: | Sami Imseih <samimseih(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Bykov Ivan <i(dot)bykov(at)modernsys(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET |
Date: | 2025-03-13 16:46:35 |
Message-ID: | CAA5RZ0to4CdDnEJgKyr93oG+_KTptNXsDhfanX2EhqyRfvCujA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> If we add something for NULLs, we'll always add distinctClause before
> the sortClause. If the distinctClause is NULL we won't end up with the
> same jumble bytes as if the sortClause were NULL, as the order the
> NULL byte(s) are added to the buffer changes.
That is how I understand it as well. By appending a single character null
terminator to the jumble for every NULL expression, we change the composition
of the final jumble. Passing offsets will accomplish the same thing, but I can't
see how it buys us any additional safeguards.
> I suspect that the overhead will be minimal for all the approaches I'm
> seeing on this thread, but it would not hurt to double-check all that.
Perhaps my concern is overblown. Also, it seems the consensus is for a more
comprehensive solution than that of variant A.
Here is an idea I was playing around with. Why don't we just append a null
terminator for every expression (a variant of variant B)?
I describe this as jumbling the expression position. And if we do
that, it no longer
seems necessary to jumble the expression type either. right?
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -236,6 +236,13 @@ do { \
if (expr->str) \
AppendJumble(jstate, (const unsigned char *)
(expr->str), strlen(expr->str) + 1); \
} while(0)
+#define JUMBLE_POSITION() \
+do { \
+ char nullterm = '\0'; \
+ AppendJumble(jstate, (const unsigned char *) &(nullterm),
sizeof(nullterm)); \
+ if (expr == NULL) \
+ return; \
+} while(0)
#include "queryjumblefuncs.funcs.c"
@@ -244,17 +251,15 @@ _jumbleNode(JumbleState *jstate, Node *node)
{
Node *expr = node;
- if (expr == NULL)
- return;
-
/* Guard against stack overflow due to overly complex expressions */
check_stack_depth();
/*
- * We always emit the node's NodeTag, then any additional
fields that are
- * considered significant, and then we recurse to any child nodes.
+ * We represent a position of a field in the jumble with a null-term.
+ * Doing so ensures that even NULL expressions are accounted for in
+ * the jumble.
*/
- JUMBLE_FIELD(type);
+ JUMBLE_POSITION();
switch (nodeTag(expr))
{
> As the overhead of a single query jumbling is weightless compared to
> the overall query processing
yeah, that is a good point. At least benchmarking the above on a
SELECT only pgbench did not reveal any regression.
--
Sami Imseih
Amazon Web Services (AWS)
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2025-03-13 16:50:32 | DROP TABLESPACE waits for checkpoint with lwlock held |
Previous Message | Tomas Vondra | 2025-03-13 16:37:39 | Advanced Patch Feedback Session / pgconf.dev 2025 |