Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Bykov Ivan <i(dot)bykov(at)modernsys(dot)ru>, Sami Imseih <samimseih(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date: 2025-03-13 23:34:53
Message-ID: Z9NrnVk7MtMe8uNF@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 12, 2025 at 03:28:27PM +1300, David Rowley wrote:
> If we account for the NULLs, those two cases become: "<windowClause
> bytes><distinctClause bytes><byte for NULL sortClause>" and
> "<windowClause bytes><byte for NULL distinct clause><sortClause
> bytes>", which is going to be highly unlikely to hash to the same
> value due to the buffer not being the same.
>
> Can you explain where my understanding is wrong?

(I am a bit busy this week, sorry for the delay)

I have looked at the patches sent at
https://www.postgresql.org/message-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain,
and they don't really address what I'm trying to point at here.

Custom jumbling functions could still be problematic, making
uneffective a NULL check at the beginning of jumbleNode().

Here is an example, possible by design, that would still be a problem:
static void
_jumbleNodeFoo(JumbleState *jstate, Node *node)
{
Foo *expr = (Foo *) node;

if (expr->field1 == 0)
return;

JUMBLE_FIELD(field2);
[..]
}

typedef struct Foo
{
pg_node_attr(custom_query_jumble)

NodeTag type;
int field1;
char *field2;
[..]
}

Then we have the same problem with another Node using this Foo node
two times in a row, depending on how it's used by the query parsing
and transform:
typedef struct FooBar
{
pg_node_attr(custom_query_jumble)

NodeTag type;
Foo *foo1;
Foo *foo2;
[..]
}

Adding a depth level incremented once we go through a Node would add
more entropy, but it may not ve completely bullet-proof either, even
if we add an offset. I am pretty sure I could find a Node structure
that acts as counter example even if you add a depth level and an
offset in the jumbling..
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2025-03-13 23:57:51 Re: md.c vs elog.c vs smgrreleaseall() in barrier
Previous Message Jacob Champion 2025-03-13 23:31:08 Re: ecdh support causes unnecessary roundtrips