Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Bykov Ivan <i(dot)bykov(at)modernsys(dot)ru>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date: 2025-03-07 00:44:27
Message-ID: CAA5RZ0trf+uSN-vbJWvfiVS10FJprsx9ccASmiKwemRtXvC6yA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

It seems like there are multiple threads on this topic. This is the
original [0], but I suggest continuing the discussion in this thread
since it includes the examples and patches.

Regarding the issue itself, query jumbling behavior is often subjective,
making it difficult to classify as a bug. I'm not entirely sure this
qualifies as a bug either, but I do believe it should be addressed.

As I understand it, two nodes defined one after the other and in which both
could end up with the same expressions when traversed cannot be differentiated
by query jumbling when one of them is NULL. In this case, queryJumbling can't
differentiate between when limitOffset of LimitOption and they both result with
a similar FuncExpr.

By rearranging them as done in variant A, the position where the expression
is appended in the jumble changes, leading to a different hash. I just
don't like
this solution because it requires one to carefully construct a struct,
but it maybe
the best out of the other options.

Variant B is not acceptable IMO as it adds a whole bunch of null-terminators
unnecessarily. For example, in a simple "select 1", (expr == NULL) is
true 19 times,
so that is an extra 19 bytes.

I think a third option is to add a new pg_node_attr called "query_jumble_null"
and be very selective on which fields should append a null-terminator to the
jumble when the expression is null

The queryjumblefuncs.c could look like the below. JUMBLE_NULL will
be responsible for appending the null terminator.

"""
static void
_jumbleQuery(JumbleState *jstate, Node *node)
{
Query *expr = (Query *) node;
...
......
.......
if (!expr->limitOffset)
{
JUMBLE_NULL();
}
else
{
JUMBLE_NODE(limitOffset);
}
if (!expr->limitCount)
{
JUMBLE_NULL();
}
else
{
JUMBLE_NODE(limitCount);
}
"""

What do you think? Maybe someone can suggest a better solution?

Regards,

Sami Imseih
Amazon Web Services (AWS)

[0] https://www.postgresql.org/message-id/ca447b72d15745b9a877fad7e258407a@localhost.localdomain

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2025-03-07 00:49:40 Re: Log connection establishment timings
Previous Message Jacob Champion 2025-03-07 00:35:43 Re: [PoC] Federated Authn/z with OAUTHBEARER