RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

From:	Bykov Ivan <i(dot)bykov(at)modernsys(dot)ru>
To:	Michael Paquier <michael(at)paquier(dot)xyz>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Sami Imseih <samimseih(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date:	2025-03-17 07:33:42
Message-ID:	b220da2ce9a446bd90ff48ed9508786f@localhost.localdomain
Lists:	pgsql-hackers

Hello, Michael!

> So, here is attached a counter-proposal, where we can simply added a
> counter tracking a node count in _jumbleNode() to add more entropy to
> the mix, incrementing it as well for NULL nodes.

It definitely looks like a more reliable solution than my variant, which only
counts NULL nodes.

However, we already know the overhead of adding a `\0` byte for
every NULL field.

> So that adds about 9.1% overhead to jumbling, on average.

See:
https://www.postgresql.org/message-id/flat/5ac172e0b77a4baba50671cd1a15285f%40localhost.localdomain#6c43f354f5f42d2a27e6824faa660a86

Is it really worth spending extra execution time to increase entropy
when we have non-NULL nodes?

Maybe we should add node_count to the hash only when we visit non-NULL
nodes, or only when we visit NULL nodes, rather than for both.
We could also add entropy when we see a change in the node->type value for
non-NULL nodes.
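
To make the node->type idea concrete, here is a minimal sketch of one possible
reading of it: the counter is mixed in only when a non-NULL node has a
different type than the previous non-NULL node. Everything here (TypedNode,
TypeChangeJumble, tc_append, tc_jumble_node) is a hypothetical toy model, not
the code in queryjumblefuncs.c, and it assumes 0 is never a real type tag.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parse-tree node; not PostgreSQL's Node. */
typedef struct TypedNode
{
	int			type;			/* assumed to be nonzero for real nodes */
	int			value;
} TypedNode;

/* Toy jumble state: a byte buffer plus the counters discussed above. */
typedef struct TypeChangeJumble
{
	unsigned char buf[256];
	size_t		len;
	int			node_count;		/* incremented for every visit, NULL or not */
	int			last_type;		/* type of the last non-NULL node, 0 if none */
} TypeChangeJumble;

/* Append raw bytes to the toy buffer; silently drops data on overflow. */
void
tc_append(TypeChangeJumble *js, const void *data, size_t size)
{
	if (js->len + size > sizeof(js->buf))
		return;
	memcpy(js->buf + js->len, data, size);
	js->len += size;
}

/* Mix node_count into the buffer only when the node type changes. */
void
tc_jumble_node(TypeChangeJumble *js, const TypedNode *node)
{
	js->node_count++;
	if (node == NULL)
		return;					/* NULL nodes add nothing in this sketch */

	if (node->type != js->last_type)
	{
		tc_append(js, &js->node_count, sizeof(js->node_count));
		js->last_type = node->type;
	}
	tc_append(js, &node->type, sizeof(node->type));
	tc_append(js, &node->value, sizeof(node->value));
}
```

In this reading, a run of nodes with the same type pays for the counter only
once. Note that it does not mark NULL nodes by itself, so it would have to be
combined with one of the schemes below.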

Your Variant
------------

< node_count = 1 > < node 1 >
< node_count = 2 > /* node 2 = NULL */
< node_count = 3 > < node 3 >

Alternative 1 (mark only NULL Nodes)
------------------------------------

/* node_count = 1 */ < node 1 >
< node_count = 2 > /* node 2 = NULL */
/* node_count = 3 */ < node 3 >

Alternative 2 (mark only non-NULL Nodes)
----------------------------------------
This could address the concern about nodes with identical content that are
placed in different branches of the query tree.

< node_count = 1 > < node 1 >
/* node_count = 2 */ /* node 2 = NULL */
< node_count = 3 > < node 3 >
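
The three layouts above can be compared with a small toy model. This is only
a sketch under stated assumptions: ToyNode, ToyJumble, MarkMode and the toy_*
functions are hypothetical names, the "hash" is just a byte buffer, and
MARK_NONE is added purely as a baseline for comparison. It is not the actual
_jumbleNode() code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parse-tree node; not PostgreSQL's Node. */
typedef struct ToyNode
{
	int			type;
	int			value;
} ToyNode;

typedef enum MarkMode
{
	MARK_NONE,					/* baseline: counter is never emitted */
	MARK_ALL,					/* "Your Variant": emitted for every node */
	MARK_NULL_ONLY,				/* Alternative 1: emitted for NULL nodes */
	MARK_NONNULL_ONLY			/* Alternative 2: emitted for non-NULL nodes */
} MarkMode;

/* Toy jumble state: the bytes that would be fed to the hash. */
typedef struct ToyJumble
{
	unsigned char buf[256];
	size_t		len;
	int			node_count;
} ToyJumble;

/* Append raw bytes to the toy buffer; silently drops data on overflow. */
void
toy_append(ToyJumble *js, const void *data, size_t size)
{
	if (js->len + size > sizeof(js->buf))
		return;
	memcpy(js->buf + js->len, data, size);
	js->len += size;
}

/* Visit one node; the counter always advances, but is emitted per mode. */
void
toy_jumble_node(ToyJumble *js, const ToyNode *node, MarkMode mode)
{
	int			emit;

	js->node_count++;
	emit = (mode == MARK_ALL) ||
		(mode == MARK_NULL_ONLY && node == NULL) ||
		(mode == MARK_NONNULL_ONLY && node != NULL);
	if (emit)
		toy_append(js, &js->node_count, sizeof(js->node_count));

	if (node == NULL)
		return;
	toy_append(js, &node->type, sizeof(node->type));
	toy_append(js, &node->value, sizeof(node->value));
}

/* Reset the state, visit all nodes in order, return the bytes written. */
size_t
toy_jumble_list(ToyJumble *js, const ToyNode *nodes[], size_t n, MarkMode mode)
{
	size_t		i;

	memset(js, 0, sizeof(*js));
	for (i = 0; i < n; i++)
		toy_jumble_node(js, nodes[i], mode);
	return js->len;
}
```

For the two-slot lists { node, NULL } and { NULL, node }, MARK_NONE writes the
same bytes for both, while each of the three marking modes writes different
bytes. MARK_ALL writes one extra counter per list compared with either
alternative, which is the extra cost in question.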

In response to

Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET at 2025-03-17 00:52:46 from Michael Paquier

Responses

Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET at 2025-03-17 09:03:37 from Michael Paquier
