Quick Links

Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
Subject:	Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date:	2025-03-24 21:23:18
Message-ID:	11c709e1-9b17-4c00-9ead-93609a09b9db@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 3/23/25 22:16, David Rowley wrote:
> On Fri, 21 Mar 2025 at 22:02, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> Can you explain why "Estimated Capacity" and "Estimated Distinct
> Lookup Keys" don't answer that? If there are more distinct lookup
> keys than there is capacity to store them, then some will be evicted.
I wouldn't say these parameters don't answer. I try to debate how usable
they are. To be more practical, let me demonstrate the example:

EXPLAIN (COSTS OFF) SELECT * FROM t1,t2 WHERE t1.x=t2.x;

Nested Loop (cost=0.44..7312.65 rows=211330 width=33)
-> Seq Scan on t1 (cost=0.00..492.00 rows=30000 width=22)
-> Memoize (cost=0.44..3.82 rows=7 width=11)
Cache Key: t1.x
Cache Mode: logical
Estimated Capacity: 1001 Estimated Distinct Lookup Keys: 1001
-> Index Scan using t2_x_idx2 on t2 (cost=0.43..3.81 rows=7)
Index Cond: (x = t1.x)

At first, I began to look for documentation because it was unclear what
both new parameters specifically meant. Okay, there was no documentation
but trivial code, and after a short discovery, I realised the meaning.
The first fact I see from this EXPLAIN is that Postgres estimates it has
enough memory to fit all the entries. Okay, but what does it give me? I
may just increase work_mem and provide the query with more memory if
needed. My main concern is how frequently this cache is planned to be
used. Doing some mental effort, I found the line "rows=30000."
Calculating a bit more, I may suppose it means that we have a 95% chance
to reuse the cache. Okay, I got it.
Now, see:
1. I needed to discover the meaning of the new parameters because they
were different from the earlier "hit" and "miss."
2. I need to find a common JOIN for keys of this node. Imagine a typical
200-row EXPLAIN with 2-3 Memoization keys from different tables.
3. I need to make calculations

On the opposite, the hit ratio, written instead, already known by
analogy, already provides me with necessary cache efficacy data; no need
to watch outside the node; it may be easily compared with the actual
value. Am I wrong?

Both approaches provide the data, but each one is more usable?
I think we may ask more people, for example, Nikolay Samokhvalov, who,
as I heard, works hard with explains.

>
> Once again, I'm not necessarily objecting to hit and evict ratios
> being shown, I just want to know they're actually useful enough to
> show and don't just bloat the EXPLAIN output needlessly. So far your
> arguments aren't convincing me that they are.
I'm -1 for this redundancy.

--
regards, Andrei Lepikhov

In response to

Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment at 2025-03-23 21:16:18 from David Rowley

Responses

Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment at 2025-03-24 22:05:16 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2025-03-24 21:25:41	Re: Snapshot related assert failure on skink
Previous Message	Nathan Bossart	2025-03-24 21:20:01	Re: vacuum_truncate configuration parameter and isset_offset