From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com> |
Subject: | Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment |
Date: | 2025-03-24 21:23:18 |
Message-ID: | 11c709e1-9b17-4c00-9ead-93609a09b9db@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 3/23/25 22:16, David Rowley wrote:
> On Fri, 21 Mar 2025 at 22:02, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> Can you explain why "Estimated Capacity" and "Estimated Distinct
> Lookup Keys" don't answer that? If there are more distinct lookup
> keys than there is capacity to store them, then some will be evicted.
I wouldn't say these parameters don't answer. I try to debate how usable
they are. To be more practical, let me demonstrate the example:
EXPLAIN (COSTS OFF) SELECT * FROM t1,t2 WHERE t1.x=t2.x;
Nested Loop (cost=0.44..7312.65 rows=211330 width=33)
-> Seq Scan on t1 (cost=0.00..492.00 rows=30000 width=22)
-> Memoize (cost=0.44..3.82 rows=7 width=11)
Cache Key: t1.x
Cache Mode: logical
Estimated Capacity: 1001 Estimated Distinct Lookup Keys: 1001
-> Index Scan using t2_x_idx2 on t2 (cost=0.43..3.81 rows=7)
Index Cond: (x = t1.x)
At first, I began to look for documentation because it was unclear what
both new parameters specifically meant. Okay, there was no documentation
but trivial code, and after a short discovery, I realised the meaning.
The first fact I see from this EXPLAIN is that Postgres estimates it has
enough memory to fit all the entries. Okay, but what does it give me? I
may just increase work_mem and provide the query with more memory if
needed. My main concern is how frequently this cache is planned to be
used. Doing some mental effort, I found the line "rows=30000."
Calculating a bit more, I may suppose it means that we have a 95% chance
to reuse the cache. Okay, I got it.
Now, see:
1. I needed to discover the meaning of the new parameters because they
were different from the earlier "hit" and "miss."
2. I need to find a common JOIN for keys of this node. Imagine a typical
200-row EXPLAIN with 2-3 Memoization keys from different tables.
3. I need to make calculations
On the opposite, the hit ratio, written instead, already known by
analogy, already provides me with necessary cache efficacy data; no need
to watch outside the node; it may be easily compared with the actual
value. Am I wrong?
Both approaches provide the data, but each one is more usable?
I think we may ask more people, for example, Nikolay Samokhvalov, who,
as I heard, works hard with explains.
>
> Once again, I'm not necessarily objecting to hit and evict ratios
> being shown, I just want to know they're actually useful enough to
> show and don't just bloat the EXPLAIN output needlessly. So far your
> arguments aren't convincing me that they are.
I'm -1 for this redundancy.
--
regards, Andrei Lepikhov
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2025-03-24 21:25:41 | Re: Snapshot related assert failure on skink |
Previous Message | Nathan Bossart | 2025-03-24 21:20:01 | Re: vacuum_truncate configuration parameter and isset_offset |