From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
Cc: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment |
Date: | 2025-03-21 02:50:11 |
Message-ID: | CAApHDvqHe15LL=K5j71qxLdPCaFFPeohtBe7=sK29Ord3ZVQcg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 21 Mar 2025 at 07:54, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> I have some doubts here.
> The number of distinct values says something only when it has taken
> together with the number of calls.
Couldn't the reader just look at the Nested Loop's outer side row
estimate for that?
> Frequently, one of the caching keys is from outer table A (10 tuples),
> and another is from outer table B (100 tuples). Calculating the number
> of successful cache fetches predicted by the planner may not be evident
> in the case of a composite cache key.
>
> What I may propose here is:
> 1. Use fraction of calls, for example - 50% duplicated key values.
> 2. Show the calculated hit and eviction ratio.
My concern with showing just the estimated hit and evict ratios it
that it might lead to more questions, like how those were calculated.
However, I can see the argument for having one or both of these in
addition to the expected unique keys and expected cache capacity.
I think the primary factors in how useful Memoize is are: 1) How many
items do we expect to be able to store in the cache concurrently, and;
2) How many unique lookups keys do we expect to be looked up, and; 3)
The total number of expected lookups. #1 is quite difficult to
figure out (maybe by looking at row width and row estimates) and
there's just no information about #2. #3 is already shown, in the
Nested Loop's outer side.
The reason that the hit and evict ratios might also be useful are that
it you need a bit of inside knowledge on how the 3 input values are
used to calculate these. For example you might come up with a higher
estimated hit ratio if you assumed the keys arrived in order vs
unordered. If they're unordered and you don't have room for all keys
in the cache at once then that increases the chances that Memoize had
to evict something that will be needed for a future lookup.
Also, I was just looking back at the concern I had with increasing the
size of struct Memoize. I suspect that might not be that much of a
concern. The WindowAgg problem I mentioned was with the executor
state, not the plan node. I see the only time we access the plan node
for Memoize during execution is when we call build_hash_table().
David
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2025-03-21 03:24:46 | Add \pset options for boolean value display |
Previous Message | Andres Freund | 2025-03-21 01:58:37 | Re: AIO v2.5 |