Quick Links

Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Lukas Fittl <lukas(at)fittl(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date:	2025-03-20 18:54:07
Message-ID:	8e12aa55-39eb-4e03-a8f7-077fe02309c3@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 20/3/2025 15:03, Ilia Evdokimov wrote:
>
> On 20.03.2025 15:32, Andrei Lepikhov wrote:
>> I quite frequently need the number of distinct values (or groups)
>> predicted during the Memoize node creation to understand why caching
>> is sometimes employed or not.
>> But I had thought about an alternative way: having an extensible
>> EXPLAIN (thanks to Robert), we may save optimisation-stage data (I
>> have the same necessity in the case of IncrementalSort, for example)
>> and put it into the Plan node on-demand. So, the way I want to go is a
>> Plan::extlist node and create_plan hook, which may allow copying
>> best_path data to the final plan. So, here, we will add a new
>> parameter and avoid touching the core code.
>> But I would give +1 to current approach if it were done in a shorter
>> time.
>
>
> Then before proceeding further, I think we need to benchmark this change
> to ensure there are no performance regressions. If performance issues
> arise, then using extensible EXPLAIN might be the only viable approach.
>
> I have addressed most of the review comments except for the
> ExplainPropertyText change. I am attaching v3 of the patch with these
> updates. If anyone notices any performance issues, please let me know.
> Issue with ExplainPropertyText for this thread I'll fix in the next
> patches if it will be necessary.
>
> So far, I have tested it on small machines. There are no performance
> issues yet. I'll run benchmarks on larger ones soon.
I have some doubts here.
The number of distinct values says something only when it has taken
together with the number of calls.
Frequently, one of the caching keys is from outer table A (10 tuples),
and another is from outer table B (100 tuples). Calculating the number
of successful cache fetches predicted by the planner may not be evident
in the case of a composite cache key.

What I may propose here is:
1. Use fraction of calls, for example - 50% duplicated key values.
2. Show the calculated hit and eviction ratio.

The second option looks better to me. It is pretty understandable by a
user and may be compared to the numbers obtained during the execution.

--
regards, Andrei Lepikhov

In response to

Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment at 2025-03-20 14:03:25 from Ilia Evdokimov

Responses

Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment at 2025-03-21 02:50:11 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2025-03-20 18:54:14	Re: AIO v2.5
Previous Message	Robert Haas	2025-03-20 18:32:49	Re: why there is not VACUUM FULL CONCURRENTLY?