From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
Cc: | Lukas Fittl <lukas(at)fittl(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment |
Date: | 2025-03-20 10:37:35 |
Message-ID: | CAApHDvpq2gOt3gnzkd4Ud=wrT7YRGrF3zb29jgWzDsYEhETrOA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 20 Mar 2025 at 21:48, Ilia Evdokimov
<ilya(dot)evdokimov(at)tantorlabs(dot)com> wrote:
> -> Memoize (cost=0.30..0.41 rows=1 width=4)
> Cache Key: t2.thousand
> Cache Mode: logical
> Cache Estimated Entries: 655
> Cache Estimated NDistinct: 721
> -> Index Only Scan using tenk1_unique1 on tenk1 t1 (cost=0.29..0.40 rows=1 width=4)
> Index Cond: (unique1 = t2.thousand)
> (11 rows)
>
> Additionally, since this information would only be shown in EXPLAIN when costs are enabled, it should not cause any performance regression in normal execution. However, reviewers should be especially careful when verifying test outputs, as this change could affect plan details in regression tests.
>
> Any thoughts?
> + ExplainIndentText(es);
> + appendStringInfo(es->str, "Cache Estimated Entries: %d\n", ((Memoize *) plan)->est_entries);
> + ExplainIndentText(es);
> + appendStringInfo(es->str, "Cache Estimated NDistinct: %0.0f\n", ((Memoize *) plan)->ndistinct);
est_entries is a uint32, so %u is the correct format character for that type.
I don't think you need to prefix all these properties with "Cache"
just because the other two properties have that prefix. I also don't
think the names you've chosen really reflect the meaning. How about
something like: "Estimated Distinct Lookup Keys: 721 Estimated
Capacity: 655", in that order. I think maybe having that as one line
for format=text is better than 2 lines. The EXPLAIN output is already
often taking up more lines in v18 than in v17, would be good to not
make that even worse unnecessarily.
I see the existing code there could use ExplainPropertyText rather
than have a special case for text and non-text formats. That's likely
my fault. If we're touching this code, then we should probably tidy
that up. Do you want to create a precursor fixup patch for that?
+ double ndistinct; /* Estimated number of distinct memoization keys,
+ * used for cache size evaluation. Kept for EXPLAIN */
Maybe this field in MemoizePath needs a better name. How about
"est_unique_keys"? and also do the rename in struct Memoize.
I'm also slightly concerned about making struct Memoize bigger. I had
issues with a performance regression [1] for 908a96861 when increasing
the WindowAgg struct size last year and the only way I found to make
it go away was to shuffle the fields around so that the struct size
didn't increase. I think we'll need to see a benchmark of a query that
hits Memoize quite hard with a small cache size to see if the
performance decreases as a result of adding the ndistinct field. It's
unfortunate that we'll not have the luxury of squeezing this double
into padding if we do see a slowdown.
David
[1] https://postgr.es/m/flat/CAHoyFK9n-QCXKTUWT_xxtXninSMEv%2BgbJN66-y6prM3f4WkEHw%40mail.gmail.com
From | Date | Subject | |
---|---|---|---|
Next Message | Vladlen Popolitov | 2025-03-20 10:40:51 | Re: PoC. The saving of the compiled jit-code in the plan cache |
Previous Message | jian he | 2025-03-20 10:34:23 | Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints |