From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Lukas Fittl <lukas(at)fittl(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com> |
Subject: | Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment |
Date: | 2025-03-28 12:20:44 |
Message-ID: | 67b5c84a-dedd-4103-af7e-0117046102fe@tantorlabs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Then we need to decide clearly what exactly to display in EXPLAIN for
the Memoize node: absolute values (estimated distinct keys and estimated
cache capacity) or ratios (hit_ratio and evict_ratio). Ratios have the
advantage of quickly reflecting the overall effectiveness of Memoize.
However, absolute values have a significant advantage as they explicitly
reveal the reason of Memoize's poor performance, making problem
diagnosis simpler.
With absolute values, users can directly understand the underlying
reason for poor performance. For example: insufficient memory (capacity
< distinct keys), inaccurate planner statistics (distinct keys
significantly different from actual values), poorly ordered keys
(capacity ~ distinct keys, but frequent evictions as seen in the
Evictions parameter), or Memoize simply not being beneficial (capacity ~
distinct keys ~ calls). Ratios, by contrast, only reflect the final
outcome without clearly indicating the cause or the specific steps
needed to resolve the issue.
Thus, absolute values do more than just inform users that a problem
exists; they provide actionable details that enable users to directly
address the problem (increase work_mem, refresh statistics, create
extended statistics, or disable Memoize entirely). Additionally, no
other plan nodes in PostgreSQL currently use a similar ratio-based
approach - everywhere else absolute values are consistently shown (e.g.,
number of rows, buckets, batches, memory used, etc.). Using absolute
values in Memoize maintains consistency with existing practice.
I've updated the patch to v5, since the new parameter est_unique_keys in
make_memoize() is now placed near est_entries, which is more logical and
readable than putting it at the end.
Any thoughts?
--
Best Regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Show-ndistinct-and-est_entries-in-EXPLAIN-for-Memoize.patch | text/x-patch | 5.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Pyhalov | 2025-03-28 12:22:39 | Re: SQLFunctionCache and generic plans |
Previous Message | Andrei Lepikhov | 2025-03-28 12:15:35 | Re: POC, WIP: OR-clause support for indexes |