Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Lukas Fittl <lukas(at)fittl(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
Subject: Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date: 2025-03-28 12:20:44
Message-ID: 67b5c84a-dedd-4103-af7e-0117046102fe@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Then we need to decide clearly what exactly to display in EXPLAIN for
the Memoize node: absolute values (estimated distinct keys and estimated
cache capacity) or ratios (hit_ratio and evict_ratio). Ratios have the
advantage of quickly reflecting the overall effectiveness of Memoize.
However, absolute values have a significant advantage as they explicitly
reveal the reason of Memoize's poor performance, making problem
diagnosis simpler.

With absolute values, users can directly understand the underlying
reason for poor performance. For example: insufficient memory (capacity
< distinct keys), inaccurate planner statistics (distinct keys
significantly different from actual values), poorly ordered keys
(capacity ~ distinct keys, but frequent evictions as seen in the
Evictions parameter), or Memoize simply not being beneficial (capacity ~
distinct keys ~ calls). Ratios, by contrast, only reflect the final
outcome without clearly indicating the cause or the specific steps
needed to resolve the issue.

Thus, absolute values do more than just inform users that a problem
exists; they provide actionable details that enable users to directly
address the problem (increase work_mem, refresh statistics, create
extended statistics, or disable Memoize entirely). Additionally, no
other plan nodes in PostgreSQL currently use a similar ratio-based
approach - everywhere else absolute values are consistently shown (e.g.,
number of rows, buckets, batches, memory used, etc.). Using absolute
values in Memoize maintains consistency with existing practice.

I've updated the patch to v5, since the new parameter est_unique_keys in
make_memoize() is now placed near est_entries, which is more logical and
readable than putting it at the end.

Any thoughts?

--
Best Regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment Content-Type Size
v5-0001-Show-ndistinct-and-est_entries-in-EXPLAIN-for-Memoize.patch text/x-patch 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Pyhalov 2025-03-28 12:22:39 Re: SQLFunctionCache and generic plans
Previous Message Andrei Lepikhov 2025-03-28 12:15:35 Re: POC, WIP: OR-clause support for indexes