Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lukas Fittl <lukas(at)fittl(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>
Subject: Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date: 2025-04-14 16:31:22
Message-ID: CA+TgmobDHmaNTqi5yLPJoW6fTCuiWOgF9CSCSHX_OTLmGxYRUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 1, 2025 at 5:07 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I tried to move things along to address Tom's concern about not
> wanting to show this in EXPLAIN's standard output. I suggested in [1]
> that we could use EXPLAIN ANALYZE, but nobody commented on that.
> EXPLAIN ANALYZE is much more verbose than EXPLAIN already, and if we
> put these in EXPLAIN ANALYZE then the viewer can more easily compare
> planned vs actual. I had mentioned that Hash Join does something like
> this for buckets.

I don't think we should use ANALYZE for this because, IME, that should
just be about whether the query gets executed. Since this looks like
information that is available at plan time, I think it should be
displayed all the time or made contingent on VERBOSE. It would be sad
if you had to run the query to get information that was available
without needing to run the query. Personally, I think we can probably
just display it all the time. I mean, the typical plan is not going to
contain enough Memoize nodes that a little extra chatter for each one
impedes readability significantly.

As far as what to display, I have sympathy with Lukas's initial
complaint that it was hard to understand the costing of Memoize. I
haven't faced this exact problem, I think because Memoize isn't that
often used in plans I have seen, but I've faced a lot of similar
problems where you have to try to work backwards painfully to figure
out the chain of events that led to the value you're actually seeing,
and I'm +1 for efforts to make it more clear.

Having looked at v6, I think it would help, at least if the reader is
sufficiently knowledgeable. From the values displayed, it looks like
someone could reconstruct the evict_ratio value in
cost_memoize_rescan(), and if they know the loop count, also the
hit_ratio. But that seems hard: if you weren't reading the code, how
would you know how to do it? Even if you are reading the code, are you
sure you'd reconstruct it correctly? I wonder why we think it's better
to display this than a more "cooked" number like the estimated hit
ratio that was proposed in v1?

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitrios Apostolou 2025-04-14 16:31:48 [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Previous Message Andrew Dunstan 2025-04-14 16:28:15 Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore