Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Lukas Fittl <lukas(at)fittl(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date: 2025-03-20 08:47:56
Message-ID: bea2a2ae-3a95-4740-8926-837bacc7bd4e@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 06.07.2023 11:27, Lukas Fittl wrote:
> On Thu, Jul 6, 2023 at 12:56 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>
> Lukas: do you have an updated patch for this commitfest to address
> David's
> comments?
>
>
> I have a draft - I should be able to post an updated patch in the next
> days. Thanks for checking!
>
> Thanks,
> Lukas
>
> --
> Lukas Fittl

Hi hackers,

While debugging a query execution plan involving Memoize, it'd be nice
to determine how many unique keys would fit into the cache. The
est_entries value provides some insight, but without knowing ndistinct,
it is unclear whether the cache is large enough to hold all unique keys
or if some will be evicted.

Given its potential usefulness, I would like to work for this. I
attached v2 patch with changes.

Example from memoize.sql

EXPLAIN SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
WHERE t2.unique1 < 1200;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Aggregate  (cost=815.12..815.13 rows=1 width=40)
   ->  Nested Loop  (cost=0.30..809.12 rows=1200 width=4)
         ->  Seq Scan on tenk1 t2  (cost=0.00..470.00 rows=1200 width=4)
               Filter: (unique1 < 1200)
         ->  Memoize  (cost=0.30..0.41 rows=1 width=4)
               Cache Key: t2.thousand
               Cache Mode: logical
               Cache Estimated Entries: 655
               Cache Estimated NDistinct: 721
               ->  Index Only Scan using tenk1_unique1 on tenk1 t1 
(cost=0.29..0.40 rows=1 width=4)
                     Index Cond: (unique1 = t2.thousand)
(11 rows)

Additionally, since this information would only be shown in EXPLAIN when
costs are enabled, it should not cause any performance regression in
normal execution. However, reviewers should be especially careful when
verifying test outputs, as this change could affect plan details in
regression tests.

Any thoughts?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment Content-Type Size
v2-0001-Show-ndistinct-and-est_entries-in-EXPLAIN-for-Memoize.patch text/x-patch 5.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ni Ku 2025-03-20 08:55:47 Re: Changing shared_buffers without restart
Previous Message Ryo Kanbayashi 2025-03-20 08:39:59 Re: PGSERVICEFILE as part of a normal connection string