From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: strange slow query - lost lot of time somewhere |
Date: | 2022-05-03 02:30:35 |
Message-ID: | CAApHDvp5yS=-yoXpjwQzJ-QobaAXJqG99RSepnK9bHE0trbOGA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 3 May 2022 at 13:43, David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0
> I don't understand the adjustment factor ndistinct/calls
I've attached a spreadsheet showing you the impact of subtracting
(ndistinct / calls). What this is correcting for is the fact that the
first scan from each unique value is a cache miss. The more calls we
have, the more hits we'll get. If there was only 1 call per distinct
value then there'd never be any hits. Without subtracting (ndistinct /
calls) and assuming there's space in the cache for each ndistinct
value, we'd assume 100% cache hit ratio if calls == ndistinct. What
we should assume in that case is a 0% hit ratio as the first scan for
each distinct parameter must always be a miss as we've never had a
chance to cache any tuples for it yet.
> This is a "rescan" so aside from cache management isn't the cost of originally populating the cache already accounted for elsewhere?
The cost of the first scan is calculated in create_memoize_path().
Since the first scan will always be a cache miss, the code there just
adds some cache management surcharges. Namely:
/*
* Add a small additional charge for caching the first entry. All the
* harder calculations for rescans are performed in cost_memoize_rescan().
*/
pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;
David
Attachment | Content-Type | Size |
---|---|---|
memoize_cache_hits.ods | application/vnd.oasis.opendocument.spreadsheet | 9.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2022-05-03 02:31:16 | Re: strange slow query - lost lot of time somewhere |
Previous Message | David Rowley | 2022-05-03 02:13:18 | Re: strange slow query - lost lot of time somewhere |