Re: [HACKERS] Clock with Adaptive Replacement

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Clock with Adaptive Replacement
Date: 2018-07-09 23:24:43
Message-ID: CAH2-WzkrKKW88Yq4-BLN5N3DrFv93P8r+ti-KfbTeR08P8ckBQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 1, 2018 at 11:58 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> All of that having been said, maybe we have an independent low-level
>> problem: we increase usage_count when we pin a buffer, even if we last
>> pinned the buffer 5ms ago. IIRC a change to this went in when ARC went
>> in (and came out with ARC, too). This is more of a problem with
>> miscounting the number of hits based on accidents around how things
>> are structured in the executor -- we need to be more careful about
>> recognizing whether or not block hits are logically distinct.
>> Particularly with stuff like nested loop joins.
>
> I agree that double-counting correlated accesses is a problem, and I
> agree that we probably want to do something about it. I am skeptical
> that using wall-clock time is the right way to solve that problem
> because it leads to edge effects -- if for example there is a 5ms
> timeout for bumping the usage count again, then you can probably
> manufacture a workload where the buffer eviction pattern is
> dramatically different for a nested loop that hits a given page every
> 4.5ms than it is for a nested loop that hits a given page every 5.5ms.
> Note that CART aims to solve this problem in a way that doesn't
> involve any fixed amount of wall-clock time.
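
(To make the edge effect concrete, here is a purely hypothetical sketch
of what a wall-clock gated bump might look like. Nothing like this
exists in bufmgr.c; the struct, the field names, and the 5ms constant
are all made up for illustration:

#include <stdint.h>

#define USAGE_BUMP_INTERVAL_US 5000     /* hypothetical 5ms window */
#define TOY_MAX_USAGE_COUNT    5

typedef struct ToyBufferDesc
{
    int     usage_count;
    int64_t last_bump_us;   /* made-up field: time of the last bump */
} ToyBufferDesc;

/*
 * Bump usage_count on pin, but only when the previous bump happened
 * more than USAGE_BUMP_INTERVAL_US ago.
 */
static void
toy_pin_buffer(ToyBufferDesc *buf, int64_t now_us)
{
    if (now_us - buf->last_bump_us > USAGE_BUMP_INTERVAL_US)
    {
        if (buf->usage_count < TOY_MAX_USAGE_COUNT)
            buf->usage_count++;
        buf->last_bump_us = now_us;
    }
}

A nested loop that revisits the page every 4.5ms never gets past the
check, while one that revisits it every 5.5ms bumps usage_count on
every single visit -- which is exactly the cliff you describe.)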

index_fetch_heap() calls ReleaseAndReadBuffer(), which will notice
when the buffer it has been asked to read and pin is the same as the
buffer it also needs to release/unpin. If they're the same buffer,
which is probably very common, then it simply returns the existing
buffer without releasing its pin. Since each pin ordinarily bumps the
heap page's usage_count, which is really bad when references are
correlated like this, skipping the re-pin is a really good idea.
(Of course, we'd also like to avoid the actual work of pinning and
unpinning, but that's arguably secondary -- this
ReleaseAndReadBuffer() logic went in as part of the original 2005
clock sweep commit, so it seems to actually mostly be about correlated
references.)
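
For anyone that hasn't looked at it recently, the fast path amounts to
something like this (heavily simplified paraphrase from memory, not
the real bufmgr.c code -- the real function also handles local buffers
and relation forks):

/*
 * Simplified sketch of ReleaseAndReadBuffer() (not the actual
 * bufmgr.c code -- local buffers and relation forks are omitted).
 */
Buffer
ReleaseAndReadBuffer_sketch(Buffer buffer, Relation relation,
                            BlockNumber blockNum)
{
    if (BufferIsValid(buffer))
    {
        BufferDesc *bufHdr = GetBufferDescriptor(buffer - 1);

        /* Asked to read the block that we already have pinned? */
        if (bufHdr->tag.blockNum == blockNum &&
            RelFileNodeEquals(bufHdr->tag.rnode, relation->rd_node))
            return buffer;      /* fast path: keep the existing pin --
                                 * PinBuffer() never runs, so
                                 * usage_count is not bumped again */

        UnpinBuffer(bufHdr, true);      /* drop the old pin... */
    }

    return ReadBuffer(relation, blockNum);      /* ...pin the new page */
}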

I wonder if the significant improvements that I've noticed from the
pending v3 of my nbtree-unique-key-heap-tid patch for certain
workloads are related to ReleaseAndReadBuffer(), and if so to what
degree. It's not hard to imagine how index tuple heap TIDs being in
random order could be a real problem: ReleaseAndReadBuffer() could end
up repeatedly pinning and unpinning the same heap pages when they
appear again and again out of order on an nbtree leaf page. If that's
true, then it might not matter how smart our replacement algorithm is.
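
To get a rough feel for how much that could matter, here's a toy
standalone simulation (plain C, not PostgreSQL code -- the tuple and
block counts are made up) that counts how often the "same buffer as
last time" shortcut can fire when index tuples point into a small set
of heap blocks in sorted order versus random order:

#include <stdio.h>
#include <stdlib.h>

/*
 * Toy simulation: each "miss" of the same-buffer shortcut stands in
 * for a full unpin + re-pin, and so for another usage_count bump.
 */
#define NTUPLES     1000
#define NHEAPBLOCKS 10

static int
count_fastpath_misses(const int *blocks, int n)
{
    int     misses = 0;
    int     lastblock = -1;
    int     i;

    for (i = 0; i < n; i++)
    {
        if (blocks[i] != lastblock)
            misses++;           /* would unpin old page, pin new one */
        lastblock = blocks[i];
    }
    return misses;
}

int
main(void)
{
    int     sorted[NTUPLES];
    int     shuffled[NTUPLES];
    int     i;

    for (i = 0; i < NTUPLES; i++)
        sorted[i] = i / (NTUPLES / NHEAPBLOCKS);    /* TIDs in heap order */
    for (i = 0; i < NTUPLES; i++)
        shuffled[i] = rand() % NHEAPBLOCKS;         /* TIDs in random order */

    printf("sorted TIDs:   %d pin/unpin cycles\n",
           count_fastpath_misses(sorted, NTUPLES));
    printf("shuffled TIDs: %d pin/unpin cycles\n",
           count_fastpath_misses(shuffled, NTUPLES));
    return 0;
}

With TIDs in heap order the shortcut fires on almost every access;
with TIDs in random order nearly every access is another unpin +
re-pin, and another usage_count bump for what is logically the same
stream of correlated references.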

--
Peter Geoghegan
