Re: Clock sweep not caching enough B-Tree leaf pages?

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date: 2014-04-16 08:58:23
Message-ID: CAM3SWZT17cM2GK6T6uyVdJRxPXNt1g=7HPwZttYMy9kKi0EXzQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 16, 2014 at 12:53 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I think this is unfortunately completely out of the question. For one, a
> gettimeofday() for every buffer pin will become a significant performance
> problem. Even the computation of the xact/stm start/stop timestamps
> shows up pretty heavily in profiles today - and they are far less
> frequent than buffer pins. And that's on x86 linux, where gettimeofday()
> is implemented as something more lightweight than a full syscall.

Come on, Andres. Of course exactly what I've done here is completely
out of the question as a patch that we can go and commit right now.
I've numerous caveats about bloating the buffer descriptors, and about
it being a proof of concept. I'm pretty sure we can come up with a
scheme to significantly cut down on the number of gettimeofday() calls
if it comes down to it. In any case, I'm interested in advancing our
understanding of the problem right now. Let's leave the minutiae to
one side for the time being.
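
To make "cut down on the number of gettimeofday() calls" concrete, here is one possible shape for such a scheme: a coarse cached clock that something refreshes periodically, so that the buffer pin path only reads a variable. This is a minimal sketch under stated assumptions, not what the prototype does. All names (cached_now_us, coarse_clock_tick, coarse_now_us) are hypothetical, and a real multi-process version would need the cached value in shared memory with suitable atomic access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/*
 * Hypothetical cached timestamp in microseconds.  In a real server this
 * would live in shared memory; a plain static is used here for brevity.
 */
static volatile int64_t cached_now_us = 0;

/* The expensive path: actually ask the OS for the time. */
static int64_t
read_wall_clock_us(void)
{
    struct timeval tv;
    int         rc = gettimeofday(&tv, NULL);

    assert(rc == 0);
    (void) rc;                  /* silence unused warning under NDEBUG */
    return (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
}

/*
 * Assumed to be called occasionally (say, from a timer or a background
 * process), NOT on every buffer pin.
 */
static void
coarse_clock_tick(void)
{
    cached_now_us = read_wall_clock_us();
}

/* The cheap path: what a buffer pin would call instead of gettimeofday(). */
static int64_t
coarse_now_us(void)
{
    return cached_now_us;
}
```

The trade-off is resolution: every pin between two ticks sees the same timestamp, which is acceptable only if the interval of interest (seconds) is much longer than the tick period.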

> The other significant problem I see with this is that it's not adaptive
> to the actual throughput of buffers in s_b. In many cases there's
> hundreds of clock cycles through shared buffers in 3 seconds. By only
> increasing the usagecount that often you've destroyed the little
> semblance to a working LRU there is right now.

If a usage_count can get to BM_MAX_USAGE_COUNT from its initial
allocation within an instant, that's bad. It's that simple. Consider
all the ways in which that can happen almost by accident.

You could probably reasonably argue that the trade-off or lack of
adaption (between an LRU and an LFU) that this particular sketch of
mine represents is inappropriate or sub-optimal, but I don't
understand why you're criticizing the patch for doing what I expressly
set out to do. I wrote "I think a very real problem may be that
approximating an LRU is bad because an actual LRU is bad".

> It also wouldn't work well for situations with a fast changing
> workload >> s_b. If you have frequent queries that take a second or so
> and access some data repeatedly (index nodes or whatnot) only increasing
> the usagecount once will mean they'll continually fall back to disk access.

No, it shouldn't, because there is a notion of buffers getting a fair
chance to prove themselves. Now, it might well be the case that there
are workloads where what I've done to make that happen in this
prototype doesn't work out too well - I've already said so. But should
a buffer get a usage count of 5 just because the user inserted 5
tuples within a single DML command, for example? If so, why?
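
The "5 tuples in one command" case can be sketched as follows. This is an illustration only: SketchBuf and both pin functions are hypothetical and far simpler than PostgreSQL's real buffer descriptor and pin path, and the 3 second window is an assumption taken from the figure quoted above. BM_MAX_USAGE_COUNT is 5, as in PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

#define BM_MAX_USAGE_COUNT 5        /* same cap PostgreSQL uses */
#define BUMP_INTERVAL_US   3000000  /* assumed 3 s window, per this thread */

/* Hypothetical, simplified buffer descriptor; not PostgreSQL's BufferDesc. */
typedef struct SketchBuf
{
    int         usage_count;
    int64_t     last_bump_us;   /* time of the last usage_count increment */
} SketchBuf;

/* Per-pin policy: every pin bumps usage_count, up to the cap. */
static void
pin_bump_always(SketchBuf *buf)
{
    if (buf->usage_count < BM_MAX_USAGE_COUNT)
        buf->usage_count++;
}

/*
 * Time-gated policy: the first pin always counts, after that at most one
 * bump per BUMP_INTERVAL_US, however many pins arrive in between.
 */
static void
pin_bump_gated(SketchBuf *buf, int64_t now_us)
{
    assert(now_us >= buf->last_bump_us);

    if (buf->usage_count >= BM_MAX_USAGE_COUNT)
        return;

    if (buf->usage_count == 0 ||
        now_us - buf->last_bump_us >= BUMP_INTERVAL_US)
    {
        buf->usage_count++;
        buf->last_bump_us = now_us;
    }
}
```

With five pins a few microseconds apart, the per-pin policy takes a fresh buffer straight to the cap, while the gated policy leaves it at 1 until a pin arrives at least one interval later.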

--
Peter Geoghegan
