From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date: 2014-04-15 23:30:39
Message-ID: CAM3SWZTwhhW9WiniQa8e56LfoxMW=wBm1dPZ_mZ2DDSSnxdOaQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma <ants(at)cybertec(dot)at> wrote:
> PostgreSQL's replacement algorithm is more similar to Generalized CLOCK,
> or GCLOCK, as described in [1]. CLOCK-Pro [2] is a different algorithm
> that approximates LIRS [3]. LIRS is what MySQL implements [4], and
> CLOCK-Pro is implemented by NetBSD [5]; there has also been some work on
> trying it on Linux [6]. Both LIRS and CLOCK-Pro work by keeping metadata
> entries for double the cache size so that they can detect pages that have
> been referenced again recently. Basically they provide an adaptive
> tradeoff between LRU and LFU.
That's good to know.
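
To make sure we're talking about the same thing, here is a rough,
self-contained sketch of a GCLOCK-style sweep with a capped per-buffer
usage count. The names and constants are illustrative only, not the real
buffer manager code:

#include <stdio.h>

#define NBUFFERS        16
#define MAX_USAGE_COUNT 5       /* cap on the per-buffer reference weight */

typedef struct
{
    int usage_count;            /* bumped on every pin, capped at MAX_USAGE_COUNT */
    int refcount;               /* current pinners; cannot evict while > 0 */
} BufDesc;

static BufDesc buffers[NBUFFERS];
static int clock_hand = 0;

/* Called when a buffer is pinned: raise its weight. */
static void
sketch_pin(BufDesc *buf)
{
    buf->refcount++;
    if (buf->usage_count < MAX_USAGE_COUNT)
        buf->usage_count++;
}

/*
 * Sweep the clock: decrement weights until we find an unpinned buffer whose
 * weight has reached zero, and return it as the victim.  (The real code
 * gives up after a full pass if everything is pinned; omitted for brevity.)
 */
static BufDesc *
sketch_get_victim(void)
{
    for (;;)
    {
        BufDesc *buf = &buffers[clock_hand];

        clock_hand = (clock_hand + 1) % NBUFFERS;

        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return buf;     /* weight exhausted: evict this one */
            buf->usage_count--; /* give it another lap, with less credit */
        }
    }
}

int
main(void)
{
    /* Pretend buffers 0-3 were pinned and then unpinned once each. */
    for (int i = 0; i < 4; i++)
    {
        sketch_pin(&buffers[i]);
        buffers[i].refcount--;
    }
    printf("victim: buffer %d\n", (int) (sketch_get_victim() - buffers));
    return 0;
}

All of the frequency information the algorithm ever sees lives in that one
small saturating counter, which is the naivety around frequency I'm talking
about below.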
> There's a paper on a non-blocking GCLOCK algorithm that does lock-free
> clock sweep and buffer pinning [7]. If we decide to stay with GCLOCK it
> may be interesting, although I still believe that some variant of buffer
> nailing [8] is a better idea; my experience shows that most of the
> locking overhead is cache line bouncing, ignoring the extreme cases
> where our naive spinlock implementation blows up.
You might be right about that, but let's handle one problem at a time.
Who knows what the bottleneck will end up being if and when we address
the naivety around frequency? I want to better characterize that
problem first.
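
Just so it's concrete what "lock-free clock sweep and buffer pinning" could
mean in practice: a pin can be a single compare-and-swap loop over a packed
refcount/usage-count word instead of a spinlock-protected update. A purely
illustrative sketch using C11 atomics, not the design from [7] or [8]:

#include <stdatomic.h>
#include <stdint.h>

#define REFCOUNT_MASK   0x0000FFFFu
#define USAGE_SHIFT     16
#define MAX_USAGE_COUNT 5u

typedef struct
{
    _Atomic uint32_t state;     /* low 16 bits: refcount; high 16 bits: usage count */
} LockFreeBufDesc;

static void
lockfree_pin(LockFreeBufDesc *buf)
{
    uint32_t old = atomic_load(&buf->state);

    for (;;)
    {
        uint32_t refcount = (old & REFCOUNT_MASK) + 1;
        uint32_t usage = old >> USAGE_SHIFT;
        uint32_t newval;

        if (usage < MAX_USAGE_COUNT)
            usage++;
        newval = (usage << USAGE_SHIFT) | refcount;

        /* On success we're done; on failure 'old' is refreshed and we retry. */
        if (atomic_compare_exchange_weak(&buf->state, &old, newval))
            break;
    }
}

int
main(void)
{
    LockFreeBufDesc buf = { 0 };

    lockfree_pin(&buf);
    return (atomic_load(&buf.state) == ((1u << USAGE_SHIFT) | 1u)) ? 0 : 1;
}

Even a CAS-only pin still dirties the buffer header's cache line on every
pin, so the bouncing you describe doesn't go away by itself.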
> There has been some research indicating that for TPC-A workloads, giving
> index pages higher weights increases hit rates [1].
Frankly, there doesn't need to be any research on this, because it's
just common sense that, probabilistically, leaf pages are much more
useful than heap pages in servicing index scan queries if we assume a
uniform access distribution. Even if we don't assume that, they're
still more useful on average.
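
To put rough (made-up but plausible) numbers on it: with 8KB pages, a leaf
page might hold around 300 index tuples while a heap page holds perhaps 60
rows of typical width. Under uniformly random single-row index lookups, any
given leaf page is then referenced about five times as often as any given
heap page, and the internal pages above the leaves far more often still.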
> I think the hardest hurdle for any changes in this area will be
> showing that we don't have any nasty regressions. I think the best way
> to do that would be to study separately the performance overhead of
> the replacement algorithm and optimality of the replacement choices.
> If we capture a bunch of buffer reference traces by instrumenting
> PinBuffer, we can pretty accurately simulate the behavior of different
> algorithms and tuning choices at different shared buffer sizes.
> Obviously full-scale tests are still needed due to interactions with
> the OS, controller and disk caches, and other miscellaneous influences.
> But even so, simulation would get us much better coverage of various
> workloads and at least some confidence that it's a good change overall.
> It will be very hard and time-consuming to gather equivalent evidence
> with full-scale tests.
I think I agree with all of that. The fact that we as a community
don't appear to have too much to say about what workloads to
prioritize somewhat frustrates this. The other problem is that sizing
shared_buffers appropriately involves a surprising amount of deference
to rules of thumb that in practice no one is quite prepared to
rigorously defend - who is to say what apportionment of memory to
Postgres is appropriate here? I too was hopeful that we could evaluate
this work purely in terms of observed improvements to hit rate (at
least initially), but now I doubt even that. It would be great to be
able to say "here are the parameters of this discussion", and have
everyone immediately agree with that, but in this instance that's
legitimately not possible.
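
On the trace-driven simulation idea, even something as small as the
following would be a start: replay a stream of buffer tags (say, captured by
instrumenting PinBuffer) against a CLOCK simulator of a configurable size and
report the hit rate. Purely illustrative; the trace format and the policy
here are stand-ins, not a patch:

#include <stdio.h>
#include <stdlib.h>

#define MAX_USAGE_COUNT 5

typedef struct
{
    long page;                  /* simulated buffer tag, -1 while the frame is empty */
    int  usage_count;           /* CLOCK reference weight */
} Frame;

int
main(int argc, char **argv)
{
    size_t nframes = (argc > 1) ? strtoul(argv[1], NULL, 10) : 1024;
    Frame *frames;
    size_t hand = 0;
    long page;
    unsigned long hits = 0, misses = 0;

    if (nframes == 0)
        nframes = 1024;
    frames = calloc(nframes, sizeof(Frame));
    if (frames == NULL)
        return 1;
    for (size_t i = 0; i < nframes; i++)
        frames[i].page = -1;

    /* One page number per line on stdin, e.g. emitted by a PinBuffer hook. */
    while (scanf("%ld", &page) == 1)
    {
        int found = 0;

        /* Linear probe keeps the sketch short; a real tool would use a hash table. */
        for (size_t i = 0; i < nframes; i++)
        {
            if (frames[i].page == page)
            {
                if (frames[i].usage_count < MAX_USAGE_COUNT)
                    frames[i].usage_count++;
                hits++;
                found = 1;
                break;
            }
        }
        if (found)
            continue;

        misses++;

        /* CLOCK sweep to choose a victim frame for the newly referenced page. */
        for (;;)
        {
            if (frames[hand].page == -1 || frames[hand].usage_count == 0)
            {
                frames[hand].page = page;
                frames[hand].usage_count = 1;
                hand = (hand + 1) % nframes;
                break;
            }
            frames[hand].usage_count--;
            hand = (hand + 1) % nframes;
        }
    }

    printf("hits=%lu misses=%lu hit_rate=%.3f\n", hits, misses,
           (hits + misses) ? (double) hits / (double) (hits + misses) : 0.0);
    free(frames);
    return 0;
}

Running the same trace at several simulated shared_buffers sizes would at
least let us compare policies on hit rate without first settling the
argument about how much memory Postgres deserves.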
--
Peter Geoghegan