Re: Direct I/O

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Christoph Berg <myon(at)debian(dot)org>, mikael(dot)kjellstrom(at)gmail(dot)com, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Direct I/O
Date: 2023-04-19 17:43:14
Message-ID: 20230419174314.bekvesgoy36f5qn4@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-04-19 13:16:54 -0400, Robert Haas wrote:
> On Wed, Apr 19, 2023 at 12:54 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Interestingly, I haven't seen that as much in more recent benchmarks as it
> > used to. Partially I think because common s_b settings have gotten bigger, I
> > guess. But I wonder if we also accidentally improved something else, e.g. by
> > pin/unpin-ing the same buffer a bit less frequently.
>
> I think the problem with the algorithm is pretty fundamental. The rate
> of usage count increase is tied to how often we access buffers, and
> the rate of usage count decrease is tied to buffer eviction. But a
> given workload can have no eviction at all (in which case the usage
> counts must converge to 5) or it can evict on every buffer access (in
> which case the usage counts must mostly converge to 0, because we'll
> be decreasing usage counts at least once per buffer and generally
> more).

I don't think the "evict on every buffer access" case works quite that way - unless
you have a completely even access pattern, usage counts will still be bumped
much more often on some buffers than on others. And if you do have a completely
even access pattern, it's hard to come up with a good cache replacement
algorithm at all...

> ISTM that the only way that you can end up with a good mix of
> usage counts is if you have a workload that falls into some kind of a
> sweet spot where the rate of usage count bumps and the rate of usage
> count de-bumps are close enough together that things don't skew all
> the way to one end or the other. Bigger s_b might make that more
> likely to happen in practice, but it seems like bad algorithm design
> on a theoretical level. We should be comparing the frequency of access
> of buffers to the frequency of access of other buffers, not to the
> frequency of buffer eviction. Or to put the same thing another way,
> the average value of the usage count shouldn't suddenly change from 5
> to 1 when the server evicts 1 buffer.

I agree that there are fundamental issues with the algorithm. But practically
I think the effect of the over-saturation of s_b isn't as severe as one might
think:

If your miss rate is very low, the occasional bad victim selection won't
matter much. If the miss rate is a bit higher, a frequently accessed buffer is
more likely to have its usage count bumped again after it has been
decremented.

This is also why I think that larger s_b makes the issues less likely - with
larger s_b, it is more likely that frequently accessed buffers are accessed
again after the first of the 5 clock sweeps necessary to reduce the usage
count. Clearly, with a small-ish s_b and a high replacement rate, that's not
going to happen for sufficiently many buffers. But once you have a few GB of
s_b, multiple complete sweeps take a while.

Most, if not all, buffer replacement algorithms I have seen don't deal well
with a small s_b combined with a huge replacement rate. Most of the fancier
algorithms track recency information for buffers that have recently been
evicted - but you obviously can't track that to an unlimited degree; IIRC most
papers propose that the shadow map be roughly the size of the buffer pool.

IMO you pretty much need a higher-level policy decision to improve on that
(e.g. by just deciding that some buffers are sticky, perhaps because they were
used first) - but it doesn't matter much, because in that regime the miss rate
is high enough that the total amount of reads is barely affected by buffer
replacement decisions.

Greetings,

Andres Freund
