Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
Date: 2025-02-17 16:59:17
Message-ID: bbxepmv3w2iapbxehkypzsut545valywzf62qlizsrzkbrdnfy@dzw5ju3itwgs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-12-25 15:57:34 +0300, Nazir Bilal Yavuz wrote:
> 1- It is time consuming to evict all shared buffers one by one using
> the pg_buffercache_evict() function.

This is really useful for benchmarking. When testing IO heavy workloads it
turns out to be a lot lower noise to measure with shared buffers already
"touched" before taking actual measurements. The first time a buffer is used
the kernel needs to actually initialize the memory for the buffer, which can
take a substantial amount of time. Which obviously can hide many performance
differences. And it adds a lot of noise.

> 2- It would be good to have a function to mark buffers as dirty to
> test different scenarios.

This is important to be able to measure checkpointer performance. Otherwise
one has to load data from scratch. That's bad for three reasons:

1) After re-loading the data from scratch the data can be laid-out differently
on-disk, which can have substantial performance impacts. By re-dirtying
data one instead can measure the performance effects of a change with the
same on-disk layaout.
2) It takes a lot of time to reload data from scratch.
3) The first write to data has substantially different performance
characteristics than later writes, but often isn't the most relevant factor
for real-world performance test.

> So, this patchset extends pg_buffercache with 3 functions:
>
> 1- pg_buffercache_evict_all(): This is very similar to the already
> existing pg_buffercache_evict() function. The difference is
> pg_buffercache_evict_all() does not take an argument. Instead it just
> loops over the shared buffers and tries to evict all of them. It
> returns the number of buffers evicted and flushed.

I do think that'd be rather useful. Perhaps it's also worth having a version
that just evicts a specific relation?

> 2- pg_buffercache_mark_dirty(): This function takes a buffer id as an
> argument and tries to mark this buffer as dirty. Returns true on
> success. This returns false if the buffer is already dirty. Do you
> think this makes sense or do you prefer it to return true if the
> buffer is already dirty?

I don't really have feelings about that ;)

> From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001
> From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
> Date: Fri, 20 Dec 2024 14:06:47 +0300
> Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing
>
> This new function provides a mechanism to evict all shared buffers at
> once. It is designed to address the inefficiency of repeatedly calling
> pg_buffercache_evict() for each individual buffer, which can be
> time-consuming when dealing with large shared buffer pools
> (e.g., ~790ms vs. ~270ms for 16GB of shared buffers).

> */
> bool
> -EvictUnpinnedBuffer(Buffer buf)
> +EvictUnpinnedBuffer(Buffer buf, bool *flushed)
> {
> BufferDesc *desc;
> uint32 buf_state;
> bool result;
>
> + *flushed = false;
> +
> /* Make sure we can pin the buffer. */
> ResourceOwnerEnlarge(CurrentResourceOwner);
> ReservePrivateRefCountEntry();
> @@ -6134,6 +6136,7 @@ EvictUnpinnedBuffer(Buffer buf)
> LWLockAcquire(BufferDescriptorGetContentLock(desc), LW_SHARED);
> FlushBuffer(desc, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
> LWLockRelease(BufferDescriptorGetContentLock(desc));
> + *flushed = true;
> }
>
> /* This will return false if it becomes dirty or someone else pins
> it. */

I don't think *flushed is necessarily accurate this way, as it might detect
that it doesn't need to flush the data (via StartBufferIO() returning false).

> + */
> +Datum
> +pg_buffercache_evict_all(PG_FUNCTION_ARGS)
> +{
> + Datum result;
> + TupleDesc tupledesc;
> + HeapTuple tuple;
> + Datum values[NUM_BUFFERCACHE_EVICT_ALL_ELEM];
> + bool nulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0};
> +
> + int32 buffers_evicted = 0;
> + int32 buffers_flushed = 0;
> + bool flushed;
> +
> + if (get_call_result_type(fcinfo, NULL, &tupledesc) != TYPEFUNC_COMPOSITE)
> + elog(ERROR, "return type must be a row type");
> +
> + if (!superuser())
> + ereport(ERROR,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("must be superuser to use pg_buffercache_evict_all function")));
> +
> + for (int buf = 1; buf < NBuffers; buf++)
> + {
> + if (EvictUnpinnedBuffer(buf, &flushed))
> + buffers_evicted++;
> + if (flushed)
> + buffers_flushed++;

I'd probably add a pre-check for the buffer not being in use. We don't need an
external function call, with an unconditional LockBufHdr() inside, for that
case.

> +/*
> + * MarkUnpinnedBufferDirty
> + *
> + * This function is intended for testing/development use only!
> + *
> + * To succeed, the buffer must not be pinned on entry, so if the caller had a
> + * particular block in mind, it might already have been replaced by some other
> + * block by the time this function runs. It's also unpinned on return, so the
> + * buffer might be occupied and flushed by the time control is returned. This
> + * inherent raciness without other interlocking makes the function unsuitable
> + * for non-testing usage.
> + *
> + * Returns true if the buffer was not dirty and it has now been marked as
> + * dirty. Returns false if it wasn't valid, if it couldn't be marked as dirty
> + * due to a pin, or if the buffer was already dirty.
> + */

Hm. One potentially problematic thing with this is that it can pin and dirty a
buffer even while its relation is locked exclusively. Which i think some code
doesn't expect. Not sure if there's a way around that :(

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2025-02-17 17:16:30 Re: pg_stat_statements and "IN" conditions
Previous Message Nathan Bossart 2025-02-17 16:48:49 Re: Introduce XID age and inactive timeout based replication slot invalidation