Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From:	Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
To:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc:	Lukas Fittl <lukas(at)fittl(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject:	Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date:	2022-10-17 05:28:34
Message-ID:	CAOtHd0Aj-F1ogXiEWE4wV5U8A8a-mxS=hYwx9B3fsg57hG2zWg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Oct 13, 2022 at 10:29 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> I think that it makes sense to count both the initial buffers added to
> the ring and subsequent shared buffers added to the ring (either when
> the current strategy buffer is pinned or in use or when a bulkread
> rejects dirty strategy buffers in favor of new shared buffers) as
> strategy clocksweeps because of how the statistic would be used.
>
> Clocksweeps give you an idea of how much of your working set is cached
> (setting aside initially reading data into shared buffers when you are
> warming up the db). You may use clocksweeps to determine if you need to
> make shared buffers larger.
>
> Distinguishing strategy buffer clocksweeps from shared buffer
> clocksweeps allows us to avoid enlarging shared buffers if most of the
> clocksweeps are to bring in blocks for the strategy operation.
>
> However, I could see an argument that discounting strategy clocksweeps
> done because the current strategy buffer is pinned makes the number of
> shared buffer clocksweeps artificially low since those other queries
> using the buffer would have suffered a cache miss were it not for the
> strategy. And, in this case, you would take strategy clocksweeps
> together with shared clocksweeps to make your decision. And if we
> include buffers initially added to the strategy ring in the strategy
> clocksweep statistic, this number may be off because those blocks may
> not be needed in the main shared working set. But you won't know that
> until you try to reuse the buffer and it is pinned. So, I think we don't
> have a better option than counting initial buffers added to the ring as
> strategy clocksweeps (as opposed to as reuses).
>
> So, in answer to your question, no, I cannot think of a scenario like
> that.

That analysis makes sense to me; thanks.
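
To make the sizing argument concrete, here is a rough sketch of the kind of calculation I have in mind. The function and counter names are hypothetical (nothing in the patch exposes them under these names) and the numbers are made up; it only shows how separating the two kinds of clocksweeps could inform a decision about shared_buffers.

```python
def strategy_share(shared_clocksweeps: int, strategy_clocksweeps: int) -> float:
    """Fraction of all clocksweeps done on behalf of a strategy ring.

    Both arguments are hypothetical counters, not actual view columns.
    """
    total = shared_clocksweeps + strategy_clocksweeps
    if total == 0:
        return 0.0
    return strategy_clocksweeps / total


# Made-up numbers: 200 shared clocksweeps, 800 strategy clocksweeps.
share = strategy_share(200, 800)
print(f"{share:.0%} of clocksweeps were for strategy operations")
```

If most clocksweeps turn out to be strategy clocksweeps, as in this invented case, enlarging shared buffers probably would not help much; the caveat about pinned strategy buffers above still applies.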

> It also made me remember that I am incorrectly counting rejected buffers
> as reused. I'm not sure if it is a good idea to subtract from reuses
> when a buffer is rejected. Waiting until after it is rejected to count
> the reuse will take some other code changes. Perhaps we could also count
> rejections in the stats?

I'm not sure what makes sense here.

> > Not critical, but is there a list of backend types we could
> > cross-reference elsewhere in the docs?
>
> The most I could find was this longer explanation (with exhaustive list
> of types) in pg_stat_activity docs [1]. I could duplicate what it says
> or I could link to the view and say "see pg_stat_activity for a
> description of backend_type" or something like that (to keep them from
> getting out of sync as new backend_types are added). I suppose I could
> also add docs on backend_types, but I'm not sure where something like
> that would go.

I think linking pg_stat_activity is reasonable for now. A separate
section for this might be nice at some point, but that seems out of
scope.

> > From the io_context column description:
> >
> > + The autovacuum daemon, explicit <command>VACUUM</command>, explicit
> > + <command>ANALYZE</command>, many bulk reads, and many bulk writes use a
> > + fixed amount of memory, acquiring the equivalent number of shared
> > + buffers and reusing them circularly to avoid occupying an undue portion
> > + of the main shared buffer pool.
> > + </para></entry>
> >
> > I don't understand how this is relevant to the io_context column.
> > Could you expand on that, or am I just missing something obvious?
> >
>
> I'm trying to explain why those other IO Contexts exist (bulkread,
> bulkwrite, vacuum) and why they are separate from shared buffers.
> Should I cut it altogether or preface it with something like: these are
> counted separate from shared buffers because...?

Oh I see. That makes sense; it just wasn't obvious to me this was
talking about the last three values of io_context. I think a brief
preface like that would be helpful (maybe explicitly with "these last
three values", and I think "counted separately").

> > + <row>
> > + <entry role="catalog_table_entry"><para role="column_definition">
> > + <structfield>extended</structfield> <type>bigint</type>
> > + </para>
> > + <para>
> > + Extends of relations done by this <varname>backend_type</varname> in
> > + order to write data in this <varname>io_context</varname>.
> > + </para></entry>
> > + </row>
> >
> > I understand what this is, but not why this is something I might want
> > to know about.
>
> Unlike writes, backends largely have to do their own extends, so
> separating this from writes lets us determine whether or not we need to
> change checkpointer/bgwriter to be more aggressive using the writes
> without the distraction of the extends. Should I mention this in the
> docs? The other stats views don't seem to editorialize at all, and I
> wasn't sure if this was an objective enough point to include in docs.

Thanks for the clarification. Just to make sure I understand, you mean
that if I see a high extended count, that may be interesting in terms
of write activity, but I can't fix that by tuning--it's just the
nature of my workload?

I think you're right that this is not objective enough. It's
unfortunate that there's not a good place in the docs for info like
that, since stats like this are hard to interpret without that
context, but I admit that it's not really this patch's job to solve
that larger issue.

> > That seems broadly reasonable, but pg_settings also has a 'unit'
> > field, and in that view, unit is '8kB' on my system--i.e., it
> > (presumably) reflects the block size. Is that something we should try
> > to be consistent with (not sure if that's a good idea, but thought it
> > was worth asking)?
> >
>
> I think this idea is a good option. I am wondering if it would be clear
> when mixed with non-block-oriented IO. Block-oriented IO would say 8kB
> (or whatever the build-time value of a block was) and non-block-oriented
> IO would say B or kB. The math would work out.

Right, yeah. Although maybe that's a little confusing? When you
originally added "unit", you had said:

> The most correct thing to do to accommodate block-oriented and
> non-block-oriented IO would be to specify all the values in bytes.
> However, I would like this view to be usable visually (as opposed to
> just in scripts and by tools). The only current value of unit is
> "block_size" which could potentially be combined with the value of the
> GUC to get bytes.

Is this still usable visually if you have to compare values across
units? I don't really have any great ideas here (and maybe this is
still the best option), just pointing it out.
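
For what it's worth, a script or tool could normalize mixed units to bytes fairly easily; it's the by-eye comparison I'm less sure about. A minimal sketch, assuming unit strings shaped like pg_settings' '8kB' (an optional integer multiplier followed by B, kB, or MB). The helper name and the set of accepted units are my own invention, not anything in the patch:

```python
import re

# Assumed unit suffixes and their size in bytes (hypothetical, for illustration).
_UNIT_BYTES = {"B": 1, "kB": 1024, "MB": 1024 * 1024}


def to_bytes(count: int, unit: str) -> int:
    """Convert a count expressed in `unit` (e.g. '8kB', 'kB', 'B') to bytes."""
    m = re.fullmatch(r"(\d*)(B|kB|MB)", unit)
    if m is None:
        raise ValueError(f"unrecognized unit: {unit!r}")
    # An empty multiplier means the bare unit, i.e. a multiplier of 1.
    multiplier = int(m.group(1)) if m.group(1) else 1
    return count * multiplier * _UNIT_BYTES[m.group(2)]


# 100 block-oriented operations at 8kB versus 100 operations counted in kB.
print(to_bytes(100, "8kB"), to_bytes(100, "kB"))
```

So the math does work out, but a reader scanning the view would have to do that multiplication in their head for every row.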

> Looking at pg_settings now though, I am confused about
> how the units for wal_buffers is 8kB but then the value of wal_buffers
> when I show it in psql is "16MB"...

You mean the difference between

maciek=# select setting, unit from pg_settings where name = 'wal_buffers';
 setting | unit
---------+------
 512     | 8kB
(1 row)

and

maciek=# show wal_buffers;
 wal_buffers
-------------
 4MB
(1 row)

Poking around, it looks like that's due to
convert_int_from_base_unit (indirectly called from SHOW /
current_setting):

/*
 * Convert an integer value in some base unit to a human-friendly unit.
 *
 * The output unit is chosen so that it's the greatest unit that can represent
 * the value without loss. For example, if the base unit is GUC_UNIT_KB, 1024
 * is converted to 1 MB, but 1025 is represented as 1025 kB.
 */
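
As a rough illustration of the rule that comment describes (my own sketch, not the actual guc.c code; the function name and the list of units are assumptions), the conversion amounts to dividing by 1024 for as long as nothing is lost:

```python
from typing import Tuple


def to_friendly_unit(value: int, base_unit_kb: int) -> Tuple[int, str]:
    """Pick the greatest unit that represents value * base_unit_kb kB without loss."""
    kb = value * base_unit_kb
    unit = "kB"
    for larger in ("MB", "GB", "TB"):
        # Stop as soon as moving to the next unit would drop a remainder.
        if kb == 0 or kb % 1024 != 0:
            break
        kb //= 1024
        unit = larger
    return kb, unit


# A setting of 512 with a unit of 8kB is 4096 kB, which is exactly 4 MB.
print(to_friendly_unit(512, 8))
```

That matches what I see above: a setting of 512 in units of 8kB comes out as 4MB, and by the same arithmetic a setting of 2048 would come out as the 16MB you saw.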

> Though the units for the pg_stat_io view for block-oriented IO would be
> the build-time values for block size, so it wouldn't line up exactly
> with pg_settings.

I don't follow--what would be the discrepancy?

In response to

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) at 2022-10-13 17:29:32 from Melanie Plageman

Responses

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) at 2022-10-19 19:26:51 from Melanie Plageman
