From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) |
Date: | 2022-07-12 17:01:31 |
Message-ID: | 20220712170131.fql5addovb2oiw7w@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2022-07-12 12:19:06 -0400, Melanie Plageman wrote:
> > > I also realized that I am not differentiating between IOPATH_SHARED and
> > > IOPATH_STRATEGY for IOOP_FSYNC. But, given that we don't know what type
> > > of buffer we are fsync'ing by the time we call register_dirty_segment(),
> > > I'm not sure how we would fix this.
> >
> > I think there scarcely happens flush for strategy-loaded buffers. If
> > that is sensible, IOOP_FSYNC would not make much sense for
> > IOPATH_STRATEGY.
> >
>
> Why would it be less likely for a backend to do its own fsync when
> flushing a dirty strategy buffer than a regular dirty shared buffer?
We really just don't expect a backend to do many segment fsyncs at
all. Otherwise there's something wrong with the forwarding mechanism.
It'd be different if we tracked WAL fsyncs more granularly - which would be
quite interesting - but that's something for another day^Wpatch.
> > > > Wonder if it's worth making the lock specific to the backend type?
> > > >
> > >
> > > I've added another Lock into PgStat_IOPathOps so that each BackendType
> > > can be locked separately. But, I've also kept the lock in
> > > PgStatShared_BackendIOPathOps so that reset_all and snapshot could be
> > > done easily.
> >
> > Looks fine about the lock separation.
> >
>
> Actually, I think it is not safe to use both of these locks. So for
> picking one method, it is probably better to go with the locks in
> PgStat_IOPathOps, it will be more efficient for flush (and not for
> fetching and resetting), so that is probably the way to go, right?
I think it's good to just use one kind of lock, and efficiency of snapshotting
/ resetting is nearly irrelevant. But I don't see why it's not safe to use
both kinds of locks?
> > Looks fine, but I think pgstat_flush_io_ops() need more comments like
> > other pgstat_flush_* functions.
> >
> > + for (int i = 0; i < BACKEND_NUM_TYPES; i++)
> > + stats_shmem->stats[i].stat_reset_timestamp = ts;
> >
> > I'm not sure we need a separate reset timestamp for each backend type
> > but SLRU counter does the same thing..
> >
>
> Yes, I think for SLRU stats it is because you can reset individual SLRU
> stats. Also there is no wrapper data structure to put it in. I could
> keep it in PgStatShared_BackendIOPathOps since you have to reset all IO
> operation stats at once, but I am thinking of getting rid of
> PgStatShared_BackendIOPathOps since it is not needed if I only keep the
> locks in PgStat_IOPathOps and make the global shared value an array of
> PgStat_IOPathOps.
I'm strongly against introducing super granular reset timestamps. I think that
was a mistake for SLRU stats, but we can't fix that as easily.
> Currently, strategy allocs count only reuses of a strategy buffer (not
> initial shared buffers which are added to the ring).
> strategy writes count only the writing out of dirty buffers which are
> already in the ring and are being reused.
That seems right to me.
> Alternatively, we could also count as strategy allocs all those buffers
> which are added to the ring and count as strategy writes all those
> shared buffers which are dirty when initially added to the ring.
I don't think that'd provide valuable information. The whole reason that
strategy writes are interesting is that they can lead to writing out data a
lot sooner than they would be written out without a strategy being used.
> Subject: [PATCH v24 2/3] Track IO operation statistics
>
> Introduce "IOOp", an IO operation done by a backend, and "IOPath", the
> location or type of IO done by a backend. For example, the checkpointer
> may write a shared buffer out. This would be counted as an IOOp write on
> an IOPath IOPATH_SHARED by BackendType "checkpointer".
I'm still not 100% happy with IOPath - seems a bit too easy to confuse with
the file path. What about 'origin'?
> Each IOOp (alloc, fsync, extend, write) is counted per IOPath
> (direct, local, shared, or strategy) through a call to
> pgstat_count_io_op().
It seems we should track reads too - it's quite interesting to know whether
reads happened because of a strategy, for example. You do reference reads in a
later part of the commit message even :)
> The primary concern of these statistics is IO operations on data blocks
> during the course of normal database operations. IO done by, for
> example, the archiver or syslogger is not counted in these statistics.
We could extend this at a later stage, if we really want to. But I'm not sure
it's interesting or fully possible. E.g. the archiver's write are largely not
done by the archiver itself, but by a command (or module these days) it shells
out to.
> Note that this commit does not add code to increment IOPATH_DIRECT. A
> future patch adding wrappers for smgrwrite(), smgrextend(), and
> smgrimmedsync() would provide a good location to call
> pgstat_count_io_op() for unbuffered IO and avoid regressions for future
> users of these functions.
Hm. Perhaps we should defer introducing IOPATH_DIRECT for now then?
> Stats on IOOps for all IOPaths for a backend are initially accumulated
> locally.
>
> Later they are flushed to shared memory and accumulated with those from
> all other backends, exited and live.
Perhaps mention here that this later could be extended to make per-connection
stats visible?
> Some BackendTypes will not execute pgstat_report_stat() and thus must
> explicitly call pgstat_flush_io_ops() in order to flush their backend
> local IO operation statistics to shared memory.
Maybe add "flush ... during ongoing operation" or such? Because they'd all
flush at commit, IIRC.
> diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
> index 088556ab54..963b05321e 100644
> --- a/src/backend/bootstrap/bootstrap.c
> +++ b/src/backend/bootstrap/bootstrap.c
> @@ -33,6 +33,7 @@
> #include "miscadmin.h"
> #include "nodes/makefuncs.h"
> #include "pg_getopt.h"
> +#include "pgstat.h"
> #include "storage/bufmgr.h"
> #include "storage/bufpage.h"
> #include "storage/condition_variable.h"
Hm?
> diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
> index e926f8c27c..beb46dcb55 100644
> --- a/src/backend/postmaster/walwriter.c
> +++ b/src/backend/postmaster/walwriter.c
> @@ -293,18 +293,7 @@ HandleWalWriterInterrupts(void)
> }
>
> if (ShutdownRequestPending)
> - {
> - /*
> - * Force reporting remaining WAL statistics at process exit.
> - *
> - * Since pgstat_report_wal is invoked with 'force' is false in main
> - * loop to avoid overloading the cumulative stats system, there may
> - * exist unreported stats counters for the WAL writer.
> - */
> - pgstat_report_wal(true);
> -
> proc_exit(0);
> - }
>
> /* Perform logging of memory contexts of this process */
> if (LogMemoryContextPending)
Let's do this in a separate commit and get it out of the way...
> @@ -682,16 +694,37 @@ AddBufferToRing(BufferAccessStrategy strategy, BufferDesc *buf)
> * if this buffer should be written and re-used.
> */
> bool
> -StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
> +StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool *write_from_ring)
> {
> - /* We only do this in bulkread mode */
> +
> + /*
> + * We only reject reusing and writing out the strategy buffer this in
> + * bulkread mode.
> + */
> if (strategy->btype != BAS_BULKREAD)
> + {
> + /*
> + * If the buffer was from the ring and we are not rejecting it, consider it
> + * a write of a strategy buffer.
> + */
> + if (strategy->current_was_in_ring)
> + *write_from_ring = true;
Hm. This is set even if the buffer wasn't dirty? I guess we don't expect
StrategyRejectBuffer() to be called for clean buffers...
> /*
> diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
> index d9275611f0..d3963f59d0 100644
> --- a/src/backend/utils/activity/pgstat_database.c
> +++ b/src/backend/utils/activity/pgstat_database.c
> @@ -47,7 +47,8 @@ pgstat_drop_database(Oid databaseid)
> }
>
> /*
> - * Called from autovacuum.c to report startup of an autovacuum process.
> + * Called from autovacuum.c to report startup of an autovacuum process and
> + * flush IO Operation statistics.
> * We are called before InitPostgres is done, so can't rely on MyDatabaseId;
> * the db OID must be passed in, instead.
> */
> @@ -72,6 +73,11 @@ pgstat_report_autovac(Oid dboid)
> dbentry->stats.last_autovac_time = GetCurrentTimestamp();
>
> pgstat_unlock_entry(entry_ref);
> +
> + /*
> + * Report IO Operation statistics
> + */
> + pgstat_flush_io_ops(false);
> }
Hm. I suspect this will always be zero - at this point we haven't connected to
a database, so there really can't have been much, if any, IO. I think I
suggested doing something here, but on a second look it really doesn't make
much sense.
Note that that's different from doing something in
pgstat_report_(vacuum|analyze) - clearly we've done something at that point.
> /*
> - * Report that the table was just vacuumed.
> + * Report that the table was just vacuumed and flush IO Operation statistics.
> */
> void
> pgstat_report_vacuum(Oid tableoid, bool shared,
> @@ -257,10 +257,15 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
> }
>
> pgstat_unlock_entry(entry_ref);
> +
> + /*
> + * Report IO Operations statistics
> + */
> + pgstat_flush_io_ops(false);
> }
>
> /*
> - * Report that the table was just analyzed.
> + * Report that the table was just analyzed and flush IO Operation statistics.
> *
> * Caller must provide new live- and dead-tuples estimates, as well as a
> * flag indicating whether to reset the changes_since_analyze counter.
> @@ -340,6 +345,11 @@ pgstat_report_analyze(Relation rel,
> }
>
> pgstat_unlock_entry(entry_ref);
> +
> + /*
> + * Report IO Operations statistics
> + */
> + pgstat_flush_io_ops(false);
> }
Think it'd be good to amend these comments to say that otherwise stats would
only get flushed after a multi-relatio autovacuum cycle is done / a
VACUUM/ANALYZE command processed all tables. Perhaps add the comment to one
of the two functions, and just reference it in the other place?
> --- a/src/include/utils/backend_status.h
> +++ b/src/include/utils/backend_status.h
> @@ -306,6 +306,40 @@ extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
> int buflen);
> extern uint64 pgstat_get_my_query_id(void);
>
> +/* Utility functions */
> +
> +/*
> + * When maintaining an array of information about all valid BackendTypes, in
> + * order to avoid wasting the 0th spot, use this helper to convert a valid
> + * BackendType to a valid location in the array (given that no spot is
> + * maintained for B_INVALID BackendType).
> + */
> +static inline int backend_type_get_idx(BackendType backend_type)
> +{
> + /*
> + * backend_type must be one of the valid backend types. If caller is
> + * maintaining backend information in an array that includes B_INVALID,
> + * this function is unnecessary.
> + */
> + Assert(backend_type > B_INVALID && backend_type <= BACKEND_NUM_TYPES);
> + return backend_type - 1;
> +}
In function definitions (vs declarations) we put the 'static inline int' in a
separate line from the rest of the function signature.
> +/*
> + * When using a value from an array of information about all valid
> + * BackendTypes, add 1 to the index before using it as a BackendType to adjust
> + * for not maintaining a spot for B_INVALID BackendType.
> + */
> +static inline BackendType idx_get_backend_type(int idx)
> +{
> + int backend_type = idx + 1;
> + /*
> + * If the array includes a spot for B_INVALID BackendType this function is
> + * not required.
The comments around this seem a bit over the top, but I also don't mind them
much.
> Add pg_stat_io, a system view which tracks the number of IOOp (allocs,
> writes, fsyncs, and extends) done through each IOPath (e.g. shared
> buffers, local buffers, unbuffered IO) by each type of backend.
Annoying question: pg_stat_io vs pg_statio? I'd not think of suggesting the
latter, except that we already have a bunch of views with that prefix.
> Some of these should always be zero. For example, checkpointer does not
> use a BufferAccessStrategy (currently), so the "strategy" IOPath for
> checkpointer will be 0 for all IOOps.
What do you think about returning NULL for the values that we except to never
be non-zero? Perhaps with an assert against non-zero values? Seems like it
might be helpful for understanding the view.
> +/*
> +* When adding a new column to the pg_stat_io view, add a new enum
> +* value here above IO_NUM_COLUMNS.
> +*/
> +enum
> +{
> + IO_COLUMN_BACKEND_TYPE,
> + IO_COLUMN_IO_PATH,
> + IO_COLUMN_ALLOCS,
> + IO_COLUMN_EXTENDS,
> + IO_COLUMN_FSYNCS,
> + IO_COLUMN_WRITES,
> + IO_COLUMN_RESET_TIME,
> + IO_NUM_COLUMNS,
> +};
We typedef pretty much every enum so the enum can be referenced without the
'enum' prefix. I'd do that here, even if we don't need it.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2022-07-12 17:09:44 | Re: making relfilenodes 56 bits |
Previous Message | Melanie Plageman | 2022-07-12 16:19:06 | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) |