From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) |
Date: | 2021-12-09 19:17:52 |
Message-ID: | 20211209191752.dupgg4omrc7gck2l@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2021-12-03 15:02:24 -0500, Melanie Plageman wrote:
> From e0f7f3dfd60a68fa01f3c023bcdb69305ade3738 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Mon, 11 Oct 2021 16:15:06 -0400
> Subject: [PATCH v17 1/7] Read-only atomic backend write function
>
> For counters in shared memory which can be read by any backend but only
> written to by one backend, an atomic is still needed to protect against
> torn values, however, pg_atomic_fetch_add_u64() is overkill for
> incrementing the counter. pg_atomic_inc_counter() is a helper function
> which can be used to increment these values safely but without
> unnecessary overhead.
>
> Author: Thomas Munro
> ---
> src/include/port/atomics.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
> index 856338f161..39ffff24dd 100644
> --- a/src/include/port/atomics.h
> +++ b/src/include/port/atomics.h
> @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_)
> return pg_atomic_sub_fetch_u64_impl(ptr, sub_);
> }
>
> +/*
> + * On modern systems this is really just *counter++. On some older systems
> + * there might be more to it, due to inability to read and write 64 bit values
> + * atomically.
> + */
> +static inline void
> +pg_atomic_inc_counter(pg_atomic_uint64 *counter)
> +{
> + pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1);
> +}
I wonder if it's worth putting something in the name indicating that this is
not actual atomic RMW operation. Perhaps adding _unlocked?
> From b0e193cfa08f0b8cf1be929f26fe38f06a39aeae Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 24 Nov 2021 10:32:56 -0500
> Subject: [PATCH v17 2/7] Add IO operation counters to PgBackendStatus
>
> Add an array of counters in PgBackendStatus which count the buffers
> allocated, extended, fsynced, and written by a given backend. Each "IO
> Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct,
> local, shared, or strategy). "local" and "shared" IO Path counters count
> operations on local and shared buffers. The "strategy" IO Path counts
> buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy.
> The "direct" IO Path counts blocks of IO which are read, written, or
> fsync'd using smgrwrite/extend/immedsync directly (as opposed to through
> [Local]BufferAlloc()).
>
> With this commit, all backends increment a counter in their
> PgBackendStatus when performing an IO operation. This is in preparation
> for future commits which will persist these stats upon backend exit and
> use the counters to provide observability of database IO operations.
>
> Note that this commit does not add code to increment the "direct" path.
> A separate proposed patch [1] which would add wrappers for smgrwrite(),
> smgrextend(), and smgrimmedsync() would provide a good location to call
> pgstat_inc_ioop() for unbuffered IO and avoid regressions for future
> users of these functions.
>
> [1] https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com
On longer thread it's nice for committers to already have Reviewed-By: in the
commit message.
> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
> index 7229598822..413cc605f8 100644
> --- a/src/backend/utils/activity/backend_status.c
> +++ b/src/backend/utils/activity/backend_status.c
> @@ -399,6 +399,15 @@ pgstat_bestart(void)
> lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
> lbeentry.st_progress_command_target = InvalidOid;
> lbeentry.st_query_id = UINT64CONST(0);
> + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> + {
> + IOOps *io_ops = &lbeentry.io_path_stats[io_path];
> +
> + pg_atomic_init_u64(&io_ops->allocs, 0);
> + pg_atomic_init_u64(&io_ops->extends, 0);
> + pg_atomic_init_u64(&io_ops->fsyncs, 0);
> + pg_atomic_init_u64(&io_ops->writes, 0);
> + }
>
> /*
> * we don't zero st_progress_param here to save cycles; nobody should
nit: I think we nearly always have a blank line before loops
> diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
> index 646126edee..93f1b4bcfc 100644
> --- a/src/backend/utils/init/postinit.c
> +++ b/src/backend/utils/init/postinit.c
> @@ -623,6 +623,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
> }
>
> + pgstat_beinit();
> /*
> * Initialize local process's access to XLOG.
> */
nit: same with multi-line comments.
> @@ -649,6 +650,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> */
> CreateAuxProcessResourceOwner();
>
> + pgstat_bestart();
> StartupXLOG();
> /* Release (and warn about) any buffer pins leaked in StartupXLOG */
> ReleaseAuxProcessResources(true);
> @@ -676,7 +678,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> EnablePortalManager();
>
> /* Initialize status reporting */
> - pgstat_beinit();
I'd like to see changes like moving this kind of thing around broken around
and committed separately. It's much easier to pinpoint breakage if the CF
breaks after moving just pgstat_beinit() around, rather than when committing
this considerably larger patch. And reordering subsystem initialization has
the habit of causing problems...
> +/* ----------
> + * IO Stats reporting utility types
> + * ----------
> + */
> +
> +typedef enum IOOp
> +{
> + IOOP_ALLOC,
> + IOOP_EXTEND,
> + IOOP_FSYNC,
> + IOOP_WRITE,
> +} IOOp;
> [...]
> +/*
> + * Structure for counting all types of IOOps for a live backend.
> + */
> +typedef struct IOOps
> +{
> + pg_atomic_uint64 allocs;
> + pg_atomic_uint64 extends;
> + pg_atomic_uint64 fsyncs;
> + pg_atomic_uint64 writes;
> +} IOOps;
To me IOop and IOOps sound to much alike - even though they're really kind of
separate things. s/IOOps/IOOpCounters/ maybe?
> @@ -3152,6 +3156,14 @@ pgstat_shutdown_hook(int code, Datum arg)
> {
> Assert(!pgstat_is_shutdown);
>
> + /*
> + * Only need to send stats on IO Ops for IO Paths when a process exits.
> + * Users requiring IO Ops for both live and exited backends can read from
> + * live backends' PgBackendStatus and sum this with totals from exited
> + * backends persisted by the stats collector.
> + */
> + pgstat_send_buffers();
Perhaps something like this comment belongs somewhere at the top of the file,
or in the header, or ...? It's a fairly central design piece, and it's not
obvious one would need to look in the shutdown hook for it?
> +/*
> + * Before exiting, a backend sends its IO op statistics to the collector so
> + * that they may be persisted.
> + */
> +void
> +pgstat_send_buffers(void)
> +{
> + PgStat_MsgIOPathOps msg;
> +
> + PgBackendStatus *beentry = MyBEEntry;
> +
> + /*
> + * Though some backends with type B_INVALID (such as the single-user mode
> + * process) do initialize and increment IO operations stats, there is no
> + * spot in the array of IO operations for backends of type B_INVALID. As
> + * such, do not send these to the stats collector.
> + */
> + if (!beentry || beentry->st_backendType == B_INVALID)
> + return;
Why does single user mode use B_INVALID? That doesn't seem quite right.
> + memset(&msg, 0, sizeof(msg));
> + msg.backend_type = beentry->st_backendType;
> +
> + pgstat_sum_io_path_ops(msg.iop.io_path_ops,
> + (IOOps *) &beentry->io_path_stats);
> +
> + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS);
> + pgstat_send(&msg, sizeof(msg));
> +}
It seems worth having a path skipping sending the message if there was no IO?
> +/*
> + * Helper function to sum all live IO Op stats for all IO Paths (e.g. shared,
> + * local) to those in the equivalent stats structure for exited backends. Note
> + * that this adds and doesn't set, so the destination stats structure should be
> + * zeroed out by the caller initially. This would commonly be used to transfer
> + * all IO Op stats for all IO Paths for a particular backend type to the
> + * pgstats structure.
> + */
> +void
> +pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src)
> +{
> + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> + {
Sacriligeous, but I find io_path a harder to understand variable name for the
counter than i (or io_path_off or ...) ;)
> +static void
> +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len)
> +{
> + PgStatIOOps *src_io_path_ops;
> + PgStatIOOps *dest_io_path_ops;
> +
> + /*
> + * Subtract 1 from message's BackendType to get a valid index into the
> + * array of IO Ops which does not include an entry for B_INVALID
> + * BackendType.
> + */
> + Assert(msg->backend_type > B_INVALID);
Probably worth also asserting the upper boundary?
> From f972ea87270feaed464a74fb6541ac04b4fc7d98 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 24 Nov 2021 11:39:48 -0500
> Subject: [PATCH v17 4/7] Add "buffers" to pgstat_reset_shared_counters
>
> Backends count IO operations for various IO paths in their PgBackendStatus.
> Upon exit, they send these counts to the stats collector. Prior to this commit,
> these IO Ops stats would have been reset when the target was "bgwriter".
>
> With this commit, target "bgwriter" no longer will cause the IO operations
> stats to be reset, and the IO operations stats can be reset with new target,
> "buffers".
> ---
> doc/src/sgml/monitoring.sgml | 2 +-
> src/backend/postmaster/pgstat.c | 83 +++++++++++++++++++--
> src/backend/utils/activity/backend_status.c | 29 +++++++
> src/include/pgstat.h | 8 +-
> src/include/utils/backend_status.h | 2 +
> 5 files changed, 117 insertions(+), 7 deletions(-)
>
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 62f2a3332b..bda3eef309 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -3604,7 +3604,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
> <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
> </para>
> <para>
> - Time at which these statistics were last reset
> + Time at which these statistics were last reset.
> </para></entry>
> </row>
> </tbody>
Hm?
Shouldn't this new reset target be documented?
> +/*
> + * Helper function to collect and send live backends' current IO operations
> + * stats counters when a stats reset is initiated so that they may be deducted
> + * from future totals.
> + */
> +static void
> +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg)
> +{
> + PgStatIOPathOps ops[BACKEND_NUM_TYPES];
> +
> + memset(ops, 0, sizeof(ops));
> + pgstat_report_live_backend_io_path_ops(ops);
> +
> + /*
> + * Iterate through the array of IO Ops for all IO Paths for each
> + * BackendType. Because the array does not include a spot for BackendType
> + * B_INVALID, add 1 to the index when setting backend_type so that there is
> + * no confusion as to the BackendType with which this reset message
> + * corresponds.
> + */
> + for (int backend_type_idx = 0; backend_type_idx < BACKEND_NUM_TYPES; backend_type_idx++)
> + {
> + msg->m_backend_resets.backend_type = backend_type_idx + 1;
> + memcpy(&msg->m_backend_resets.iop, &ops[backend_type_idx],
> + sizeof(msg->m_backend_resets.iop));
> + pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter));
> + }
> +}
Probably worth explaining why multiple messages are sent?
> @@ -5583,10 +5621,45 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
> {
> if (msg->m_resettarget == RESET_BGWRITER)
> {
> - /* Reset the global, bgwriter and checkpointer statistics for the cluster. */
> - memset(&globalStats, 0, sizeof(globalStats));
> + /*
> + * Reset the global bgwriter and checkpointer statistics for the
> + * cluster.
> + */
> + memset(&globalStats.checkpointer, 0, sizeof(globalStats.checkpointer));
> + memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter));
> globalStats.bgwriter.stat_reset_timestamp = GetCurrentTimestamp();
> }
Oh, is this a live bug?
> + /*
> + * Subtract 1 from the BackendType to arrive at a valid index in the
> + * array, as it does not contain a spot for B_INVALID BackendType.
> + */
Instead of repeating a comment about +- 1 in a bunch of places, would it look
better to have two helper inline functions for this purpose?
> +/*
> +* When adding a new column to the pg_stat_buffers view, add a new enum
> +* value here above COLUMN_LENGTH.
> +*/
> +enum
> +{
> + COLUMN_BACKEND_TYPE,
> + COLUMN_IO_PATH,
> + COLUMN_ALLOCS,
> + COLUMN_EXTENDS,
> + COLUMN_FSYNCS,
> + COLUMN_WRITES,
> + COLUMN_RESET_TIME,
> + COLUMN_LENGTH,
> +};
COLUMN_LENGTH seems like a fairly generic name...
> From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 24 Nov 2021 12:20:10 -0500
> Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats
>
> Remove stats from pg_stat_bgwriter which are now more clearly expressed
> in pg_stat_buffers.
>
> TODO:
> - make pg_stat_checkpointer view and move relevant stats into it
> - add additional stats to pg_stat_bgwriter
When do you think it makes sense to tackle these wrt committing some of the
patches?
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 6926fc5742..67447f997a 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -2164,7 +2164,6 @@ BufferSync(int flags)
> if (SyncOneBuffer(buf_id, false, &wb_context) & BUF_WRITTEN)
> {
> TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
> - PendingCheckpointerStats.m_buf_written_checkpoints++;
> num_written++;
> }
> }
> @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context)
> */
> strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
>
> - /* Report buffer alloc counts to pgstat */
> - PendingBgWriterStats.m_buf_alloc += recent_alloc;
> -
> /*
> * If we're not running the LRU scan, just stop after doing the stats
> * stuff. We mark the saved state invalid so that we can recover sanely
> @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context)
> reusable_buffers++;
> }
>
> - PendingBgWriterStats.m_buf_written_clean += num_written;
> -
Isn't num_written unused now, unless tracepoints are enabled? I'd expect some
compilers to warn... Perhaps we should just remove information from the
tracepoint?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | John Naylor | 2021-12-09 19:28:18 | do only critical work during single-user vacuum? |
Previous Message | Andres Freund | 2021-12-09 18:41:50 | Re: port conflicts when running tests concurrently on windows. |