| From: | Andres Freund <andres(at)anarazel(dot)de> | 
|---|---|
| To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> | 
| Cc: | Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Lukas Fittl <lukas(at)fittl(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> | 
| Subject: | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) | 
| Date: | 2021-12-09 19:17:52 | 
| Message-ID: | 20211209191752.dupgg4omrc7gck2l@alap3.anarazel.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi,
On 2021-12-03 15:02:24 -0500, Melanie Plageman wrote:
> From e0f7f3dfd60a68fa01f3c023bcdb69305ade3738 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Mon, 11 Oct 2021 16:15:06 -0400
> Subject: [PATCH v17 1/7] Read-only atomic backend write function
> 
> For counters in shared memory which can be read by any backend but only
> written to by one backend, an atomic is still needed to protect against
> torn values, however, pg_atomic_fetch_add_u64() is overkill for
> incrementing the counter. pg_atomic_inc_counter() is a helper function
> which can be used to increment these values safely but without
> unnecessary overhead.
>
> Author: Thomas Munro
> ---
>  src/include/port/atomics.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
> index 856338f161..39ffff24dd 100644
> --- a/src/include/port/atomics.h
> +++ b/src/include/port/atomics.h
> @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_)
>  	return pg_atomic_sub_fetch_u64_impl(ptr, sub_);
>  }
>  
> +/*
> + * On modern systems this is really just *counter++.  On some older systems
> + * there might be more to it, due to inability to read and write 64 bit values
> + * atomically.
> + */
> +static inline void
> +pg_atomic_inc_counter(pg_atomic_uint64 *counter)
> +{
> +	pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1);
> +}
I wonder if it's worth putting something in the name indicating that this is
not actual atomic RMW operation. Perhaps adding _unlocked?
> From b0e193cfa08f0b8cf1be929f26fe38f06a39aeae Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 24 Nov 2021 10:32:56 -0500
> Subject: [PATCH v17 2/7] Add IO operation counters to PgBackendStatus
> 
> Add an array of counters in PgBackendStatus which count the buffers
> allocated, extended, fsynced, and written by a given backend. Each "IO
> Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct,
> local, shared, or strategy). "local" and "shared" IO Path counters count
> operations on local and shared buffers. The "strategy" IO Path counts
> buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy.
> The "direct" IO Path counts blocks of IO which are read, written, or
> fsync'd using smgrwrite/extend/immedsync directly (as opposed to through
> [Local]BufferAlloc()).
> 
> With this commit, all backends increment a counter in their
> PgBackendStatus when performing an IO operation. This is in preparation
> for future commits which will persist these stats upon backend exit and
> use the counters to provide observability of database IO operations.
> 
> Note that this commit does not add code to increment the "direct" path.
> A separate proposed patch [1] which would add wrappers for smgrwrite(),
> smgrextend(), and smgrimmedsync() would provide a good location to call
> pgstat_inc_ioop() for unbuffered IO and avoid regressions for future
> users of these functions.
>
> [1] https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com
On longer thread it's nice for committers to already have Reviewed-By: in the
commit message.
> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
> index 7229598822..413cc605f8 100644
> --- a/src/backend/utils/activity/backend_status.c
> +++ b/src/backend/utils/activity/backend_status.c
> @@ -399,6 +399,15 @@ pgstat_bestart(void)
>  	lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
>  	lbeentry.st_progress_command_target = InvalidOid;
>  	lbeentry.st_query_id = UINT64CONST(0);
> +	for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> +	{
> +		IOOps	   *io_ops = &lbeentry.io_path_stats[io_path];
> +
> +		pg_atomic_init_u64(&io_ops->allocs, 0);
> +		pg_atomic_init_u64(&io_ops->extends, 0);
> +		pg_atomic_init_u64(&io_ops->fsyncs, 0);
> +		pg_atomic_init_u64(&io_ops->writes, 0);
> +	}
>  
>  	/*
>  	 * we don't zero st_progress_param here to save cycles; nobody should
nit: I think we nearly always have a blank line before loops
> diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
> index 646126edee..93f1b4bcfc 100644
> --- a/src/backend/utils/init/postinit.c
> +++ b/src/backend/utils/init/postinit.c
> @@ -623,6 +623,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
>  		RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
>  	}
>  
> +	pgstat_beinit();
>  	/*
>  	 * Initialize local process's access to XLOG.
>  	 */
nit: same with multi-line comments.
> @@ -649,6 +650,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
>  		 */
>  		CreateAuxProcessResourceOwner();
>  
> +		pgstat_bestart();
>  		StartupXLOG();
>  		/* Release (and warn about) any buffer pins leaked in StartupXLOG */
>  		ReleaseAuxProcessResources(true);
> @@ -676,7 +678,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
>  	EnablePortalManager();
>  
>  	/* Initialize status reporting */
> -	pgstat_beinit();
I'd like to see changes like moving this kind of thing around broken around
and committed separately. It's much easier to pinpoint breakage if the CF
breaks after moving just pgstat_beinit() around, rather than when committing
this considerably larger patch. And reordering subsystem initialization has
the habit of causing problems...
> +/* ----------
> + * IO Stats reporting utility types
> + * ----------
> + */
> +
> +typedef enum IOOp
> +{
> +	IOOP_ALLOC,
> +	IOOP_EXTEND,
> +	IOOP_FSYNC,
> +	IOOP_WRITE,
> +} IOOp;
> [...]
> +/*
> + * Structure for counting all types of IOOps for a live backend.
> + */
> +typedef struct IOOps
> +{
> +	pg_atomic_uint64 allocs;
> +	pg_atomic_uint64 extends;
> +	pg_atomic_uint64 fsyncs;
> +	pg_atomic_uint64 writes;
> +} IOOps;
To me IOop and IOOps sound to much alike - even though they're really kind of
separate things. s/IOOps/IOOpCounters/ maybe?
> @@ -3152,6 +3156,14 @@ pgstat_shutdown_hook(int code, Datum arg)
>  {
>  	Assert(!pgstat_is_shutdown);
>  
> +	/*
> +	 * Only need to send stats on IO Ops for IO Paths when a process exits.
> +	 * Users requiring IO Ops for both live and exited backends can read from
> +	 * live backends' PgBackendStatus and sum this with totals from exited
> +	 * backends persisted by the stats collector.
> +	 */
> +	pgstat_send_buffers();
Perhaps something like this comment belongs somewhere at the top of the file,
or in the header, or ...? It's a fairly central design piece, and it's not
obvious one would need to look in the shutdown hook for it?
> +/*
> + * Before exiting, a backend sends its IO op statistics to the collector so
> + * that they may be persisted.
> + */
> +void
> +pgstat_send_buffers(void)
> +{
> +	PgStat_MsgIOPathOps msg;
> +
> +	PgBackendStatus *beentry = MyBEEntry;
> +
> +	/*
> +	 * Though some backends with type B_INVALID (such as the single-user mode
> +	 * process) do initialize and increment IO operations stats, there is no
> +	 * spot in the array of IO operations for backends of type B_INVALID. As
> +	 * such, do not send these to the stats collector.
> +	 */
> +	if (!beentry || beentry->st_backendType == B_INVALID)
> +		return;
Why does single user mode use B_INVALID? That doesn't seem quite right.
> +	memset(&msg, 0, sizeof(msg));
> +	msg.backend_type = beentry->st_backendType;
> +
> +	pgstat_sum_io_path_ops(msg.iop.io_path_ops,
> +						   (IOOps *) &beentry->io_path_stats);
> +
> +	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS);
> +	pgstat_send(&msg, sizeof(msg));
> +}
It seems worth having a path skipping sending the message if there was no IO?
> +/*
> + * Helper function to sum all live IO Op stats for all IO Paths (e.g. shared,
> + * local) to those in the equivalent stats structure for exited backends. Note
> + * that this adds and doesn't set, so the destination stats structure should be
> + * zeroed out by the caller initially. This would commonly be used to transfer
> + * all IO Op stats for all IO Paths for a particular backend type to the
> + * pgstats structure.
> + */
> +void
> +pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src)
> +{
> +	for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> +	{
Sacriligeous, but I find io_path a harder to understand variable name for the
counter than i (or io_path_off or ...) ;)
> +static void
> +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len)
> +{
> +	PgStatIOOps *src_io_path_ops;
> +	PgStatIOOps *dest_io_path_ops;
> +
> +	/*
> +	 * Subtract 1 from message's BackendType to get a valid index into the
> +	 * array of IO Ops which does not include an entry for B_INVALID
> +	 * BackendType.
> +	 */
> +	Assert(msg->backend_type > B_INVALID);
Probably worth also asserting the upper boundary?
> From f972ea87270feaed464a74fb6541ac04b4fc7d98 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 24 Nov 2021 11:39:48 -0500
> Subject: [PATCH v17 4/7] Add "buffers" to pgstat_reset_shared_counters
> 
> Backends count IO operations for various IO paths in their PgBackendStatus.
> Upon exit, they send these counts to the stats collector. Prior to this commit,
> these IO Ops stats would have been reset when the target was "bgwriter".
> 
> With this commit, target "bgwriter" no longer will cause the IO operations
> stats to be reset, and the IO operations stats can be reset with new target,
> "buffers".
> ---
>  doc/src/sgml/monitoring.sgml                |  2 +-
>  src/backend/postmaster/pgstat.c             | 83 +++++++++++++++++++--
>  src/backend/utils/activity/backend_status.c | 29 +++++++
>  src/include/pgstat.h                        |  8 +-
>  src/include/utils/backend_status.h          |  2 +
>  5 files changed, 117 insertions(+), 7 deletions(-)
> 
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 62f2a3332b..bda3eef309 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -3604,7 +3604,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
>         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
>        </para>
>        <para>
> -       Time at which these statistics were last reset
> +       Time at which these statistics were last reset.
>        </para></entry>
>       </row>
>      </tbody>
Hm?
Shouldn't this new reset target be documented?
> +/*
> + * Helper function to collect and send live backends' current IO operations
> + * stats counters when a stats reset is initiated so that they may be deducted
> + * from future totals.
> + */
> +static void
> +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg)
> +{
> +	PgStatIOPathOps ops[BACKEND_NUM_TYPES];
> +
> +	memset(ops, 0, sizeof(ops));
> +	pgstat_report_live_backend_io_path_ops(ops);
> +
> +	/*
> +	 * Iterate through the array of IO Ops for all IO Paths for each
> +	 * BackendType. Because the array does not include a spot for BackendType
> +	 * B_INVALID, add 1 to the index when setting backend_type so that there is
> +	 * no confusion as to the BackendType with which this reset message
> +	 * corresponds.
> +	 */
> +	for (int backend_type_idx = 0; backend_type_idx < BACKEND_NUM_TYPES; backend_type_idx++)
> +	{
> +		msg->m_backend_resets.backend_type = backend_type_idx + 1;
> +		memcpy(&msg->m_backend_resets.iop, &ops[backend_type_idx],
> +				sizeof(msg->m_backend_resets.iop));
> +		pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter));
> +	}
> +}
Probably worth explaining why multiple messages are sent?
> @@ -5583,10 +5621,45 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
>  {
>  	if (msg->m_resettarget == RESET_BGWRITER)
>  	{
> -		/* Reset the global, bgwriter and checkpointer statistics for the cluster. */
> -		memset(&globalStats, 0, sizeof(globalStats));
> +		/*
> +		 * Reset the global bgwriter and checkpointer statistics for the
> +		 * cluster.
> +		 */
> +		memset(&globalStats.checkpointer, 0, sizeof(globalStats.checkpointer));
> +		memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter));
>  		globalStats.bgwriter.stat_reset_timestamp = GetCurrentTimestamp();
>  	}
Oh, is this a live bug?
> +		/*
> +		 * Subtract 1 from the BackendType to arrive at a valid index in the
> +		 * array, as it does not contain a spot for B_INVALID BackendType.
> +		 */
Instead of repeating a comment about +- 1 in a bunch of places, would it look
better to have two helper inline functions for this purpose?
> +/*
> +* When adding a new column to the pg_stat_buffers view, add a new enum
> +* value here above COLUMN_LENGTH.
> +*/
> +enum
> +{
> +	COLUMN_BACKEND_TYPE,
> +	COLUMN_IO_PATH,
> +	COLUMN_ALLOCS,
> +	COLUMN_EXTENDS,
> +	COLUMN_FSYNCS,
> +	COLUMN_WRITES,
> +	COLUMN_RESET_TIME,
> +	COLUMN_LENGTH,
> +};
COLUMN_LENGTH seems like a fairly generic name...
> From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Wed, 24 Nov 2021 12:20:10 -0500
> Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats
> 
> Remove stats from pg_stat_bgwriter which are now more clearly expressed
> in pg_stat_buffers.
> 
> TODO:
> - make pg_stat_checkpointer view and move relevant stats into it
> - add additional stats to pg_stat_bgwriter
When do you think it makes sense to tackle these wrt committing some of the
patches?
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 6926fc5742..67447f997a 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -2164,7 +2164,6 @@ BufferSync(int flags)
>  			if (SyncOneBuffer(buf_id, false, &wb_context) & BUF_WRITTEN)
>  			{
>  				TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
> -				PendingCheckpointerStats.m_buf_written_checkpoints++;
>  				num_written++;
>  			}
>  		}
> @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context)
>  	 */
>  	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
>  
> -	/* Report buffer alloc counts to pgstat */
> -	PendingBgWriterStats.m_buf_alloc += recent_alloc;
> -
>  	/*
>  	 * If we're not running the LRU scan, just stop after doing the stats
>  	 * stuff.  We mark the saved state invalid so that we can recover sanely
> @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context)
>  			reusable_buffers++;
>  	}
>  
> -	PendingBgWriterStats.m_buf_written_clean += num_written;
> -
Isn't num_written unused now, unless tracepoints are enabled? I'd expect some
compilers to warn... Perhaps we should just remove information from the
tracepoint?
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | John Naylor | 2021-12-09 19:28:18 | do only critical work during single-user vacuum? | 
| Previous Message | Andres Freund | 2021-12-09 18:41:50 | Re: port conflicts when running tests concurrently on windows. |