From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Danil Anisimow <anisimow(dot)d(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Comments on Custom RMGRs |
Date: | 2024-07-23 13:21:53 |
Message-ID: | 3f732849-0f60-4a64-be84-ee41e239d560@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 27/05/2024 21:20, Michael Paquier wrote:
> On Fri, May 17, 2024 at 04:25:15PM -0400, Robert Haas wrote:
>> On Fri, May 17, 2024 at 4:20 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>> Regarding this particular change: the checkpointing hook seems more
>>> like a table AM feature, so I agree with you that we should have a good
>>> idea how a real table AM might use this, rather than only
>>> pg_stat_statements.
>>
>> I would even be OK with a pg_stat_statements example that is fully
>> working and fully explained. I just don't want to have no example at
>> all. The original proposal has been changed twice because of
>> complaints that the hook wasn't quite useful enough, but I think that
>> only proves that v3 is closer to being useful than v1. If v1 is 40% of
>> the way to useful and v3 is 120% of the way to useful, wonderful! But
>> if v1 is 20% of the way to being useful and v3 is 60% of the way to
>> being useful, it's not time to commit anything yet. I don't know which
>> is the case, and I think if someone wants this to be committed, they
>> need to explain clearly why it's the first and not the second.
>
> Please note that I've been studying ways to have pg_stat_statements
> being plugged in directly with the shared pgstat APIs to get it backed
> by a dshash to give more flexibility and scaling, giving a way for
> extensions to register their own stats kind. In this case, the flush
> of the stats would be controlled with a callback in the stats
> registered by the extensions, conflicting with what's proposed here.
> pg_stat_statements is all about stats, at the end. I don't want this
> argument to act as a barrier if a checkpoint hook is an accepted
> consensus here, but a checkpoint hook used for this code path is not
> the most intuitive solution I can think of in the long-term.
On the topic of concrete uses for this API: We have a bunch of built-in
resource managers that could be refactored to use this API.
CheckPointGuts currently looks like this:
> /*
> * Flush all data in shared memory to disk, and fsync
> *
> * This is the common code shared between regular checkpoints and
> * recovery restartpoints.
> */
> static void
> CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
> {
> CheckPointRelationMap();
> CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
> CheckPointSnapBuild();
> CheckPointLogicalRewriteHeap();
> CheckPointReplicationOrigin();
>
> /* Write out all dirty data in SLRUs and the main buffer pool */
> TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
> CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
> CheckPointCLOG();
> CheckPointCommitTs();
> CheckPointSUBTRANS();
> CheckPointMultiXact();
> CheckPointPredicate();
>
> RmgrCheckpoint(flags, RMGR_CHECKPOINT_BEFORE_BUFFERS);
>
> CheckPointBuffers(flags);
>
> RmgrCheckpoint(flags, RMGR_CHECKPOINT_AFTER_BUFFERS);
>
> /* Perform all queued up fsyncs */
> TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
> CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
> ProcessSyncRequests();
> CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
> TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
>
> /* We deliberately delay 2PC checkpointing as long as possible */
> CheckPointTwoPhase(checkPointRedo);
> }
Of these calls, CheckPointCLOG would be natural as the rmgr_callback of
the clog rmgr. Similarly for CheckPointMultiXact and maybe a few others.
However, let's look at the pg_stat_statements patch (pgss_001.v1.patch):
It's now writing a new WAL record for every call to pgss_store(),
turning even simple queries into WAL-logged operations. That's a
non-starter. It will also not work on a standby. This needs to be
redesigned so that the data is updated in memory, and written to disk
and/or WAL-logged only periodically. Perhaps at checkpoints, but you
could do it more frequently too.
I'm not convinced that the stats should be WAL-logged. Do you want them
to be replicated and included in backups? Maybe, but maybe not. It's
certainly a change to how it currently works.
If we don't WAL-log the stats, we don't really need a custom RMGR for
it. We just need a checkpoint hook to flush the stats to disk, but we
don't need a registered RMGR ID for it.
So, I got a feeling that adding this to the rmgr interface is not quite
right. The rmgr callbacks are for things that run when WAL is
*replayed*, while checkpoints are related to how WAL is generated. Let's
design this as an independent hook, separate from rmgrs.
Another data point: In Neon, we actually had to add a little code to
checkpoints, to WAL-log some exta data. That was a quick hack and might
not be the right design in the first place, but these hooks would not
have been useful for our purposes. We wanted to write a new WAL record
at shutdown, and in CheckPointGuts(), it's already too late for that. It
needs to be done earlier, before starting to the shutdown checkpoint.
Similar to LogStandbySnapshot(), except that LogStandbySnapshot() is not
called at shutdown like we wanted to. For a table AM, the point of a
checkpoint hook is probably to fsync() data that is managed outside of
the normal buffer manager and CheckPointGuts() is the right place for
that, but other extensions might want to hook into checkpoints for other
reasons.
--
Heikki Linnakangas
Neon (https://neon.tech)
From | Date | Subject | |
---|---|---|---|
Next Message | Andrei Borodin | 2024-07-23 13:21:55 | Re: [PATCH] GROUP BY ALL |
Previous Message | Andrew Dunstan | 2024-07-23 13:07:19 | Re: xid_wraparound tests intermittent failure. |