Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Euler Taveira <euler(at)eulerto(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Date: 2023-11-13 14:17:40
Message-ID: CALj2ACWryGZ23dRZk3SMxMLgntOwM3fq+QD3Rbqr+-VOgfbTrw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 13, 2023 at 5:43 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, Nov 6, 2023 at 9:09 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Sun, Nov 5, 2023 at 4:01 AM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Nov 2, 2023 at 7:19 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > > But that's not quite compatible with what Alvaro [2] had written long
> > > > back ("... the only acquisitions that would log messages are those in
> > > > StartReplication and StartLogicalReplication.").
> > > >
> > > > In other words, ReplicationSlotAcquire/ReplicationSlotRelease is
> > > > called by more places than you care to log from.
> > >
> > > I refreshed my thoughts for this patch and I think it's enough if
> > > walsenders alone log messages when slots become active and inactive.
> > > How about something like the attached v11 patch?
> > >
> >
> > + * This function is currently used only in walsender.
> > + */
> > +void
> > +ReplicationSlotAcquireAndLog(const char *name, bool nowait)
> >
> > BTW, is the reason for using it only in walsender is that it is a
> > background process and it is not very apparent whether the slot is
> > created, and for foreground processes, it is a bit clear when the
> > command is executed.
> >
> > Can you please tell me the use case of this additional message?
>
> Replication slot acquisitions and releases by backends say when
> running pg_replication_slot_advance or pg_logical_slot_get_changes or
> pg_drop_replication_slot or pg_create_{physical,
> logical}_replication_slot are transient unlike walsenders which
> comparatively hold slots for longer durations. Therefore, I've added
> them only for walsenders. These messages help to know the lifetime of
> a replication slot - one can know how long a streaming standby or
> logical subscriber is down, IOW, how long a replication slot is
> inactive in production. For instance, the time between released and
> acquired slots in the below messages is the inactive replication slot
> duration.
>
> 2023-11-13 11:06:34.338 UTC [470262] LOG: acquired physical
> replication slot "sb_repl_slot"
> 2023-11-13 11:06:34.338 UTC [470262] STATEMENT: START_REPLICATION
> SLOT "sb_repl_slot" 0/3000000 TIMELINE 1
> 2023-11-13 11:09:24.918 UTC [470262] LOG: released physical
> replication slot "sb_repl_slot"
> 2023-11-13 12:01:40.530 UTC [470967] LOG: acquired physical
> replication slot "sb_repl_slot"
> 2023-11-13 12:01:40.530 UTC [470967] STATEMENT: START_REPLICATION
> SLOT "sb_repl_slot" 0/3000000 TIMELINE 1
>
> > If so, the other alternative is to either use a
> > parameter to the existing function or directly use am_walsender flag
> > to distinguish when to print the message in acquire/release calls.
>
> Done that way. PSA v12.
>
> > A few other minor comments:
> > 1.
> > + Causes each replication command and related activity to be logged in
> > + the server log.
> >
> > Can we be bit more specific by changing to something like: "Causes
> > each replication command and slot acquisition/release to be logged in
> > the server log."
>
> Done.
>
> > 2.
> > + ereport(log_replication_commands ? LOG : DEBUG1,
> > + (errmsg("walsender process with PID %d acquired %s replication slot \"%s\"",
> >
> > It seems PID and process name is quite unlike what we print in other
> > similar messages. For example, see below messages when we start
> > replication via pub/sub :
> >
> > We can get the PID from the log line as for other logs and I don't see
> > the process name printed anywhere else.
>
> There was a comment upthread to have PID printed, but I agree to be
> consistent and changed the messages to be: acquired physical/logical
> replication slot "foo" and released physical/logical replication slot
> "foo".
>
> PSA v12 patch.

Compiler isn't happy with v12
https://cirrus-ci.com/task/5543061376204800?logs=gcc_warning#L405. PSA
v13 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v13-0001-Log-messages-for-replication-slot-acquisition-an.patch application/octet-stream 3.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kumar, Sachin 2023-11-13 15:06:31 Re: pg_upgrade failing for 200+ million Large Objects
Previous Message Zhijie Hou (Fujitsu) 2023-11-13 13:57:28 RE: Synchronizing slots from primary to standby