Re: Track in pg_replication_slots the reason why slots conflict?

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Track in pg_replication_slots the reason why slots conflict?
Date: 2023-12-21 12:08:35
Message-ID: CAJpy0uC6aR90bGG=UMvHctZeRdxRWfjC21p=vsY1qm28t4hS8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 21, 2023 at 5:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-12-21 16:08:48 +0530, shveta malik wrote:
> > On Thu, Dec 21, 2023 at 3:10 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > Hi,
> > >
> > > On 2023-12-21 09:21:04 +0900, Michael Paquier wrote:
> > > > While listening at Bertrand's talk about logical decoding on standbys
> > > > last week at Prague, I got surprised by the fact that we do not
> > > > reflect in the catalogs the reason why a conflict happened for a slot.
> > > > There are three of them depending on ReplicationSlotInvalidationCause:
> > > > - WAL removed.
> > > > - Invalid horizon.
> > > > - Insufficient WAL level.
> > >
> > > It should be extremely rare to hit any of these other than "WAL removed", so
> > > I'm not sure it's worth adding interface complexity to show them.
> > >
> > >
> > > > ReplicationSlotCtl holds this information, so couldn't it be useful
> > > > for monitoring purposes to know why a slot got invalidated and add a
> > > > column to pg_get_replication_slots()? This could just be an extra
> > > > text conflicting_reason, defaulting to NULL when there's nothing to
> > > > see.
> > >
> > > Extra columns aren't free from a usability perspective. IFF we do something, I
> > > think it should be a single column with a cause.
> >
> > Thanks for the feedback. But do you mean that we replace existing
> > 'conflicting' column with 'cause' in both the function and view
> > (pg_get_replication_slots() and pg_replication_slots)? Or do you mean
> > that we expose 'cause' from pg_get_replication_slots() and use that to
> > display 'conflicting' in pg_replication_slots view?
>
> I'm not entirely sure I understand the difference - just whether we add one
> new column or replace the existing 'conflicting' column? I can see arguments
> for either. A conflicting column where NULL indicates no conflict, and other
> values indicate the reason for the conflict, doesn't seem too bad.
>
>
> > And if we plan to return/display cause from either function or view,
> > then shall it be enum 'ReplicationSlotInvalidationCause' or
> > description/text corresponding to enum?
>
> We clearly can't just expose the numerical value for a C enum. So it has to be
> converted to something SQL representable.
>
>
> > In the other feature being discussed "Synchronize slots from primary
> > to standby" [1] , there is a requirement to replicate invalidation
> > cause of slot from the primary to standby and thus it is needed in
> > enum form there. And thus there was a suggestion earlier to have the
> > function return enum-value and let the view display it as
> > text/description to the user. So kindly let us know your thoughts.
> >
> > [1] - https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
>
> Can you point me to a more specific message for that requirement? It seems
> pretty odd to me. Your link goes to the top of a 400 message thread, I don't
> have time to find one specific design point in that...

It is currently implemented there as a new function
'pg_get_slot_invalidation_cause()' without changing existing view
pg_replication_slots. (See 2.1 in [1] where it was introduced).

Then it was suggested in [2] to fork a new thread as it makes sense to
have it independent of this slot-synchronization feature.

The new thread forked is [3]. In that thread, the issues in having a
new function pg_get_slot_invalidation_cause() are discussed and also
we came to know about this very thread that started the next day.

[1]: https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1K0KCDNtpDyUKucMRdyK-5KdrCRWakCpHEdHT9muAiEOw%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6N%3DanX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com

thanks
Shveta

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2023-12-21 12:26:29 Re: Detecting some cases of missing backup_label
Previous Message Robert Haas 2023-12-21 12:07:03 Re: trying again to get incremental backup