Re: Design of pg_stat_subscription_workers vs pgstats

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Date: 2022-01-27 22:35:57
Message-ID: CAKFQuwYS_EUe+sR6MS3aiR9UXtUJfDcmHoDjrXAeDnY5w_9bnw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 27, 2022 at 2:15 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Another related thing is that using a 32bit xid for allowing skipping is a
> bad idea anyway - we shouldn't be adding new interfaces with xid wraparound
> dangers - it's getting more and more common to have multiple wraparounds a
> day. An easily better alternative would be the LSN at which a transaction
> starts.
>

Interesting idea. I do not think a well-designed skipping feature needs to
worry about wraparound, though. The XID to be skipped was just seen by a
worker, and because it failed, that worker will keep encountering the same
XID until the problem is resolved. While the subscriber is stuck there is no
effective progression in time during which wraparound could happen. Since we
want to skip the transaction as a whole, adding a layer of hidden
indirection to the process seems undesirable. I'm not against the idea
though - to the user it is basically "copy this value from the error
message in order to skip the transaction that caused the error". The system
then verifies the value and ensures it skips one, and only one,
transaction.

> It's pretty easy from the POV of getting into a new transaction.
>
> PG_CATCH():
>
> /* get us out of the failed transaction */
> AbortOutOfAnyTransaction();
>
> StartTransactionCommand();
> /* do something to remember the error we just got */
> CommitTransactionCommand();
>

Thank you.

> It may be a bit harder afterwards to not just error out the whole
> worker, because we'd need to know what to do instead.
>

I imagine the launcher and worker startup code can be made to deal with the
restart adequately. Just wait if the last thing seen was an error, and
require the user to manually resume the worker - unless we really think
a try-until-you-succeed with a backoff protocol is superior. Upon system
restart all error information is cleared and we start from scratch and let
the errors happen (or not, depending) as they will.
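
Roughly, the two restart options could be sketched as below. This is only
an illustration in C under my own assumptions: WorkerStatus, RestartPolicy,
next_launch_delay_ms and reset_on_system_restart are hypothetical names,
and the 1 second base delay and 60 second cap are arbitrary numbers, not
anything the launcher does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical restart policies; not existing PostgreSQL settings. */
typedef enum RestartPolicy
{
    RESTART_MANUAL,             /* wait until the user resumes the worker */
    RESTART_BACKOFF             /* retry automatically with growing delays */
} RestartPolicy;

/* Hypothetical error bookkeeping kept for one apply worker. */
typedef struct WorkerStatus
{
    bool        last_was_error;     /* last thing seen was an error */
    bool        user_resumed;       /* user asked to resume after the error */
    int         consecutive_errors; /* failures in a row */
} WorkerStatus;

/*
 * Decide how long the launcher should wait before starting the worker.
 * Returns a delay in milliseconds, or -1 for "do not start it yet".
 */
static long
next_launch_delay_ms(const WorkerStatus *ws, RestartPolicy policy)
{
    long        delay = 1000;   /* assumed base delay: 1 second */
    int         i;

    assert(ws != NULL);

    if (!ws->last_was_error)
        return 0;               /* nothing went wrong, start immediately */

    if (policy == RESTART_MANUAL)
        return ws->user_resumed ? 0 : -1;

    /* Exponential backoff: 1s, 2s, 4s, ... capped at an assumed 60s. */
    for (i = 1; i < ws->consecutive_errors && delay < 60000; i++)
        delay *= 2;
    return delay > 60000 ? 60000 : delay;
}

/* On system restart all error information is cleared. */
static void
reset_on_system_restart(WorkerStatus *ws)
{
    assert(ws != NULL);
    ws->last_was_error = false;
    ws->user_resumed = false;
    ws->consecutive_errors = 0;
}
```

With RESTART_MANUAL the worker simply stays down after an error until
someone resumes it; RESTART_BACKOFF is the try-until-you-succeed variant.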

David J.
