From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Design of pg_stat_subscription_workers vs pgstats |
Date: | 2022-01-27 22:35:57 |
Message-ID: | CAKFQuwYS_EUe+sR6MS3aiR9UXtUJfDcmHoDjrXAeDnY5w_9bnw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jan 27, 2022 at 2:15 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Another related thing is that using a 32bit xid for allowing skipping is a
> bad idea anyway - we shouldn't be adding new interfaces with xid wraparound
> dangers - it's getting more and more common to have multiple wraparounds a
> day. An easily better alternative would be the LSN at which a transaction
> starts.
>
>
Interesting idea. I do not think a well-designed skipping feature needs to
worry about wraparound, though. The XID to be skipped was just seen by a
worker, and because that transaction failed, it will remain the XID that
worker encounters until the problem is resolved. While the subscriber is
stuck there is effectively no progression in time, so wraparound cannot
happen. Since we want to skip the transaction as a whole, adding a layer of
hidden indirection to the process seems undesirable. I'm not against the
idea, though: to the user it is basically "copy this value from the error
message in order to skip the transaction that caused the error". The system
then verifies the value and ensures it skips one, and only one,
transaction.
> It's pretty easy from the POV of getting into a new transaction.
>
> PG_CATCH();
> {
>     /* get us out of the failed transaction */
>     AbortOutOfAnyTransaction();
>
>     StartTransactionCommand();
>     /* do something to remember the error we just got */
>     CommitTransactionCommand();
> }
>
Thank you.
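PostgreSQL's `PG_TRY()`/`PG_CATCH()` macros are built on `sigsetjmp()`/`siglongjmp()`. The standalone sketch below imitates the quoted pattern with plain `setjmp()`/`longjmp()` so it compiles outside the server: an apply step fails, the catch branch abandons the failed transaction, and a fresh transaction records the error. The transaction functions, `raise_error()` and `ErrorRecord` are hypothetical stand-ins, not the real `AbortOutOfAnyTransaction()` and friends.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for transaction state and the persisted error. */
static bool in_transaction = false;
static bool in_failed_transaction = false;

typedef struct ErrorRecord
{
    bool has_error;
    char message[128];
} ErrorRecord;

static ErrorRecord stored_error;       /* what "remember the error" writes */
static char pending_message[128];      /* error text carried across longjmp */
static jmp_buf catch_point;            /* plays the role of the exception stack */

static void start_transaction(void) { in_transaction = true; }

static void commit_transaction(void)
{
    assert(in_transaction && !in_failed_transaction);
    in_transaction = false;
}

static void abort_out_of_any_transaction(void)
{
    in_transaction = false;
    in_failed_transaction = false;
}

/* Simulated ereport(ERROR): mark the transaction failed and jump to the catch. */
static void raise_error(const char *msg)
{
    in_failed_transaction = true;
    snprintf(pending_message, sizeof pending_message, "%s", msg);
    longjmp(catch_point, 1);
}

/* Applies one remote transaction; returns false if it failed. */
static bool apply_one(bool should_fail)
{
    if (setjmp(catch_point) == 0)      /* "PG_TRY()" */
    {
        start_transaction();
        if (should_fail)
            raise_error("duplicate key value violates unique constraint");
        commit_transaction();
        return true;
    }
    else                               /* "PG_CATCH()" */
    {
        /* get us out of the failed transaction */
        abort_out_of_any_transaction();

        /* remember the error in a new, healthy transaction */
        start_transaction();
        stored_error.has_error = true;
        snprintf(stored_error.message, sizeof stored_error.message, "%s",
                 pending_message);
        commit_transaction();
        return false;
    }
}
```

The error text is copied out before the abort because, in real server code, the error data would typically need to be saved (for example with `CopyErrorData()`) before aborting the transaction discards the memory it lives in.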
> It may be a bit harder afterwards to not just error out the whole worker,
> because we'd need to know what to do instead.
>
>
I imagine the launcher and worker startup code can be made to handle the
restart adequately: just wait if the last thing seen was an error, and
require the user to resume the worker manually, unless we really think a
retry-until-success approach with backoff is superior. Upon system restart,
all error information is cleared, we start from scratch, and we let the
errors recur (or not, as the case may be).
David J.