From: | "Euler Taveira" <euler(at)eulerto(dot)com> |
---|---|
To: | "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, "Euler Taveira" <euler(dot)taveira(at)enterprisedb(dot)com>, "Peter Eisentraut" <peter(at)eisentraut(dot)org> |
Subject: | Re: State of pg_createsubscriber |
Date: | 2024-06-18 07:10:51 |
Message-ID: | 5e81b128-495d-47e9-bb79-7039ccdfd2a3@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jun 18, 2024, at 12:59 AM, Amit Kapila wrote:
> On Mon, May 20, 2024 at 12:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Sun, May 19, 2024 at 11:20 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> > >
> > > On Sun, May 19, 2024, at 2:30 PM, Tom Lane wrote:
> > >
> > > I'm fairly disturbed about the readiness of pg_createsubscriber.
> > > The 040_pg_createsubscriber.pl TAP test is moderately unstable
> > > in the buildfarm [1], and there are various unaddressed complaints
> > > at the end of the patch thread (pretty much everything below [2]).
> > > I guess this is good enough to start beta with, but it's far from
> > > being good enough to ship, IMO. If there were active work going
> > > on to fix these things, I'd feel better, but neither the C code
> > > nor the test script have been touched since 1 April.
> > >
> > >
> > > My bad. :( I'll post patches soon to address all of the points.
> > >
> >
> > Just to summarize, apart from BF failures for which we had some
> > discussion, I could recall the following open points:
> >
> > 1. After promotion, the pre-existing replication objects should be
> > removed (either optionally or always), otherwise, it can lead to a new
> > subscriber not being able to restart or getting some unwarranted data.
> > [1][2].
> >
> > 2. Retaining synced slots on new subscribers can lead to unnecessary
> > WAL retention and dead rows [3].
> >
> > 3. We need to consider whether some of the messages displayed in
> > --dry-run mode are useful or not [4].
> >
>
> The recent commits should silence BF failures and resolve point number
> 2. But we haven't done anything yet for 1 and 3. For 3, we have a
> patch in email [1] (v3-0005-Avoid*) which can be reviewed and
> committed but point number 1 needs discussion. Apart from that
> somewhat related to point 1, Kuroda-San has raised a point in an email
> [2] for replication slots. Shlok has presented a case in this thread
> [3] where the problem due to point 1 can cause ERRORs or can cause
> data inconsistency.
I read v3-0005 and it seems to silence (almost) all "write" messages. Does it
intend to avoid the misinterpretation that the dry run mode is writing
something? It is dry run mode! If I run a tool in dry run mode, I expect it to
execute some verifications and print useful messages so I can evaluate if it is
ok to run it. Maybe it is not your expectation for dry run mode.
I think if it not clear, let's inform that it changes nothing in dry run mode.
pg_createsubscriber: no modifications are done
as a first message in dry run mode. I agree with you when you pointed out that
some messages are misleading.
pg_createsubscriber: hint: If pg_createsubscriber fails after this
point, you must recreate the physical replica before continuing.
Maybe for this one, we omit the fake information, like:
pg_createsubscriber: setting the replication progress on database "postgres"
I will post a patch to address the messages once we agree what needs to be
changed.
Regarding 3, publications and subscriptions are ok to remove. You are not
allowed to create them on standby, hence, all publications and subscriptions
are streamed from primary. However, I'm wondering if you want to remove the
publications. Replication slots on a standby server are "invalidated" despite
of the wal_status is saying "reserved" (I think it is an oversight in the
design that implements slot invalidation), however, required WAL files have
already been removed because of pg_resetwal (see modify_subscriber_sysid()).
The scenario is to promote a standby server, run pg_resetwal on it and check
pg_replication_slots.
Do you have any other scenarios in mind?
> Now, the easiest way out here is that we accept the issues with the
> pre-existing subscriptions and replication slots cases and just
> document them for now with the intention to work on those in the next
> version. OTOH, if there are no major challenges, we can try to
> implement a patch for them as well as see how it goes.
Agree.
--
Euler Taveira
EDB https://www.enterprisedb.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2024-06-18 07:18:28 | replace strtok() |
Previous Message | Anton A. Melnikov | 2024-06-18 06:57:11 | Maybe don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid? |