From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Justin Pryzby <pryzby(at)telsasoft(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Smith <smithpb2250(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: pg_upgrade and logical replication |
Date: | 2024-02-13 20:20:08 |
Message-ID: | CALDaNm3ac77+U0qBva8zjNJj1SUek321yyRKbM_0+rYOng_XVQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 14 Feb 2024 at 09:07, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Justin,
>
> > pg_upgrade/t/004_subscription.pl says
> >
> > |my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
> >
> > ..but I think maybe it should not.
> >
> > When you try to use --link, it fails:
> > https://cirrus-ci.com/task/4669494061170688
> >
> > |Adding ".old" suffix to old global/pg_control ok
> > |
> > |If you want to start the old cluster, you will need to remove
> > |the ".old" suffix from
> > /tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_su
> > bscription_old_sub_data/pgdata/global/pg_control.old.
> > |Because "link" mode was used, the old cluster cannot be safely
> > |started once the new cluster has been started.
> > |...
> > |
> > |postgres: could not find the database system
> > |Expected to find it in the directory
> > "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> > ubscription_old_sub_data/pgdata",
> > |but could not open file
> > "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> > ubscription_old_sub_data/pgdata/global/pg_control": No such file or directory
> > |# No postmaster PID for node "old_sub"
> > |[19:36:01.396](0.250s) Bail out! pg_ctl start failed
> >
>
> Good catch! The primal reason of the failure is to reuse the old cluster, even after
> the successful upgrade. The documentation said [1]:
>
> >
> If you use link mode, the upgrade will be much faster (no file copying) and use less
> disk space, but you will not be able to access your old cluster once you start the new
> cluster after the upgrade.
> >
>
> > You could rename pg_control.old to avoid that immediate error, but that doesn't
> > address the essential issue that "the old cluster cannot be safely started once
> > the new cluster has been started."
>
> Yeah, I agreed that it should be avoided to access to the old cluster after the upgrade.
> IIUC, pg_upgrade would be run third times in 004_subscription.
>
> 1. successful upgrade
> 2. failure due to the insufficient max_replication_slot
> 3. failure because the pg_subscription_rel has 'd' state
>
> And old instance is reused in all of runs. Therefore, the most reasonable fix is to
> change the ordering of tests, i.e., "successful upgrade" should be done at last.
>
> Attached patch modified the test accordingly. Also, it contains some optimizations.
Your proposal to change the tests in the following order: a) failure
due to the insufficient max_replication_slot b) failure because the
pg_subscription_rel has 'd' state c) successful upgrade. looks good to
me.
I have also verified that your changes fixes the issue as the
successful upgrade is moved to the end and the old cluster is no
longer used after upgrade.
One minor suggestion:
There is an extra line break here, this can be removed:
@@ -181,139 +310,5 @@ is($result, qq(1),
"check the data is synced after enabling the subscription for
the table that was in init state"
);
-# cleanup
Regards,
Vignesh
From | Date | Subject | |
---|---|---|---|
Next Message | Joseph Koshakow | 2024-02-13 20:28:08 | Re: Fix overflow hazard in interval rounding |
Previous Message | Peter Geoghegan | 2024-02-13 19:54:14 | Re: index prefetching |