From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Subject: | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Date: | 2023-11-29 09:26:26 |
Message-ID: | OS3PR01MB9882FED1F0060468FB01B9DAF583A@OS3PR01MB9882.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Dear hackers,
> > >
> > > Pushed!
> >
> > Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> > the last patch committed. Is there further work that needs to be
> > re-attached and/or rebased?
> >
>
> No. I have marked it as committed.
>
I found another failure related to this commit [1]. I think it is caused by
autovacuum. I would like to propose a patch that disables autovacuum on the old publisher.
Please see below for details.
# Analysis of the failure
Summary: this failure occurs when autovacuum starts after the subscription
is disabled but before pg_upgrade is run.
According to the regress log, pg_upgrade failed unexpectedly [2]. There is
no possibility that the slots were invalidated, so some WAL must have been generated
after the subscriber was disabled.
Also, the server log of oldpub shows that an autovacuum worker was terminated when
the server stopped [3]. This occurred after the walsender released the logical slot. The WAL records
generated by the autovacuum worker could not be consumed by the slot, so the upgrade
function returned false.
# How to reproduce
I made a small file to reproduce the failure. Please see reproduce.txt. It contains
changes that launch autovacuum workers very frequently and ensure that they actually do
some work. After applying it, I could reproduce the same failure every time.
# How to fix
I think it is sufficient to fix only the test code.
The easiest way is to disable autovacuum on the old publisher. Please see the attached patch file.
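For illustration only: the attached patch is authoritative and its exact contents are not reproduced here. Disabling autovacuum for a cluster comes down to the following postgresql.conf setting. Where and how the test applies it to the old publisher is an assumption.

```
# Illustrative postgresql.conf fragment for the old publisher (assumed placement)
autovacuum = off
```

Even with this setting, the server can still launch autovacuum to prevent transaction ID wraparound, which should not matter in a short-lived test.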
What do you think?
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-11-27%2020%3A52%3A10
[2]:
```
...
Checking for contrib/isn with bigint-passing mismatch ok
Checking for valid logical replication slots fatal
Your installation contains logical replication slots that can't be upgraded.
You can remove invalid slots and/or consume the pending WAL for other slots,
and then restart the upgrade.
A list of the problematic slots is in the file:
/home/bf/bf-build/skink-master/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231127T220024.480/invalid_logical_slots.txt
Failure, exiting
[22:01:20.362](86.645s) not ok 10 - run of pg_upgrade of old cluster
...
```
[3]:
```
...
2023-11-27 22:00:23.546 UTC [3567962][walsender][4/0:0] LOG: released logical replication slot "regress_sub"
2023-11-27 22:00:23.549 UTC [3559042][postmaster][:0] LOG: received fast shutdown request
2023-11-27 22:00:23.552 UTC [3559042][postmaster][:0] LOG: aborting any active transactions
*2023-11-27 22:00:23.663 UTC [3568793][autovacuum worker][5/3:738] FATAL: terminating autovacuum process due to administrator command*
2023-11-27 22:00:23.775 UTC [3559042][postmaster][:0] LOG: background worker "logical replication launcher" (PID 3560674) exited with exit code 1
...
```
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
disable_autovacuum.patch | application/octet-stream | 578 bytes |
reproduce.txt | text/plain | 2.9 KB |