Re: [PoC] pg_upgrade: allow to upgrade publisher node

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-07-20 05:48:58
Message-ID: CAA4eK1LBcgKN4JcFxS_g0t+hyzfOmprns5teEEtj25O7BDd14Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 19, 2023 at 7:33 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> > 2.
> > + /*
> > + * Dump logical replication slots if needed.
> > + *
> > + * XXX We cannot dump replication slots at the same time as the schema
> > + * dump because we need to separate the timing of restoring
> > + * replication slots and other objects. Replication slots, in
> > + * particular, should not be restored before executing the pg_resetwal
> > + * command because it will remove WALs that are required by the slots.
> > + */
> > + if (user_opts.include_logical_slots)
> >
> > Can you explain this point a bit more with some example scenarios?
> > Basically, if we had sent all the WAL before the upgrade then why do
> > we need to worry about the timing of pg_resetwal?
>
> OK, here is an example. Should it also be described in the source code?
>
> Assuming that there is a valid logical replication slot as follows:
>
> ```
> postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from pg_replication_slots;
>  slot_name |    plugin     | restart_lsn | wal_status | two_phase
> -----------+---------------+-------------+------------+-----------
>  test      | test_decoding | 0/15665A8   | reserved   | f
> (1 row)
>
> postgres=# select * from pg_current_wal_lsn();
> pg_current_wal_lsn
> --------------------
> 0/15665E0
> (1 row)
> ```
>
> Now let's run pg_resetwal against the server.
> The existing WAL segment file is purged and WAL moves on to the next segment.
>
> ```
> $ pg_ctl stop -D data_N1/
> waiting for server to shut down.... done
> server stopped
> $ pg_resetwal -l 000000010000000000000002 data_N1/
> Write-ahead log reset
> $ pg_ctl start -D data_N1/ -l N1.log
> waiting for server to start.... done
> server started
> ```
>
> After that, the logical slot cannot move forward anymore because the required WAL
> has been removed, even though wal_status is still "reserved".
>
> ```
> postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from pg_replication_slots;
>  slot_name |    plugin     | restart_lsn | wal_status | two_phase
> -----------+---------------+-------------+------------+-----------
>  test      | test_decoding | 0/15665A8   | reserved   | f
> (1 row)
>
> postgres=# select * from pg_current_wal_lsn();
> pg_current_wal_lsn
> --------------------
> 0/2028328
> (1 row)
>
> postgres=# select * from pg_logical_slot_get_changes('test', NULL, NULL);
> ERROR: requested WAL segment pg_wal/000000010000000000000001 has already been removed
> ```
>
> pg_upgrade runs pg_dump and then pg_resetwal, so dumping slots must be done
> separately to avoid the above error.
>

Okay, so the point is that if we create the slot in the new cluster
before pg_resetwal then its restart_lsn will be set to the current LSN
position which will later be reset by pg_resetwal. So, we won't be
able to use such a slot, right?
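
The segment arithmetic behind the quoted example can be checked with a small sketch. It assumes the default 16 MB WAL segment size and timeline 1; the helper names `parse_lsn` and `wal_file_name` are made up for illustration and are not part of PostgreSQL.

```python
# Sketch only: map an LSN to the name of the WAL segment file that holds it.
# Assumption: default 16 MB segments (wal_segment_size was not changed).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an 'X/Y' LSN (two hex halves) into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(lsn: str, timeline: int = 1) -> str:
    """Return the 24-character segment file name containing the given LSN."""
    segno = parse_lsn(lsn) // WAL_SEGMENT_SIZE
    segs_per_xlogid = 0x100000000 // WAL_SEGMENT_SIZE  # 256 for 16 MB segments
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


if __name__ == "__main__":
    # restart_lsn of the slot before pg_resetwal
    print(wal_file_name("0/15665A8"))  # 000000010000000000000001
    # current WAL position after pg_resetwal -l 000000010000000000000002
    print(wal_file_name("0/2028328"))  # 000000010000000000000002
```

Under those assumptions, the slot's restart_lsn (0/15665A8) falls in segment 000000010000000000000001, while the server restarts writing WAL in segment 000000010000000000000002 after pg_resetwal. That matches the "has already been removed" error in the quoted transcript.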

--
With Regards,
Amit Kapila.
