From: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> |
---|---|
To: | 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Stronger safeguard for archive recovery not to miss data |
Date: | 2020-12-08 03:08:20 |
Message-ID: | OSBPR01MB48889E884ABE377AE505F543EDCD0@OSBPR01MB4888.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi
On Thursday, November 26, 2020 4:29 PM
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Thu, 26 Nov 2020 07:18:39 +0000, "osumi(dot)takamichi(at)fujitsu(dot)com"
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote in
> > The attached patch is intended to prevent a scenario that archive
> > recovery hits WALs which come from wal_level=minimal and the server
> > continues to work, which was discussed in the thread of [1].
> > The motivation is to protect that user ends up with both getting
> > replica that could miss data and getting the server to miss data in targeted
> recovery mode.
> >
> > About how to modify this, we reached the consensus in the thread.
> > It is by changing the ereport's level from WARNING to FATAL in
> CheckRequiredParameterValues().
> >
> > In order to test this fix, what I did is
> > 1 - get a base backup during wal_level is replica
> > 2 - stop the server and change the wal_level from replica to minimal
> > 3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
> > 4 - stop the server and make the wal_level back to replica
> > 5 - start the server again
> > 6 - execute archive recoveries in both cases
> > (1) by editing the postgresql.conf and
> > touching recovery.signal in the base backup from 1th step
> > (2) by making a replica with standby.signal
> > * During wal_level is replica, I enabled archive_mode in this test.
> >
> > First of all, I confirmed the server started up without this patch.
> > After applying this safeguard patch, I checked that the server cannot
> > start up any more in the scenario case.
> > I checked the log and got the result below with this patch.
> >
> > 2020-11-26 06:49:46.003 UTC [19715] FATAL: WAL was generated with
> > wal_level=minimal, data may be missing
> > 2020-11-26 06:49:46.003 UTC [19715] HINT: This happens if you
> temporarily set wal_level=minimal without taking a new base backup.
> >
> > Lastly, this should be backpatched.
> > Any comments ?
>
> Perhaps we need the TAP test that conducts the above steps.
I added the TAP tests to reproduce and share the result,
using the case of 6-(1) described above.
Here, I created a new file for it because the purposes of other files in
src/recovery didn't match the purpose of my TAP tests perfectly.
If you are dubious about this idea, please have a look at the comments
in each file.
When the attached patch is applied,
my TAP tests are executed like other ones like below.
t/018_wal_optimize.pl ................ ok
t/019_replslot_limit.pl .............. ok
t/020_archive_status.pl .............. ok
t/021_row_visibility.pl .............. ok
t/022_archive_recovery.pl ............ ok
All tests successful.
Also, I confirmed that there's no regression by make check-world.
Any comments ?
Best,
Takamichi Osumi
Attachment | Content-Type | Size |
---|---|---|
stronger_safeguard_for_archive_recovery_v02.patch | application/octet-stream | 3.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | tsunakawa.takay@fujitsu.com | 2020-12-08 03:09:50 | RE: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently |
Previous Message | Kyotaro Horiguchi | 2020-12-08 03:04:04 | Re: Huge memory consumption on partitioned table with FKs |