Re: pg_rewind WAL segments deletion pitfall

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: cyberdemn(at)gmail(dot)com
Cc: smithpb2250(at)gmail(dot)com, torikoshia(at)oss(dot)nttdata(dot)com, horikyota(dot)ntt(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org, bungina(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind WAL segments deletion pitfall
Date: 2024-07-12 07:24:06
Message-ID: 20240712.162406.92904157696470075.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Hi,

I'm reviewing patches in Commitfest 2024-07 from top to bottom:
https://commitfest.postgresql.org/48/

This is the 1st patch:
https://commitfest.postgresql.org/48/3874/

The latest patch can't be applied on master:
https://www.postgresql.org/message-id/CAFh8B=nNJtm9ke4_1mhpwGz2PV9yoyF6hMnYh5XACt0AA4VG-A@mail.gmail.com

I've rebased on master. See the attached patch.
Here are changes for it:

* Resolve conflict
* Update copyright year to 2024 from 2023
* Add an added test to meson.build
* Run pgindent

Here are my review comments:

@@ -217,6 +221,26 @@ findLastCheckpoint(const char *datadir, XLogRecPtr forkptr, int tliIndex,
+ char xlogfname[MAXFNAMELEN];
+
+ tli = xlogreader->seg.ws_tli;
+ segno = xlogreader->seg.ws_segno;
+
+ snprintf(xlogfname, MAXPGPATH, XLOGDIR "/");
+ XLogFileName(xlogfname + strlen(xlogfname),
+ xlogreader->seg.ws_tli,
+ xlogreader->seg.ws_segno, WalSegSz);
+
+ /*
+ * Make sure pg_rewind doesn't remove this file, because it is
+ * required for postgres to start after rewind.
+ */
+ insert_keepwalhash_entry(xlogfname);

MAXFNAMELEN is 64 and MAXPGPATH is 1024. strlen(XLOGDIR "/")
is 7 because XLOGDIR is "pg_wal". So xlogfname has enough
size but snprintf(xlogfname, MAXPGPATH) is wrong usage.
(And XLogFileName() uses snprintf(xlogfname, MAXFNAMELEN)
internally.)

How about using one more buffer?

----
char xlogpath[MAXPGPATH];
char xlogfname[MAXFNAMELEN];

tli = xlogreader->seg.ws_tli;
segno = xlogreader->seg.ws_segno;

XLogFileName(xlogfname,
xlogreader->seg.ws_tli,
xlogreader->seg.ws_segno, WalSegSz);
snprintf(xlogpath, MAXPGPATH, "%s/%s", XLOGDIR, xlogfname);

/*
* Make sure pg_rewind doesn't remove this file, because it is
* required for postgres to start after rewind.
*/
insert_keepwalhash_entry(xlogpath);
----

Thanks,
--
kou

In <CAFh8B=mDDZEsK0jDMfvP3MmxkWaeTCxW4yN42OH33JY6sQWS5Q(at)mail(dot)gmail(dot)com>
"Re: pg_rewind WAL segments deletion pitfall" on Tue, 23 Jan 2024 09:23:29 +0100,
Alexander Kukushkin <cyberdemn(at)gmail(dot)com> wrote:

> Hi Peter,
>
> On Mon, 22 Jan 2024 at 00:38, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
>> 2024-01 Commitfest.
>>
>> Hi, This patch has a CF status of "Ready for Committer", but it is
>> currently failing some CFbot tests [1]. Please have a look and post an
>> updated version..
>>
>> ======
>> [1]
>> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/3874
>>
>>
> From what I can see all failures are not related to this patch:
> 1. Windows build failed with
> [10:52:49.679] 126/281 postgresql:recovery / recovery/019_replslot_limit
> ERROR 185.84s (exit status 255 or signal 127 SIGinvalid)
> 2. FreeBSD build failed with
> [09:11:57.656] 190/285 postgresql:psql / psql/010_tab_completion ERROR
> 0.46s exit status 2
> [09:11:57.656] 220/285 postgresql:authentication /
> authentication/001_password ERROR 0.57s exit status 2
>
> In fact, I don't even see this patch being applied for these builds and the
> introduced TAP test being executed.
>
> Regards,
> --
> Alexander Kukushkin

Attachment Content-Type Size
v9-0001-Be-more-picky-with-WAL-segment-deletion-in-pg_rew.patch text/x-patch 8.1 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2024-07-12 07:58:27 BUG #18533: pg_basebackup uses out-of-bounds memory and a segment error occurs during backup
Previous Message Michael Paquier 2024-07-12 03:42:07 Re: Potential data loss due to race condition during logical replication slot creation

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-07-12 07:54:51 Re: Restart pg_usleep when interrupted
Previous Message Richard Guo 2024-07-12 07:11:19 Re: Check lateral references within PHVs for memoize cache keys