RE: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-01-28 09:38:57
Message-ID: OSCPR01MB1496669107B3580F7041DACC0F5EF2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Sawada-san,

I love the idea. I've roughly tested the patch, and it worked in my environment.
Here are my initial comments.

1. xloglevelworker.c
```
+#include "replication/logicalxlog.h"
```

xloglevelworker.c includes replication/logicalxlog.h, but that header does not exist.
I had to remove the line to build and test the patch.

2.
```
+static void
+writeUpdateWalLevel(int new_wal_level)
+{
+ XLogBeginInsert();
+ XLogRegisterData((char *) (&new_wal_level), sizeof(bool));
+ XLogInsert(RM_XLOG_ID, XLOG_UPDATE_WAL_LEVEL);
+}
```

IIUC the data length should be sizeof(int) instead of sizeof(bool).
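
To illustrate the concern, here is a small standalone sketch. It is not PostgreSQL code, and `roundtrip_int` is a hypothetical helper. It mimics registering only the first `len` bytes of an `int` and then decoding the record as a full `int`, which is what I assume the redo side does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: copy only the first "len"
 * bytes of "value" into a zeroed record buffer, then decode the buffer
 * as a full int.
 */
static int
roundtrip_int(int value, size_t len)
{
	char		record[sizeof(int)] = {0};
	int			decoded;

	assert(len <= sizeof(int));

	/* "Insert" side: only len bytes of the value are registered. */
	memcpy(record, &value, len);

	/* "Redo" side: the record is read back as an int. */
	memcpy(&decoded, record, sizeof(int));

	return decoded;
}
```

With `len = sizeof(int)` every value survives the round trip. With `len = sizeof(bool)` (one byte on common platforms) the remaining bytes are lost. Small values may still appear to work on little-endian machines, which might be why the patch seemed fine in a quick test.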

3.
Is there a reason why the process does not wait till the archiver exits?

4.
When I dumped the WAL files, I found that XLOG_UPDATE_WAL_LEVEL was not recognized:

```
rmgr: XLOG len (rec/tot): 27/ 27, tx: 0, lsn: 0/03050838, prev 0/03050800, desc: UNKNOWN (f0) wal_level logical
```

xlog_identify() must be updated as well.
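
A minimal standalone sketch of the kind of case that seems to be missing. `demo_xlog_identify` is a hypothetical stand-in for xlog_identify(), the value 0xF0 is only inferred from the "UNKNOWN (f0)" output above, and the returned string "UPDATE_WAL_LEVEL" is my assumption, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The low four bits of xl_info are reserved, as in xlogrecord.h. */
#define XLR_INFO_MASK				0x0F

#define XLOG_CHECKPOINT_SHUTDOWN	0x00
#define XLOG_CHECKPOINT_ONLINE		0x10
/* Assumption: value inferred from the "UNKNOWN (f0)" dump output. */
#define XLOG_UPDATE_WAL_LEVEL		0xF0

/*
 * Hypothetical stand-in for xlog_identify(): map the info bits to a
 * record name, or return NULL when the record type is not known.
 */
static const char *
demo_xlog_identify(uint8_t info)
{
	const char *id = NULL;

	switch (info & ~XLR_INFO_MASK)
	{
		case XLOG_CHECKPOINT_SHUTDOWN:
			id = "CHECKPOINT_SHUTDOWN";
			break;
		case XLOG_CHECKPOINT_ONLINE:
			id = "CHECKPOINT_ONLINE";
			break;
		case XLOG_UPDATE_WAL_LEVEL:
			/* Without this case the function falls through to NULL. */
			id = "UPDATE_WAL_LEVEL";
			break;
	}

	return id;
}
```

A NULL result is what makes the dump tool fall back to printing UNKNOWN, so adding the case should be enough to get a readable record name.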

5.
When I changed wal_level from "logical" to "replica", postgres output the following:

```
LOG: received SIGHUP, reloading configuration files
LOG: parameter "wal_level" changed to "replica"
LOG: wal_level control worker started
LOG: changing wal_level from "logical" to "replica"
LOG: wal_level has been decreased to "replica"
LOG: successfully changed wal_level from "logical" to "replica"
```

ISTM that both the postmaster and the wal_level control worker report something like
"wal_level changed", which is a bit strange to me. Since the GUC can't be renamed,
can we use another name for the wal_level control state?

6.
With the patch applied, wal_level can be changed to "minimal" even while
streaming replication is running. If we do that, the walsender exits immediately and
the FATAL below appears periodically until the standby stops. The same can be
said for logical replication:

```
FATAL: streaming replication receiver "walreceiver" could not connect to the primary server:
connection to server on socket "/tmp/.s.PGSQL.oooo" failed:
FATAL: WAL senders require "wal_level" to be "replica" or "logical"
```

I know this is not perfect, but can we avoid the issue by rejecting the GUC update
if a walsender exists? Another approach is not to update the value when replication
slots would need to be invalidated.
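
A rough standalone sketch of both ideas, only to make the proposal concrete. All names here are hypothetical and none of this is taken from the patch. In particular, which slots would be invalidated at which level is my assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as PostgreSQL's WalLevel enum; demo names are hypothetical. */
typedef enum DemoWalLevel
{
	DEMO_WAL_LEVEL_MINIMAL = 0,
	DEMO_WAL_LEVEL_REPLICA,
	DEMO_WAL_LEVEL_LOGICAL
} DemoWalLevel;

/*
 * Hypothetical check: return true if switching to new_level looks safe
 * given the current replication state.
 */
static bool
demo_wal_level_change_allowed(DemoWalLevel new_level,
							  int n_active_walsenders,
							  int n_physical_slots,
							  int n_logical_slots)
{
	assert(n_active_walsenders >= 0);
	assert(n_physical_slots >= 0 && n_logical_slots >= 0);

	/* First idea: walsenders require at least "replica". */
	if (new_level < DEMO_WAL_LEVEL_REPLICA && n_active_walsenders > 0)
		return false;

	/*
	 * Second idea: refuse the change if slots would need invalidation.
	 * Assumption: logical slots need "logical", physical slots need at
	 * least "replica".
	 */
	if (new_level < DEMO_WAL_LEVEL_LOGICAL && n_logical_slots > 0)
		return false;
	if (new_level < DEMO_WAL_LEVEL_REPLICA && n_physical_slots > 0)
		return false;

	return true;
}
```

For example, with one active walsender and no slots, a change to "replica" would be allowed but a change to "minimal" would be rejected.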

----------
Best regards,
Hayato Kuroda