Backup command and functions can cause assertion failure and segmentation fault

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Backup command and functions can cause assertion failure and segmentation fault
Date: 2022-06-30 03:28:43
Message-ID: 3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I found that the assertion failure and the segmentation fault could
happen by running pg_backup_start(), pg_backup_stop() and BASE_BACKUP
replication command, in v15 or before.

Here is the procedure to reproduce the assertion failure.

1. Connect to the server as the REPLICATION user who is granted
EXECUTE to run pg_backup_start() and pg_backup_stop().

$ psql
=# CREATE ROLE foo REPLICATION LOGIN;
=# GRANT EXECUTE ON FUNCTION pg_backup_start TO foo;
=# GRANT EXECUTE ON FUNCTION pg_backup_stop TO foo;
=# \q

$ psql "replication=database user=foo dbname=postgres"

2. Run pg_backup_start() and pg_backup_stop().

=> SELECT pg_backup_start('test', true);
=> SELECT pg_backup_stop();

3. Run BASE_BACKUP replication command with smaller MAX_RATE so that
it can take a long time to finish.

=> BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);

4. Terminate the replication connection while it's running BASE_BACKUP.

$ psql
=# SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE backend_type = 'walsender';

This procedure can cause the following assertion failure.

TRAP: FailedAssertion("XLogCtl->Insert.runningBackups > 0", File: "xlog.c", Line: 8779, PID: 69434)
0 postgres 0x000000010ab2ff7f ExceptionalCondition + 223
1 postgres 0x000000010a455126 do_pg_abort_backup + 102
2 postgres 0x000000010a8e13aa shmem_exit + 218
3 postgres 0x000000010a8e11ed proc_exit_prepare + 125
4 postgres 0x000000010a8e10f3 proc_exit + 19
5 postgres 0x000000010ab3171c errfinish + 1100
6 postgres 0x000000010a91fa80 ProcessInterrupts + 1376
7 postgres 0x000000010a886907 throttle + 359
8 postgres 0x000000010a88675d bbsink_throttle_archive_contents + 29
9 postgres 0x000000010a885aca bbsink_archive_contents + 154
10 postgres 0x000000010a885a2a bbsink_forward_archive_contents + 218
11 postgres 0x000000010a884a99 bbsink_progress_archive_contents + 89
12 postgres 0x000000010a881aba bbsink_archive_contents + 154
13 postgres 0x000000010a881598 sendFile + 1816
14 postgres 0x000000010a8806c5 sendDir + 3573
15 postgres 0x000000010a8805d9 sendDir + 3337
16 postgres 0x000000010a87e262 perform_base_backup + 1250
17 postgres 0x000000010a87c734 SendBaseBackup + 500
18 postgres 0x000000010a89a7f8 exec_replication_command + 1144
19 postgres 0x000000010a92319a PostgresMain + 2154
20 postgres 0x000000010a82b702 BackendRun + 50
21 postgres 0x000000010a82acfc BackendStartup + 524
22 postgres 0x000000010a829b2c ServerLoop + 716
23 postgres 0x000000010a827416 PostmasterMain + 6470
24 postgres 0x000000010a703e19 main + 809
25 libdyld.dylib 0x00007fff2072ff3d start + 1

Here is the procedure to reproduce the segmentation fault.

1. Connect to the server as the REPLICATION user who is granted
EXECUTE to run pg_backup_stop().

$ psql
=# CREATE ROLE foo REPLICATION LOGIN;
=# GRANT EXECUTE ON FUNCTION pg_backup_stop TO foo;
=# \q

$ psql "replication=database user=foo dbname=postgres"

2. Run BASE_BACKUP replication command with smaller MAX_RATE so that
it can take a long time to finish.

=> BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);

3. Press Ctrl-C to cancel BASE_BACKUP while it's running.

4. Run pg_backup_stop().

=> SELECT pg_backup_stop();

This procedure can cause the following segmentation fault.

LOG: server process (PID 69449) was terminated by signal 11: Segmentation fault: 11
DETAIL: Failed process was running: SELECT pg_backup_stop();

The root cause of these failures seems that sessionBackupState flag
is not reset to SESSION_BACKUP_NONE even when BASE_BACKUP is aborted.
So attached patch changes do_pg_abort_backup callback so that
it resets sessionBackupState. I confirmed that, with the patch,
those assertion failure and segmentation fault didn't happen.

But this change has one issue that; if BASE_BACKUP is run while
a backup is already in progress in the session by pg_backup_start()
and that session is terminated, the change causes XLogCtl->Insert.runningBackups
to be decremented incorrectly. That is, XLogCtl->Insert.runningBackups
is incremented by two by pg_backup_start() and BASE_BACKUP,
but it's decremented only by one by the termination of the session.

To address this issue, I think that we should disallow BASE_BACKUP
to run while a backup is already in progress in the *same* session
as we already do this for pg_backup_start(). Thought? I included
the code to disallow that in the attached patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
fix_assertion_failure_and_segv_when_backup_v1.patch text/plain 1.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shiy.fnst@fujitsu.com 2022-06-30 03:47:24 RE: Handle infinite recursion in logical replication setup
Previous Message Thomas Munro 2022-06-30 02:42:25 Re: making relfilenodes 56 bits