SIGSEGV, FPE fix in pg_controldata

From: Ilyasov Ian <ianilyasov(at)outlook(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: SIGSEGV, FPE fix in pg_controldata
Date: 2024-12-12 10:23:14
Message-ID: GV1P251MB1004DDEE2EE8B2AE12C3C37CCD3F2@GV1P251MB1004.EURP251.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, hackers!

Recently I've faced an issue when playing with pg_controldata.
I made some rubbish pg_control files and caught a SIGSEGV.
Then I tried fuzzing REL_17_STABLE pg_controldata with AFL++ and got 7 crash cases.
Also, it is not necessary to use fuzzing. You can simply generate
any pg_control by 32-bit initdb and give it to the 64-bit pg_controldata.

SIGSEGV was caused by two places in pg_controldata.c where there
is a work with localtime(&time_tmp));. So I added a check for not NULL.

The reason is such:
The 64-bit ControlFileData has gaps for alignment after DBState,
but the 32-bit ControlFileData does not have such alignment,
so the zero bytes from the gap are read in pg_time_t time (which is actually a long long time):

(gdb output)

64-bit
type = struct ControlFileData {

/* 0 | 8 */ uint64 system_identifier;
/* 8 | 4 */ uint32 pg_control_version;
/* 12 | 4 */ uint32 catalog_version_no;
/* 16 | 4 */ DBState state;
/* XXX 4-byte hole */
/* 24 | 8 */ pg_time_t time;

32 bit
type = struct ControlFileData {
/* 0 | 8 */ uint64 system_identifier;
/* 8 | 4 */ uint32 pg_control_version;
/* 12 | 4 */ uint32 catalog_version_no;
/* 16 | 4 */ DBState state;
/* 20 | 8 */ pg_time_t time;
/* 28 | 8 */ XLogRecPtr checkPoint;
/* 36 | 76 */ CheckPoint checkPointCopy;
/* 112 | 8 */ XLogRecPtr unloggedLSN;
/* 120 | 8 */ XLogRecPtr minRecoveryPoint;
/* 128 | 4 */ TimeLineID minRecoveryPointTLI;
/* 132 | 8 */ XLogRecPtr backupStartPoint;
/* 140 | 8 */ XLogRecPtr backupEndPoint;
/* 148 | 1 */ _Bool backupEndRequired;
/* XXX 3-byte hole */

The other problem is FPE, caused by WalSegSz in pg_controldata.c.
For some reason this variable is signed, it is signed everywhere.
And when pg_control is not valid it can get a negative value, that
then causes an FPE. I changed it to unsigned in my patches.
This raised a question. What is the rationale behind using a signed
type for WAL segment size. Is there a historical or technical reason for this choice?

I’ve implemented fixes tailored to the specific contexts of the affected versions:

For REL_16 and later:

These versions utilize a function XLogFileName. So this raises a question as changing wal_segsz_bytes
argument to unsigned seems logical. But it will be unsigned only here. Maybe it is better to change it
in the other code?

For REL_12 to REL_15:

In these older versions, there is a macro in ./src/include/access/xlog_internal.h
#define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes) \
snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli, \
(uint32) ((logSegNo) / XLogSegmentsPerXLogId(wal_segsz_bytes)), \
(uint32) ((logSegNo) % XLogSegmentsPerXLogId(wal_segsz_bytes)))

Where casting second operand in % (XLogSegmentsPerXLogId(wal_segsz_bytes)) to unsigned seems enough. Would be glad to hear your thoughts.

The versions that contain the problem:
REL_12_STABLE,
REL_13_STABLE,
REL_14_STABLE,
REL_15_STABLE,
REL_16_STABLE,
REL_17_STABLE

Kind regards,
Ian Ilyasov.

Junior Software Developer at Postgres Professional

Attachment Content-Type Size
0001-REL_17_STABLE-fix-possible-SIGSEGV-situations-on-rubbi.patch text/x-patch 3.5 KB
0001-REL_13_STABLE-fix-possible-SIGSEGV-situations-on-rubbi.patch text/x-patch 2.9 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Anthonin Bonnefoy 2024-12-12 10:38:48 Re: Add Pipelining support in psql
Previous Message Alvaro Herrera 2024-12-12 10:02:12 Re: pg_createsubscriber TAP test wrapping makes command options hard to read.