From: | Ilyasov Ian <ianilyasov(at)outlook(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | SIGSEGV, FPE fix in pg_controldata |
Date: | 2024-12-12 10:23:14 |
Message-ID: | GV1P251MB1004DDEE2EE8B2AE12C3C37CCD3F2@GV1P251MB1004.EURP251.PROD.OUTLOOK.COM |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello, hackers!
Recently I've faced an issue when playing with pg_controldata.
I made some rubbish pg_control files and caught a SIGSEGV.
Then I tried fuzzing REL_17_STABLE pg_controldata with AFL++ and got 7 crash cases.
Also, it is not necessary to use fuzzing. You can simply generate
any pg_control by 32-bit initdb and give it to the 64-bit pg_controldata.
SIGSEGV was caused by two places in pg_controldata.c where there
is a work with localtime(&time_tmp));. So I added a check for not NULL.
The reason is such:
The 64-bit ControlFileData has gaps for alignment after DBState,
but the 32-bit ControlFileData does not have such alignment,
so the zero bytes from the gap are read in pg_time_t time (which is actually a long long time):
(gdb output)
64-bit
type = struct ControlFileData {
/* 0 | 8 */ uint64 system_identifier;
/* 8 | 4 */ uint32 pg_control_version;
/* 12 | 4 */ uint32 catalog_version_no;
/* 16 | 4 */ DBState state;
/* XXX 4-byte hole */
/* 24 | 8 */ pg_time_t time;
32 bit
type = struct ControlFileData {
/* 0 | 8 */ uint64 system_identifier;
/* 8 | 4 */ uint32 pg_control_version;
/* 12 | 4 */ uint32 catalog_version_no;
/* 16 | 4 */ DBState state;
/* 20 | 8 */ pg_time_t time;
/* 28 | 8 */ XLogRecPtr checkPoint;
/* 36 | 76 */ CheckPoint checkPointCopy;
/* 112 | 8 */ XLogRecPtr unloggedLSN;
/* 120 | 8 */ XLogRecPtr minRecoveryPoint;
/* 128 | 4 */ TimeLineID minRecoveryPointTLI;
/* 132 | 8 */ XLogRecPtr backupStartPoint;
/* 140 | 8 */ XLogRecPtr backupEndPoint;
/* 148 | 1 */ _Bool backupEndRequired;
/* XXX 3-byte hole */
The other problem is FPE, caused by WalSegSz in pg_controldata.c.
For some reason this variable is signed, it is signed everywhere.
And when pg_control is not valid it can get a negative value, that
then causes an FPE. I changed it to unsigned in my patches.
This raised a question. What is the rationale behind using a signed
type for WAL segment size. Is there a historical or technical reason for this choice?
I’ve implemented fixes tailored to the specific contexts of the affected versions:
For REL_16 and later:
These versions utilize a function XLogFileName. So this raises a question as changing wal_segsz_bytes
argument to unsigned seems logical. But it will be unsigned only here. Maybe it is better to change it
in the other code?
For REL_12 to REL_15:
In these older versions, there is a macro in ./src/include/access/xlog_internal.h
#define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes) \
snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli, \
(uint32) ((logSegNo) / XLogSegmentsPerXLogId(wal_segsz_bytes)), \
(uint32) ((logSegNo) % XLogSegmentsPerXLogId(wal_segsz_bytes)))
Where casting second operand in % (XLogSegmentsPerXLogId(wal_segsz_bytes)) to unsigned seems enough. Would be glad to hear your thoughts.
The versions that contain the problem:
REL_12_STABLE,
REL_13_STABLE,
REL_14_STABLE,
REL_15_STABLE,
REL_16_STABLE,
REL_17_STABLE
Kind regards,
Ian Ilyasov.
Junior Software Developer at Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
0001-REL_17_STABLE-fix-possible-SIGSEGV-situations-on-rubbi.patch | text/x-patch | 3.5 KB |
0001-REL_13_STABLE-fix-possible-SIGSEGV-situations-on-rubbi.patch | text/x-patch | 2.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Anthonin Bonnefoy | 2024-12-12 10:38:48 | Re: Add Pipelining support in psql |
Previous Message | Alvaro Herrera | 2024-12-12 10:02:12 | Re: pg_createsubscriber TAP test wrapping makes command options hard to read. |