From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | WAL record CRC calculated incorrectly because of underlying buffer modification |
Date: | 2024-05-10 13:00:01 |
Message-ID: | 055bb33c-bccc-bc1d-c2f8-8598534448ac@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello hackers,
I've investigated a recent buildfarm failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-05-02%2006%3A40%3A36
where the test failed due to a CRC error:
2024-05-02 17:08:18.401 ACST [3406:7] LOG: incorrect resource manager data checksum in record at 0/F14D7A60
(Chipmunk produced similar errors as well:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2022-08-25%2019%3A40%3A11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2024-03-22%2003%3A20%3A39
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-08-19%2006%3A43%3A20
)
and discovered that XLogRecordAssemble() calculates CRC over a buffer,
that might be modified by another process.
With the attached patch applied, the following test run:
echo "
autovacuum_naptime = 1
autovacuum_vacuum_threshold = 1
wal_consistency_checking = all
" >/tmp/temp.config
for ((i=1;i<=100;i++)); do echo "iteration $i"; TEMP_CONFIG=/tmp/temp.config TESTS="test_setup hash_index" make
check-tests -s || break; done
fails for me on iterations 7, 10, 17:
ok 1 - test_setup 2557 ms
not ok 2 - hash_index 24719 ms
# (test process exited with exit code 2)
postmaster.log contains:
2024-05-10 12:46:44.320 UTC checkpointer[1881151] LOG: checkpoint starting: immediate force wait
2024-05-10 12:46:44.365 UTC checkpointer[1881151] LOG: checkpoint complete: wrote 41 buffers (0.3%); 0 WAL file(s)
added, 0 removed, 26 recycled; write=0.001 s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s, average=0.000
s; distance=439134 kB, estimate=527137 kB; lsn=0/3CE131F0, redo lsn=0/3CE13198
TRAP: failed Assert("memcmp(block1_ptr, block1_copy, block1_len) == 0"), File: "xloginsert.c", Line: 949, PID: 1881271
ExceptionalCondition at assert.c:52:13
XLogRecordAssemble at xloginsert.c:953:1
XLogInsert at xloginsert.c:520:9
hashbucketcleanup at hash.c:844:14
hashbulkdelete at hash.c:558:3
index_bulk_delete at indexam.c:760:1
vac_bulkdel_one_index at vacuum.c:2498:10
lazy_vacuum_one_index at vacuumlazy.c:2443:10
lazy_vacuum_all_indexes at vacuumlazy.c:2026:26
lazy_vacuum at vacuumlazy.c:1944:10
lazy_scan_heap at vacuumlazy.c:1050:3
heap_vacuum_rel at vacuumlazy.c:503:2
vacuum_rel at vacuum.c:2214:2
vacuum at vacuum.c:622:8
autovacuum_do_vac_analyze at autovacuum.c:3102:2
do_autovacuum at autovacuum.c:2425:23
AutoVacWorkerMain at autovacuum.c:1569:3
pgarch_die at pgarch.c:846:1
StartChildProcess at postmaster.c:3929:5
StartAutovacuumWorker at postmaster.c:3997:12
process_pm_pmsignal at postmaster.c:3809:3
ServerLoop at postmaster.c:1667:5
BackgroundWorkerInitializeConnection at postmaster.c:4156:1
main at main.c:184:3
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fc71a8d7e40]
postgres: autovacuum worker regression(_start+0x25)[0x556a8631a5e5]
2024-05-10 12:46:45.038 UTC checkpointer[1881151] LOG: checkpoint starting: immediate force wait
2024-05-10 12:46:45.965 UTC autovacuum worker[1881275] LOG: automatic analyze of table "regression.pg_catalog.pg_attribute"
avg read rate: 0.000 MB/s, avg write rate: 5.409 MB/s
buffer usage: 1094 hits, 0 misses, 27 dirtied
system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.03 s
2024-05-10 12:46:46.892 UTC postmaster[1881150] LOG: server process (PID 1881271) was terminated by signal 6: Aborted
2024-05-10 12:46:46.892 UTC postmaster[1881150] DETAIL: Failed process was running: autovacuum: VACUUM ANALYZE
public.hash_cleanup_heap
(This can be reproduced with 027_stream_regress, of course, but it would
take more time.)
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
xloginsert-memchk.patch | text/x-patch | 1.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nazir Bilal Yavuz | 2024-05-10 13:21:20 | Re: Fix parallel vacuum buffer usage reporting |
Previous Message | Melanie Plageman | 2024-05-10 12:48:07 | Re: open items |