Re: BUG #18658: Assert in SerialAdd() due to race condition

From: Andrew Bille <andrewbille(at)gmail(dot)com>
To: andrewbille(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18658: Assert in SerialAdd() due to race condition
Date: 2024-10-17 05:47:00
Message-ID: CAJnzarxBO18Ha==H3hX28yz2zd7cVuasXv1wxyJ7ERd9DzJf=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello again!

Using a build with no asserts enabled and without that define, I can also
see the following errors using the script with increased concurrency
(Initially we got a report from our client, who encountered these errors in
production):

2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|PANIC: could
not access status of transaction 18204
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|DETAIL: Could
not open file "pg_serial/0000": No such file or directory.

I changed ERROR to PANIC to get a core dump.

echo "CREATE TABLE t(f1 int);" | psql

res=0

for i in `seq 200`; do
echo "I $i"
for l in `seq 80`; do
echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q &
done
wait

grep 'PANIC: could not access status' server.log && { res=1; break; }
grep 'TRAP: ' server.log && { res=1; break; }

done

#5 0x00005f824729a10a in errfinish (filename=0x5f824732df83 "slru.c",
lineno=1060, funcname=0x5f824732e220 <__func__.3> "SlruReportIOError") at
elog.c:599
#6 0x00005f8246c16c26 in SlruReportIOError (ctl=0x5f8247686c80
<SerialSlruCtlData>, pageno=17, xid=18204) at slru.c:1060
#7 0x00005f8246c15cea in SimpleLruReadPage (ctl=0x5f8247686c80
<SerialSlruCtlData>, pageno=17, write_ok=true, xid=18204) at slru.c:579
#8 0x00005f82470988c2 in SerialAdd (xid=18204,
minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
#9 0x00005f8247099854 in SummarizeOldestCommittedSxact () at
predicate.c:1520
#10 0x00005f8247099ddd in GetSerializableTransactionSnapshotInt
(snapshot=0x5f8247691740 <CurrentSnapshotData>, sourcevxid=0x0,
sourcepid=-1) at predicate.c:1798
#11 0x00005f8247099c72 in GetSerializableTransactionSnapshot
(snapshot=0x5f8247691740 <CurrentSnapshotData>) at predicate.c:1692
#12 0x00005f82472ee3b1 in GetTransactionSnapshot () at snapmgr.c:257
#13 0x00005f82470adfe8 in exec_simple_query (query_string=0x5f8248e06a28
"INSERT INTO t VALUES(42);") at postgres.c:1173
#14 0x00005f82470b3453 in PostgresMain (dbname=0x5f8248e030f0 "regression",
username=0x5f8248e3fe08 "test") at postgres.c:4793
#15 0x00005f82470aa5ac in BackendMain (startup_data=0x7fffdf93a054 "",
startup_data_len=4) at backend_startup.c:107
#16 0x00005f8246fd6a6a in postmaster_child_launch (child_type=B_BACKEND,
startup_data=0x7fffdf93a054 "", startup_data_len=4,
client_sock=0x7fffdf93a0a0) at launch_backend.c:274
#17 0x00005f8246fdc4ed in BackendStartup (client_sock=0x7fffdf93a0a0) at
postmaster.c:3420
#18 0x00005f8246fd9dd0 in ServerLoop () at postmaster.c:1653
#19 0x00005f8246fd96d0 in PostmasterMain (argc=1, argv=0x5f8248e01090) at
postmaster.c:1351
#20 0x00005f8246e9729e in main (argc=1, argv=0x5f8248e01090) at main.c:197

Assert would not be triggered in this case, because of:

(gdb) fr 8
#8 0x00005f82470988c2 in SerialAdd (xid=18204,
minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
928 slotno = SimpleLruReadPage(SerialSlruCtl,
targetPage, true, xid);
(gdb) p *serialControl
$1 = {headPage = 18, headXid = 18654, tailXid = 18557}

I'm not sure if this is the same problem.

Thanks to Alexandr Lakhin for help with reproducing the issue!

On Wed, Oct 16, 2024 at 10:33 PM PG Bug reporting form <
noreply(at)postgresql(dot)org> wrote:

> The following bug has been logged on the website:
>
> Bug reference: 18658
> Logged by: Andrew Bille
> Email address: andrewbille(at)gmail(dot)com
> PostgreSQL version: 17.0
> Operating system: Ubuntu 20.04
> Description:
>
> Hello!
>
> The following script against a server (master c259b1578) compiled with
> -DTEST_SUMMARIZE_SERIAL (without the define it's harder to reproduce...):
>
> echo "CREATE TABLE t(f1 int);" | psql test
>
> for i in seq 100; do
> echo "
> SET default_transaction_isolation to 'SERIALIZABLE';
> SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
> \gexec
> " | psql -q test &
>
> echo "
> SET default_transaction_isolation to 'SERIALIZABLE';
> SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
> \gexec
> " | psql -q test &
>
> wait
>
> grep 'TRAP:' server.log && break;
> done
>
> fails for me with the following stack trace:
> #5 0x000055e6aaf3fef5 in ExceptionalCondition
> (conditionName=conditionName(at)entry=0x55e6ab1232d1
> "TransactionIdIsValid(tailXid)", fileName=fileName(at)entry=0x55e6ab1231d9
> "predicate.c", lineNumber=lineNumber(at)entry=886) at assert.c:66
> #6 0x000055e6aadcde64 in SerialAdd (xid=976743,
> minConflictCommitSeqNo=18446744073709551615) at predicate.c:886
> #7 0x000055e6aadd0877 in SummarizeOldestCommittedSxact () at
> predicate.c:1521
> #8 0x000055e6aadd094f in GetSerializableTransactionSnapshotInt
> (snapshot=snapshot(at)entry=0x55e6ab32f3e0 <CurrentSnapshotData>,
> sourcevxid=sourcevxid(at)entry=0x0, sourcepid=sourcepid(at)entry=-1) at
> predicate.c:1789
> #9 0x000055e6aadd2a51 in GetSerializableTransactionSnapshot
> (snapshot=snapshot(at)entry=0x55e6ab32f3e0 <CurrentSnapshotData>) at
> predicate.c:1693
> #10 0x000055e6aaf868c4 in GetTransactionSnapshot () at snapmgr.c:257
> #11 0x000055e6aade2d9a in exec_simple_query
> (query_string=query_string(at)entry=0x55e6ab433fe0 "SELECT 'INSERT INTO t
> VALUES(42);' FROM generate_series(1, 100) g\n ") at postgres.c:1173
> #12 0x000055e6aade516f in PostgresMain (dbname=<optimized out>,
> username=<optimized out>) at postgres.c:4793
> #13 0x000055e6aadde301 in BackendMain (startup_data=<optimized out>,
> startup_data_len=<optimized out>) at backend_startup.c:107
> #14 0x000055e6aad2cbf8 in postmaster_child_launch
> (child_type=child_type(at)entry=B_BACKEND,
> startup_data=startup_data(at)entry=0x7ffc1ca820e4 "",
> startup_data_len=startup_data_len(at)entry=4,
> client_sock=client_sock(at)entry=0x7ffc1ca82120)
> at launch_backend.c:274
> #15 0x000055e6aad30e72 in BackendStartup
> (client_sock=client_sock(at)entry=0x7ffc1ca82120) at postmaster.c:3420
> #16 0x000055e6aad31092 in ServerLoop () at postmaster.c:1653
> #17 0x000055e6aad324e6 in PostmasterMain (argc=argc(at)entry=3,
> argv=argv(at)entry=0x55e6ab42e390) at postmaster.c:1351
> #18 0x000055e6aac48bd7 in main (argc=3, argv=0x55e6ab42e390) at main.c:197
>
> This is also reproduced on dafaa3efb
> Implement genuine serializable isolation level.
>
> ...
>
> Thanks, Andrew
>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Ba Jinsheng 2024-10-17 07:00:00 Re: Performance Issue on Query 18 of TPC-H Benchmark
Previous Message Fujii Masao 2024-10-17 04:19:52 Re: [BUGS] BUG #10123: Weird entries in pg_stat_activity