From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at> |
Subject: | Re: AIO v2.5 |
Date: | 2025-04-25 16:44:14 |
Message-ID: | 5luvzovhivuhxfr7sws237zr35qbilpz6ibdl2tfbvvsi6svsy@il4rlo5jxhab |
Lists: | pgsql-hackers |
Hi,
On 2025-04-15 21:00:00 +0300, Alexander Lakhin wrote:
> Please take a look also at the simple reproducer for the crash inside
> pg_get_aios() I mentioned upthread:
> for i in {1..100}; do
> numjobs=12
> echo "iteration $i"
> date
> for ((j=1;j<=numjobs;j++)); do
> ( createdb db$j; for k in {1..300}; do
> echo "CREATE TABLE t (a INT); CREATE INDEX ON t (a); VACUUM t;
> SELECT COUNT(*) >= 0 AS ok FROM pg_aios; " \
> | psql -d db$j >/dev/null 2>&1;
> done; dropdb db$j; ) &
> done
> wait
> psql -c 'SELECT 1' || break;
> done
>
> it fails for me as follows:
> iteration 20
> Tue Apr 15 07:21:29 PM EEST 2025
> dropdb: error: connection to server on socket "/tmp/.s.PGSQL.55432" failed: No such file or directory
> Is the server running locally and accepting connections on that socket?
> ...
> 2025-04-15 19:21:30.675 EEST [3111699] LOG: client backend (PID 3320979) was terminated by signal 11: Segmentation fault
> 2025-04-15 19:21:30.675 EEST [3111699] DETAIL: Failed process was running: SELECT COUNT(*) >= 0 AS ok FROM pg_aios;
> 2025-04-15 19:21:30.675 EEST [3111699] LOG: terminating any other active server processes
Thanks for that. The bug turns out to be pretty stupid - pgaio_io_reclaim()
resets the fields in PgAioHandle *before* updating the generation/state. That
opens up a window in which pg_get_aios() thinks the copied PgAioHandle is
valid, even though it was taken while the fields were being reset.
Once I had figured that out, it was easy to make it more reproducible - put a
pg_usleep() between the fields being reset in pgaio_io_reclaim() and the
generation increase / state update.
The fix is simple: increment the generation and update the state before
resetting the fields.
Will push the fix for that soon.
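To make the ordering issue concrete, here is a minimal single-threaded sketch of the interleaving described above. It is not the actual PostgreSQL code: SketchHandle, SketchReader and every function name below are hypothetical stand-ins, and the reader's "copy, then recheck the generation" logic is an assumption about how a pg_get_aios()-style snapshot validates its copy. Real shared-memory code would also need memory barriers, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a handle slot; not the real PgAioHandle layout. */
typedef enum { SKETCH_IDLE, SKETCH_IN_USE } SketchState;

typedef struct SketchHandle
{
    uint64_t    generation;     /* bumped each time the slot is recycled */
    SketchState state;
    int         owner;          /* example payload field; -1 once reset */
} SketchHandle;

/* Reader-side bookkeeping for one snapshot attempt. */
typedef struct SketchReader
{
    uint64_t     start_generation;
    SketchHandle copy;
} SketchReader;

static void
reader_begin(SketchReader *r, const SketchHandle *live)
{
    r->start_generation = live->generation;
}

static void
reader_copy(SketchReader *r, const SketchHandle *live)
{
    memcpy(&r->copy, live, sizeof(r->copy));
}

/* Accept the copy only if the generation is unchanged and it was in use. */
static bool
reader_accepts(const SketchReader *r, const SketchHandle *live)
{
    return live->generation == r->start_generation &&
        r->copy.state == SKETCH_IN_USE;
}

/* The two halves of the (hypothetical) reclaim step. */
static void
writer_reset_fields(SketchHandle *h)
{
    h->owner = -1;
}

static void
writer_publish_idle(SketchHandle *h)
{
    assert(h->state == SKETCH_IN_USE);
    h->generation++;
    h->state = SKETCH_IDLE;
}

/*
 * Replays one interleaving: the reader samples the generation, the first
 * half of reclaim runs, the reader copies and validates, then the second
 * half of reclaim runs.  Returns true if the reader accepted a copy whose
 * fields had already been reset.
 */
static bool
sketch_reader_sees_torn_copy(bool fixed_order)
{
    SketchHandle h = {.generation = 1, .state = SKETCH_IN_USE, .owner = 42};
    SketchReader r;
    bool         accepted;

    reader_begin(&r, &h);

    if (fixed_order)
        writer_publish_idle(&h);    /* fixed: generation/state first */
    else
        writer_reset_fields(&h);    /* buggy: fields reset first */

    reader_copy(&r, &h);
    accepted = reader_accepts(&r, &h);

    if (fixed_order)
        writer_reset_fields(&h);
    else
        writer_publish_idle(&h);

    return accepted && r.copy.owner == -1;
}
```

Calling sketch_reader_sees_torn_copy(false) replays the buggy order, where the reader accepts a copy with reset fields; passing true replays the fixed order, where the generation has already changed and the copy is rejected. The gap between the two writer calls corresponds to the window that the pg_usleep() widens.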
Greetings,
Andres Freund