From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at> |
Subject: | Re: AIO v2.5 |
Date: | 2025-04-15 18:00:00 |
Message-ID: | 062daca9-dfad-4750-9da8-b13388301ad9@gmail.com |
Lists: | pgsql-hackers |
Hello Andres,
14.04.2025 19:06, Andres Freund wrote:
> Unfortunately I'm several hundred iterations in, without reproducing the
> issue. I'm bad at statistics, but I think that makes it rather unlikely that I
> will, without changing some aspect.
>
> Was this an assert enabled build? What compiler and what optimization settings
> did you use? Do you have huge pages configured (so that the default
> huge_pages=try would end up with huge pages)?
Yes, I used --enable-cassert; no explicit optimization setting and no huge
pages configured. pg_config says:
CONFIGURE = '--enable-debug' '--enable-cassert' '--enable-tap-tests' '--with-liburing'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2
Please look at the complete script attached. I've just run it and got:
iteration 56 (jobs: 44)
Tue Apr 15 06:30:52 PM CEST 2025
dropdb: error: database removal failed: ERROR: could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-15 18:31:00.650 CEST [1612266] LOG: could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-15 18:31:00.650 CEST [1612266] CONTEXT: completing I/O on behalf of process 1612271
2025-04-15 18:31:00.650 CEST [1612266] STATEMENT: DROP DATABASE db3;
I used gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, but now I've also
reproduced the issue with CC=clang (18.1.3 (1ubuntu1)).
Please also take a look at the simple reproducer for the crash inside
pg_get_aios() that I mentioned upthread:
for i in {1..100}; do
  numjobs=12
  echo "iteration $i"
  date
  for ((j=1;j<=numjobs;j++)); do
    (
      createdb db$j
      for k in {1..300}; do
        echo "CREATE TABLE t (a INT); CREATE INDEX ON t (a); VACUUM t;
SELECT COUNT(*) >= 0 AS ok FROM pg_aios; " \
          | psql -d db$j >/dev/null 2>&1
      done
      dropdb db$j
    ) &
  done
  wait
  psql -c 'SELECT 1' || break
done
It fails for me as follows:
iteration 20
Tue Apr 15 07:21:29 PM EEST 2025
dropdb: error: connection to server on socket "/tmp/.s.PGSQL.55432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
...
2025-04-15 19:21:30.675 EEST [3111699] LOG: client backend (PID 3320979) was terminated by signal 11: Segmentation fault
2025-04-15 19:21:30.675 EEST [3111699] DETAIL: Failed process was running: SELECT COUNT(*) >= 0 AS ok FROM pg_aios;
2025-04-15 19:21:30.675 EEST [3111699] LOG: terminating any other active server processes
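When running many iterations, it may help to tally the two failure signatures shown above instead of reading the log by eye. The following is only a minimal sketch: the function name scan_aio_failures and the idea of passing a log file path are assumptions for illustration and are not part of the attached repro script. The patterns are taken from the log lines quoted in this message.

```shell
# Hypothetical helper (not part of the original report): count the two
# failure signatures quoted above in a PostgreSQL server log file.
scan_aio_failures() {
    logfile=$1
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function usable under "set -e".
    canceled=$(grep -c 'could not read blocks .*: Operation canceled' "$logfile" || true)
    segv=$(grep -c 'was terminated by signal 11' "$logfile" || true)
    echo "read_canceled=$canceled segfaults=$segv"
}
```

It could be called after each iteration with the path of the server log, wherever that log is written on the test machine.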
>> I reproduced this error on three different machines (all are running
>> Ubuntu 24.04, two with kernel version 6.8, one with 6.11), with PGDATA
>> located on tmpfs.
> That's another variable to try - so far I've been trying this on 6.15.0-rc1
> [1]. I guess I'll have to set up a ubuntu 24.04 VM and try with that.
>
> Greetings,
>
> Andres Freund
>
>
> [1] I wanted to play with io_uring changes that were recently merged. Namely
> support for readv/writev of "fixed" buffers. That avoids needing to pin/unpin
> buffers while IO is ongoing, which turns out to be a noticeable bottleneck in
> some workloads, particularly when using 1GB huge pages.
Best regards,
Alexander Lakhin
Neon (https://neon.tech)
Attachment | Content-Type | Size |
---|---|---|
repro.tar.gz | application/gzip | 982 bytes |