Re: AIO v2.5

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: AIO v2.5
Date: 2025-04-13 06:00:01
Message-ID: 96abefe8-fa72-41f5-8840-0517125c24e3@gmail.com
Lists: pgsql-hackers

Hello Andres,

07.04.2025 22:10, Alexander Lakhin wrote:
>> I ran it for a while in a VM, it hasn't triggered yet. Neither on xfs nor on
>> tmpfs.
>
> Before sharing the script I tested it on two of my machines, but I had
> anticipated that the error could be hard to reproduce. Will try to reduce
> the reproducer...

I've managed to reduce it to the following:
ulimit -n 4096

echo "
fsync = off
autovacuum = off

checkpoint_timeout = 30s

io_max_concurrency = 10
io_method = io_uring
" >> $PGDATA/postgresql.conf

pg_ctl -l server.log start

for i in `seq 1000`; do
  numjobs=$((20 + $RANDOM % 60))
  echo "iteration $i (jobs: $numjobs)"
  date
  for ((j=1;j<=numjobs;j++)); do
    (
      createdb db$j;
      for ((n=1;n<=50;n++)); do
        cat << EOF | psql -d db$j -a >>/dev/null 2>&1
DROP TABLE IF EXISTS tenk1;
CREATE TABLE tenk1 (
    unique1     int4,
    unique2     int4,
    two         int4,
    four        int4,
    ten         int4,
    twenty      int4,
    hundred     int4,
    thousand    int4,
    twothousand int4,
    fivethous   int4,
    tenthous    int4,
    odd         int4,
    even        int4,
    stringu1    name,
    stringu2    name,
    string4     name
);
COPY tenk1 FROM '.../src/test/regress/data/tenk.data';
EOF
      done;
    ) &
  done
  wait

  for ((j=1;j<=numjobs;j++)); do dropdb db$j & done
  wait
  grep -A3 -E '(ERROR|could not read blocks )' server.log && break;
done

pg_ctl stop

It fails for me as follows:
iteration 13 (jobs: 25)
Sun Apr 13 05:31:47 AM UTC 2025
iteration 14 (jobs: 67)
Sun Apr 13 05:31:50 AM UTC 2025
dropdb: error: database removal failed: ERROR:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-13 05:31:58.930 UTC [1153451] LOG:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-13 05:31:58.930 UTC [1153451] CONTEXT:  completing I/O on behalf of process 1153456
2025-04-13 05:31:58.930 UTC [1153451] STATEMENT:  DROP DATABASE db5;
2025-04-13 05:31:58.930 UTC [1153456] ERROR:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-13 05:31:58.930 UTC [1153456] STATEMENT:  DROP DATABASE db6;
2025-04-13 05:31:58.931 UTC [1034758] LOG:  checkpoint complete: wrote 3 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.002 s; sync files=0, longest=0.000 s, average=0.000 s; distance=18 kB, estimate=458931 kB; lsn=16/54589E08, redo lsn=16/54586F88
2025-04-13 05:31:58.931 UTC [1034758] LOG:  checkpoint starting: immediate force wait

I reproduced this error on three different machines (all running
Ubuntu 24.04, two with kernel 6.8 and one with 6.11), with PGDATA
located on tmpfs.

Best regards,
Alexander Lakhin
Neon (https://neon.tech)
