Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: AIO v2.5
Date: 2025-04-07 16:20:15
Message-ID: 4nervqmqplfr23jrjvkp5tsumi6qgouhgjqlubf7ujrudw2epb@6mszddainl4u
Lists: pgsql-hackers

Hi,

On 2025-04-06 23:00:00 +0300, Alexander Lakhin wrote:
> 02.04.2025 14:58, Andres Freund wrote:
> When running multiple installchecks against a single server (the
> ready-to-use script is attached; I use a more sophisticated version with
> additional patches to make installcheck pass cleanly, but that's not
> required for this case), I've encountered an interesting error related to
> AIO/uring:
> iteration 8: Sun Apr  6 19:22:39 UTC 2025
> installchecks finished: Sun Apr  6 19:23:47 UTC 2025
> 2025-04-06 19:22:44.216 UTC [349525] LOG:  could not read blocks 0..0 in file "base/6179194/2606": Operation canceled
> 2025-04-06 19:22:44.216 UTC [349525] ERROR:  could not read blocks 0..0 in file "base/6179194/2606": Operation canceled

Thanks for the report, clearly something isn't right.

> It reproduces more readily on tmpfs for me; you will probably need to
> increase NUM_INSTALLCHECKS/NUM_ITERATIONS for your machine.

I ran it for a while in a VM; it hasn't triggered yet, neither on xfs nor on
tmpfs.

> server.log contains:
> 2025-04-06 19:22:44.215 UTC [38231] LOG:  checkpoint complete: wrote ...
> 2025-04-06 19:22:44.216 UTC [38231] LOG:  checkpoint starting: immediate force wait flush-all
> 2025-04-06 19:22:44.216 UTC [349525] LOG:  could not read blocks 0..0 in file "base/6179194/2606": Operation canceled
> 2025-04-06 19:22:44.216 UTC [349525] STATEMENT:  alter table parted_copytest
> attach partition parted_copytest_a1 for values in(1);
> 2025-04-06 19:22:44.216 UTC [349525] ERROR:  could not read blocks 0..0 in file "base/6179194/2606": Operation canceled
> 2025-04-06 19:22:44.216 UTC [349525] STATEMENT:  alter table parted_copytest
> attach partition parted_copytest_a1 for values in(1);

Hm. Does the failure vary between occurrences?
- is it always the same statement? Probably not?
- is it always 2606 (i.e. pg_constraint)?
- does the failure always happen around a checkpoint? If so, is it always
  immediate?
- I do assume it's always ECANCELED?
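The questions above can be answered mechanically from server.log. Here is a minimal sketch in Python. It assumes the log_line_prefix visible in the quoted log ('%m [%p] ') and assumes the failures keep the exact "could not read blocks ... in file ...: <error>" wording. summarize_read_failures and the one-second checkpoint window are hypothetical choices for illustration, not anything in the PostgreSQL tree. On Linux/glibc, "Operation canceled" is the strerror() text for ECANCELED, so tallying the error text also covers the last question.

```python
import re
from collections import Counter
from datetime import datetime

# ASSUMPTION: log_line_prefix = '%m [%p] ' (what the quoted log shows):
# millisecond timestamp, time zone, then the PID in brackets.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)
READ_FAIL_RE = re.compile(
    r'could not read blocks \d+\.\.\d+ in file "(?P<path>[^"]+)": (?P<err>.+)$'
)


def summarize_read_failures(lines, window_ms=1000):
    """Hypothetical helper: tally 'could not read blocks' ERRORs by file,
    error text and statement, and count those starting within window_ms
    after a 'checkpoint starting' line (grouped by checkpoint flags)."""
    by_path, by_err, by_stmt, ckpt_kinds = Counter(), Counter(), Counter(), Counter()
    near_checkpoint = 0
    last_ckpt = None  # (timestamp, flags) of the latest checkpoint start
    pending = set()   # PIDs whose failing ERROR still awaits its STATEMENT line

    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # unprefixed continuation line, e.g. a wrapped statement
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        level, pid, msg = m["level"], m["pid"], m["msg"]

        if msg.startswith("checkpoint starting:"):
            last_ckpt = (ts, msg.split(":", 1)[1].strip())
            continue
        if level == "STATEMENT":
            if pid in pending:
                by_stmt[msg] += 1  # first line of the statement only
                pending.discard(pid)
            continue
        # In the quoted log each failure appears at LOG and again at ERROR;
        # count the ERROR line only to avoid double counting.
        if level != "ERROR":
            continue
        f = READ_FAIL_RE.search(msg)
        if not f:
            continue

        by_path[f["path"]] += 1
        by_err[f["err"]] += 1
        pending.add(pid)
        if last_ckpt is not None:
            delta_ms = (ts - last_ckpt[0]).total_seconds() * 1000
            if 0 <= delta_ms <= window_ms:
                near_checkpoint += 1
                ckpt_kinds[last_ckpt[1]] += 1

    return {"by_path": by_path, "by_err": by_err, "by_stmt": by_stmt,
            "near_checkpoint": near_checkpoint, "checkpoint_kinds": ckpt_kinds}
```

Usage would be along the lines of `with open("server.log") as f: print(summarize_read_failures(f))`. Trailing newlines are fine because `.` in the patterns stops before them. Only the first line of each STATEMENT is recorded, since wrapped statement text continues on unprefixed lines.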

> I can reduce the testing procedure to something trivial if that would help.
> The same effect can probably also be achieved with just pgbench...

That'd be very helpful!

Greetings,

Andres Freund
