Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-19 19:28:39
Message-ID: wizbgc4poefiprsm4xnu7bwxycpxn6mr56ys273ifgvbuoripd@orch7y33u7ot

Hi,

On 2025-03-19 13:20:17 -0400, Melanie Plageman wrote:
> On Tue, Mar 18, 2025 at 4:12 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Attached is v2.10,
>
> I noticed a few comments could be improved in 0011: bufmgr: Use AIO
> in StartReadBuffers()
> [...]

Yep.

> Above and in AsyncReadBuffers()
>
> * To support retries after short reads, the first operation->nblocks_done is
> * buffers are skipped.
>
> can't quite understand this

Heh, yea, it's easy to misunderstand. "short read" in the sense of a partial
read, i.e. a preadv() that only read some of the blocks, not all. I'm
replacing the "short" with partial.

(also removed the superfluous "is")
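To make the "skip the first nblocks_done buffers" idea concrete, here is a minimal sketch of a retry loop after partial reads. It is not the bufmgr code: read_with_retries(), read_blocks_fn, fake_partial_read() and SKETCH_BLOCK_SIZE are hypothetical names, and a callback stands in for preadv() so that a partial read can be simulated deterministically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 8192	/* hypothetical block size for this sketch */

/*
 * Stand-in for preadv(): reads up to nblocks blocks starting at first_block
 * into dst and returns how many whole blocks were actually read, which may be
 * fewer than requested (a partial read).
 */
typedef int (*read_blocks_fn) (char *dst, int first_block, int nblocks);

/*
 * Read nblocks blocks, retrying after partial reads. On each retry the first
 * nblocks_done buffers are skipped, since they already hold valid data.
 * Returns the number of blocks read.
 */
static int
read_with_retries(read_blocks_fn do_read, char *buffers,
				  int first_block, int nblocks)
{
	int			nblocks_done = 0;

	while (nblocks_done < nblocks)
	{
		int			remaining = nblocks - nblocks_done;
		int			got = do_read(buffers + (size_t) nblocks_done * SKETCH_BLOCK_SIZE,
								  first_block + nblocks_done,
								  remaining);

		if (got <= 0)
			break;				/* error or EOF: give up in this sketch */
		assert(got <= remaining);
		nblocks_done += got;
	}
	return nblocks_done;
}

/* Fake reader that returns at most 2 blocks per call, to force retries. */
static int	fake_calls = 0;

static int
fake_partial_read(char *dst, int first_block, int nblocks)
{
	int			n = nblocks < 2 ? nblocks : 2;

	/* Fill each block with its block number so the result can be checked. */
	for (int i = 0; i < n; i++)
		memset(dst + (size_t) i * SKETCH_BLOCK_SIZE, first_block + i,
			   SKETCH_BLOCK_SIZE);
	fake_calls++;
	return n;
}
```

With the fake reader, a request for 5 blocks completes over three calls (2 + 2 + 1), each one starting at the first buffer that has not been filled yet.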

> * A secondary benefit is that this would allows us to measure the time in
> * pgaio_io_acquire() without causing undue timer overhead in the common,
> * non-blocking, case. However, currently the pgstats infrastructure
> * doesn't really allow that, as it a) asserts that an operation can't
> * have time without operations b) doesn't have an API to report
> * "accumulated" time.
> */
>
> allows->allow
>
> What would the time spent in pgaio_io_acquire() be reported as?

I'd report it as additional time for the IO we're trying to start, as that
wait would otherwise not happen.

> And what is "accumulated" time here? It seems like you just add the time to
> the running total and that is already accumulated.

Afaict there currently is no way to report a time delta to
pgstat. pgstat_count_io_op_time() computes the time since
pgstat_prepare_io_time(). Due to the assertions that time cannot be reported
for an operation with a zero count, we can't just do
pgstat_prepare_io_time(); ...; pgstat_count_io_op_time();
twice, with the first one passing cnt=0.
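A toy model of that constraint, purely for illustration: none of this is the pgstat code, and SketchIoStats, sketch_count_io_time(), sketch_add_io_time_delta() and sketch_stats_valid() are hypothetical names. The first function mimics an API that only reports time as "now minus start" together with a non-zero count; the second is the kind of delta API that, as far as I can tell, does not exist today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-operation IO statistics. */
typedef struct SketchIoStats
{
	uint64_t	count;			/* number of operations */
	uint64_t	time_us;		/* total time attributed to them */
} SketchIoStats;

/* Invariant modelled here: time cannot exist without operations. */
static bool
sketch_stats_valid(const SketchIoStats *s)
{
	return s->time_us == 0 || s->count > 0;
}

/*
 * Models an API that computes elapsed time since a start timestamp and
 * requires a non-zero count, so it cannot be called with cnt = 0 just to
 * record time.
 */
static void
sketch_count_io_time(SketchIoStats *s, uint64_t start_us, uint64_t now_us,
					 uint32_t cnt)
{
	assert(cnt > 0);
	s->count += cnt;
	s->time_us += now_us - start_us;
	assert(sketch_stats_valid(s));
}

/*
 * Hypothetical "accumulated time" API: adds a separately measured delta
 * (e.g. time spent waiting for an IO handle) to an already counted operation.
 */
static void
sketch_add_io_time_delta(SketchIoStats *s, uint64_t delta_us)
{
	s->time_us += delta_us;
	assert(sketch_stats_valid(s));
}
```

In this model the wait time would be measured only when a wait actually happens and then added as a delta to the IO being started, which avoids taking timestamps in the common non-blocking case.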

Greetings,

Andres Freund
