Re: AIO v2.5

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-20 19:52:45
Message-ID: 20250320195245.b5.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 20, 2025 at 02:54:14PM -0400, Andres Freund wrote:
> On 2025-03-19 18:11:18 -0700, Noah Misch wrote:
> > On Wed, Mar 19, 2025 at 06:17:37PM -0400, Andres Freund wrote:
> > > On 2025-03-19 14:25:30 -0700, Noah Misch wrote:
> > > > I see this relies on md_readv_complete having converted "result" to blocks.
> > > > Was there some win from doing that as opposed to doing the division here?
> > > > Division here ("blocks_read = prior_result.result / BLCKSZ") would feel easier
> > > > to follow, to me.
> > >
> > > It seemed like that would be wrong layering - what if we had an smgr that
> > > could store data in a compressed format? The raw read would be of a smaller
> > > size. The smgr API deals in BlockNumbers, only the md.c layer should know
> > > about bytes.
> >
> > I hadn't thought of that. That's a good reason.
>
> I thought that was better documented, but alas, it wasn't. How about updating
> the documentation of smgrstartreadv to the following:
>
> /*
> * smgrstartreadv() -- asynchronous version of smgrreadv()
> *
> * This starts an asynchronous readv IO using the IO handle `ioh`. Other than
> * `ioh` all parameters are the same as smgrreadv().
> *
> * Completion callbacks above smgr will be passed the result as the number of
> * successfully read blocks if the read [partially] succeeds. This maintains
> * the abstraction that smgr operates on the level of blocks, rather than
> * bytes.
> */

That's good. Possibly add "(Buffers for blocks not successfully read might
bear unspecified modifications, up to the full nblocks.)"

In a bit of over-thinking this, I wondered if shared_buffer_readv_complete
would be better named shared_buffer_smgrreadv_complete, to emphasize the
smgrreadv semantics. PGAIO_HCB_SHARED_BUFFER_READV likewise. But I tend to
think not. smgrreadv() has no "result" concept, so the symmetry is limited.

> I briefly had a bug in test_aio's injection point that lead to *increasing*
> the number of bytes successfully read. That triggered an assertion failure in
> bufmgr.c, but not closer to the problem. Is it worth adding an assert against
> that to md_readv_complete? Can't quite decide.

I'd lean yes, if in doubt.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2025-03-20 19:53:11 Re: md.c vs elog.c vs smgrreleaseall() in barrier
Previous Message David G. Johnston 2025-03-20 19:46:29 Have postgres.bki self-identify