Quick Links

Re: AIO v2.5

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject:	Re: AIO v2.5
Date:	2025-03-20 01:11:18
Message-ID:	20250320011118.8d.nmisch@google.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Mar 19, 2025 at 06:17:37PM -0400, Andres Freund wrote:
> On 2025-03-19 14:25:30 -0700, Noah Misch wrote:
> > commit 55b454d wrote:
> > > aio: Infrastructure for io_method=worker
> >
> > > + /* Try to launch one. */
> > > + child = StartChildProcess(B_IO_WORKER);
> > > + if (child != NULL)
> > > + {
> > > + io_worker_children[id] = child;
> > > + ++io_worker_count;
> > > + }
> > > + else
> > > + break; /* XXX try again soon? */
> >
> > I'd change the comment to something like one of:
> >
> > retry after DetermineSleepTime()
> > next LaunchMissingBackgroundProcesses() will retry in <60s
>
> Hm, we retry more frequently that that if there are new connections... Maybe
> just "try again next time"?

Works for me.

> > On Tue, Mar 18, 2025 at 04:12:18PM -0400, Andres Freund wrote:
> > > Subject: [PATCH v2.10 08/28] bufmgr: Implement AIO read support
> >
> > Some comments about BM_IO_IN_PROGRESS may need updates. This paragraph:
> >
> > * The BM_IO_IN_PROGRESS flag acts as a kind of lock, used to wait for I/O on a
> > buffer to complete (and in releases before 14, it was accompanied by a
> > per-buffer LWLock). The process doing a read or write sets the flag for the
> > duration, and processes that need to wait for it to be cleared sleep on a
> > condition variable.
>
> First draft:
> * The BM_IO_IN_PROGRESS flag acts as a kind of lock, used to wait for I/O on a
> buffer to complete (and in releases before 14, it was accompanied by a
> per-buffer LWLock). The process start a read or write sets the flag. When the
s/start/starting/
> I/O is completed, be it by the process that initiated the I/O or by another
> process, the flag is removed and the Buffer's condition variable is signalled.
> Processes that need to wait for the I/O to complete can wait for asynchronous
> I/O to using BufferDesc->io_wref and for BM_IO_IN_PROGRESS to be unset by
s/to using/by using/
> sleeping on the buffer's condition variable.

Sounds good.

> > And these individual lines from "git grep BM_IO_IN_PROGRESS":
> >
> > * i.e at most one BM_IO_IN_PROGRESS bit is set per proc.
> >
> > The last especially.
>
> Huh - yea. This isn't a "new" issue, I think I missed this comment in 16's
> 12f3867f5534. I think the comment can just be deleted?

Hmm, yes, it's orthogonal to $SUBJECT and deletion works fine.

> > * I/O already in progress. We already hold BM_IO_IN_PROGRESS for the
> > * only one process at a time can set the BM_IO_IN_PROGRESS bit.
> > * only one process at a time can set the BM_IO_IN_PROGRESS bit.
>
> > For the other three lines and the paragraph, the notion
> > of a process "holding" BM_IO_IN_PROGRESS or being the process to "set" it or
> > being the process "doing a read" becomes less significant when one process
> > starts the IO and another completes it.
>
> Hm. I think they'd be ok as-is, but we can probably improve them. Maybe

Looking again, I agree they're okay.

>
> * Now it's safe to write buffer to disk. Note that no one else should
> * have been able to write it while we were busy with log flushing because
> * we got the exclusive right to perform I/O by setting the
> * BM_IO_IN_PROGRESS bit.

That's fine too. Maybe s/perform/stage/ or s/perform/start/.

> > I see this relies on md_readv_complete having converted "result" to blocks.
> > Was there some win from doing that as opposed to doing the division here?
> > Division here ("blocks_read = prior_result.result / BLCKSZ") would feel easier
> > to follow, to me.
>
> It seemed like that would be wrong layering - what if we had an smgr that
> could store data in a compressed format? The raw read would be of a smaller
> size. The smgr API deals in BlockNumbers, only the md.c layer should know
> about bytes.

I hadn't thought of that. That's a good reason.

In response to

Re: AIO v2.5 at 2025-03-19 22:17:37 from Andres Freund

Responses

Re: AIO v2.5 at 2025-03-20 18:54:14 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Sutou Kouhei	2025-03-20 01:24:55	Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message	David G. Johnston	2025-03-20 01:10:33	Re: Doc: Fixup misplaced filelist.sgml entities and add some commentary