From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
Subject: | Re: AIO v2.5 |
Date: | 2025-03-20 01:11:18 |
Message-ID: | 20250320011118.8d.nmisch@google.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 19, 2025 at 06:17:37PM -0400, Andres Freund wrote:
> On 2025-03-19 14:25:30 -0700, Noah Misch wrote:
> > commit 55b454d wrote:
> > > aio: Infrastructure for io_method=worker
> >
> > > + /* Try to launch one. */
> > > + child = StartChildProcess(B_IO_WORKER);
> > > + if (child != NULL)
> > > + {
> > > + io_worker_children[id] = child;
> > > + ++io_worker_count;
> > > + }
> > > + else
> > > + break; /* XXX try again soon? */
> >
> > I'd change the comment to something like one of:
> >
> > retry after DetermineSleepTime()
> > next LaunchMissingBackgroundProcesses() will retry in <60s
>
> Hm, we retry more frequently that that if there are new connections... Maybe
> just "try again next time"?
Works for me.
> > On Tue, Mar 18, 2025 at 04:12:18PM -0400, Andres Freund wrote:
> > > Subject: [PATCH v2.10 08/28] bufmgr: Implement AIO read support
> >
> > Some comments about BM_IO_IN_PROGRESS may need updates. This paragraph:
> >
> > * The BM_IO_IN_PROGRESS flag acts as a kind of lock, used to wait for I/O on a
> > buffer to complete (and in releases before 14, it was accompanied by a
> > per-buffer LWLock). The process doing a read or write sets the flag for the
> > duration, and processes that need to wait for it to be cleared sleep on a
> > condition variable.
>
> First draft:
> * The BM_IO_IN_PROGRESS flag acts as a kind of lock, used to wait for I/O on a
> buffer to complete (and in releases before 14, it was accompanied by a
> per-buffer LWLock). The process start a read or write sets the flag. When the
s/start/starting/
> I/O is completed, be it by the process that initiated the I/O or by another
> process, the flag is removed and the Buffer's condition variable is signalled.
> Processes that need to wait for the I/O to complete can wait for asynchronous
> I/O to using BufferDesc->io_wref and for BM_IO_IN_PROGRESS to be unset by
s/to using/by using/
> sleeping on the buffer's condition variable.
Sounds good.
> > And these individual lines from "git grep BM_IO_IN_PROGRESS":
> >
> > * i.e at most one BM_IO_IN_PROGRESS bit is set per proc.
> >
> > The last especially.
>
> Huh - yea. This isn't a "new" issue, I think I missed this comment in 16's
> 12f3867f5534. I think the comment can just be deleted?
Hmm, yes, it's orthogonal to $SUBJECT and deletion works fine.
> > * I/O already in progress. We already hold BM_IO_IN_PROGRESS for the
> > * only one process at a time can set the BM_IO_IN_PROGRESS bit.
> > * only one process at a time can set the BM_IO_IN_PROGRESS bit.
>
> > For the other three lines and the paragraph, the notion
> > of a process "holding" BM_IO_IN_PROGRESS or being the process to "set" it or
> > being the process "doing a read" becomes less significant when one process
> > starts the IO and another completes it.
>
> Hm. I think they'd be ok as-is, but we can probably improve them. Maybe
Looking again, I agree they're okay.
>
> * Now it's safe to write buffer to disk. Note that no one else should
> * have been able to write it while we were busy with log flushing because
> * we got the exclusive right to perform I/O by setting the
> * BM_IO_IN_PROGRESS bit.
That's fine too. Maybe s/perform/stage/ or s/perform/start/.
> > I see this relies on md_readv_complete having converted "result" to blocks.
> > Was there some win from doing that as opposed to doing the division here?
> > Division here ("blocks_read = prior_result.result / BLCKSZ") would feel easier
> > to follow, to me.
>
> It seemed like that would be wrong layering - what if we had an smgr that
> could store data in a compressed format? The raw read would be of a smaller
> size. The smgr API deals in BlockNumbers, only the md.c layer should know
> about bytes.
I hadn't thought of that. That's a good reason.
From | Date | Subject | |
---|---|---|---|
Next Message | Sutou Kouhei | 2025-03-20 01:24:55 | Re: Make COPY format extendable: Extract COPY TO format implementations |
Previous Message | David G. Johnston | 2025-03-20 01:10:33 | Re: Doc: Fixup misplaced filelist.sgml entities and add some commentary |