From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
Subject: | Re: AIO v2.5 |
Date: | 2025-03-24 22:55:22 |
Message-ID: | havtpn6xuwpybrveymzvvym4cqi37jxhcsfququk4lioyasvsq@eitpmihs2akh |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-03-24 11:43:47 -0400, Andres Freund wrote:
> > I didn't write a test to prove it, but I'm suspecting we'll reach the above
> > ERROR with this sequence:
> >
> > CREATE TEMP TABLE foo ...;
> > [some command that starts reading a block of foo into local buffers, then ERROR with IO ongoing]
> > DROP TABLE foo;
>
> That seems plausible. I'll try to write a test after this email.
FWIW, a test did indeed confirm that. Luckily:
> > DropRelationAllLocalBuffers() calls InvalidateLocalBuffer(bufHdr, true). I
> > think we'd need to do like pgaio_shutdown() and finish all IOs (or all IOs for
> > the particular rel) before InvalidateLocalBuffer(). Or use something like the
> > logic near elog(ERROR, "buffer is pinned in InvalidateBuffer") in
> > corresponding bufmgr code.
>
> Just waiting for the IO in InvalidateBuffer() does seem like the best bet to
> me.
This did indeed resolve the issue.
I've extended the testsuite to test for that and a bunch more things. Working
on sending out a new version...
> > We call read_stream_get_block() while in batchmode, so the stream callback
> > needs to be ready for that. A complicated case is
> > collect_corrupt_items_read_stream_next_block(), which may do its own buffer
> > I/O to read in a vmbuffer for VM_ALL_FROZEN(). That's feeling to me like a
> > recipe for corner cases reaching ERROR "starting batch while batch already in
> > progress". Are there mitigating factors?
>
> Ugh, yes, you're right. heap_vac_scan_next_block() is also affected.
>
> I don't think "starting batch while batch already in progress" is the real
> issue though - it seems easy enough to avoid starting another batch inside,
> partially because current cases seem unlikely to need to do batchable IO
> inside. What worries me more is that code might block while there's
> unsubmitted IO - which seems entirely plausible.
>
>
> I can see a few approaches:
>
> 1) Declare that all read stream callbacks have to be careful and cope with
> batch mode
>
> I'm not sure how viable that is, not starting batches seems ok, but
> ensuring that the code doesn't block is a different story.
>
>
> 2) Have read stream users opt-in to batching
>
> Presumably via a flag like READ_STREAM_USE_BATCHING. That'd be easy enough
> to implement and to add to the callsites where that's fine.
>
>
> 3) Teach read stream to "look ahead" far enough to determine all the blocks
> that could be issued in a batch outside of batchmode
>
> I think that's probably not a great idea, it'd lead us to looking further
> ahead than we really need to, which could increase "unfairness" in
> e.g. parallel sequential scan.
>
>
> 4) Just defer using batch mode for now
>
> It's a nice win with io_uring for random IO, e.g. from bitmap heap scans ,
> but there's no need to immediately solve this.
>
>
> I think regardless of what we go for, it's worth splitting
> "aio: Basic read_stream adjustments for real AIO"
> into the actually basic parts (i.e. introducing sync_mode) from the not
> actually so basic parts (i.e. batching).
>
>
> I suspect that 2) would be the best approach. Only the read stream user knows
> what it needs to do in the callback.
I still think 2) would be the best option.
Writing a patch for that.
If a callback may sometimes need to block, it can still opt into
READ_STREAM_USE_BATCHING, by submitting all staged IO before blocking.
The hardest part is to explain the flag. Here's my current attempt:
/* ---
* Opt-in to using AIO batchmode.
*
* Submitting IO in larger batches can be more efficient than doing so
* one-by-one, particularly for many small reads. It does, however, require
* the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
* batching (c.f. pgaio_enter_batchmode()). Basically, the callback may not:
* a) block without first calling pgaio_submit_staged(), unless a
* to-be-waited-on lock cannot be part of a deadlock, e.g. because it is
* never acquired in a nested fashion
* b) directly or indirectly start another batch pgaio_enter_batchmode()
*
* As this requires care and is nontrivial in some cases, batching is only
* used with explicit opt-in.
* ---
*/
#define READ_STREAM_USE_BATCHING 0x08
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Jim Nasby | 2025-03-24 23:02:31 | Re: Vacuum statistics |
Previous Message | Tom Lane | 2025-03-24 22:45:54 | Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment |