Re: AIO v2.3

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Noah Misch <noah(at)leadboat(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Subject: Re: AIO v2.3
Date: 2025-02-12 18:00:22
Message-ID: CA+TgmoZ4ga25sa+08G7N7Y_40r1xT+tXH9mfNOv=OkuOz2xTXg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 11, 2025 at 4:43 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Alternatively we could make pgaio_batch_begin() basically start a critical
> section, but that doesn't seem like a good idea, because too much that needs
> to happen around buffered IO isn't compatible with critical sections.

A critical section sounds like a bad plan.

> Does anybody see a need for batches to be nested? I'm inclined to think that
> that would be indicative of bugs and should therefore error/assert out.

I can imagine somebody wanting to do it, but I think we can just say
no. I mean, it's no different from WAL record construction. There's no
theoretical reason you couldn't want to concurrently construct
multiple WAL records and then submit them one after another, but if
you want to do that, you have to do your own bookkeeping. It seems
fine to apply the same principle here.
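
To make the "one batch at a time" rule concrete, here is a minimal
standalone sketch. It is a toy model, not the AIO patch's API: the
ToyBatch struct and the toy_batch_* functions are hypothetical names
invented for illustration, and nesting is reported through a return
value where real code would presumably error or assert out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
typedef struct ToyBatch
{
    bool in_progress;   /* is a batch currently open? */
    int  staged;        /* IOs staged in the open batch */
} ToyBatch;

static ToyBatch toy_batch = {false, 0};

/*
 * Open a batch. Returns false if one is already open, i.e. nesting is
 * refused rather than tracked.
 */
static bool
toy_batch_begin(void)
{
    if (toy_batch.in_progress)
        return false;
    toy_batch.in_progress = true;
    toy_batch.staged = 0;
    return true;
}

/* Stage one IO into the open batch. */
static void
toy_batch_stage(void)
{
    assert(toy_batch.in_progress);
    toy_batch.staged++;
}

/* Close the batch and return how many IOs were submitted with it. */
static int
toy_batch_end(void)
{
    int n;

    assert(toy_batch.in_progress);
    n = toy_batch.staged;
    toy_batch.staged = 0;
    toy_batch.in_progress = false;
    return n;
}
```

A caller that really wants two batches in flight would have to keep
its own list of pending IOs and feed them through begin/end one batch
after another, which is the "do your own bookkeeping" case.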

> One way we could avoid the need for a mechanism to reset-batch-in-progress
> would be to make batch submission controlled by a flag on the IO. Something
> like
> pgaio_io_set_flag(ioh, PGAIO_HF_BATCH_SUBMIT)
>
> IFF PGAIO_HF_BATCH_SUBMIT is set, the IOs would need to be explicitly
> submitted using something like the existing
> pgaio_submit_staged();
> (although renaming it to something with batch in the name might be
> appropriate)
>
> That way there's no explicit "we are in a batch" state that needs to be reset
> in case of errors.

I'll defer to Thomas or others on whether this is better or worse,
because I don't know. It means that the individual I/Os have to know
that they are in a batch, which isn't necessary with the begin/end
batch interface. But if we're expecting that to happen in a pretty
confined amount of code -- similar to WAL record construction -- then
that might not be a problem anyway.
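
A rough standalone sketch of the flag-based variant, to show where the
knowledge about batching ends up. Everything here is hypothetical:
ToyIo, TOY_HF_BATCH_SUBMIT and the toy_* functions only mimic the
shape of the pgaio_io_set_flag() / pgaio_submit_staged() calls quoted
above. The sketch also assumes that staging an unflagged IO submits
everything staged so far, which the proposal does not spell out.

```c
#include <assert.h>

/* Hypothetical flag, standing in for the proposed PGAIO_HF_BATCH_SUBMIT. */
#define TOY_HF_BATCH_SUBMIT 0x01

/* Hypothetical IO handle; only the flags matter for this sketch. */
typedef struct ToyIo
{
    unsigned flags;
} ToyIo;

static int toy_staged = 0;      /* IOs staged but not yet submitted */
static int toy_submitted = 0;   /* IOs handed to the (imaginary) kernel */

static void
toy_io_set_flag(ToyIo *io, unsigned flag)
{
    io->flags |= flag;
}

/* Explicitly submit everything that is currently staged. */
static void
toy_submit_staged(void)
{
    toy_submitted += toy_staged;
    toy_staged = 0;
}

/*
 * Stage an IO. Without the batch flag it is submitted right away
 * (assumption: together with anything staged earlier); with the flag
 * it waits for an explicit toy_submit_staged().
 */
static void
toy_io_stage(ToyIo *io)
{
    toy_staged++;
    if (!(io->flags & TOY_HF_BATCH_SUBMIT))
        toy_submit_staged();
}
```

There is no "batch in progress" variable to reset after an error in
this shape; the cost is that each IO has to be told it belongs to a
batch.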

I think if you don't go the flag route, I'd do (sub)xact callbacks rather than a
resowner integration, unless you decide you want to support multiple
concurrent batches. You don't really need or want to tie it to a
resowner unless there are multiple objects each of which can have its
own resources.
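
As a sketch of why a callback is enough when there is only one batch:
a single process-wide flag can be cleared by one abort callback, with
no per-object tracking. The code below is a standalone toy model, not
backend code. The ToyXactEvent type and the toy_* functions are
hypothetical stand-ins loosely shaped like the backend's
RegisterXactCallback() mechanism, and the sketch handles only
top-level abort, not subtransactions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event and callback types for this toy model. */
typedef enum ToyXactEvent
{
    TOY_XACT_COMMIT,
    TOY_XACT_ABORT
} ToyXactEvent;

typedef void (*ToyXactCallback) (ToyXactEvent event, void *arg);

/* The toy supports a single registered callback. */
static ToyXactCallback toy_callback = NULL;
static void *toy_callback_arg = NULL;

static void
toy_register_xact_callback(ToyXactCallback cb, void *arg)
{
    toy_callback = cb;
    toy_callback_arg = arg;
}

/* Simulate the end of a transaction by firing the callback. */
static void
toy_fire(ToyXactEvent event)
{
    if (toy_callback != NULL)
        toy_callback(event, toy_callback_arg);
}

/* The only batch state there is: one flag, no per-object resources. */
static bool toy_batch_open = false;

/* On abort, forget any batch that was left open by the failed code. */
static void
toy_batch_xact_callback(ToyXactEvent event, void *arg)
{
    (void) arg;                 /* unused in this sketch */
    if (event == TOY_XACT_ABORT)
        toy_batch_open = false;
}
```

With several concurrent batches, each one would need its own cleanup
entry, and that is the situation a resowner is built for.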

--
Robert Haas
EDB: http://www.enterprisedb.com
