Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: AIO v2.5
Date: 2025-04-07 19:24:42
Message-ID: ftg76ptgts4h353aztjcdoq2kc2ournuxqeypd4t5aovbm4mxz@qur2qrt2k3oy
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-04-05 06:43:52 -0700, Noah Misch wrote:
> Yeah. Maybe this (untested):

Something like that works. I adopted your formulation of this, mine was in
GetLocalVictimBuffer(), which seems slightly less future proof.

> > > If that's right, it would still be nice to reach the right
> > > VALGRIND_MAKE_MEM_DEFINED() without involving bufmgr.
> >
> > I think that would be possible if we didn't do VALGRIND_MAKE_MEM_NOACCESS() in
> > UnpinBuffer()/UnpinLocalBuffer(). But with that I don't see how we can avoid
> > needing to remark the region as accessible?
>
> Yes, it's not that we should remove VALGRIND_MAKE_MEM_DEFINED() from bufmgr.
> I was trying to think about future AIO callers (e.g. RelationCopyStorage())
> and how they'd want things to work.

Ah, I now understand what you mean. I'm inclined to leave that out for now,
we can do that later. I spent a bit of time experimenting with going bigger,
but I think it's important to get skink a bit less red.

> That said, perhaps we should just omit the io_uring-level Valgrind calls and
> delegate the problem to higher layers until there's a concrete use case:

Yes, I think that's the right answer for now. Applied your suggested comment.

I looked for a good place to add a comment to method_io_uring.c, but couldn't
really come up with anything convincing. Then I thought the
* "Start" routines for individual IO operations
comment in aio_io.c might be a good place, but also failed to come up with
anything particularly convincing.

> > > - A complete_local callback solves those problems. However, if the
> > > AIO-defining subxact aborted, then we shouldn't set DEFINED at all, since
> > > the buffer mapping may have changed by the time of complete_local.
> >
> > I don't think that is possible, due to the aio subsystem owned pin?
>
> We currently drop the shared buffer AIO subsystem pin in complete_shared
> (buffer_readv_complete_one() -> TerminateBufferIO()). I was trying to say
> that if we put a VALGRIND_MAKE_MEM_DEFINED in complete_local without changing
> anything else, it would have this problem.

Ah, yes, I see what you mean.

> We probably wouldn't want to move the shared buffer pin drop to
> complete_local without a strong reason.

Agreed.

> > I think the point about non-aio uses is a fair one, but I don't quite know how
> > to best solve it right now, due to the local buffer issue you mentioned. I'd
> > guess that we'd best put it somewhere
> > a) in pgaio_io_process_completion(), if definer==completor || !PGAIO_HF_REFERENCES_LOCAL
> > b) pgaio_io_call_complete_local(), just before calling
> > pgaio_io_call_complete_local() if PGAIO_HF_REFERENCES_LOCAL
>
> I think that would do nothing wrong today, but it uses
> !PGAIO_HF_REFERENCES_LOCAL as a proxy for "target memory is already DEFINED,
> or a higher layer will make it DEFINED". While !PGAIO_HF_REFERENCES_LOCAL
> does imply that today, I think that's a bufmgr-specific conclusion. I have no
> particular reason to expect that to hold for future AIO use cases. As above,
> I'd be inclined to omit the io_uring-level Valgrind calls until we have a
> concrete use case to drive their design. How do you see it?

Yea, I'm not sure either. I think we'll best wait until we have non-bufmgr
AIO to address all this. That'll make it easier to shake out.

I think eventually we should do an explicit
- VALGRIND_CHECK_MEM_IS_ADDRESSABLE() in pgaio_io_start_readv

- VALGRIND_CHECK_MEM_IS_DEFINED() in pgaio_io_start_writev

- VALGRIND_MAKE_MEM_DEFINED() in when a READV completes, although some of the
details of when/where to do that aren't entirely clear to me yet. I suspect
we might have to do it in pgaio_io_start_readv(), because it's harder to
reliably do it later

I've pushed the patches. Thanks for the discussion, somehow the mix of shared
memory with valgrind tracking accessibility/definedness in a process local
manner is somewhat mindbending.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2025-04-07 19:24:43 Re: BAS_BULKREAD vs read stream
Previous Message Andres Freund 2025-04-07 19:16:48 Re: pgsql: Allow NOT NULL constraints to be added as NOT VALID