Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: AIO v2.5
Date: 2025-04-04 19:16:18
Message-ID: s5bbhmfez5pe46yfvqkxkg72zsc2r3knkvapqyz37algbfw2sf@q7jcviovnalu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Sorry for the slow work on this. The cycle times are humonguous due to
valgrind being so slow...

On 2025-04-03 12:40:23 -0700, Noah Misch wrote:
> On Thu, Apr 03, 2025 at 02:19:43PM -0400, Andres Freund wrote:
> > The best fix for that one would, I think, be to have method_io_uring() iterate
> > over the IOV and mark the relevant regions as defined? That does fix the
> > issue at least and does seem to make sense?
>
> Makes sense. Valgrind knows that read() makes its target bytes "defined". It
> probably doesn't have an io_uring equivalent for that.

Correct - and I think it would be nontrivial to add, because there's not easy
syscall to intercept...

> I expect we only need this for local buffers, and it's unclear to me how the
> fix for (4a) didn't fix this.

At that time I didn't apply the fix in 4a) to local buffers, because local
buffers, in HEAD, don't have the valgrind integration. Without that marking
the buffer as NOACCESS would cause all sorts of issues, because it'd be
considered inaccessible even after pinning. As you analyzed, that then ends
up considered undefined due to the MemoryContextAlloc().

> In the general case, we could want client requests as follows:
>
> - If completor==definer and has not dropped pin:
> - Make defined before verifying page. That's all. It might be cleaner to
> do this when first retrieving a return value from io_uring, since this
> just makes up for what Valgrind already does for readv().

Yea, I think it's better to do that in io_uring. It's what I have done in the
attached.

> - If completor!=definer or has dropped pin:
> - Make NOACCESS in definer when definer cedes its own pin.

That's the current behaviour for shared buffers, right?

> - For io_method=worker, make UNDEFINED before starting readv(). It might be
> cleanest to do this when the worker first acts as the owner of the AIO
> subsystem pin, if that's a clear moment earlier than readv().

Hm, what do we need this for?

> - Make DEFINED in completor before verifying page. It might be cleaner to
> do this when the completor first retrieves a return value from io_uring,
> since this just makes up for what Valgrind already does for readv().

I think we can't rely on the marking during retrieving it from io_uring, as
that might have happened in a different backend for a temp buffer. That'd only
happen if we got io_uring events for *another* IO that involved a shared rel,
but it can happen.

> > Not quite sure if we should mark
> > the entire IOV is efined or just the portion that was actually read - the
> > latter is additional fiddly code, and it's not clear it's likely to be helpful?
>
> Seems fine to do the simpler way if that saves fiddly code.

Can't quite decide, it's just at the border of what I consider too
fiddly... See the change to method_io_uring.c in the attached patch.

> > Questions:
> >
> > 1) It'd be cleaner to implement valgrind support in localbuf.c, so we don't
> > need to have special-case logic for that. But it also makes the change less
> > localized and more "impactful", who knows what kind of skullduggery we have
> > been getting away with unnoticed.
> >
> > I haven't written the code up yet, but I don't think it'd be all that much
> > code to add valgrind support to localbuf.
>
> It would be the right thing long-term, and it's not a big deal if it causes
> some false positives initially. So if you're leaning that way, that's good.

It was easy enough.

I saw one related failure, FlushRelationBuffers() didn't pin temporary buffers
before flushing them. Pinning the buffers fixed that.

I don't think it's a real problem to not pin the local buffer during
FlushRelationBuffers(), at least not today. But it seems unnecessarily odd to
not pin it.

I wish valgrind had a way to mark the buffer as inaccessible and then
accessible again, without loosing the defined-ness information...

Greetings,

Andres Freund

Attachment Content-Type Size
v1-0001-localbuf-Add-Valgrind-buffer-access-instrumentati.patch text/x-diff 3.2 KB
v1-0002-aio-Make-AIO-compatible-with-valgrind.patch text/x-diff 7.4 KB
v1-0003-aio-Avoid-spurious-coverity-warning.patch text/x-diff 1.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2025-04-04 19:25:57 Re: Draft for basic NUMA observability
Previous Message Robert Haas 2025-04-04 19:14:20 Re: Reduce "Var IS [NOT] NULL" quals during constant folding