Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Noah Misch <noah(at)leadboat(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Subject: Re: AIO v2.5
Date: 2025-03-08 02:11:15
Message-ID: qzxq6mqqozctlfcg2kg5744gmyubicvuehnp4a7up472thlvz2@y5xqgd5wcwhw
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Tom, CCed you since you have worked most on elog.c

On 2025-03-07 16:23:51 -0500, Andres Freund wrote:
> What about pg_io_handles?

While looking at the view I felt motivated to tackle the one FIXME in the
implementation of the view. Namely that the "error_desc" column wasn't
populated (the view did show that there was an error, but not what the error
was).

Which lead me down a sad sad rabbit hole, largely independent of AIO.

A bit of background:

For AIO completion callbacks can signal errors (e.g. a page header failing
validation). That error can be logged in the callback and/or raised later,
e.g. by the query that issued the IO.

AIO callbacks happen in critical sections, which is required to be able to use
AIO for WAL (see README.md for more details).

Currently errors are logged/raised by ereport()s in functions that gets passed
in an elevel, pretty standard.

A few of the ereports() use errcode_for_file_access() to translate an errno to
an sqlerrcode.

Now on to the problem:

The result of an ereport() can't be put into a view, obviously. I didn't think
it'd be good if the each kind of error needed to be implemented twice, once
with ereport() and once to just return a string to put in the view.

I tried a few things:

1) Use errsave() to allow delayed reporting of the error

I encountered a few problems:

- errsave() doesn't allow the log level to be specified, which means it can't
directly be used to LOG if no context is specified.

This could be worked around by always specifying the context, with
ErrorSaveContext.details_wanted = true and having generic code that changes
the elevel to whatever is appropriate and then using ThrowErrorData() to log the
message.

- ersave_start() sets assoc_context to CurrentMemoryContext and
errsave_finish() allocates an ErrorData copy in CurrentMemoryContext

This makes naive use of this approach when logging in a critical section
impossible. If ErrorSaveContext is not passed in an ERROR will be raised,
even if we just want to log. If ErrorSaveContext is used, we allocate
memory in the caller context, which isn't allowed in a critical section.

The only way I saw to work around that was to switch to ErrorContext before
calling errsave(). That's doable, the logging is called from one function
(pgaio_result_report()). That kinda works, but as a consequence we more than
double the memory usage in ErrorContext as errsave_finish() will palloc a
new ErrorData and ThrowErrorData() copies that ErrorData and all its string
back to ErrorContext.

2) Have the error callback format the error using a helper function instead of
using ereport()

Problems:

- errcode_for_file_access() would need to be reimplemented / split into a
function translating an errnode into an sqlerrcode without getting it from
the error data stack

- emitting the log message in a critical section would require either doing
the error formatting in ErrorContext or creating another context with
reserved memory to do so.

- allowing to specify DETAIL, HINT etc basically requires a small elog.c
interface reimplementation

3) Use pre_format_elog_string(), format_elog_string() similar to what guc.c
does for check hooks, via GUC_check_errmsg(), GUC_check_errhint() ...

Problems:

- Requires to duplicate errcode_for_file_access() for similar reason as in 2)

- Not exactly pretty

- Somewhat gnarly, but doable, to make use of %m safe, the way it's done in
guc.h afaict isn't safe:
pre_format_elog_string() is called for each of
GUC_check_{errmsg,errdetail,errhint}. As the global errno might get set
during the format_elog_string(), it'll not be the right one during
the next GUC_check_*.

4) Don't use ereport() directly, but instead put the errstart() in
pgaio_result_report(), before calling the error description callback.

When emitting a log message, call errfinish() after the callback. For the
view, get the message out via CopyErrorData() and free the memory again
using FlushErrorState

Problems:

- Seems extremely hacky

I implemented all, but don't really like any of them.

Unless somebody has a better idea or we agree that one of the above is
actually a acceptable approach, I'm inclined to simply remove the column
containing the description of the error. The window in which one could see an
IO with an error is rather short most of the time anyway and the error will
also be logged.

It's a bit annoying that adding the column later would require revising the
signature of the error reporting callback at that time, but I think that
degree of churn is acceptable.

The main reason I wanted to write this up is that it seems that we're just
lacking some infrastructure here.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2025-03-08 02:36:31 Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?
Previous Message Michael Paquier 2025-03-08 01:57:38 Re: per backend WAL statistics