restore-command error handling

From: Sebastiaan Mannem <sebas(at)mannem(dot)nl>
To: pgsql-general list <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: restore-command error handling
Date: 2021-12-25 08:04:51
Message-ID: 7f385d14be564e5d8fecf9a3c201c8b5@mannem.nl
Lists: pgsql-general

Hi,

This should probably go to pgsql-hackers, but https://www.postgresql.org/list/ says 'You must try elsewhere first!', and this list seemed the second-best choice...

I wanted to point you to this GitHub issue:

https://github.com/wal-g/wal-g/issues/1126

Basically, Postgres only knows of 3 types of return codes:

0: No problem, next WAL file...

1 - 125: End of timeline? OK, let's stop recovery and go online

>=126: Ouch, big problem. Better not proceed, but error out with a FAIL instead
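As a rough illustration of the three buckets above, here is a minimal Python sketch. It only mirrors the description in this message and is not taken from the Postgres source; the function name and the returned labels are made up for the example.

```python
def classify_restore_exit(code: int) -> str:
    """Classify a restore_command exit code into the three buckets listed above.

    Hypothetical helper: the labels are illustrative, not Postgres identifiers.
    """
    if not 0 <= code <= 255:
        raise ValueError(f"not a valid exit code: {code}")
    if code == 0:
        return "continue"      # no problem, move on to the next WAL file
    if code <= 125:
        return "end_recovery"  # treated as end of timeline: stop recovery, go online
    return "fatal"             # 126 and above: big problem, do not proceed
```

With this sketch, a code such as 78 falls into the "end_recovery" bucket, which is exactly the problem described below.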

Looking at https://tldp.org/LDP/abs/html/exitcodes.html, exit codes beyond 125 are all OS-related.

Like 'Permission problem or command is not an executable', or 'Control-C is fatal error signal 2'.

I would assume that exit code 78 would be a better choice to distinguish restore_command errors that are not OS-related but should still end in 'Ouch, big problem. Better not proceed, but error out with a FAIL instead'.

I think I will work on a fix for wal-g to distinguish exit codes better, but since all I can currently do is exit with a code >= 126, I wanted to bring this to the Postgres community too.
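To make the current workaround concrete, here is a minimal sketch of a wrapper that escalates every failure other than "file not found" into the >= 126 range. Everything in it is an assumption for illustration: the names are hypothetical, and the default set of "missing file" codes ({1}) is a placeholder, not what wal-g actually returns.

```python
import subprocess

FATAL_EXIT = 126  # lowest code in the "big problem" range described in this message


def map_exit_code(tool_code, missing_file_codes=frozenset({1})):
    """Map a tool's exit code onto the three ranges Postgres distinguishes.

    missing_file_codes is a hypothetical set of codes the tool uses for
    "requested WAL file does not exist".
    """
    if tool_code == 0:
        return 0  # success: pass through unchanged
    if tool_code in missing_file_codes:
        return 1  # benign "no such file": stay in the 1 - 125 range
    # Any other failure, including a negative return code that subprocess
    # reports for a signal, is pushed up to at least 126.
    return max(tool_code, FATAL_EXIT)


def run_and_map(argv):
    """Run the real restore command and return the mapped exit code."""
    try:
        result = subprocess.run(argv)
    except FileNotFoundError:
        return 127  # conventional "command not found"
    except PermissionError:
        return 126  # conventional "command is not executable"
    return map_exit_code(result.returncode)
```

A script built around run_and_map would pass its own arguments through and exit with the returned code. The drawback is the one raised in this message: an application-level failure becomes indistinguishable from an OS-level one.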

Furthermore, this goes beyond wal-g; it applies to basically everything that runs as a restore_command...

Would you consider adding another exit code to the list, so that restore_commands don't need to exit with error codes that were meant to signal OS-level issues?

I wanted to end with this quote from the second link I pointed to:

Ending a script with exit 127 would certainly cause confusion when troubleshooting (is the error code a "command not found" or a user-defined one?).

However, many scripts use an exit 1 as a general bailout-upon-error.

Since exit code 1 signifies so many possible errors, it is not particularly useful in debugging.

Which to me applies not just to 127, but to all exit codes beyond 125...

Thanks.
