Re: .ready and .done files considered harmful

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: .ready and .done files considered harmful
Date: 2024-11-13 16:05:21
Message-ID: 202411131605.m66syq5i5ucl@alvherre.pgsql
Lists: pgsql-hackers

Hello, sorry for necro-posting here:

On 2021-May-03, Robert Haas wrote:

> I and various colleagues of mine have from time to time encountered
> systems that got a bit behind on WAL archiving, because the
> archive_command started failing and nobody noticed right away.

We've recently had a couple of cases where the archiver hasn't been able
to keep up on systems running 13 and 14 because of this problem, causing
serious production outages. Obviously that's not a great experience. I
understand that this has been significantly improved in branch 15 by
commit beb4e9ba1652, the fix in this thread; we hypothesize that neither
of these production problems would have occurred if the users had been
running the optimized pgarch.c code.

However, that commit was not backpatched. I think that was the correct
decision at the time, because it wasn't a trivial fix. It was
significantly modified by 1fb17b190341 a month later, both to fix a
critical bug and to make some efficiency improvements.

Now that the code has been battle-tested, I think we can consider
putting it into the older branches. I did a quick cherry-pick
experiment, and I found that it backpatches cleanly to 14. It doesn't
apply cleanly to 13, for lack of d75288fb27b8, which is far too invasive
to backpatch, and I don't think we should rewrite the code so that it
works on the previous state. Fortunately 13 only has one more year to
live, so I don't feel too bad about leaving it as is.

So, my question now is, would there be much opposition to backpatching
beb4e9ba1652 + 1fb17b190341 to REL_14_STABLE?

(On the other hand, we can always blame users for failing to implement
WAL archiving "correctly" ... but from my perspective, this is an
embarrassing Postgres problem, and one that's relatively easy to solve
with very low risk.)

Thanks,

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Para tener más hay que desear menos"
