Re: Add notes to pg_combinebackup docs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Martín Marqués <martin(dot)marques(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add notes to pg_combinebackup docs
Date: 2024-04-25 16:16:33
Message-ID: CA+Tgmoa2RFshuG3bKudDSUvnhQ_7rKuo+sag2Bh4Aq0Er1mmNQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 24, 2024 at 3:08 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> LGTM; only one small comment which you can ignore if you feel it's not worth
> the extra words.
>
> + <literal>pg_combinebackup</literal> when the checksum status of the
> + cluster has been changed; see
>
> I would have preferred that this sentence included the problematic period for
> the change, perhaps "..has been changed after the initial backup." or ideally
> something even better. In other words, clarifying that if checksums were
> enabled before any backups were taken this limitation is not in play. It's not
> critical as the link aptly documents this, it just seems like the sentence is
> cut short.

This was somewhat deliberate. The phraseology that you propose doesn't
exactly seem incorrect to me. However, consider the scenario where
someone takes a full backup A, an incremental backup B based on A, and
another incremental backup C based on B. Then, they combine A with B
to produce X, remove A and B, and later combine X with C. When we talk
about the "initial backup", are we talking about A or X? It doesn't
quite matter, in the end, because if X has a problem it must be
because A had a similar problem, and it's also sort of meaningless to
talk about when X was taken, because it wasn't ever taken from the
origin server; it was reconstructed. And it doesn't matter what was
happening on the origin server at the time it was reconstructed, but
rather what was happening on the origin server at the time its inputs
were taken. Its first input could itself be a reconstruction, in which
case the time of that reconstruction doesn't matter either; there can
be any number of levels here.
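
As an illustration only, the A/B/C/X scenario above can be modeled in a few
lines of Python. This is a toy sketch, not anything pg_combinebackup does
internally: the Backup class, the taken_times and relevant_period helpers, and
the integer timestamps are all hypothetical names and values made up for this
example. It shows one way to read "the relevant time period": follow every
reconstructed backup down to the backups that were actually taken from the
origin server, and ignore the moments at which reconstructions were run.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Backup:
    """Hypothetical model: a backup is either taken or reconstructed."""
    name: str
    taken_at: Optional[int] = None      # set only for backups taken from the origin server
    inputs: Tuple["Backup", ...] = ()   # set only for reconstructed backups


def taken_times(backup):
    """Return the times at which the underlying real backups were taken."""
    if not backup.inputs:
        return [backup.taken_at]
    times = []
    for item in backup.inputs:
        # A reconstruction contributes nothing itself; only its inputs count,
        # through any number of levels.
        times.extend(taken_times(item))
    return times


def relevant_period(*chain):
    """Span of origin-server time covered by the backups being combined."""
    times = [t for b in chain for t in taken_times(b)]
    return min(times), max(times)


a = Backup("A", taken_at=1)        # full backup (timestamps are made up)
b = Backup("B", taken_at=2)        # incremental based on A
c = Backup("C", taken_at=3)        # incremental based on B
x = Backup("X", inputs=(a, b))     # A combined with B; when this ran is not recorded

print(relevant_period(x, c))       # (1, 3)
print(relevant_period(a, b, c))    # (1, 3)
```

Under this model, combining X with C covers the same period as combining A, B,
and C directly, which is why the question of whether the "initial backup" is A
or X does not change the answer.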

I feel that all of this makes it a little murky to talk about what
happened after the "initial backup". The actual original full backup
need not even exist any more at the time of reconstruction, as in the
example above. Now, I think the user will probably still get the
point, but I also think they'll probably get the point without the
extra verbiage. I think that it will be natural for people to imagine
that what matters is not whether the checksum status has ever changed,
but whether it has changed within the relevant time period, whatever
that is exactly. If they have a more complicated situation where it's
hard to reason about what the relevant time period is, my hope is that
they'll click on the link and that the longer text they see on the
other end will help them think the situation through.

Again, this is not to say that what you're proposing is necessarily
wrong; I'm just explaining my own thinking.

--
Robert Haas
EDB: http://www.enterprisedb.com
