Re: Something else about Redo Logs disappearing

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Peter <pmc(at)citylink(dot)dinoex(dot)sub(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Something else about Redo Logs disappearing
Date: 2020-06-14 13:05:15
Message-ID: CABUevEwVg7MuUySqeGkQ7uJKB2LWBkEgJj+qDeLWndMZpoY7Bg@mail.gmail.com
Lists: pgsql-general

On Sat, Jun 13, 2020 at 10:13 PM Peter <pmc(at)citylink(dot)dinoex(dot)sub(dot)org> wrote:

> On Thu, Jun 11, 2020 at 10:35:13PM +0200, Magnus Hagander wrote:
> ! > Okay. So lets behave like professional people and figure how that
> ! > can be achieved:
> ! > At first, we drop that WAL requirement, because with WAL archiving
> ! > it is already guaranteed that an unbroken chain of WAL is always
> ! > present in the backup (except when we have a bug like the one that
> ! > lead to this discussion).
> ! > So this is **not part of the scope**.
> ! >
> !
> ! I would assume that anybody who deals with backups professionally wouldn't
> ! consider that out of scope,
>
> I strongly disagree. I might suppose You haven't thought this to the
> proper end. See:
>

You may disagree, but I would argue that this is because you are the one
who has not thought it through. But hey, let's agree to disagree.

> You can see that all the major attributes (scheduling, error-handling,
> signalling, ...) of a WAL backup are substantially different to that
> of any usual backup.

> This is a different *Class* of backup object, therefore it needs an
> appropriate infrastructure that can handle these attributes correctly.
>

Yes, this is *exactly* why special-handling the WAL during the base backup
makes a lot of sense.

Is it required? No.
Will it make your backups more reliable? Yes.

But it depends on what your priorities are.

> But, if You never have considered *continuous* archiving, and only
> intend to take a functional momentarily backup of a cluster, then You
> may well have never noticed these differences. I noticed them mainly
> because I did *BUILD* such an infrastructure (the 20 lines of shell
> script, you know).
>

Yes, if you take a simplistic view of your backups, then yes.

> And yes, I was indeed talking about *professional* approaches.
>

Sure.

> ! There is *absolutely* no need for threading to use the current APIs. You
> ! need to run one query, go do something else, and then run another
> ! query.
>
> Wrong. The point is, I dont want to "go do something else", I have to
> exit() and get back to the initiator at that place.
>

That is not a requirement of the current PostgreSQL APIs. (in fact, using
threading would add a significant extra burden there, as libpq does not
allow sharing of connections between threads)

That is a requirement, and indeed a pretty sharp limitation, of the *other*
APIs you are working with, it sounds like.

The PostgreSQL APIs discussed do *not* require you to do an exit(). Nor do
they require any form of threading.

And the fact that you need to do an exit() would negate any threading
anyway, so that seems to be a false argument regardless.
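
To make the "one query, something else, another query" flow concrete, here is
a minimal sketch of a non-exclusive backup driven from a single session. It is
an illustration under assumptions, not a complete backup tool: it assumes the
pre-15 function names pg_start_backup()/pg_stop_backup(), an already open
DB-API connection (for example from psycopg2), and a hypothetical
copy_data_directory callable supplied by the caller. The only hard requirement
it demonstrates is that the same connection stays open between the two queries.

```python
def run_non_exclusive_backup(conn, label, copy_data_directory):
    """Run a non-exclusive base backup on one connection, no threads needed.

    conn: an open DB-API connection that must stay open for the whole backup.
    copy_data_directory: hypothetical callable that copies the cluster files.
    """
    cur = conn.cursor()

    # Query 1: start the backup. Arguments are (label, fast, exclusive);
    # exclusive = false selects the non-exclusive mode.
    cur.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    start_lsn = cur.fetchone()[0]

    # "Go do something else": copy the files while the session stays connected.
    copy_data_directory()

    # Query 2: stop the backup. The would-be contents of backup_label and
    # tablespace_map are returned instead of being written to the data directory.
    cur.execute("SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false)")
    stop_lsn, labelfile, spcmapfile = cur.fetchone()

    return start_lsn, stop_lsn, labelfile, spcmapfile
```

The caller is then responsible for storing labelfile (and spcmapfile, if not
empty) together with the copied files.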

> This is also clearly visible in Laurenz' code: he utilizes two
> unchecked background tasks (processes, in this case) with loose
> coupling for the purpose, as it does not work otherwise.
>

Yes, because he is also trying to work around a severely limited API *on
the other side*.

There are plenty of backup integrations that don't have this limitation. They
all work perfectly fine with no need for exit() and certainly no weird need
for special threading.

> The most interesting point in there appears to be this:
> > that the backup label and tablespace map files are not written to
> > disk. Instead, their would-be contents are returned in *labelfile
> > and *tblspcmapfile,
>
> This is in do_pg_start_backup() - so we actually HAVE this data
> already at the *START* time of the backup!

> Then why in hell do we wait until the END of the backup before we
> hand this data to the operator: at a time when the DVD with the
>

Because it cannot be safely written *into the data directory*.

Now, it could be written *somewhere else*, that is true. And then you would
add an extra step at restore time to rename it back. But then your restore
would now also require a plugin.

> backup is already fixated and cannot be changed anymore, so that
>

You don't need to change the backup, only append to it. If you are
calling pg_stop_backup() at a time when that is no longer possible, then
you are calling pg_stop_backup() at the wrong time.
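
As an illustration of "only append to it", the sketch below adds the label
contents returned by pg_stop_backup() to an existing backup archive without
rewriting it. The assumptions here are mine: the backup is an uncompressed tar
file of the data directory, the member names are relative to the archive root,
and append_backup_label is a hypothetical helper name. Other backup media will
need their own way of appending.

```python
import io
import tarfile
import time


def append_backup_label(tar_path, labelfile, spcmapfile=""):
    """Append backup_label (and tablespace_map, if any) to an uncompressed tar."""
    members = {"backup_label": labelfile}
    if spcmapfile:
        # Only needed when the cluster has tablespaces outside the data directory.
        members["tablespace_map"] = spcmapfile

    # Mode "a" appends to an existing uncompressed tar; existing members stay put.
    with tarfile.open(tar_path, "a") as tar:
        for name, text in members.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = int(time.time())
            info.mode = 0o600
            tar.addfile(info, io.BytesIO(data))
```

On restore the files then land in the restored data tree, which is where they
are needed, and never in the running cluster's data directory.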

> As I can read, there is no difference in the function requirements
> between exclusive and non-exclusive mode, in that regard: the
> backup-label file is NOT necessary in the running cluster data tree,
> BUT it should get into the RESTORED data tree before starting it.
>

Correct. It is in fact actively harmful in the running cluster data tree.

> And I can't find a single one of those "big problems". What I do find
> is just people whining that their cluster doesn't start and they can't
> simply delete a file, even if told so. Like soldier complaining that
> his gun doesn't shoot and he has no idea how to reload.
>

Have you actually tried it? Or dealt with the many people who have run into
corruption around this?

Again, as suggested before, review the discussions that led up to the
changes. There are plenty of examples there.

> ! > I now hope very much that Magnus Hagander will tell some of the
> ! > impeding "failure scenarios", because I am getting increasingly
> ! > tired of pondering about probable ones, and searching the old
> ! > list entries for them, without finding something substantial.
>
> ! Feel free to look at the mailinglist archives. Many of them have been
> ! explained there before. Pay particular attention to the threads around when
> ! the deprecated APIs were actually deprecated.
>
> I *DID* read all that stuff. About hundred messages. It is HORRIBLE.
> I was tearing out my hair in despair.

> To subsume: it all circles around catering for gross pilot error and
> stupidity.
>

Yes, and people not reading the documentation. Or not liking what they read
and therefore ignoring it.

//Magnus
