Re: Something else about Redo Logs disappearing

From: Peter <pmc(at)citylink(dot)dinoex(dot)sub(dot)org>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Something else about Redo Logs disappearing
Date: 2020-06-15 22:26:48
Message-ID: 20200615222648.GA40751@gate.oper.dinoex.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, Jun 14, 2020 at 03:05:15PM +0200, Magnus Hagander wrote:

! > You can see that all the major attributes (scheduling, error-handling,
! > signalling, ...) of a WAL backup are substantially different to that
! > of any usual backup.
!
! > This is a different *Class* of backup object, therefore it needs an
! > appropriate infrastructure that can handle these attributes correctly.
! >
!
! Yes, this is *exactly* why special-handling the WAL during the base backup
! makes a lot of sense.

Certainly. Only I prefer to do the special-handling *outside of* the
base backup.

! Is it required? No.
! Will it make your backups more reliable? Yes.

*shrug* I have no benefit in increasing reliability from 250% to 330%,
if that would be the case at all.

! But, if You never have considered *continuous* archiving, and only
! > intend to take a functional momentarily backup of a cluster, then You
! > may well have never noticed these differences. I noticed them mainly
! > because I did *BUILD* such an infrastructure (the 20 lines of shell
! > script, you know).
! >
!
! Yes, if you take a simplistic view of your backups, then yes.

You appear to sound like an insurance salesman who desperately tries
to sell a third health insurance policy to somebody who already has
two of them, by trying to build on unfounded precariousness.

! ! There is *absolutely* no need for threading to use the current APIs. You
! > ! need to run one query, go do something else, and then run another
! > ! query.
! >
! > Wrong. The point is, I dont want to "go do something else", I have to
! > exit() and get back to the initiator at that place.
! >
!
! That is not a requirement of the current PostgreSQL APIs.

We'll be done with that whole API in a few more lines now. (I'm getting
tired of this.)

! (in fact, using
! threading would add a significant extra burden there, as libpq does not
! allow sharing of connections between threads)

I never said one would need to thread the DB connections.

! That is a requirement, and indeed a pretty sharp limitation, of the *other*
! APIs you are working with, it sounds like.

What "other"?

! The PostgreSQL APIs discussed to *not* require you to do an exit(). Nor do
! they require any form of threading.

Ah, nice try! But, we're *NOT* shifting blame around. We do instead
get things working. We do proper engineering.

! And the fact that you need to do an exit() would negate any threading
! anyway, so that seems to be a false argument regardless.

You do know exactly what I'm talking about.

! This is also clearly visible in Laurenz' code: he utilizes two
! > unchecked background tasks (processes, in this case) with loose
! > coupling for the purpose, as it does not work otherwise.
! >
!
! Yes, because he is also trying to work around a severely limited API *on
! the other side*.

There is no "other" side. There is only *one* side: to get things
working. And for interaction, Jon Postel's law applies:

Be conservative in what you provide, and liberal in what you require.

This is how the Internet was built. The modern-day linux-youngsters
tend to forget that we all stand on the shoulders of giants.

! The most interesting point in there appears to be this:
! > > that the backup label and tablespace map files are not written to
! > > disk. Instead, their would-be contents are returned in *labelfile
! > > and *tblspcmapfile,
! >
! > This is in do_pg_start_backup() - so we actually HAVE this data
! > already at the *START* time of the backup!
!
!
! > Then why in hell do we wait until the END of the backup before we
! > hand this data to the operator: at a time when the DVD with the
! >
!
! Because it cannot be safely written *into the data directory*.
!
! Now, it could be written *somewhere else*, that is true. And then you would
! add an extra step at restore time to rename it back. But then your restore
! would now also require a plugin.

Yes, and as it is now, it requires girl Friday to fetch them from
the line-printer and mix them up - which, as we already got explained,
can end up a *lot* worse. Or, equivalently and as here practically
demonstrated, some consultant trainee writing some script which, when
accidentially invoked twice, creates an inconsistent backup, and
this being invisible to the operator. That's indeed dangerous enough
for my taste.

But lets grab that from the start:
Yes, I didn't trust the docs. Because, as people here are so crazy
about the old API being troublesome and dangerous and must be
deprecated, and the whole thing being so imminent, then there should
be some REASON for that. And from the docs I could not see any reason
- so I supposed there must be something else in pg_start_backup();
something that is not explained in the docs, and that would explain
the whole bohei.

But, in fact, there is no such thing.

First, the backup_label, which should not stay in the running cluster
tree. So, what bad does happen when it stays there? Nothing at all.
The cluster might not start at once. But then, there was a CRASH
before - and there it is normal for some things to be messed. And,
anyway, on productive machines a crash is not supposed to happen.

But nevertheless, this can be solved, by simply deleting the
backup_label during /etc/rc.
What bad could then happen from doing that? Actually nothing - because
the backup_label is only needed between restore and rollforward.
And there is no reboot required between restore and rollforward.

If a power-loss might happen during restore - start anew with a clean
restore.
If a power-loss might happen during rollforward - start anew with a
clean restore.
And then, fix the diesel.

So much for the backup_label. Furthermore, if there is some means of
filesystem snapshots, the backup_label is entirely superfluous.

Next, the checkpoint. That's needed if one wants to build up a
timeline-zoo, and to engage full_page_writes. I prefer to have neither
of these.

Finally, the full_page_writes. The only problem here can be if
Postgres itself writes a block in piecemeal fashion.

Otherwise there will always visible either the old content or the new
content, never something in-between. Because there is only one pointer
to the block, and that does contain a single value. (But indeed, that
might not be true on a quantum computer, or with a non-transactional
filesystem like they happen to have on linux.)

So, issue debunked.

! > backup is already fixated and cannot be changed anymore, so that
! >
!
! You don't need to change the the backup, only append to it. If you are
! calling pg_stop_backup() at a time when that is no longer possible, then
! you are calling pg_stop_backup() at the wrong time.

It's not trivial to add something to a stream after the fact.

! As I can read, there is no difference in the function requirements
! > between exclusive and non-exclusive mode, in that regard: the
! > backup-label file is NOT necessary in the running cluster data tree,
! > BUT it should get into the RESTORED data tree before starting it.
!
! Correct. It is in fact actively harmful in the running cluster data tree.

Great, we're getting to the point - the remaining problem seems that
we have done away with corporal punishment, so people no longer have a
clear understanding about what "actively harmful" means.

! And I can't find a single one of those "big problems". What I do find
! > is just people whining that their cluster doesn't start and they can't
! > simply delete a file, even if told so. Like soldier complaining that
! > his gun doesn't shoot and he has no idea how to reload.
! >
!
! Have you actually tried it? Or dealt with the many people who have run into
! corruption around this?

I wasn't able to reproduce the problem.

But indeed I do know that skill-levels in general are vastly going
down the gully and are already reaching chthonian levels; and the more
so since GitHub et al. have decided to ban mastery.

So, if folks run into corruption, that is no surprise, since it was
*them* who have actively decided to fire all the experienced DBAs and
have the stuff done from Malaysia instead, for cheap. Their business
equates their business, and I couldn't care less.

Caee dismissed.

cheerio,
PMc

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Peter 2020-06-15 22:28:16 Re: Something else about Redo Logs disappearing
Previous Message Laurenz Albe 2020-06-15 19:46:34 Re: Something else about Redo Logs disappearing