Quick Links

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Shruthi Gowda <gowdashru(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject:	Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Date:	2021-08-26 17:24:46
Message-ID:	20210826172446.GH17906@tamriel.snowman.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Greetings,

* Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote:
> > Yes, we're talking about either incremental (or perhaps differential)
> > backup where only the files which are actually different would be backed
> > up. Just like with PG, I can't provide any complete guarantees that
> > we'd be able to actually make this possible after a major version with
> > pgBackRest with this change, but it definitely isn't possible *without*
> > this change. I can't see any reason why we wouldn't be able to do a
> > checksum-based incremental backup though (which would be *much* faster
> > than a regular backup) once this change is made and have that be a
> > reliable and trustworthy backup. I'd want to think about it more and
> > discuss it with David in some detail before saying if we could maybe
> > perform a timestamp-based incremental backup (without checksum'ing the
> > files, as we do in normal situations), but that would really just be a
> > bonus.
>
> Well, it would be nice to know exactly how it would help pgBackRest if
> that is one of the reasons we are adding this feature.

pgBackRest keeps a manifest for every file in the PG data directory that
is backed up and we identify that file by the filename. Further, we
calculate a checksum for every file. If the filenames didn't change
then we'd be able to compare the file in the new cluster against the
file and checksum in the manifest in order to be able to perform the
incremental/differential backup. We don't store the inodes in the
manifest though, and we don't have any concept of looking at multiple
data directories at the same time or anything like that (which would
also mean that the old data directory would have to be kept around for
that to even work, which seems like a good bit of additional
complication and risk that someone might start up the old cluster by
accident..).

That's how it'd be very helpful to pgBackRest for the filenames to be
preserved across pg_upgrade's.

> > > > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > > > this code for that reason?
> > > >
> > > > That this would help with TDE (of which there seems little doubt...) is
> > > > an additional benefit to this. Specifically, taking the existing work
> > > > that's already been done to allow block-by-block encryption and
> > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > > > number as the IV, just like many disk encryption systems do, avoids the
> > > > concerns that were brought up about using LSN for the IV with CTR and
> > > > it's certainly not difficult to do, but it does depend on this change.
> > > > This was all discussed previously and it sure looks like a sensible
> > > > approach to use that mirrors what many other systems already do
> > > > successfully.
> > >
> > > Well, I would think we would not add this for TDE until we were sure
> > > someone was working on adding TDE.
> >
> > That this would help with TDE is what I'd consider an added bonus.
>
> Not if we have no plans to implement TDE, which was my point. Why not
> wait to see if we are actually going to implement TDE rather than adding
> it now. It is just so obvious, why do I have to state this?

There's been multiple years of effort put into implementing TDE and I'm
sure hopeful that it continues as I'm trying to put effort into moving
it forward myself. I'm a bit baffled by the idea that we're just
suddenly going to stop putting effort into TDE as it is brought up time
and time again by clients that I've talked to as one of the few reasons
they haven't moved to PG yet- I can't believe that hasn't been
experienced by folks at other organizations too, I mean, there's people
maintaining forks of PG specifically for TDE ...

Seems like maybe we were both seeing something as obvious to the other
that wasn't actually the case.

> > > > > > make this required, general improved sanity when working with pg_upgrade
> > > > > > is frankly a benefit in its own right too...). If the additional code
> > > > >
> > > > > How? I am not aware of any advantage except cosmetic.
> > > >
> > > > Having to resort to matching up inode numbers between the two clusters
> > > > after a pg_upgrade to figure out what files are actually the same
> > > > underneath is a pain that goes beyond just cosmetics imv. Removing that
> > > > additional level that admins, and developers for that matter, have to go
> > > > through would be a nice improvement on its own.
> > >
> > > OK, I was just not aware anyone did that, since I have never hard anyone
> > > complain about it before.
> >
> > I've certainly done it and I'd be kind of surprised if others haven't,
> > but I've also played a lot with pg_dump in various modes, so perhaps
> > that's not a great representation. I've definitely had to explain to
> > clients why there's a whole different set of filenames after a
> > pg_upgrade and why that is the case for an 'in place' upgrade before
> > too.
>
> Uh, so I guess I am right that few people have mentioned this in the
> past. Why were users caring about the file names?

This is a bit baffling to me. Users and admins certainly care about
what files their data is stored in and knowing how to find them.
Covering the data directory structure is a commonly asked for part of
the training that I regularly do for clients.

> > > > > > was a huge burden or even a moderate one then that might be an argument
> > > > > > against, but it hardly sounds like it will be given Robert's thorough
> > > > > > analysis so far and the (admittedly not complete, but not that far from
> > > > > > it based on the DB OID review) proposed patch.
> > > > >
> > > > > I am find to add it if it is minor, but I want to see the calculus of
> > > > > its value vs complexity, which I have not seen spelled out.
> > > >
> > > > I feel that this, along with the prior discussions, spells it out
> > > > sufficiently given the patch's complexity looks to be reasonably minor
> > > > and very similar to the existing things that pg_upgrade already does.
> > > > Had pg_upgrade done this in the first place, I don't think there would
> > > > have been nearly this amount of discussion about it.
> > >
> > > Well, there is a reason pg_upgrade didn't initially do this --- because
> > > it adds complexity, and potentially makes future changes to pg_upgrade
> > > necessary if the server behavior changes.
> >
> > I have a very hard time seeing what changes might happen in the server
> > in this space that wouldn't have an impact on pg_upgrade, with or
> > without this.
>
> I don't know, but I have to ask since I can't know the future, so any
> "preseration" has to be studied.

We can gain, perhaps, some insight looking into the past and that seems
to indicate that this is certainly a very stable part of the server code
in the first place, which would imply that it's unlikely that there'll
be much need to adjust this code in the future in the first place.

> > > I am not saying this change is wrong, but I think the reasons need to be
> > > stated in this thread, rather than just moving forward.
> >
> > Ok, they've been stated and it seems to at least Robert and myself that
> > this is worthwhile to at least continue through to a concluded patch,
> > after which we can contemplate that patch's complexity against these
> > reasons.
>
> OK, that works for me. What bothers me is that the Desirability of this
> changes has not be clearly stated in this thread.

I hope that this email and the many many prior ones have gotten across
the desirability of the change.

Thanks,

Stephen

In response to

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce) at 2021-08-26 17:16:01 from Bruce Momjian

Responses

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce) at 2021-08-26 17:37:50 from Bruce Momjian

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2021-08-26 17:34:48	Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Previous Message	Robert Haas	2021-08-26 17:20:38	Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)