Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Shruthi Gowda <gowdashru(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Date: 2021-08-26 17:37:50
Message-ID: 20210826173750.GK22637@momjian.us
Lists: pgsql-hackers

On Thu, Aug 26, 2021 at 01:24:46PM -0400, Stephen Frost wrote:
> Greetings,
>
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote:
> > > Yes, we're talking about either incremental (or perhaps differential)
> > > backup where only the files which are actually different would be backed
> > > up. Just like with PG, I can't provide any complete guarantees that
> > > we'd be able to actually make this possible after a major version with
> > > pgBackRest with this change, but it definitely isn't possible *without*
> > > this change. I can't see any reason why we wouldn't be able to do a
> > > checksum-based incremental backup though (which would be *much* faster
> > > than a regular backup) once this change is made and have that be a
> > > reliable and trustworthy backup. I'd want to think about it more and
> > > discuss it with David in some detail before saying if we could maybe
> > > perform a timestamp-based incremental backup (without checksum'ing the
> > > files, as we do in normal situations), but that would really just be a
> > > bonus.
> >
> > Well, it would be nice to know exactly how it would help pgBackRest if
> > that is one of the reasons we are adding this feature.
>
> pgBackRest keeps a manifest for every file in the PG data directory that
> is backed up and we identify that file by the filename. Further, we
> calculate a checksum for every file. If the filenames didn't change
> then we'd be able to compare the file in the new cluster against the
> file and checksum in the manifest in order to be able to perform the
> incremental/differential backup. We don't store the inodes in the
> manifest though, and we don't have any concept of looking at multiple
> data directories at the same time or anything like that (which would
> also mean that the old data directory would have to be kept around for
> that to even work, which seems like a good bit of additional
> complication and risk that someone might start up the old cluster by
> accident..).
>
> That's how it'd be very helpful to pgBackRest for the filenames to be
> preserved across pg_upgrade's.

OK, that is clear.
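
To make the mechanism concrete, here is a minimal sketch of a filename-keyed manifest comparison. It is not pgBackRest's actual code or manifest format: the function names `build_manifest` and `files_to_back_up`, the choice of SHA-256, and the in-memory dict are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to a checksum of its contents.

    Hypothetical stand-in for a backup manifest; the hash choice is arbitrary.
    """
    root = Path(data_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def files_to_back_up(prior_manifest, data_dir):
    """Return the files whose name is new or whose checksum differs.

    A file stored under a different name is indistinguishable from a new
    file here, because the filename is the only identity the manifest has.
    """
    current = build_manifest(data_dir)
    return sorted(
        name
        for name, digest in current.items()
        if prior_manifest.get(name) != digest
    )
```

In this sketch, a file whose contents are unchanged but whose name changed still lands in the list to back up, which is the situation described above when filenames are not preserved across pg_upgrade.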

> > > > > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > > > > this code for that reason?
> > > > >
> > > > > That this would help with TDE (of which there seems little doubt...) is
> > > > > an additional benefit to this. Specifically, taking the existing work
> > > > > that's already been done to allow block-by-block encryption and
> > > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > > > > number as the IV, just like many disk encryption systems do, avoids the
> > > > > concerns that were brought up about using LSN for the IV with CTR and
> > > > > it's certainly not difficult to do, but it does depend on this change.
> > > > > This was all discussed previously and it sure looks like a sensible
> > > > > approach to use that mirrors what many other systems already do
> > > > > successfully.
> > > >
> > > > Well, I would think we would not add this for TDE until we were sure
> > > > someone was working on adding TDE.
> > >
> > > That this would help with TDE is what I'd consider an added bonus.
> >
> > Not if we have no plans to implement TDE, which was my point. Why not
> > wait to see if we are actually going to implement TDE rather than adding
> > it now? It is just so obvious, why do I have to state this?
>
> There's been multiple years of effort put into implementing TDE and I'm
> sure hopeful that it continues as I'm trying to put effort into moving
> it forward myself. I'm a bit baffled by the idea that we're just

Well, this is the first time I am hearing this publicly.

> suddenly going to stop putting effort into TDE as it is brought up time
> and time again by clients that I've talked to as one of the few reasons
> they haven't moved to PG yet- I can't believe that hasn't been
> experienced by folks at other organizations too, I mean, there's people
> maintaining forks of PG specifically for TDE ...

Agreed.
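
For readers following the IV point quoted earlier, a rough sketch of why a db-dir+relfileno+block number IV depends on those values surviving pg_upgrade. The 16-byte layout and the name `page_tweak` are hypothetical; no TDE patch is being described here, and no encryption is performed.

```python
import struct


def page_tweak(db_oid, relfilenode, block_number):
    """Pack the three identifiers into a 16-byte value.

    Hypothetical layout: three big-endian 32-bit unsigned integers
    followed by four zero padding bytes.
    """
    return struct.pack(">IIIxxxx", db_oid, relfilenode, block_number)


# Same identifiers give the same value; a changed relfilenode does not.
before_upgrade = page_tweak(5, 16384, 0)
same_oids = page_tweak(5, 16384, 0)
new_relfilenode = page_tweak(5, 24576, 0)
```

Under this assumption, a page encrypted with an IV derived from the old relfilenode could not be decrypted in place once the relfilenode changes, so every page would have to be re-encrypted during the upgrade.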

> > > I've certainly done it and I'd be kind of surprised if others haven't,
> > > but I've also played a lot with pg_dump in various modes, so perhaps
> > > that's not a great representation. I've definitely had to explain to
> > > clients why there's a whole different set of filenames after a
> > > pg_upgrade and why that is the case for an 'in place' upgrade before
> > > too.
> >
> > Uh, so I guess I am right that few people have mentioned this in the
> > past. Why were users caring about the file names?
>
> This is a bit baffling to me. Users and admins certainly care about
> what files their data is stored in and knowing how to find them.
> Covering the data directory structure is a commonly asked for part of
> the training that I regularly do for clients.

I just never thought people cared about the file names, since I have
never heard a complaint about how pg_upgrade works all these years.

> > > I have a very hard time seeing what changes might happen in the server
> > > in this space that wouldn't have an impact on pg_upgrade, with or
> > > without this.
> >
> > I don't know, but I have to ask since I can't know the future, so any
> > "preservation" has to be studied.
>
> We can gain, perhaps, some insight looking into the past and that seems
> to indicate that this is certainly a very stable part of the server code
> in the first place, which would imply that it's unlikely that there'll
> be much need to adjust this code in the future in the first place.

Good, I had to ask.

> > > > I am not saying this change is wrong, but I think the reasons need to be
> > > > stated in this thread, rather than just moving forward.
> > >
> > > Ok, they've been stated and it seems to at least Robert and myself that
> > > this is worthwhile to at least continue through to a concluded patch,
> > > after which we can contemplate that patch's complexity against these
> > > reasons.
> >
> > OK, that works for me. What bothers me is that the desirability of this
> > change has not been clearly stated in this thread.
>
> I hope that this email and the many many prior ones have gotten across
> the desirability of the change.

Yes, I think we are in a better position now to evaluate this.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.
