Re: Add notes to pg_combinebackup docs

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Martín Marqués <martin(dot)marques(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add notes to pg_combinebackup docs
Date: 2024-04-12 09:09:11
Message-ID: CABUevEzJC8VTFuJg5=4Cdcf4gH_OAZH5ozOoa=AKgrc7qwP=-Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 12, 2024 at 12:14 AM David Steele <david(at)pgmasters(dot)net> wrote:

>
>
> On 4/11/24 20:51, Tomas Vondra wrote:
> > On 4/11/24 02:01, David Steele wrote:
> >>
> >> I have a hard time seeing this feature as being very useful, especially
> >> for large databases, until pg_combinebackup works on tar (and compressed
> >> tar). Right now restoring an incremental requires at least twice the
> >> space of the original cluster, which is going to take a lot of users by
> >> surprise.
> >
> > I do agree it'd be nice if pg_combinebackup worked with .tar directly,
> > without having to extract the directories first. No argument there, but
> > as I said in the other thread, I believe that's something we can add
> > later. That's simply how incremental development works.
>
> OK, sure, but if the plan is to make it practical later doesn't that
> make the feature something to be avoided now?
>

That could be said for any feature. When we shipped streaming replication,
the plan was to support synchronous in the future. Should we not have
shipped it, or told people to avoid it?

Sure, the current state limits its uses in some cases. But it still leaves
a bunch of other cases where it works just fine.

> >> I know you have made some improvements here for COW filesystems, but my
> >> experience is that Postgres is generally not run on such filesystems,
> >> though that is changing a bit.
> >
> > I'd say XFS is a pretty common choice, for example. And it's one of the
> > filesystems that work great with pg_combinebackup.
>
> XFS has certainly advanced more than I was aware.
>

And it happens to be the default on at least one of our most common
platforms.

> > However, who says this has to be the filesystem the Postgres instance
> > runs on? Who in their right mind put backups on the same volume as the
> > instance anyway? At which point it can be a different filesystem, even
> > if it's not ideal for running the database.
>
> My experience is these days backups are generally placed in object
> stores. Sure, people are still using NFS but admins rarely have much
> control over those volumes. They may or may not be COW filesystems.
>

If it's mounted through NFS I assume pg_combinebackup won't actually be
able to use the COW features? Or does that actually work through NFS?

Mounted LUNs on a SAN I find more common today though, and there it would
do a fine job.

>
> > FWIW I think it's fine to tell users that to minimize the disk space
> > requirements, they should use a CoW filesystem and --copy-file-range.
> > The docs don't say that currently, that's true.
>
> That would probably be a good addition to the docs.
>

+1, that would be a good improvement.
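
To illustrate why the CoW case is cheap, here is a minimal Python sketch of the kind of copy that --copy-file-range relies on. It is not pg_combinebackup's code. It assumes Linux and Python 3.8+, where os.copy_file_range() wraps the copy_file_range(2) system call; the function name copy_with_kernel_offload is made up for this example. On a filesystem with reflink support the kernel may share blocks instead of duplicating them, but whether it does is up to the kernel and filesystem, not this code.

```python
import os
import shutil


def copy_with_kernel_offload(src_path, dst_path):
    """Copy a file, preferring the kernel's copy_file_range when available.

    Returns "copy_file_range" or "fallback" so the caller can see which
    path was taken. Hypothetical helper, for illustration only.
    """
    copy_range = getattr(os, "copy_file_range", None)  # Linux, Python 3.8+
    if copy_range is not None:
        try:
            with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
                remaining = os.fstat(src.fileno()).st_size
                while remaining > 0:
                    # The kernel may copy fewer bytes than requested.
                    copied = copy_range(src.fileno(), dst.fileno(), remaining)
                    if copied == 0:
                        break  # source ended earlier than expected
                    remaining -= copied
            if remaining == 0:
                return "copy_file_range"
        except OSError:
            # Not supported for this pair of files; use a plain copy instead.
            pass
    shutil.copyfile(src_path, dst_path)
    return "fallback"
```

Either path produces an identical destination file; only the amount of new disk space consumed can differ.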

> > All of this also depends on how people do the restore. With the CoW
> > stuff they can do a quick (and small) copy on the backup server, and
> > then copy the result to the actual instance. Or they can do restore on
> > the target directly (e.g. by mounting a r/o volume with backups), in
> > which case the CoW won't really help.
>
> And again, this all requires a significant amount of setup and tooling.
> Obviously I believe good backup requires effort but doing this right
> gets very complicated due to the limitations of the tool.
>

It clearly needs to be documented that there are space needs. But
temporarily getting space for something like that is not very complicated
in most environments. But you do have to be aware of it.
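
Being aware of it can be as simple as a check before the restore. The following is a rough Python sketch, not part of any PostgreSQL tool. It assumes the combined output is about the size of the full backup directory plus a safety margin; the names tree_size and has_room_for_restore and the 10% margin are made up for this example, and CoW or filesystem compression would make the real requirement smaller.

```python
import os
import shutil


def tree_size(path):
    """Sum the apparent sizes of all files under path, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def has_room_for_restore(full_backup_dir, output_parent, margin=1.1):
    """Return (ok, needed, free) for restoring into output_parent.

    Assumption: the restored cluster is roughly as large as the full
    backup, padded by `margin`. This is an estimate, not a guarantee.
    """
    needed = int(tree_size(full_backup_dir) * margin)
    free = shutil.disk_usage(output_parent).free
    return free >= needed, needed, free
```

A wrapper script could call has_room_for_restore() on the backup server and refuse to start when ok is False.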

Generally speaking it's already the case that the "restore experience" with
pg_basebackup is far from great. We don't have a "pg_baserestore". You
still have to deal with archive_command and restore_command, which we all
know can be easy to get wrong. I don't see how this is fundamentally worse
than that.

Personally, I tend to recommend that "if you want PITR and thus need to
mess with archive_command etc, you should use a backup tool like
pg_backrest. If you're fine with just daily backups or whatnot, use
pg_basebackup". The incremental backup story fits somewhere in between, but
I'd still say this is (today) primarily a tool directed at those that don't
need full PITR.

> > But yeah, having to keep the backups as expanded directories is not
> > great, I'd love to have .tar. Not necessarily because of the disk space
> > (in my experience the compression in filesystems works quite well for
> > this purpose), but mostly because it's more compact and allows working
> > with backups as a single piece of data (e.g. it's much clearer what the
> > checksum of a single .tar is, compared to a directory).
>
> But again, object stores are commonly used for backup these days and
> billing is based on data stored rather than any compression that can be
> done on the data. Of course, you'd want to store the compressed tars in
> the object store, but that does mean storing an expanded copy somewhere
> to do pg_combinebackup.
>

Object stores are definitely getting more common. I wish they were getting
a lot more common than they actually are, because they simplify a lot. But
they're in my experience still very far from being a majority.

> But if the argument is that all this can/will be fixed in the future, I
> guess the smart thing for users to do is wait a few releases for
> incremental backups to become a practical feature.
>

There's always going to be another set of goalposts further ahead. I think
it can still be practical for quite a few people.

I'm more worried about the issue you raised in the other thread about
missing files, for example...

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
