Re: Implementing incremental backup

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing incremental backup
Date: 2013-06-19 22:18:57
Message-ID: 20130619221857.GU3537@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Claudio Freire wrote:
> On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
> >> I don't see how this is better than snapshotting at the filesystem
> >> level. I have no experience with TB scale databases (I've been limited
> >> to only hundreds of GB), but from my limited mid-size db experience,
> >> filesystem snapshotting is pretty much the same thing you propose
> >> there (xfs_freeze), and it works pretty well. There's even automated
> >> tools to do that, like bacula, and they can handle incremental
> >> snapshots.
> >
> > Large databases tend to have multiple filesystems and getting a single,
> > consistent, snapshot across all of them while under load is..
> > 'challenging'. It's fine if you use pg_start/stop_backup() and you're
> > saving the XLOGs off, but if you can't do that..
>
> Good point there.
>
> I still don't like the idea of having to mark each modified page. The
> WAL compressor idea sounds a lot more workable. As in scalable.

There was a project that removed "useless" WAL records from the stream,
to make it smaller and more practical for long-term archiving. As far
as I recall, it only removed full-page images (FPIs). It's dead now,
and didn't compile on recent (9.1?) Postgres because of changes in the
WAL structs, IIRC.

Removing FPIs doesn't help if you have a large number of UPDATEs that
touch the same set of rows over and over, though. Tatsuo-san's proposal
would allow this use-case to work nicely because you only keep one copy
of such data, not one for each modification.
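
To put rough numbers on that difference, here is a toy model in Python.
It is only a sketch, not PostgreSQL code: the 100-byte record size and
the function names are made up for illustration, FPIs are ignored, and
the incremental backup is assumed to track changes at page granularity.

```python
# Toy model, not PostgreSQL code: compare archived volume when the same
# pages are updated over and over.
PAGE_SIZE = 8192     # PostgreSQL's default block size, in bytes
RECORD_SIZE = 100    # ASSUMPTION: average size of one WAL update record


def wal_bytes(updates, record_size=RECORD_SIZE):
    """WAL archiving keeps one record per modification."""
    return len(updates) * record_size


def incremental_bytes(updates, page_size=PAGE_SIZE):
    """A page-level incremental backup keeps one copy per modified page."""
    return len(set(updates)) * page_size


# 1,000,000 updates that keep hitting the same 10 pages
# (each list entry is the number of the page an update touched).
updates = [i % 10 for i in range(1_000_000)]

print(wal_bytes(updates))          # 100000000
print(incremental_bytes(updates))  # 81920
```

In this model the WAL volume grows with the number of modifications,
while the incremental backup grows only with the number of distinct
pages touched.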

If you had both technologies, you could teach them to work in
conjunction: you set up WAL replication, tell the WAL compressor to
prune updates for high-update tables (to avoid useless traffic), and
then use incremental backup to back those tables up. This seems like it
would have a lot of moving parts and be rather bug-prone, though.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
