Re: Backup advice

From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Johann Spies <johann(dot)spies(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Backup advice
Date: 2013-04-08 15:40:16
Message-ID: 5162E4E0.4010809@optionshouse.com
Lists: pgsql-general

On 04/08/2013 08:14 AM, Johann Spies wrote:

> Size: might be about 200GB
> The server uses a Tivoli backup client with daily backups
> At the moment there are pg_dumps for each database on the server on a
> daily, weekly and monthly basis. All this gets backed up to the Tivoli
> server.

Ok. So far, so good.

> I have read about using pg_basebackup in an article from Shaun
> Thomas' booklet on Packt Publishers (I will probably buy the
> booklet). That seems to be a possible solution.

Ok, with pg_basebackup, you'll get a binary backup of the actual data
files involved in your database cluster. This will, in almost every
case, be larger than a pg_dump, and take about the same amount of time
to produce. You also won't be able to keep the per-database scheme
you're currently feeding to Tivoli, since pg_basebackup works on the
entire installation instead of each individual database.
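
For reference, a full-cluster base backup looks something like this
(the target directory here is just a placeholder):

    pg_basebackup -D /backups/base_$(date +%Y%m%d) -Ft -z -P

That gives you compressed tar files of the whole data directory, not
per-database dumps.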

One benefit, as Birta pointed out, is that you could use this backup as
a base and apply WAL / transaction logs on top of it; those are
generally smaller if your database doesn't see a lot of daily churn.
Unfortunately, if you take a weekly base and need to recover far into
the week, replaying all of that WAL can be a time-consuming process.
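
To use WAL that way, the server needs continuous archiving enabled,
roughly like this (the archive path is only an example):

    # postgresql.conf
    wal_level = archive
    archive_mode = on
    archive_command = 'cp %p /backups/wal/%f'

Recovering from a weekly base then means replaying every archived
segment since that base, which is where the time goes.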

Then again, so can restoring a pg_dump of a 200GB cluster, thanks to
all the index creation steps.

One thing you might want to consider is that 9.3 will presumably have
parallel pg_dump to complement parallel pg_restore. That would greatly
reduce the amount of time a full dump->restore cycle requires.
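
Parallel restore exists today, and the dump side would presumably
mirror it. Something along these lines, with the database name and
paths as placeholders:

    pg_dump -Fd -j 4 -f /backups/mydb_dir mydb
    pg_restore -j 4 -d mydb /backups/mydb_dir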

Aside from that, you're already likely using the smallest backup you can
have. Backing up a 600GB database takes about 120GB for us using tar,
which is close to what pg_basebackup would give you in terms of size. A
compressed pg_dump of the same data is around 50GB.
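
That tar copy is basically a hand-rolled base backup; the usual shape
is something like this, with the label and paths as placeholders (you
also need to keep the WAL generated while the copy runs):

    psql -c "SELECT pg_start_backup('nightly');"
    tar -czf /backups/data_$(date +%Y%m%d).tar.gz -C /var/lib/postgresql/9.2 main
    psql -c "SELECT pg_stop_backup();"

tar may grumble about files changing underneath it, which is expected.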

Still, we use the binaries, because we need uptime more than size, and
they are much faster to restore. We have daily binary backups going back
over four years, because the backups are probably the most important
thing in the company. Lose those under bad circumstances, and we'd
literally be out of business.

If size really is that important, you might want to check what kind of
compression is going on with Tivoli. If you can produce a pg_dump and
compress it with bzip2, or with a utility such as lrzip that makes use
of better compression algorithms such as LZMA, you may end up with much
smaller files than your current process produces.
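
For example, with the database name and paths as placeholders:

    pg_dump mydb | bzip2 > /backups/mydb.sql.bz2
    pg_dump -f /backups/mydb.sql mydb && lrzip /backups/mydb.sql

lrzip defaults to LZMA and writes /backups/mydb.sql.lrz alongside the
original file.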

I'll warn you, though. Even parallel compression methods like lbzip2 are
much slower than something like parallel gzip (pigz). You'll get a
20-30% smaller file at the cost of a 4-8x slower backup process. Since
your data is likely to grow from its current size of 200GB, it's
something to consider.
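
To make the tradeoff concrete, the two pipelines differ only in the
compressor (both use all available cores by default; names are
placeholders again):

    pg_dump mydb | pigz > /backups/mydb.sql.gz
    pg_dump mydb | lbzip2 > /backups/mydb.sql.bz2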

Anyone else?

I'm so used to using basic utilities, I know I haven't covered things
like deduplication backup solutions. It may be that Tivoli isn't right
for this, but I'm not familiar with that software.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
