Re: BUG #7884: pg_upgradecluster is terribly slow

From: Martin Pitt <mpitt(at)debian(dot)org>
To: rtvd(at)me(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7884: pg_upgradecluster is terribly slow
Date: 2013-02-15 18:29:59
Message-ID: 20130215182959.GA2531@piware.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello,

rtvd(at)me(dot)com [2013-02-15 17:41 +0000]:
> I have tried upgrading a database from 8.4 to 9.1 and pg_upgradecluster is
> terribly slow.

Please note that this is a Debian/Ubuntu specific bug,
pg_upgradecluster is not shipped by upstream PostgreSQL.

I admittedly don't have any first-hand experience with migrating large
databases, and get very little feedback about it, so thanks for your
observations!

First of all, Peter has worked on a branch that makes
pg_upgradecluster use pg_upgrade, which hopefully will be a lot more
performant.

> 1. It uses "-Fc". This is bad as it enables compression by default. Which is
> pointless and wastes CPU power.

The tar format (-Ft) has also been supported since at least 8.4
(that's the earliest version which is still supported in
Debian/Ubuntu), and documentation says it's not compressed. If you
replace -Fc with -Ft, do you see a significant performance increase?

> 2. It seems to migrate the schema first and only then it moves data. This is
> bad as indices are being re-built when the data is finally poured in.

At least in earlier versions pg_dumpall wasn't able to dump BLOBs, and
the manpage doesn't indicate otherwise for 9.2 either. AFAIK this just
supports the plain SQL text format. I don't believe pg_dumpall would
avoid the rebuilding of incides?

> 3. The migration does not happen within a transaction.

> 1. migrate the schema

pg_upgradecluster does that, too.

> 2. drop all databases (leaving roles in)
> 3. re-creating the databases (without any content)

Out of interest, why does that help?

> 4. for each database run pg_dump <database> | psql --single-transaction
> <database>

As written above, I don't want to use the text format and psql, but
pg_restore also supports --single-transaction since at least 8.4.

The question which I'm not sure about is whether it's ok to use
--single-transaction even for very large databases. I. e. is piling up
gigabytes of data in a transaction and committing it all in the end
always more efficient than the default mode (which I assume will use
one transaction by row)? Is that a safe thing to do, or could one run
into out-of-memory conditions?

http://people.canonical.com/~pitti/tmp/pg_upgradecluster is a version
with these two changes: pg_restore --single-transaction and using the
tar format. Perhaps you can try this, and compare performance?

Thanks,

Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message david.thomas 2013-02-15 18:33:21 BUG #7885: postmaster panic on startup does not release shared memory
Previous Message Kevin Grittner 2013-02-15 18:27:47 Re: BUG #7884: pg_upgradecluster is terribly slow