From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speeding up pg_upgrade
Date: 2018-01-05 19:00:14
Message-ID: CAMkU=1x-e+maqefhM1yMeSiJ8J9Z+SJHgW7c9bqo3E3JMG4iJA@mail.gmail.com
Lists: pgsql-hackers
On Thu, Dec 7, 2017 at 11:28 AM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > zero-downtime upgrades. ... we discussed speeding up pg_upgrade.
> >
> > There are clusters that take a long time to dump the schema from the old
> > cluster
>
> Maybe it isn't representative of a typical case, but I can offer a data
> point:
>
> For us, we have ~40 customers with DBs ranging in size from <100GB to ~25TB
> (for which ~90% is on a ZFS tablespace with compression). We have what's
> traditionally considered to be an excessive number of child tables, which
> works okay since planning time is unimportant to us for the report queries
> which hit them. Some of the tables are wide (historically up to 1600
> columns). Some of those have default values on nearly every column, and
> pg_attrdef was large (>500MB), causing pg_dump --section pre-data to be
> slow (10+ minutes). Since something similar is run by pg_upgrade, I worked
> around the issue for now by dropping defaults on the historic children in
> advance of upgrades (at some point I'll figure out what I have to do to
> allow DROPing DEFAULTs). It's not the first time we've seen an issue with
> larger number of children*columns.
>
This is probably worth fixing independently of other ways of speeding up
pg_upgrade.

pg_dump spends most of its time making the column names unique while
deparsing the DEFAULT clause for each column. But I don't think it ever
outputs the column names that result from that deparsing, and since there
is only one table involved, the names should already be unique anyway,
unless I am missing something.
The time seems to be quadratic in the number of columns if all columns have
defaults, or more generally proportional to the product of the number of
columns in the table and the number of columns with defaults.
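For anyone who wants to poke at it, something along these lines shows the
effect (a sketch, not necessarily identical to the attached
pg_dump_default.sh; it assumes a scratch database named "test" and
psql/pg_dump on PATH):

#!/bin/sh
# Build a wide table in which every column has a DEFAULT.
psql -q test <<'SQL'
DROP TABLE IF EXISTS wide;
SELECT format('CREATE TABLE wide (%s)',
              string_agg(format('c%s int DEFAULT 0', i), ', '))
FROM generate_series(1, 1600) AS i \gexec
SQL

# Time the pre-data dump -- the slow step Justin describes.
time pg_dump --section pre-data test >/dev/null

# Isolate the deparse cost itself: essentially the pg_get_expr()
# call that pg_dump issues for each default.
psql test -c '\timing on' \
  -c "SELECT count(pg_get_expr(adbin, adrelid))
      FROM pg_attrdef WHERE adrelid = 'wide'::regclass;"

Doubling the column count should roughly quadruple both timings if the
behavior is quadratic.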
The CREATE TABLE statements have a similar problem when the dump is
restored.
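In the meantime, the advance workaround Justin describes can be scripted;
a rough sketch, assuming (hypothetically) that the historic children all
match a name pattern like 'child_%':

# Generate and execute ALTER TABLE ... DROP DEFAULT for every
# defaulted column of the matching tables (pattern is hypothetical).
psql -qAt test <<'SQL' | psql -q test
SELECT format('ALTER TABLE %s ALTER COLUMN %I DROP DEFAULT;',
              a.attrelid::regclass, a.attname)
FROM pg_attrdef d
JOIN pg_attribute a ON a.attrelid = d.adrelid
                   AND a.attnum = d.adnum
WHERE d.adrelid::regclass::text LIKE 'child_%';
SQL

That empties pg_attrdef for those tables, which is what makes the pre-data
dump fast again.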
Cheers,
Jeff
Attachment: pg_dump_default.sh (application/x-sh, 230 bytes)