Re: pg_upgrade --jobs

From: Sherrylyn Branchaw <sbranchaw(at)gmail(dot)com>
To: Melvin Davidson <melvin6925(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, senor <frio_cervesa(at)hotmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_upgrade --jobs
Date: 2019-04-07 20:49:20
Message-ID: CAB_myF5DQ0Pqnz2=F7zLDApGv6JPDNZm1VKw4Rd=M0R5ykng1Q@mail.gmail.com
Lists: pgsql-general

> It may take a while for slony replication to be in sync, but when it is,
> there will be very little down time to switch over.

I agree in principle, which is why I chose Slony over pg_upgrade for my
company's very similar situation, but my experience was that, out of the
box, Slony was projected to take unacceptably long (months) to sync our
350,000 tables, and downtime was going to be many hours. In order to get
those numbers down, I had to reduce the number of per-table steps Slony was
executing, e.g. by eliminating unnecessary-for-us ones and by rewriting
others to happen in bulk.

Here's something I didn't know Slony did back when I didn't have 350,000
tables to worry about: add a table to replication, run through the existing
list of replicated tables to make sure it has captured any changes that
happened in the meantime, add one more table to replication, run through the
now-longer list again, and so on. The more tables you add, the longer it
takes to add the next table.
Here's another thing I didn't know it did: during the switchover, manage 4
triggers per table serially on primary and standby. 4 * 350000 * 2 = 2.8
million triggers. (I knew it managed 4 triggers per table, but not that it
was serial or how long that would take when pg_trigger had almost 3 million
relevant rows.)
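
If you want to see what that looks like on your own node, here's a rough
sketch of counting Slony's triggers straight out of pg_trigger. It assumes a
Slony cluster named "replication" (substitute your own cluster name; Slony
prefixes its trigger names with an underscore plus the cluster name), so
treat it as illustrative rather than gospel:

-- Total Slony-managed triggers on this node. The backslashes keep LIKE
-- from treating the underscores as single-character wildcards.
SELECT count(*) AS slony_triggers
FROM pg_trigger
WHERE tgname LIKE '\_replication\_%';

-- Broken down by trigger name, to sanity-check the "4 per table" math:
-- each count should come out close to your number of replicated tables.
SELECT tgname, count(*) AS per_table_count
FROM pg_trigger
WHERE tgname LIKE '\_replication\_%'
GROUP BY tgname
ORDER BY count(*) DESC;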

I would love to help the OP out in a more hands-on way (I have
upgrade-via-Slony consulting experience), as well as to open source the
custom changes I came up with, but I'm debating whether I have the time to
take on another project right now. I'm also reluctant to summarize
exactly what I did, because messing with pg_catalog directly is very
delicate and likely to go wrong, and I don't recommend it to the
inexperienced at all, or to the experienced if they have alternatives.

> Plan B is to drop a lot of tables and deal with imports later.

If it were me, I would take a copy of my database, restore it to a sandbox
environment, set up Slony, and get an estimate for the projected sync time.
Let it run for a few hours to see how dramatically the sync time slows down
as the number of tables grows. I.e. don't count the number of replicated
tables after an hour and assume that's your hourly rate of syncing. If it
looks like you can sync your entire database in a few days or less, then
let it run and test out your failover time. If that's acceptable, you're
good to go.
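
To get a feel for how the rate falls off, I'd sample Slony's own catalog
every few minutes while the sandbox sync runs. This is just a sketch: it
assumes a cluster named "replication" (so its schema is _replication) and
Slony-I 2.x catalog names (sl_table for the list of replicated tables, the
sl_status view for lag), so double-check those against your version:

-- Number of tables added to replication so far, plus event/time lag.
-- sl_status shows one row per subscriber when run on the origin.
SELECT now() AS sampled_at,
       (SELECT count(*) FROM _replication.sl_table) AS tables_replicated,
       st_lag_num_events,
       st_lag_time
FROM _replication.sl_status;

In psql you can follow that query with \watch 300 to re-run it every five
minutes and watch how quickly (or slowly) tables_replicated climbs.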

If sync time looks like it's going to be incredibly long on your schema +
hardware + Postgres version, etc., then failover time probably would be
too. In that case, temporarily dropping the tables you can drop may be
preferable to the complexity of making changes to speed up Slony, if you
don't have a seriously experienced DBA on hand.

Sherrylyn
