Re: Standby Mechanics: WAL vs Streaming

From: Don Seiler <don(at)seiler(dot)us>
To: MichaelDBA <MichaelDBA(at)sqlexec(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Standby Mechanics: WAL vs Streaming
Date: 2017-08-27 03:16:11
Message-ID: CAHJZqBC782rXEZu_FJkMog-DmtPHw9es4pb9vqO2ZYsCaTidPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Sat, Aug 26, 2017 at 9:42 PM, MichaelDBA <MichaelDBA(at)sqlexec(dot)com> wrote:

> pgbackrest archives WALs faster than a custom archive_command command. I
> use pgbackrest to automate archiving of WALs from primary to pgbackrest
> server. From there, my restore_command on the slave pulls WALs if needed
> to catch up to SR (streaming replication).
>

Interesting, I'll look into pgbackrest. Thanks!

> The standby will recover from the streaming xlog data if local WAL files
> are not available. (So the standby prefers local WAL files to streaming?)
> The other way around: The standby prefers streaming replication. It only
> pulls WALs (restore_command) if it is too far behind.
>

OK. That's what I had though initially but for some reason convinced myself
that it would think it's easier to get it locally when available.

> I saw that the WAL archives were being removed from the archive directory.
> If the standby was already ahead of the WAL files because of streaming
> replication, would it still use the restore_command to restore WAL files
> from archive to px_log directory? Or would it just delete the archive files?
> The archived files in your case (stored on the standby) would only be
> removed if you had pg_archivecleanup command also in your recovery.conf
> file for WAL cleanup (archive_cleanup_command).
>

Yes we do have that, sorry I didn't specify. I just wanted to confirm that
it would be used to clean up archives that arrive well after their
usefulness with streaming replication.

> As far as the lag goes, what can we do to increase the throughput for
> streaming replication? We tried to give it almost all of our 100Mb pipe
> between data centers and there was still significant lag until it recovered
> past the point of my manual vacuums.
> What do you expect? Manual vacuum full on big tables will generate a lot
> of WAL thereby causing SR lag.
>

As I've said in other recent emails to this list, I'm relatively new to
these waters, having spent the last 15+ years as an Oracle DBA. I figured
that manual vacuum would generate a lot of changes for replication. That's
why I'm asking where I can improve the throughput (if possible), given the
additional activity spike.

> I'm going to assume that having both streaming replication and WAL
> rsyncing are going to compete for bandwidth. Would it be advisable to only
> do streaming most of the day and then maybe rsync WAL files in a nightly
> batch? Now that I think of it, we do hourly snapmirrors of that WAL storage
> already (also using the same bandwidth), perhaps we just use that for the
> WAL replication.
> How about doing your manual vacuums at night time during low load times?
>

You're assuming that night time is our off-peak time (although I did say
"nightly batch", it doesn't mean it's a better time for maintenance work in
our case). There's a reason I was running the vacuum when I was this
morning, and it isn't something we regularly do. We had a table with 3
corrupt indexes that had prevented autovacuum from running on that table
for many months. Today I recreated those indexes and ran a manual vac
freeze (it was well past the min freeze age), during an off-peak time
negotiated with the business side of the shop.

Thanks again for the pg_backrest information. I'll check it out. I'm
thinking at the very least we could just present that NFS snapmirror as a
read-only volume to the standby and point restore_command at it. That
snapmirror is taking place regardless, so the additional rsync we're doing
in our archive_command script on the primary is just piling on and eating
up more bandwidth.

Don.
--
Don Seiler
www.seiler.us

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Guillaume Lelarge 2017-08-27 06:40:44 Re: Standby Mechanics: WAL vs Streaming
Previous Message MichaelDBA 2017-08-27 02:42:55 Re: Standby Mechanics: WAL vs Streaming