Re: Streaming replication question

From: Mark Steben <mark(dot)steben(at)drivedominion(dot)com>
To: Keith <keith(at)keithf4(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Streaming replication question
Date: 2019-08-04 11:06:32
Message-ID: CADyzmyyG6i9fBcBVPXKzRZLBp23Sb9OWGpYYdYQT1iPNU8dYxw@mail.gmail.com
Lists: pgsql-admin

Thank you for your prompt response, Keith. You were correct. The root of
the problem was an ssh host key change, which caused all log shipping to
fail, and therefore WAL files queued up in pg_xlog.
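
For anyone who hits the same thing, here is a minimal sketch of a check that
would have flagged the stalled log shipping earlier. It assumes a 9.4-style
layout, where segments waiting to be archived leave a ".ready" marker in
pg_xlog/archive_status. The function names and the max_pending threshold are
hypothetical and only for illustration.

```python
import os


def pending_wal_segments(xlog_dir):
    """Return the sorted names of WAL segments still waiting to be archived.

    A segment counts as pending when a '<segment>.ready' marker exists in
    the archive_status subdirectory of xlog_dir.
    """
    status_dir = os.path.join(xlog_dir, "archive_status")
    suffix = ".ready"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(status_dir)
        if name.endswith(suffix)
    )


def archiving_is_stalled(xlog_dir, max_pending=50):
    """Return True when more than max_pending segments await archiving.

    max_pending is an arbitrary illustrative threshold, not a recommendation.
    """
    return len(pending_wal_segments(xlog_dir)) > max_pending
```

A steadily growing count of pending segments suggests archive_command is
failing (as it did here after the host key change), which is a different
cause from a replication slot holding WAL back.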

On Sat, Aug 3, 2019 at 11:23 AM Keith <keith(at)keithf4(dot)com> wrote:

>
>
> On Sat, Aug 3, 2019 at 9:36 AM Mark Steben <mark(dot)steben(at)drivedominion(dot)com>
> wrote:
>
>> Good morning,
>> We run postgres 9.4. Early Saturday morning we had a production postgres
>> outage because our pg_xlog directory ran out of space. Tracing the cause
>> points to a scheduled reboot of the 3 database servers by our IT team to
>> install some monitoring software. Because we did a simple stop/restart of
>> the database, our replication slots did not get reset and WAL files in
>> pg_xlog queued up until the out-of-space condition.
>>
>> Can someone please verify that the following WOULD HAVE BEEN the
>> appropriate action:
>> (Or offer corrections)
>> 1. Stop the postgres database (on both slaves)
>> 2. Run pg_drop_replication_slot (rep1, rep2 slotnames) (on Master)
>> 3. Stop the postgres database (on master)
>> 4. Reboot Linux Server to bring in monitoring software
>> 5. Start the postgres database (on master)
>> 6. Run pg_create_replication_slot (rep1, rep2 slotnames) (On Master)
>> 7. Start the postgres database (on both slaves)
>>
>> Thank you
>>
>> --
>> *Mark Steben*
>> Database Administrator
>> @utoRevenue <http://www.autorevenue.com/> | Autobase
>> <http://www.autobase.net/>
>> CRM division of Dominion Dealer Solutions
>> 95D Ashley Ave.
>> West Springfield, MA 01089
>> t: 413.327-3045
>> f: 413.383-9567
>>
>> www.fb.com/DominionDealerSolutions
>> www.twitter.com/DominionDealer
>> www.drivedominion.com <http://www.autorevenue.com/>
>>
>> <http://autobasedigital.net/marketing/DD12_sig.jpg>
>>
>>
>>
> Restarting your databases should not have affected the replication slots
> like this. If you stopped the replicas first, then the primary would have
> just started keeping all the WAL files until the replicas came back,
> resuming from where they left off. If you stopped the primary first then
> the replicas would have just lost their connection until the primary came
> back and made its slots available again.
>
> The only time the slots don't stick around is if you do a failover from
> the primary to one of the replicas. In that case, yes, you do have to
> recreate the slots. When things were rebooted, were there any failovers
> kicked off?
>
> I'd check your postgres logs to see if there's any hint as to why things
> turned out the way they did. If all systems were rebooted at the exact same
> time, there may be some edge-case bug that's not being accounted for. But
> without a deeper dive into what happened, that would be hard to say and
> seems unlikely. I would make sure you are on the most recent minor release
> of 9.4 so that all known bug fixes are in place. I also highly recommend
> planning an upgrade to a newer major version. 9.4 goes out of support this
> fall upon the release of version 12.
>
> Keith
>
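
To tell whether a slot is what is holding WAL back, as Keith describes above,
one can compare the slot's restart_lsn from pg_replication_slots with the
value of pg_current_xlog_location() on the primary. The sketch below only
does the arithmetic on the two "X/Y" strings; the helper names are
hypothetical, and fetching the values from the database is left out.

```python
def lsn_to_int(lsn):
    """Convert an LSN string such as '16/B374D848' to a byte position.

    The part before the slash is the high 32 bits and the part after it
    is the low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_bytes(current_lsn, restart_lsn):
    """Return how many bytes of WAL lie between restart_lsn and current_lsn."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)
```

A small, stable difference for every slot suggests the replicas are keeping
up and the growth in pg_xlog has another cause.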

--
*Mark Steben*
