Re: In-place upgrade with streaming replicas

From: Jerry Sievers <gsievers19(at)comcast(dot)net>
To: richard(at)kojedz(dot)in
Cc: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: In-place upgrade with streaming replicas
Date: 2025-02-20 07:49:03
Message-ID: m2y0y1kqds.fsf@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

richard(at)kojedz(dot)in writes:

> Dear Alvaro,
>
> Thanks for your answers. Unfortunately, I was unaware of a shutdown
> record, that makes a difference then. So, I definitely must stop the
> primary first, then use pg_controldata to obtain checkpoint
> info. Then, can I query the replicas while they are up and running if
> they've received the shutdown record or not? So, after shutting down
> the primary, how will I know if a replica has received the mentioned
> record, and is safe to shutdown?

Hmmm, not sure about that but what we do, is stop primary, wait a
$short time, then stop replicas...

Then run pg_controldata on all nodes | filter out only the line
indicating latest checkpoint and sort -u the output. Expect only a
single line if all are matched.

You may also wish to first insure that you got the same number of
lines as total node count before doing the sorting and uniqueing.

Very rarely on our huge systems, we'd have a mismatch after the
verification in in those cases, our automated upgrade procedure
restarts all nodes and then does the shutdown and verify check again.

HTH

>
> Thanks for the clarifications.
>
> Best regards,
> Richard
>
> 2025-02-19 16:54 időpontban Álvaro Herrera ezt írta:
>> On 2025-Feb-19, richard(at)kojedz(dot)in wrote:
>>
>>> With this, I have the question, that after the shutdown of primary,
>>> what is
>>> the guarantee for replicas having the same checkpoint location? Why
>>> does the
>>> order of shutting down the servers matter? What would be the really
>>> exact
>>> and reliable way to ensure that replicas will have the same checkpoint
>>> location as the primary?
>> The replicas can't write WAL by themselves, but they will replay
>> whatever the primary has sent; by shutting down the primary first and
>> letting the replicas catch up, you ensure that the replicas will
>> actually receive the shutdown record and replay it. If you shut down
>> the replicas first, they can obviously never catch up with the shutdown
>> checkpoint of the primary.
>> As I recall, if you do shut down the primary first, one potential
>> danger
>> is that the primary fails to send the checkpoint record before shutting
>> down, so the replicas won't receive it and obviously will not replay
>> it;
>> or simply that they are behind enough that they receive it but don't
>> replay it.
>> You could use pg_controldata to read the last checkpoint info from
>> all
>> nodes. You can run it on the primary after shutting it down, and then
>> on each replica while it's still running to ensure that the correct
>> restartpoint has been created.

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message richard 2025-02-20 07:55:31 Re: In-place upgrade with streaming replicas
Previous Message Tom Lane 2025-02-19 22:27:45 Re: Does export operation cancel SQLs?

Browse pgsql-general by date

  From Date Subject
Next Message richard 2025-02-20 07:55:31 Re: In-place upgrade with streaming replicas
Previous Message Jethish Jethish 2025-02-20 05:04:05 Logical decoding