Re: pg_rewind WAL segments deletion pitfall

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: torikoshia(at)oss(dot)nttdata(dot)com
Cc: bungina(at)gmail(dot)com, cyberdemn(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind WAL segments deletion pitfall
Date: 2023-06-29 01:25:33
Message-ID: 20230629.102533.2256222097295418108.horikyota.ntt@gmail.com
Lists: pgsql-bugs pgsql-hackers

At Wed, 28 Jun 2023 22:28:13 +0900, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote in
>
> On 2022-09-29 17:18, Polina Bungina wrote:
> > I agree with your suggestions, so here is the updated version of
> > patch. Hope I haven't missed anything.
> > Regards,
> > Polina Bungina
>
> Thanks for working on this!
> It seems like we are also facing the same issue.

Thanks for looking into this.

> I tested the v3 patch under our conditions, and the old primary
> succeeded in becoming the new standby.
>
>
> BTW when I used pg_rewind-removes-wal-segments-reproduce.sh attached
> in [1], old primary also failed to become standby:
>
> FATAL: could not receive data from WAL stream: ERROR: requested WAL
> segment 000000020000000000000007 has already been removed
>
> However, I think this is not a problem: just adding restore_command
> like below fixed the situation.
>
> echo "restore_command = '/bin/cp `pwd`/newarch/%f %p'" >>
> oldprim/postgresql.conf

I thought along the same lines at first, but that's not the point
here. The problem we want to address is that pg_rewind ultimately
removes certain crucial WAL files required for the new primary to
start, even though they were present beforehand. In other words, that
restore_command works, but it only undoes what pg_rewind wrongly did,
consuming I/O and/or network bandwidth for no real purpose.

pg_rewind already has a feature that determines how each file should
be handled, but it currently makes wrong decisions for WAL files. The
goal here is to rectify this behavior and ensure that pg_rewind makes
the right decisions.

> Attached modified reproduction script for reference.
>
> [1]https://www.postgresql.org/message-id/CAFh8B%3DnNiFZOAPsv49gffxHBPzwmZ%3D6Msd4miMis87K%3Dd9rcRA%40mail.gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
