BDR: Recover from "FATAL: mismatch in worker state" without restarting postgres

From: Sylvain Marechal <marechal(dot)sylvain2(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: BDR: Recover from "FATAL: mismatch in worker state" without restarting postgres
Date: 2016-08-25 10:44:29
Message-ID: CAJu=pHTd1YiYJAa20Du+M6+9tB6s329Nbmj58v-jSUM_SbyWKg@mail.gmail.com
Lists: pgsql-general

Hello all,

After uninstalling a BDR node, it is no longer possible to join it again.
The following log entries repeat in a loop:
<<<
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14620-1] LOG: starting
background worker process "bdr (6287997142852742670,1,19526,)->bdr
(6223672436788445259,2," #local4,support
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14621-1] LOG: starting
background worker process "bdr (6287997142852742670,1,18365,)->bdr
(6223672436788445259,2," #local4,support
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14622-1] LOG: starting
background worker process "bdr db: mydb" #local4,support
2016-08-25 10:17:08 [ll101] postgres error [6484]: [14621-1] FATAL:
mismatch in worker state, got 0, expected 1 #error,local4,support
2016-08-25 10:17:08 [ll101] postgres error [6486]: [14622-1] FATAL:
mismatch in worker state, got 0, expected 1 #error,local4,support

>>>
I cannot tell how this happened. Before I removed the node, one of the nodes
was in the 'catchup' state and the data lag between the 2 nodes was
growing, which is why I removed it (the idea was to clean the lagged node
and then reattach it).
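
To gauge how often the workers fail, the FATAL lines in the log excerpt above
can be extracted with a short script. This is only a minimal sketch and not
part of BDR: the function name and the regular expression are assumptions
derived solely from the log format shown above, so they may need adjusting
for a different log_line_prefix or syslog setup.

```python
import re

# Assumed pattern, based only on the excerpt in this message. \s+ tolerates
# the line wrap the mail client inserted between "FATAL:" and "mismatch".
MISMATCH_RE = re.compile(
    r"\[(?P<pid>\d+)\]:\s+\[[\d-]+\]\s+FATAL:\s+"
    r"mismatch in worker state, got (?P<got>\d+), expected (?P<expected>\d+)"
)


def find_worker_state_mismatches(log_text):
    """Return a (pid, got, expected) tuple for each mismatch FATAL in log_text."""
    return [
        (int(m.group("pid")), int(m.group("got")), int(m.group("expected")))
        for m in MISMATCH_RE.finditer(log_text)
    ]


# The two FATAL entries from the excerpt, including the wrapped lines.
sample = (
    "2016-08-25 10:17:08 [ll101] postgres error [6484]: [14621-1] FATAL:\n"
    "mismatch in worker state, got 0, expected 1 #error,local4,support\n"
    "2016-08-25 10:17:08 [ll101] postgres error [6486]: [14622-1] FATAL:\n"
    "mismatch in worker state, got 0, expected 1 #error,local4,support\n"
)

for pid, got, expected in find_worker_state_mismatches(sample):
    print(f"worker pid={pid}: got {got}, expected {expected}")
```

Counting the returned tuples over a time window shows whether the workers are
still being restarted in a loop.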

Questions:
* Is it possible to recover from this error without restarting postgres?
* If a restart is necessary, how can I be sure that the postgres restart will
work? My fear is that the restart fails, meaning the service will be
completely down.

Thanks and regards,
Sylvain
