From: | Mistina Michal <Michal(dot)Mistina(at)virte(dot)sk> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Re: postmaster.pid still exists after pacemaker stopped postgresql - how to remove |
Date: | 2013-08-26 14:02:30 |
Message-ID: | e4e43612d938407a851fbd4656502d8c@Electra.virte.intra |
Lists: | pgsql-general |
Hi Masao.
Thank you for the suggestion. Indeed, that could have occurred, most probably
while I was testing a split-brain situation. In that test I turned off the
network card on one node, so DRBD was in the primary role on both nodes. After
the split-brain occurred I resynced DRBD: of the two primaries I kept one as
"primary" (winner) and demoted the other to "secondary" (victim). The data
should have been consistent from that moment on, but probably it was not.
I use DRBD only within a single technical center (TC). Data are synced by
streaming replication to the secondary technical center, where there is
another DRBD instance.
It's like this:
TC1:
--- node1: DRBD (primary), pgsql
--- node2: DRBD (secondary), pgsql
TC2:
--- node1: DRBD (primary), pgsql
--- node2: DRBD (secondary), pgsql
Within one technical center pgsql runs on only one node at a time. This is
handled by pacemaker/corosync.
From the outside it looks as if only one postgresql server is running in
each TC.
TC1 (master) ==== streaming replication =====> TC2 (slave)
If one node in a technical center fails, the fail-over to the secondary node
is really quick because of the fast network within the technical center.
Between TC1 and TC2 there is a WAN link. If something goes wrong and TC1
becomes unavailable, I can switch to TC2 manually or automatically.
Is there a more appropriate solution? Would you use something else?
Best regards,
Michal Mistina
On Mon, Aug 26, 2013 at 9:53 PM, Mistina Michal <Michal(dot)Mistina(at)virte(dot)sk>
wrote:
> Hi there.
>
> I didn't find out why this issue happened. Only backing up and
> reformatting the filesystem that held the corrupted postmaster.pid file
> helped me get rid of it. Hopefully the file won't appear again.
I encountered a similar problem when I broke the filesystem with a double
mount. You may have hit the same problem.
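For the milder case where postmaster.pid is only left behind and is not corrupted, a rough check along the following lines may help decide whether the file is safe to remove. This is a sketch, not something taken from the thread: it assumes a Linux host, relies on the first line of postmaster.pid holding the postmaster's PID, and the function names are made up for illustration. It cannot repair a damaged filesystem.

```python
import os
from pathlib import Path


def read_postmaster_pid(pid_file):
    """Return the PID stored on the first line of postmaster.pid, or None."""
    try:
        first_line = Path(pid_file).read_text().splitlines()[0]
        pid = int(first_line.strip())
    except (OSError, IndexError, ValueError):
        # Missing file, empty file, or garbage where the PID should be.
        return None
    return pid if pid > 0 else None


def process_exists(pid):
    """Probe a PID with signal 0, which delivers nothing to the process."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def pid_file_status(pid_file):
    """Classify the PID file as 'unreadable', 'stale' or 'running'."""
    pid = read_postmaster_pid(pid_file)
    if pid is None:
        return "unreadable"
    # Caveat: 'running' only means some process owns that PID; after a
    # reboot the PID may have been reused by an unrelated program.
    return "running" if process_exists(pid) else "stale"
```

Only a "stale" result suggests the file was simply left behind. An "unreadable" result on a file that exists points to the kind of filesystem damage discussed here, and "running" means some process still owns that PID, so the file should be left alone until that is understood.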
> Master/Slave Set: ms_drbd_pg [drbd_pg]
>
> Masters: [ tstcaps01 ]
>
> Slaves: [ tstcaps02 ]
Why do you use DRBD with streaming replication? If you locate the database
cluster on DRBD, it's better to check the status of the DRBD filesystem.
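One way to check the DRBD status before letting postgresql start is sketched below. It assumes DRBD 8.x, where /proc/drbd contains one `cs:... ro:... ds:...` line per device; the helper names are hypothetical and the sample line in the usage comment is illustrative, not output from the cluster in this thread.

```python
import re
from typing import List, NamedTuple


class DrbdState(NamedTuple):
    minor: int
    connection: str   # cs: field, e.g. Connected, StandAlone, WFConnection
    local_role: str   # ro: field, local side
    peer_role: str    # ro: field, peer side
    local_disk: str   # ds: field, local side
    peer_disk: str    # ds: field, peer side


# Matches device lines such as
# " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----"
_DEVICE_LINE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+ro:(\S+?)/(\S+)\s+ds:(\S+?)/(\S+)"
)


def parse_proc_drbd(text: str) -> List[DrbdState]:
    """Extract one DrbdState per device line found in /proc/drbd text."""
    states = []
    for line in text.splitlines():
        m = _DEVICE_LINE.match(line)
        if m:
            states.append(DrbdState(int(m.group(1)), *m.groups()[1:]))
    return states


def drbd_problems(state: DrbdState) -> List[str]:
    """List reasons why this device should not be trusted yet."""
    issues = []
    if state.connection != "Connected":
        issues.append(f"connection state is {state.connection}")
    if state.local_role == "Primary" and state.peer_role == "Primary":
        issues.append("both nodes are Primary")
    if state.local_disk != "UpToDate" or state.peer_disk != "UpToDate":
        issues.append(f"disk state is {state.local_disk}/{state.peer_disk}")
    return issues


# Usage on a node (reads the real status file):
#   with open("/proc/drbd") as f:
#       for s in parse_proc_drbd(f.read()):
#           print(s.minor, drbd_problems(s) or "looks healthy")
```

An empty problem list only says that DRBD itself reports a connected, up-to-date pair. It does not prove that the filesystem on top of it survived an earlier split-brain or double mount, so a filesystem check may still be worth running.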
Regards,
--
Fujii Masao