Re: Built-in Raft replication

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Konstantin Osipov <kostja(dot)osipov(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Nikolay Samokhvalov <nik(at)postgres(dot)ai>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Built-in Raft replication
Date: 2025-04-16 14:35:27
Message-ID: 90951dcf-30fc-4f11-87c7-44ddfdcbf768@postgrespro.ru
Lists: pgsql-hackers

16.04.2025 07:58, Andrey Borodin wrote:
> Yes, shared DCS are common these days. AFAIK, we use one Zookeeper instance per hundred Postgres clusters to coordinate pg_consuls.
>
> Actually, scalability is opposite to topic of this thread. Let me explain.
> Currently, Postgres automatic failover tools rely on databases with built-in automatic failover. Konstantin is proposing to shorten this loop and make Postgres use its own built-in automatic failover.
>
> So, existing tooling allows you to have 3 hosts for DCS, with majority of 2 hosts able to elect new leader in case of failover.
> And you can have only 2 hosts for Postgres - Primary and Standby. You can have 2 big Postgres machines with 64 CPUs. And 3 one-CPU hosts for ZooKeeper/etcd.
>
> If you use built-in failover you have to resort to 3 big Postgres machines because you need a 2/3 majority. Of course, you can install a MySQL-style arbiter: a host that has no real PGDATA and only participates in voting. But this is a solution to a problem induced by built-in autofailover.

An arbiter can store WAL with (almost) no data. Then it serves not only for
voting but also for reliability, as an almost full-featured third server.
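
To illustrate why a WAL-storing arbiter adds reliability and not just a
vote, here is a minimal sketch of the majority-flush rule. It is not taken
from any real implementation: the Node class, the has_data flag and the
integer LSNs are all hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    flushed_lsn: int       # highest WAL position durably written on this node
    has_data: bool = True  # hypothetical flag: False for a WAL-only arbiter


def majority(n_voters: int) -> int:
    """Smallest number of voters that forms a majority."""
    return n_voters // 2 + 1


def committed_lsn(nodes: list[Node]) -> int:
    """Highest LSN that a majority of voters has flushed."""
    lsns = sorted((n.flushed_lsn for n in nodes), reverse=True)
    return lsns[majority(len(nodes)) - 1]


# Two big data-bearing hosts plus a small arbiter that keeps only WAL.
cluster = [
    Node("primary", flushed_lsn=1000),
    Node("standby", flushed_lsn=900),               # lagging behind
    Node("arbiter", flushed_lsn=1000, has_data=False),
]
```

In this sketch the arbiter's flush counts toward the quorum, so LSN 1000 is
already held by 2 of 3 voters even though the standby lags at 900.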

Certainly, it can only become a "read-only master": its job is just to
replicate the WAL tail it holds and commit it by committing a record in the
new term/timeline. Then it should hand leadership to another replica
immediately.
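
The takeover sequence described above could be sketched roughly as follows.
This is only an illustration under simplifying assumptions: Member,
arbiter_takeover and the list-based WAL are hypothetical names, and each
replica's WAL is assumed to be a prefix of the arbiter's.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    has_data: bool                           # False for the WAL-only arbiter
    wal: list = field(default_factory=list)  # toy WAL: a list of records


def arbiter_takeover(arbiter: Member, replicas: list[Member], new_term: int) -> str:
    """Arbiter acts as a temporary read-only leader, then hands over."""
    # 1. Ship the WAL tail each replica is missing
    #    (assumes the replica's WAL is a prefix of the arbiter's).
    for r in replicas:
        r.wal.extend(arbiter.wal[len(r.wal):])

    # 2. Commit that tail by writing a record in the new term/timeline.
    record = ("commit", new_term)
    arbiter.wal.append(record)
    for r in replicas:
        r.wal.append(record)

    # 3. Immediately hand leadership to a replica that has real data.
    successor = next(r for r in replicas if r.has_data)
    return successor.name


arbiter = Member("arbiter", has_data=False, wal=["r1", "r2", "r3"])
standby = Member("standby", has_data=True, wal=["r1", "r2"])
new_leader = arbiter_takeover(arbiter, [standby], new_term=5)
```

The arbiter never serves queries in this sketch; it only closes the gap in
the replica's WAL and steps down.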

This idea is not a fantasy. BiHA does it.

--
regards
Yura Sokolov aka funny-falcon
