Re: Built-in Raft replication

From: Konstantin Osipov <kostja(dot)osipov(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Built-in Raft replication
Date: 2025-04-15 11:14:35
Message-ID: Z_4_m2LB8JL5LD0j@ark
Lists: pgsql-hackers

* Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> [25/04/15 12:02]:

> > OTOH Raft needs to write its own log, and what's worse, it sometimes
> > needs to remove already written parts of it (so it is not
> > append-only, unlike WAL). If you have a production system which
> > maintains two kinds of logs with different semantics, it is a very
> > hard system to maintain.
>
> Raft is a log replication protocol which uses a log position and a term.
> But... PostgreSQL already has a log position and a term in its WAL structure.
> PostgreSQL's timeline is actually the Term.
> A Raft implementer just needs to correct the rules for Term/Timeline switching:
> - instead of "next TimeLine number is just increment of largest known
> TimeLine number" it needs to be "next TimeLine number is the result of
> Leader Election".
>
> And yes, "it sometimes needs to remove already written parts of it".
> But... that is exactly what every PostgreSQL cluster manager has to do
> to rejoin the previous leader as a follower of the new leader: pg_rewind.
>
> So, PostgreSQL already has 70-90% of Raft implementation details.
> Raft doesn't have to be implemented in PostgreSQL.
> Raft has to be finished!!!
>
> PS: One of the biggest issues is forced snapshot on replica promotion. It
> really slows down leader switch time. It looks like it is not really
> needed, or some small workaround should be enough.

I'd say my pet peeve is storing the cluster topology (the
so-called Raft configuration) inside the database, not in an
external state provider. I agree on the other points.
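
To make the quoted Term/Timeline rule concrete, here is a minimal sketch of
"the next timeline number is the result of a leader election". It is a toy
model, not PostgreSQL code: the Node class, its fields and the elect()
function are hypothetical names, and it assumes the timeline plays the role
of the Raft term and the LSN the role of the log position. Voting is
simplified compared with the full Raft rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    term: int      # highest timeline this node has seen (assumed to act as the Raft term)
    last_tli: int  # timeline of the last WAL record this node has written
    lsn: int       # position of that last WAL record

    def grant_vote(self, new_timeline: int, cand_last_tli: int, cand_lsn: int) -> bool:
        # Refuse anything that is not strictly newer than what we have seen.
        if new_timeline <= self.term:
            return False
        # Adopt the higher timeline; this also uses up our vote for it.
        self.term = new_timeline
        # Vote only for a candidate whose log is at least as up to date as ours.
        return (cand_last_tli, cand_lsn) >= (self.last_tli, self.lsn)


def elect(candidate: Node, peers: list) -> Optional[int]:
    """Return the new timeline if the candidate wins a majority, else None."""
    candidate.term += 1
    new_timeline = candidate.term
    votes = 1  # the candidate votes for itself
    for peer in peers:
        if peer.grant_vote(new_timeline, candidate.last_tli, candidate.lsn):
            votes += 1
    cluster_size = len(peers) + 1
    return new_timeline if votes * 2 > cluster_size else None
```

The point of the sketch is that a node cannot pick "largest known timeline
plus one" on its own: the new timeline becomes valid only once a majority
has voted for it, and a node with a shorter log cannot win against peers
that are ahead of it.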

--
Konstantin Osipov
