Re: Cluster for an appliance-type deployment

From: Christian Ramseyer <ramseyer(at)netnea(dot)com>
To: Matthias Leisi <matthias(at)leisi(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: Cluster for an appliance-type deployment
Date: 2023-11-07 23:07:10
Message-ID: 56d29e71-a049-4c42-b0bb-cef25a79a014@netnea.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 06.11.23 20:26, Matthias Leisi wrote:
> Dear all,
>
> I’m reasonably experienced with Postgres with simple (single, „rebuild
> and restore“) requirements, but would need some hints on what to look
> for in a more complex situation - deploying Postgres as the backend for
> a (virtual) appliance.
>
> This appliance can scale horizontally from a single to dozens of VMs
> (theoretically more, but most installations are small-ish). It is
> feasible to configure VMs for particular purposes (eg „you are [also] a
> DB node“), but basically all instances will/should be able to perform
> their main tasks besides (also) being a DB node. As the VMs may be
> installed in very different environments, network-based solutions are
> less feasible and we would prefer a DB-level solution. We assume that
> for most cases, primary/stand-by configurations would be sufficient in
> terms of availability / latency / throughput.
>
> We must also assume that there is no person who would be able to touch
> things if an error occurs. Data consistency and (as much as possible)
> automated recovery from error situations („VM down“, „network lost“, …)
> are therefor more important than „n nines". We can assume that the VMs
> can talk to each other over TCP (eg using SSH tunnels, direct Postgres
> connection, or some other suitable protocol). Scripting „around“ the
> database is available to initialize instances and for similar tasks.
>
> Would Postgres’ own log-shipping (file-based + streaming replication,
> possibly with remote_write) be sufficient for such a set of requirements?
>
> What aspects would you consider important for such a scenario?

The replication that ships with Postgres gives you one writeable primary
server and a number of standbys, but it has no tools to automatically
discover or recover from failure. From
https://www.postgresql.org/docs/current/warm-standby-failover.html:

---

PostgreSQL does not provide the system software required to identify a
failure on the primary and notify the standby database server. Many such
tools exist and are well integrated with the operating system facilities
required for successful failover, such as IP address migration.

Once failover to the standby occurs, there is only a single server in
operation. This is known as a degenerate state. The former standby is
now the primary, but the former primary is down and might stay down. To
return to normal operation, a standby server must be recreated, either
on the former primary system when it comes up, or on a third, possibly
new, system. The pg_rewind utility can be used to speed up this process
on large clusters. Once complete, the primary and standby can be
considered to have switched roles. Some people choose to use a third
server to provide backup for the new primary until the new standby
server is recreated, though clearly this complicates the system
configuration and operational processes.

---

So you need to add additional parts from the ecosystem to detect
failure, handle failover, potentially move an IP address with the
Primary etc. Popular tools are repmgr, Patroni, CloudNativePG, BDR,
pacemaker+corosync and endless others. They will address many of your
requirements, but still some work and understanding is required to make
them fully unattended, as well as have them expand dynamically with new
replicas if an appliance is added.

Postgres is an amazing product and I like to use it for almost
everything, but in this scenario we are of course making our live hard
with ACID compliance. In NoSQL/"eventual consistency" land, there are
products that are a lot friendlier to a setup like this - stuff like
Cassandra, etcd, CouchDb comes to mind. I'd compare the pros and cons of
such alternatives, the big con of course being a lot less consistency
and durability guarantees - the question is whether the application
needs it.

But enough about non-Postgres topics on this list :) To go with
Postgres, on a hunch I'd try Patroni first: it does a lot of the
advanced failover stuff, has a great track record, and supposedly runs
on BSD: https://openports.pl/path/databases/patroni

Cheers
Christian

--
Christian Ramseyer, netnea ag
Network Management. Security. OpenSource.
https://www.netnea.com
Phone: +41 79 644 77 64

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Achilleas Mantzios 2023-11-08 06:31:17 Re: PostgreSQL inheritance vs https://wiki.postgresql.org/wiki/Don%27t_Do_This
Previous Message Tom Lane 2023-11-07 15:08:00 Re: procedure string constant is parsed at procedure create time.