Re: Using AWS ephemeral SSD storage for production database workload?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Pritam Barhate <pritambarhate(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Using AWS ephemeral SSD storage for production database workload?
Date: 2018-01-29 17:32:16
Message-ID: a47d8499-e275-fc76-02a8-357448228dfe@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 01/29/2018 05:41 PM, Pritam Barhate wrote:
> Hi everyone, 
>
> As you may know, EBS volumes though durable are very costly when you
> need provisioned IOPS. As opposed to this AWS instance attached
> ephemeral SSD is very fast but isn't durable.
>
> I have come across some ideas on the Internet where people hinted at
> running production PostgreSQL workloads on AWS ephemeral SSD
> storage. Generally, this involves shipping WAL logs continuously to
> S3 and keeping an async read replica in another AWS availability
> zone. Worst case scenario in such deployment is data loss of a few
> seconds. But beyond this the details are sketchy.
>

Both log shipping and async replication are ancient features, and should
be well understood. What exactly is unclear?

> Have you come across such a deployment? What are some best practices
> that need to be followed to pull this through without significant
> data loss? Even though WAL logs are being shipped to S3, in case of
> loss of both the instances, the restore time is going be quite a bit
> for databases of a few hundred GBs.
>

Pretty much everyone who is serious about HA is running such cluster. If
they can't afford any data loss, they use synchronous replicas instead.
That's a basic latency-durability trade-off.

> Just to be clear, I am not planning anything like this, anytime soon
> :-) But I am curious about trade-offs of such a deployment. Any
> concrete information in this aspect is well appreciated.
>

Pretty much everyone is using such architecture (primary + streaming
replicas) nowadays, so it's a reasonably well understood scenario. But
it's really unclear what kind of information you expect to get, or how
much time have you spent reading about this.

There is quite a bit of information in the official docs, although maybe
a bit too low level - it certainly gives you the building blocks instead
of a complete solution. There are also books like [1] for example.

And finally there are tools that help with managing such clusters, like
for example [2]. Not only it's rather bad idea to implement this on your
own (bugs, unnecessary effort) but the tools also show how to do stuff.

[1]
https://www.packtpub.com/big-data-and-business-intelligence/postgresql-replication-second-edition

[2] https://repmgr.org/

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Pritam Barhate 2018-01-29 17:57:32 Re: Using AWS ephemeral SSD storage for production database workload?
Previous Message Kumar, Virendra 2018-01-29 17:00:51 RE: pgpool Connections Distributions Among Nodes