From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Pritam Barhate <pritambarhate(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Using AWS ephemeral SSD storage for production database workload? |
Date: | 2018-01-29 17:32:16 |
Message-ID: | a47d8499-e275-fc76-02a8-357448228dfe@2ndquadrant.com |
Lists: | pgsql-general |
On 01/29/2018 05:41 PM, Pritam Barhate wrote:
> Hi everyone,
>
> As you may know, EBS volumes, though durable, are very costly when you
> need provisioned IOPS. In contrast, AWS instance-attached ephemeral
> SSD storage is very fast but isn't durable.
>
> I have come across some ideas on the Internet where people hinted at
> running production PostgreSQL workloads on AWS ephemeral SSD
> storage. Generally, this involves shipping WAL logs continuously to
> S3 and keeping an async read replica in another AWS availability
> zone. The worst-case scenario in such a deployment is data loss of a few
> seconds. But beyond this, the details are sketchy.
>
Both log shipping and async replication are ancient features and should
be well understood. What exactly is unclear?
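For readers who want to see what "log shipping" involves at the lowest level: PostgreSQL hands each completed WAL segment to the shell command configured in `archive_command` (with `%p` expanded to the segment's path and `%f` to its file name), and it only recycles the segment once that command exits with status 0. Below is a minimal sketch of such an archiver in Python, under stated assumptions: it copies into a local directory (`archive_dir`, a hypothetical stand-in for S3 or other remote storage), refuses to overwrite an already-archived file, and reports success only after the copy has been flushed to disk. The function and script names are illustrative, not part of PostgreSQL.

```python
import os
import shutil
import sys


def archive_wal(wal_path: str, wal_name: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir and return an exit status.

    PostgreSQL treats exit status 0 from archive_command as "this segment is
    safely archived", so every failure path must return non-zero.
    archive_dir is a hypothetical local stand-in for remote storage such as S3.
    """
    dest = os.path.join(archive_dir, wal_name)
    if os.path.exists(dest):
        # Never overwrite a segment that was already archived.
        return 1
    tmp = dest + ".tmp"
    try:
        with open(wal_path, "rb") as src, open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # force the copied bytes to disk
        os.rename(tmp, dest)  # atomic rename on POSIX: no half-written segment
    except OSError:
        # Clean up a partial copy and report failure so PostgreSQL retries.
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical invocation by PostgreSQL: archive_wal.py %p %f <archive_dir>
    sys.exit(archive_wal(sys.argv[1], sys.argv[2], sys.argv[3]))
```

It would be wired up with something like `archive_mode = on` and `archive_command = 'python3 /usr/local/bin/archive_wal.py %p %f /mnt/wal_archive'` (the paths are hypothetical). In practice, a maintained archiving tool is preferable to a hand-rolled script like this one.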
> Have you come across such a deployment? What are some best practices
> that need to be followed to pull this off without significant
> data loss? Even though WAL logs are being shipped to S3, in case of
> loss of both instances, the restore time is going to be quite long
> for databases of a few hundred GBs.
>
Pretty much everyone who is serious about HA is running such a cluster. If
they can't afford any data loss, they use synchronous replicas instead.
That's a basic latency-durability trade-off.
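The latency-durability trade-off can be made concrete with a toy model. This is an illustration under simplifying assumptions, not a description of PostgreSQL internals: each commit is flushed locally in `flush_s` seconds and its WAL then takes `one_way_s` seconds to reach the replica. With asynchronous replication the client is acknowledged right after the local flush; with synchronous replication it also waits for the replica's reply (one full round trip). If the primary is destroyed at `failure_at` and the replica is promoted, any commit that was acknowledged but had not yet reached the replica is lost. The model ignores the standby's own flush time and any extra streaming or replay delay, and the function name and inputs are hypothetical.

```python
def commit_outcomes(start_times, flush_s, one_way_s, sync, failure_at):
    """Toy model of async vs sync replication when the primary is lost.

    Returns (commit_latency_s, acknowledged_commits_lost).
    Assumes WAL is streamed only after the local flush and ignores the
    standby's own flush time.
    """
    # Sync commits also wait for the standby's reply: one extra round trip.
    latency = flush_s + (2 * one_way_s if sync else 0.0)
    lost = 0
    for t in start_times:
        acked_at = t + latency                   # client sees COMMIT succeed here
        on_replica_at = t + flush_s + one_way_s  # WAL reaches the replica here
        # Lost = client was told it committed, but the replica never got it.
        if acked_at <= failure_at < on_replica_at:
            lost += 1
    return latency, lost


# Hypothetical workload: 1,000 commits per second for one second.
starts = [i * 0.001 for i in range(1000)]
for mode in (False, True):
    latency, lost = commit_outcomes(starts, 0.002, 0.005, mode, 0.5005)
    print(f"sync={mode}: latency={latency * 1000:.0f} ms, lost={lost}")
```

With these made-up inputs (2 ms local flush, 5 ms one-way delay, failure at 0.5005 s), async mode loses 5 acknowledged commits at 2 ms commit latency, while sync mode loses none at 12 ms. In async mode the loss window equals the replication delay, so a replica that falls further behind widens it.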
> Just to be clear, I am not planning anything like this anytime soon
> :-) But I am curious about the trade-offs of such a deployment. Any
> concrete information on this aspect would be much appreciated.
>
Pretty much everyone is using such an architecture (primary + streaming
replicas) nowadays, so it's a reasonably well-understood scenario. But
it's really unclear what kind of information you expect to get, or how
much time you have spent reading about this.
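One everyday part of running a primary with asynchronous streaming replicas is watching how far each replica lags, since the gap between the primary's current WAL position and a replica's flushed position is roughly the WAL that would be lost if the primary disappeared. On PostgreSQL 10 and later, `pg_stat_replication` on the primary reports per-replica positions such as `sent_lsn`, `flush_lsn` and `replay_lsn`, and the server-side function `pg_wal_lsn_diff()` computes the difference in bytes. The Python sketch below does the same arithmetic client-side, assuming LSNs in their usual text form `X/Y` (the high and low 32 bits of a 64-bit WAL position, in hexadecimal); the function names are hypothetical.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a pg_lsn text value like '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica is behind the primary.

    Clamped at 0, since the two values may be sampled at slightly
    different moments.
    """
    return max(0, lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn))


# Hypothetical values, e.g. pg_current_wal_lsn() on the primary and flush_lsn
# from pg_stat_replication for one replica.
print(lag_bytes("1/00000000", "0/FFFFF000"))  # 4096
```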
There is quite a bit of information in the official docs, although maybe
at a bit too low a level: it certainly gives you the building blocks rather
than a complete solution. There are also books, such as [1].
And finally, there are tools that help with managing such clusters, such
as [2]. Not only is it a rather bad idea to implement this on your own
(bugs, unnecessary effort), but the tools also show how to do things.
[1] https://www.packtpub.com/big-data-and-business-intelligence/postgresql-replication-second-edition
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services