Re: Using AWS ephemeral SSD storage for production database workload?

From: Pritam Barhate <pritambarhate(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Using AWS ephemeral SSD storage for production database workload?
Date: 2018-01-29 17:57:32
Message-ID: CALpo98Ufx4hYZeJy2Ae59mBrxtuC25LKx-aPbTU6ijR4RrD9ng@mail.gmail.com
Lists: pgsql-general

>> Both log shipping and async replication are ancient features, and should
>> be well understood. What exactly is unclear?

I know about these, and I know how to operate them too. The only part I am
concerned about is the ephemeral storage: the risk appetite around it, and
the steps people take to ensure no "serious" data is lost when
both the primary and the standby are lost (very unlikely when they are in
different AZs, but still possible). I was just wondering if there is any
secret sauce (some wisdom that comes only from operating a real-world
deployment) to it. Even Heroku seems to be using PIOPS (provisioned IOPS;
https://devcenter.heroku.com/articles/heroku-postgres-production-tier-technical-characterization),
and these guys created WAL-E. Anyway, I did learn some new things from
Manuel's response.
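
To make the double-failure case concrete, here is a rough back-of-the-envelope sketch in Python. Every name and number in it is a hypothetical assumption for illustration (none comes from a real deployment), and it assumes WAL archiving to S3 keeps up with WAL generation. It estimates how much recent data could be lost and how long a restore from S3 might take.

```python
from dataclasses import dataclass


@dataclass
class EphemeralSetup:
    """Hypothetical parameters for a primary + async standby on ephemeral SSD."""
    archive_timeout_s: float    # assumed archive_timeout (forces a WAL segment switch)
    archive_upload_s: float     # assumed time to push one WAL segment to S3
    replica_lag_s: float        # assumed typical streaming replication lag
    base_backup_gb: float       # size of the latest base backup stored in S3
    wal_since_backup_gb: float  # WAL archived since that base backup
    download_mb_per_s: float    # assumed S3-to-instance throughput
    replay_mb_per_s: float      # assumed WAL replay rate during recovery


def worst_case_loss_s(s: EphemeralSetup, both_lost: bool) -> float:
    """Rough upper bound on the window of lost transactions, in seconds."""
    if both_lost:
        # Only WAL that already reached S3 survives.
        return s.archive_timeout_s + s.archive_upload_s
    # The standby survives; only un-replicated WAL is lost.
    return s.replica_lag_s


def restore_time_s(s: EphemeralSetup) -> float:
    """Rough restore time from S3: download everything, then replay the WAL."""
    download = (s.base_backup_gb + s.wal_since_backup_gb) * 1024 / s.download_mb_per_s
    replay = s.wal_since_backup_gb * 1024 / s.replay_mb_per_s
    return download + replay


if __name__ == "__main__":
    setup = EphemeralSetup(60, 5, 2, 300, 20, 200, 50)
    print("loss if only primary is lost (s):", worst_case_loss_s(setup, False))
    print("loss if both are lost (s):", worst_case_loss_s(setup, True))
    print("restore time (min):", restore_time_s(setup) / 60)
```

With these made-up numbers, losing both instances costs about a minute of data and roughly half an hour of restore time for a 300 GB base backup. The real figures depend heavily on backup frequency and on the actual S3 and replay throughput, so they would need to be measured rather than assumed.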

In short, I am just trying to learn from other people's experience.

Thanks for all the information.

Pritam.

On Mon, Jan 29, 2018 at 11:02 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

>
>
> On 01/29/2018 05:41 PM, Pritam Barhate wrote:
> > Hi everyone,
> >
> > As you may know, EBS volumes, though durable, are very costly when you
> > need provisioned IOPS. By contrast, AWS instance-attached
> > ephemeral SSD is very fast but isn't durable.
> >
> > I have come across some ideas on the Internet where people hinted at
> > running production PostgreSQL workloads on AWS ephemeral SSD
> > storage. Generally, this involves shipping WAL logs continuously to
> > S3 and keeping an async read replica in another AWS availability
> > zone. The worst-case scenario in such a deployment is the loss of a
> > few seconds of data. But beyond this the details are sketchy.
> >
>
> Both log shipping and async replication are ancient features, and should
> be well understood. What exactly is unclear?
>
> > Have you come across such a deployment? What are some best practices
> > that need to be followed to pull this through without significant
> > data loss? Even though WAL logs are being shipped to S3, if both
> > instances are lost, the restore time is going to be considerable
> > for databases of a few hundred GB.
> >
>
> Pretty much everyone who is serious about HA is running such a cluster. If
> they can't afford any data loss, they use synchronous replicas instead.
> That's a basic latency-durability trade-off.
>
> > Just to be clear, I am not planning anything like this, anytime soon
> > :-) But I am curious about trade-offs of such a deployment. Any
> > concrete information in this aspect is well appreciated.
> >
>
> Pretty much everyone is using such an architecture (primary + streaming
> replicas) nowadays, so it's a reasonably well understood scenario. But
> it's really unclear what kind of information you expect to get, or how
> much time you have spent reading about this.
>
> There is quite a bit of information in the official docs, although maybe
> a bit too low level - it certainly gives you the building blocks instead
> of a complete solution. There are also books like [1] for example.
>
> And finally there are tools that help with managing such clusters, for
> example [2]. Not only is it a rather bad idea to implement this on your
> own (bugs, unnecessary effort), but the tools also show how to do stuff.
>
> [1]
> https://www.packtpub.com/big-data-and-business-intelligence/postgresql-replication-second-edition
>
> [2] https://repmgr.org/
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
