Re: Postgres for a "data warehouse", 5-10 TB

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: sthomas(at)peak6(dot)com
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, Andy Colson <andy(at)squeakycode(dot)net>, Igor Chudov <ichudov(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres for a "data warehouse", 5-10 TB
Date: 2011-09-12 20:44:44
Message-ID: CAOR=d=2iFiL5b3j7p1oS1PxHjAZvUw7=F0XbC6sksQDYvd0RWw@mail.gmail.com
Lists: pgsql-performance

On Mon, Sep 12, 2011 at 2:04 PM, Shaun Thomas <sthomas(at)peak6(dot)com> wrote:
> On 09/12/2011 02:48 PM, Scott Marlowe wrote:
>
>> I put it to you that your hardware has problems if you have a pg db
>> that's corrupting from having too much vacuum activity.
>
> What? No. We optimized by basically forcing autovacuum to never run during
> our active periods. We never actually encountered wrap-around corruption. I
> was just saying that 600M is a relatively high setting for
> autovacuum_freeze_max_age. :)

You don't get corruption from wraparound; you get a database that
stops, tells you to run a vacuum by hand from a single-user backend,
and won't come up until you do. You throw the word corruption around
a lot. The PostgreSQL team works REALLY hard to prevent any kind of
corruption scenario from rearing its ugly head, so when the word
corruption pops up I start to wonder about the hardware someone is
using. Killing the postmaster by hand, deleting the interlock file,
and starting a new postmaster while old postgres children are still
active is just about the only way to corrupt pgsql, short of using vi
on one of the files under /data/base/xxx/yyy.
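
For anyone following along, here's a rough sketch of how you'd check
how far each database is from that forced shutdown, and what the
single-user recovery looks like (paths and database names are
illustrative, adjust for your install):

    -- how many transactions old is each database's datfrozenxid?
    -- the hard shutdown doesn't kick in until age() nears ~2 billion
    SELECT datname, age(datfrozenxid) AS xid_age
      FROM pg_database
     ORDER BY xid_age DESC;

    # if the server has already stopped itself, vacuum from a
    # single-user backend, then restart normally:
    $ postgres --single -D /path/to/data yourdb
    backend> VACUUM;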

>
> I was alluding to the fact that if a DBA had his system running for a week
> at our transaction level, and PG didn't have forced auto vacuum, and their
> maintenance lapsed even slightly, they could end up with a corrupt database.
> Not too far-fetched for someone coming from MySQL, really.
>
> Our problem is we run a financial site, and the front-end very aggressively
> monitors network and database timeouts. The limit is sufficiently low that a
> vacuum would cause enough IO to trigger application timeouts, even with
> vacuum_cost_delay. And of course, setting vacuum_cost_delay too high quickly
> triples or quadruples vacuum times. Now that we're using FusionIO cards,
> I've been thinking about turning autovacuum back on, but I want to run some
> tests first.
>
> My point stands, though. Don't go crazy with transactions until you know
> your config can stand up to it, and reduce if possible. We found some tweak
> points that drastically reduced transaction count with no detrimental effect
> on the app itself, so we jumped on them.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> sthomas(at)peak6(dot)com

--
To understand recursion, one must first understand recursion.
