Re: Postgres for a "data warehouse", 5-10 TB

From: Igor Chudov <ichudov(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres for a "data warehouse", 5-10 TB
Date: 2011-09-11 13:59:16
Message-ID: CAMhtkAah2c4XfSec=OtgL1V51wpD=jygbZnBVBRYCV02MscebQ@mail.gmail.com
Lists: pgsql-performance

On Sun, Sep 11, 2011 at 7:52 AM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:

> On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov <ichudov(at)gmail(dot)com> wrote:
> > I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> > CPU cores.
>
> 1 or 2 fast cores is plenty for what you're doing.

I need those cores for other tasks, such as image manipulation with
ImageMagick, XML generation and parsing, etc.

> But the drive
> array and how it's configured etc are very important. There's a huge
> difference between 10 2TB 7200RPM SATA drives in a software RAID-5 and
> 36 500G 15kRPM SAS drives in a RAID-10 (SW or HW would both be ok for
> data warehouse.)

Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6
configuration.

They are managed by a 3ware 9750 RAID card.

I am not very concerned about the linear relationship between read speed
and disk speed. If reads are somewhat slow, that is OK with me.

What I want to avoid is severe performance degradation as the data grows
(time complexity greater than O(1)), disastrous REPAIR TABLE operations, etc.

> > I do not know much about Postgres, but I am very eager to learn and
> > see if I can use it for my purposes more effectively than MySQL.
> > I cannot shell out $47,000 per CPU for Oracle for this project.
> > To be more specific, the batch queries that I would do, I hope,
>
> Hopefully if needs be you can spend some small percentage of that for
> a fast IO subsystem is needed.
>
>

I am actually open to suggestions here.

> > would either use small JOINS of a small dataset to a large dataset, or
> > just SELECTS from one big table.
> > So... Can Postgres support a 5-10 TB database with the use pattern
> > stated above?
>
> I use it on a ~3TB DB and it works well enough. Fast IO is the key
> here. Lots of drives in RAID-10 or HW RAID-6 if you don't do a lot of
> random writing.
>

I do not plan to do a lot of random writing. My current design is that my
Perl scripts write to a temporary table every week, and then I do an
INSERT ... ON DUPLICATE KEY UPDATE.

By the way, does INSERT ... ON DUPLICATE KEY UPDATE, or something like it,
exist in Postgres?
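
In case it helps to see what I mean, the effect I am after can be written in
portable SQL as an UPDATE of the rows that already exist followed by an INSERT
of the rows that do not, both inside one transaction. The sketch below is only
an illustration under assumptions: the table names (items, staging) and the
price column are made up, it is in Python rather than my Perl scripts, and it
uses the sqlite3 module purely so that it runs without a database server. It
is not meant as the recommended Postgres approach.

```python
import sqlite3


def weekly_merge(conn):
    """Merge staging into items: update matching ids, insert the rest."""
    with conn:  # commit both steps together, or roll back on error
        # Step 1: overwrite rows whose key is already present in items.
        conn.execute("""
            UPDATE items
               SET price = (SELECT s.price FROM staging s WHERE s.id = items.id)
             WHERE EXISTS (SELECT 1 FROM staging s WHERE s.id = items.id)
        """)
        # Step 2: add rows whose key is not yet present in items.
        conn.execute("""
            INSERT INTO items (id, price)
            SELECT s.id, s.price
              FROM staging s
             WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = s.id)
        """)
        # Empty the staging table for next week's load.
        conn.execute("DELETE FROM staging")


# Hypothetical data: id 2 exists in both tables, id 3 is new.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items   (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, price REAL);
    INSERT INTO items   VALUES (1, 10.0), (2, 20.0);
    INSERT INTO staging VALUES (2, 25.0), (3, 30.0);
""")
weekly_merge(conn)
rows = conn.execute("SELECT id, price FROM items ORDER BY id").fetchall()
print(rows)
```

Since the load happens once a week from a single batch job, I am not worried
about concurrent writers racing between the two statements; with concurrent
writers this two-step form would need extra locking.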

i
