From: | Ryan Kelly <rpkelly22(at)gmail(dot)com> |
---|---|
To: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: PostgreSQL, OLAP, and Large Clusters |
Date: | 2012-09-27 18:50:03 |
Message-ID: | 20120927185003.GD25296@llserver.lakeliving.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, Sep 26, 2012 at 03:18:16PM -0600, Scott Marlowe wrote:
> On Wed, Sep 26, 2012 at 5:50 AM, Ryan Kelly <rpkelly22(at)gmail(dot)com> wrote:
> > Hi:
> >
> > The size of our database is growing rather rapidly. We're concerned
> > about how well Postgres will scale for OLAP-style queries over terabytes
> > of data. Googling around doesn't yield great results for vanilla
> > Postgres in this application, but generally links to other software like
> > Greenplum, Netezza, and Aster Data (some of which are based off of
> > Postgres). Too, there are solutions like Stado. But I'm concerned about
> > the amount of effort to use such solutions and what we would have to
> > give up feature-wise.
>
> If you want fastish OLAP on postgres you need to do several things.
>
> 1: Throw very fast disk arrays at it. Lots of spinners in a linux SW
> RAID-10 or RAID-0 if your data is easily replaceable work wonders
> here.
> 2: Throw lots of memory at it. Memory is pretty cheap. 256G is not
> unusual for OLAP machines
> 3: Throw fast CPUs at it. Faster CPUs, especially fewer faster cores,
> are often helpful.
What do you mean by "fewer faster cores"? Wouldn't "more faster cores"
be better?
> Applied in that order you can get some pretty impressive results.
>
> A lot of OLAP stuff needs to read hundreds of gigs at a time from the
> drive array. An array of 32 15kRPM drives, each reading at ~100MB/s
> or better can flood your PCI bus at 3200MB/s for reads. Note that
> most RAID controllers aren't as fast for sequential reads on large
> drive arrays. Although a battery backed cache can GREATLY improved
> parallel write speed, it doesn't really make a big difference for big
> sequential stuff and usually gets in the way here.
>
> Memory to cache as much as possible and allow all your queries to do
> hash joins etc in memory (crank up work_mem as needed, but be careful
> not to use all your memory up.)
>
> Lastly once you've thrown lots of IO and memory at it, a faster CPU
> can make a bit of a difference too. But honestly I'd rather have a
> dual core 2GHz CPU on top of 32 spinners with 256G than a 3.6GHz 8
> core CPU on top of 4 drives and 32G of RAM.
All of this seems like great advice.
Thanks,
-Ryan Kelly
From | Date | Subject | |
---|---|---|---|
Next Message | Ryan Kelly | 2012-09-27 18:54:46 | Re: PostgreSQL, OLAP, and Large Clusters |
Previous Message | Ryan Kelly | 2012-09-27 18:47:27 | Re: PostgreSQL, OLAP, and Large Clusters |