Re: [Bizgres-general] Hardware suggestions for a new data

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Filip Wuytack" <fwuytack(at)fgscapital(dot)com>, "bizgres-general" <bizgres-general(at)pgfoundry(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: [Bizgres-general] Hardware suggestions for a new data
Date: 2005-07-26 15:33:34
Message-ID: BF0BA5DE.9F9E%llonergan@greenplum.com
Lists: pgsql-general

Filip,

On 7/26/05 4:35 AM, "Filip Wuytack" <fwuytack(at)fgscapital(dot)com> wrote:

> Hi All,
>
> I've been lurking at both the bizgres and postgres mailing lists in recent
> months to get some ideas for building a new db server. I saw some threads on
> the 7k$ server and it sounds like the Opteron, lots of ram and raid10 is the
> way to go. (my budget is +/- £6000 (=$10000))

Cool - I'd look at the systems at Rackable, particularly this one:
http://www.rackable.com/products/s3118.htm. I've worked with someone who
bought one loaded with 7 Terabytes for $15K, so perhaps you could buy one
half full of disks within your budget. SATA is the best bang for the buck,
and they seem to have the best packaging.

> But I'm keen to get some specific input on the more data warehouse type of
> db requirements. The db will hold some 1000 tables (multiple schemas, 1 db)
> with some of the larger tables going over 50m rows (and expect them to grow
> another 10m/year). This is mainly all financial/statistical data.

What's the overall size do you think? How much overall growth per year?

Table partitioning combined with tablespaces could be an important part of
your design.
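
For example (a rough sketch on my part, not something from this thread - the
table and tablespace names are made up), with 8.x-style inheritance
partitioning plus tablespaces you could do something like:

  -- put older partitions on a slower array, current ones on the fast array
  CREATE TABLESPACE hist_space LOCATION '/mnt/slow_array/pgdata';
  CREATE TABLESPACE curr_space LOCATION '/mnt/fast_array/pgdata';

  CREATE TABLE trades (
      trade_id   bigint,
      trade_date date NOT NULL,
      amount     numeric
  );

  -- each child inherits the parent's columns; the CHECK constraint
  -- documents which date range the child holds
  CREATE TABLE trades_2005_06 (
      CHECK (trade_date >= DATE '2005-06-01'
         AND trade_date <  DATE '2005-07-01')
  ) INHERITS (trades) TABLESPACE hist_space;

  CREATE TABLE trades_2005_07 (
      CHECK (trade_date >= DATE '2005-07-01'
         AND trade_date <  DATE '2005-08-01')
  ) INHERITS (trades) TABLESPACE curr_space;

  -- route inserts into the current partition (a rule; a trigger works too)
  CREATE RULE trades_insert_2005_07 AS
      ON INSERT TO trades
      WHERE (NEW.trade_date >= DATE '2005-07-01'
         AND NEW.trade_date <  DATE '2005-08-01')
      DO INSTEAD
      INSERT INTO trades_2005_07 VALUES (NEW.*);

Queries against the parent table see every partition, and a whole month can
later be archived or dropped as a single child table.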

If performance becomes an issue (after about 500GB of data it probably
will), Greenplum will soon have the MPP module for Bizgres, which scales to
100TB+.

> I only
> have a small number of people working against the db (10 people plus maybe
> another 20 applications running batch jobs) but they crunch through lots of
> data (the bigger tables or views unioning big datasets). I'm also keen to
> make use of PL/R inside Postgresql, so I presume this might put extra
> requirements on the CPU side of things.

The primary issue here is the assignment of work_mem, I think. It is a
single constant set in postgresql.conf, and it is the chunk of memory
allocated to each sort operator in a query execution plan. As such, any
given connection can allocate several of these chunks, and if many users
come online you can run out of system memory if the work_mem setting is too
high. Perversely, it is also one of the more important parameters for
improving performance.

We're going to work on this to make work_mem a resource-managed parameter.
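
As a back-of-the-envelope illustration (my numbers, purely hypothetical):
worst-case sort memory is roughly

  active_connections x sorts_per_query x work_mem

so 30 sessions each running plans with, say, 4 sorts at work_mem = 256MB
could theoretically touch ~30GB. A common pattern is to keep the global
default modest and raise it per session for the big batch jobs:

  -- postgresql.conf: modest default for everyone
  --   work_mem = 16384         # in kB on 8.0/8.1, i.e. 16MB

  -- in the batch/reporting session only:
  SET work_mem = 262144;        -- 256MB for this session's sorts/hashes
  -- ... run the heavy query ...
  RESET work_mem;               -- back to the postgresql.conf default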

> I would be very keen to hear what you would suggest for this setup or what
> experiences you have to share.

We've a couple of success stories, one of them marred by hardware problems
with their Linux/Emulex Fibre Channel host bus adapter, but saved by the
replication scheme that Josh Berkus set up for them. That system holds 400GB
and is now in production on Bizgres, with MicroStrategy and KETL providing
reporting and ETL, respectively.

> Btw, I would buy the hardware in the UK, so if you know of any good
> retailers, let me know.

See above.

>
> Many thanks,
>
> Filip

Cheers!

- Luke
