From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | Jason Hihn <jhihn(at)paytimepayroll(dot)com>, "Pgsql-Novice(at)Postgresql(dot) Org" <pgsql-novice(at)postgresql(dot)org> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Ideal Hardware? |
Date: | 2003-10-01 17:38:53 |
Message-ID: | 200310011038.53024.josh@agliodbs.com |
Lists: | pgsql-novice pgsql-performance |
Jason,
Your question is better suited to the PERFORMANCE list than to NOVICE, so I
have cross-posted it there. I recommend that you subscribe to performance and
drop novice from your replies. There are lots of hardware geeks on
performance, but few on novice.
> We have an opportunity to purchase a new, top-notch database server. I am
> wondering what kind of hardware is recommended? We're on Linux platforms
> and kernels though. I remember a comment from Tom about how he was spending
> a lot of time debugging problems which turned out to be hardware-related. I
> of course would like to avoid that.
>
> In terms of numbers, we expect have an average of 100 active connections
> (most of which are idle 9/10ths of the time), with about 85% reading
> traffic. I expect the database with flow average 10-20kBps under moderate
> load. I hope to have one server host about 1000-2000 active databases, with
> the largest being about 60 meg (no blobs). Inactive databases will only be
> for reading (archival) purposes, and will seldom be accessed.
Is that 100 concurrent connections *total*, or per-database? If the
connections are idle 90% of the time, then are they open, or do they get
re-established with each query? Have you considered connection pooling for
the read-only queries?
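To illustrate the pooling question, here is a minimal sketch in Python. It assumes 100 clients that are each idle about 90% of the time, so roughly 10 connections are busy at any moment. The `ConnectionPool` class and `demo` function are hypothetical names made up for this example, and an in-memory SQLite connection stands in for a real PostgreSQL connection so the sketch runs on its own. A production setup would more likely use an existing pooler than hand-rolled code like this.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool: opens `size` connections up front and reuses them."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every pooled connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)


def demo(clients=100, pool_size=10):
    """Serve `clients` requests through `pool_size` shared connections."""
    opened = []

    def connect():
        # Stand-in for opening a real database connection (assumption).
        conn = sqlite3.connect(":memory:")
        opened.append(conn)
        return conn

    pool = ConnectionPool(connect, pool_size)
    results = []
    for client in range(clients):
        with pool.connection() as conn:
            results.append(conn.execute("SELECT ?", (client,)).fetchone()[0])
    return len(opened), results


opened_count, results = demo()
```

Every client gets its answer, but the server only ever sees `pool_size` open connections instead of one per client.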
> Does any of this represent a problem for Postgres? The datasets are
> typically not that large, only a few queries on a few databases ever return
> over 1000 rows. I'm worried about being able to handle the times when there
> will be spikes in the traffic.
It's all possible; it just requires careful application design and lots of
hardware. You should also cost things out; sometimes it's cheaper to have
several good servers instead of one uber-server. Having several servers also
helps with hardware replacement.
> The configuration that is going on in my head is:
> RAID 1, 200gig
RAID 1+0 can be good for Postgres. However, if you have the budget, RAID 5
with 6 or more disks can be better some of the time, particularly when read
queries are the vast majority of the load. There are, as yet, no definitive
statistics, but OSDL is working on it!
More important than the RAID config is the RAID card; once again, with money,
multi-channel RAID cards with a battery-backed write cache are your best bet;
some cards even allow you to span RAID1 between cards of the same model. See
the discussion about LSI MegaRaid in the PERFORMANCE list archives over the
last 2 weeks.
> 1 server, 4g ram
> Linux 2.6
You're very brave. Me, I'm not adopting 2.6 in production until 2.6.03 is
out, at least.
> I was also wondering about storage units (IBM FAStT200) with giga-bit
> Ethernet to rack mount computer(s)... But would I need more than 1 CPU? If
> I did, how would I handle the file system? We only do a few joins, so I
> think most of it would be I/O latency.
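PostgreSQL will make use of multiple processors: each connection is served by
its own backend process, so concurrent queries can run on separate CPUs. If
you are worried about peak time loads, having 2-4 processors to distribute
queries across would be very useful.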
Also, I'm concerned about the "we only do a few joins". What that says to
me is "we don't really know how to write complex queries, so we pull a lot of
redundant data." Simple queries can be far less efficient than complex ones
if they result in you pulling entire tables across to the client.
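To make that concrete, here is a small sketch in Python comparing the two approaches. The `employees` and `paychecks` tables and their contents are invented for illustration, and SQLite's in-memory database stands in for PostgreSQL so the example is self-contained. The first function pulls both tables to the client and adds things up there; the second lets the database do the join and the aggregation.

```python
import sqlite3


def build_db():
    """Create a tiny hypothetical schema with a few rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE paychecks (
            id INTEGER PRIMARY KEY,
            employee_id INTEGER REFERENCES employees(id),
            amount INTEGER
        );
    """)
    db.executemany("INSERT INTO employees VALUES (?, ?)",
                   [(1, "Ann"), (2, "Bob"), (3, "Cy")])
    db.executemany("INSERT INTO paychecks (employee_id, amount) VALUES (?, ?)",
                   [(1, 100), (1, 150), (2, 200), (3, 50), (3, 75), (3, 25)])
    return db


def totals_client_side(db):
    """Two 'simple' queries: fetch whole tables, then combine them in the client."""
    employees = db.execute("SELECT id, name FROM employees").fetchall()
    paychecks = db.execute("SELECT employee_id, amount FROM paychecks").fetchall()
    names = dict(employees)
    totals = {name: 0 for name in names.values()}
    for employee_id, amount in paychecks:
        totals[names[employee_id]] += amount
    return totals, len(employees) + len(paychecks)


def totals_in_database(db):
    """One 'complex' query: the join and the sum happen on the server."""
    rows = db.execute("""
        SELECT e.name, SUM(p.amount)
        FROM employees e
        JOIN paychecks p ON p.employee_id = e.id
        GROUP BY e.name
    """).fetchall()
    return dict(rows), len(rows)


db = build_db()
client_totals, client_rows = totals_client_side(db)
server_totals, server_rows = totals_in_database(db)
```

Both functions produce the same totals, but the client-side version transfers every row of both tables, while the join returns one row per employee. With real table sizes that gap is where the I/O and network cost goes.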
--
Josh Berkus
Aglio Database Solutions
San Francisco