Recommendations for configuring a 200 GB database

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Recommendations for configuring a 200 GB database
Date: 2005-06-08 21:04:36
Message-ID: s2a71722.022@gwmta.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We have had four databases serving our web site, but due to licensing
issues, we have had to take two out of production, and we are looking to
bring those two onto PostgreSQL very quickly, with an eye toward moving
everything in the longer term. The central web DBs are all copies of
the same data, drawn from 72 servers at remote locations. We replicate
modifications made at these 72 remote sites real-time to all central
servers.

On each central server, there are 352 tables and 412 indexes holding
about 700 million rows, taking almost 200 GB of disk space. The largest
table has about 125 million of those rows, with several indexes. There
are about 3 million database transactions modifying each central
database every day, with each transaction typically containing many
inserts and/or updates -- deletes are sparse. During idle time the
replication process compares tables in the source databases to the
central databases to log any differences and correct the central copies.
To support the 2 million browser and SOAP hits per day, the web sites
spread about 6 million SELECT statements across available central
servers, using load balancing. Many of these queries involve a 10 or
more tables with many subqueries; some involve unions.

The manager of the DBA team is reluctant to change both the OS and the
DBMS at the same time, so unless I can make a strong case for why it is
important to run postgresql under Linux, we will be running this on
Windows. Currently, there are two Java-based middle tier processes
running on each central database server, one for the replication and one
for the web. We expect to keep it that way, so the database needs to
play well with these processes.

I've been reading everything I can find on postgresql configuration, but
would welcome any specific suggestions for this environment. I'd also
be really happy to hear that we're not the first to use postgresql with
this much data and load.

Thanks for any info you can provide.

-Kevin

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Sam Vilain 2005-06-08 21:14:02 Re: Postgresql on an AMD64 machine
Previous Message Jim Johannsen 2005-06-08 20:48:05 Re: Help with rewriting query