
From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Michael Welter <mike(at)introspect(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Very large database
Date: 2002-01-14 22:32:10
Message-ID: 20020114223210.B10841042A@polaris.pinpointresearch.com
Lists: pgsql-general

Not enough info. How many tables? Is the nightly run a bulk insert or update,
or something more complicated than that? What sort of queries (quantity and
complexity) must the database handle, and what performance is acceptable (how
many simultaneous users, how many queries per second, and what response time
per query)? A quick test like the sketch below can put rough numbers on those
last questions.
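
For example (a minimal, untested sketch, assuming Python with psycopg2
installed and a scratch copy of the data reachable; the "testdb" database,
"orders" table, and the query itself are placeholders for your own workload):

import time
import psycopg2

# Hypothetical connection and query -- substitute your own schema and workload.
conn = psycopg2.connect(dbname="testdb")
cur = conn.cursor()

N = 1000
latencies = []
for _ in range(N):
    start = time.time()
    cur.execute("SELECT count(*) FROM orders "
                "WHERE placed_at > now() - interval '1 day'")
    cur.fetchall()
    latencies.append(time.time() - start)

latencies.sort()
print(f"queries/sec: {N / sum(latencies):.1f}")
print(f"median latency: {latencies[N // 2] * 1000:.1f} ms")
print(f"95th percentile: {latencies[int(N * 0.95)] * 1000:.1f} ms")

Numbers like these from a throwaway box give you something concrete to compare
against your acceptable response time.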

Other things being equal, hardware RAID beats software RAID, as it keeps the
processor free for your programs. As a general rule, more spindles are better
and more memory is better, but the specifics of your project will point to the
area of maximum benefit.

If lots of queries will hit the same data, cache memory on the RAID card or an
external RAID subsystem will help. If you have lots of scattered writes, a
RAID with a battery-backed cache that can safely optimize disk writes (i.e.,
writes don't have to be sent to disk right away to be protected; they can be
made when the disks are available) will help. The sketch below gives a rough
feel for what synchronous writes cost on a given disk.
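
For instance (a minimal sketch, assuming Python on the box in question; the
test path is a placeholder, and the 8 kB block size approximates a PostgreSQL
page):

import os
import time

PATH = "/tmp/fsync_test.dat"  # hypothetical path -- put it on the disk you care about
BLOCK = b"x" * 8192           # one 8 kB block, roughly a PostgreSQL page
N = 500

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o600)

# Buffered writes: the OS is free to batch and defer them.
start = time.time()
for _ in range(N):
    os.write(fd, BLOCK)
buffered = time.time() - start

# Synchronous writes: each block is forced to stable storage, which is
# exactly the cost a battery-backed cache lets the controller absorb.
start = time.time()
for _ in range(N):
    os.write(fd, BLOCK)
    os.fsync(fd)
synced = time.time() - start

os.close(fd)
os.unlink(PATH)
print(f"buffered: {N / buffered:.0f} writes/sec, fsync'd: {N / synced:.0f} writes/sec")

The gap between the two numbers is roughly what a cache that can safely defer
the physical write would buy you.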

In other words, depending on what you are trying to do, you may need anything
from a couple of 100 GB IDE drives in your Linux box to an external Winchester
Systems Flash Disk.

I can't speak with authority on the SMP issue, but I have run across items in
the newsgroups indicating that SMP performance in PostgreSQL needs work, and
you may be better off with a screaming single-CPU machine. Don't overlook the
effects of on-chip cache size and bus and memory speeds.

Given that you can get four 70+ GB IDE drives for not a huge investment, I'd
start there and build a machine with software RAID. Do some testing and
development in that environment and use the tools available to see whether
your bottlenecks are influenced more by disk I/O, memory, CPU, or something
else. Since your nightly run sounds like a bulk load, timing one (see the
sketch below) is a sensible first test. Develop, test, and experiment, and you
will be in a much better position to spec a production system.
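
A minimal bulk-load timing sketch, again assuming Python with psycopg2 and a
scratch database; the "updates" table and the tab-separated input file are
placeholders for your real nightly feed:

import os
import time
import psycopg2

conn = psycopg2.connect(dbname="testdb")  # hypothetical connection parameters
cur = conn.cursor()

FEED = "nightly_update.tsv"               # hypothetical tab-separated feed file

start = time.time()
with open(FEED) as f:
    # COPY is the fast path for bulk loads; the target table must already exist.
    cur.copy_expert("COPY updates FROM STDIN", f)
conn.commit()
elapsed = time.time() - start

size_mb = os.path.getsize(FEED) / (1024 * 1024)
print(f"loaded {size_mb:.0f} MB in {elapsed:.0f} s ({size_mb / elapsed:.1f} MB/s)")

Run that against each candidate disk layout and the winner for your 120+ MB
nightly window should be obvious.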

-Steve

On Tuesday 08 January 2002 18:34, Michael Welter wrote:
> I need some help here. We need to implement a 180+GB database with
> 120+MB of updates every evening. Rather than purchasing the big iron,
> we would like to use postgres running over Linux 2.4.x as the data
> server. Is this even possible? Who has the largest database out there
> and what does it run on?
>
> How should we implement the disk array? Should we purchase a hardware
> RAID card or should we use the software RAID capabilities in Linux 2.4?
> Should we consider a SMP system? Should we use an outboard RAID box
> (like RaidZone)?
>
> If anyone out there has implemented a database of this size then I would
> like to correspond with you.
>
> Thanks for your help,
> Mike
