Re: multi billion row tables: possible or insane?

From: John Arbash Meinel <john(at)arbash-meinel(dot)com>
To: Ramon Bastiaans <bastiaans(at)sara(dot)nl>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: multi billion row tables: possible or insane?
Date: 2005-03-01 15:19:00
Message-ID: 422487E4.6040107@arbash-meinel.com
Lists: pgsql-performance

Ramon Bastiaans wrote:

> Hi all,
>
> I am doing research for a project of mine where I need to store
> several billion values for a monitoring and historical tracking system
> for a big computer system. My current estimate is that I have to store
> (somehow) around 1 billion values each month (possibly more).
>
If that 1 billion is spread perfectly evenly over the hours of the
month, then you need 1e9 / 30 / 24 / 3600 ≈ 385 transactions per second.

I'm pretty sure that's possible with postgres; you just need pretty
beefy hardware. And like Jeff said, lots of disks for lots of I/O:
something like a quad Opteron with 16GB of RAM and around 14-20 very
fast disks, RAID 10 rather than RAID 5, etc. To improve query
performance, you can also load-balance reads across replica machines
using Slony.

Or, if you can do batch processing, you could split the work across a
few update machines which then do bulk updates on the master database.
That lets you get more machines into the job, since you can't share a
single database across multiple machines.
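As a rough sketch of what a batch load might look like (the table and
column names here are made up): each update machine accumulates samples
in a flat file and then pushes the whole file to the master with a
single COPY, instead of millions of individual INSERTs.

-- From a client machine, using psql:
--   psql -h master -d monitoring \
--     -c "\copy samples (host_id, metric_id, sample_time, value) from 'batch.txt'"

-- Or the server-side form, if the file already lives on the master:
COPY samples (host_id, metric_id, sample_time, value)
    FROM '/data/batches/batch.txt';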

> I was wondering if anyone has had any experience with these kind of
> big numbers of data in a postgres sql database and how this affects
> database design and optimization.
>
Well, one of the biggest things is whether you can do bulk updates, or
whether clients can handle data being slightly out of date, so you can
use caching. Can you segregate your data into separate tables as much
as possible? Are your clients okay if aggregate information takes a
little while to update?

One trick is to use semi-lazy materialized views to keep your updates
fast.
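Postgres doesn't give you materialized views out of the box, so
"semi-lazy" here means a hand-maintained summary table that you refresh
from cron every few minutes. A sketch (table and column names made up):

-- Aggregate table that readers hit instead of the raw data.
CREATE TABLE samples_hourly (
    host_id    integer,
    hour       timestamp,
    n_samples  bigint,
    avg_value  double precision,
    PRIMARY KEY (host_id, hour)
);

-- Refresh job, run from cron; it only rebuilds the recent window,
-- so readers see data that is at most a few minutes stale.
BEGIN;
DELETE FROM samples_hourly WHERE hour >= now() - interval '2 hours';
INSERT INTO samples_hourly (host_id, hour, n_samples, avg_value)
SELECT host_id,
       date_trunc('hour', sample_time),
       count(*),
       avg(value)
  FROM samples
 WHERE sample_time >= now() - interval '2 hours'
 GROUP BY host_id, date_trunc('hour', sample_time);
COMMIT;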

> What would be important issues when setting up a database this big,
> and is it at all doable? Or would it be insane to think about
> storing up to 5-10 billion rows in a postgres database.

I think if you can design the db properly, it is doable. But if you
have clients saying "I need up-to-the-second information on 1 billion
rows", you're never going to get it.

>
> The database's performance is important. There would be no use in
> storing the data if a query will take ages. Queries should be quite
> fast if possible.
>
Again, it depends on the queries being done.
There are some nice tricks you can use, like month-by-month
partitioning (if you are getting 1 billion inserts a month, you might
even want week-by-week partitioning): with a date-column index on each
partition and a UNION ALL view over them, you should be able to get
pretty good insert speed and still keep *recent* queries fast. Going
through 1 billion rows is always going to be expensive.
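A sketch of what I mean (again, the names are made up): one table per
month, an index on the date column of each, and a UNION ALL view for
queries that need to span months. Inserts always go into the current
month's table, so the indexes being written stay small, and recent
queries can hit that table directly.

CREATE TABLE samples_2005_02 (
    host_id     integer,
    sample_time timestamp,
    value       double precision
);
CREATE TABLE samples_2005_03 (LIKE samples_2005_02);

CREATE INDEX samples_2005_03_time_idx
    ON samples_2005_03 (sample_time);

CREATE VIEW samples_all AS
    SELECT * FROM samples_2005_02
    UNION ALL
    SELECT * FROM samples_2005_03;

-- Recent query against just the current partition:
SELECT count(*)
  FROM samples_2005_03
 WHERE sample_time >= now() - interval '1 day';

Note that the planner won't skip partitions for you when you go through
the view, so anything that can go straight to the right month's table
should.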

> I would really like to hear people's thoughts/suggestions or "go see a
> shrink, you must be mad" statements ;)
>
> Kind regards,
>
> Ramon Bastiaans

I think it would be possible, but there are a lot of design issues with
a system like this. You can't go into it thinking that you can design a
multi billion row database the same way you would design a million row db.

John
=:->
