From: "Bucky Jordan" <bjordan(at)lumeta(dot)com>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Large tables (was: RAID 0 not as fast as expected)
Date: 2006-09-18 22:40:51
Message-ID: 78ED28FACE63744386D68D8A9D1CF5D42099E1@MAIL.corp.lumeta.com
Lists: pgsql-performance
> good normalization skills are really important for large databases,
> along with materialization strategies for 'denormalized sets'.
Good points - thanks. I'm especially curious what others have done for
the materialization. The matview project on gborg appears dead, and I've
only found a smattering of references on Google. My guess is you roll
your own for optimal performance...
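From what I've pieced together, rolling your own usually means a plain
summary table kept fresh by triggers - something like the sketch below
(table and column names invented for illustration; it only handles
INSERTs, so UPDATE/DELETE would need similar triggers or periodic
rebuilds):

    CREATE TABLE order_totals (
        customer_id integer PRIMARY KEY,
        total       numeric NOT NULL DEFAULT 0
    );

    CREATE FUNCTION order_totals_sync() RETURNS trigger AS $$
    BEGIN
        -- try to bump the existing summary row first
        UPDATE order_totals
           SET total = total + NEW.amount
         WHERE customer_id = NEW.customer_id;
        IF NOT FOUND THEN
            INSERT INTO order_totals (customer_id, total)
            VALUES (NEW.customer_id, NEW.amount);
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_totals_sync AFTER INSERT ON orders
        FOR EACH ROW EXECUTE PROCEDURE order_totals_sync();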
> regarding the number of rows, there is no limit to how much pg can
> handle per se, just some practical limitations, especially vacuum and
> reindex times. these are important because they are required to keep
> a handle on mvcc bloat and it's very nice to be able to vacuum bits of
> your database at a time.
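Right - being able to hit just the hot tables one at a time is what I
had in mind, e.g. (table name invented):

    VACUUM ANALYZE scans_2006_08;
    REINDEX TABLE scans_2006_08;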
I was hoping for some actual numbers on "practical". Hardware isn't too
much of an issue (within reason - we're not talking an Amazon or Google
here... the SunFire X4500 looks interesting...) - if a customer wants to
store that much data, and pay for it, we'll figure out how to do it. I'd
just rather not have to re-design the database. Say the requirement is
to keep 12 months of data accessible, each "scan" produces 100M records,
and I run one per month. What happens if the customer wants to run it
once a week? I was mainly trying to figure out at what point (ballpark)
I'm going to have to look into archive tables and things of that nature
(or at Bizgres/MPP). It's easier for us to add more/bigger hardware, but
not so easy to redesign/add history tables...
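(By archive/history tables I mean the usual 8.1 inheritance-plus-
constraint-exclusion partitioning - roughly the following, with all
names and columns invented for illustration:

    -- parent table; queries go against this
    CREATE TABLE scans (
        scan_time timestamptz NOT NULL,
        host_ip   inet,
        result    text
    );

    -- one child per month; the CHECK constraint lets the 8.1 planner
    -- skip partitions when constraint_exclusion is on
    CREATE TABLE scans_2006_09 (
        CHECK (scan_time >= '2006-09-01' AND scan_time < '2006-10-01')
    ) INHERITS (scans);

    SET constraint_exclusion = on;
    SELECT count(*) FROM scans WHERE scan_time >= '2006-09-01';

The nice part being that expiring a month is just a DROP TABLE on the
child instead of a huge DELETE.)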
>
> just another fyi, if you have a really big database, you can forget
> about doing pg_dump for backups (unless you really don't care about
> being x days behind)... you simply have to do some type of
> replication/failover strategy. i would start with pitr.
>
> merlin
I was originally thinking replication, but I did notice some nice PITR
features in 8.x - I'll have to look into that some more.
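As I understand the 8.x docs, the WAL-archiving side is just one line
in postgresql.conf (the backup path here is a placeholder):

    archive_command = 'cp %p /mnt/backup/wal/%f'

plus a filesystem-level base backup bracketed by:

    SELECT pg_start_backup('base');
    -- tar/copy the data directory here
    SELECT pg_stop_backup();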
Thanks for the pointers though...
- Bucky