Re: Partitioning Vs. Split Databases - performance?

From: Benjamin Smith <lists(at)benjamindsmith(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Partitioning Vs. Split Databases - performance?
Date: 2006-12-21 22:07:13
Message-ID: 200612211407.14017.lists@benjamindsmith.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thursday 21 December 2006 11:47, Ron Johnson wrote:
> This gives you linear growth potential, since if your current box
> gets over-utilized, buy a 2nd box and move some of the databases to it.

So far, I'm inclined to go this way, due to the option for linear scaling.

> >> 2) Copy out the data specific to a customer and load into separate tables
> >> (with slightly different names, EG table "dates" becomes "cust1_dates")
> >> and
> >> use data partitioning to help with performance as needed.
>
> Definitely *not* scalable. And *very* messy. Yech.

Scales better than present, methinks, but still not the best idea. I'd have to
revisit all my queries to make sure that they use the correct tablename.

> > 3) Put each customer in their own schema/namespace which resides within
> > its own table space.
> >
> > Then you can move customers wherever you need in terms of IO.

How is that functionally different than using a separate database? What's the
advantage here? I don't *need* to restrict myself to one database, and doing
this does require that I revisit 100% of the SQL queries to make sure that
I'm referencing the right schema.

This solution seems to have the same problems as using dynamic tablenames.

> Splitting like mentioned in these three tactics means that you've
> now got 2x as many tables. Add more customers and you've got that
> many more tables. Perfect candidate for "schema drift".
>
> If each table has cust_id in it, then you could:
>
> 4) retain 1 database and partition each table on cust_id.
>
> http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

Intriguing idea, and one that I might do in the future. However, I have 170
normalized tables defined at present. The task of auditing each of these
tables for the constraint ranges sounds somewhat nightmarish. Not all the
tables have the cust_id record defined - some are implicit.

> >> Given the same physical hardware, which one is likely to perform better?
Does
> >> it make any difference? Does using separate databases use more RAM than a
> >> single database with a bunch of different tables?
>
> Config files are global, so I doubt it.
>
> >> Company is growing rapidly, so growth room is important...
>
> Then go for Option 1.

My conclusion, too. Another poster mentioned schema drift, and that's a real
concern, but we're already updating the schema through a script which could
be extended to update all databases, not just one, so I feel this problem
would be minor to non-existent.

Thanks!

-Ben
--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joshua D. Drake 2006-12-21 22:41:35 Re: Partitioning Vs. Split Databases - performance?
Previous Message Walter Vaughan 2006-12-21 21:58:58 Re: Website Hosting Service and Security