Thousands databases or schemas

From: Denis <socsam(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Thousands databases or schemas
Date: 2012-11-08 09:36:16
Message-ID: 1352367376704-5731189.post@n5.nabble.com
Lists: pgsql-performance

We have a web application for which we create a schema or a database, with a
number of tables in it, for each customer. We now have about 2600 clients.

The problem we ran into using a separate DB for each client is that creating a
new DB can take up to 2 minutes, which is absolutely unacceptable. Using
schemas instead (one DB with a number of schemas, each containing similar
tables) solved this problem (schemas are created in a couple of seconds), but
created two other blocking points:
1. Sometimes creating a new table in a schema takes up to 5 seconds. Given
that we have to create up to 40 tables per schema, this takes way too much
time.
2. "pg_dump -n schema_name db_name" takes from 30 to 60 seconds, no matter
how big is the amount of data in the schema. Also, the dump of the tables
structure only takes at least 30 seconds. Basing on this topic
http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-td5709766i60.html,
pg_dump always analyses ALL the tables in the DB, i.e. in my case more than
100 000 tables.
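
To make the setup concrete, the per-client provisioning boils down to
something like this (the schema, table and database names here are simplified
examples, not our real DDL):

    -- run once for each new client
    CREATE SCHEMA client_2601;
    CREATE TABLE client_2601.orders (
        id      bigserial PRIMARY KEY,
        created timestamptz NOT NULL DEFAULT now(),
        payload text
    );
    -- ...plus roughly 40 similar tables per schema

and the dump that is slow regardless of the data volume:

    pg_dump -n client_2601 our_db > client_2601.sql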

I know you guys will ask why we chose this particular application
architecture.
This architecture was chosen to simplify backing up and restoring data and to
isolate each client's data from the others. Sometimes clients ask us to
restore data for the last month or to roll back to last week's state. This
task is easy to accomplish when the client's data is isolated in its own
schema/DB. If we put all the clients' data in one table, operations of this
kind would be much harder to perform: we would have to restore a huge DB with
enormously large tables in it just to find the requested data. Sometimes the
client doesn't even remember the exact date; he or she just says "I lost my
data somewhere between Tuesday and Friday last week", and I have to restore
backups for several days. With one huge table instead of many small ones, it
would be a nightmare!
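
For illustration, with one schema per client, rolling a single client back to
a given day's dump is just a couple of commands (the client name, database
name and path below are made-up examples):

    # drop the client's current schema and reload it from that day's dump
    psql -d our_db -c "DROP SCHEMA client_2601 CASCADE"
    psql -d our_db -f /backups/2012-11-01/client_2601.sql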

Different clients have different activity rates, and we can select different
backup strategies accordingly. This would be impossible if we kept all the
clients' data in one table.
Besides all of the above, with everything in one table the probability of
massive data corruption (if an error occurs in our web application) is much
higher.
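
For example, per-client schedules can be as simple as separate cron entries
(the schema names, times and paths below are illustrative):

    # very active client: dump every night at 02:00
    0 2 * * *  pg_dump -n client_2601 our_db > /backups/daily/client_2601.sql
    # mostly idle client: dump once a week, Sunday at 03:00
    0 3 * * 0  pg_dump -n client_0042 our_db > /backups/weekly/client_0042.sql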

P.S.
Not to start a holy war, but FYI: in a similar project where we use MySQL, we
now have about 6000 DBs and everything works like a charm.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Thousands-databases-or-schemas-tp5731189.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
