Re: Thousands of tables versus one table?

From: Thomas Andrews <tandrews(at)soliantconsulting(dot)com>
To: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>, <david(at)lang(dot)hm>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Thousands of tables versus one table?
Date: 2007-06-05 17:04:31
Message-ID: C28B0FDF.AD2%tandrews@soliantconsulting.com
Lists: pgsql-performance

So, partitioning in PSQL 8 is workable, but breaking the table up into
actual separate tables is not?
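
Just so I'm sure I follow, here is a rough sketch of what I understand that
to look like in 8.x -- inheritance children with CHECK constraints on a
userid range. The table names, columns, and ranges below are placeholders,
not our real schema:

-- Parent table holds no rows itself; each child covers a userid range.
CREATE TABLE responses (
    userid       integer NOT NULL,
    question_id  integer NOT NULL,
    answer       text
);

CREATE TABLE responses_0000_0999 (
    CHECK (userid >= 0 AND userid < 1000)
) INHERITS (responses);

CREATE TABLE responses_1000_1999 (
    CHECK (userid >= 1000 AND userid < 2000)
) INHERITS (responses);

-- Each child gets its own, much smaller, index.
CREATE INDEX responses_0000_0999_userid_idx ON responses_0000_0999 (userid);
CREATE INDEX responses_1000_1999_userid_idx ON responses_1000_1999 (userid);

-- With constraint_exclusion enabled, a query that names the partitioning
-- key only has to touch the matching child.
SET constraint_exclusion = on;
SELECT * FROM responses WHERE userid = 1234;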

Another solution we have proposed is having 'active' and 'completed' tables.
So, rather than thousands, we'd have four tables:

responders_active
responders_completed
responses_active
responses_completed

That way, the number of responses_active records would not be as huge. The
problem, as we see it, is that the responders are entering their responses
and it is taking too long. But if we separate out active and completed
surveys, then the inserts will likely cost less. We might even be able to
reduce the indices on the _active tables because survey administrators would
not want to run as many complex reports on the active responses.
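
For concreteness, a minimal sketch of the split schema (the column layout
here is invented for illustration; the real tables would carry whatever
fields each survey needs):

-- Hypothetical layout: the _active and _completed twins share columns.
CREATE TABLE responders_active (
    responder_id  serial PRIMARY KEY,
    survey_id     integer NOT NULL,
    started_at    timestamp NOT NULL DEFAULT now()
);

CREATE TABLE responses_active (
    responder_id  integer NOT NULL REFERENCES responders_active,
    question_id   integer NOT NULL,
    answer        text,
    PRIMARY KEY (responder_id, question_id)
);

CREATE TABLE responders_completed (LIKE responders_active);
CREATE TABLE responses_completed  (LIKE responses_active);

-- The heavier reporting indexes live only on the _completed side,
-- keeping inserts into the _active tables cheap.
CREATE INDEX responses_completed_question_idx
    ON responses_completed (question_id);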

There would be an extra cost, when the survey is completed, of copying the
records from the '_active' table to the '_completed' table and then deleting
them, but that operation is something a survey administrator would be
willing to accept as taking a while (as well as something we could put off
to an off hour, although we have lots of international customers so it's not
clear when our off hours are.)
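
The move itself could probably be a single transaction per survey, something
along these lines (survey_id and the column lists are assumptions carried
over from the sketch above):

-- Move one finished survey from the _active tables to the _completed
-- tables atomically, so reports never see a half-moved survey.
BEGIN;

INSERT INTO responses_completed
    SELECT r.*
    FROM   responses_active r
    JOIN   responders_active p USING (responder_id)
    WHERE  p.survey_id = 42;

INSERT INTO responders_completed
    SELECT * FROM responders_active WHERE survey_id = 42;

-- Delete the child rows first because of the foreign key.
DELETE FROM responses_active
 WHERE responder_id IN
       (SELECT responder_id FROM responders_active WHERE survey_id = 42);

DELETE FROM responders_active WHERE survey_id = 42;

COMMIT;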

=thomas

On 6/5/07 12:48 PM, "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com> wrote:

> david(at)lang(dot)hm wrote:
>> On Mon, 4 Jun 2007, Scott Marlowe wrote:
>>
>>> Gregory Stark wrote:
>>>> "Thomas Andrews" <tandrews(at)soliantconsulting(dot)com> writes:
>>>>
>>>>
>>>>> I guess my real question is, does it ever make sense to create
>>>>> thousands of tables like this?
>>>>>
>>>> Sometimes. But usually it's not a good idea.
>>>> What you're proposing is basically partitioning, though you may not
>>>> actually need to put all the partitions together for your purposes.
>>>> Partitioning's main benefit is in the management of the data. You can
>>>> drop and load partitions in chunks rather than have to perform large
>>>> operations on millions of records.
>>>>
>>>> Postgres doesn't really get any faster by breaking the tables up like
>>>> that. In fact it probably gets slower as it has to look up which of the
>>>> thousands of tables you want to work with.
>>>>
>>>
>>> That's not entirely true. PostgreSQL can be markedly faster using
>>> partitioning as long as you always access it by referencing the
>>> partitioning key in the where clause. So, if you partition the table
>>> by date, and always reference it with a date in the where clause, it
>>> will usually be noticeably faster. OTOH, if you access it without
>>> using a where clause that lets it pick partitions, then it will be
>>> slower than one big table.
>>>
>>> So, while this poster might originally think to have one table for
>>> each user, resulting in thousands of tables, maybe a compromise where
>>> you partition on userid ranges would work out well, and keep each
>>> partition table down to some 50-100 thousand rows, with smaller
>>> indexes to match.
>>>
>>
>> what if he doesn't use the postgres internal partitioning, but instead
>> makes his code access the tables named responsesNNNNN where NNNNN is
>> the id of the customer?
>>
>> this is what it sounded like he was asking initially.
>
> Sorry, I think I initially read your response as "Postgres doesn't
> really get any faster by breaking the tables up" without the "like that"
> part.
>
> I've found that as long as the number of tables is < 10,000 or so,
> having a lot of tables doesn't seem to really slow pgsql down a lot.
> I'm sure that the tipping point is dependent on your db machine. I
> would bet that if he's referring to individual tables directly, and each
> one has hundreds instead of millions of rows, the performance would be
> better. But the only way to be sure is to test it.
