Re: Thousands of tables versus on table?

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Thomas Andrews <tandrews(at)soliantconsulting(dot)com>
Cc: david(at)lang(dot)hm, Gregory Stark <stark(at)enterprisedb(dot)com>, Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Thousands of tables versus on table?
Date: 2007-06-05 17:09:58
Message-ID: 466598E6.5020501@g2switchworks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thomas Andrews wrote:
>
>
> On 6/5/07 12:48 PM, "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com> wrote:
>
>
>> david(at)lang(dot)hm wrote:
>>
>>> On Mon, 4 Jun 2007, Scott Marlowe wrote:
>>>
>>>
>>>> Gregory Stark wrote:
>>>>
>>>>> "Thomas Andrews" <tandrews(at)soliantconsulting(dot)com> writes:
>>>>>
>>>>>
>>>>>
>>>>>> I guess my real question is, does it ever make sense to create
>>>>>>
>>>>> thousands > of
>>>>>
>>>>>> tables like this?
>>>>>>
>>>>>>
>>>>> Sometimes. But usually it's not a good idea.
>>>>> What you're proposing is basically partitioning, though you may not
>>>>> actually
>>>>> need to put all the partitions together for your purposes.
>>>>> Partitioning's
>>>>> main
>>>>> benefit is in the management of the data. You can drop and load
>>>>> partitions
>>>>> in
>>>>> chunks rather than have to perform large operations on millions of
>>>>> records.
>>>>>
>>>>> Postgres doesn't really get any faster by breaking the tables up like
>>>>> that. In
>>>>> fact it probably gets slower as it has to look up which of the
>>>>> thousands
>>>>> of
>>>>> tables you want to work with.
>>>>>
>>>>>
>>>> That's not entirely true. PostgreSQL can be markedly faster using
>>>> partitioning as long as you always access it by referencing the
>>>> partitioning key in the where clause. So, if you partition the table
>>>> by date, and always reference it with a date in the where clause, it
>>>> will usually be noticeably faster. OTOH, if you access it without
>>>> using a where clause that lets it pick partitions, then it will be
>>>> slower than one big table.
>>>>
>>>> So, while this poster might originally think to have one table for
>>>> each user, resulting in thousands of tables, maybe a compromise where
>>>> you partition on userid ranges would work out well, and keep each
>>>> partition table down to some 50-100 thousand rows, with smaller
>>>> indexes to match.
>>>>
>>>>
>>> what if he doesn't use the postgres internal partitioning, but instead
>>> makes his code access the tables named responsesNNNNN where NNNNN is
>>> the id of the customer?
>>>
>>> this is what it sounded like he was asking initially.
>>>
>> Sorry, I think I initially read your response as "Postgres doesn't
>> really get any faster by breaking the tables up" without the "like that"
>> part.
>>
>> I've found that as long as the number of tables is > 10,000 or so,
>>
That should have been as long as the number of tables is < 10,000 or so...

>> having a lot of tables doesn't seem to really slow pgsql down a lot.
>> I'm sure that the tipping point is dependent on your db machine. I
>> would bet that if he's referring to individual tables directly, and each
>> one has hundreds instead of millions of rows, the performance would be
>> better. But the only way to be sure is to test it.
>>
>
>
Please stop top posting. This is my last reply until you stop top posting.

> So, partitioning in PSQL 8 is workable, but breaking up the table up into
> actual separate tables is not?
>
Ummm, that's not what I said. They're similar in execution. However,
partitioning might let you put 100 customers into a given table, if,
say, you partitioned on customer ID or something that would allow you to
group a few together.
> Another solution we have proposed is having 'active' and 'completed' tables.
> So, rather than thousands, we'd have four tables:
>
> responders_active
> responders_completed
> responses_active
> responses_completed
>
That's not a bad idea. Just keep up on your vacuuming.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Gregory Stark 2007-06-05 18:55:44 Re: Thousands of tables versus on table?
Previous Message Thomas Andrews 2007-06-05 17:04:31 Re: Thousands of tables versus on table?