From: | Andreas Brandl <ml(at)3(dot)141592654(dot)de> |
---|---|
To: | John R Pierce <pierce(at)hogranch(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Querying a time range across multiple partitions |
Date: | 2014-09-08 20:40:21 |
Message-ID: | 149ba37d-0688-48bc-98f0-0ada102dd377@store1.zcs.ext.wpsrv.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
John,
> On Fri, Sep 5, 2014 at 2:17 PM, John R Pierce < pierce(at)hogranch(dot)com >
> wrote:
>> On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
>> Number of child tables: 1581
>> that's an insane number of children. We try and limit it to 50 or so
>> child tables, for instance, 6 months retention by week, of data will
>> millions of rows/day.
>>
>> I've used more than that many for testing purposes, and there was
>> little problem. The main thing is that your insert trigger (if you
>> have one on the master table) needs to be structured as a binary
>> search-like nesting of if..elsif, not a linear-searching like
>> structure. Unless of course almost all inserts go into the newest
>> partition, then it might make more sense to do the linear search
>> with that being the first test. But for performance, better to just
>> insert directly into the correct child table.
> any select that can't be preplanned to a specific child will need to
> check all 1500 children. this is far less efficient than checking,
> say, 50 and letting the b-tree index of each child reject or narrow
> down to the specific row(s). The one is roughly 1500*log(N/1500)
> while the other is 50*log(N/50) at least to a first order
> approximation.
can you explain that further? In the end, that argument sounds like it would always be more efficient to use a single table and its index instead, rather than partitioning it (log(N) < c*log(N/c) for any c > 1, if I'm not totally lost today).
So, is there any insights of how many partitions are still useful?
I have tables with roughly 1000 partitions and did not have any issues so far. Even with having INSERT rules that are lined up worst-case (from past to current, while data is always being inserted for the current date), I haven't seen any considerable degradation of INSERT performance so far.
Thanks,
Andreas
From | Date | Subject | |
---|---|---|---|
Next Message | John R Pierce | 2014-09-08 20:51:06 | Re: Querying a time range across multiple partitions |
Previous Message | Jeff Janes | 2014-09-08 19:27:59 | Last active time for a database |