Re: Querying a time range across multiple partitions

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andreas Brandl <ml(at)3(dot)141592654(dot)de>
Cc: John R Pierce <pierce(at)hogranch(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Querying a time range across multiple partitions
Date: 2014-09-08 21:40:52
Message-ID: CAMkU=1yvqNh1MwSNXNH2ToeWhhq_f+mVwW-MELZyBFwMz8PmeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Sep 8, 2014 at 1:40 PM, Andreas Brandl <ml(at)3(dot)141592654(dot)de> wrote:

> John,
>
> > On Fri, Sep 5, 2014 at 2:17 PM, John R Pierce < pierce(at)hogranch(dot)com >
> > wrote:
> >> On 9/5/2014 10:31 AM, Cal Heldenbrand wrote:
> >> Number of child tables: 1581
> >> that's an insane number of children. We try and limit it to 50 or so
> >> child tables, for instance, 6 months retention by week, of data will
> >> millions of rows/day.
> >>
> >> I've used more than that many for testing purposes, and there was
> >> little problem. The main thing is that your insert trigger (if you
> >> have one on the master table) needs to be structured as a binary
> >> search-like nesting of if..elsif, not a linear-searching like
> >> structure. Unless of course almost all inserts go into the newest
> >> partition, then it might make more sense to do the linear search
> >> with that being the first test. But for performance, better to just
> >> insert directly into the correct child table.
> > any select that can't be preplanned to a specific child will need to
> > check all 1500 children. this is far less efficient than checking,
> > say, 50 and letting the b-tree index of each child reject or narrow
> > down to the specific row(s). The one is roughly 1500*log(N/1500)
> > while the other is 50*log(N/50) at least to a first order
> > approximation.
>
> can you explain that further? In the end, that argument sounds like it
> would always be more efficient to use a single table and its index instead,
> rather than partitioning it (log(N) < c*log(N/c) for any c > 1, if I'm not
> totally lost today).
>

Right. Partitioning is NOT a generic way to improve query performance.
With rare exceptions, partitioning makes query performance worse, and then
partition pruning may or may not serve to recover most of that lost
performance.

Partitioning allows you conspire with PostgreSQL to micromanage the the
layout of the data in ways that improve manageability and maintainability.
For example, you pay a little overhead each time you run a query and
(perhaps) each time you insert a row, but in exchange for that you can
"delete" a year of data with a nearly-instantaneous command.

Cheers,

Jeff

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Abelard Hoffman 2014-09-08 21:46:16 Re: psql and tab-delimited output
Previous Message lists 2014-09-08 21:36:01 Crash in 9.4 Beta when partially collapsing left outer joins