Re: partitioning query planner almost always scans all tables

From: Spiros Ioannou <sivann(at)inaccess(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: partitioning query planner almost always scans all tables
Date: 2015-01-23 14:35:07
Message-ID: CACKh8C_TzpdcOp0__xVDmPoSPR2v73gyQKa3nOq38vvHNSSKSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you for your input.

When doing an "ORDER BY measurement_time DESC" I had hoped for the query
planner to firstly query the most recent "child" table, i.e. the table
holding current month's data, and then move-on to the oldest table, since
it knows the partition is based on measurement_time. Instead it always
queries the oldest table first and then moves on to the current.

We did the partitioning to be able to drop old data saving disk space, as
you said. Not for performance reasons, although we hoped than having
smaller indexes would also help.

Best regards,
-Spiros

*Spiros Ioannou IT Manager, inAccesswww.inaccess.com
<http://www.inaccess.com>M: +30 6973-903808T: +30 210-6802-358*

On 22 January 2015 at 17:37, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Spiros Ioannou <sivann(at)inaccess(dot)com> writes:
> > It is ORDER BY measurement_time, not measurement_id, and measurement_time
> > is used to create the partition. So the planner should know the correct
> > order, but instead it seems to query tables in the wrong order.
>
> The planner does not know that, and even if it attempted to figure it out
> by comparing the child tables' constraints, it could not generate a plan
> that considered only one child table as you incorrectly imagine. What if
> the "latest" table turned out to be empty at runtime?
>
> The obtained plan with a Merge Append atop Index Scan Backwards nodes
> seems perfectly reasonable to me. This will result in fetching only the
> latest row within each partition, so that the work involved is O(number of
> partitions) not O(total number of rows).
>
> If you're not happy with that, reconsider how many partitions you really
> need. Newbies almost invariably create far more partitions than is a good
> idea for performance. In my view, if you've got more than a couple dozen,
> you're doing it wrong. Partitioning is, in general, not a benefit for
> query performance (except in a few very narrow, specialized cases); and
> the more partitions you have the worse the penalty. Partitioning only
> helps for data management, in particular being able to drop old data in
> bulk rather than through expensive DELETE WHERE queries. How often do
> you do that, and do you really need to be able to do it at a small
> granularity?
>
> regards, tom lane
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2015-01-23 15:08:00 Re: In need of some JSONB examples ?
Previous Message Petr Novak 2015-01-23 14:31:15 CLOG read problem after pg_basebackup