Quick Links

Re: partitioning for speed, but query planner ignores

From:	Kevin Grittner <kgrittn(at)ymail(dot)com>
To:	David Rysdam <drysdam(at)ll(dot)mit(dot)edu>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	Re: partitioning for speed, but query planner ignores
Date:	2013-10-02 15:19:58
Message-ID:	1380727198.99522.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

David Rysdam <drysdam(at)ll(dot)mit(dot)edu> wrote:

> We have a by-our-standards large table (about 40e6 rows). Since it is
> the bottleneck in some places, I thought I'd experiment with
> partitioning.

In my personal experience I have gone into hundreds of millions of
rows with good performance without partitioning. It's all about
designing good indexes for the workload.

I have only seen partitioning help in two cases:
(1) There will be bulk deletes of rows, and you know at insert
time which bulk delete the row belongs with. Dropping a partition
table is a very fast way to delete a large number of rows.
(2) The bulk of activity will be on a relatively small subset of
the rows at any one time, and you can partition such that the set
of active rows will be in a small number of partitions.

In all other cases, I have only seen partitioning harm performance.
There is no reason to think that checking the table-level
constraints on every partition table will be faster than descending
through an index tree level.

> The table holds data about certain objects, each of which has an object
> number and some number of historical entries (like account activity at a
> bank, say). The typical usage pattern is: relatively rare inserts that
> happen in the background via an automated process (meaning I don't care
> if they take a little longer) and frequent querying, including some
> where a human is sitting in front of it (i.e. I'd like it to be a lot
> faster).
>
> Our most frequent queries either select "all history for object N" or
> "most recent item for some subset of objects".
>
> Because object number figure so prominently, I thought I'd partition on
> that. To me, it makes the most sense from a load-balancing perspective

Load balancing? Hitting a single partition more heavily improves
your cache hit ratio. What sort of benefit are you expecting from
spreading the reads across all the partitions? *Maybe* that could
help if you carefully placed each partition table on a separate set
of spindles, but usually you are better off having one big RAID so
that every partition is spread across all the spindles
automatically.

> Lower numbers are going to be queried much less often than higher
> numbers.

This suggests to me that you *might* get a performance boost if you
define partitions on object number *ranges*. It still seems a bit
dubious, but it has a chance.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

partitioning for speed, but query planner ignores at 2013-10-02 12:34:44 from David Rysdam

Responses

Re: partitioning for speed, but query planner ignores at 2013-10-02 16:01:13 from David Rysdam

Browse pgsql-general by date

	From	Date	Subject
Next Message	Marc Fournier	2013-10-02 15:42:50	Re: [HACKERS] Who is pgFoundery administrator?
Previous Message	Adrian Klaver	2013-10-02 14:39:15	Re: Timestamp with and without timezone conversion confusion.