From: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
---|---|
To: | David Rysdam <drysdam(at)ll(dot)mit(dot)edu>, pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: partitioning for speed, but query planner ignores |
Date: | 2013-10-02 15:19:58 |
Message-ID: | 1380727198.99522.YahooMailNeo@web162901.mail.bf1.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
David Rysdam <drysdam(at)ll(dot)mit(dot)edu> wrote:
> We have a by-our-standards large table (about 40e6 rows). Since it is
> the bottleneck in some places, I thought I'd experiment with
> partitioning.
In my personal experience I have gone into hundreds of millions of
rows with good performance without partitioning. It's all about
designing good indexes for the workload.
I have only seen partitioning help in two cases:
(1) There will be bulk deletes of rows, and you know at insert
time which bulk delete the row belongs with. Dropping a partition
table is a very fast way to delete a large number of rows.
(2) The bulk of activity will be on a relatively small subset of
the rows at any one time, and you can partition such that the set
of active rows will be in a small number of partitions.
In all other cases, I have only seen partitioning harm performance.
There is no reason to think that checking the table-level
constraints on every partition table will be faster than descending
through an index tree level.
> The table holds data about certain objects, each of which has an object
> number and some number of historical entries (like account activity at a
> bank, say). The typical usage pattern is: relatively rare inserts that
> happen in the background via an automated process (meaning I don't care
> if they take a little longer) and frequent querying, including some
> where a human is sitting in front of it (i.e. I'd like it to be a lot
> faster).
>
> Our most frequent queries either select "all history for object N" or
> "most recent item for some subset of objects".
>
> Because object number figure so prominently, I thought I'd partition on
> that. To me, it makes the most sense from a load-balancing perspective
Load balancing? Hitting a single partition more heavily improves
your cache hit ratio. What sort of benefit are you expecting from
spreading the reads across all the partitions? *Maybe* that could
help if you carefully placed each partition table on a separate set
of spindles, but usually you are better off having one big RAID so
that every partition is spread across all the spindles
automatically.
> Lower numbers are going to be queried much less often than higher
> numbers.
This suggests to me that you *might* get a performance boost if you
define partitions on object number *ranges*. It still seems a bit
dubious, but it has a chance.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Marc Fournier | 2013-10-02 15:42:50 | Re: [HACKERS] Who is pgFoundery administrator? |
Previous Message | Adrian Klaver | 2013-10-02 14:39:15 | Re: Timestamp with and without timezone conversion confusion. |