Re: Partitioning

From: Greg Stark <gsstark(at)mit(dot)edu>
To: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Partitioning
Date: 2004-09-16 03:55:24
Message-ID: 87y8jarhoz.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


"J. Andrew Rogers" <jrogers(at)neopolitan(dot)com> writes:

> We do something very similar, also using table inheritance

I have a suspicion postgres's table inheritance will end up serving as a good
base for a partitioned table feature. Is it currently possible to query which
subtable a record came from though?

> A big part of performance gain is that the resulting partitions end up being
> more well-ordered than the non-partitioned version, since inserts are hashed
> to different partition according to the key and hash function. It is kind of
> like a cheap and dirty real-time CLUSTER operation.

There is also one particular performance gain that cannot be obtained via
other means: A query that accesses a large percentage of a single partition
can use a sequential table scan of just that partition. This can be several
times faster than using an index scan which is the only option if all the data
is stored in a single large table.

This isn't an uncommon occurrence. Consider an accounting table partitioned by
accounting period. Any aggregate reports for a single accounting period fall
into this category. If you define your partitions well that can often by most
or all of your reports.

Of course this applies equally if the query is accessing a small number of
partitions. A further refinement is to leverage the partitioning in GROUP BY
or ORDER BY clauses. If you're grouping by the partition key you can avoid a
large sort without having to resort to an index scan or even a hash. And of
course it's tempting to think about parallelization of such queries,
especially if the partitions are stored in separate table spaces on different
drives.

> It also lets you truncate, lock, and generally be heavy-handed with subsets
> of the table without affecting the rest of the table.

The biggest benefit by far is this management ability of being able to swap in
and out partitions in a single atomic transaction that doesn't require
extensive i/o.

In the application we used them on Oracle 8i they were an absolute life-saver.
They took a huge batch job that took several days to run in off-peak hours and
turned it into a single quick cron job that could run at peak hours. We were
able to cut the delay for our OLTP data showing up in the data warehouse from
about a week after extensive manual work to hours after a daily cron job.

> ...PARTITION ON 'date_trunc(''hour'',ts)'...
>
> There would also probably need to be some type of metadata table to
> associate specific hashes with partition table names. Other than that,
> the capabilities largely already exist, and managing the partition
> hashing and association is the ugly part when rolling your own.
> Intercepting DML when necessary and making it behave correctly is
> already pretty easy, but could probably be streamlined.

I would suggest you look at the Oracle syntax to handle this. They've already
gone through several iterations of implementations. The original Oracle 7
implementation was much as people have been describing where you had to define
a big UNION ALL view and enable an option to have the optimizer look for such
views and attempt to eliminate partitions.

In Oracle 8i they introduced first class partitions with commands to define
and manipulate them. You defined a high bound for each partition.

In Oracle 9 (or thereabouts, sometime after 8i at any rate) they introduced a
new form where you specify a specific constant value for each partition. This
seems to be more akin to how you're thinking about things.

The optimizer has several plan nodes specific for partitioned tables. It can
select a single known partition based on information present in the query. It
can also detect cases where it can be sure the query will only access a single
partition but won't be able to determine which until execution time based on
placeholder parameters for cases like "WHERE partition_key = ?". It can also
detect cases like "WHERE partition_key between ? and ?" and "WHERE
partition_key IN (?,?,?)" Or join clauses on partitions. It can also do some
magic things with "GROUP BY partition_key" and "ORDER BY partition_key".

The work in the optimizer will be the most challenging part. In an ideal world
if the optimizer is very solid it will be possible to bring some partitions to
slow or even near-line storage media. As long as no live queries accidentally
access the wrong partitions the rest of the database need never know that the
data isn't readily available.

--
greg

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joe Conway 2004-09-16 04:07:34 Re: Data Warehouse Reevaluation - MySQL vs Postgres --
Previous Message J. Andrew Rogers 2004-09-15 21:09:31 Partitioning