Re: partitioning an existing table

From: Robert Blayzor <rblayzor(dot)bulk(at)inoc(dot)net>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: partitioning an existing table
Date: 2017-12-30 14:19:05
Message-ID: 942C6568-41A7-46EC-B391-D523CC36B3AB@inoc.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Dec 30, 2017, at 12:38 AM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> For inheritance (available in and before PG10), the parent may be nonempty,
> which works fine, although someone else might find it unintuitive. (Does the
> doc actually say "should" somewhere ?)

Well it doesn’t say should, but says “normally”..

"The parent table itself is normally empty; it exists just to represent the entire data set. …

> Just curious: are your constraints/indices on starting time or ending time?

Yes, the child tables will be strictly on a months worth of data.

CREATE TABLE table_201801
(CHECK (ts >= DATE ‘2018-01-01' AND ts < DATE ‘2018-02-01'))
INHERITS …

The application will insert directly into the child tables, so no need for triggers or rules.

> BTW depending on your requirements, it may be possible to make pg_dump much
> more efficient. For our data, it's reasonable to assume that a table is
> "final" if its constraints exclude data older than a few days ago, and it can
> be permanently dumped and excluded from future, daily backups, which makes the
> backups smaller and faster, and probably causes less cache churn, etc. But I
> imagine you might have different requirements, so that may be infeasible, or
> you'd maybe have to track insertions, either via p

The idea is only only keep a # of months available for searching over a period of months. Those months could be 3 or more, up to a year, etc. But being able to just drop and entire child table for pruning is very attractive. Right now the average months data is about 2-3 million rows each. Data is just inserted and then only searched. Never updated…

I also like the idea of skipping all this older data from a PGdump. We archive records inserted into these tables daily into cold storage. ie: export and compressed. So the data is saved cold. We dump the DB nightly also, but probably would make sense to skip anything outside of the newest child table. Just not sure how to make that happen, yet….

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2017-12-30 18:42:47 Re: partitioning an existing table - efficient pg_dump
Previous Message Justin Pryzby 2017-12-30 05:56:30 analyze stats: child vs parent