Re: partitioning an existing table

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Robert Blayzor <rblayzor(dot)bulk(at)inoc(dot)net>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: partitioning an existing table
Date: 2017-12-30 05:38:21
Message-ID: 20171230053821.GE4172@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Dec 29, 2017 at 11:37:56PM -0500, Robert Blayzor wrote:
> The docs claim that the master table “should” be empty. It it possible to just create child tables off an existing master table with data, then just inserting data into the new child tables.
>
> THe plan would be to keep data in the master table and purge it over time until it’s eventually empty, then drop the indexes as well.
>
> Fully understanding that data needs to be placed in the right child tables. Data outside of those child ranges would remain as “old data” in the master table.
>
> Just trying to grab if that’s an acceptable migration of live data from a single large table and move into partitioning. Think of it as a very large table of cyclic data that ages out. New data in child tables while removing data from the master table over time.

For PG10 "partitions" (as in relkind='p') the parent is defined as empty
(actually has no underlying storage).

For inheritance (available in and before PG10), the parent may be nonempty,
which works fine, although someone else might find it unintuitive. (Does the
doc actually say "should" somewhere ?)

You almost certainly want child tables to have constraints, to allow
constraint_exclusion (which is the only reason one child table is more "right"
than any other, besides the associated pruning/retention schedule).

Since you'll be running DELETE rather than DROP on the parent, you might
consider DELETE ONLY.. but it won't matter if your children's constraints are
usable with DELETE's WHERE condition.

Also, note that autoanalyze doesn't know to analyze the PARENT's statistics
when its children are INSERTED/DROPPED/etc. So I'd suggest to consider ANALYZE
each parent following DROP of its children (or maybe on some more frequent
schedule to handle inserted rows, too). Perhaps that should be included as a
CAVEAT?
https://www.postgresql.org/docs/10/static/ddl-inherit.html#DDL-INHERIT-CAVEATS

Just curious: are your constraints/indices on starting time or ending time?

BTW depending on your requirements, it may be possible to make pg_dump much
more efficient. For our data, it's reasonable to assume that a table is
"final" if its constraints exclude data older than a few days ago, and it can
be permanently dumped and excluded from future, daily backups, which makes the
backups smaller and faster, and probably causes less cache churn, etc. But I
imagine you might have different requirements, so that may be infeasible, or
you'd maybe have to track insertions, either via pg_stat_user_tables, or at the
application layer, and redump the relevant table.

Justin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2017-12-30 05:56:30 analyze stats: child vs parent
Previous Message Robert Blayzor 2017-12-30 04:37:56 partitioning an existing table