Re: Declarative partitioning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning
Date: 2015-12-17 18:56:45
Message-ID: CA+TgmoZZMfcf16YaHuhP1Vk=j8PDFeHCvfj+FJQd+eFhs+7P8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 14, 2015 at 2:14 AM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Syntax to create a partitioned table (up to 2 levels of partitioning):
>
> CREATE TABLE foo (
> ...
> )
> PARTITION BY R/L ON (key0)
> SUBPARTITION BY R/L ON (key1)
> [(PARTITION foo_1 FOR VALUES <val> [<storage_params>] [<tblspc>]
> [(SUBPARTITION foo_1_1 FOR VALUES <val> [<storage_params>] [<tblspc>],
> ...)], ...)];
>
> The above creates two pg_partitioned_rel entries for foo with partlevel 0
> and 1, for key0 and key1, respectively. For foo_1 and foo_1_1, this
> creates pg_partition entries, with foo and foo_1 as partparent,
> respectively.
>
> Why just 2 levels? - it seems commonplace and makes the syntax more
> intuitive? I guess it might be possible to generalize the syntax for
> multi-level partitioning. Ideas? If we want to support the notion of
> sub-partition template in future, that would require some thought, more
> importantly proper catalog organization for the same.

I do not think this is a particularly good idea. You're going to need
to dump each partition separately at least in --binary-upgrade mode,
because each is going to have its own magic OIDs that need to be
restored, and also because there will most likely be at least some
properties that are going to vary between partitions. You could
require that every partition have exactly the same set of columns,
constraints, rules, triggers, policies, attribute defaults, comments,
column comments, and everything else that might be different from one
partition to another, and further require that they have exactly
matching indexes. It would take a fair amount of code to prohibit all
that, but it could be done. However, do we really want that? There
may well be some things were we want to enforce that the parent and
the child are exactly identical, but I doubt we want that for
absolutely every property, current and future, of the partition. And
even if you did, because of the --binary-upgrade stuff, you still need
to to be able to dump them separately.

Therefore, I believe it is a whole lot better to make the primary
syntax for table partitioning something where you issue a CREATE
statement for the parent and then a CREATE statement for each child.
If we want to also have a convenience syntax so that people who want
to create a parent and a bunch of children in one fell swoop can do
so, fine.

I would not choose to model the syntax for creating partitions on
Oracle. I don't find that syntax particularly nice or easy to
remember. I say PARTITION BY RANGE, and then inside the parentheses I
use the PARTITION keyword for each partition? Really? But I think
copying the style while having the details be incompatible is an even
worse idea.

> What about ALTER TABLE? - Instead of allowing ALTER TABLE to be applied
> directly to partitions on case-by-case basis (they are tables under the
> hood after all), we should restrict AT to the master table. Most of the AT
> changes implicitly propagate from the master table to its partitions. Some
> of them could be directly applied to partitions and/or sub-partitions such
> as rename, storage manipulations like - changing tablespace, storage
> parameters (reloptions), etc.:
>
> ALTER TABLE foo
> RENAME PARTITION <partition-name> TO <new-name>;
>
> ALTER TABLE foo
> RENAME SUBPARTITION <sub-partition-name> TO <new-name>;
>
> ALTER TABLE foo
> SET TABLESPACE ... [DEFAULT] FOR PARTITION <partition-name>;
>
> ALTER TABLE foo
> SET TABLESPACE ... FOR SUBPARTITION <sub-partition-name>;
>
> ALTER TABLE foo
> SET (storage_parameter = value) [DEFAULT] FOR PARTITION <partition-name>;
>
> ALTER TABLE foo
> SET (storage_parameter = value) FOR SUBPARTITION <sub-partition-name>;

I don't think this is a very good idea. This is basically proposing
that for every DDL command that you can apply to a table, you have to
spell it differently for a partition. That seems like a lot of extra
work for no additional functionality.

> By the way, should we also allow changing the logging of
> partitions/sub-partitions as follows?

Again, I think you're coming at this from the wrong direction.
Instead of saying we're going to disallow all changes to the
partitions and then deciding we need to allow certain changes after
all, I think we should allow everything that is currently allowed for
an inherited table and then decide which of those things we need to
prohibit, and why. For example, if you insist that a child table has
to have a tuple descriptor that matches the parent, that can improve
efficiency: Append won't need to project, and so on. But it now
becomes very difficult to support taking a stand-alone table and
making it a partition of an existing partitioned table, because the
set of dropped columns might not match. Having to give an error in
that case amounts to "we're sorry, we can't attach your partition to
the partitioning hierarchy because of some invisible state that you
can't see" isn't very nice. Now I'm not saying that isn't the right
decision, but I think the design choices here need to be carefully
thought about.

Stepping away from that particular example, a blanket prohibition on
changing any attribute of a child table seems like it will prohibit a
lot of useful things that really ought to work. And again, I don't
think it's a good idea to implement separate syntax for changing a
partition vs. changing a table. If I want to set a partition as
unlogged, I should be able to say ALTER TABLE partition_name UNLOGGED
or maybe ALTER PARTITION partition_name UNLOGGED, not be forced to use
some new grammar production that looks completely different.

> What about index constraints, ie, PRIMARY KEY, UNIQUE and EXCLUSION
> constraints - 2 things must be clear here: cannot create these constraints
> on individual partitions and all partition columns (key0 + key1) must be
> the leading columns of the key. On a related note, creating index on the
> master table should create the index on all "leaf" partitions. The index
> on the mater table itself would be just a logical index. Should we allow
> creating or dropping indexes on partitions directly?

I don't find this to be particularly clear. You are assuming that
nobody wants to create a constraint that a certain value is unique
within a partition. That might not be a real common thing to want to
do, but it could certainly be useful to somebody, and the current
system with table inheritance allows it. For example, suppose that we
have orders partitioned on the order_date column, by month. The user
might want to create a UNIQUE index on order_id on each partition.
Maybe they start over with order_id 1 at the beginning of each month.
But even if, as is more likely, the order IDs keep counting up from
month to month, they don't want to be forced to include the whole
partitioning key in the index in order to have it marked UNIQUE. That
may be enough, in practice, to ensure the global uniqueness of order
IDs even though the system doesn't technically enforce it in all
cases.

If you want an index created on the parent to cascade down to all
children, that's a big project to get right. Suppose I create an
index on the parent table. After a while, I notice that it's getting
bloated, so I created another index with the same definition. Now, I
drop one of the two indexes. One of the two indexes from each child
table needs to go away, and moreover it can't be picked arbitrarily -
it has to be the one that was created at the same time as the parent
index I'm dropping. If you want it to behave like this, you need a
whole system of bookkeeping to make it work right.

For version 1, I'd go the other way and prohibit index definitions on
the empty parent rels. Let people create indexes on the children as
they wish. In a future version, we can add stuff to cascade from
parents to children.

> It would be interesting to talk about features like SPLIT, MERGE, EXCHANGE
> but I'm inclined to leave them as future enhancements. For a functionality
> similar to EXCHANGE, there are commands like ATTACH/DETACH in the latest
> patch. We could extend them to also consider sub-partitions:

We don't need to have these in the first version, but we have to make
some architectural decisions that affect how feasible they are to
implement and in which cases, as noted above.

> One cannot define rules, triggers, and RLS policies on them. Although,
> AR triggers defined on a partitioned master table are propagated to the
> "leaf" partitions.

What value do you see us getting out of restricting these particular things?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-12-17 18:58:04 Re: Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"
Previous Message Pavel Stehule 2015-12-17 18:43:27 psql - -dry-run option