Re: On partitioning

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-08-31 20:45:29
Message-ID: 20140831204528.GA11069@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 29, 2014 at 12:35:50PM -0400, Tom Lane wrote:
> > Each partition is assigned an Expression that receives a tuple and
> > returns boolean. This expression returns true if a given tuple belongs
> > into it, false otherwise.
>
> -1, in fact minus a lot. One of the core problems of the current approach
> is that the system, particularly the planner, hasn't got a lot of insight
> into exactly what the partitioning scheme is in a partitioned table built
> on inheritance. If you allow the partitioning rule to be a black box then
> that doesn't get any better. I want to see a design wherein the system
> understands *exactly* what the partitioning behavior is. I'd start with
> supporting range-based partitioning explicitly, and maybe we could add
> other behaviors such as hashing later.
>
> In particular, there should never be any question at all that there is
> exactly one partition that a given row belongs to, not more, not less.
> You can't achieve that with a set of independent filter expressions;
> a meta-rule that says "exactly one of them should return true" is an
> untrustworthy band-aid.
>
> (This does not preclude us from mapping the tuple through the partitioning
> rule and finding that the corresponding partition doesn't currently exist.
> I think we could view the partitioning rule as a function from tuples to
> partition numbers, and then we look in pg_class to see if such a partition
> exists.)

There is one situation where you need to be more flexible, and that is
if you ever want to support online repartitioning. To do that you have
to distinguish between "I want to insert tuple X, which partition
should it go into" and "I want to know which partitions I need to look
for partition_key=Y".

For the latter you really have need an expression per partition, or
something equivalent. If performance is an issue I suppose you could
live with having an "old" and an "new" partition scheme, so you
couldn't have two "live repartitionings" happening simultaneously.

Now, if you want to close the door on online repartitioning forever
then that fine. But being in the position of having to say "yes our
partitioning scheme sucks, but we would have to take the database down
for a week to fix it" is no fun.

Unless logical replication provides a way out maybe??

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2014-08-31 20:52:14 Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Previous Message Hannu Krosing 2014-08-31 20:38:15 Re: On partitioning