Re: Multi column range partition table

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi column range partition table
Date: 2017-06-23 08:00:09
Message-ID: CAEZATCXy_Xfd4U70iQ8L33UY9dw=AiwEnYEmAfOFq_542TREnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 June 2017 at 08:01, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> The way we have designed our syntax, we don't have a way to tell that
> p3 comes after p2 and they have no gap between those. But I don't
> think that's your question. What you are struggling with is a way to
> specify a lower bound (10, +infinity) so that anything with i1 > 10
> would go to partition 3.
>

I think actually there is a fundamental problem here, which arises
because UNBOUNDED has 2 different meanings depending on context, and
thus it is not possible in general to specify the start of one range
to be equal to the end of the previous range, as is necessary to get
contiguous non-overlapping ranges.

Note that this isn't just a problem for floating point datatypes
either, it also applies to other types such as strings. For example,
given a partition over (text, int) types defined with the following
values:

FROM ('a', UNBOUNDED) TO ('b', UNBOUNDED)

which is equivalent to

FROM ('a', -INFINITY) TO ('b', +INFINITY)

where should the next range start?

Even if you were to find a way to specify "the next string after 'b'",
it wouldn't exactly be pretty. The problem is that the above partition
corresponds to "all the strings starting with 'a', plus the string
'b', which is pretty ugly. A neater way to define the pair of ranges
in this case would be:

FROM ('a', -INFINITY) TO ('b', -INFINITY)
FROM ('b', -INFINITY) TO ('c', -INFINITY)

since then all strings starting with 'a' would fall into the first
partition and all the strings starting with 'b' would fall into the
second one.

Currently, when there are 2 partition columns, the partition
constraint is defined as

(a is not null) and (b is not null)
and
(a > al or (a = al and b >= bl))
and
(a < au or (a = au and b < bu))

if the upper bound bu were allowed to be -INFINITY (something that
should probably be forbidden unless the previous column's upper bound
were finite), then this would simplify to

(a is not null) and (b is not null)
and
(a > al or (a = al and b >= bl))
and
(a < au)

and in the example above, where al is -INFINITY, it would further simplify to

(a is not null) and (b is not null)
and
(a >= al)
and
(a < au)

There would also be a similar simplification possible if the lower
bound of a partition column were allowed to be +INFINITY.

So, I think that having UNBOUNDED represent both -INFINITY and
+INFINITY depending on context is a design flaw, and that we need to
allow both -INFINITY and +INFINITY as upper and lower bounds (provided
they are preceded by a column with a finite bound). I think that, in
general, that's the only way to allow contiguous non-overlapping
partitions to be defined on multiple columns.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-06-23 08:02:29 Re: Broken O(n^2) avoidance in wal segment recycling.
Previous Message Masahiko Sawada 2017-06-23 07:58:00 Re: Setting pd_lower in GIN metapage