Re: [PATCH] Automatic HASH and LIST partition creation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Amul Sul <sulamul(at)gmail(dot)com>
Subject: Re: [PATCH] Automatic HASH and LIST partition creation
Date: 2021-07-09 13:31:05
Message-ID: CA+TgmoZS3cjrWoTvvnCQjas-VXnPtiN8y+FNqP-0FobSS_0fYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 2, 2021 at 3:26 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> I don't know what committers will say, but I think that "ALTER TABLE" might be
> the essential thing for this patch to support, not "CREATE". (This is similar
> to ALTER..SET STATISTICS, which is not allowed in CREATE.)
>
> The reason is that ALTER is what's important for RANGE partitions, which need
> to be created dynamically (for example, to support time-series data
> continuously inserting data around 'now'). I assume it's sometimes also
> important for LIST. I think this patch should handle those cases better before
> being commited, or else we risk implementing grammar and other user-facing interface
> that fails to handle what's needed into the future (or that's non-essential).
> Even if dynamic creation isn't implemented yet, it seems important to at least
> implement the foundation for setting the configuration to *allow* that in the
> future, in a manner that's consistent with the initial implementation for
> "static" partitions.

I don't think it's a hard requirement, but it's an interesting point.
My initial reactions to the patch are:

- I don't think it's a very good idea to support LIST and HASH but not
RANGE. We need a design that can work for all three partitioning
strategies, even if we don't have support for all of them in the
initial patch. If they CAN all be in the same patch, so much the
better.

- I am not very impressed with the syntax. CONFIGURATION is an odd
word that seems too generic for what we're talking about here. It
would be tempting to use a connecting word like WITH or USING except
that both would be ambiguous here, so we can't. MySQL and Oracle use
the keyword PARTITIONS -- which I realize isn't a keyword at all in
PostgreSQL right now -- to introduce the partition specification. DB2
uses no keyword at all; it seems you just say PARTITION BY
(mypartitioncol) (...partition specifications go here...). I think
either approach could work for us. Avoiding the extra keyword is a
plus, especially since I doubt we're likely to support the exact
syntax that Oracle and MySQL offer anyway - though if we do, then I'd
be in favor of inserting the PARTITIONS keyword so that people's SQL
can work without modification.

- We need to think a little bit about exactly what we're trying to do.
The simplest imaginable thing here would be to just give people a
place to put a bunch of partition specifications. So you can imagine
letting someone say PARTITION BY HASH (FOR VALUES WITH (MODULUS 2,
REMAINDER 0), FOR VALUES WITH (MODULUS 2, REMAINDER 1)). However, the
patch quite rightly rejects that approach in favor of the theory that,
at CREATE TABLE time, you're just going to want to give a modulus and
have the system create one partition for every possible remainder. But
that could be expressed even more compactly than what the patch does.
Instead of saying PARTITION BY HASH CONFIGURATION (MODULUS 4) we could
just let people say PARTITION BY HASH (4) or probably even PARTITION
BY HASH 4.

- For list partitioning, the patch falls back to just letting you put
a bunch of VALUES IN clauses in the CREATE TABLE statement. I don't
find something like PARTITION BY LIST CONFIGURATION (VALUES IN (1, 2),
(1, 3)) to be particularly readable. What are all the extra keywords
adding? We could just say PARTITION BY LIST ((1, 2), (1, 3)). I think
I would find that easier to remember; not sure what other people
think. As an alternative, PARTITION BY LIST VALUES IN (1, 2), (1, 3)
looks workable, too.

- What about range partitioning? This is an interesting case because
while in theory you could leave gaps between range partitions, in
practice people probably don't want to do that very often, and it
might be better to have a simpler syntax that caters to the common
case, since people can always create partitions individually if they
happen to want gaps. So you can imagine making something like
PARTITION BY RANGE ((MINVALUE), (42), (163)) mean create two
partitions, one from (MINVALUE) to (42) and the other from (42) to
(163). I think that would be pretty useful.

- Another possible separating keyword here would be INITIALLY, which
is already a parser keyword. So then you could have stuff like
PARTITION BY HASH INITIALLY 4, PARTITION BY LIST INITIALLY ((1, 2),
(1, 3)), PARTITION BY RANGE INITIALLY ((MINVALUE), (42), (163)).

- The patch doesn't document the naming convention for the
automatically created partitions, and it is worth thinking a bit about
how that is going to work. Do people want to be able to specify the
name of the partitioned table when they are using this syntax, or are
they happy with automatically generated names? If the latter, are they
happy with THESE automatically generated names? I guess for HASH
appending _%d where %d is the modulus is fine, but it is not necessary
so great for LIST. If I said CREATE TABLE foo ... PARTITION BY LIST
(('en'), ('ru'), ('jp')) I think I'd be hoping to end up with
partitions named foo_en, foo_ru, and foo_jp rather than foo_0, foo_1,
foo_2. Or maybe I'd rather say PARTITION BY LIST (foo_en ('en'),
foo_ru ('ru'), foo_jp ('jp')) or something like that to be explicit
about it. Not sure. But it's worth some thought. I think this comes
into focus even more clearly for range partitions, where you probably
want the partitions to follow a convention like basetablename_yyyy_mm.

- The documentation for the CONFIGURATION option doesn't match the
grammar. The documentation makes it an independent clause, so
CONFIGURATION could be specified even if PARTITION BY is not. But the
implementation makes the better choice to treat CONFIGURATION as a
further specification of PARTITION BY.

- I don't think this patch is really all that close to being ready for
committer. Beyond the design issues which seem to need more thought,
there's stuff in the patch like:

+ elog(DEBUG1,"stransformPartitionAutoCreate HASH i %d MODULUS %d \n %s\n",
+ i, bound->modulus, nodeToString(part));

Now, on the one hand, debugging elogs like this have little business
in a final patch. And, on the other hand, if we were going to include
them in the final patch, we'd probably want to at least spell the
function name correctly. Similarly, it's evident that this test case
has not been carefully reviewed by anyone, including the author:

+REATE TABLE fail_parted (a int) PARTITION BY HASH (a) CONFIGURATION
+(MODULUS 10 DEFAULT PARTITION hash_default);

Not too surprisingly, the system isn't familiar with the REATE command.

- There's some questionable error-reporting behavior in here, too, particularly:

+CREATE TABLE fail_parted (a int) PARTITION BY LIST (a) CONFIGURATION
+(values in (1, 2), (1, 3));
+ERROR: partition "fail_parted_1" would overlap partition "fail_parted_0"
+LINE 2: (values in (1, 2), (1, 3));

Since the user hasn't provided the names fail_parted_0 or
fail_parted_1, it kind of stinks to use them in an error report. The
error cursor is good, but I wonder if we need to do better. One option
would be to go to a syntax where the user specifies the partition
names explicitly, which then justifies using that name in an error
report. Another possibility would be to give a different message in
this case, like:

ERROR: partition for values in (1, 2) would overlap partition for
values in (1, 3)

Now, that would require a pretty substantial redesign of the patch,
and I'm not sure it's worth the effort. But I'm also not sure that it
isn't.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-07-09 13:44:54 Re: Transactions involving multiple postgres foreign servers, take 2
Previous Message Ranier Vilela 2021-07-09 12:03:10 Re: Why ALTER SUBSCRIPTION ... SET (slot_name='none') requires subscription disabled?