From: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Keith Fiske <keith(at)omniti(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com> |
Cc: | David Steele <david(at)pgmasters(dot)net>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Adding support for Default partition in partitioning |
Date: | 2017-04-05 05:29:08 |
Message-ID: | a816af3d-dfc0-7f28-1a48-e675b206b214@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2017/04/05 6:22, Keith Fiske wrote:
> On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
>> Please find attached an updated patch.
>> Following has been accomplished in this update:
>>
>> 1. A new partition can be added after default partition if there are no
>> conflicting rows in default partition.
>> 2. Solved the crash reported earlier.
>
> Installed and compiled against commit
> 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> -0400) without any issue
>
> However, running your original example, I'm getting this error
>
> keith(at)keith=# CREATE TABLE list_partitioned (
> keith(# a int
> keith(# ) PARTITION BY LIST (a);
> CREATE TABLE
> Time: 4.933 ms
> keith(at)keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE
> Time: 3.492 ms
> keith(at)keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
> IN (4,5);
> ERROR: unrecognized node type: 216
It seems like the new ExecPrepareCheck should be used in the place of
ExecPrepareExpr in the code added in check_new_partition_bound().
> Also, I'm still of the opinion that denying future partitions of values in
> the default would be a cause of confusion. In order to move the data out of
> the default and into a proper child it would require first removing that
> data from the default, storing it elsewhere, creating the child, then
> moving it back. If it's only a small amount of data it may not be a big
> deal, but if it's a large amount, that could cause quite a lot of
> contention if done in a single transaction. Either that or the user would
> have to deal with data existing in the table, disappearing, then
> reappearing again.
>
> This also makes it harder to migrate an existing table easily. Essentially
> no child tables for a large, existing data set could ever be created
> without going through one of the two situations above.
I thought of the following possible way to allow future partitions when
the default partition exists which might contain rows that belong to the
newly created partition (unfortunately, nothing that we could implement at
this point for v10):
Suppose you want to add a new partition which will accept key=3 rows.
1. If no default partition exists, we're done; no key=3 rows would have
been accepted by any of the table's existing partitions, so no need to
move any rows
2. Default partition exists which might contain key=3 rows, which we'll
need to move. If we do this in the same transaction, as you say, it
might result in unnecessary unavailability of table's data. So, it's
better to delegate that responsibility to a background process. The
current transaction will only add the new key=3 partition, so any key=3
rows will be routed to the new partition from this point on. But we
haven't updated the default partition's constraint yet to say that it
no longer contains key=3 rows (constraint that the planner consumes),
so it will continue to be scanned for queries that request key=3 rows
(there should be some metadata to indicate that the default partition's
constraint is invalid), along with the new partition.
3. A background process receives a "work item" requesting it to move all
key=3 rows from the default partition heap to the new partition's heap.
The movement occurs without causing the table to become unavailable;
once all rows have been moved, we momentarily lock the table to update
the default partition's constraint to mark it valid, so that it will
no longer be accessed by queries that want to see key=3 rows.
Regarding 2, there is a question of whether it should not be possible for
the row movement to occur in the same transaction. Somebody may want that
to happen because they chose to run the command during a maintenance
window, when the table's becoming unavailable is not an issue. In that
case, we need to think of the interface more carefully.
Regarding 3, I think the new autovacuum work items infrastructure added by
the following commit looks very promising:
* BRIN auto-summarization *
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4
> However, thinking through this, I'm not sure I can see a solution without
> the global index support. If this restriction is removed, there's still an
> issue of data duplication after the necessary child table is added. So
> guess it's a matter of deciding which user experience is better for the
> moment?
I'm not sure about the fate of this particular patch for v10, but until we
implement a solution to move rows and design appropriate interface for the
same, we could error out if moving rows is required at all, like the patch
does.
Could you briefly elaborate why you think the lack global index support
would be a problem in this regard?
I agree that some design is required here to implement a solution
redistribution of rows; not only in the context of supporting the notion
of default partitions, but also to allow the feature to split/merge range
(only?) partitions. I'd like to work on the latter for v11 for which I
would like to post a proposal soon; if anyone would like to collaborate
(ideas, code, review), I look forward to. (sorry for hijacking this thread.)
Thanks,
Amit
From | Date | Subject | |
---|---|---|---|
Next Message | Emre Hasegeli | 2017-04-05 05:34:29 | Re: BRIN cost estimate |
Previous Message | Tom Lane | 2017-04-05 05:20:20 | Re: partitioned tables and contrib/sepgsql |