Re: UPDATE of partition key

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-09-19 18:36:08
Message-ID: CA+TgmoZ6PAv3-0jPuD_=aO9tMQwXpr+JuRdsvjX36wjSZG+qFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 15, 2017 at 7:25 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> [ new patch ]

This already fails to apply again. In general, I think it would be a
good idea to break this up into a patch series rather than have it as
a single patch. That would allow some bits to be applied earlier.
The main patch will probably still be pretty big, but at least we can
make things a little easier by getting some of the cleanup out of the
way first. Specific suggestions on what to break out below.

If the changes to rewriteManip.c are a marginal efficiency hack and
nothing more, then let's commit this part separately before the main
patch. If they're necessary for correctness, then please add a
comment explaining why they are necessary.

There appears to be no reason why the definitions of
GetInsertedColumns() and GetUpdatedColumns() need to be moved to a
header file as a result of this patch. GetUpdatedColumns() was
previously defined in trigger.c and execMain.c and, post-patch, is
still called from only those files. GetInsertedColumns() was, and
remains, called only from execMain.c. If this were needed I'd suggest
doing it as a preparatory patch before the main patch, but it seems we
don't need it at all.

If I understand correctly, the reason for changing mt_partitions from
ResultRelInfo * to ResultRelInfo ** is that, currently, all of the
ResultRelInfos for a partitioning hierarchy are allocated as a single
chunk, but we can't do that and also reuse the ResultRelInfos created
during InitPlan. I suggest that we do this as a preparatory patch.
Someone could argue that this is going the wrong way and that we ought
to instead make InitPlan() create all of the necessarily
ResultRelInfos, but it seems to me that eventually we probably want to
allow setting up ResultRelInfos on the fly for only those partitions
for which we end up needing them. The code already has some provision
for creating ResultRelInfos on the fly - see ExecGetTriggerResultRel.
I don't think it's this patch's job to try to apply that kind of thing
to tuple routing, but it seems like in the long run if we're inserting
1 tuple into a table with 1000 partitions, or performing 1 update that
touches the partition key, it would be best not to create
ResultRelInfos for all 1000 partitions just for fun. But this sort of
thing seems much easier of mt_partitions is ResultRelInfo ** rather
than ResultRelInfo *, so I think what you have is going in the right
direction.

+ * mtstate->resultRelInfo[]. Note: We assume that if the resultRelInfo
+ * does not belong to subplans, then it already matches the root tuple
+ * descriptor; although there is no such known scenario where this
+ * could happen.
+ */
+ if (rootResultRelInfo != resultRelInfo &&
+ mtstate->mt_persubplan_childparent_maps != NULL &&
+ resultRelInfo >= mtstate->resultRelInfo &&
+ resultRelInfo <= mtstate->resultRelInfo + mtstate->mt_nplans - 1)
+ {
+ int map_index = resultRelInfo - mtstate->resultRelInfo;

I think you should Assert() that it doesn't happen instead of assuming
that it doesn't happen. IOW, remove the last two branches of the
if-condition, and then add an Assert() that map_index is sane.

It is not clear to me why we need both mt_perleaf_childparent_maps and
mt_persubplan_childparent_maps.

+ * Note: if the UPDATE is converted into a DELETE+INSERT as part of
+ * update-partition-key operation, then this function is also called
+ * separately for DELETE and INSERT to capture transition table rows.
+ * In such case, either old tuple or new tuple can be NULL.

That seems pretty strange. I don't quite see how that's going to work
correctly. I'm skeptical about the idea that the old tuple capture
and new tuple capture can safely happen at different times.

I wonder if we should have a reloption controlling whether
update-tuple routing is enabled. I wonder how much more expensive it
is to execute UPDATE root SET a = a + 1 WHERE a = 1 on a table with
1000 subpartitions with this patch than without, assuming the update
succeeds in both cases.

I also wonder how efficient this implementation is in general. For
example, suppose you make a table with 1000 partitions each containing
10,000 tuples and update them all, and consider three scenarios: (1)
partition key not updated but all tuples subject to non-HOT updates
because the updated column is indexed, (2) partition key updated but
no tuple movement required as a result, (3) partition key updated and
all tuples move to a different partition. It would be useful to
compare the times, and also to look at perf profiles and see if there
are any obvious sources of inefficiency that can be squeezed out. It
wouldn't surprise me if tuple movement is a bit slower than the other
scenarios, but it would be nice to know how much slower and whether
the bottlenecks are anything that we can easily fix. I don't feel
that the performance constraints for this patch should be too tight,
because we're talking about being able to do something vs. not being
able to do it at all, but we should try to have it not stink.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-09-19 18:37:36 Re: PoC plpgsql - possibility to force custom or generic plan
Previous Message Tom Lane 2017-09-19 18:29:31 Re: Re: issue: record or row variable cannot be part of multiple-item INTO list