From: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: UPDATE of partition key |
Date: | 2017-10-16 02:58:30 |
Message-ID: | 113df99b-cad5-f94c-995e-0185ac3c1b90@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Amit.
On 2017/10/04 22:51, Amit Khandekar wrote:
> Main patch :
> update-partition-key_v20.patch
Guess you're already working on it but the patch needs a rebase. A couple
of hunks in the patch to execMain.c and nodeModifyTable.c fail.
Meanwhile a few comments:
+void
+pull_child_partition_columns(Bitmapset **bitmapset,
+ Relation rel,
+ Relation parent)
Nitpick: don't we normally list the output argument(s) at the end? Also,
"bitmapset" could be renamed to something that conveys what it contains?
+ if (partattno != 0)
+ child_keycols =
+ bms_add_member(child_keycols,
+ partattno -
FirstLowInvalidHeapAttributeNumber);
+ }
+ foreach(lc, partexprs)
+ {
Elsewhere (in quite a few places), we don't iterate over partexprs
separately like this, although I'm not saying it is bad, just different
from other places.
+ * the transition tuplestores can be built. Furthermore, if the transition
+ * capture is happening for UPDATEd rows being moved to another
partition due
+ * partition-key change, then this function is called once when the row is
+ * deleted (to capture OLD row), and once when the row is inserted to
another
+ * partition (to capture NEW row). This is done separately because
DELETE and
+ * INSERT happen on different tables.
Extra space at the beginning from the 2nd line onwards.
+ (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup
== NULL))))
Is there some reason why a bitwise operator is used here?
+ * 'update_rri' has the UPDATE per-subplan result rels.
Could you explain why they are being received as input here?
+ * 'perleaf_parentchild_maps' receives an array of TupleConversionMap objects
+ * with on entry for every leaf partition (required to convert input
tuple
+ * based on the root table's rowtype to a leaf partition's rowtype after
+ * tuple routing is done)
Could this be named leaf_tupconv_maps, maybe? It perhaps makes clear that
they are maps needed for "tuple conversion". And the other field holding
the reverse map as leaf_rev_tupconv_maps. Either that or use underscores
to separate words, but then it gets too long I guess.
+ tuple = ConvertPartitionTupleSlot(mtstate,
+
mtstate->mt_perleaf_parentchild_maps[leaf_part_index],
+
The 2nd line here seems to have gone over 80 characters.
ISTM, ConvertPartitionTupleSlot has a very specific job and a bit complex
interface. I guess it could simply have the following interface:
static HeapTuple ConvertPartitionTuple(ModifyTabelState *mtstate,
HeapTuple tuple, bool is_update);
And figure out, based on the value of is_update, which map to use and
which slot to set *p_new_slot to (what is now "new_slot" argument).
You're getting mtstate here anyway, which contains all the information you
need here. It seems better to make that (selecting which map and which
slot) part of the function's implementation if we're having this function
at all, imho. Maybe I'm missing some details there, but my point still
remains that we should try to put more logic in that function instead of
it just do the mechanical tuple conversion.
+ * We have already checked partition constraints above, so skip them
+ * below.
How about: ", so skip checking here."?
ISTM, the newly introduced logic in ExecSetupTransitionCaptureState() to
try to reuse the per-subplan child-to-parent map as per-leaf
child-to-parent map could be simplified a bit. I mean the following code:
+ /*
+ * But for Updates, we can share the per-subplan maps with the per-leaf
+ * maps.
+ */
+ update_rri_index = 0;
+ update_rri = mtstate->resultRelInfo;
+ if (mtstate->mt_nplans > 0)
+ cur_reloid = RelationGetRelid(update_rri[0].ri_RelationDesc);
- /* Choose the right set of partitions */
- if (mtstate->mt_partition_dispatch_info != NULL)
+ for (i = 0; i < numResultRelInfos; ++i)
+ {
<snip>
How about (pseudo-code):
j = 0;
for (i = 0; i < n_leaf_parts; i++)
{
if (j < n_subplans && leaf_rri[i]->oid == subplan_rri[j]->oid)
{
leaf_childparent_map[i] = subplan_childparent_map[j];
j++;
}
else
{
leaf_childparent_map[i] = new map
}
}
I think the above would also be useful in ExecSetupPartitionTupleRouting()
where you've added similar code to try to reuse per-subplan ResultRelInfos.
In ExecInitModifyTable(), can we try to minimize the number of places
where update_tuple_routing_needed is being set. Currently, it's being set
in 3 places:
+ bool update_tuple_routing_needed = node->part_cols_updated;
&
+ /*
+ * If this is an UPDATE and a BEFORE UPDATE trigger is present,
we may
+ * need to do update tuple routing.
+ */
+ if (resultRelInfo->ri_TrigDesc &&
+ resultRelInfo->ri_TrigDesc->trig_update_before_row &&
+ operation == CMD_UPDATE)
+ update_tuple_routing_needed = true;
&
+ /* Decide whether we need to perform update tuple routing. */
+ if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+ update_tuple_routing_needed = false;
In the following:
ExecSetupPartitionTupleRouting(rel,
+ (operation == CMD_UPDATE ?
+ mtstate->resultRelInfo : NULL),
+ (operation == CMD_UPDATE ? nplans
: 0),
Can the second parameter be made to not span two lines? It was a bit hard
for me to see that there two new parameters.
+ * Construct mapping from each of the resultRelInfo attnos to the root
Maybe it's odd to say "resultRelInfo attno", because it's really the
underlying partition whose attnos we're talking about as being possibly
different from the root table's attnos.
+ * descriptor. In such case we need to convert tuples to the root
s/In such case/In such a case,/
By the way, I've seen in a number of places that the patch calls "root
table" a partition. Not just in comments, but also a variable appears to
be given a name which contains rootpartition. I can see only one instance
where root is called a partition in the existing source code, but it seems
to have been introduced only recently:
allpaths.c:1333: * A root partition will already have a
+ * qual for each partition. Note that, if there are SubPlans in
there,
+ * they all end up attached to the one parent Plan node.
The sentence starting with "Note that, " is a bit unclear.
+ Assert(update_tuple_routing_needed ||
+ (operation == CMD_INSERT &&
+ list_length(node->withCheckOptionLists) == 1 &&
+ mtstate->mt_nplans == 1));
The comment I complained about above is perhaps about this Assert.
- List *mapped_wcoList;
+ List *mappedWco;
Not sure why this rename. After this rename, it's now inconsistent with
the code above which handles non-partitioned case, which still calls it
wcoList. Maybe, because you introduced firstWco and then this line:
+ firstWco = linitial(node->withCheckOptionLists);
but note that each member of node->withCheckOptionLists is also a list, so
the original naming. Also, further below, you're assigning mappedWco to
a List * field.
+ resultRelInfo->ri_WithCheckOptions = mappedWco;
Comments on the optimizer changes:
+get_all_partition_cols(List *rtables,
Did you mean rtable?
get_all_partition_cols() seems to go over the rtable as many times as
there are partitioned tables in the tree. Is there a way to do this work
somewhere else? Maybe when the partitioned_rels list is built in the
first place. But that would require us to make changes to extract
partition columns in some place (prepunion.c) where it's hard to justify
why it's being done there at all.
+ get_all_partition_cols(root->parse->rtable, top_parentRTindex,
+ partitioned_rels, &all_part_cols);
Two more spaces needed on the 2nd line.
+ * If all_part_cols_p is non-NULL, *all_part_cols_p is set to a bitmapset
+ * of all partitioning columns used by the partitioned table or any
+ * descendent.
+ *
Dead comment? Aha, so here's where all_part_cols was being set before...
+ TupleTableSlot *mt_rootpartition_tuple_slot;
I guess I was complaining about this field where you call root a
partition. Maybe, mt_root_tuple_slot would suffice.
Thanks again for working on this.
Thanks,
Amit
From | Date | Subject | |
---|---|---|---|
Next Message | Ashutosh Bapat | 2017-10-16 05:13:12 | Re: How does postgres store the join predicate for a relation in a given query |
Previous Message | Craig Ringer | 2017-10-16 02:39:16 | Re: Determine state of cluster (HA) |