Re: UPDATE of partition key

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-06-21 20:11:01
Message-ID: CA+TgmoZy1x2j4jy1vyJrWF772BUWS0iFPr-pTw2Ldi6WgvL0Gw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 21, 2017 at 1:37 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>> e.g. the table is partitioned on order number, and you do UPDATE
>> lineitem SET product_code = 'K372B' WHERE product_code = 'K372'.
>
> This query does not update the order number, so there is no
> partition-key update here. Are you thinking that the patch generates
> the per-leaf-partition WCO expressions even for an update that does
> not involve a partition key?

No, it just wasn't a great example. Sorry.

>> Second, it will amount to a functional bug if you get a
>> different answer than the planner did.
>
> Actually, the per-leaf WCOs are meant to be executed on the
> destination partitions to which tuples are moved, while the WCOs
> belonging to the per-subplan resultRelInfos are meant for the result
> relations used by the UPDATE plans. So it should not matter whether
> they look the same or different, because they are fired against
> different objects. Those objects can, however, happen to be the same
> relations.
>
> But in any case, it's not clear to me how the mapped WCO and the
> planner's WCO would yield different answers if they are both for the
> same relation. I am possibly missing something. The planner has
> already generated the withCheckOptions for each resultRelInfo, and we
> use one of those to re-generate the WCO for a leaf partition by only
> adjusting the attnos. If the planner already generated a WCO for that
> leaf partition (because that partition was present in
> mtstate->resultRelInfo), then the re-built WCO should look exactly
> the same as the earlier one: they are for the same relation, so the
> attnos generated in them would be the same, since the relation's
> TupleDesc is the same.

If the planner's WCOs and mapped WCOs are always the same, then I
think we should try to avoid generating both. If they can be
different, but that's intentional and correct, then there's no
substantive problem with the patch but the comments need to make it
clear why we are generating both.
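
For anyone following along, one way a WCO arises for a partitioned
table at all is through an auto-updatable view with CHECK OPTION. A
minimal sketch, with hypothetical names:

    CREATE TABLE orders (ord int, qty int) PARTITION BY LIST (ord);
    CREATE TABLE orders_1 PARTITION OF orders FOR VALUES IN (1);
    CREATE TABLE orders_2 PARTITION OF orders FOR VALUES IN (2);

    CREATE VIEW small_orders AS
        SELECT * FROM orders WHERE qty < 100
        WITH CHECK OPTION;

    -- The view's qual becomes a WithCheckOption on the UPDATE: the
    -- planner builds it for each subplan result relation, and the
    -- patch additionally maps it to each leaf partition.
    UPDATE small_orders SET qty = qty + 1;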

> Actually I meant: "the above works only for local updates. For
> row-movement updates, we need per-leaf-partition WCOs, because when
> the row is inserted into the target partition, that partition may not
> be included in the planner's resultRelInfos, so we need WCOs for all
> partitions". I think a comment along those lines should be sufficient
> if I add it in the code?

Let's not get too focused on updating the comment until we are in
agreement about what the code ought to be doing. I'm not clear
whether you accept the point that the patch needs to be changed to
avoid generating the same WCOs and returning lists in both the planner
and the executor.

>> Also, I feel like it's probably not correct to use the first result
>> relation as the nominal relation for building WCOs and returning lists
>> anyway. I mean, if the first result relation has a different column
>> order than the parent relation, isn't this just broken? If it works
>> for some reason, the comments don't explain what that reason is.
>
> Not sure why the parent relation should come into the picture. As
> long as the first result relation belongs to one of the partitions in
> the whole partition tree, we should be able to use it to build the
> WCOs of any other partition, because they share a common set of
> attributes with the same names. So we are bound to find each of the
> attributes of the first resultRelInfo in the other leaf partitions
> during attno mapping.

Well, at least for returning lists, we've got to generate the
returning lists so that they all match the column order of the parent,
not the parent's first child. Otherwise, for example, UPDATE
parent_table ... RETURNING * will not work correctly. The tuples
returned by the returning clause have to have the attribute order of
parent_table, not the attribute order of parent_table's first child.
I'm not sure whether WCOs have the same issue, but it's not clear to
me why they wouldn't: they contain a qual which is an expression tree,
and presumably there are Var nodes in there someplace, and if so, then
they have varattnos that have to be right for the purpose for which
they're going to be used.
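
For concreteness, here is the kind of layout I'm worried about (a
hypothetical sketch; as far as I know, ATTACH PARTITION matches
columns by name and type, so a partition's physical column order can
differ from the parent's):

    CREATE TABLE parent_table (a int, b text) PARTITION BY LIST (a);
    CREATE TABLE first_child (b text, a int);  -- attnos reversed
    ALTER TABLE parent_table ATTACH PARTITION first_child
        FOR VALUES IN (1);

    -- RETURNING * must produce tuples in parent_table's order (a, b),
    -- even though first_child stores (b, a); a returning list built
    -- with first_child's attnos would emit the columns in the wrong
    -- order.
    UPDATE parent_table SET b = 'x' WHERE a = 1 RETURNING *;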

>> + for (i = 0; i < num_rels; i++)
>> + {
>> + ResultRelInfo *resultRelInfo = &result_rels[i];
>> + Relation rel = resultRelInfo->ri_RelationDesc;
>> + Bitmapset *expr_attrs = NULL;
>> +
>> + pull_varattnos((Node *) rel->rd_partcheck, 1, &expr_attrs);
>> +
>> + /* Both bitmaps are offset by FirstLowInvalidHeapAttributeNumber. */
>> + if (bms_overlap(expr_attrs, GetUpdatedColumns(resultRelInfo, estate)))
>> + return true;
>> + }
>>
>> This seems like an awfully expensive way of performing this test.
>> Under what circumstances could this be true for some result relations
>> and false for others;
>
> One resultRelInfo can have no partition-key column used in its quals,
> while the next resultRelInfo can have quite different quals that do
> refer to a partition key. This is possible if the two of them have
> different parents with different partition-key columns.

Hmm, true. So if we have a table foo that is partitioned by list (a),
and one of its children is a table bar that is partitioned by list
(b), then we need to consider doing tuple-routing if either column a
is modified, or if column b is modified for a partition that is a
descendant of bar. But detecting that only requires looking at the
partitioned table and those of its children that are themselves
partitioned, not at all of the leaf partitions, as the patch does.
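
To spell out that shape (hypothetical names):

    CREATE TABLE foo (a int, b int) PARTITION BY LIST (a);
    CREATE TABLE bar PARTITION OF foo
        FOR VALUES IN (1) PARTITION BY LIST (b);
    CREATE TABLE bar_1 PARTITION OF bar FOR VALUES IN (10);
    CREATE TABLE baz PARTITION OF foo FOR VALUES IN (2);

    -- UPDATE foo SET b = ... can move rows among bar's children, so b
    -- acts as a routing column for bar's descendants; discovering
    -- that requires walking only foo and bar, not the leaves bar_1
    -- and baz.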

>> - and it should be able to test that by a
>> *single* comparison between the set of columns being updated and the
>> partitioning columns, without needing to repeat it for every
>> partition.
>
> If bms_overlap() returns true for the very first resultRelInfo, it
> will return immediately. But yes, if none of the relations use a
> partition key, we will have to scan all of them. Note, though, that
> these are the leaf partitions left after pruning; they typically
> will not include all the leaf partitions.

But they might, and then this will be inefficient. Just because the
patch doesn't waste many cycles in the case where most partitions are
pruned doesn't mean that it's OK for it to waste cycles when few
partitions are pruned.

>> One problem is that, if BR UPDATE triggers are in fact allowed to
>> cause tuple routing as I proposed above, the presence of a BR UPDATE
>> trigger for any partition could necessitate UPDATE tuple routing for
>> queries that wouldn't otherwise need it.
>
> You mean always set up update tuple routing if there's a BR UPDATE
> trigger?

Yes.

> Actually I was going for disallowing BR UPDATE triggers from
> initiating tuple routing, as I described above.

I know that! But as I said before, that requires evaluating every
partition-key constraint twice per tuple, which seems very expensive.
I'm very doubtful that's a good approach.

>> But even if you end up
>> inserting a test for that case, it can surely be a lot cheaper than
>> this,
>
> I didn't exactly understand why the bms_overlap() test needs to be
> compared with the presence-of-trigger test.

My point was: If you always set up tuple routing when a BR UPDATE
trigger is present, then you don't need to check the partition
constraint twice per tuple.
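
To make the hazard concrete, this is the sort of trigger I have in
mind (hypothetical names, reusing the foo/bar sketch from above; the
trigger can change a even though the statement itself never touches
it):

    CREATE FUNCTION bump_key() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        NEW.a := NEW.a + 1;  -- may send the row to another partition
        RETURN NEW;
    END;
    $$;

    CREATE TRIGGER br_update BEFORE UPDATE ON bar_1
        FOR EACH ROW EXECUTE PROCEDURE bump_key();

    -- Now even "UPDATE foo SET b = b" can modify a via the trigger,
    -- so tuple routing would have to be set up whenever such a
    -- trigger exists.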

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
