Re: ATTACH/DETACH PARTITION CONCURRENTLY

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date: 2018-11-08 04:01:21
Message-ID: 2db83a46-22cb-9a8c-3b45-e31958c4ac1e@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018/11/08 11:01, Robert Haas wrote:
> On Wed, Nov 7, 2018 at 7:06 PM David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> While the find_all_inheritors() call is something I'd like to see
>> gone, I assume it was done that way since an UPDATE might route a
>> tuple to a partition that there is no subplan for and due to INSERT
>> with VALUES not having any RangeTblEntry for any of the partitions.
>> Simply, any partition which is a descendant of the target partition
>> table could receive the tuple regardless of what might have been
>> pruned.
>
> Thanks. I had figured out since my email of earlier today that it was
> needed in the INSERT case, but I had not thought of/discovered the
> case of an UPDATE that routes a tuple to a pruned partition. I think
> that latter case may not be tested in our regression tests, which is
> perhaps something we ought to change.
>
> Honestly, I *think* that the reason that find_all_inheritors() call is
> there is because I had the idea that it was important to try to lock
> partition hierarchies in the same order in all cases so as to avoid
> spurious deadlocks. However, I don't think we're really achieving
> that goal despite this code. If we arrive at this point having
> already locked some relations, and then lock some more, based on
> whatever got pruned, we're clearly not using a deterministic locking
> order. So I think we could probably rip out the find_all_inheritors()
> call here and change the NoLock in get_partition_dispatch_recurse() to
> just take a lock. That's probably a worthwhile simplification and a
> slight optimization regardless of anything else.

A patch that David and I have been working on over at:

https://commitfest.postgresql.org/20/1690/

does that. With that patch, partitions (leaf or not) are locked and
opened only if a tuple is routed to them. In edd44738bc (Be lazier about
partition tuple routing), we postponed the opening of leaf partitions, but
we still left the RelationGetPartitionDispatchInfo machine which
recursively creates PartitionDispatch structs for all partitioned tables
in a tree. The patch mentioned above postpones even the partitioned
partition initialization to a point after a tuple is routed to it.

The patch doesn't yet eliminate the find_all_inheritors call from
ExecSetupPartitionTupleRouting. But that's mostly because of the fear
that if we start becoming lazier about locking individual partitions too,
we'll get non-deterministic locking order on partitions that we might want
to avoid for deadlock fears. Maybe, we don't need to be fearful though.

> But I really think it would be better if we could also jigger this to
> avoid reopening relations which the executor has already opened and
> locked elsewhere. Unfortunately, I don't see a really simple way to
> accomplish that. We get the OIDs of the descendents and want to know
> whether there is range table entry for that OID; but there's no data
> structure which answers that question at present, I believe, and
> introducing one just for this purpose seems like an awful lot of new
> machinery. Perhaps that new machinery would still have less
> far-reaching consequences than the machinery Alvaro is proposing, but,
> still, it's not very appealing.

The newly added ExecGetRangeTableRelation opens (if not already done) and
returns a Relation pointer for tables that are present in the range table,
so requires to be passed a valid RT index. That works for tables that the
planner touched. UPDATE tuple routing benefits from that in cases where
the routing target is already in the range table.

For insert itself, planner adds only the target partitioned table to the
range table. Partitions that the inserted tuples will route to may be
present in the range table via some other plan node, but the insert's
execution state won't know about them, so it cannot use
EcecGetRangeTableRelation.

> Perhaps one idea is only open and lock partitions on demand - i.e. if
> a tuple actually gets routed to them. There are good reasons to do
> that independently of reducing lock levels, and we certainly couldn't
> do it without having some efficient way to check whether it had
> already been done. So then the mechanism wouldn't feel like so much
> like a special-purpose hack just for concurrent ATTACH/DETACH. (Was
> Amit Langote already working on this, or was that some other kind of
> on-demand locking?)

I think the patch mentioned above gets us closer to that goal.

Thanks,
Amit

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-11-08 04:31:28 Re: move PartitionBoundInfo creation code
Previous Message Amit Langote 2018-11-08 03:59:27 Re: move PartitionBoundInfo creation code