Re: ATTACH/DETACH PARTITION CONCURRENTLY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Sergei Kornilov <sk(at)zsrv(dot)org>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date: 2019-02-21 17:08:39
Message-ID: CA+TgmoakZiRz+jcUJ0PzgHXjmbJisZqoT89Xjt8vLEWVw5XPeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 4, 2019 at 12:54 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Feb 4, 2019 at 12:02 AM David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > If the PartitionDesc from the parallel worker has an extra partition
> > than what was there when the plan was built then the partition index
> > to subplan index translation will be incorrect as the
> > find_matching_subplans_recurse() will call get_matching_partitions()
> > using the context with the PartitionDesc containing the additional
> > partition. The return value from get_matching_partitions() is fine,
> > it's just that the code inside the while ((i =
> > bms_next_member(partset, i)) >= 0) loop that will do the wrong thing.
> > It could even crash if partset has an index out of bounds of the
> > subplan_map or subpart_map arrays.
>
> Is there any chance you've missed the fact that in one of the later
> patches in the series I added code to adjust the subplan_map and
> subpart_map arrays to compensate for any extra partitions?

In case that wasn't clear enough, my point here is that while the
leader and workers could end up with different ideas about the shape
of the PartitionDesc, each would end up with a subplan_map and
subpart_map array adapted to the view of the PartitionDesc with which
they ended up, and therefore, I think, everything should work. So far
there is, to my knowledge, no situation in which a PartitionDesc index
gets passed between one backend and another, and as long as we don't
do that, it's not really necessary for them to agree; each backend
needs to individually ignore any concurrently added partitions not
contemplated by the plan, but it doesn't matter whether backend A and
backend B agree on which partitions were concurrently added, just that
each ignores the ones it knows about.

Since time is rolling along here, I went ahead and committed 0001
which seems harmless even if somebody finds a huge problem with some
other part of this. If anybody wants to review the approach or the
code before I proceed further, that would be great, but please speak
up soon.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-02-21 17:11:27 Re: insensitive collations
Previous Message Tom Lane 2019-02-21 17:02:25 Re: Unnecessary checks for new rows by some RI trigger functions?