From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY |
Date: | 2020-08-26 23:40:07 |
Message-ID: | 20200826234007.GA4584@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2020-Aug-04, Robert Haas wrote:
> On Mon, Aug 3, 2020 at 7:49 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> > Why two transactions? The reason is that in order for this to work, we
> > make a catalog change (mark it detached), and commit so that all
> > concurrent transactions can see the change. A second transaction waits
> > for anybody who holds any lock on the partitioned table and grabs Access
> > Exclusive on the partition (which now no one cares about, if they're
> > looking at the partitioned table), where the DDL action on the partition
> > can be completed.
>
> Is there a more detailed theory of operation of this patch somewhere?
> What exactly do you mean by marking it detached? Committing the change
> makes it possible for all concurrent transactions to see the change,
> but does not guarantee that they will. If you can't guarantee that,
> then I'm not sure how you can guarantee that they will behave sanely.
Sorry for the long delay. I haven't written up the theory of operation.
I suppose it is complicated enough that it should be documented
somewhere.
To mark it detached means to set pg_inherits.inhdetached=true. That
column name is a bit of a misnomer, since that column really means "is
in the process of being detached"; the pg_inherits row goes away
entirely once the detach process completes. This mark guarantees that
everyone will see that row because the detaching session waits long
enough after committing the first transaction and before deleting the
pg_inherits row, until everyone else has moved on.
The other point is that the partition directory code can be asked to
include detached partitions, or not to. The executor does the former,
and the planner does the latter. This allows a transient period during
which the partition descriptor returned to planner and executor is
different; this makes the situation equivalent to what would have
happened if the partition was attached during the operation: in executor
we would detect that there is an additional partition that was not seen
by the planner, and we already know how to deal with that situation by
your handling of the ATTACH code.
> Even if you can, it's not clear what the sane behavior is: what
> happens when a tuple gets routed to an ex-partition? What happens when
> an ex-partition needs to be scanned?
During the transient period, any transaction that obtained a partition
descriptor before the inhdetached mark is committed should be able to
get the tuple routing done successfully, but transactions using later
snapshots should get their insertions rejected. Reads should behave in
the same way.
> What prevents problems if a partition is detached, possibly modified,
> and then reattached, possibly with different partition bounds?
This should not be a problem, because the partition needs to be fully
detached before it can be attached again. And if the partition bounds
are different, that won't be a problem, because the previous partition
bounds won't be present in the pg_class row. Of course, the new
partition bounds will be checked to the existing contents.
There is one fly in the ointment though, which is that if you cancel the
wait and then immediately do the ALTER TABLE DETACH FINALIZE without
waiting for as long as the original execution would have waited, you
might end up killing the partition ahead of time. One solution to this
would be to cause the FINALIZE action to wait again at start. This
would cause it to take even longer, but it would be safer. (If you
don't want it to take longer, you can just not cancel it in the first
place.) This is not a problem if the server crashes in between (which
is the scenario I had in mind when doing the FINALIZE thing), because of
course no transaction can continue to run across a crash.
I'm going to see if I can get the new delay_execution module to help
test this, to verify whether my claims are true.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-08-27 00:05:04 | Re: How is bushy plans generated in join_search_one_lev |
Previous Message | Tom Lane | 2020-08-26 23:27:31 | Re: Issue with past commit: Allow fractional input values for integer GUCs ... |