Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-01-18 05:15:36
Message-ID: CAJcOf-c30g_L=H2EkpwqhZ0w3fxHdgTVfSAfP1fhnOqbZqb-Wg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Here is an additional review of
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
> quite a few comments raised on the V11-0001* patch. I suggest first
> post a revised version of V11-0001* patch addressing those comments
> and then you can separately post a revised version of
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
>

1)

>Here, it seems we are accessing the relation descriptor without any
>lock on the table which is dangerous considering the table can be
>modified in a parallel session. Is there a reason why you think this
>is safe? Did you check anywhere else such a coding pattern?

Yes, there's a very good reason, and I certainly checked for the same
coding pattern elsewhere rather than just deciding that locking can be
ignored.
The table has ALREADY been locked (by the caller) during the
parse/analyze phase.
(This is not the case for a partition, for which the patch code takes
AccessShareLock, as you will see.)
And BTW, with asserts enabled, an attempt to table_open() with NoLock
when you haven't already locked the table will fire an assert - see
the following code in relation_open():

    /*
     * If we didn't get the lock ourselves, assert that caller holds one,
     * except in bootstrap mode where no locks are used.
     */
    Assert(lockmode != NoLock ||
           IsBootstrapProcessingMode() ||
           CheckRelationLockedByMe(r, AccessShareLock, true));

2)

>+ /*
>+ * If there are any index expressions, check that they are parallel-mode
>+ * safe.
>+ */
>+ max_hazard = index_expr_max_parallel_hazard_for_modify(rel, context);
>+ if (max_parallel_hazard_test(max_hazard, context))
>+ {
>+ table_close(rel, lockmode);
>+ return context->max_hazard;
>+ }

>Here and at all other similar places, the call to
>max_parallel_hazard_test seems redundant because
>index_expr_max_parallel_hazard_for_modify would have already done
>that. Why can't we just return true/false from
>index_expr_max_parallel_hazard_for_modify? The context would have been
>already updated for max_hazard.

Yes, you're right, it's redundant to call max_parallel_hazard_test()
again. The max_hazard can always be retrieved from the context if
needed (rather than explicitly returned), so I'll change this function
(and any similar cases) to just return true if the max_hazard of
interest has been reached.

3)

>I don't like this way of checking parallel_hazard for modify. This not
>only duplicates some code in max_parallel_hazard_for_modify from
>max_parallel_hazard but also appears quite awkward. Can we move
>max_parallel_hazard_for_modify inside max_parallel_hazard? Basically,
>after calling max_parallel_hazard_walker, we can check for modify
>statement and call the new function.

Agree, I'll move it, as you suggest.

4)

>domain_max_parallel_hazard_for_modify()
>{
>..
>+ if (isnull)
>+ {
>+ /*
>+ * This shouldn't ever happen, but if it does, log a WARNING
>+ * and return UNSAFE, rather than erroring out.
>+ */
>+ elog(WARNING, "null conbin for constraint %u", con->oid);
>+ context->max_hazard = PROPARALLEL_UNSAFE;
>+ break;
>+ }
>..
>}
>index_expr_max_parallel_hazard_for_modify()
>{
>..
>+ if (index_expr_item == NULL) /* shouldn't happen */
>+ {
>+ index_close(index_rel, lockmode);
>+ context->max_hazard = PROPARALLEL_UNSAFE;
>+ return context->max_hazard;
>+ }
>..
>}

>It is not clear why the above two are shouldn't happen cases and if so
>why we want to treat them as unsafe. Ideally, there should be an
>Assert if these can't happen but it is difficult to decide without
>knowing why you have considered them unsafe?

The checks being done here for "should never happen" cases are THE
SAME as in other parts of the PostgreSQL code.
For example, search the PostgreSQL code for "null conbin" and you'll
find 6 other places that actually ERROR out if conbin (the binary
representation of the constraint) in a pg_constraint tuple is found to
be null.
The cases that you point out in the patch used to error out in the
same way, but Vignesh suggested changing them to just return
parallel-unsafe instead of erroring out, which I agree with. Such
cases could surely only happen if the DB had been corrupted, so they
are extremely rare IMHO and most likely to have caused an ERROR
elsewhere before ever reaching here...
I can add some Asserts to the current code, to better alert for such
cases when asserts are enabled, but otherwise I think that the current
code is OK (cleaning up other PostgreSQL code is beyond the scope of
the task here).

Regards,
Greg Nancarrow
Fujitsu Australia
