Re: Parallel copy

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-06-03 18:38:59
Message-ID: 20200603183859.jnozwudjyyb4r7ag@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-06-03 12:13:14 -0400, Robert Haas wrote:
> On Mon, May 18, 2020 at 12:48 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > In the above case, even though we are executing a single command from
> > the user's perspective, the currentCommandId will be four after the
> > command. One increment will be for the copy command and the other
> > three increments are for locking the tuples in the PK table
> > (tab_fk_referenced_chk) for the three tuples in the FK table
> > (tab_fk_referencing_chk). Now, for parallel workers, it is
> > (theoretically) possible that the three tuples are processed by three
> > different workers which don't get synced as of now. The question was
> > do we see any kind of problem with this and if so can we just sync it
> > up at the end of parallelism.

> I strongly disagree with the idea of "just sync(ing) it up at the end
> of parallelism". That seems like a completely unprincipled approach to
> the problem. Either the command counter increment is important or it's
> not. If it's not important, maybe we can arrange to skip it in the
> first place. If it is important, then it's probably not OK for each
> backend to be doing it separately.

That scares me too. These command counter increments definitely aren't
unnecessary in the general case.

Even in the example you share above, aren't we potentially going to
actually lock rows multiple times from within the same transaction,
instead of once? If the command counter increments from within
ri_trigger.c aren't visible to other parallel workers/leader, we'll not
correctly understand that a locked row is invisible to heap_lock_tuple,
because we're not using a new enough snapshot (by dint of not having a
new enough cid).
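
To make the concern concrete, here is a toy model of the scenario, not
PostgreSQL code. It assumes a deliberately simplified visibility rule: a
change stamped with command id c by our own transaction is visible only to
a snapshot whose command id is greater than c. All names (CommandCounter,
Row, lock_row, sees_own_lock, simulate) are hypothetical and exist only for
illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandCounter:
    """Toy stand-in for one backend's notion of the current command id."""
    value: int = 0

    def increment(self) -> None:
        self.value += 1


@dataclass
class Row:
    """Remembers the command id at which our transaction locked the row."""
    locked_at_cid: Optional[int] = None


def lock_row(row: Row, counter: CommandCounter) -> None:
    # Stamp the lock with the current command id, then bump the counter,
    # loosely mirroring "lock the tuple, then increment the command counter".
    row.locked_at_cid = counter.value
    counter.increment()


def sees_own_lock(row: Row, counter: CommandCounter) -> bool:
    # Simplified rule (an assumption of this sketch): a change made at
    # command id c is visible only to a snapshot taken at a later command id.
    snapshot_cid = counter.value
    return row.locked_at_cid is not None and row.locked_at_cid < snapshot_cid


def simulate(shared: bool) -> bool:
    """Worker A locks a row; return whether worker B can tell it is locked."""
    leader = CommandCounter(1)
    # With shared=False each worker gets a private copy of the counter, so
    # increments made by one worker never reach the other.
    worker_a = leader if shared else CommandCounter(leader.value)
    worker_b = leader if shared else CommandCounter(leader.value)
    row = Row()
    lock_row(row, worker_a)
    return sees_own_lock(row, worker_b)
```

In this model simulate(shared=False) reports that worker B cannot see the
lock taken by worker A, so B would go on to lock the row again, while
simulate(shared=True) reports that it can.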

I've not dug through everything that'd potentially cause, but it seems
pretty clearly a no-go from here.

Greetings,

Andres Freund
