From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-05-11 12:12:42
Message-ID: CAA4eK1+ANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA@mail.gmail.com
Lists: pgsql-hackers
On Wed, Apr 15, 2020 at 11:49 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> To be clear, I was only thinking of using a ringbuffer to indicate split
> boundaries. And that workers would just pop entries from it before they
> actually process the data (stored outside of the ringbuffer). Since the
> split boundaries will always be read in order by workers, and the
> entries will be tiny, there's no need to avoid copying out entries.
>
I think binary mode processing will be slightly different because,
unlike the text and csv formats, the data is stored as (Length, Value)
pairs for each column and there are no line markers. I don't think the
difference will be big, but we still need to record somewhere what
format the data in the ring buffer is in. Basically, we can copy the
data in (Length, Value) form, and once the writers know the format
they can parse it appropriately. We currently also parse the binary
format differently, see NextCopyFrom. I think we need to be careful
to avoid duplicating that work as much as possible.
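To make that a bit more concrete, here is a minimal sketch of the kind
of per-chunk header the ring buffer could carry so that workers know
how to parse the payload; the type and field names are made up for
illustration and are not from any existing patch:

/* Hypothetical header stored with each entry in the shared ring buffer. */
#include "postgres.h"

typedef enum ParallelCopyDataFormat
{
	PCOPY_DATA_TEXT,			/* newline-terminated lines, text format */
	PCOPY_DATA_CSV,				/* newline-terminated lines, csv format */
	PCOPY_DATA_BINARY			/* per-column (Length, Value) pairs */
} ParallelCopyDataFormat;

typedef struct ParallelCopyChunkHeader
{
	ParallelCopyDataFormat format;	/* how a worker should parse the bytes */
	uint32		start_offset;	/* offset of the payload in the shared data area */
	uint32		chunk_size;		/* number of payload bytes */
} ParallelCopyChunkHeader;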
Apart from this, we have analyzed the other cases mentioned below
where we need to decide whether we can allow parallelism for the COPY
command.
Case-1:
Do we want to enable parallelism for a copy when transition tables are
involved? Basically, during the copy we capture tuples in transition
tables for certain cases, such as when an AFTER STATEMENT trigger
accesses the same relation on which we have the trigger. See the
example below [1]. We decide this in the function
MakeTransitionCaptureState. For such cases, we collect minimal tuples
in a tuple store after processing them so that AFTER STATEMENT
triggers can access them later. Now, if we want to enable parallelism
for such cases, we would instead need to store and access the tuples
via the shared tuple store (sharedtuplestore.c/sharedtuplestore.h).
However, it has no facility to keep tuples in memory, so we would
always have to store them in and read them from a file, which could be
costly, unless we also add a way to keep minimal tuples in shared
memory up to work_mem and only then spill to the shared tuple store.
It is possible to do all or part of this work to enable parallel copy
for such cases, but I am not sure it is worth it. We could decide not
to enable parallelism for such cases and allow it later if we see
demand; that would also keep additional work and complexity out of the
first version of the patch.
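For reference, a rough sketch of how a worker could append captured
transition tuples to a shared tuple store and how the leader could
read them back before firing the AFTER STATEMENT trigger is below. It
uses the existing sts_* API from sharedtuplestore.h, but the helper
names are made up and the setup (sts_initialize/sts_attach, ending
writes, and the trigger invocation itself) is omitted; as noted above,
this path always goes through files today.

#include "postgres.h"
#include "access/htup_details.h"
#include "storage/sharedfileset.h"
#include "utils/sharedtuplestore.h"

/* In a worker: append one captured transition tuple. */
static void
worker_capture_transition_tuple(SharedTuplestoreAccessor *accessor,
								MinimalTuple tuple)
{
	/* No per-tuple metadata is needed for this sketch. */
	sts_puttuple(accessor, NULL, tuple);
}

/* In the leader, after all participants have called sts_end_write(). */
static void
leader_replay_transition_tuples(SharedTuplestoreAccessor *accessor)
{
	MinimalTuple tuple;

	sts_begin_parallel_scan(accessor);
	while ((tuple = sts_parallel_scan_next(accessor, NULL)) != NULL)
	{
		/* ... hand the tuple to the AFTER STATEMENT trigger machinery ... */
	}
	sts_end_parallel_scan(accessor);
}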
Case-2:
The single-insertion mode (CIM_SINGLE) is used in various scenarios,
and whether we can allow parallelism for those needs to be decided on
a case-by-case basis, as discussed below:
a. When there are BEFORE/INSTEAD OF triggers on the table. We don't
allow multi-inserts in such cases because such triggers might query
the table we're inserting into and act differently if the tuples that
have already been processed and prepared for insertion are not there.
Now, if we allow parallelism with such triggers, the behavior would
depend on whether a parallel worker has already inserted a particular
row or not. I guess such trigger functions should ideally be marked
parallel-unsafe. So, in short, whether to allow parallelism in this
case depends on the parallel-safety marking of the trigger function (a
rough sketch of such a check is shown after this list).
b. For partitioned tables, we can't support multi-inserts when there
are any statement-level insert triggers. This is because, as of now,
we expect that any before-row insert and statement-level insert
triggers are on the same relation. Now, there is no harm in allowing
parallelism for such cases, but it depends on whether we have the
infrastructure (basically, allowing tuples to be collected in a shared
tuple store) to support statement-level insert triggers.
c. For inserts into foreign tables. We can't allow parallelism in
this case because each worker would need to establish its own FDW
connection and operate in a separate transaction. Unless we have the
capability to provide a two-phase commit protocol for "Transactions
involving multiple postgres foreign servers" (which is being discussed
in a separate thread [2]), we can't allow this.
d. When there are volatile default expressions or the WHERE clause
contains a volatile expression. Here, we can allow parallelism if the
expressions are parallel-safe.
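As a rough illustration of the check mentioned in (a), and under the
assumption that we would simply look at the pg_proc parallel-safety
marking of each trigger function on the target relation, something
like the following could be used (the helper itself is hypothetical):

#include "postgres.h"
#include "catalog/pg_proc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"

/*
 * Return true only if every trigger function on the target relation
 * is marked parallel safe.  Hypothetical helper, not part of any
 * existing patch.
 */
static bool
copy_trigger_functions_parallel_safe(Relation rel)
{
	TriggerDesc *trigdesc = rel->trigdesc;

	if (trigdesc == NULL)
		return true;

	for (int i = 0; i < trigdesc->numtriggers; i++)
	{
		Trigger    *trigger = &trigdesc->triggers[i];

		if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
			return false;
	}

	return true;
}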
Case-3:
In the COPY command, for performing foreign key checks, we take a KEY
SHARE lock on the primary key table rows, which in turn increments the
command counter and updates the snapshot. Now, as we share the
snapshot at the beginning of the command, we can't allow it to be
changed later. So, unless we do something special for it, I think we
can't allow parallelism in such cases.
I couldn't think of many problems if we allow parallelism in such
cases. One inconsistency, if we allow FK checks via workers, would be
that at the end of COPY the value of the command counter will not be
what we expect, as we wouldn't have accounted for the increments done
in the workers. Then, if COPY is being run inside a transaction, it
would not assign the correct values to the subsequent commands. Also,
for executing deferred triggers we use the transaction snapshot, so if
anything changes the snapshot in a parallel worker, ideally the
changed snapshot should have been synced back from the worker.
Another concern could be that different workers may try to acquire a
KEY SHARE lock on the same tuples, which they will be able to do due
to group locking or otherwise, but I don't see any problem with that.
I am not sure if the above leads to any user-visible problem, but I
might be missing something here. I think if we can identify any real
problems, we can try to design a better solution to address them.
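Just to illustrate the command-counter part of this, one entirely
hypothetical way the leader could re-synchronize at the end of COPY
would be to have each worker report (say, via a counter in shared
memory) how many times it incremented the command counter, and then
catch up; no such counter exists today, and this ignores the snapshot
side of the problem:

#include "postgres.h"
#include "access/xact.h"

/*
 * Hypothetical catch-up step run by the leader after all workers have
 * finished.  'worker_cc_increments' is assumed to be the total of
 * per-worker counters kept in shared memory.
 */
static void
leader_catch_up_command_counter(uint64 worker_cc_increments)
{
	for (uint64 i = 0; i < worker_cc_increments; i++)
		CommandCounterIncrement();
}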
Case-4:
For deferred triggers, it seems we record the CTIDs of tuples (via
ExecARInsertTriggers->AfterTriggerSaveEvent) and then execute the
deferred triggers at transaction end using AfterTriggerFireDeferred,
or at the end of the statement. The challenge in allowing parallelism
for such cases is that we need to capture the CTID events in shared
memory. For that, we either need to invent new infrastructure for
capturing events in shared memory, which would be a huge task on its
own, or get the CTIDs back via shared memory and have the leader add
them to the event queues; in the latter case, we need to ensure the
CTIDs are queued in the same order in which the tuples were processed.
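To make the second idea slightly more concrete, a worker could push
each newly inserted tuple's CTID into a shared memory queue (for
example a shm_mq) for the leader to drain and turn into after-trigger
events; the ordering requirement mentioned above still has to be
solved separately. A minimal, hypothetical worker-side sketch, where
'mqh' is assumed to have been set up during worker startup:

#include "postgres.h"
#include "storage/itemptr.h"
#include "storage/shm_mq.h"

/*
 * Hypothetical: send the CTID of a freshly inserted tuple to the
 * leader, which would later add it to the after-trigger event queue.
 */
static void
worker_report_ctid(shm_mq_handle *mqh, ItemPointer ctid)
{
	shm_mq_result res;

	res = shm_mq_send(mqh, sizeof(ItemPointerData), ctid, false);
	if (res != SHM_MQ_SUCCESS)
		elog(ERROR, "could not send CTID to leader");
}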
[1] -
create or replace function dump_insert() returns trigger language plpgsql as
$$
begin
raise notice 'trigger = %, new table = %',
TG_NAME,
(select string_agg(new_table::text, ', ' order by a)
from new_table);
return null;
end;
$$;
create table test (a int);
create trigger trg1_test after insert on test referencing new table
as new_table for each statement execute procedure dump_insert();
copy test (a) from stdin;
1
2
3
\.
[2] - https://www.postgresql.org/message-id/20191206.173215.1818665441859410805.horikyota.ntt%40gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com