Re: GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Alexey Kondratov <kondratov(dot)aleksey(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Anastasia Lubennikova <lubennikovaAV(at)gmail(dot)com>
Subject: Re: GSOC'17 project introduction: Parallel COPY execution with errors handling
Date: 2017-06-16 17:53:52
Message-ID: 2F15DA8D-4FFF-4C2E-8110-F6FDB7DB9C09@gmail.com
Lists: pgsql-hackers


> On 13 Jun 2017, at 01:44, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> I am not going to start with "speculative insertion" right now, but it would
>> be very useful if you could give me a pointer on where to start. Maybe I will
>> at least try to evaluate the complexity of the problem.
>
> Speculative insertion has the following special entry points to
> heapam.c and execIndexing.c, currently only called within
> nodeModifyTable.c
>
> Offhand, it doesn't seem like it would be that hard to teach another
> heap_insert() caller the same tricks.

I went through the nodeModifyTable.c code, and it does not seem too
difficult to do the same inside COPY.
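To make the idea concrete, here is a toy, single-process C model of how one COPY row could go through speculative insertion. The row is first pre-checked for a conflicting key, then inserted as a speculative tuple, then re-checked at "index insertion" time. It is then either finished (row kept) or aborted (our own tuple is killed and the row skipped) instead of raising an error. Every name in it (Tuple, heap, key_conflicts(), copy_row_speculative(), and the concurrent_hook/racing_insert() race simulator) is hypothetical. It does not use the real heapam.c/execIndexing.c entry points, speculative tokens, or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical and are not
 * PostgreSQL's heapam.c / execIndexing.c API. */
typedef enum { TUP_SPECULATIVE, TUP_LIVE, TUP_DEAD } TupState;
typedef struct { int key; TupState state; } Tuple;

#define MAX_TUPLES 64
static Tuple heap[MAX_TUPLES];
static int ntuples = 0;

/* Optional hook simulating another session that inserts the same key
 * between our pre-check and our index insertion. */
static void (*concurrent_hook)(int key) = NULL;

static void racing_insert(int key)
{
    assert(ntuples < MAX_TUPLES);
    heap[ntuples++] = (Tuple){ key, TUP_LIVE };
}

/* True if some non-dead tuple other than `self` already holds `key`. */
static bool key_conflicts(int key, int self)
{
    for (int i = 0; i < ntuples; i++)
        if (i != self && heap[i].state != TUP_DEAD && heap[i].key == key)
            return true;
    return false;
}

/* Returns true if the row was kept, false if it was skipped as a conflict. */
static bool copy_row_speculative(int key)
{
    /* 1. Pre-check: skip cheaply if a conflicting key is already visible. */
    if (key_conflicts(key, -1))
        return false;

    /* 2. Insert the heap tuple, marked as speculative. */
    assert(ntuples < MAX_TUPLES);
    int self = ntuples++;
    heap[self] = (Tuple){ key, TUP_SPECULATIVE };

    if (concurrent_hook != NULL)
        concurrent_hook(key);

    /* 3. Index insertion: re-check for a conflict that raced in meanwhile. */
    if (key_conflicts(key, self)) {
        /* 4a. Abort: kill our own tuple instead of raising an error. */
        heap[self].state = TUP_DEAD;
        return false;
    }

    /* 4b. Finish: the speculative tuple becomes an ordinary live tuple. */
    heap[self].state = TUP_LIVE;
    return true;
}
```

Calling copy_row_speculative() for keys 1, 2, 2, 3 keeps three rows and skips the duplicate. Setting concurrent_hook = racing_insert before inserting key 5 exercises the abort path. The appeal of this design is that a unique-key conflict costs a dead tuple rather than a subtransaction. Other kinds of errors are not covered by this mechanism.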

> My sense is that it's going to be hard to sell a committer on any
> design that consumes subtransactions in a way that's not fairly
> obvious to the user, and doesn't have a pretty easily understood worst
> case.

Yes, and the worst case will probably be quite frequent, since heap_multi_insert cannot be used if BEFORE/INSTEAD OF triggers or partitioning exist (according to the current copy.c code). COPY will therefore frequently fall back to single-row heap_insert calls, and wrapping each of them in a subtransaction would consume XIDs too greedily and seriously affect performance. I like my previous idea less and less.
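The arithmetic behind the XID concern can be sketched as follows. It assumes that one subtransaction wraps each insert call, and that every such subtransaction writes and therefore gets its own XID. TargetInfo, can_use_multi_insert() and subxact_xids() are hypothetical. The condition only mirrors the two cases named above; the actual check in copy.c may involve more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the COPY target; not copy.c's real structures. */
typedef struct {
    bool has_before_row_insert_trigger;
    bool has_instead_of_row_insert_trigger;
    bool is_partitioned;
} TargetInfo;

/* Batching is ruled out by BEFORE/INSTEAD OF row triggers or partitioning. */
static bool can_use_multi_insert(const TargetInfo *t)
{
    return !(t->has_before_row_insert_trigger ||
             t->has_instead_of_row_insert_trigger ||
             t->is_partitioned);
}

/* XIDs consumed by subtransactions when one subtransaction wraps each
 * insert call: one per batch if batching is possible, else one per row. */
static long subxact_xids(long nrows, const TargetInfo *t, long batch_size)
{
    assert(batch_size > 0);
    long rows_per_call = can_use_multi_insert(t) ? batch_size : 1;
    return (nrows + rows_per_call - 1) / rows_per_call;  /* ceiling division */
}
```

With batching, 1,000,000 rows in batches of 1000 cost 1000 subtransaction XIDs. With a single BEFORE row trigger, the same load costs 1,000,000.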

> I haven't thought about this very carefully, but I guess you could do
> something like passing a flag to ExecConstraints() that indicates
> "don't throw an error; instead, just return false so I know not to
> proceed"

Currently ExecConstraints() always throws an error, and I do not think it would be wise for me to modify its behaviour.
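For reference, the flag Peter describes would change the control flow roughly as in this toy C sketch. The row-checking function reports failure instead of aborting, so the caller can skip the row. Row, check_constraints() and copy_rows() are hypothetical stand-ins, not the real ExecConstraints() signature, and the existing "throw" path is modelled with exit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical row whose table has a CHECK (qty >= 0) constraint. */
typedef struct { int id; int qty; } Row;

/* With error_on_fail set it behaves like today (the error ends processing,
 * modelled here with exit()); otherwise it only reports pass/fail. */
static bool check_constraints(const Row *row, bool error_on_fail)
{
    if (row->qty >= 0)
        return true;
    if (error_on_fail) {
        fprintf(stderr, "ERROR: row %d violates check constraint\n", row->id);
        exit(EXIT_FAILURE);
    }
    return false;
}

/* Copies the rows that pass the check into `out`; returns how many. */
static int copy_rows(const Row *rows, int n, Row *out)
{
    assert(n >= 0);
    int kept = 0;
    for (int i = 0; i < n; i++) {
        if (!check_constraints(&rows[i], false)) {
            fprintf(stderr, "WARNING: skipping row %d\n", rows[i].id);
            continue;
        }
        out[kept++] = rows[i];
    }
    return kept;
}
```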

I have updated my patch (rebased onto the current master, commit 94da2a6a9a05776953524424a3d8079e54bc5d94). Please find the patch file attached, or the always up-to-date version on GitHub: https://github.com/ololobus/postgres/pull/1/files

Currently, it catches all major errors in the input data:

1) Rows with missing or extra columns cause WARNINGs and are skipped.

2) I found that input type format errors are thrown from InputFunctionCall(), so I wrapped it in PG_TRY/PG_CATCH. I am not 100%
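The two cases above can be illustrated with a toy, standalone C loader. Rows with the wrong column count are skipped with a warning before any conversion. Conversion errors are caught with setjmp()/longjmp(), which is roughly the mechanism underneath PG_TRY/PG_CATCH (those macros are built on sigsetjmp()). All names here (raise_error(), int_in(), load_rows()) are hypothetical. The sketch also ignores the memory and resource cleanup that real backend code must do after catching an error.

```c
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NCOLS 2

/* Toy recovery point; NOT PostgreSQL's PG_TRY/PG_CATCH macros. */
static jmp_buf *error_jmp = NULL;

static void raise_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    if (error_jmp != NULL)
        longjmp(*error_jmp, 1);
    abort();  /* no handler installed: give up, like an uncaught error */
}

/* Hypothetical input function for an integer column; errors "throw". */
static long int_in(const char *s)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        raise_error("invalid input syntax for integer");
    return v;
}

/* Loads comma-separated lines into out[][NCOLS]; returns rows kept. */
static int load_rows(const char *const *lines, int nlines, long out[][NCOLS])
{
    int kept = 0;
    for (int i = 0; i < nlines; i++) {
        char buf[256];
        char *fields[NCOLS];
        int nfields = 0;

        assert(strlen(lines[i]) < sizeof buf);
        strcpy(buf, lines[i]);

        /* Split on commas, counting every field but storing only NCOLS. */
        for (char *p = buf;;) {
            char *comma = strchr(p, ',');
            if (nfields < NCOLS)
                fields[nfields] = p;
            nfields++;
            if (comma == NULL)
                break;
            *comma = '\0';
            p = comma + 1;
        }

        /* 1) Missing or extra columns: warn and skip the row. */
        if (nfields != NCOLS) {
            fprintf(stderr, "WARNING: line %d: wrong number of columns\n", i + 1);
            continue;
        }

        /* 2) Input conversion errors: catch them and skip the row. */
        jmp_buf env;
        error_jmp = &env;
        if (setjmp(env) == 0) {
            long a = int_in(fields[0]);
            long b = int_in(fields[1]);
            out[kept][0] = a;
            out[kept][1] = b;
            kept++;
        } else {
            fprintf(stderr, "WARNING: line %d: skipped after input error\n", i + 1);
        }
        error_jmp = NULL;
    }
    return kept;
}
```

For the input lines "1,2", "3", "4,5,6", "7,x" and "8,9", load_rows() keeps the first and last rows and skips the other three.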

Alexey
