From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
---|---|
To: | Alexey Kondratov <kondratov(dot)aleksey(at)gmail(dot)com> |
Cc: | Стас <stas(dot)kelvich(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Subject: | Re: GSOC'17 project introduction: Parallel COPY execution with errors handling |
Date: | 2017-04-06 13:47:46 |
Message-ID: | CAPpHfdvV8FC67Emeb9XJpULkMOtrJiyC0dGL7FMSyRZ2SLk=5Q@mail.gmail.com |
Lists: | pgsql-hackers |
Hi, Alexey!
On Tue, Mar 28, 2017 at 1:54 AM, Alexey Kondratov <
kondratov(dot)aleksey(at)gmail(dot)com> wrote:
> Thank you for your responses and valuable comments!
>
> I have written draft proposal
> https://docs.google.com/document/d/1Y4mc_PCvRTjLsae-_fhevYfepv4sxaqwhOo4rlxvK1c/edit
>
> It seems that COPY currently is able to return first error line and error
> type (extra or missing columns, type parse error, etc).
> Thus, the approach similar to the Stas wrote should work and, being
> optimised for a small number of error rows, should not
> affect COPY performance in such case.
>
> I will be glad to receive any critical remarks and suggestions.
>
I have the following questions about your proposal.
> 1. Suppose we have to insert N records
> 2. We create subtransaction with these N records
> 3. Error is raised on k-th line
> 4. Then, we can safely insert all lines from 1st and till (k - 1)
> 5. Report, save to errors table or silently drop k-th line
> 6. Next, try to insert lines from (k + 1) till N with another
> subtransaction
> 7. Repeat until the end of file
Do you assume that we start a new subtransaction in step 4, since the
subtransaction we started in step 2 has been rolled back?
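For illustration, the quoted steps 1-7 might be sketched as follows. This is only a sketch under stated assumptions, not the proposed implementation: it uses Python with SQLite savepoints as a stand-in for PostgreSQL subtransactions, and the table `t` and the function name `copy_skipping_errors` are hypothetical. In SQLite a failed INSERT rolls back only that statement, so the explicit `ROLLBACK TO` here mimics PostgreSQL, where an error aborts the whole subtransaction. The sketch makes the point of the question visible: rows 1 to (k - 1) have to be inserted again in a new savepoint.

```python
import sqlite3


def copy_skipping_errors(conn, rows):
    """Insert rows into the hypothetical table t, skipping rows that fail.

    Returns a list of (row_index, error_message) for the skipped rows.
    The connection is assumed to be in autocommit mode (isolation_level=None).
    """
    insert_sql = "INSERT INTO t (id, val) VALUES (?, ?)"
    errors = []
    start = 0
    n = len(rows)
    while start < n:
        # Step 2: open a subtransaction for all remaining rows.
        conn.execute("SAVEPOINT batch")
        failed_at = None
        for i in range(start, n):
            try:
                conn.execute(insert_sql, rows[i])
            except sqlite3.IntegrityError as exc:
                # Step 3: an error is raised on the k-th row.
                failed_at = i
                errors.append((i, str(exc)))
                break
        if failed_at is None:
            # No error: keep everything and finish.
            conn.execute("RELEASE batch")
            break
        # Mimic PostgreSQL: the whole subtransaction is lost on error.
        conn.execute("ROLLBACK TO batch")
        conn.execute("RELEASE batch")
        # Step 4: re-insert the rows before k in a new subtransaction.
        conn.execute("SAVEPOINT replay")
        for i in range(start, failed_at):
            conn.execute(insert_sql, rows[i])
        conn.execute("RELEASE replay")
        # Steps 5-7: the k-th row is recorded in errors; continue after it.
        start = failed_at + 1
    return errors
```

With few bad rows the replay cost is small, but every error causes the preceding good rows of that batch to be inserted twice.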
> I am planning to use background worker processes for parallel COPY
> execution. Each process will receive equal piece of the input file. Since
> file is splitted by size not by lines, each worker will start import from
> the first new line to do not hit a broken line.
I think the situation where the backend reads the file directly during COPY
is not typical. The more typical case is the psql \copy command. In that case
"COPY ... FROM stdin;" is actually executed while psql streams the data.
How can we apply parallel COPY in this case?
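For illustration, the quoted file-splitting scheme might look like the sketch below. It is a sketch under stated assumptions, not the proposed implementation: the function name `chunk_ranges` is hypothetical, the input is assumed to be a seekable file opened in binary mode, and each record is assumed to occupy exactly one line (no newlines embedded in quoted fields). The reliance on `seek` is exactly what does not carry over to data streamed through stdin.

```python
import io


def chunk_ranges(f, n_workers):
    """Split a seekable binary file into byte ranges that start on line starts.

    Returns a list of (start, end) offsets, end exclusive. Fewer than
    n_workers ranges are returned if the file has too few lines.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()
    cuts = [0]
    for w in range(1, n_workers):
        pos = size * w // n_workers  # nominal cut point, split by size
        if pos <= cuts[-1]:
            continue
        # Move to the first line start at or after pos: read to the end
        # of the line that contains byte pos - 1.
        f.seek(pos - 1)
        f.readline()
        aligned = f.tell()
        if cuts[-1] < aligned < size:
            cuts.append(aligned)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))
```

Each worker would then read only its own `(start, end)` range, so no worker begins in the middle of a line.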
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company