Re: Parallel copy

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-02-24 01:09:51
Message-ID: 20200224010951.bxecdyaduyjktg6q@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-02-19 11:38:45 +0100, Tomas Vondra wrote:
> I generally agree with the impression that parsing CSV is tricky and
> unlikely to benefit from parallelism in general. There may be cases with
> restrictions making it easier (e.g. restrictions on the format) but that
> might be a bit too complex to start with.
>
> For example, I had an idea to parallelise the planning by splitting it
> into two phases:

FWIW, I think we ought to rewrite our COPY parsers before we go for
complex schemes. They're way slower than a decent green-field
CSV/... parser.

> The one piece of information I'm missing here is at least a very rough
> quantification of the individual steps of CSV processing - for example
> if parsing takes only 10% of the time, it's pretty pointless to start by
> parallelising this part and we should focus on the rest. If it's 50% it
> might be a different story. Has anyone done any measurements?

Not recently, but I'm pretty sure that I've observed CSV parsing to be
way more than 10%.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-02-24 01:31:09 Re: Error on failed COMMIT
Previous Message Robert Haas 2020-02-24 00:48:54 Re: Parallel copy