Re: Parallel copy

From: Andres Freund <andres(at)anarazel(dot)de>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-04-15 17:15:45
Message-ID: 20200415171545.vzz33udntym7bjnb@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-04-15 20:36:39 +0530, Kuntal Ghosh wrote:
> I was thinking from this point of view - the sooner we introduce
> parallelism in the process, the greater the benefits.

I don't really agree. Sure, that's true from a theoretical perspective,
but the incremental gains may be very small, and the cost in complexity
very high. If we can get single threaded splitting of rows to be >4GB/s,
which should very well be attainable, the rest of the COPY work is going
to dominate the time. We shouldn't add complexity to parallelize more
of the line splitting, caring too much about scalable datastructures,
etc when the bottleneck after some straightforward optimization is
usually still in the parallelized part.

I'd expect that for now we'd likely hit scalability issues in other
parts of the system first (e.g. extension locks, buffer mapping).

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-04-15 17:19:07 Re: Poll: are people okay with function/operator table redesign?
Previous Message Steven Pousty 2020-04-15 17:10:59 Re: Poll: are people okay with function/operator table redesign?