Re: Parallel copy

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-04-09 10:52:05
Message-ID: CAA4eK1LNq_juinpTUhsLMwLQiz5Q0mv6=MebiMtP0TZ3ijVyyg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 9, 2020 at 4:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Apr 9, 2020 at 1:00 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma <ants(at)cybertec(dot)at> wrote:
> > >
> > > With option 1 it's not possible to read input data into shared memory
> > > and there needs to be an extra memcpy in the time critical sequential
> > > flow of the leader. With option 2 data could be read directly into the
> > > shared memory buffer. With future async io support, reading and
> > > looking for tuple boundaries could be performed concurrently.
> >
> > But option 2 still seems significantly worse than your proposal above, right?
> >
> > I really think we don't want a single worker in charge of finding
> > tuple boundaries for everybody. That adds a lot of unnecessary
> > inter-process communication and synchronization. Each process should
> > just get the next tuple starting after where the last one ended, and
> > then advance the end pointer so that the next process can do the same
> > thing. Vignesh's proposal involves having a leader process that has to
> > switch roles - he picks an arbitrary 25% threshold - and if it doesn't
> > switch roles at the right time, performance will be impacted. If the
> > leader doesn't get scheduled in time to refill the queue before it
> > runs completely empty, workers will have to wait. Ants's scheme avoids
> > that risk: whoever needs the next tuple reads the next line. There's
> > no need to ever wait for the leader because there is no leader.
> >
>
> Hmm, I think in his scheme also there is a single reader process. See
> the email above [1] where he described how it should work.
>

Oops, I forgot to include the link to the email. See:
https://www.postgresql.org/message-id/CANwKhkO87A8gApobOz_o6c9P5auuEG1W2iCz0D5CfOeGgAnk3g%40mail.gmail.com
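
For readers following the thread, here is a minimal sketch (not taken from
any posted patch) of the leaderless approach Robert describes in the quoted
text: whoever needs the next tuple finds the end of the next line and
advances a shared offset so the next process can do the same. It is written
in Python rather than C for brevity. The names SharedInput, claim_next_line
and parallel_parse are hypothetical, threads and a lock stand in for backend
processes and shared-memory synchronization, and it assumes a newline always
ends a tuple, which ignores quoted newlines in CSV input.

```python
import threading


class SharedInput:
    """Hypothetical shared buffer plus a single 'next unclaimed byte' offset."""

    def __init__(self, data: bytes):
        self.data = data
        self.next_offset = 0          # where the next unclaimed line starts
        self.lock = threading.Lock()  # protects next_offset

    def claim_next_line(self):
        """Claim one line and advance the shared offset; None when exhausted."""
        with self.lock:
            start = self.next_offset
            if start >= len(self.data):
                return None
            end = self.data.find(b"\n", start)
            if end == -1:             # last line has no trailing newline
                end = len(self.data)
            self.next_offset = end + 1
        # Parsing happens outside the lock, so only boundary finding is serialized.
        return start, self.data[start:end]


def worker(shared, results):
    # Each worker pulls lines itself; nobody waits for a leader to fill a queue.
    while (claimed := shared.claim_next_line()) is not None:
        start, line = claimed
        results.append((start, line.decode().split(",")))


def parallel_parse(data: bytes, nworkers: int = 4):
    shared = SharedInput(data)
    results = []
    threads = [
        threading.Thread(target=worker, args=(shared, results))
        for _ in range(nworkers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Start offsets are unique, so sorting restores input order.
    return [fields for _, fields in sorted(results)]
```

Whether Ants's scheme actually works this way, or still relies on a single
reader process, is exactly the question raised above; his email linked above
is the authoritative description.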

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
