From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Ants Aasma <ants(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel copy |
Date: | 2020-04-09 10:50:49 |
Message-ID: | CAA4eK1JO7A5wywsp8v-F=7zhcnOAYUgQAjBHvt_ZM4recjz_Vw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Apr 9, 2020 at 1:00 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma <ants(at)cybertec(dot)at> wrote:
> >
> > With option 1 it's not possible to read input data into shared memory
> > and there needs to be an extra memcpy in the time critical sequential
> > flow of the leader. With option 2 data could be read directly into the
> > shared memory buffer. With future async io support, reading and
> > looking for tuple boundaries could be performed concurrently.
>
> But option 2 still seems significantly worse than your proposal above, right?
>
> I really think we don't want a single worker in charge of finding
> tuple boundaries for everybody. That adds a lot of unnecessary
> inter-process communication and synchronization. Each process should
> just get the next tuple starting after where the last one ended, and
> then advance the end pointer so that the next process can do the same
> thing. Vignesh's proposal involves having a leader process that has to
> switch roles - he picks an arbitrary 25% threshold - and if it doesn't
> switch roles at the right time, performance will be impacted. If the
> leader doesn't get scheduled in time to refill the queue before it
> runs completely empty, workers will have to wait. Ants's scheme avoids
> that risk: whoever needs the next tuple reads the next line. There's
> no need to ever wait for the leader because there is no leader.
>
Hmm, I think in his scheme also there is a single reader process. See
the email above [1] where he described how it should work. I think
the difference is in the division of work. AFAIU, in Ants scheme, the
worker needs to pick the work from tuple_offset queue whereas in
Vignesh's scheme it will be based on the size (each worker will get
probably 64KB of work). I think in his scheme the main thing to find
out is how many tuple offsets to be assigned to each worker in one-go
so that we don't unnecessarily add contention for finding the work
unit. I think we need to find the right balance between size and
number of tuples. I am trying to consider size here because larger
sized tuples will probably require more time as we need to allocate
more space for them and also probably requires more processing time.
One way to achieve that could be each worker will try to claim 500
tuples (or some other threshold number) but if their size is greater
than 64K (or some other threshold size) then the worker will try with
lesser number of tuples (such that the size of the chunk of tuples is
less than a threshold size.).
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2020-04-09 10:52:05 | Re: Parallel copy |
Previous Message | Jehan-Guillaume de Rorthais | 2020-04-09 09:35:12 | Re: [BUG] non archived WAL removed during production crash recovery |