From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel copy |
Date: | 2020-04-10 18:26:05 |
Message-ID: | 20200410182605.fc7nd2n5fv4ubedm@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2020-04-10 07:40:06 -0400, Robert Haas wrote:
> On Thu, Apr 9, 2020 at 4:00 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > IMO, yes, there should be only one process doing the chunking. That is for ILP and cache efficiency, but also because the leader is the only process with access to the network socket. It should load input data into one large buffer that's shared across processes. There should be a separate ring buffer with tuple/partial tuple (for huge tuples) offsets. Worker processes should grab large chunks of offsets from the offset ring buffer. If the ring buffer is not full, the worker chunks should be reduced in size.
>
> My concern here is that it's going to be hard to avoid processes going
> idle. If the leader does nothing at all once the ring buffer is full,
> it's wasting time that it could spend processing a chunk. But if it
> picks up a chunk, then it might not get around to refilling the buffer
> before other processes are idle with no work to do.
An idle process doesn't cost much. Processes that use the CPU
inefficiently, however, do.
> Still, it might be the case that having the process that is reading
> the data also find the line endings is so fast that it makes no sense
> to split those two tasks. After all, whoever just read the data must
> have it in cache, and that helps a lot.
Yeah. And if a single process isn't fast enough at splitting lines, then
we have a problem regardless of which process does the splitting.
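
To make the design quoted above a bit more concrete, here is a rough single-process sketch in Python of the offset ring buffer idea: one leader splits the input into lines and publishes (offset, length) pairs, and workers claim smaller batches whenever the ring is not full. It is only an illustration, not PostgreSQL code. The names `RING_CAPACITY`, `split_lines`, `fill_ring`, `batch_size` and `claim` are hypothetical, the linear batch scaling is an assumption, and shared memory, locking, CSV quoting and partial tuples are all left out.

```python
from collections import deque

RING_CAPACITY = 8  # hypothetical number of offset slots in the ring


def split_lines(buf: bytes, start: int = 0):
    """Leader side: yield (offset, length) for each complete line in buf."""
    pos = start
    while True:
        nl = buf.find(b"\n", pos)
        if nl == -1:
            return  # a trailing partial line stays unpublished
        yield pos, nl - pos
        pos = nl + 1


def fill_ring(ring: deque, lines) -> None:
    """Leader side: publish line offsets until the ring is full or input ends."""
    while len(ring) < RING_CAPACITY:
        item = next(lines, None)
        if item is None:
            return
        ring.append(item)


def batch_size(ring: deque, max_batch: int = 4) -> int:
    """Worker side: full batches only when the ring is full.

    Assumption: the batch shrinks linearly with ring occupancy, so that
    a partly filled ring is spread over more workers.
    """
    if len(ring) >= RING_CAPACITY:
        return max_batch
    return max(1, len(ring) * max_batch // RING_CAPACITY)


def claim(ring: deque, max_batch: int = 4):
    """Worker side: remove and return the next batch of (offset, length) pairs."""
    n = min(batch_size(ring, max_batch), len(ring))
    return [ring.popleft() for _ in range(n)]


# Usage: the leader fills the ring, a worker claims a batch and slices
# its lines out of the shared input buffer.
data = b"1,a\n2,b\n3,c\n"
ring = deque()
fill_ring(ring, split_lines(data))
rows = [data[off:off + length] for off, length in claim(ring)]
```

With only three of the eight slots occupied, the worker in this example takes a single line rather than a full batch, which leaves the remaining offsets for other workers.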
Greetings,
Andres Freund