From: | Ants Aasma <ants(at)cybertec(dot)at> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel copy |
Date: | 2020-02-18 12:29:20 |
Message-ID: | CANwKhkPmM18UYpOt_AEB4JC6fa0dfA1PfgiQyNzeNUxEpG=XUw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> This is something similar to what I had also in mind for this idea. I
> had thought of handing over a complete chunk (64K or whatever we
> decide). The one thing that slightly bothers me is that we will add
> some additional overhead of copying to and from shared memory, which
> was earlier done from local process memory. And the tokenization (finding
> line boundaries) would be serial. I think that tokenization should be
> a small part of the overall work we do during the copy operation, but I
> will do some measurements to ascertain the same.
I don't think any extra copying is needed. The reader can directly
fread()/pq_copymsgbytes() into shared memory, and the workers can run
the CopyReadLineText() inner loop directly off of the buffer in shared memory.
As for the serial cost of tokenizing into lines, I really think a SIMD-based
approach will be fast enough for quite some time. I hacked up the code in
the simdcsv project to only tokenize on line endings, and it was able to
tokenize a CSV file with short lines at 8+ GB/s. There are going to be many
other bottlenecks before this one starts limiting throughput. Patch attached
if you'd like to try that out.
Regards,
Ants Aasma
Attachment | Content-Type | Size |
---|---|---|
simdcsv-find-only-lineendings.diff | text/x-patch | 1.7 KB |