| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | pgsql-committers(at)lists(dot)postgresql(dot)org | 
| Subject: | pgsql: Improve performance of binary COPY FROM through better buffering | 
| Date: | 2020-07-25 20:34:48 | 
| Message-ID: | E1jzQsa-0005xb-Jw@gemulon.postgresql.org | 
| Lists: | pgsql-committers | 
Improve performance of binary COPY FROM through better buffering.

At least on Linux and macOS, fread() turns out to have far higher
per-call overhead than one could wish.  Reading 64KB of data at a time
and then parceling it out with our own memcpy logic makes binary COPY
from a file significantly faster --- around 30% in simple testing for
cases with narrow text columns (on Linux ... even more on macOS).

In binary COPY from frontend, there's no per-call fread(), and this
patch introduces an extra layer of memcpy'ing, but it still manages
to eke out a small win.  Apparently, the control-logic overhead in
CopyGetData() is enough to be worth avoiding for small fetches.

Bharath Rupireddy and Amit Langote, reviewed by Vignesh C,
cosmetic tweaks by me

Discussion: https://postgr.es/m/CALj2ACU5Bz06HWLwqSzNMN=Gupoj6Rcn_QVC+k070V4em9wu=A@mail.gmail.com
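
The buffering idea in the commit message (one large fread() per 64KB, with small requests served by memcpy from a private buffer) can be illustrated with a minimal standalone sketch. This is not the code from copy.c: the names BufferedReader, reader_init, reader_read and SKETCH_BUF_SIZE are hypothetical, and error handling is reduced to treating a short fread() as end of input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size for this sketch: 64KB, as in the commit message. */
#define SKETCH_BUF_SIZE (64 * 1024)

/* Hypothetical reader state; not a PostgreSQL structure. */
typedef struct BufferedReader
{
    FILE          *fp;      /* underlying input file */
    size_t         pos;     /* index of the next unread byte in buf */
    size_t         len;     /* number of valid bytes in buf */
    int            eof;     /* set once fread() has returned a short count */
    unsigned char  buf[SKETCH_BUF_SIZE];
} BufferedReader;

static void
reader_init(BufferedReader *r, FILE *fp)
{
    assert(r != NULL);
    r->fp = fp;
    r->pos = 0;
    r->len = 0;
    r->eof = 0;
}

/*
 * Copy up to nbytes into dest and return the number of bytes copied.
 * fread() is called only when the private buffer is empty, so many
 * small requests share the cost of a single library call.
 */
static size_t
reader_read(BufferedReader *r, void *dest, size_t nbytes)
{
    unsigned char *out = dest;
    size_t         copied = 0;

    assert(r != NULL);
    while (copied < nbytes)
    {
        size_t avail;
        size_t chunk;

        if (r->pos == r->len)
        {
            /* Buffer exhausted: refill it with one large read. */
            if (r->eof)
                break;
            r->len = fread(r->buf, 1, SKETCH_BUF_SIZE, r->fp);
            r->pos = 0;
            if (r->len < SKETCH_BUF_SIZE)
                r->eof = 1;     /* short read: end of file or error */
            if (r->len == 0)
                break;
        }

        /* Hand out as much as the caller wants and the buffer holds. */
        avail = r->len - r->pos;
        chunk = (nbytes - copied < avail) ? nbytes - copied : avail;
        memcpy(out + copied, r->buf + r->pos, chunk);
        r->pos += chunk;
        copied += chunk;
    }
    return copied;
}
```

A caller fetching a 4-byte field length would call reader_read(&r, &len, 4) instead of fread(&len, 1, 4, fp). Most such calls reduce to a bounds check and a memcpy, which is the per-call saving the message describes.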
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/0a0727ccfc5f4e2926623abe877bdc0b5bfd682e
Modified Files
--------------
src/backend/commands/copy.c | 118 +++++++++++++++++++++++++++++++-------------
1 file changed, 83 insertions(+), 35 deletions(-)