From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Perform COPY FROM encoding conversions in larger chunks |
Date: | 2020-12-22 20:01:48 |
Message-ID: | CAFBsxsH4Zum8e+i1jGjQhGW+8fYWwJ7EqOKCx6P_cUzOJUK9qA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Dec 16, 2020 at 8:18 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> Currently, COPY FROM parses the input one line at a time. Each line is
> converted to the database encoding separately, or if the file encoding
> matches the database encoding, we just check that the input is valid for
> the encoding. It would be more efficient to do the encoding
> conversion/verification in larger chunks. At least potentially; the
> current conversion/verification implementations work one byte at a time so
> it doesn't matter too much, but there are faster algorithms out there
> that use SIMD instructions or lookup tables that benefit from larger
> inputs.
Hi Heikki,
This is great news. I've seen examples of such algorithms and that'd be
nice to have. I haven't studied the patch in detail, but it looks fine on
the whole.
In 0004, it seems you have some doubts about upgrade compatibility. Is that
because user-defined conversions would no longer have the right signature?
--
John Naylor
EDB: http://www.enterprisedb.com