From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu> |
Subject: | Re: speed up verifying UTF-8 |
Date: | 2021-10-19 21:42:40 |
Message-ID: | CAFBsxsHUgNeytyF6TyoUBgf8whqRxvStbWtok9qcDJzDZ78FLw@mail.gmail.com |
Lists: | pgsql-hackers |
I've decided I'm not quite comfortable with the additional complexity in
the build system introduced by the SIMD portion of the previous patches.
That complexity would be easier to justify if the pure C portion were
unchanged, but with the shift-based DFA plus the bitwise ASCII check, we
have a portable implementation that's still a substantial improvement over
the current validator. In v24, I've included only that much, and the diff
is only about 1/3 as many lines as before. If future improvements to COPY
FROM put additional pressure on this path, we can always add SIMD support
later.
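For anyone skimming the thread, here is a minimal sketch of what a bitwise
ASCII check can look like. It is an illustration only, not the code in the
attached patch: the function name is_all_ascii() is made up for this
example, and it tests only the high bit of each byte, so anything else a
real validator might need to reject (such as embedded zero bytes) is left
out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper for illustration (not from the patch).
 *
 * Returns true if none of the len bytes at s has its high bit set,
 * i.e. every byte is in the range 0x00..0x7F.
 */
static bool
is_all_ascii(const unsigned char *s, size_t len)
{
	uint64_t	highbits = 0;	/* OR of every byte seen so far */

	/* Process 8 bytes at a time; memcpy avoids unaligned loads. */
	while (len >= sizeof(uint64_t))
	{
		uint64_t	chunk;

		memcpy(&chunk, s, sizeof(chunk));
		highbits |= chunk;
		s += sizeof(chunk);
		len -= sizeof(chunk);
	}

	/* Fold any remaining bytes into the low byte of the accumulator. */
	while (len > 0)
	{
		highbits |= *s++;
		len--;
	}

	/* A single test at the end: was bit 7 set in any byte? */
	return (highbits & UINT64_C(0x8080808080808080)) == 0;
}
```

The point of accumulating with OR and testing once at the end is that the
loop body has no branches, so long ASCII runs can be skipped cheaply before
falling back to a full validator for anything else.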
One thing not in this patch is a possible improvement to
pg_utf8_verifychar() that Heikki and I worked on upthread as part of
earlier attempts to rewrite pg_utf8_verifystr(). That's worth looking into
separately.
On Thu, Aug 26, 2021 at 12:09 PM Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com> wrote:
>
> > Attached is v23 incorporating the 32-bit transition table, with the
> > necessary comment adjustments
>
> 32bit table is nice.
Thanks for taking a look!
> Would you please replace
> https://github.com/BobSteagall/utf_utils/blob/master/src/utf_utils.cpp
> URL with
> https://github.com/BobSteagall/utf_utils/blob/6b7a465265de2f5fa6133d653df0c9bdd73bbcf8/src/utf_utils.cpp
> in the header of src/port/pg_utf8_fallback.c?
>
> It would make the URL more stable in case the file gets renamed.
>
> Vladimir
>
Makes sense, so I've done it that way.
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v24-0001-Add-fast-path-for-validating-UTF-8-text.patch | application/octet-stream | 23.8 KB |