From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [POC] verifying UTF-8 using SIMD instructions |
Date: | 2021-02-24 16:25:49 |
Message-ID: | CAFBsxsHDje3heVrN+ky_rYc+1DfS-Hg+By=EWKwWB5d5Uvtkjg@mail.gmail.com |
Lists: | pgsql-hackers |
The cfbot reported a build failure on Windows because of the use of binary
literals. I've turned those into hex for v6, so let's see how far it gets
now.
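Some background on why binary literals break the build: literals such as `0b11100000` are not standard C before C23. GCC and Clang accept them as an extension, but the Windows toolchain cfbot uses evidently does not. Hex spellings of the same bit patterns are portable. Below is a minimal sketch of the kind of UTF-8 lead-byte masks involved. The macro and function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>

/*
 * UTF-8 lead-byte masks written in portable hex.
 * The comments show the binary literals they replace.
 */
#define UTF8_2BYTE_MASK  0xE0  /* was 0b11100000 */
#define UTF8_2BYTE_LEAD  0xC0  /* was 0b11000000 */
#define UTF8_3BYTE_MASK  0xF0  /* was 0b11110000 */
#define UTF8_3BYTE_LEAD  0xE0  /* was 0b11100000 */
#define UTF8_4BYTE_MASK  0xF8  /* was 0b11111000 */
#define UTF8_4BYTE_LEAD  0xF0  /* was 0b11110000 */
#define UTF8_CONT_MASK   0xC0  /* was 0b11000000 */
#define UTF8_CONT_BITS   0x80  /* was 0b10000000 */

/*
 * Return the sequence length implied by a lead byte, or 0 if the byte
 * cannot start a sequence. Note: 0xC0, 0xC1 and 0xF5..0xF7 pass these
 * mask tests but are never valid, so a real validator must reject them.
 */
static int
utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & UTF8_2BYTE_MASK) == UTF8_2BYTE_LEAD)
        return 2;
    if ((c & UTF8_3BYTE_MASK) == UTF8_3BYTE_LEAD)
        return 3;
    if ((c & UTF8_4BYTE_MASK) == UTF8_4BYTE_LEAD)
        return 4;
    return 0;               /* continuation byte, or 0xF8..0xFF */
}

/* True if c has the form 10xxxxxx. */
static bool
is_utf8_continuation(unsigned char c)
{
    return (c & UTF8_CONT_MASK) == UTF8_CONT_BITS;
}
```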
I also decided to leave out the patch that adds an ASCII fast path to
non-UTF-8 encodings. That would really require more testing than I have
time for.
As before, 0001 is v4 of Heikki's noError conversion patch, whose
regression tests I build upon.
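As I understand the noError idea, a routine does not raise an error at the first bad byte. Instead it reports how many input bytes were good. A UTF-8 validator with the same shape simply returns the length of the longest valid prefix. Here is a minimal scalar sketch under that assumption. `utf8_valid_prefix` is a hypothetical name, and the checks follow the Unicode well-formed byte sequence table rather than the patch's exact code.

```c
#include <stddef.h>

/* True if c is a UTF-8 continuation byte (10xxxxxx). */
static int
is_cont(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: return the number of bytes in the longest valid
 * UTF-8 prefix of s[0..len). Rejects overlong forms, UTF-16 surrogates,
 * and code points above U+10FFFF.
 */
static size_t
utf8_valid_prefix(const unsigned char *s, size_t len)
{
    size_t      i = 0;

    while (i < len)
    {
        unsigned char c = s[i];
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed 2nd-byte range */
        size_t      n;

        if (c < 0x80)
            n = 1;
        else if (c >= 0xC2 && c <= 0xDF)
            n = 2;
        else if (c >= 0xE0 && c <= 0xEF)
        {
            n = 3;
            if (c == 0xE0)
                lo = 0xA0;      /* reject overlong 3-byte forms */
            else if (c == 0xED)
                hi = 0x9F;      /* reject surrogates U+D800..U+DFFF */
        }
        else if (c >= 0xF0 && c <= 0xF4)
        {
            n = 4;
            if (c == 0xF0)
                lo = 0x90;      /* reject overlong 4-byte forms */
            else if (c == 0xF4)
                hi = 0x8F;      /* reject code points > U+10FFFF */
        }
        else
            break;              /* 0x80..0xC1 or 0xF5..0xFF can't lead */

        if (n > len - i)
            break;              /* truncated sequence at end of input */
        if (n > 1 && (s[i + 1] < lo || s[i + 1] > hi))
            break;
        if (n > 2 && !is_cont(s[i + 2]))
            break;
        if (n > 3 && !is_cont(s[i + 3]))
            break;
        i += n;
    }
    return i;
}
```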
0002 has no ASCII fast path in the fallback implementation. 0003 and 0004
add it back in using 8- and 16-byte strides, respectively. That will make
it easier to test on non-Intel platforms, so we can decide which way to go
here. I also did a round of editing on the comments in the SSE4.2 file.
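Here is a minimal portable sketch of what the 8- and 16-byte strides mean. It assumes the fast path only has to confirm that no byte has its high bit set. The 16-byte variant ORs two 64-bit words together so that a single test covers both. All names are hypothetical, and this is not the code in 0003 or 0004.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* High bit set in every byte lane of a 64-bit word. */
#define HIGHBITS_64 UINT64_C(0x8080808080808080)

/* Nonzero if all 8 bytes at s are ASCII. memcpy avoids alignment issues. */
static int
chunk8_is_ascii(const unsigned char *s)
{
    uint64_t    w;

    memcpy(&w, s, sizeof(w));
    return (w & HIGHBITS_64) == 0;
}

/* Nonzero if all 16 bytes at s are ASCII: one test on two ORed words. */
static int
chunk16_is_ascii(const unsigned char *s)
{
    uint64_t    w1, w2;

    memcpy(&w1, s, sizeof(w1));
    memcpy(&w2, s + 8, sizeof(w2));
    return ((w1 | w2) & HIGHBITS_64) == 0;
}

/*
 * Skip the leading run of all-ASCII chunks using a stride of 8 or 16
 * bytes, and return how many bytes were skipped. The remainder would
 * then go to the byte-at-a-time validator.
 */
static size_t
ascii_fast_path(const unsigned char *s, size_t len, size_t stride)
{
    size_t      i = 0;

    while (len - i >= stride)
    {
        int         ascii = (stride == 16) ? chunk16_is_ascii(s + i)
                                           : chunk8_is_ascii(s + i);

        if (!ascii)
            break;
        i += stride;
    }
    return i;
}
```

Whichever stride wins, the tail and any chunk that contains a non-ASCII byte still go through the slow path. A wider stride presumably costs fewer branches on long ASCII runs, but it gives up sooner on mostly-ASCII text with scattered multibyte characters. That trade-off is why measuring on non-Intel hardware seems worthwhile.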
I ran the multibyte conversion regression test found in the message below,
and it passed. That doesn't test UTF-8 explicitly, but all conversions
round-trip through UTF-8, so it does get some coverage.
https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v6-0001-Add-noError-argument-to-encoding-conversion-funct.patch | application/octet-stream | 225.0 KB |
v6-0002-Use-SSE-4-for-verifying-UTF-8-text.patch | application/octet-stream | 48.7 KB |
v6-0003-Add-an-ASCII-fast-path-to-the-fallback-UTF-8-vali.patch | application/octet-stream | 2.2 KB |
v6-0004-Widen-the-ASCII-fast-path-stride-in-the-fallback-.patch | application/octet-stream | 1.4 KB |