From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [POC] verifying UTF-8 using SIMD instructions |
Date: | 2021-02-10 04:00:53 |
Message-ID: | CAFBsxsHqsgKc60+2u5FpRQMCcmkzemtMK0avm7fmuDzL-R0KPw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Feb 9, 2021 at 4:22 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 09/02/2021 22:08, John Naylor wrote:
> > Maybe there's a smarter way to check for zeros in C. Or maybe be more
> > careful about cache -- running memchr() on the whole input first might
> > not be the best thing to do.
>
> The usual trick is the haszero() macro here:
> https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord. That's
> how memchr() is typically implemented, too.
Thanks for that. Checking with that macro on each loop iteration gives a small
boost:

v1, but using memcpy():

 mixed | ascii
-------+-------
   601 |   129

with haszero():

 mixed | ascii
-------+-------
   583 |   105

remove zero-byte check:

 mixed | ascii
-------+-------
   588 |    93

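For reference, here is a minimal sketch of the per-chunk zero-byte check, based on the haszero() macro from the bithacks page and widened to 64 bits. The names HASZERO64 and chunk_has_zero, and the 8-byte chunk size, are assumptions for illustration only and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * haszero() from the bithacks page, widened to a 64-bit word.
 * Nonzero if and only if at least one byte of v is 0x00.
 */
#define HASZERO64(v) \
	((((v) - UINT64_C(0x0101010101010101)) & ~(v) & \
	  UINT64_C(0x8080808080808080)) != 0)

/*
 * Hypothetical helper: returns true if any of the 8 bytes starting
 * at p is zero. The caller must ensure 8 bytes are readable.
 */
static inline bool
chunk_has_zero(const unsigned char *p)
{
	uint64_t	chunk;

	/* memcpy() gives an alignment-safe load of the chunk */
	memcpy(&chunk, p, sizeof(chunk));
	return HASZERO64(chunk);
}
```

Only the yes/no answer is used here, so byte order does not matter; finding which byte is zero would need extra work.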
--
John Naylor
EDB: http://www.enterprisedb.com