Re: [POC] verifying UTF-8 using SIMD instructions

From:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [POC] verifying UTF-8 using SIMD instructions
Date:	2021-02-10 04:00:53
Message-ID:	CAFBsxsHqsgKc60+2u5FpRQMCcmkzemtMK0avm7fmuDzL-R0KPw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Feb 9, 2021 at 4:22 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 09/02/2021 22:08, John Naylor wrote:
> > Maybe there's a smarter way to check for zeros in C. Or maybe be more
> > careful about cache -- running memchr() on the whole input first might
> > not be the best thing to do.
>
> The usual trick is the haszero() macro here:
> https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord. That's
> how memchr() is typically implemented, too.

Thanks for that. Checking with that macro on each loop iteration gives a small
boost:

v1, but using memcpy()

 mixed | ascii
-------+-------
   601 |   129

with haszero()

 mixed | ascii
-------+-------
   583 |   105

remove zero-byte check:

 mixed | ascii
-------+-------
   588 |    93
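
For readers following along, here is a minimal sketch of the kind of zero-byte check discussed above. It is not the code from the patch: the 64-bit widening of haszero(), the macro name HASZERO64, and the helper chunk_has_zero() are assumptions made for illustration. Each chunk is loaded with memcpy() so the input does not need to be aligned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * haszero() from the bithacks page, widened to 64 bits (an assumption;
 * the original macro is written for 32-bit words). The result is nonzero
 * if and only if at least one byte of v is zero.
 */
#define HASZERO64(v) \
	(((v) - UINT64_C(0x0101010101010101)) & ~(v) & UINT64_C(0x8080808080808080))

/*
 * Hypothetical helper: returns true if any of the first len bytes of s
 * is a zero byte. Checks 8 bytes per iteration, then finishes the tail
 * one byte at a time.
 */
static bool
chunk_has_zero(const unsigned char *s, size_t len)
{
	uint64_t	chunk;

	while (len >= sizeof(chunk))
	{
		/* memcpy() avoids unaligned loads; compilers usually inline it */
		memcpy(&chunk, s, sizeof(chunk));
		if (HASZERO64(chunk))
			return true;
		s += sizeof(chunk);
		len -= sizeof(chunk);
	}

	/* remaining bytes that do not fill a whole chunk */
	while (len > 0)
	{
		if (*s == 0)
			return true;
		s++;
		len--;
	}

	return false;
}
```

Folding the check into the main loop this way touches each byte once, as opposed to a separate memchr() pass over the whole input before validation.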

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Re: [POC] verifying UTF-8 using SIMD instructions at 2021-02-09 20:22:02 from Heikki Linnakangas
