Re: speed up verifying UTF-8

From:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To:	Greg Stark <stark(at)mit(dot)edu>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: speed up verifying UTF-8
Date:	2021-06-03 15:42:51
Message-ID:	CAFBsxsHUcyxpRPYi22d0LOD+YQz4FsJo5POEWh47=zkwbypU=w@mail.gmail.com
Lists:	pgsql-hackers

I wrote:

> On Thu, Jun 3, 2021 at 10:42 AM Greg Stark <stark(at)mit(dot)edu> wrote:
> >

> > If
> > we're processing much more than 128 bits and happy to detect NUL
> > errors only at the end after wasting some work then you could hoist
> > that has_zero check entirely out of the loop (removing the branch
> > though it's probably a correctly predicted branch anyways).
> >
> > Do something like:
> >
> > zero_accumulator = zero_accumulator & next_chunk
> >
> > in the loop and then only at the very end check for zeros in that.
>
> That's the approach taken in the SSE4 patch, and in fact that's the
> logical way to do it there. I hadn't considered doing it that way in the
> pure C case, but I think it's worth trying.

Actually, I spoke too quickly. We can't rely on an error accumulator alone in
the C case because we need to return how many bytes were valid, and an
accumulator only says that an error occurred somewhere, not where. In fact,
the SSE code checks the error vector at the end and, if it is set, reruns the
input with the fallback case to count the valid bytes.
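
To make the trade-off concrete, here is a minimal sketch of the "accumulate,
check once, rerun on error" pattern. It is not code from either patch: it only
validates non-NUL ASCII rather than full UTF-8, and the names
count_valid_slow and count_valid_ascii are hypothetical, chosen for
illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-at-a-time fallback: counts the leading bytes that are
 * non-NUL ASCII. This is the only path that knows *where* an error is. */
static size_t
count_valid_slow(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len && s[i] != 0 && s[i] < 0x80)
        i++;
    return i;
}

/* Hypothetical fast path: processes 8 bytes per iteration and ORs every
 * problem into one accumulator, with no branch on errors inside the loop. */
static size_t
count_valid_ascii(const unsigned char *s, size_t len)
{
    const uint64_t ones = UINT64_C(0x0101010101010101);
    const uint64_t highs = UINT64_C(0x8080808080808080);
    uint64_t    error = 0;
    size_t      i;

    for (i = 0; i + 8 <= len; i += 8)
    {
        uint64_t    chunk;

        memcpy(&chunk, s + i, sizeof(chunk));

        /* Any high bit set means a non-ASCII byte. */
        error |= chunk & highs;

        /* Classic "has zero byte" test: nonzero iff some byte is 0x00. */
        error |= (chunk - ones) & ~chunk & highs;
    }

    /* The accumulator says "something was wrong", not where, so an error
     * forces a second pass over the input to count the valid prefix. */
    if (error != 0)
        return count_valid_slow(s, len);

    /* No error in the full chunks: finish the tail byte by byte. */
    return i + count_valid_slow(s + i, len - i);
}
```

For clean input the loop never branches on errors; for invalid input the work
is done twice, which is the cost of hoisting the check out of the loop.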

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Re: speed up verifying UTF-8 at 2021-06-03 15:33:21 from John Naylor
