Re: speed up verifying UTF-8

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: speed up verifying UTF-8
Date:	2021-06-03 19:22:15
Message-ID:	8f9ceae6-9d16-bb66-9292-a82cc8f150e4@iki.fi
Lists:	pgsql-hackers

On 03/06/2021 22:16, Heikki Linnakangas wrote:
> On 03/06/2021 22:10, John Naylor wrote:
>> On Thu, Jun 3, 2021 at 3:08 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> > x1 = half1 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
>> > x2 = half2 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
>> >
>> > /* then check that the high bit is set in each byte. */
>> > x = (x1 | x2);
>> > x &= UINT64CONST(0x8080808080808080);
>> > if (x != UINT64CONST(0x8080808080808080))
>> >     return 0;
>>
>> That seems right, I'll try that and update the patch. (Forgot to attach
>> earlier anyway)
>
> Ugh, actually that has the same issue as before. If one of the bytes
> in one half is zero, but not in the other half, this fails to detect it.
> Sorry for the noise.

If you replace (x1 | x2) with (x1 & x2) above, I think it's correct.
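
To make the difference concrete, here is a minimal standalone sketch, not the code from the patch. The function names, the use of memcpy() to load the halves, and the up-front high-bit check are assumptions made for illustration; UINT64_C from <stdint.h> stands in for UINT64CONST. It relies on every byte being ASCII (high bit clear), so that adding 0x7f to a byte cannot carry into its neighbour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HIGH_BITS UINT64_C(0x8080808080808080)
#define ADD_7F    UINT64_C(0x7f7f7f7f7f7f7f7f)

/* Hypothetical helper: true if all 16 bytes at s are ASCII and nonzero. */
static bool
chunk_ok_and(const unsigned char *s)
{
    uint64_t half1, half2, x1, x2;

    memcpy(&half1, s, sizeof(half1));
    memcpy(&half2, s + 8, sizeof(half2));

    /* Reject non-ASCII; this also guarantees the additions cannot carry. */
    if ((half1 | half2) & HIGH_BITS)
        return false;

    /* Adding 0x7f sets the high bit of every byte except a zero byte. */
    x1 = half1 + ADD_7F;
    x2 = half2 + ADD_7F;

    /* AND: a lane keeps its high bit only if it is nonzero in both halves. */
    return ((x1 & x2) & HIGH_BITS) == HIGH_BITS;
}

/* Same check with OR, kept only to show the flaw discussed in this thread. */
static bool
chunk_ok_or(const unsigned char *s)
{
    uint64_t half1, half2, x1, x2;

    memcpy(&half1, s, sizeof(half1));
    memcpy(&half2, s + 8, sizeof(half2));

    if ((half1 | half2) & HIGH_BITS)
        return false;

    x1 = half1 + ADD_7F;
    x2 = half2 + ADD_7F;

    /* OR: a nonzero byte in one half masks a zero in the other half. */
    return ((x1 | x2) & HIGH_BITS) == HIGH_BITS;
}

/* Small self-check showing where the two variants disagree. */
static void
zero_check_demo(void)
{
    unsigned char buf[16];

    memcpy(buf, "abcdefghijklmnop", sizeof(buf));
    assert(chunk_ok_and(buf));
    assert(chunk_ok_or(buf));

    buf[3] = '\0';              /* zero in the first half only */
    assert(!chunk_ok_and(buf)); /* AND variant rejects the chunk */
    assert(chunk_ok_or(buf));   /* OR variant misses the zero */
}
```

With the zero at offset 3, the same lane of the second half holds 'l', so its high bit is set after the addition and the OR hides the cleared bit from the first half. The AND requires the bit in both halves, so a zero in either one is caught.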

- Heikki
