From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: speed up verifying UTF-8 |
Date: | 2021-07-15 22:00:05 |
Message-ID: | CAFBsxsEzzTR=Zd=HnT2TZcQ8So1AzWbD1xXUvRsos8w-0C_nPg@mail.gmail.com |
Lists: | pgsql-hackers |
I wrote:
> To simplify the constants, I do shift down to uint32, and I didn't bother
> working around that. v16alpha regressed on worst-case input, so for v16beta
> I went back to earlier coding for the one-byte ascii check. That helped,
> but it's still slower than v14.
It occurred to me that I could rewrite the switch test into simple
comparisons, like I already had for the 2- and 4-byte lead cases. While I was
at it, I folded the leading byte and continuation tests into a single
operation, like this:
/* 3-byte lead with two continuation bytes */
else if ((chunk & 0xF0C0C00000000000) == 0xE080800000000000)
...and also tried using 64-bit constants to avoid shifting. Still didn't
quite beat v14, but got pretty close:
> The numbers on Power8 / gcc 4.8 (little endian):
>
> HEAD:
>
> chinese | mixed | ascii | mixed16 | mixed8
> ---------+-------+-------+---------+--------
> 2951 | 1521 | 871 | 1474 | 1508
>
> v14:
>
> chinese | mixed | ascii | mixed16 | mixed8
> ---------+-------+-------+---------+--------
> 885 | 607 | 179 | 774 | 1325
v16gamma:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
952 | 632 | 180 | 800 | 1333
A big-endian 64-bit platform just might shave enough cycles to beat v14
this way... or not.
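
For readers following along, here is a minimal standalone sketch of the folded 3-byte test quoted above, next to the equivalent byte-by-byte checks. It is not taken from the attached patch: the names load_chunk_be, is_3byte_separate and is_3byte_folded are hypothetical, and it assumes the chunk holds the first input byte in its most significant byte (which is why a big-endian host could load it without any byte shuffling). It only checks the bit patterns of the lead and continuation bytes; overlong (E0 followed by 80..9F) and surrogate (ED followed by A0..BF) sequences would still need their own checks.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption, not from the patch): pack 8 input
 * bytes so that s[0] lands in the most significant byte of the result,
 * regardless of host endianness.
 */
static uint64_t
load_chunk_be(const unsigned char *s)
{
	uint64_t	chunk = 0;

	for (size_t i = 0; i < 8; i++)
		chunk = (chunk << 8) | s[i];
	return chunk;
}

/* Three separate tests: lead byte 1110xxxx, then two bytes of 10xxxxxx. */
static int
is_3byte_separate(uint64_t chunk)
{
	unsigned char b0 = (unsigned char) (chunk >> 56);
	unsigned char b1 = (unsigned char) (chunk >> 48);
	unsigned char b2 = (unsigned char) (chunk >> 40);

	return (b0 & 0xF0) == 0xE0 &&
		(b1 & 0xC0) == 0x80 &&
		(b2 & 0xC0) == 0x80;
}

/* The same three tests folded into one mask-and-compare on the chunk. */
static int
is_3byte_folded(uint64_t chunk)
{
	return (chunk & UINT64_C(0xF0C0C00000000000)) ==
		UINT64_C(0xE080800000000000);
}
```

Both functions should agree on every input; the folded form just trades three masks and compares for one, using a 64-bit constant so the chunk does not have to be shifted down first.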
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v16gamma-Rewrite-pg_utf8_verifystr-for-speed.txt | text/plain | 12.1 KB |