From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com> |
Cc: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: speed up verifying UTF-8 |
Date: | 2021-07-19 01:26:47 |
Message-ID: | CAFBsxsHu4WrdQHbr73+aez4wjzLHF-LYihutn0-Huex8vNX5zg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I wrote:
> On Fri, Jul 16, 2021 at 1:44 AM Vladimir Sitnikov <
sitnikov(dot)vladimir(at)gmail(dot)com> wrote:
> >
> > Have you considered shift-based DFA for a portable implementation
https://gist.github.com/pervognsen/218ea17743e1442e59bb60d29b1aa725 ?
>
> I did consider some kind of DFA a while back and it was too slow.
I took a closer look at this "shift-based DFA", and it seemed pretty
straightforward to implement this on top of my DFA attempt from some months
ago. The DFA technique is not a great fit with our API, since we need to
return how many bytes we found valid. On x86 (not our target for the
fallback, but convenient to test) all my attempts were either worse than
HEAD in multiple cases, or showed no improvement for the important ASCII
case. On Power8, it's more compelling, and competitive with v14, so I'll
characterize it on that platform as I describe the patch series:
0001 is a pure DFA, and has decent performance on multibyte, but terrible
on ascii.
0002 dispatches on the leading byte category, unrolls the DFA loop
according to how many valid bytes we need, and only checks the DFA state
afterwards. It's good on multibyte (3-byte, at least) but still terrible on
ascii.
0003 adds a 1-byte ascii fast path -- while robust on all inputs, it still
regresses a bit on ascii.
0004 uses the same 8-byte ascii check as previous patches do.
0005 and 0006 use combinations of 1- and 8-byte ascii checks similar to in
v17.
0005 seems the best on Power8, and is very close to v4. FWIW, v14's
measurements seem lucky and fragile -- if I change any little thing, even
- return -1;
+ return 0;
it easily loses 100-200ms on non-pure-ascii tests. That said, v14 still
seems the logical choice, unless there is some further tweak on top of v17
or v18 that gives some non-x86 platform a significant boost.
Power8, gcc 4.8:
HEAD:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
2944 | 1523 | 871 | 1473 | 1509
v18-0001:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
1257 | 1681 | 1385 | 1744 | 2018
v18-0002:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
951 | 1381 | 1217 | 1469 | 1172
v18-0003:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
911 | 1111 | 942 | 1112 | 865
v18-0004:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
987 | 730 | 222 | 1325 | 2306
v18-0005:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
962 | 664 | 180 | 928 | 1179
v18-0006:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
908 | 663 | 244 | 1026 | 1464
and for comparison,
v14:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
888 | 607 | 179 | 777 | 1328
v17-0003:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
1205 | 662 | 180 | 767 | 1256
Macbook, clang 12:
HEAD:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
974 | 691 | 370 | 456 | 526
v18-0001:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
1334 | 2713 | 2802 | 2665 | 2541
v18-0002:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
733 | 1212 | 1064 | 1034 | 1007
v18-0003:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
653 | 560 | 370 | 420 | 465
v18-0004:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
574 | 402 | 88 | 584 | 1033
v18-0005:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
1345 | 730 | 334 | 578 | 909
v18-0006:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
674 | 485 | 153 | 594 | 989
and for comparison,
v14:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
674 | 346 | 78 | 309 | 504
v17-0002:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
516 | 324 | 78 | 331 | 544
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v18-0001-Use-pure-DFA.patch | application/octet-stream | 13.5 KB |
v18-0002-Unroll-loop-in-DFA.patch | application/octet-stream | 1.7 KB |
v18-0004-Check-ascii-8-bytes-at-a-time-with-bitwise-opera.patch | application/octet-stream | 3.4 KB |
v18-0003-Add-ascii-fast-path-before-resorting-to-DFA.patch | application/octet-stream | 1004 bytes |
v18-0005-Do-1-byte-ascii-check-if-8-byte-check-fails.patch | application/octet-stream | 785 bytes |
v18-0006-Do-8-byte-check-only-if-1-byte-check-succeeds.patch | application/octet-stream | 1.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Yugo NAGATA | 2021-07-19 01:51:36 | Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error |
Previous Message | Kyotaro Horiguchi | 2021-07-19 01:16:18 | Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups? |