Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> That's good to know. If we can assume that 1) all bytes of a multibyte
> character have the high bit set and 2) all multibyte characters actually
> require multiple bytes, then there are just a handful of cases that require
> multiple lookups, and we can restrict even those to some extent, too.

I'm failing to parse your (2). Either that's content-free or you're
thinking something that probably isn't true. There are encodings
(mostly the LATINn series) that have high-bit-set characters that
only occupy one byte. So I don't think we can take any shortcuts
compared to the strip-one-byte-at-a-time approach.
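
To make the LATINn point concrete, here is a minimal C sketch. The helper
names are hypothetical, made up for illustration, and not taken from any
existing code. It only counts high-bit-set bytes, which is enough to show
that one such byte can be a complete character in a single-byte encoding:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the byte has its high bit set. */
static bool
high_bit_set(unsigned char b)
{
    return (b & 0x80) != 0;
}

/*
 * Hypothetical helper: count the bytes in a NUL-terminated string that
 * have the high bit set.  This says nothing about character boundaries;
 * that depends entirely on the encoding.
 */
static size_t
count_high_bit_bytes(const char *s)
{
    size_t      n = 0;

    for (; *s != '\0'; s++)
    {
        if (high_bit_set((unsigned char) *s))
            n++;
    }
    return n;
}
```

In LATIN1, "caf\xe9" ends in a single high-bit-set byte (0xE9) that is a
whole character by itself. The same text in UTF-8 is "caf\xc3\xa9", where
the last character takes two high-bit-set bytes. So seeing one high-bit
byte at the end of a string does not tell you that more bytes of the same
character precede it.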

regards, tom lane