Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> This isn't about the number of bytes, but about whether or not we should
> count characters encoded as two or more combined code points as a single
> char or not.

It's really about whether we should support non-canonical encodings.
AFAIK that's a hack to cope with implementations that are restricted
to UTF-16, and we should Just Say No. Clients that are sending these
things converted to UTF-8 are in violation of the standard.
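
To make the distinction concrete, here is a minimal Python sketch. It assumes that "non-canonical encodings" refers to a UTF-16 surrogate pair whose halves are each encoded as separate 3-byte UTF-8 sequences (CESU-8 style); that reading is an interpretation, not something stated above. The helper name is_strict_utf8 and the sample characters are hypothetical, chosen only for illustration. The last part shows the separate case raised in the quoted text, a base letter plus a combining mark, which is valid UTF-8 but counts as two code points.

```python
import unicodedata

# U+1F600 encoded the standard way: a single 4-byte UTF-8 sequence.
canonical = "\U0001F600".encode("utf-8")

# The same character as a UTF-16 surrogate pair (D83D DE00), with each
# surrogate encoded on its own as 3 bytes. "surrogatepass" is needed
# because Python's strict encoder refuses to emit surrogates at all.
surrogate_pair_bytes = "\ud83d\ude00".encode("utf-8", "surrogatepass")


def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Assumed reading of the reply: the 6-byte form is the input to reject.
canonical_ok = is_strict_utf8(canonical)
surrogate_ok = is_strict_utf8(surrogate_pair_bytes)

# The case from the quoted text: "e" followed by U+0301 COMBINING ACUTE
# ACCENT is valid UTF-8, but it is two code points until NFC-normalized.
combining = "e\u0301"
composed = unicodedata.normalize("NFC", combining)
```

Under this sketch a strict decoder accepts the 4-byte form and rejects the 6-byte surrogate form, while the combining sequence passes validation either way; whether it counts as one character or two is a normalization question, not an encoding-validity one.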
regards, tom lane