From: | Florian Pflug <fgp(at)phlo(dot)org> |
---|---|
To: | Florian Weimer <fweimer(at)bfk(dot)de> |
Cc: | Alexander Shulgin <ash(at)commandprompt(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Making TEXT NUL-transparent |
Date: | 2011-11-24 13:48:09 |
Message-ID: | 21D9E9C6-552A-4CE1-BF9A-178D4C2DC272@phlo.org |
Lists: | pgsql-hackers |
On Nov 24, 2011, at 10:54, Florian Weimer wrote:
>> Or is it not only about being able to *store* NULs in a text field?
>
> No, the entire core should be NUL-transparent.
That's unlikely to happen. A more realistic approach would be to solve
this only for UTF-8 encoded strings by encoding the NUL character not as
a single 0 byte, but as a sequence of non-zero bytes.
Such a thing is possible in UTF-8 because there are multiple ways to
encode the same character once you drop the requirement that characters
be encoded in the *shortest* possible way.
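To make the "multiple ways to encode the same character" point concrete, here is a minimal Python sketch (the function names are illustrative, not part of PostgreSQL or any library) that decodes a two-byte sequence without the shortest-form check. Under that lenient rule 0xC0 0x80 yields code point 0, the same character as the single byte 0x00, while a strict decoder such as Python's own `utf-8` codec rejects the sequence.

```python
def decode_two_byte_lenient(b0: int, b1: int) -> int:
    """Decode a 110xxxxx 10yyyyyy pair WITHOUT enforcing shortest form."""
    if (b0 & 0xE0) != 0xC0 or (b1 & 0xC0) != 0x80:
        raise ValueError("not a two-byte lead/continuation pair")
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def is_shortest_two_byte(code_point: int) -> bool:
    # Conforming UTF-8 uses two bytes only for U+0080..U+07FF; anything
    # smaller must be written in the one-byte form.
    return 0x80 <= code_point <= 0x7FF


# 0xC0 0x80 decodes to U+0000, but it is an overlong (non-shortest) form.
nul = decode_two_byte_lenient(0xC0, 0x80)
print(nul, is_shortest_two_byte(nul))  # prints: 0 False

# A strict UTF-8 decoder refuses the overlong sequence outright.
try:
    bytes([0xC0, 0x80]).decode("utf-8")
except UnicodeDecodeError:
    print("rejected by strict UTF-8")
```

The same trick works for any ASCII character (0xC1 0x81 leniently decodes to 'A'), which is exactly why conforming decoders must enforce the shortest form.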
Since we very probably won't loosen up UTF-8's integrity checks to allow
that, it'd have to be done as a new encoding, say 'utf8-loose'.
That new encoding could, for example, use 0xC0 0x80 to represent NUL
characters. This byte sequence is invalid in standard-conforming UTF-8
because it's a non-normalized (i.e. overly long) representation of a code
point (the code point NUL, incidentally). A bit of googling suggests that
quite a few pieces of software use this kind of modified UTF-8 encoding.
Java, for example, seems to use it to serialize Strings (which may contain
NUL characters) to UTF-8.
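A minimal Python sketch of the NUL handling described above, assuming the hypothetical 'utf8-loose' encoding differs from UTF-8 only in writing U+0000 as 0xC0 0x80. It keeps supplementary characters in their standard four-byte form, so it is not a byte-for-byte reimplementation of Java's modified UTF-8. The function names are illustrative.

```python
def encode_utf8_loose(text: str) -> bytes:
    """Encode text as UTF-8, but write U+0000 as the overlong pair 0xC0 0x80
    so that the result never contains a 0 byte."""
    # In standard UTF-8 the byte 0x00 occurs only as the encoding of U+0000,
    # so a byte-level substitution is sufficient.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_utf8_loose(data: bytes) -> str:
    """Inverse of encode_utf8_loose; rejects raw 0 bytes."""
    if b"\x00" in data:
        raise ValueError("a raw 0 byte is not valid in this encoding")
    # Standard UTF-8 never contains the byte 0xC0, so every 0xC0 0x80 pair
    # is an encoded NUL; any other malformed input is left for the strict
    # UTF-8 decoder to reject.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


s = "abc\x00def"
encoded = encode_utf8_loose(s)
print(encoded)                          # prints: b'abc\xc0\x80def'
print(decode_utf8_loose(encoded) == s)  # prints: True
```

Because the encoded form never contains a 0 byte, it can pass through C code that treats 0 as a string terminator.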
Should you try to add a new encoding which supports that, you might also
want to allow CESU-8-style encoding of UTF-16 surrogate pairs. This means
that code points which UTF-16 represents as surrogate pairs (those above
U+FFFF) may be encoded by separately encoding each of the two surrogates
in UTF-8.
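In the CESU-8-style scheme, a character above U+FFFF is first split into its UTF-16 high and low surrogates, and each surrogate is then written as an ordinary three-byte UTF-8 sequence, giving six bytes instead of the standard four. A minimal Python sketch with an illustrative function name; it assumes well-formed input (no lone surrogates) and omits the decoder.

```python
def encode_cesu8(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            # BMP characters are encoded exactly as in standard UTF-8.
            out += ch.encode("utf-8")
        else:
            # Split the supplementary code point into a UTF-16 surrogate pair.
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            # Encode each 16-bit surrogate as a three-byte UTF-8 sequence.
            for unit in (high, low):
                out += bytes([0xE0 | (unit >> 12),
                              0x80 | ((unit >> 6) & 0x3F),
                              0x80 | (unit & 0x3F)])
    return bytes(out)


emoji = "\U0001F600"                # U+1F600, surrogates D83D DE00
print(emoji.encode("utf-8").hex())  # prints: f09f9880
print(encode_cesu8(emoji).hex())    # prints: eda0bdedb880
```

A decoder accepting this form would have to recombine a 0xED 0xA0..0xAF sequence followed by a 0xED 0xB0..0xBF sequence into one code point, which standard UTF-8 validation forbids.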
best regards,
Florian Pflug