From: | Marko Kreen <markokr(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: UTF16 surrogate pairs in UTF8 encoding |
Date: | 2010-09-08 10:45:37 |
Message-ID: | AANLkTimdQmCGKpt6X6xcodTCfiF2DTUEuaN8GuGQBmOW@mail.gmail.com |
Lists: | pgsql-hackers |
On 9/8/10, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Wed, 2010-09-08 at 10:18 +0300, Marko Kreen wrote:
> > On 9/7/10, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > > On Sun, 2010-08-22 at 15:15 -0400, Tom Lane wrote:
> > > > > We combine the surrogate pair components to a single code point and
> > > > > encode that in UTF-8. We don't encode the components separately; that
> > > > > would be wrong.
> > > >
> > > > Oh, OK. Should the docs make that a bit clearer?
> > >
> > >
> > > Done.
> >
> > This is confusing:
> >
> > (When surrogate
> > pairs are used when the server encoding is <literal>UTF8</>, they
> > are first combined into a single code point that is then encoded
> > in UTF-8.)
> >
> > So something else happens if encoding is not UTF8?
>
>
> Then you can't specify surrogate pairs because they are outside of the
> ASCII range, per constraint mentioned earlier in the paragraph.
>
>
> > I think this part can be simply removed, it does not add anything.
> >
> > Or say that surrogate pairs are only allowed in UTF8 encoding.
> > Reason is that you cannot encode 0..7F codepoints with them,
> > and only those are allowed to be given numerically. But this is
> > already mentioned before.
>
>
> Well, Tom wanted an additional explanation. I personally agree with
> you; this is not the place to explain encoding and Unicode internals,
> when really the code only does what it's supposed to.

Ah OK, I had the impression you had changed the wording before that too,
so this addition seemed unnecessary. But it seems you only changed the
formatting.

Anyway, this "when" makes it read oddly. Maybe a more concise version:

  To repeat, surrogate pairs are combined into a single character and then
  encoded, not stored separately.

Although it does seem unnecessary.
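
For illustration only, here is a minimal Python sketch of what "combined
and then encoded" means. It is not the server code; the function names
combine_surrogates and encode_pair_utf8 are made up for this example, and
the arithmetic is just the standard UTF-16 surrogate formula.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high (leading) surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low (trailing) surrogate")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_pair_utf8(high: int, low: int) -> bytes:
    """Encode the combined code point as one UTF-8 sequence."""
    return chr(combine_surrogates(high, low)).encode("utf-8")


# U+1D11E (MUSICAL SYMBOL G CLEF) is written as the pair D834 DD1E.
code_point = combine_surrogates(0xD834, 0xDD1E)
utf8_bytes = encode_pair_utf8(0xD834, 0xDD1E)
print(hex(code_point), utf8_bytes.hex())
```

The pair comes out as one four-byte UTF-8 sequence (f0 9d 84 9e), not as
two three-byte sequences, one per surrogate.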
--
marko