From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: The "char" type versus non-ASCII characters |
Date: | 2021-12-03 19:42:11 |
Message-ID: | 2320640.1638560531@sss.pgh.pa.us |
Lists: | pgsql-hackers |

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 12/3/21 14:12, Tom Lane wrote:
>> I can think of at least three ways we might address this:
>>
>> * Forbid all non-ASCII values for type "char". This results in
>> simple and portable semantics, but it might break usages that
>> work okay today.
>>
>> * Allow such values only in single-byte server encodings. This
>> is a bit messy, but it wouldn't break any cases that are not
>> problematic already.
>>
>> * Continue to allow non-ASCII values, but change charin/charout,
>> char_text, etc so that the external representation is encoding-safe
>> (perhaps make it an octal or decimal number).

> I don't like #2.

Yeah, it's definitely messy --- for example, maybe é works in
a latin1 database but is rejected when you try to restore into
a DB with utf8 encoding.
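
As a rough illustration of why #2 gets messy, here is a minimal sketch. It assumes the rule would be "accept high-bit-set bytes only when the server encoding's maximum character length is 1". The function name `char_value_allowed` and its `encoding_max_len` parameter are hypothetical, made up for this example, and are not taken from the PostgreSQL source.

```c
#include <stdbool.h>

/*
 * Hypothetical check for option #2 (not actual PostgreSQL code).
 * ASCII bytes are always accepted; bytes with the high bit set are
 * accepted only in a single-byte server encoding.
 */
static bool
char_value_allowed(unsigned char c, int encoding_max_len)
{
    if (c < 0x80)
        return true;            /* plain ASCII is safe in any encoding */
    return encoding_max_len == 1;   /* high-bit byte: single-byte encodings only */
}
```

Under this assumed rule, the byte 0xE9 (é in LATIN1) passes when `encoding_max_len` is 1 but is rejected when it is 4, as it would be for UTF8. That is the dump-and-restore failure described above.
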
> Is #3 going to change the external representation only
> for non-ASCII values? If so, that seems OK.

Right, I envisioned that ASCII behaves the same but we'd use
a numeric representation for high-bit-set values. These
cases could be told apart fairly easily by charin(), since
the numeric representation would always be three digits.
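
The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual charin/charout code: it picks octal (the message leaves octal versus decimal open), and it assumes the input side treats a string as numeric only when it is exactly three octal digits encoding a value in the range 200..377. The names `char_out_sketch` and `char_in_sketch` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical output routine: writes the external form of c into buf,
 * which must hold at least 4 bytes.  ASCII bytes are emitted unchanged;
 * bytes with the high bit set become exactly three octal digits.
 */
static void
char_out_sketch(unsigned char c, char *buf)
{
    if (c < 0x80)
    {
        buf[0] = (char) c;
        buf[1] = '\0';
    }
    else
        snprintf(buf, 4, "%03o", (unsigned int) c);
}

/*
 * Hypothetical input routine: a string of exactly three octal digits
 * whose value is 0200..0377 is read as a high-bit-set byte; anything
 * else yields its first byte (or 0 for an empty string).
 */
static unsigned char
char_in_sketch(const char *s)
{
    if (strlen(s) == 3 &&
        (s[0] == '2' || s[0] == '3') &&
        s[1] >= '0' && s[1] <= '7' &&
        s[2] >= '0' && s[2] <= '7')
        return (unsigned char) (((s[0] - '0') << 6) |
                                ((s[1] - '0') << 3) |
                                (s[2] - '0'));
    return (unsigned char) s[0];
}
```

With these definitions, the byte 0xE9 is written as `351` and reads back as 0xE9, while `a` round-trips unchanged. One consequence of this particular rule is that a three-character input such as `234` is now read as a number, whereas an input routine that simply takes the first byte would have returned `2`.
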
> #1 is the simplest to implement and to understand,
> and I suspect it would break very little in practice, but others might
> disagree with that assessment.

We'd still have to decide what to do with pg_upgrade'd
non-ASCII values, so there's messiness there too.
Having charout() throw an error seems not very nice.

regards, tom lane