Re: SQL_ASCII vs. 7-bit ASCII encodings

From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SQL_ASCII vs. 7-bit ASCII encodings
Date: 2005-05-13 11:22:52
Message-ID: 42848E0C.5010404@opencloud.com
Lists: pgsql-hackers

Tom Lane wrote:
> Oliver Jowett <oliver(at)opencloud(dot)com> writes:
>
>>Peter Eisentraut wrote:
>>
>>>That would cripple a system that many users are perfectly content with now.
>
>
>>Well, I wasn't thinking of using a 7-bit encoding always, just as a
>>replacement for the cases where we currently choose SQL_ASCII. Does that
>>sound reasonable?
>
>
> I agree with what (I think) Peter is saying: that would break things for
> many people for whom the default works fine now.
>
> We are currently seeing a whole lot of complaints due to the fact that
> 8.0 tends to default to Unicode encoding in environments where previous
> versions defaulted to SQL-ASCII. That says to me that a whole lot of
> people were getting along just fine in SQL-ASCII, and therefore that
> moving further away from that behavior is the wrong thing. In
> particular, there is not any single one of those complainants who would
> be happier with a 7-bit-only default; if they were using 7-bit-only
> data, they'd not have noticed a problem anyway.

This is exactly the case where JDBC has problems, and the case I'd like
to prevent from arising in the first place where possible: SQL_ASCII with
non-7-bit data. How do you propose that the JDBC driver convert from
SQL_ASCII to UTF-16 (the internal Java String representation)? Changing
client_encoding does not help, because the server performs no conversion
on a SQL_ASCII database. Requiring the JDBC client to specify the
right encoding to use is error-prone at best, and impossible at worst
(who says that only one encoding has been used?).
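
As a rough illustration of the ambiguity (a minimal sketch, not code from
the JDBC driver; the class and method names are made up for this example),
the same bytes coming back from a SQL_ASCII database turn into different
Java Strings depending on which encoding the client guesses:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any driver.
class AmbiguousBytes {
    // Decode the same raw bytes under a caller-chosen charset.
    static String decodeAs(byte[] raw, Charset guess) {
        return new String(raw, guess);
    }

    public static void main(String[] args) {
        // 0xE9 is e-acute in ISO-8859-1, but an incomplete sequence in UTF-8.
        byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };

        String asLatin1 = decodeAs(raw, StandardCharsets.ISO_8859_1);
        String asUtf8 = decodeAs(raw, StandardCharsets.UTF_8);

        // The two guesses disagree; the UTF-8 guess silently substitutes
        // U+FFFD (the replacement character) for the byte it cannot decode.
        System.out.println(asLatin1.equals(asUtf8));
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0);
    }
}
```

Nothing in the bytes themselves says which guess is right, which is the
problem the driver faces.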

I'm not suggesting that a 7-bit encoding is necessarily useful to
everyone. I'm saying that we should make it a setting that users have to
think about and set correctly before they can insert 8-bit data. If they
decide they want SQL_ASCII and the associated client_encoding problems,
rather than an appropriate encoding the database understands, so be it;
but it's on their own head, and requires active intervention before the
database starts losing encoding information.
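
To make "7-bit" concrete, here is a minimal sketch of the check such a
default encoding would amount to. This is an assumption about what the
check would look like, written as client-side Java with hypothetical
names; it is not server code:

```java
// Hypothetical illustration of a 7-bit-only acceptance test.
class SevenBitCheck {
    // Returns true only if every byte has its high bit clear (0x00..0x7F).
    static boolean isSevenBit(byte[] raw) {
        for (byte b : raw) {
            // Java bytes are signed, so 0x80..0xFF show up as negative values.
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = { 'c', 'a', 'f', 'e' };
        byte[] accented = { 'c', 'a', 'f', (byte) 0xE9 };

        System.out.println(isSevenBit(plain));    // accepted under a 7-bit encoding
        System.out.println(isSevenBit(accented)); // would require choosing a real encoding
    }
}
```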

If SQL_ASCII plus 8-bit data is considered the right thing to do, then
I'd consider the ability to change client_encoding on a SQL_ASCII
database without an error to be a bug -- you've asked the server to give
you (for example) UTF8, but it isn't doing that. In that case, can we
get this to generate an error when client_encoding is set instead of
producing invalid output?
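
For what it's worth, a client that asked for UTF8 can at least detect
that it did not get UTF8 by decoding strictly instead of leniently. The
sketch below uses the standard java.nio.charset API; the class and method
names are hypothetical, and this is not how the JDBC driver is known to
behave:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of strict validation on the client side.
class StrictUtf8 {
    // Decode as UTF-8, throwing instead of substituting U+FFFD on bad input.
    static String decodeStrict(byte[] raw) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    // Convenience wrapper: true if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] raw) {
        try {
            decodeStrict(raw);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ISO-8859-1 bytes for "cafe" with an accented final letter.
        byte[] latin1Bytes = { 'c', 'a', 'f', (byte) 0xE9 };
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin1Bytes));
        System.out.println(isValidUtf8(utf8Bytes));
    }
}
```

This only catches byte sequences that are malformed as UTF-8; data in
some other encoding that happens to form valid UTF-8 would still slip
through, so it is no substitute for the server raising the error.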

-O
