
Re: Non-ASCII DSN name troubles

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	"Inoue, Hiroshi" <inoue(at)tpf(dot)co(dot)jp>, "pgsql-odbc(at)postgresql(dot)org" <pgsql-odbc(at)postgresql(dot)org>
Subject:	Re: Non-ASCII DSN name troubles
Date:	2014-06-24 06:57:46
Message-ID:	53A9216A.4000805@vmware.com
Lists:	pgsql-odbc

On 06/23/2014 11:58 PM, Inoue, Hiroshi wrote:
> (2014/06/21 20:37), Heikki Linnakangas wrote:
>> If you try to create a data source with a name that contains non-ASCII
>> characters, funny things will happen. I wouldn't expect the ANSI driver
>> to support that, but a Unicode driver ought to handle it.
>
> Currently NON-ascii characters are not recommended because they are
> mainly used at connection time.

Note that the DSN name is never sent to the server. Even if we conclude
that we want to keep the behavior of username, password and database as
is, we should still allow the DSN name to contain any characters.

> Though Unicode version SQLDriverConnect
> uses UTF-8 encoded user, password, database ... because I don't think of
> other ways, it has little meaning IMHO. Was there a decision that
> the encoding of user, password or database is utf-8?

I'm not sure what you mean. There have been no changes in the server around
this. The server just treats the username, password and database name as raw
bytes, which is unfortunate, but we'll just have to deal with it in the
driver.

The question is, what encoding should we use to send the username,
password and database to the server?

1. Current behavior: The username, password and database are encoded
using the current Windows ANSI codepage. If there are characters that
cannot be encoded using the ANSI codepage, Windows will replace them with "?".

2. Behavior with the patch: The username, password and database are
always encoded using UTF-8, when using the Unicode driver.
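To make the difference concrete, here is a minimal sketch of what each option does to the same string before it goes into the startup packet. It assumes cp1252 as the "current Windows ANSI codepage", and Python's `errors="replace"` only approximates the Windows substitution with "?". The function names are hypothetical, not psqlODBC code.

```python
# Hypothetical illustration, not driver code.
# cp1252 stands in for "the current Windows ANSI codepage".

def encode_current(value: str, ansi_codepage: str = "cp1252") -> bytes:
    """Option 1: encode with the ANSI codepage; unencodable chars become '?'."""
    return value.encode(ansi_codepage, errors="replace")


def encode_patched(value: str) -> bytes:
    """Option 2 (Unicode driver with the patch): always UTF-8."""
    return value.encode("utf-8")


for name in ["heikki", "café", "Жанна"]:
    print(name, encode_current(name), encode_patched(name))
```

Pure ASCII comes out identical either way. "café" becomes b'caf\xe9' under option 1 and b'caf\xc3\xa9' under option 2. The Cyrillic name cannot be represented in cp1252 at all, so option 1 sends nothing but question marks.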

Both behaviors have pros and cons. If you assume that the server uses
UTF-8, and the client uses the Unicode driver and is fully
Unicode-enabled, then the patched behavior is clearly better. With the
current behavior, if, e.g., the username contains any non-ASCII characters,
you cannot connect.

But if you assume that the server is not using UTF-8, but LATIN1 for
example, and the client uses the Unicode driver, then the current
behavior is better. It will allow the client to connect, assuming that
the Windows ANSI codepage is set to LATIN1, while with the patch it will
not work. However, if the server and client both use LATIN1 rather than
Unicode/UTF-8, then you probably should be using the ANSI driver instead.
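The two scenarios above can be modeled with a small sketch. It assumes, as a simplification, that the role name is stored on the server as raw bytes in the server's encoding and compared byte for byte with what the client sends. It also uses cp1252 for the client codepage, which agrees with LATIN1 for "é" (0xE9). The helper names are hypothetical.

```python
# Simplified model: the server compares the bytes the client sends with the
# raw bytes it has stored for the role name, which are assumed here to be in
# the server's own encoding.

def server_stored_name(name: str, server_encoding: str) -> bytes:
    return name.encode(server_encoding)


def can_connect(sent: bytes, name: str, server_encoding: str) -> bool:
    return sent == server_stored_name(name, server_encoding)


name = "café"
current = name.encode("cp1252", errors="replace")  # option 1: ANSI codepage
patched = name.encode("utf-8")                      # option 2: always UTF-8

for server_encoding in ["utf-8", "latin-1"]:
    print(server_encoding,
          "current:", can_connect(current, name, server_encoding),
          "patched:", can_connect(patched, name, server_encoding))
```

Under this model, the current behavior fails against the UTF-8 server and succeeds against the LATIN1 server. The patched behavior does the opposite. That is the trade-off described above.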

Overall, I think the patched behavior is better.

If we want to make it really flexible, we could add a new parameter to
explicitly specify the encoding used for username, password and
database. Then you could connect to any database with the Unicode
driver, as long as you set the parameter correctly.
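A rough sketch of how such a parameter could work follows. The option name `LoginEncoding` is purely hypothetical and not an existing psqlODBC setting. The parser is also simplified: it ignores the `{...}` quoting that real ODBC connection strings allow. When the option is absent, the sketch defaults to UTF-8, matching the patched behavior. It encodes strictly, so an unencodable character raises an error instead of silently becoming "?". The DSN value is never encoded, since it is never sent to the server.

```python
# Hypothetical sketch: "LoginEncoding" is an illustrative name only.

def parse_conn_str(conn_str: str) -> dict[str, str]:
    """Split 'KEY=value;KEY2=value2' into a dict with upper-cased keys."""
    params = {}
    for part in conn_str.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().upper()] = value.strip()
    return params


def login_bytes(conn_str: str) -> dict[str, bytes]:
    """Encode only the values that are sent to the server."""
    params = parse_conn_str(conn_str)
    # Default to UTF-8 (the patched behavior) when the option is absent.
    encoding = params.get("LOGINENCODING", "utf-8")
    # Strict encoding: raises UnicodeEncodeError instead of substituting '?'.
    return {
        key: params[key].encode(encoding)
        for key in ("UID", "PWD", "DATABASE")
        if key in params
    }


print(login_bytes("DSN=café;UID=café;PWD=pw;DATABASE=test;LoginEncoding=latin-1"))
```

With `LoginEncoding=latin-1`, the username is sent as b'caf\xe9'. Without the option, it would be sent as b'caf\xc3\xa9'.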

- Heikki

In response to

Re: Non-ASCII DSN name troubles at 2014-06-23 20:58:17 from Inoue, Hiroshi
