PostgreSQL 7.1 and bugs with locale support

From: pgsql-bugs(at)postgresql(dot)org
To: pgsql-bugs(at)postgresql(dot)org
Subject: PostgreSQL 7.1 and bugs with locale support
Date: 2001-04-04 23:23:07
Message-ID: 200104042323.f34NN7i14830@hub.org
Lists: pgsql-bugs

Rob Gaszewski (graszew(at)poland(dot)com) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
PostgreSQL 7.1 and bugs with locale support

Long Description
I've discovered bugs in locale support in PostgreSQL (encoding set to UNICODE, locale set to pl_PL).

I've compiled PostgreSQL 7.1RC2 with --enable-multibyte=UNICODE
--enable-unicode-conversion --enable-locale

locale settings:
LANG=pl_PL LC_ALL=pl_PL LC_CTYPE=pl_PL LC_COLLATE=pl_PL LC_MONETARY=pl_PL

I have Debian GNU/Linux 2.2 "Potato" - Intel Celeron - kernel 2.2.19
PostgreSQL compiled with gcc 2.95.2 - glibc 2.1

When I try SELECT UPPER('some_text_with_polish_national_chars'); or
SELECT LOWER('some_text_with_polish_national_chars'); I get wrong results.
But when I try upper() and lower() functions with other chars (a...z A...Z)
everything works OK.
Detailed results below.

Tests done with Polish national characters (values are UTF-8 byte sequences):
UPPER(char)

 No | char | hex    || result | should be | conclusion
----|------|--------||--------|-----------|-----------
  1 | ą    | 0xc485 || 0xc485 | 0xc484    | WRONG
  2 | ć    | 0xc487 || 0xc487 | 0xc486    | WRONG
  3 | ę    | 0xc499 || 0xc499 | 0xc498    | WRONG
  4 | ł    | 0xc582 || 0xc582 | 0xc581    | WRONG
  5 | ń    | 0xc584 || 0xc584 | 0xc583    | WRONG
  6 | ó    | 0xc3b3 || 0xc3a3 | 0xc393    | WRONG
  7 | ś    | 0xc59b || 0xc59b | 0xc59a    | WRONG
  8 | ź    | 0xc5ba || 0xc5aa | 0xc5b9    | WRONG
  9 | ż    | 0xc5bc || 0xc5ac | 0xc5bb    | WRONG
    |      |        ||        |           |
 10 | Ą    | 0xc484 || 0xc484 | 0xc484    | OK
 11 | Ć    | 0xc486 || 0xc486 | 0xc486    | OK
 12 | Ę    | 0xc498 || 0xc498 | 0xc498    | OK
 13 | Ł    | 0xc581 || 0xc581 | 0xc581    | OK
 14 | Ń    | 0xc583 || 0xc583 | 0xc583    | OK
 15 | Ó    | 0xc393 || 0xc393 | 0xc393    | OK
 16 | Ś    | 0xc59a || 0xc59a | 0xc59a    | OK
 17 | Ź    | 0xc5b9 || 0xc5b9 | 0xc5b9    | OK
 18 | Ż    | 0xc5bb || 0xc5bb | 0xc5bb    | OK
----|------|--------||--------|-----------|-----------

LOWER(char)

 No | char | hex    || result | should be | conclusion
----|------|--------||--------|-----------|-----------
  1 | ą    | 0xc485 || 0xe485 | 0xc485    | WRONG
  2 | ć    | 0xc487 || 0xe487 | 0xc487    | WRONG
  3 | ę    | 0xc499 || 0xe499 | 0xc499    | WRONG
  4 | ł    | 0xc582 || 0xe582 | 0xc582    | WRONG
  5 | ń    | 0xc584 || 0xe584 | 0xc584    | WRONG
  6 | ó    | 0xc3b3 || 0xe3b3 | 0xc3b3    | WRONG
  7 | ś    | 0xc59b || 0xe59b | 0xc59b    | WRONG
  8 | ź    | 0xc5ba || 0xe5ba | 0xc5ba    | WRONG
  9 | ż    | 0xc5bc || 0xe5bc | 0xc5bc    | WRONG
    |      |        ||        |           |
 10 | Ą    | 0xc484 || 0xe484 | 0xc485    | WRONG
 11 | Ć    | 0xc486 || 0xe486 | 0xc487    | WRONG
 12 | Ę    | 0xc498 || 0xe498 | 0xc499    | WRONG
 13 | Ł    | 0xc581 || 0xe581 | 0xc582    | WRONG
 14 | Ń    | 0xc583 || 0xe583 | 0xc584    | WRONG
 15 | Ó    | 0xc393 || 0xe393 | 0xc3b3    | WRONG
 16 | Ś    | 0xc59a || 0xe59a | 0xc59b    | WRONG
 17 | Ź    | 0xc5b9 || 0xe5b9 | 0xc5ba    | WRONG
 18 | Ż    | 0xc5bb || 0xe5bb | 0xc5bc    | WRONG
----|------|--------||--------|-----------|-----------
Letters 1 to 9 are lowercase; letters 10 to 18 are uppercase.
For example, letter 12 is the capital version of letter 3.
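
The "should be" columns above can be cross-checked independently of PostgreSQL. The following is a minimal sketch in Python 3, assuming its built-in Unicode case mapping is an acceptable reference for these nine letters; the names LOWER_LETTERS, utf8_hex and expected_rows are illustrative and not part of the report.

```python
# The nine Polish lowercase letters, in the same order as rows 1-9 of the tables.
LOWER_LETTERS = "ąćęłńóśźż"


def utf8_hex(text):
    """Return the UTF-8 encoding of text as a hex string such as '0xc485'."""
    return "0x" + text.encode("utf-8").hex()


def expected_rows():
    """Build (lower_hex, upper_hex) pairs using Python's Unicode case mapping."""
    rows = []
    for ch in LOWER_LETTERS:
        rows.append((utf8_hex(ch), utf8_hex(ch.upper())))
    return rows


if __name__ == "__main__":
    for lower_hex, upper_hex in expected_rows():
        # UPPER(lower) should give upper_hex; LOWER(upper) should give lower_hex.
        print(lower_hex, "<->", upper_hex)
```

Each printed pair corresponds to one lowercase row and its uppercase counterpart, for example rows 3 and 12.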

I've also discovered that rows are sorted (ORDER BY) improperly.

The "automatic encoding translation between backend and frontend" also works
improperly. For example, after setting the client encoding with \encoding LATIN2
and running the test
SELECT upper('acelnoszx'); (these are Polish national chars, not the ASCII ones),
I keep getting the message:

utf_to_latin: could not convert UTF-8 (0xc3a3) ignored
(repeated 3x for different chars).

The letters are not converted to uppercase, either.
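
The conversion message can be related to the UPPER() table. The sketch below assumes that Python's iso8859_2 codec is a reasonable stand-in for the backend's UTF-8 to LATIN2 conversion (the report does not show the backend's conversion tables), and takes the nine observed UPPER() results for the lowercase letters straight from that table. The names OBSERVED_UPPER_RESULTS and unconvertible_to_latin2 are illustrative.

```python
# Observed UPPER() results for rows 1-9, copied from the report's table
# (hex of the UTF-8 byte sequences, without the 0x prefix).
OBSERVED_UPPER_RESULTS = [
    "c485", "c487", "c499", "c582", "c584",
    "c3a3", "c59b", "c5aa", "c5ac",
]


def unconvertible_to_latin2(hex_sequences):
    """Return the sequences whose character has no ISO-8859-2 (LATIN2) encoding."""
    failures = []
    for hex_seq in hex_sequences:
        char = bytes.fromhex(hex_seq).decode("utf-8")
        try:
            char.encode("iso8859_2")
        except UnicodeEncodeError:
            failures.append("0x" + hex_seq)
    return failures


if __name__ == "__main__":
    print(unconvertible_to_latin2(OBSERVED_UPPER_RESULTS))
```

Under that assumption, exactly three of the nine results have no LATIN2 equivalent, including the 0xc3a3 named in the message. This is consistent with the message appearing three times, although the report does not say which other characters were involved.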

When I run all the tests with PostgreSQL compiled only with --enable-locale, everything works correctly.

Unfortunately, unicode support is a must because of the i18n issues with Tcl 8.x.

Greetings,
Robert

------------------
Robert Gaszewski
graszew(at)poland(dot)com

Sample Code

No file was uploaded with this report
