Re: BUG #14885: mistake in sorting win1251 chars

From: Kalin Daskalov <k(dot)daskalov(dot)911(at)gmail(dot)com>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: BUG #14885: mistake in sorting win1251 chars
Date: 2017-11-03 11:18:53
Message-ID: CAPxEw0rqMwaDTkE5iQhUaqTCSF_jugVZmC2HpU1dyzXMiFZUog@mail.gmail.com
Lists: pgsql-bugs

Hi,

It's the proper locale. It's set to Bulgarian everywhere.

- Windows settings:
Format: Bulgarian (Bulgaria)
Location: Bulgaria
Current language for non-Unicode programs: Bulgarian (Bulgaria)

- PostgreSQL Database settings:
ENCODING = 'UTF8'
LC_COLLATE = 'Bulgarian_Bulgaria.1251'
LC_CTYPE = 'Bulgarian_Bulgaria.1251'

As you suggest, if I'm confident it is right, I can try to document it
further and try to get the collation changed.
I'm just not sure yet: I have to check whether it depends on the rules of
the Russian language, and whether Russian speakers consider 'И' and 'Й'
different letters. It is probably the same as in Bulgarian.

For now, I have found that the following settings work well, but I'm not
yet sure whether they have other side effects:
ENCODING = 'UTF8'
LC_COLLATE = 'C'
LC_CTYPE = 'Bulgarian_Bulgaria.1251'
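A minimal sketch of what LC_COLLATE = 'C' amounts to, assuming (as the PostgreSQL documentation describes the C collation) that strings are sorted strictly by their byte values in the database encoding, here UTF8. The helper `c_collation_key` is hypothetical. Because UTF-8 byte order follows code point order, and the Bulgarian alphabet appears in code point order within U+0410..U+042F, 'И' and 'Й' stay apart. It also shows one likely side effect: every uppercase Cyrillic letter sorts before every lowercase one.

```python
def c_collation_key(word: str) -> bytes:
    """Hypothetical approximation of LC_COLLATE = 'C' on a UTF8 database:
    compare the raw UTF-8 bytes."""
    return word.encode("utf-8")


# 'И' is D0 98 and 'Й' is D0 99 in UTF-8, so they never compare equal
print("И".encode("utf-8").hex(), "Й".encode("utf-8").hex())  # d098 d099

words = ["агне", "Истина", "Йордан", "Явор", "Иван"]
# Side effect: uppercase 'Я' (U+042F) sorts before lowercase 'а' (U+0430)
print(sorted(words, key=c_collation_key))
# ['Иван', 'Истина', 'Йордан', 'Явор', 'агне']
```

A dictionary would place "агне" first; byte order puts it last, which is worth checking against any query that sorts mixed-case text.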

Thanks a lot,
Kalin Daskalov

On Fri, Nov 3, 2017 at 9:22 AM, Francisco Olarte <folarte(at)peoplecall(dot)com>
wrote:

> Kalin:
>
> On Thu, Nov 2, 2017 at 6:27 PM, Kalin Daskalov <k(dot)daskalov(dot)911(at)gmail(dot)com>
> wrote:
> > I understand you well, and this is exactly the situation.
> ...
> > I have to admit that this is not a PostgreSQL problem.
>
> Ok then.
>
> > In fact, my previous comparisons were based on ASCII comparison, i.e. on
> > the order of the chars.
>
> I doubt it was ASCII. ASCII is a 7-bit code. You were probably using
> an 8-bit code partially based on ASCII (like the ISO-8859-1 typically
> used in Spain, or its superset win-1252). What you were doing was
> probably a lexicographic compare using the unsigned 8-bit values. This
> is good enough to keep a table for a bsearch or to build a btree, but it
> is not what modern collations do (among other things, they collate upper
> and lower case together, as paper dictionaries normally do).
>
> > Now I have tested with an ANSI comparison done with MS Windows system
> > functions, and the result is the same as in PostgreSQL.
>
> Also remember that what you refer to as Windows is probably Windows NT,
> which has been internally Unicode since the beginning. Besides, it's
> been 15 years since I used it, but even then the Windows API had lots
> of ways to do things.
>
>
> > But this is not appropriate. In the Cyrillic alphabet these are
> > different letters, and no Bulgarian speaker expects this behavior.
> > It's almost like deciding that the Latin letters "i" and "y" should
> > behave this way.
>
> I'm not a Bulgarian speaker, but you should raise it with them. And the
> i/y letter behaviour depends on the language: "i" is a vowel, but in
> Spanish "y" may or may not be one, depending on the word. It sorts
> between x and z, but that has always been that way. Not knowing
> Bulgarian, I do not know whether the two letters you used are different,
> like n and ñ in Spanish, or not, like a and à. If you are confident you
> are right, you could try to document it further and try to get the
> collation changed, but I would consult some references first.
>
> Also, which locale are you using? Remember that collation order depends on it.
>
> Francisco Olarte
>

--
Kalin Daskalov,
k(dot)daskalov(dot)911(at)gmail(dot)com
