From: | Francisco Olarte <folarte(at)peoplecall(dot)com> |
---|---|
To: | Durumdara <durumdara(at)gmail(dot)com> |
Cc: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: String comparison problem in select - too many results |
Date: | 2018-01-12 10:08:39 |
Message-ID: | CA+bJJbx-GWK5GPF3B8G+LQy=rf_MWpdLTV35BP8of7j54p6pmw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Durumdara:
On Thu, Jan 11, 2018 at 1:23 PM, Durumdara <durumdara(at)gmail(dot)com> wrote:
> Dear Francesco!
FrancIsco, with an I, Spanish, not Italian.
> My "bug" is that I commonly used Windows environment where the default
> collation is ANSI, based on Windows language which is Hungarian here
> (Windows1250).
I'm not sure ANSI is a collation in windows. You should read a bit
about locales, as they are not the same as collations. In fact I
suspect the order issues you've got are not UTF8 derived, as all your
data is ASCII(7bits), which represents the same in ANSI and UTF8 or
ISO-8859-whatever ( IIRC win1250 is a bastard superset of ISO8859-1 )
> But because of special characters we used UTF8 to store data in database.
> I supposed that UTF8.hu_HU is working like local natural order here, and the
> common ASCII chars are (like '/') in same position.
You must define "local natural order". But your problem is in the
locale, not in the encoding. If you managed to use win1250.hu_HU your
sorting will be the same, what you need is to use UTF8.C
> Python/Delphi/LibreOffice can sort these numbers correctly (based on local
> ANSI sort).
ANSI does not define sort order. And those three use, IIRC, C-locale
like sort. You are incorrectly assuming this is the correct one.
> I supposed that UTF8.hu_HU is using mostly same order which is valid here
> and it contains all ASCII + ANSI characters we are using here in daily work,
> and they are in very similar order.
> I never thought that it can't handle normal characters in 7 bit range...
It can. Your assumptions are BAD. You are not used to working with
collation-aware systems like postgres, and you assume they must use
the same as non-locale-awaer programs by default. This is normally
never true.
Also, windows is notoriously dificult to configure for locales. IIRC (
haven't used it in 15 years ) you had to go to keyboard preferences to
change it.
> For these numbers I can use C collation, it's ok.
C collation is like sorting raw bytes, it doesn't event sort
upper/lower case correctly ( Do not know how you do it in HU, but in
ES we sort aA before bB, while C locale normally sorts AB..ab.. It's
what non-locale aware programs use because it's dirt cheap to
implement, just sort the data as unsigned byte arrays
lexicographically. And in fact, IIRC, utf-8 is dessigned in a way that
this works too ( You can sort by raw bytes and you get the data sorted
lexicographically sorted by code points ).
Another thing, if you have a column which you want sorted in C locale,
it can be defined as such so it does it by default ( but if you do it
with an alter column or similar you'll need a reindex probably ).
As a final note, please, DO NOT TOP POST and TRIM YOUR QUOTES.
Specially I do not need my signature quoted.
Francisco Olarte.
From | Date | Subject | |
---|---|---|---|
Next Message | Viktor Fougstedt | 2018-01-12 10:27:49 | Incredibly slow queries on information_schema.constraint_column_usage? |
Previous Message | Raghavendra Rao J S V | 2018-01-12 07:56:23 | pg_basebackup is taking more time than expected |