From: | Cody Pisto <cpisto(at)rvweb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: lc_collate issue |
Date: | 2007-08-24 20:47:15 |
Message-ID: | 46CF43D3.9080707@rvweb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Tom,
Thanks for answering,
I pretty much assumed that was the case (whatever library postgres is
using for encoding is causing the "issue")
The[my] problem is, it just seems like completely incorrect behavior..
The quickest and dirtiest examples I can do are that both python and
mysql sort these 3 example strings (in utf8 encoding) the way I would
expect (like a C locale)
python:
>>> x = [unicode("Somethang", "utf8"), unicode("-SOMETHING ELSE-",
"utf8"), unicode("Something else", "utf8")]
>>> x
[u'Somethang', u'-SOMETHING ELSE-', u'Something else']
>>> x.sort()
>>> x
[u'-SOMETHING ELSE-', u'Somethang', u'Something else']
mysql:
mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.02 sec)
mysql> SELECT x FROM (SELECT 'Something else' AS x UNION SELECT
'-SOMETHING ELSE-' AS x UNION SELECT 'Somethang' AS x) y ORDER BY LOWER(x);
+------------------+
| x |
+------------------+
| -SOMETHING ELSE- |
| Somethang |
| Something else |
+------------------+
3 rows in set (0.00 sec)
postgres:
SELECT x FROM (SELECT 'Something else' AS x UNION SELECT '-SOMETHING
ELSE-' AS x UNION SELECT 'Somethang' AS x) y ORDER BY LOWER(x);
x
------------------
Somethang
-SOMETHING ELSE-
Something else
(3 rows)
And I bet oracle, firebird, sqlite, mssql, and everything else out there
that does utf8 would return it in the "right" order (I'm willing to test
that too if needed..)
If this is potentially a problem in postgres somewhere, point me in the
general direction and I'm more than willing to fix it myself..
Thanks for your consideration..
-Cody
Tom Lane wrote:
> Cody Pisto <cpisto(at)rvweb(dot)com> writes:
>
>> I'm looking for any kind of a reason (and potential workarounds), be it
>> bug or otherwise, why the following two queries produce different
>> results under a database encoding of UTF8 and lc_collate of en_US.UTF-8:
>>
>
> That's just how it is in most non-C locales --- they use some weird
> algorithm that's alleged to approximate what dictionary makers
> traditionally do with phrases. I don't recall the details but there's
> something about multiple passes with spaces being ignored in earlier
> passes. You'll find that sort(1) sorts these lines the same way.
>
> If you don't like it, use C locale, or put together your own locale
> definition. (No, I'm not sure how hard that is ...)
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
>
>
>
--
*Cody Pisto*
Redzia RVs
10555 Montgomery NE
Suite 80
Albuquerque, NM 87111
Phone: (866) 844-1986
Fax: (866) 204-4403
From | Date | Subject | |
---|---|---|---|
Next Message | Matthew | 2007-08-24 20:55:42 | Add Column BEFORE/AFTER another column |
Previous Message | Tom Lane | 2007-08-24 20:16:37 | Re: lc_collate issue |