From: | Dennis Gearon <gearond(at)cvc(dot)net> |
---|---|
To: | Kathy zhu <Kathy(dot)zhu(at)Sun(dot)COM> |
Cc: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Gianni Mariani <gianni(at)mariani(dot)ws>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Sorting Problem |
Date: | 2003-08-13 21:43:08 |
Message-ID: | 3F3AB0EC.6040307@cvc.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
From what I've read, the db will ONLY sort on the language it was 'initdb'd on. I guess the 'C' locale should would sort it sort of, in other languages.
Kathy zhu wrote:
> Ok, thanks for all the discussion followed, vey educational :-))
> But nobody really followed up my question :-(
>
> For example, you have a table that is displayed in the browser. You want
> to let the user do sorting on one or multible columns, including those
> which contain localized strings.
>
> If the db supports sorting for multi-linguages, you can retreive all the
> rows and do "ORDER BY" on the columns directly.
>
> But if db doesn't support that, in Java, you will have to retrieve all
> the rows first, sort the objects on multiple columnes listed in the
> "ORDER BY" in java. It is much slower than the previous one.
>
> My problem is that if initdb is done with en_US, if a customer in Japan
> request the page with that sorted table, the text is not sorted in a
> Japanese one. Then I have a problem.
>
> Any suggestions on that ??
>
>
> Dennis Gearon wrote:
>
>> I agree, mostly. In the case of a database, I would bet that the
>> INTERNAL, IN-APPLICATION processing FAAAAAAAAAAARRRRRRRRRRR exceeds
>> that of sending and receiving it.
>>
>> i.e. comparisons, sorts, triggers, indexes, views, functions,
>> logging to tables, ordering by,
>> grouping, etc.
>>
>> except backups, restores, logging to text files (these would be
>> good in UTF8)
>>
>> Bruce Momjian wrote:
>>
>>> I think the question is how often are you passing data around/storing it
>>> _in_ your application and how often are you processing it.
>>>
>>> ---------------------------------------------------------------------------
>>>
>>>
>>> Dennis Gearon wrote:
>>>
>>>> I agree with all of that except for one caveat:
>>>>
>>>> all my reading, and just general off the cuff thinking, says
>>>> that processing variable width characters SIGNIFICANTLY slows an
>>>> application. It seems better to PROCESS fixed width characters
>>>> (1,2,4 byte), and TRANSMIT variable width characters (avoiding the
>>>> null problem.)
>>>>
>>>> Gianni Mariani wrote:
>>>>
>>>>
>>>>> Dennis Gearon wrote:
>>>>>
>>>>>
>>>>>> Got a link to that section of the standard, or better yet, to a
>>>>>> 'interpreted' version of the standard? :-)
>>>>>>
>>>>>> Stephan Szabo wrote:
>>>>>>
>>>>>>
>>>>>>> On Wed, 13 Aug 2003, Dennis Gearon wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Dennis Bj?rklund wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> In the future we need indexes that depend on the locale (and a
>>>>>>>>> lot of other changes).
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree. I've been looking at the web on this subject a lot
>>>>>>>> lately. I
>>>>>>>> am **NOT** a microslop fan, but SQL-SERVER even lets a user
>>>>>>>> define a
>>>>>>>> language(maybe encoding) down to the column level!
>>>>>>>>
>>>>>>>> I've been reading on GNU-C and on languages, encoding, and
>>>>>>>> localization.
>>>>>>>>
>>>>>>>> http://pauillac.inria.fr/~lang/hotlist/free/licence/fsf96/drepper/paper-1.html
>>>>>>>>
>>>>>>>> http://h21007.www2.hp.com/dspp/tech/tech_TechSingleTipDetailPage_IDX/1,2366,1222,00.html
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> There are three basic approaches to doing different langauges in
>>>>>>>> computerized text:
>>>>>>>>
>>>>>>>> A/ various adaptations of the 8 bit character set, I.E. the
>>>>>>>> ISO-8859-x series.
>>>>>>>> B/ wide characters
>>>>>>>> ********This should be how Postgress stores data
>>>>>>>> internally.********
>>>>>>>> C/ Multibyte characters
>>>>>>>> ********This is how Postgress should default to sending data
>>>>>>>> OUT of the application,
>>>>>>>> i.e. to the display or the web, or other system
>>>>>>>> applications********
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> SQL has a system for defining character set specifications,
>>>>>>> collations and
>>>>>>> such (per column/literal in some cases). We should probably look
>>>>>>> at it
>>>>>>> before making decisions on how to do things.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> I thought UNIX (SCOTM) systems also had a way of being able to
>>>>> define collation order.
>>>>>
>>>>> see:
>>>>> ftp://dkuug.dk/i18n/WG15-collection/locales
>>>>>
>>>>> for a collection of all ISO standardized locales (the WG15 ISO work
>>>>> group's stuff).
>>>>>
>>>>> Do a "man localedef" on most Linuxen or UNIXen.
>>>>>
>>>>> As for wide characters vs multibyte, there is no clear winner. The
>>>>> right answer DEPENDS on the situation.
>>>>>
>>>>> Wide characters on some platforms are 16 bit which means that when
>>>>> you do Unicode you'll still have problems with surrogate pairs
>>>>> (meaning that it's still multi (wide) char) so you still have all
>>>>> the problems of multi-byte encodings.
>>>>>
>>>>> You could decide to process everything in a PG specific 4 byte wide
>>>>> char and do all text in Unicode but the overhead in processing 4
>>>>> times the data is quite significant. The other option is to store
>>>>> all data in utf-8 and have all text code become utf-8 aware.
>>>>>
>>>>> I have found in practice that the utf-8 option is significantly
>>>>> easier to implement, 100% Unicode compliant and the best performer
>>>>> (because of reduced memory requirements).
>>>>> The Posix API's for locales are not very good for modern day
>>>>> programs, I'm not sure where the "mbr*" and the "wcr*" apis are in
>>>>> the standardization process but if these are not well supported,
>>>>> you're on your own and will need to implement similar functionality
>>>>> from scratch and for that matter, the collation functions all
>>>>> operate on a "current" locate which is really difficult to work
>>>>> with on multi-locale applications.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------(end of
>>>>> broadcast)---------------------------
>>>>> TIP 8: explain analyze is your friend
>>>>>
>>>>
>>>>
>>>> ---------------------------(end of
>>>> broadcast)---------------------------
>>>> TIP 7: don't forget to increase your free space map settings
>>>>
>>>
>>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: the planner will ignore your desire to choose an index scan if
>> your
>> joining column's datatypes do not match
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Ron Johnson | 2003-08-13 21:53:55 | Re: How to get the total number of rows with a query |
Previous Message | Jason Godden | 2003-08-13 21:34:55 | Re: importing db as text files |