Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From: James Keener <jim(at)jimkeener(dot)com>
To: John McKown <john(dot)archie(dot)mckown(at)gmail(dot)com>
Cc: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?
Date: 2017-12-12 15:11:10
Message-ID: CAG8g3txAxd0FMX_GAufmF-jrf+McUWZH7329tppKJfH5uvc01g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
> The default C locale on Linux (I don't know Windows) will sort "digits",
> then alphabetic with the lower then upper case of each letter in order
> like: "aAbB...zZ"
>

That's no true at all! The C locales are 0-9A-Za-z

#include <locale.h>
> #include <stddef.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
>
> static int myCompare (const void * a, const void * b)
> {
> return strcmp (*(const char **) a, *(const char **) b);
> }
>
> void with_other_locale (char *new_locale,
> const char **arr,
> const int n
> )
> {
> char *old_locale, *saved_locale;
>
> /* Get the name of the current locale. */
> old_locale = setlocale (LC_ALL, NULL);
>
> /* Copy the name so it won’t be clobbered by setlocale. */
> saved_locale = strdup (old_locale);
>
> if (strlen(new_locale) == 0)
> {
> new_locale = saved_locale;
> }
>
> /* Now change the locale and do some stuff with it. */
> setlocale (LC_ALL, new_locale);
> qsort (arr, n, sizeof (const char *), myCompare);
>
> printf("\nSorted array in locale %s is\n", new_locale);
> for (int i = 0; i < n; i++)
> printf("%d: %s \n", i, arr[i]);
>
> /* Restore the original locale. */
> setlocale (LC_ALL, saved_locale);
> free (saved_locale);
> }
>
> int main ()
> {
> const char *arr[] = {"Jim", "job", "Anne", "aardvark", "Isaac",
> "island", "12 Days of Christmas", "12 drummers"};
> int n = sizeof(arr)/sizeof(arr[0]);
> int i;
>
> printf("Given array is\n");
> for (i = 0; i < n; i++)
> printf("%d: %s \n", i, arr[i]);
>
> with_other_locale("", arr, n);
> with_other_locale("C", arr, n);
> with_other_locale("en_US.UTF-8", arr, n);
> with_other_locale("UTF-8", arr, n);
> return 0;
> }
>

Gives

Given array is
0: Jim
1: job
2: Anne
3: aardvark
4: Isaac
5: island
6: 12 Days of Christmas
7: 12 drummers

Sorted array in locale C is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale C is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale en_US.UTF-8 is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale UTF-8 is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

I actually don't think there is a case-insensitive locale (bydefault?) on a
unix machine.

Jim

On Tue, Dec 12, 2017 at 8:18 AM, John McKown <john(dot)archie(dot)mckown(at)gmail(dot)com>
wrote:

> On Tue, Dec 12, 2017 at 2:17 AM, Tsunakawa, Takayuki <
> tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
>
>> Hi Laurenz, Tom, Peter,
>>
>> Thanks for your suggestions. The practical solution seems to be to
>> override comparison operators of char, varchar and text data types with
>> UDFs that behave as Tom mentioned.
>>
>> From: Peter Geoghegan [mailto:pg(at)bowt(dot)ie]
>> > That said, the idea of an "EBCDIC collation" seems limiting. Why
>> > should a system like DB2 for the mainframe (that happens to use EBCDIC
>> > as its encoding) not have a more natural, human-orientated collation
>> > even while using EBCDIC? ISTM that the point of using the "C" locale
>> > (with EBDIC or with UTF-8 or with any other encoding) is to get a
>> > performance benefit where the actual collation's behavior doesn't
>> > matter much to users. Are you sure it's really important to be
>> > *exactly* compatible with EBCDIC order? As long as you're paying for a
>> > custom collation, why not just use a collation that is helpful to
>> > humans?
>>
>> You are right. I'd like to ask the customer whether and why they need
>> EBCDIC ordering.
>>
>
> ​This is a guess on my part, based on many years on an EBCDIC system. But
> I'll bet that they are doing a conversion off of the EBCDIC system (maybe
> Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They
> want to be able to compare the output from the existing system to the
> output on the new system. EBCDIC orders "lower case", "upper case", then
> "digits". The default C locale on Linux (I don't know Windows) will sort
> "digits", then alphabetic with the lower then upper case of each letter in
> order like: "aAbB...zZ". Comparing identical data which is not presented in
> exactly the same order would be very difficult. ​
>
>
>
>>
>> Regards
>> Takayuki Tsunakawa
>>
>>
>
>
> --
> I have a theory that it's impossible to prove anything, but I can't prove
> it.
>
> Maranatha! <><
> John McKown
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John McKown 2017-12-12 15:24:10 Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?
Previous Message John McKown 2017-12-12 13:18:10 Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?