Re: Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP()

From: Marko Karppinen <marko(at)karppinen(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rough draft for Unicode-aware UPPER()/LOWER()/INITCAP()
Date: 2004-05-16 18:56:16
Message-ID: B5E5BD42-A76A-11D8-9207-000A95C56374@karppinen.fi
Lists: pgsql-hackers

Tom Lane wrote:
> This code will only work if the database is running under an LC_CTYPE
> setting that implies the same encoding specified by server_encoding.
> However, I don't see that as a fatal objection, because in point of
> fact the existing upper/lower code assumes the same thing.

I think this interaction between the locale and server_encoding is
confusing. Is there any use case for running an incompatible mix?
If not, would it not make sense to fetch initdb's default database
encoding with nl_langinfo(CODESET) instead of using SQL_ASCII?

initdb could even emit a warning if the --encoding option was
used without also specifying --no-locale.
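
To make the nl_langinfo(CODESET) suggestion concrete, here is a rough
sketch of how the codeset implied by a locale could be looked up. It
is not taken from initdb: the helper name codeset_for_locale() and its
buffer-based interface are made up for illustration, and the mapping
from the returned codeset name to a PostgreSQL encoding name is left
out entirely.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing PostgreSQL function): copy the
 * codeset name implied by "locale" into buf.  Pass "" to use the
 * locale from the environment.  Returns 0 on success, -1 on failure.
 */
static int
codeset_for_locale(const char *locale, char *buf, size_t buflen)
{
    char        saved[256];
    const char *cur;
    const char *codeset;
    int         rc = -1;

    /* Remember the current LC_CTYPE so it can be restored afterwards. */
    cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL || strlen(cur) >= sizeof(saved))
        return -1;
    strcpy(saved, cur);

    /* Switch to the requested locale and ask for its codeset. */
    if (setlocale(LC_CTYPE, locale) != NULL)
    {
        codeset = nl_langinfo(CODESET);
        if (codeset[0] != '\0' && strlen(codeset) < buflen)
        {
            strcpy(buf, codeset);
            rc = 0;
        }
    }

    /* Put the previous locale back, whether or not the lookup worked. */
    setlocale(LC_CTYPE, saved);
    return rc;
}
```

A caller would fall back to SQL_ASCII when the function returns -1 or
when the codeset name has no matching server encoding. The exact
strings returned differ between platforms, so that mapping would need
some care.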

Using nl_langinfo(CODESET) was discussed and quietly dismissed a
year ago (although the topic was the client encoding back then).
But I think that the idea is worth revisiting because it would
allow UPPER() and LOWER() to work correctly with international
alphabets -- out of the box and without configuration -- on a
wide variety of modern systems.

mk
