Re: Camel case identifiers and folding

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
Cc: Rob Sargent <robjsargent(at)gmail(dot)com>, wim(dot)bertels(at)ucll(dot)be, Steve Haresnape <s(dot)haresnape(at)creativeintegrity(dot)co(dot)nz>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Camel case identifiers and folding
Date: 2019-03-16 21:11:06
Message-ID: 87bm2a6233.fsf@news-spur.riddles.org.uk
Lists: pgsql-general

>>>>> "Morris" == Morris de Oryx <morrisdeoryx(at)gmail(dot)com> writes:

Morris> UUIDs as a type are an interesting case in Postgres. They're
Morris> stored as a large numeric for efficiency (good!), but are
Morris> presented by default in the 36-character format with the dashes.
Morris> However, you can also search using the dashless 32-character
Morris> format ... and it all works. Case-insensitively.

That works because UUIDs have a convenient canonical form (the raw
bytes) which all input is converted to before comparison.
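
As a rough illustration of the canonical-form idea (using Python's
standard uuid module as a stand-in, not PostgreSQL's own input routine,
and with a made-up UUID value), every accepted spelling parses to the
same 16 raw bytes, so comparison never sees the text at all:

```python
import uuid

# Three spellings of one UUID (an arbitrary example value, not from the thread).
spellings = [
    "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11",  # 36 characters, lowercase, dashes
    "A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11",  # same, uppercase
    "a0eebc999c0b4ef8bb6d6bb9bd380a11",      # 32 characters, no dashes
]

# Parsing converts each spelling to the canonical form: 16 raw bytes.
parsed = [uuid.UUID(s) for s in spellings]
raw = {u.bytes for u in parsed}

all_equal = len(raw) == 1           # every spelling produced identical bytes
print(all_equal)
print(len(parsed[0].bytes))         # size of the canonical form in bytes
print(str(parsed[2]))               # output is always the dashed lowercase form
```

The point is only that equality is decided on the bytes, so case and
dashes in the input cannot matter.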

Text is ... not like this.

Even citext is really only a hack - it assumes that comparisons can be
done by conversion to lowercase, which may work well enough for English
but I'm pretty sure it does not correctly handle the edge cases in, for
example, German (consider 'SS', 'ss', 'ß') or Greek (final sigma). Doing
it better would require proper application of case-folding rules, and
even that would require handling of edge cases (the Unicode case folding
algorithm is designed to be language-independent, which means that it
breaks for Turkish without special-case exceptions).
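
A small sketch of those edge cases, using Python's str.lower() and
str.casefold() as stand-ins. This is an analogy only: citext relies on
the database's lower(), whose results depend on the locale in use, so
the exact behaviour in Postgres may differ. The helper names eq_lower
and eq_casefold are invented for this example.

```python
def eq_lower(a: str, b: str) -> bool:
    """Compare by lowercasing both sides (the citext-style approach)."""
    return a.lower() == b.lower()


def eq_casefold(a: str, b: str) -> bool:
    """Compare using Unicode full case folding (language-independent)."""
    return a.casefold() == b.casefold()


# German: lowercasing leaves 'ß' alone, so it never meets 'ss';
# case folding maps 'ß' to 'ss'.
german_lower = eq_lower("ß", "SS")
german_fold = eq_casefold("ß", "SS")

# Greek: final sigma 'ς' and medial sigma 'σ' are both already lowercase,
# so lowercasing keeps them distinct; case folding maps 'ς' to 'σ'.
greek_lower = eq_lower("ς", "σ")
greek_fold = eq_casefold("ς", "σ")

# Turkish: dotless 'ı' uppercases to 'I', so a Turkish reader expects
# these two to match, but the default folding sends 'I' to dotted 'i'.
turkish_fold = eq_casefold("I", "ı")

print(german_lower, german_fold)
print(greek_lower, greek_fold)
print(turkish_fold)
```

Folding fixes the German and Greek cases here but still gets Turkish
wrong, which is the language-independence problem described above.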

--
Andrew (irc:RhodiumToad)
