Re: BUG #13440: unaccent does not remove all diacritics

From: Curd Reinert <curdreinert(at)gmx(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Gradek <mike(at)busbud(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>, pgsql-bugs-owner(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: BUG #13440: unaccent does not remove all diacritics
Date: 2015-06-17 08:44:38
Message-ID: 55813376.2040204@gmx.de
Lists: pgsql-bugs

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote on 17.06.2015 00:01:48:
> Also, while my German is nearly nonexistent, I had the idea that sharp-S
> to "S" would be considered a case-folding transformation not an accent
> removal. Comments from German speakers welcome of course.
The sharp-s 'ß' is historically a ligature of two different kinds of s,
of which the first one looks more like an f and the second one looks
either like a normal 's' or a 'z' (that's why it is called 'szlig' in
HTML). It is usually considered to be a lower-case only character, even
though an uppercase sharp-s has recently been defined. If you are using
an encoding that doesn't support 'ß', the rule is to substitute it with
'ss'. If you want to capitalize a word containing a 'ß', you substitute
it with 'SS'. For sorting purposes, DIN 5007 says that 'ß' should be
treated as 'ss'.
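
To make the three rules above concrete, here is a minimal sketch in Python. The helper names expand_sharp_s and sort_key are made up for illustration, and the sort key only models the 'ß' as 'ss' rule, not the rest of DIN 5007 (umlauts and so on are ignored), so please don't read it as a real collation.

```python
def expand_sharp_s(text: str) -> str:
    """Substitute 'ß' with 'ss', e.g. for an encoding that lacks 'ß'.

    Hypothetical helper, not part of unaccent or any library.
    """
    return text.replace("ß", "ss")


def sort_key(word: str) -> str:
    """Simplified sort key: lowercase the word and treat 'ß' as 'ss'.

    Assumption: this covers only the 'ß' rule mentioned above,
    not the full DIN 5007 ordering.
    """
    return expand_sharp_s(word.lower())


# Rule 1: no 'ß' available in the encoding, so write 'ss'.
substituted = expand_sharp_s("Straße")

# Rule 2: capitalizing turns 'ß' into 'SS'. Python's str.upper()
# follows the Unicode special-casing rule for U+00DF here.
capitalized = "Straße".upper()

# Rule 3: for sorting, 'ß' is compared as if it were 'ss'.
words = ["Mast", "Maße", "Masse"]
by_code_point = sorted(words)           # 'ß' lands after every ASCII letter
by_rule = sorted(words, key=sort_key)   # 'Maße' and 'Masse' compare equal

print(substituted)
print(capitalized)
print(by_code_point)
print(by_rule)
```

None of these three operations maps 'ß' to a single 's'.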

That's just the German point of view. Things can be a little bit
different in other German-speaking countries, e.g. in Switzerland, where
you may always substitute 'ß' with 'ss' (even if your encoding has an 'ß').

In short: I would think that replacing 'ß' with 's' is wrong, and
certainly not an accent removal.

Best regards,

Curd
