Re: BUG #13440: unaccent does not remove all diacritics

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	mike(at)busbud(dot)com, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: BUG #13440: unaccent does not remove all diacritics
Date:	2015-06-15 04:47:01
Message-ID:	CAEepm=2b1df83h68tTiuk_xGC-PVmru02+rE2xp6_Hs5q_zHSg@mail.gmail.com
Lists:	pgsql-bugs

On Mon, Jun 15, 2015 at 5:59 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> mike(at)busbud(dot)com writes:
>> Sorry, I couldn't install the most recent minor release, but I did try this
>> on several different versions. I used Heroku to try a 9.4.3 build, and got
>> the same results
>
>> select 'ț' as input, unaccent('ț') as observed, 't' as expected;
>>  input | observed | expected
>> -------+----------+----------
>>  ț     | ț        | t
>> (1 row)
>
> Hm, I do see
>
> ţ t
>
> in unaccent.rules, so the transformation ought to happen. I suspect
> an encoding issue, eg your terminal window is not transmitting characters
> in the encoding Postgres thinks you're using. You did not provide any
> info about server encoding, client encoding, or client LC_xxx environment,
> so it's hard to debug from here.

The character in unaccent.rules is apparently t-cedilla (U+0163), used
in Gagauz and Romanian:

https://en.wiktionary.org/wiki/%C5%A3

The character in the bug report is apparently t-comma (U+021B), used in
Livonian and Romanian, but "[o]ften replaced by Ţ / ţ (t with cedilla),
especially in computing":

https://en.wiktionary.org/wiki/%C8%9B
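
To make the distinction concrete, here is a small Python sketch that inspects both characters with the standard unicodedata module. The code points are taken from the two URLs above (%C5%A3 is the UTF-8 encoding of U+0163, %C8%9B of U+021B). The helper names describe and strip_marks are made up for this illustration, and strip_marks is a generic decomposition-based approach, not how unaccent works (unaccent looks characters up in its rules file).

```python
import unicodedata

T_CEDILLA = "\u0163"  # the character listed in unaccent.rules
T_COMMA = "\u021b"    # the character from the bug report


def describe(ch):
    """Return (code point, Unicode name, NFD decomposition as code points)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        [f"U+{ord(c):04X}" for c in decomposed],
    )


def strip_marks(text):
    """Decompose to NFD and drop combining marks (illustrative only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    for ch in (T_CEDILLA, T_COMMA):
        print(describe(ch), "->", strip_marks(ch))
```

The two characters look nearly identical but are separate code points with different combining marks (U+0327 versus U+0326), so a rule keyed on one will not match the other.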

--
Thomas Munro
http://www.enterprisedb.com

In response to

Re: BUG #13440: unaccent does not remove all diacritics at 2015-06-14 17:59:18 from Tom Lane
