From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | mike(at)busbud(dot)com, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #13440: unaccent does not remove all diacritics |
Date: | 2015-06-15 04:47:01 |
Message-ID: | CAEepm=2b1df83h68tTiuk_xGC-PVmru02+rE2xp6_Hs5q_zHSg@mail.gmail.com |
Lists: | pgsql-bugs |
On Mon, Jun 15, 2015 at 5:59 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> mike(at)busbud(dot)com writes:
>> Sorry, I couldn't install the most recent minor release, but I did try this
>> on several different versions. I used Heroku to try a 9.4.3 build, and got
>> the same results
>
>> select 'ț' as input, unaccent('ț') as observed, 't' as expected;
>> input | observed | expected
>> -------+----------+----------
>> ț | ț | t
>> (1 row)
>
> Hm, I do see
>
> ţ t
>
> in unaccent.rules, so the transformation ought to happen. I suspect
> an encoding issue, eg your terminal window is not transmitting characters
> in the encoding Postgres thinks you're using. You did not provide any
> info about server encoding, client encoding, or client LC_xxx environment,
> so it's hard to debug from here.
The character in unaccent.rules is apparently t-cedilla (U+0163), from Gagauz
and Romanian:
https://en.wiktionary.org/wiki/%C5%A3
The character in the report above is apparently t-comma (U+021B), from Livonian
and Romanian, but "[o]ften replaced by Ţ / ţ (t with cedilla),
especially in computing":
https://en.wiktionary.org/wiki/%C8%9B
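
To make the difference visible, here is a minimal Python sketch (not part of unaccent itself) that prints the code point, Unicode name, and base letter of each character. The `describe` helper and the constant names are made up for illustration; only the standard `unicodedata` module is used.

```python
import unicodedata

# The two look-alike characters discussed in this thread.
T_CEDILLA = "\u0163"  # the character listed in unaccent.rules
T_COMMA = "\u021b"    # the character from the bug report


def describe(ch):
    """Return (code point, Unicode name, base letters left after NFD)."""
    # NFD splits a precomposed character into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    # Drop the combining marks, keeping only the base letter(s).
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return f"U+{ord(ch):04X}", unicodedata.name(ch), base


for ch in (T_CEDILLA, T_COMMA):
    print(describe(ch))
```

Both characters decompose to a plain "t", but they are distinct code points, so a rules file that lists only one of them would presumably leave the other untouched.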
--
Thomas Munro
http://www.enterprisedb.com