From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | hugh(at)whtc(dot)ca, daniel(at)manitou-mail(dot)org, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
Date: | 2018-12-16 02:26:20 |
Message-ID: | CAEepm=3gSmNWkteBxCEL-W+j1dmbcNzDin_iv+f_Om6o+1fAiA@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
On Sun, Dec 16, 2018 at 8:20 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hugh Ranalli <hugh(at)whtc(dot)ca> writes:
> > The problem is that I downloaded the latest version of the Latin-ASCII
> > transliteration file (r34 rather than the r28 specified in the URL). Over 3
> > years ago (in r29, of course) they changed the file format (
> > https://unicode.org/cldr/trac/ticket/5873) so that
> > parse_cldr_latin_ascii_transliterator loads an empty rules set.
>
> Ah-hah.
>
> > I'd be
> > happy to either a) support both formats, or b), support just the newest and
> > update the URL. Option b) is cleaner, and I can't imagine why anyone would
> > want to use an older rule set (then again, struggling with Unicode always
> > makes my head hurt; I am not an expert on it). Thoughts?
>
> (b) seems sufficient to me, but perhaps someone else has a different
> opinion.
>
> Whichever we do, I think it should be a separate patch from the feature
> addition for combining diacriticals, just to keep the commit history
> clear.
+1 for updating to the latest file from time to time. After
http://unicode.org/cldr/trac/ticket/11383 makes it into a new release,
our special_cases() function will be left with just four entries: the
two Cyrillic characters, plus DEGREE CELSIUS and DEGREE FAHRENHEIT. The
Cyrillic characters should almost certainly be handled by adding
Cyrillic to the ranges we handle via the usual code path. The degree
signs could possibly be extracted from UnicodeData.txt (or we could
just forget about them), and then we could drop special_cases().
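
To illustrate why those four characters look derivable rather than
special, here is a minimal sketch. It assumes Python's standard
unicodedata module as a stand-in for parsing the Unicode character
database file directly. The helper names compat_expansion() and
strip_marks() are hypothetical and are not part of the existing
script.

```python
import unicodedata


def compat_expansion(ch):
    """Return the <compat> decomposition of ch as a string, or None.

    Hypothetical helper: reads the same decomposition field that the
    Unicode character database records for each code point.
    """
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0] != "<compat>":
        return None
    return "".join(chr(int(cp, 16)) for cp in fields[1:])


def strip_marks(ch):
    """Canonically decompose ch and drop any combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    # The degree signs carry a compatibility decomposition.
    for ch in ("\u2103", "\u2109"):
        print(unicodedata.name(ch), "->", compat_expansion(ch))
    # The two Cyrillic letters are a base letter plus a diaeresis.
    for ch in ("\u0401", "\u0451"):
        print(unicodedata.name(ch), "->", strip_marks(ch))
```

If that holds for the real data file too, the degree signs would fall
out of the compatibility decompositions and the Cyrillic letters out
of the ordinary base-plus-combining-mark path.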
--
Thomas Munro
http://www.enterprisedb.com