From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | cees(dot)van(dot)zeeland(at)freedom(dot)nl, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18362: unaccent rules and Old Greek text |
Date: | 2024-02-25 23:15:57 |
Message-ID: | CA+hUKG+nL9VYx5S_mPnraXKLKcWP_WFkTrwKb1osq0q=am6fEw@mail.gmail.com |
Lists: | pgsql-bugs |
On Sun, Feb 25, 2024 at 4:21 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Sun, Feb 25, 2024 at 11:14 AM PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> > So, there are reasons to keep the current unaccent.rules as it is, but...
> > there are other reasons to add a few lines to it, f.e. after line 955 and
> > insert five greek vowels with Oxia
> > Please add:
> > ά α
Oh, I think I see it. "ά" is:
1F71;GREEK SMALL LETTER ALPHA WITH OXIA;Ll;0;L;03AC;;;;N;;;1FBB;;1FBB
The Python script is looking for combining sequences that add accents,
but this one has just "03AC" in the combining sequence field, so it's
a kind of "simple" redirection that points here:
03AC;GREEK SMALL LETTER ALPHA WITH TONOS;Ll;0;L;03B1 0301;;;;N;GREEK SMALL LETTER ALPHA TONOS;;0386;;0386
That has a normal looking sequence that we can understand (α + an
accent). If I tell the script to follow such "simple" redirections, I
get over a thousand new mappings, including those. See attached.
There is probably more correct terminology than I'm using here...
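For illustration, here is a minimal sketch of the idea, assuming the two UnicodeData.txt records quoted above as input. Unicode's own term for a canonical decomposition that maps to a single code point (like 1F71 → 03AC) is a "singleton". This is not the code of the Python script or of the attached patch. The names `SAMPLE`, `parse`, `is_combining` and `base_letter` are hypothetical. Two simplifications are assumptions: "accent" is reduced to the Combining Diacritical Marks block (U+0300..U+036F), and compatibility decompositions (`<compat>` and similar) are ignored.

```python
# Hypothetical sketch: follow singleton ("simple") decompositions until we
# reach a letter + combining-accent sequence. Not the real generator script.

# The two records quoted in the message (UnicodeData.txt format).
SAMPLE = """\
1F71;GREEK SMALL LETTER ALPHA WITH OXIA;Ll;0;L;03AC;;;;N;;;1FBB;;1FBB
03AC;GREEK SMALL LETTER ALPHA WITH TONOS;Ll;0;L;03B1 0301;;;;N;GREEK SMALL LETTER ALPHA TONOS;;0386;;0386
"""


def parse(text):
    """Map each code point to its decomposition (field 5) as a list of ints."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        decomp = fields[5].split()
        # Assumption: skip compatibility decompositions such as "<compat> ...";
        # this sketch only handles canonical ones.
        if decomp and decomp[0].startswith("<"):
            decomp = []
        table[int(fields[0], 16)] = [int(d, 16) for d in decomp]
    return table


def is_combining(cp):
    # Simplification: only the Combining Diacritical Marks block counts.
    return 0x0300 <= cp <= 0x036F


def base_letter(cp, table, follow_singletons=True):
    """Return the unaccented base code point for cp, or None if no rule applies."""
    decomp = table.get(cp, [])
    if len(decomp) == 1 and follow_singletons:
        # Singleton redirection, e.g. 1F71 -> 03AC: recurse on the target.
        return base_letter(decomp[0], table, follow_singletons)
    if len(decomp) >= 2 and all(is_combining(c) for c in decomp[1:]):
        # Letter + accents, e.g. 03AC -> 03B1 0301. The first element may
        # itself decompose further, so try that before settling on it.
        inner = base_letter(decomp[0], table, follow_singletons)
        return inner if inner is not None else decomp[0]
    return None


if __name__ == "__main__":
    table = parse(SAMPLE)
    for follow in (False, True):
        base = base_letter(0x1F71, table, follow)
        rule = f"U+1F71 -> U+{base:04X}" if base is not None else "no rule"
        print(f"follow_singletons={follow}: {rule}")
```

With `follow_singletons=False` the sketch finds no rule for U+1F71, because its decomposition field holds one code point rather than a letter plus an accent. With it enabled, the redirection to U+03AC is followed and U+1F71 maps to U+03B1 (α).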
Attachment | Content-Type | Size |
---|---|---|
0001-Add-simple-codepoint-redirections-to-unaccent.rules.patch | text/x-patch | 12.5 KB |