Re: [PATCH] Completed unaccent dictionary with many missing characters

From:	Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] Completed unaccent dictionary with many missing characters
Date:	2022-05-05 19:40:09
Message-ID:	425e10c2-95ae-8ff4-4185-ab9ebbfff16f@sztoch.pl
Lists:	pgsql-hackers

Peter Eisentraut wrote on 5/4/2022 5:17 PM:
> On 28.04.22 18:50, Przemysław Sztoch wrote:
>> The current unaccent dictionary does not include many popular numeric
>> symbols, for example: "m²" -> "m2"
> Seems reasonable.
>
> Can you explain what your patch does to achieve this?
I reused the existing Python implementation of the generator.
It is based on the ready-made Unicode data file:
src/common/unicode/UnicodeData.txt.
The current generator filters UnicodeData.txt too aggressively.
I relaxed these conditions, because the previous implementation focused
only on selected character types.
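
As a rough illustration of what "relaxing the filter" can mean, here is a
minimal sketch. It is not the code from the patch: the function names, the
letters_only flag, and the five-line inline sample are all hypothetical and
chosen only for this example. It reads lines in the UnicodeData.txt format
(code point in field 0, general category in field 2, decomposition in
field 5) and compares a filter that accepts only letters with a plain
decomposition against one that also accepts tagged decompositions such as
<super>.

```python
# Hypothetical sketch, not the generator from the patch.
# A few lines in UnicodeData.txt format, embedded so the example is self-contained.
SAMPLE = """\
0032;DIGIT TWO;Nd;0;EN;;2;2;2;N;;;;;
0065;LATIN SMALL LETTER E;Ll;0;L;;;;;N;;;;0045;;0045
00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;;00C9;;00C9
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
"""


def parse(text):
    """Map code point -> (general category, decomposition tag, target code points)."""
    table = {}
    for line in text.splitlines():
        fields = line.split(";")
        code = int(fields[0], 16)
        category = fields[2]
        parts = fields[5].split()
        # A compatibility decomposition starts with a tag such as <super>.
        tag = parts[0] if parts and parts[0].startswith("<") else None
        targets = [int(p, 16) for p in parts if not p.startswith("<")]
        table[code] = (category, tag, targets)
    return table


def build_rules(table, letters_only):
    """Return {source character: replacement text} for decomposable characters."""
    rules = {}
    for code, (category, tag, targets) in table.items():
        if not targets:
            continue  # no decomposition, nothing to rewrite
        if letters_only and (not category.startswith("L") or tag is not None):
            continue  # strict filter: letters with untagged decompositions only
        # Drop combining marks (general category M*) from the decomposition.
        kept = [t for t in targets
                if not table.get(t, ("", None, []))[0].startswith("M")]
        if kept:
            rules[chr(code)] = "".join(chr(t) for t in kept)
    return rules


strict_rules = build_rules(parse(SAMPLE), letters_only=True)
relaxed_rules = build_rules(parse(SAMPLE), letters_only=False)
```

With this sample, the strict variant produces a rule only for "é", while the
relaxed variant also maps "²" to "2", which is the kind of entry needed for
"m²" -> "m2".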

Browsing the unaccent.rules file is the easiest way to see how many
characters were missing and which ones have been added.

For full-text search (FTS), these additional characters are very much needed.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

In response to

Re: [PATCH] Completed unaccent dictionary with many missing characters at 2022-05-04 15:17:34 from Peter Eisentraut
