pgsql: Simplify a bit the special rules generating unaccent.rules

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Simplify a bit the special rules generating unaccent.rules
Date: 2022-07-05 07:19:45
Message-ID: E1o8cqW-001HJ4-4B@gemulon.postgresql.org
Lists: pgsql-committers

Simplify a bit the special rules generating unaccent.rules

As noted by Thomas Munro, CLDR 36 has added SOUND RECORDING COPYRIGHT
(U+2117), and we use CLDR 41, so this can be removed from the set of
special cases.
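
As a rough illustration of the idea, the special cases can be pictured
as hand-written codepoint-to-replacement pairs that are only needed
while the upstream CLDR data does not cover them. The sketch below is
hypothetical and is not taken from generate_unaccent_rules.py: the
names SPECIAL_CASES, prune_special_cases and upstream_covered, as well
as the replacement strings, are assumptions made for the example.

```python
# Hypothetical special-case table: codepoint -> replacement string.
# The replacement strings are assumptions chosen for illustration.
SPECIAL_CASES = {
    0x2103: "\u00b0C",  # DEGREE CELSIUS
    0x2109: "\u00b0F",  # DEGREE FAHRENHEIT
    0x2117: "(P)",      # SOUND RECORDING COPYRIGHT
}


def prune_special_cases(special, upstream):
    """Drop special cases that the upstream data already covers."""
    return {cp: repl for cp, repl in special.items() if cp not in upstream}


# Assumed stand-in for the set of codepoints covered by the CLDR data.
upstream_covered = {0x2117}

remaining = prune_special_cases(SPECIAL_CASES, upstream_covered)
for cp in sorted(remaining):
    print("U+%04X -> %s" % (cp, remaining[cp]))
```

Once U+2117 is treated as covered upstream, only the two degree signs
remain in the hand-written table.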

The regression tests are expanded to cover the degree signs, which are
two of the special cases, and an unusual case involving U+210C in
Latin-ASCII.xml, discovered while investigating what could be done for
Cyrillic characters (that last part is material for a future patch and
is not tackled here).

While at it, some of the assertions in generate_unaccent_rules.py are
expanded to report the codepoint on which a failure is found, which is
useful for debugging.
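
A minimal sketch of that kind of assertion, assuming a simple record
type for codepoints. The names Codepoint, base_letter_id and table are
hypothetical and chosen for this example; this is not the script's
actual code.

```python
from collections import namedtuple

# Hypothetical record type for this sketch.
Codepoint = namedtuple("Codepoint", ["id", "combining_ids"])


def base_letter_id(codepoint, table):
    """Return the id of the first component of a combined codepoint."""
    # Each assertion message names the offending codepoint, so a failure
    # can be traced back to the input data directly.
    assert len(codepoint.combining_ids) > 0, \
        "Codepoint U+%04X has no combining ids" % codepoint.id
    first = codepoint.combining_ids[0]
    assert first in table, \
        "Codepoint U+%04X refers to unknown U+%04X" % (codepoint.id, first)
    return first


# U+00E9 (e with acute) decomposes into U+0065 followed by U+0301.
table = {0x0065: Codepoint(0x0065, [])}
e_acute = Codepoint(0x00E9, [0x0065, 0x0301])
print("U+%04X" % base_letter_id(e_acute, table))
```

With a bare assert, a failure would only show a traceback; with the
message, the traceback also carries the codepoint that triggered it.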

Extracted from a larger patch by the same author.

Author: Przemysław Sztoch
Discussion: https://postgr.es/m/8478da0d-3b61-d24f-80b4-ce2f5e971c60@sztoch.pl

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/e3dd7c06e62774628e102c3cd47ee46e85519de7

Modified Files
--------------
contrib/unaccent/expected/unaccent.out      | 44 +++++++++++++++++++++++++++++
contrib/unaccent/generate_unaccent_rules.py |  5 ++--
contrib/unaccent/sql/unaccent.sql           | 10 +++++++
3 files changed, 56 insertions(+), 3 deletions(-)
