From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: snowball ASCII stemmer configuration |
Date: | 2020-06-16 13:53:46 |
Message-ID: | 1300297.1592315626@sss.pgh.pa.us |
Lists: | pgsql-hackers |

Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
> There are two cases where these two columns are not the same:
> hindi english \
> russian english \
> The second one is old; the first one I added using the second one as an
> example. But I wonder what the rationale for this is. Maybe for hindi
> one could make some kind of cultural argument, but for russian this
> seems entirely arbitrary.

Perhaps it is, but we have actual Russians who think it's a good idea.
I recall questioning that point some years ago, and Oleg replied that
they'd done that intentionally because (a) technical Russian uses a lot
of English words, and (b) it's easy to tell which is which thanks to
the disjoint letter sets.
Whether the same is true for Hindi, I have no idea.
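
To illustrate the dispatch rule under discussion, here is a minimal sketch in Python. It is not PostgreSQL source code: `ASCII_STEMMER` and `pick_stemmer` are hypothetical names, the table holds only the entries quoted in this thread, and it assumes the second column names the stemmer applied to all-ASCII words while the first names the stemmer for everything else.

```python
# Hypothetical illustration only; not taken from PostgreSQL.
# Maps a language to the stemmer assumed for its all-ASCII words,
# using just the pairs quoted in this thread.
ASCII_STEMMER = {
    "hindi": "english",
    "russian": "english",
    "arabic": "arabic",
    "greek": "greek",
    "nepali": "nepali",
    "tamil": "tamil",
}


def pick_stemmer(word: str, language: str) -> str:
    """Return the name of the stemmer to apply to `word`."""
    if word.isascii():
        # All-ASCII word: use the second-column stemmer.
        return ASCII_STEMMER[language]
    # Word contains non-ASCII characters: use the language's own stemmer.
    return language


print(pick_stemmer("database", "russian"))
print(pick_stemmer("базы", "russian"))
```

Because Cyrillic and Latin letters do not overlap, the `isascii()` test alone separates the two cases for Russian text; the same test says nothing about whether English is the right choice for the ASCII words of any other language.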
> Moreover, AFAIK, the following other languages do not use Latin-based
> alphabets:
> arabic arabic \
> greek greek \
> nepali nepali \
> tamil tamil \

Hmm. I think all of those entries are ones that got added by me while
absorbing post-2007 Snowball updates, and I confess that I did not think
about this point. Maybe these should be changed.

regards, tom lane