From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Subject: | Re: PATCH: Update snowball stemmers |
Date: | 2018-09-24 21:36:39 |
Message-ID: | 31126.1537824999@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> writes:
> Ah, I see. I attached new version made with --no-renames. Will wait for
> what cfbot will say.

I reviewed and pushed this.

As a cross-check on the patch, I cloned the Snowball github repo
and built the derived files in it. I noticed that they'd incorporated
several new stemmers since 2007 --- not only your Nepali one, but
half a dozen more besides. Since the point here is (IMO) mostly to
follow their lead on what's interesting, I went ahead and added those
as well.

In short, therefore, the commit includes the Nepali stuff from your
other thread as well as what was in this one.

Although I added nepali.stop from the other patch, I've not done
anything about updating our other stopword lists. Presumably those
are a bit obsolete by now as well. I wonder if we can prevail on
the Snowball people to make those available in some less painful way
than scraping them off assorted web pages. Ideally they'd stick them
into their git repo ...

regards, tom lane