Re: PATCH: Update snowball stemmers

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: PATCH: Update snowball stemmers
Date: 2018-09-24 21:36:39
Message-ID: 31126.1537824999@sss.pgh.pa.us
Lists: pgsql-hackers

Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> writes:
> Ah, I see. I attached new version made with --no-renames. Will wait for
> what cfbot will say.

I reviewed and pushed this.

As a cross-check on the patch, I cloned the Snowball GitHub repo
and built the derived files in it. I noticed that they'd incorporated
several new stemmers since 2007 --- not only your Nepali one, but
half a dozen more besides. Since the point here is (IMO) mostly to
follow their lead on what's interesting, I went ahead and added those
as well.

In short, therefore, the commit includes the Nepali stuff from your
other thread as well as what was in this one.

Although I added nepali.stop from the other patch, I've not done
anything about updating our other stopword lists. Presumably those
are a bit obsolete by now as well. I wonder if we can prevail on
the Snowball people to make those available in some less painful way
than scraping them off assorted web pages. Ideally they'd stick them
into their git repo ...

regards, tom lane
