From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Tsearch2 and Snowball |
Date: | 2006-10-03 18:53:29 |
Message-ID: | 1159901609.2659.341.camel@holly |
Lists: | pgsql-hackers |
I'm looking at some of the code in contrib/tsearch2/snowball and see
that the code there is *generated* code. The Snowball stemmer produces
this C code in much the same way bison generates C from gram.y.
My understanding is that the Snowball code moves forwards regularly and
there are many other stemmers we could be including with the
distribution.
Snowball has a BSD licence: http://snowball.tartarus.org/license.php
Would it be possible to include the Snowball source directly and allow
its execution to be part of the make process for tsearch2? Or have
configure check for Snowball at build time? At the very least it would be
good to have a README file explaining how to modify the Snowball stemmer
and regenerate the C code for tsearch2.
That would then encourage people to improve the stemmers, as well as
allow us to include French, Spanish, and other versions.
Perhaps we should ask translators to provide stop word lists for their
languages. It seems a shame to have docs in so many languages, but no
language capability for Tsearch2.
Also, why do we have another crc32 implementation in there?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com