| From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
|---|---|
| To: | Christoph Gößmann <mail(at)goessmann(dot)io> |
| Cc: | pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: How to drop all tokens that a snowball dictionary cannot stem? |
| Date: | 2019-11-23 19:18:32 |
| Message-ID: | CAMkU=1xnBfTm3LFeXT2-EvUuOM=px_h7O9sE1cjYm6CUetoKjw@mail.gmail.com |
| Lists: | pgsql-general |
On Sat, Nov 23, 2019 at 10:42 AM Christoph Gößmann <mail(at)goessmann(dot)io>
wrote:
> Hi Jeff,
>
> You're right about that point. Let me redefine. I would like to drop all
> tokens which are neither the stemmed nor the unstemmed version of a known word.
> Would there be the possibility of putting a wordlist as a filter ahead of
> the stemming? Or do you know about a good English lexeme list that could be
> used to filter after stemming?
>
I think what you describe is the opposite of what snowball was designed to
do. A snowball dictionary recognizes every token, whether or not it can
simplify it. You want an ispell-based dictionary instead, which only
recognizes words it can match against its word list.

PostgreSQL doesn't ship with real ispell dictionaries, so you have to
retrieve the files yourself and install them into $SHAREDIR/tsearch_data, as
described in the docs:
https://www.postgresql.org/docs/12/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY
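
As a rough sketch of the setup, assuming you have copied en_us.dict and
en_us.affix into $SHAREDIR/tsearch_data (the file names, the dictionary name
english_ispell and the configuration name english_known are placeholders,
not anything PostgreSQL provides), something along these lines should work:

```sql
-- Assumes en_us.dict and en_us.affix exist in $SHAREDIR/tsearch_data.
-- english_ispell is a made-up name for this example.
CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE  = ispell,
    DictFile  = en_us,     -- reads en_us.dict
    AffFile   = en_us,     -- reads en_us.affix
    StopWords = english    -- reuses the stock english.stop list
);

-- Start from the built-in english configuration, then make the ispell
-- dictionary the only one consulted for word-like tokens. With no
-- snowball dictionary after it, words it does not recognize are discarded.
CREATE TEXT SEARCH CONFIGURATION english_known (COPY = pg_catalog.english);

ALTER TEXT SEARCH CONFIGURATION english_known
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH english_ispell;

-- Check a single word against the dictionary, then a whole document.
SELECT ts_lexize('english_ispell', 'stories');
SELECT to_tsvector('english_known', 'some stories about xyzzyq');
```

ts_lexize returns NULL for a word the dictionary does not know, which is
what lets the configuration drop it. Other token types (numbers, URLs, email
addresses) keep whatever mapping they were copied with, so adjust those too
if you want them gone.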
Cheers,
Jeff