From: | Florents Tselai <florents(dot)tselai(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Improving FTS for Greek |
Date: | 2023-06-06 22:30:55 |
Message-ID: | AA782163-36A3-46C3-8775-84B34C567471@gmail.com |
Lists: | pgsql-hackers |
> On 7 Jun 2023, at 12:13 AM, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> On 03.06.23 19:47, Florents Tselai wrote:
>> There’s a previous related patch [0], but it was never merged. I’ve included those stop words and added some more (details in README.md).
>> For my personal projects, it seems to yield much better results.
>> I’d like some feedback on the extension, particularly on the installation infra (I’m not sure I’ve handled the permissions in the .sql files properly).
>> I’ll then try to make a .patch for this.
>
> The open question at the previous attempt was that it wasn't clear what the upstream source or long-term maintenance of the stop words list would be. If it's just a personally composed list, then it's okay if you use it yourself, but for including it into PostgreSQL it ought to come from a reputable non-individual source like snowball.
I’ve used the NLTK list [0] as the base for my stop words. Wouldn’t this be considered reputable enough?
[0] https://github.com/nltk/nltk_data/blob/gh-pages/packages/corpora/stopwords.zip (see the greek.stop file in the archive)