| From: | Florents Tselai <florents(dot)tselai(at)gmail(dot)com> |
|---|---|
| To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Improving FTS for Greek |
| Date: | 2023-06-06 22:30:55 |
| Message-ID: | AA782163-36A3-46C3-8775-84B34C567471@gmail.com |
| Lists: | pgsql-hackers |
> On 7 Jun 2023, at 12:13 AM, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> On 03.06.23 19:47, Florents Tselai wrote:
>> There’s another relevant earlier patch [0], but it was never merged. I’ve included its stop words and added some more (info in README.md).
>> For my personal projects, it looks like it yields much better results.
>> I’d like some feedback on the extension, particularly on the installation infra (I’m not sure I’ve handled the permissions in the .sql files properly).
>> I’ll then try to make a .patch for this.
>
> The open question at the previous attempt was that it wasn't clear what the upstream source or long-term maintenance of the stop words list would be. If it's just a personally composed list, then it's okay if you use it yourself, but for including it into PostgreSQL it ought to come from a reputable non-individual source like snowball.
I’ve used the NLTK list [0] as my base of stop words. Wouldn’t this be considered reputable enough?
[0] https://github.com/nltk/nltk_data/blob/gh-pages/packages/corpora/stopwords.zip (see the greek.stop file in the archive)