Re: Improving FTS for Greek

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improving FTS for Greek
Date: 2023-06-13 06:11:42
Message-ID: 913e407e-535c-ca62-a79a-aadb16519005@eisentraut.org
Lists: pgsql-hackers

On 07.06.23 00:30, Florents Tselai wrote:
>
>
>> On 7 Jun 2023, at 12:13 AM, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>>
>> On 03.06.23 19:47, Florents Tselai wrote:
>>> There’s another previous relevant patch [0] but was never merged.
>>> I’ve included these stop words and added some more (info in README.md).
>>> For my personal projects looks like it yields much better results.
>>> I’d like some feedback on the extension ; particularly on the
>>> installation infra (I’m not sure I’ve handled properly the
>>> permissions in the .sql files)
>>> I’ll then try to make a .patch for this.
>>
>> The open question at the previous attempt was that it wasn't clear
>> what the upstream source or long-term maintenance of the stop words
>> list would be.  If it's just a personally composed list, then it's
>> okay if you use it yourself, but for including it into PostgreSQL it
>> ought to come from a reputable non-individual source like snowball.
>
> I’ve used the NLTK list [0] as my base of stopwords; Wouldn’t this be
> considered reputable enough ?
>
> [0]
> https://github.com/nltk/nltk_data/blob/gh-pages/packages/corpora/stopwords.zip (see greek.stop file in the archive)

Who is NLTK? Where did they get their stopwords file from? What is their
open source license? How do we know when to pull updates, and what is the
mechanical process for pulling in those updates?
