Re: Feature: Add Greek language fulltext search

From: Panagiotis Mavrogiorgos <pmav99(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Feature: Add Greek language fulltext search
Date: 2019-07-09 14:18:00
Message-ID: CAAVvtwrnGCoiG5csey14=mrn_jTUEO2R2TzUWR2+TuezA3wR3A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 4, 2019 at 1:39 PM Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:

> On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:
> > Last November snowball added support for Greek language [1]. Following
> > the instructions [2], I wrote a patch that adds fulltext search for
> > Greek in Postgres. The patch is attached.
>
> I have committed a full sync from the upstream snowball repository,
> which pulled in the new greek stemmer.
>
> Could you please clarify where you got the stopword list from? The
> README says those need to be downloaded separately, but I wasn't able to
> find the download location. It would be good to document this, for
> example in the commit message. I haven't committed the stopword list yet.
>

Thank you Peter,

Here is the repo with the stop-words:
https://github.com/pmav99/greek_stopwords
The list is based on an earlier publication, with modifications by me. All
the relevant information is on GitHub.

Disclaimer 1: The list has not been validated by an expert.

Disclaimer 2: There are other stop-word lists on the internet (linked
below), but they are less complete and they also contain Ancient Greek
words. Furthermore, my testing showed that Snowball needs to handle accents
(tonous) and ς (teliko sigma, the word-final sigma) in a special way if you
want the stemmer to work with capitalized words too.

https://github.com/Xangis/extra-stopwords/blob/master/greek
https://github.com/stopwords-iso/stopwords-el/tree/master/raw
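To illustrate the accent and teliko sigma point, here is a minimal Python sketch of the kind of normalization I mean. It is not what Snowball or Postgres actually does internally; the function name normalize_greek is hypothetical, and the approach (lowercase, strip combining accents, fold ς into σ) is only my assumption about one reasonable way to make capitalized and lowercase forms compare equal.

```python
import unicodedata


def normalize_greek(word: str) -> str:
    """Hypothetical helper: lowercase, strip accents, fold final sigma."""
    # Decompose so that accents (tonos, dialytika) become separate
    # combining marks following the base letter.
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Drop the combining marks (Unicode category "Mn").
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose and fold the word-final sigma into the regular sigma,
    # so both spellings of a word end in the same letter.
    return unicodedata.normalize("NFC", stripped).replace("ς", "σ")


if __name__ == "__main__":
    # The capitalized and lowercase forms normalize to the same string.
    print(normalize_greek("ΌΛΟΣ"))  # ολοσ
    print(normalize_greek("όλος"))  # ολοσ
```

With something like this applied to both the stop-word list and the input, a lookup no longer depends on whether the word was written in capitals or with accents.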

all the best,
Panagiotis
