No Greek stop words in FTS ?

From: Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: No Greek stop words in FTS ?
Date: 2023-06-03 08:24:29
Message-ID: ABE54B8D-CA1F-45B3-A87B-DE34A4FA1A03@gmail.com
Lists: pgsql-general

Hi,

I maintain a project (diofanti.org <http://diofanti.org/>) that tracks public spending in Greece.
It’s a PG instance hosting 55M+ JSON documents, with search functionality on top of them.

It relies heavily on to_tsvector('greek', ..), as users search for company names, invoice descriptions etc.

The results are fairly good, but while experimenting with adding some more domain-specific stop words, I realised there’s no greek.stop under $(pg_config --sharedir)/tsearch_data.
And indeed, it looks like stop words are kept by to_tsvector('greek', ..):

select to_tsvector('greek', 'ΚΑΛΗΜΕΡΑ ΚΑΙ ΣΕ ΕΣΑΣ'); --> 'εσ':4 'κα':2 'καλημερ':1 'σε':3
select to_tsvector('english', 'AND GOOD MORNING TO YOU TOO'); --> 'good':2 'morn':3
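As a workaround until a greek.stop ships, domain-specific stop words can be supplied through a custom Snowball dictionary. The block below is a minimal sketch, not a tested recipe. It assumes a hypothetical file named greek_custom.stop that you create yourself in $(pg_config --sharedir)/tsearch_data (UTF-8, one word per line). The dictionary and configuration names greek_custom_stem and greek_custom are likewise hypothetical. The file must exist before CREATE TEXT SEARCH DICTIONARY runs, or the command fails.

```sql
-- Assumes a hypothetical, user-created stop-word file:
--   $(pg_config --sharedir)/tsearch_data/greek_custom.stop
-- UTF-8, one word per line (e.g. και, σε). Tokens are lowercased and
-- checked against this list *before* stemming, so list full word forms.

-- Greek Snowball stemmer that also discards the words in greek_custom.stop
CREATE TEXT SEARCH DICTIONARY greek_custom_stem (
    TEMPLATE  = snowball,
    Language  = greek,
    StopWords = greek_custom   -- refers to greek_custom.stop
);

-- Start from a copy of the built-in 'greek' configuration...
CREATE TEXT SEARCH CONFIGURATION greek_custom (COPY = pg_catalog.greek);

-- ...and send non-ASCII word tokens through the new dictionary
ALTER TEXT SEARCH CONFIGURATION greek_custom
    ALTER MAPPING FOR word, hword, hword_part
    WITH greek_custom_stem;

-- Inspect which dictionary handled each token and which lexemes it produced
SELECT alias, token, dictionary, lexemes
FROM ts_debug('greek_custom', 'ΚΑΛΗΜΕΡΑ ΚΑΙ ΣΕ ΕΣΑΣ');
```

If the file lists και and σε, ts_debug should report an empty lexemes array for those two tokens, and to_tsvector('greek_custom', ...) should then omit them.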

I found an older discussion on pgsql-hackers [0], but I’m not sure where it stopped, or whether it ever started.

Am I missing something?
Is there another thread or patch I can pick up myself?

[0] https://www.postgresql.org/message-id/flat/e1c79330-48a5-abef-c309-8d4499e3180b%402ndquadrant.com#7431fdb9ae24b694155aef3f040b7b60
