Multilang text search. Is this correct?

From: Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Multilang text search. Is this correct?
Date: 2023-03-30 08:15:20
Message-ID: 92C5D7E0-A9C6-441B-80D8-C86F80AB7F30@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

In my table (mix of text and jsonb columns ) I have text in multiple languages.
I’d like search in all the possible regconfigs, so I’ve come up with the following recipe:

CREATE FUNCTION to_tsvector_multilang(text) RETURNS tsvector AS
$$
SELECT to_tsvector('english', $1) ||
to_tsvector('greek', $1) ||
to_tsvector('simple', $1)
$$ LANGUAGE sql IMMUTABLE;

CREATE FUNCTION jsonb_to_tsvector_multilang(jsonb, jsonb) RETURNS tsvector AS
$$
SELECT jsonb_to_tsvector('english', $1, $2) ||
jsonb_to_tsvector('simple', $1, $2) ||
jsonb_to_tsvector('greek', $1, $2)
$$ LANGUAGE sql IMMUTABLE;

CREATE FUNCTION to_tsquery_multilang(query text) RETURNS tsquery AS
$$
SELECT websearch_to_tsquery('english', query) ||
websearch_to_tsquery('simple', query) ||
websearch_to_tsquery('greek', query)
$$ LANGUAGE sql IMMUTABLE;
Thus, in searching I use to_tsvector_multilang(’TEXT') @@ to_tsquery_multilang(‘QUERY’);
Looks like it’s working as expected from my prelim tests, but anyone seeing any potential pitfalls?

Note: I’m using GIN & RUM indices as well.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Francisco Olarte 2023-03-30 09:42:12 Re: Using CTID system column as a "temporary" primary key
Previous Message Dominique Devienne 2023-03-30 08:01:29 Re: Using CTID system column as a "temporary" primary key