From: | Florents Tselai <florents(dot)tselai(at)gmail(dot)com> |
---|---|
To: | pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Multilang text search. Is this correct? |
Date: | 2023-03-30 08:15:20 |
Message-ID: | 92C5D7E0-A9C6-441B-80D8-C86F80AB7F30@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
In my table (mix of text and jsonb columns ) I have text in multiple languages.
I’d like search in all the possible regconfigs, so I’ve come up with the following recipe:
CREATE FUNCTION to_tsvector_multilang(text) RETURNS tsvector AS
$$
SELECT to_tsvector('english', $1) ||
to_tsvector('greek', $1) ||
to_tsvector('simple', $1)
$$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION jsonb_to_tsvector_multilang(jsonb, jsonb) RETURNS tsvector AS
$$
SELECT jsonb_to_tsvector('english', $1, $2) ||
jsonb_to_tsvector('simple', $1, $2) ||
jsonb_to_tsvector('greek', $1, $2)
$$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION to_tsquery_multilang(query text) RETURNS tsquery AS
$$
SELECT websearch_to_tsquery('english', query) ||
websearch_to_tsquery('simple', query) ||
websearch_to_tsquery('greek', query)
$$ LANGUAGE sql IMMUTABLE;
Thus, in searching I use to_tsvector_multilang(’TEXT') @@ to_tsquery_multilang(‘QUERY’);
Looks like it’s working as expected from my prelim tests, but anyone seeing any potential pitfalls?
Note: I’m using GIN & RUM indices as well.
From | Date | Subject | |
---|---|---|---|
Next Message | Francisco Olarte | 2023-03-30 09:42:12 | Re: Using CTID system column as a "temporary" primary key |
Previous Message | Dominique Devienne | 2023-03-30 08:01:29 | Re: Using CTID system column as a "temporary" primary key |