Re: tsvector not giving expected results on one host

From: "Dan Langille" <dan(at)langille(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: tsvector not giving expected results on one host
Date: 2022-12-17 20:33:33
Message-ID: 5cd18dcb-c526-4c1f-ac4f-c56cfbdcd55e@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
>> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
>>> "Dan Langille" <dan(at)langille(dot)org> writes:
>>>> pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored
>>>
>>> That is not likely to play well with this:
>>>
>>>> freshports.org=> show default_text_search_config ;
>>>> default_text_search_config
>>>> ----------------------------
>>>> pg_catalog.simple
>>>
>>> because "english" and "simple" will stem words differently.
>>>
>>> regression=# select websearch_to_tsquery('english', 'example');
>>> websearch_to_tsquery
>>> ----------------------
>>> 'exampl'
>>> (1 row)
>>>
>>> regression=# select websearch_to_tsquery('simple', 'example');
>>> websearch_to_tsquery
>>> ----------------------
>>> 'example'
>>> (1 row)
>>>
>>> If what is in your tsvector is 'exampl', then only the first of
>>> these will match. So IMO the question is not "why is it failing
>>> on prod?", it's "how the heck did it work on the other machine?".
>>> You won't get nice results if websearch_to_tsquery is using a
>>> different TS configuration than to_tsvector did.
>>
>> I think this shows why we are getting the results we see. Credit to ch
>> on IRC for asking this question.
>>
>> The problem host:
>>
>> freshports.org=> select websearch_to_tsquery('example');
>> websearch_to_tsquery
>> ----------------------
>> 'example'
>> (1 row)
>
> Ahh, this explains the differences and as to why it works where it shouldn't?
>
> freshports.org=> select setting, source from pg_settings where name =
> 'default_text_search_config';
> setting | source
> -------------------+---------
> pg_catalog.simple | default
> (1 row)
>
>
>>
>> The hosts on which this search works
>>
>> freshports.devgit=# select websearch_to_tsquery('example');
>> websearch_to_tsquery
>> ----------------------
>> 'exampl'
>> (1 row)
>
>
> freshports.devgit=# select setting, source from pg_settings where name
> = 'default_text_search_config';
> setting | source
> --------------------+--------------------
> pg_catalog.english | configuration file
> (1 row)
>
>
> At least now I know what I can play with to get all hosts in sync.

Here we go, on the problem database, create a new field, based on simple, not english.

ALTER TABLE public.ports
ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored;

Index it:

CREATE INDEX ports_pkgmessage_textsearchable3_idx
ON public.ports USING gin
(pkgmessage_textsearchable3)
TABLESPACE pg_default;
CREATE INDEX

query it:

freshports.org=> SELECT id as port_id, element_pathname(element_id)
FROM ports
WHERE pkgmessage_textsearchable3 @@ websearch_to_tsquery('example');
port_id | element_pathname
---------+----------------------------------------------------------------------
34126 | /ports/head/security/pond
74559 | /ports/branches/2015Q3/emulators/linux_base-c6
60310 | /ports/branches/2020Q4/www/gitlab-ce
38345 | /ports/head/www/gitlab
46842 | /ports/branches/2018Q1/mail/postfix-sasl
51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod

Success. Thank you Mr Lane.

--
Dan Langille
dan(at)langille(dot)org

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Arnold 2022-12-17 23:55:16 Postgres Date Type Value
Previous Message Dan Langille 2022-12-17 20:22:37 Re: tsvector not giving expected results on one host