Re: tsvector not giving expected results on one host

From: Oleg Bartunov <obartunov(at)postgrespro(dot)ru>
To: Dan Langille <dan(at)langille(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: tsvector not giving expected results on one host
Date: 2022-12-21 10:12:37
Message-ID: CAF4Au4x0UmiJDXVecFxPhhX2dApXFeQg0y4TCEv08-7hJPJ-cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Dan,

it's always good to specify configuration name in a query to avoid
recheck, since
websearch_to_tsquery(regconfig, text) is immutable, while
websearch_to_tsquery(text) is stable.

See the difference:

[local]:5433 oleg(at)oleg=# explain (analyze,costs off) select title
from apod where websearch_to_tsquery('simple','galaxies') @@ fts;
QUERY PLAN
---------------------------------------------------------------------------------------
Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1)
Recheck Cond: ('''galaxies'''::tsquery @@ fts)
-> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007
rows=0 loops=1)
Index Cond: (fts @@ '''galaxies'''::tsquery)
Planning Time: 0.134 ms
Execution Time: 0.022 ms
(6 rows)

Time: 0.369 ms
[local]:5433 oleg(at)oleg=# explain (analyze,costs off) select title
from apod where websearch_to_tsquery('galaxies') @@ fts;
QUERY PLAN
-----------------------------------------------------------------------------------------
Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1)
Filter: (websearch_to_tsquery('galaxies'::text) @@ fts)
Heap Blocks: exact=276
-> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059
rows=493 loops=1)
Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text))
Planning Time: 0.125 ms
Execution Time: 1.518 ms
(7 rows)

On Sat, Dec 17, 2022 at 11:34 PM Dan Langille <dan(at)langille(dot)org> wrote:
>
> On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
> >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
> >>> "Dan Langille" <dan(at)langille(dot)org> writes:
> >>>> pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored
> >>>
> >>> That is not likely to play well with this:
> >>>
> >>>> freshports.org=> show default_text_search_config ;
> >>>> default_text_search_config
> >>>> ----------------------------
> >>>> pg_catalog.simple
> >>>
> >>> because "english" and "simple" will stem words differently.
> >>>
> >>> regression=# select websearch_to_tsquery('english', 'example');
> >>> websearch_to_tsquery
> >>> ----------------------
> >>> 'exampl'
> >>> (1 row)
> >>>
> >>> regression=# select websearch_to_tsquery('simple', 'example');
> >>> websearch_to_tsquery
> >>> ----------------------
> >>> 'example'
> >>> (1 row)
> >>>
> >>> If what is in your tsvector is 'exampl', then only the first of
> >>> these will match. So IMO the question is not "why is it failing
> >>> on prod?", it's "how the heck did it work on the other machine?".
> >>> You won't get nice results if websearch_to_tsquery is using a
> >>> different TS configuration than to_tsvector did.
> >>
> >> I think this shows why we are getting the results we see. Credit to ch
> >> on IRC for asking this question.
> >>
> >> The problem host:
> >>
> >> freshports.org=> select websearch_to_tsquery('example');
> >> websearch_to_tsquery
> >> ----------------------
> >> 'example'
> >> (1 row)
> >
> > Ahh, this explains the differences and as to why it works where it shouldn't?
> >
> > freshports.org=> select setting, source from pg_settings where name =
> > 'default_text_search_config';
> > setting | source
> > -------------------+---------
> > pg_catalog.simple | default
> > (1 row)
> >
> >
> >>
> >> The hosts on which this search works
> >>
> >> freshports.devgit=# select websearch_to_tsquery('example');
> >> websearch_to_tsquery
> >> ----------------------
> >> 'exampl'
> >> (1 row)
> >
> >
> > freshports.devgit=# select setting, source from pg_settings where name
> > = 'default_text_search_config';
> > setting | source
> > --------------------+--------------------
> > pg_catalog.english | configuration file
> > (1 row)
> >
> >
> > At least now I know what I can play with to get all hosts in sync.
>
> Here we go, on the problem database, create a new field, based on simple, not english.
>
> ALTER TABLE public.ports
> ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored;
>
> Index it:
>
> CREATE INDEX ports_pkgmessage_textsearchable3_idx
> ON public.ports USING gin
> (pkgmessage_textsearchable3)
> TABLESPACE pg_default;
> CREATE INDEX
>
> query it:
>
> freshports.org=> SELECT id as port_id, element_pathname(element_id)
> FROM ports
> WHERE pkgmessage_textsearchable3 @@ websearch_to_tsquery('example');
> port_id | element_pathname
> ---------+----------------------------------------------------------------------
> 34126 | /ports/head/security/pond
> 74559 | /ports/branches/2015Q3/emulators/linux_base-c6
> 60310 | /ports/branches/2020Q4/www/gitlab-ce
> 38345 | /ports/head/www/gitlab
> 46842 | /ports/branches/2018Q1/mail/postfix-sasl
> 51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod
>
> Success. Thank you Mr Lane.
>
> --
> Dan Langille
> dan(at)langille(dot)org
>
>

--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Karsten Hilbert 2022-12-21 14:24:30 dropped default locale
Previous Message Chetan Kosanam 2022-12-21 04:48:58 Regd. the Implementation of Wallet (in Oracle) config equivalent in postgreSQL whilst the database migration