From: | Jean Gabriel <pgml(at)hasbani(dot)ca> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | websearch_to_tsquery fails to transform compound words from a thesaurus dictionary |
Date: | 2022-06-14 14:38:36 |
Message-ID: | d9874680-8292-0728-dca0-f9312afd3221@hasbani.ca |
Lists: | pgsql-bugs |
Hello,
Affected versions: PG 11 to 14.3 (all).
Affected OS: Windows 10 and x86_64-pc-linux-gnu (the latter tested on dbfiddle)
Issue:
A thesaurus dictionary can replace a compound word (a multi-word phrase)
with another one. The example provided in the doc is "supernovae stars : *sn".
When used with websearch_to_tsquery, this transformation does not occur and
the original words are kept, **OR**, if the thesaurus also has a single-word
entry for one of the words, only that single-word transformation occurs.
Why it is a problem:
Since the other text search functions (to_tsvector, plainto_tsquery) apply
the transformation, a document containing the compound word can't be found
when searching with websearch_to_tsquery.
Expected result:
websearch_to_tsquery should transform compound words from the thesaurus,
as the other functions do.
Good to know:
1) the expected behavior occurs with single words from the thesaurus.
2) the bad behavior occurs regardless of pre or post stemming
3) If the compound word is double quoted, websearch_to_tsquery returns
the expected output in V14 but a bad one in previous versions.
Steps to reproduce:
Create a test_theasaurus.ths file (in $SHAREDIR/tsearch_data) with the lines
supernovae stars : *sn
supernovae : *sn
abc def: xy
CREATE TEXT SEARCH DICTIONARY test_thesaurus (
TEMPLATE = thesaurus,
DictFile = test_theasaurus,
Dictionary = pg_catalog.english_stem
);
CREATE TEXT SEARCH CONFIGURATION public.test ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION public.test
ALTER MAPPING FOR hword, hword_part, word, asciihword,
hword_asciipart, asciiword
WITH public.test_thesaurus, english_stem;
select to_tsvector('test','abc def') @@ websearch_to_tsquery('test','abc def'); --FALSE - wrong result
select to_tsvector('test','supernovae stars') @@ websearch_to_tsquery('test','supernovae stars'); --FALSE - wrong result
select websearch_to_tsquery('test','abc def'); --'abc def' --> no transformation occurred
select websearch_to_tsquery('test','supernovae stars'); --'sn' & 'star' --> 1st word is listed by itself in the thesaurus and was transformed
select websearch_to_tsquery('test','"abc def"'); -- 'xy' --> in V14, double quoted compound words are transformed as expected
select to_tsvector('test','abc def'), plainto_tsquery('test','abc def'); --'xy', expected behavior in other functions
select to_tsvector('test','supernovae stars'), plainto_tsquery('test','supernovae stars'); --'sn', expected behavior in other functions
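For a quicker side-by-side check, the comparisons above can be collapsed into a single query. This is only a sketch: it assumes the public.test configuration and the test_thesaurus dictionary from the steps above already exist, the column aliases are made up for readability, and phraseto_tsquery is included only for comparison (it was not part of the tests above). No output is shown because it differs between versions.

```sql
-- Assumes the 'test' configuration created in the steps above.
-- Each column parses the same input with a different tsquery constructor,
-- and the last column checks whether the websearch query matches the document.
select phrase,
       plainto_tsquery('test', phrase)      as plain_q,
       phraseto_tsquery('test', phrase)     as phrase_q,
       websearch_to_tsquery('test', phrase) as websearch_q,
       to_tsvector('test', phrase) @@ websearch_to_tsquery('test', phrase) as websearch_matches
from (values ('abc def'), ('supernovae stars')) as t(phrase);
```

A row where websearch_q differs from plain_q, or where websearch_matches is false, shows the problem described in this report.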
Let me know if there is anything else I can provide!
Thank you for taking the time to look at this issue, it is much appreciated
JG