Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: esemmano(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses
Date: 2024-06-13 23:59:22
Message-ID: 2184370.1718323162@sss.pgh.pa.us
Lists: pgsql-bugs

[ couldn't let go of this ... ]

I wrote:
> It's fairly confusing that this code manages to ignore not-ISOPERATOR
> punctuation. It seems like that gets eaten by gettoken_tsvector()
> and then later we decide there's not really a word there.

Yeah, further investigation shows that such cases effectively act
like stopwords: they are passed back to makepol() as VAL strings,
but then lexize processing rejects them as not words.

> I'm also confused how come the same thing doesn't happen in the
> english tsconfig. Not sure it's worth poking at more, though.

D'oh: "or" is a stopword in the english config. The english case
is still wrong of course, just differently:

regression=# select websearch_to_tsquery('english', 'foo or (baz bar) or (ding dong)');
          websearch_to_tsquery
-----------------------------------------
 'foo' | 'baz' & 'bar' & 'ding' & 'dong'
(1 row)
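
The stopword effect can be checked directly by asking the dictionaries
what they do with "or". The queries below are a minimal sketch and not
part of the original investigation. They assume the stock english and
german configurations with their default english_stem and german_stem
snowball dictionaries. Note also that ts_debug runs the ordinary
document parser, so it only approximates what websearch_to_tsquery
sees.

```sql
-- ts_lexize returns an empty array when the dictionary treats the
-- token as a stopword, and a non-empty array of lexemes otherwise.
-- (Assumes the default snowball dictionaries are installed.)
select ts_lexize('english_stem', 'or') as english_result,
       ts_lexize('german_stem',  'or') as german_result;

-- ts_debug shows how each token is classified and which lexemes
-- survive; punctuation and stopwords end up with no lexemes.
select alias, token, lexemes
from ts_debug('english', 'foo or (baz bar)');
```

Comparing the two columns of the first query should show whether a
given word is dropped in one config but kept in the other.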

regards, tom lane
