From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | esemmano(at)gmail(dot)com |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses |
Date: | 2024-06-13 23:59:22 |
Message-ID: | 2184370.1718323162@sss.pgh.pa.us |
Lists: | pgsql-bugs |
[ couldn't let go of this ... ]
I wrote:
> It's fairly confusing that this code manages to ignore not-ISOPERATOR
> punctuation. It seems like that gets eaten by gettoken_tsvector()
> and then later we decide there's not really a word there.
Yeah, further investigation shows that such cases effectively act
like stopwords: they are passed back to makepol() as VAL strings,
but then lexize processing rejects them as not words.
> I'm also confused how come the same thing doesn't happen in the
> english tsconfig. Not sure it's worth poking at more, though.
D'oh: "or" is a stopword in the english config. The english case
is still wrong of course, just differently:
regression=# select websearch_to_tsquery('english', 'foo or (baz bar) or (ding dong)');
          websearch_to_tsquery
-----------------------------------------
 'foo' | 'baz' & 'bar' & 'ding' & 'dong'
(1 row)
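A quick way to check the stopword explanation is to ask the stemming dictionaries directly. This is only a sketch: it assumes the stock english_stem and german_stem dictionaries that ship with the built-in english and german configurations, and it does not exercise the parenthesis handling itself. A dictionary that treats the token as a stopword should hand back an empty array, while one that does not should return a lexeme.

```sql
-- Assumption: the default english_stem / german_stem dictionaries are installed.
-- ts_lexize(dictionary, token) returns the lexemes the dictionary produces;
-- an empty array means the token was recognized but discarded as a stopword.
select ts_lexize('english_stem', 'or') as english_result,
       ts_lexize('german_stem', 'or')  as german_result;

-- ts_debug shows the same decision at the configuration level,
-- including which dictionary claimed the token.
select alias, token, dictionary, lexemes
  from ts_debug('english', 'or');
```

If the two columns of the first query differ, that accounts for the english and german configurations diverging on the same input string.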
regards, tom lane