From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | esemmano(at)gmail(dot)com |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses |
Date: | 2024-06-13 22:04:20 |
Message-ID: | 2130969.1718316260@sss.pgh.pa.us |
Lists: | pgsql-bugs |
PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> Although the docs
> https://www.postgresql.org/docs/current/textsearch-controls.html say nothing
> about websearch_to_tsquery supporting parentheses in queries, I noticed some
> inconsistent behaviour when using multiple 'or' keywords with parentheses in
> postgres 15.4
The definition of websearch_to_tsquery says pretty plainly that
"Other punctuation is ignored". So I'd expect parens to do nothing.
That makes this problematic:
> select websearch_to_tsquery('german', 'foo or baz bar or (ding dong)');
> websearch_to_tsquery
> -----------------------------------------
> 'foo' | 'baz' & 'bar' | 'ding' & 'dong'
> select websearch_to_tsquery('german', 'foo or (baz bar) or (ding dong)');
> websearch_to_tsquery
> ------------------------------------------------
> 'foo' | 'baz' & 'bar' & 'or' & 'ding' & 'dong'
I found what seems to be the issue in gettoken_query_websearch: it
ignores ISOPERATOR chars (including parens) in WAITOPERAND state,
but not in WAITOPERATOR state. That results in switching back to
WAITOPERAND state which will consume the "or" as a regular word.
So a minimal fix could look like the attached.
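For illustration, the state-machine behavior described above can be reproduced with a toy C model. This is not PostgreSQL's tsquery.c code and not the attached patch. The names (parse_websearch, is_or_keyword, WAIT_OPERAND, WAIT_OPERATOR, skip_parens_when_waiting_operator) are hypothetical. The model also makes several simplifying assumptions: only parentheses count as operator punctuation, words are plain runs delimited by whitespace or parens, and there is no dictionary or stop-word processing. The flag switches between the reported behavior and the minimal fix of also ignoring parens while waiting for an operator.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical two-state model; not PostgreSQL's actual parser states. */
typedef enum { WAIT_OPERAND, WAIT_OPERATOR } parse_state;

/* In this toy model only parentheses count as operator punctuation. */
static bool is_paren(char c)
{
    return c == '(' || c == ')';
}

/* Loosely mimics an "or" keyword check: "or" in any case, not part of a
 * longer word, and followed later by some non-space character. */
static bool is_or_keyword(const char *p)
{
    if (tolower((unsigned char)p[0]) != 'o' || tolower((unsigned char)p[1]) != 'r')
        return false;
    const char *q = p + 2;
    if (*q == '\0' || isalnum((unsigned char)*q) || *q == '-' || *q == '_')
        return false;
    for (;;) {
        q++;
        if (*q == '\0')
            return false;          /* "or" with no operand after it */
        if (!isspace((unsigned char)*q))
            return true;
    }
}

/* Renders the query as a flat "word op word ..." string in out.
 * skip_parens_when_waiting_operator == false models the reported behavior;
 * true models ignoring parens in the operator-waiting state as well. */
static void parse_websearch(const char *s, bool skip_parens_when_waiting_operator,
                            char *out, size_t outlen)
{
    parse_state state = WAIT_OPERAND;
    const char *pending_op = "";   /* printed before the next word */
    size_t used = 0;

    assert(outlen > 0);
    out[0] = '\0';

    while (*s != '\0') {
        if (state == WAIT_OPERAND) {
            /* Spaces and parens are skipped while waiting for a word. */
            if (isspace((unsigned char)*s) || is_paren(*s)) {
                s++;
                continue;
            }
            const char *start = s;
            while (*s != '\0' && !isspace((unsigned char)*s) && !is_paren(*s))
                s++;
            int n = snprintf(out + used, outlen - used, "%s%.*s",
                             pending_op, (int)(s - start), start);
            if (n < 0 || (size_t)n >= outlen - used)
                return;            /* output truncated; stop */
            used += (size_t)n;
            pending_op = "";
            state = WAIT_OPERATOR;
        }
        /* Remaining branches handle the WAIT_OPERATOR state. */
        else if (isspace((unsigned char)*s)) {
            s++;
        } else if (skip_parens_when_waiting_operator && is_paren(*s)) {
            s++;                   /* the fix: ignore parens here as well */
        } else if (is_or_keyword(s)) {
            pending_op = " | ";
            s += 2;
            state = WAIT_OPERAND;
        } else {
            /* Any other character implies AND and is re-read, unconsumed,
             * in WAIT_OPERAND. Without the fix that includes ')', so a
             * following "or" gets read as a plain word. */
            pending_op = " & ";
            state = WAIT_OPERAND;
        }
    }
}
```

With skip_parens_when_waiting_operator set to false, the second query from the report, 'foo or (baz bar) or (ding dong)', comes out as foo | baz & bar & or & ding & dong, mirroring the reported german result. With the flag set to true, both queries give foo | baz & bar | ding & dong.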
It's fairly confusing that this code manages to ignore non-ISOPERATOR
punctuation. It seems like that gets eaten by gettoken_tsvector(),
and then later we decide there's not really a word there.
I'm also confused about why the same thing doesn't happen with the
english tsconfig. Not sure it's worth poking at more, though.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
draft-bug18479-fix.patch | text/x-diff | 527 bytes |