| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | esemmano(at)gmail(dot)com |
| Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses |
| Date: | 2024-06-13 22:04:20 |
| Message-ID: | 2130969.1718316260@sss.pgh.pa.us |
| Lists: | pgsql-bugs |
PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> Although the docs
> https://www.postgresql.org/docs/current/textsearch-controls.html say nothing
> about websearch_to_tsquery supporting parentheses in queries, I noticed some
> inconsistent behaviour when using multiple 'or' keywords with parentheses in
> postgres 15.4
The definition of websearch_to_tsquery says pretty plainly that
"Other punctuation is ignored". So I'd expect parens to do nothing.
That makes this problematic:
> select websearch_to_tsquery('german', 'foo or baz bar or (ding dong)');
> websearch_to_tsquery
> -----------------------------------------
> 'foo' | 'baz' & 'bar' | 'ding' & 'dong'
> select websearch_to_tsquery('german', 'foo or (baz bar) or (ding dong)');
> websearch_to_tsquery
> ------------------------------------------------
> 'foo' | 'baz' & 'bar' & 'or' & 'ding' & 'dong'
I found what seems to be the issue in gettoken_query_websearch: it
ignores ISOPERATOR chars (including parens) in WAITOPERAND state,
but not in WAITOPERATOR state. There, a paren makes it switch back to
WAITOPERAND state, which then consumes the "or" as a regular word.
So a minimal fix could look like the attached.
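To make the state-machine interaction concrete, here is a toy Python model of the two states described above. It is a sketch under stated assumptions, not PostgreSQL's actual C code and not the attached patch: the function name `tokenize`, the flag `skip_operator_chars_in_waitoperator`, and the character set standing in for ISOPERATOR are all hypothetical, and stemming, stop words, quoting and negation are left out entirely.

```python
# Toy model only: names and structure are illustrative, not PostgreSQL's code.
OPERATOR_CHARS = set("!&|()")  # assumed stand-in for the ISOPERATOR test


def is_boundary(query, pos):
    """True if pos is past the end, whitespace, or an operator character."""
    return pos >= len(query) or query[pos].isspace() or query[pos] in OPERATOR_CHARS


def tokenize(query, skip_operator_chars_in_waitoperator):
    """Scan query with two states and return a printable operator/word string."""
    tokens = []
    state = "WAITOPERAND"
    i = 0
    while i < len(query):
        ch = query[i]
        if state == "WAITOPERAND":
            # Whitespace and operator characters are skipped in this state.
            if ch.isspace() or ch in OPERATOR_CHARS:
                i += 1
                continue
            j = i
            while not is_boundary(query, j):
                j += 1
            tokens.append(query[i:j].lower())  # any word, including "or"
            i = j
            state = "WAITOPERATOR"
        else:  # WAITOPERATOR
            if ch.isspace():
                i += 1
                continue
            # The hypothetical fix: also skip operator characters here.
            if skip_operator_chars_in_waitoperator and ch in OPERATOR_CHARS:
                i += 1
                continue
            if query[i:i + 2].lower() == "or" and is_boundary(query, i + 2):
                tokens.append("|")
                i += 2
            else:
                # Implicit AND; nothing is consumed, so a ")" seen here is
                # skipped later in WAITOPERAND, where "or" is just a word.
                tokens.append("&")
            state = "WAITOPERAND"
    # Simplification: drop an operator left dangling at the end of input.
    while tokens and tokens[-1] in ("&", "|"):
        tokens.pop()
    return " ".join(tokens)


if __name__ == "__main__":
    q = "foo or (baz bar) or (ding dong)"
    print(tokenize(q, skip_operator_chars_in_waitoperator=False))
    print(tokenize(q, skip_operator_chars_in_waitoperator=True))
```

With the flag off, the model turns the second "or" of the reported query into an ordinary word joined by implicit ANDs, mirroring the reported output; with the flag on, the ")" is skipped while waiting for an operator and the "or" is recognized as OR.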
It's fairly confusing that this code manages to ignore non-ISOPERATOR
punctuation. It seems like that gets eaten by gettoken_tsvector()
and then later we decide there's not really a word there.
I'm also confused how come the same thing doesn't happen in the
english tsconfig. Not sure it's worth poking at more, though.
regards, tom lane
| Attachment | Content-Type | Size |
|---|---|---|
| draft-bug18479-fix.patch | text/x-diff | 527 bytes |