From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | sssnarr1(at)gmail(dot)com |
Subject: | BUG #17739: postgres ts_headline function is not returning matches it should during full text search |
Date: | 2023-01-08 16:28:24 |
Message-ID: | 17739-2f20af14b5644ef9@postgresql.org |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 17739
Logged by: Sam S
Email address: sssnarr1(at)gmail(dot)com
PostgreSQL version: 15.1
Operating system: ubuntu
Description:
Pretty version of this question originally posted on StackExchange:
https://dba.stackexchange.com/questions/321718/postgres-ts-headline-function-is-not-returning-matches-it-should-during-full-tex
**Background**:
I've been using Postgres full-text search and it has met my needs quite
well. However, there is some unexpected behavior that I cannot wrap my
head around. It concerns the highlighted matches that the `ts_headline`
function returns for full-text search results. It returns the correct
matches most of the time, but it often fails to return a match I expect.
I think an example is the best way to demonstrate this.
Relevant Postgres full-text highlighting docs:
https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-HEADLINE
**Versions**: I have tried Postgres 15 and 12 and experienced this bug
(feature?) in both.
**Examples**:
In the following 3 full-text search highlighting queries, why do the first
one's results not match those of the second and third? I'm trying to figure
out what I can do to get matches on the first query. I first noticed that
some of my highlight queries were coming back with no 'hits'. I created the
following 3 queries to show the issue I'm having.
According to the docs, when there are no matches (matches are marked with
`<b></b>` tags), it simply returns the first `MinWords` words. That's what
is happening in the first query, whereas I think we should get the 2 matches
back, as we do in the following two queries.
```
postgres=# SELECT ts_headline('english', 'beginning word word word word
CHILD word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word SERVICE ending',
to_tsquery('english', 'CHILD & SERVICE'),
'MaxFragments=2,MinWords=5,MaxWords=10');
ts_headline
-------------------------------
beginning word word word word
(1 row)
```
Now let's increase `MaxWords` from 10 to 11; now we get 2 matches:
```
postgres=# SELECT ts_headline('english', 'beginning word word word word
CHILD word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word SERVICE ending',
to_tsquery('english', 'CHILD & SERVICE'),
'MaxFragments=2,MinWords=5,MaxWords=11');
ts_headline
------------------------------------------------------------------------------------------------------------------------
beginning word word word word <b>CHILD</b> word word word word word ...
word word word word word <b>SERVICE</b> ending
(1 row)
```
Now let's increase `MaxFragments` from 2 to 3 instead (with `MaxWords` back
at 10); again we get 2 matches:
```
postgres=# SELECT ts_headline('english', 'beginning word word word word
CHILD word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word SERVICE ending',
to_tsquery('english', 'CHILD & SERVICE'),
'MaxFragments=3,MinWords=5,MaxWords=10');
ts_headline
---------------------------------------------------------------------------------------------------------
word word word word <b>CHILD</b> word word word word word ... word word
word word <b>SERVICE</b> ending
(1 row)
```
I feel like something subtle is going on in the interaction between the
`MaxFragments`, `MinWords`, and `MaxWords` settings, or maybe this is
undefined behavior or a bug. I'm hoping to find a way to get the first query
to match, as I believe it should. Please correct me if I'm wrong.
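For easier comparison, the three queries above could probably be condensed into one statement along the following lines. This is only a sketch: the filler count of 99 is my own count of the `word` tokens between CHILD and SERVICE and may be off, `repeat()` separates words with single spaces rather than the line breaks in the pasted text, and the `doc`, `body`, `t`, and `opts` names are made up for illustration.

```sql
-- Compact reproduction sketch (assumption: 99 filler words between
-- CHILD and SERVICE; adjust the count if the original text differs).
WITH doc AS (
    SELECT 'beginning word word word word CHILD '
           || repeat('word ', 99)
           || 'SERVICE ending' AS body
)
SELECT t.opts,
       -- Same configuration and query as above; only the options vary.
       ts_headline('english', doc.body,
                   to_tsquery('english', 'CHILD & SERVICE'),
                   t.opts) AS headline
FROM doc,
     (VALUES ('MaxFragments=2,MinWords=5,MaxWords=10'),
             ('MaxFragments=2,MinWords=5,MaxWords=11'),
             ('MaxFragments=3,MinWords=5,MaxWords=10')) AS t(opts);
```

Each output row pairs one option string with the headline it produces, so the difference between the first setting and the other two should be visible in a single result set.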