Re: BUG #15172: Postgresql ts_headline with <-> operator does not highlight text properly

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alex Malek <magicagent(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, ngigi(at)at(dot)co(dot)ke
Subject: Re: BUG #15172: Postgresql ts_headline with <-> operator does not highlight text properly
Date: 2023-10-28 19:42:16
Message-ID: ZT1kGIbiALGclTUA@momjian.us
Lists: pgsql-bugs

On Wed, Aug 3, 2022 at 02:02:51PM -0400, Alex Malek wrote:
> On Wed, Aug 3, 2022 at 1:58 PM PG Bug reporting form <noreply(at)postgresql(dot)org>
> wrote:
> I have noticed a likely bug when using ts_headline with the <-> operator
>
> Assuming the following query:
>
> SELECT ts_headline('English',
>   'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
>   phraseto_tsquery('English','European Commercial Bank')::tsquery);
>
> The returned result is:
> This <b>Commercial</b> <b>Bank</b> does not have any Equity in Europe but
> <b>European</b> <b>Commercial</b> <b>Bank</b> does
>
> This highlights the words Commercial & Bank separately in addition to
> European Commercial Bank.
>
> However, the correct output expected should be:
> This Commercial Bank does not have any Equity in Europe but <b>European</b>
> <b>Commercial</b> <b>Bank</b> does
>
> Which only highlights *European Commercial Bank* due to the <-> operator in
> phraseto_tsquery.
>
> SELECT phraseto_tsquery('English','European Commercial Bank');
> returns 'european' <-> 'commerci' <-> 'bank' as expected indicating the
> problem is with ts_headline function.

I tested this against Postgres 11 and master (and you tested on PG 10
and 14) and I found the same behavior, plus I found something even
worse:

SELECT ts_headline('English',
'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
('''equiti'' <-> ''bank''')::tsquery);
ts_headline
----------------------------------------------------------------------------------------------------------------
This Commercial <b>Bank</b> does not have any <b>Equity</b> in Europe but European Commercial <b>Bank</b> does

Notice that "Bank" and "Equity" are not next to each other, but they
still highlight. In fact, the words appear to be independently checked:

SELECT ts_headline('English',
'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
('''XXX'' <-> ''bank''')::tsquery);
ts_headline
---------------------------------------------------------------------------------------------------------
This Commercial <b>Bank</b> does not have any Equity in Europe but European Commercial <b>Bank</b> does
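
To make the difference concrete, here is a minimal Python sketch of the two
highlighting strategies. It is a toy model and not PostgreSQL's code: the
function names are made up for illustration, lowercase exact matching stands
in for stemming, and the query is given as plain words ("equity" rather than
the lexeme 'equiti'). The independent mode reproduces the output seen above;
the phrase-aware mode gives what the original report expected.

```python
import re

TEXT = ("This Commercial Bank does not have any Equity in Europe "
        "but European Commercial Bank does")


def tokenize(text):
    """Return (word, start, end) for each word in the text."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def independent_positions(words, phrase):
    """Indices of words matching ANY query word, ignoring adjacency."""
    wanted = set(phrase)
    return {i for i, w in enumerate(words) if w in wanted}


def phrase_positions(words, phrase):
    """Indices of words that are part of a full, adjacent phrase match."""
    hits = set()
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase:
            hits.update(range(i, i + n))
    return hits


def highlight(text, phrase, phrase_aware):
    """Wrap matched words in <b>...</b>, keeping the original spacing."""
    tokens = tokenize(text)
    # Assumption: lowercasing is a stand-in for real lexeme normalization.
    words = [t[0].lower() for t in tokens]
    phrase = [p.lower() for p in phrase]
    if phrase_aware:
        hits = phrase_positions(words, phrase)
    else:
        hits = independent_positions(words, phrase)
    out, last = [], 0
    for i, (word, start, end) in enumerate(tokens):
        out.append(text[last:start])
        out.append(f"<b>{word}</b>" if i in hits else word)
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    print(highlight(TEXT, ["equity", "bank"], phrase_aware=False))
    print(highlight(TEXT, ["equity", "bank"], phrase_aware=True))
```

With phrase_aware=False, "Bank" and "Equity" are highlighted wherever they
occur; with phrase_aware=True nothing is highlighted for that query, because
the two words are never adjacent in the text.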

Is this documented somewhere?

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.
