Re: BUG #15172: Postgresql ts_headline with <-> operator does not highlight text properly

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alex Malek <magicagent(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, ngigi(at)at(dot)co(dot)ke
Subject: Re: BUG #15172: Postgresql ts_headline with <-> operator does not highlight text properly
Date: 2023-10-28 21:20:11
Message-ID: CALT9ZEGMS_U-dLQLROg5op9va7kjA8qQMRKbUROSWsv2sYec5w@mail.gmail.com
Lists: pgsql-bugs

Hi, Bruce and Tom!

On Sun, 29 Oct 2023 at 00:46, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Is this documented somewhere?
>
> The docs [1] only say that ts_headline "returns an excerpt from the
> document in which terms from the query are highlighted". This
> behavior does not violate that admittedly-weak contract.
>
> IIRC, ts_headline does attempt to find a text fragment or fragments
> that fully satisfy the query (e.g., include an exact phrase match)
> but it will then highlight all the matching words in the fragment,
> not only the location of the phrase match. I do not agree with the
> OP's opinion that that's wrong. The highlight-em-all approach has its
> own value, and in any case it may not be possible to find a full match
> that satisfies the function's other constraints such as MaxWords.
> Refusing to highlight anything in that event would be unhelpful.
>
> regards, tom lane

I think ts_headline's main purpose is to make Postgres friendlier to
search-engine-like usage, which I feel is too niche a scenario to
support as part of the core code. If I remember right, we regularly
get bug reports from users who assume it has stricter semantics than
it really does. I also remember being puzzled by its unusual output
in the past.

If we fiddle with the other parameters of ts_headline, we can easily get
other kinds of output that seem counterintuitive, e.g.:
SELECT ts_headline('English',
    'This Commercial Bank does not have any
Equity in Europe but European Commercial Bank does',
    ('''equiti'' <-> ''bank''')::tsquery,
    'MaxWords=30, MinWords=2');
ts_headline
-----------------
This Commercial
(1 row)

What do you think about clearly deprecating this feature in the docs,
while still leaving it working as it is?

Kind regards,
Pavel Borisov,
Supabase.
