Re: Ellipses around result fragment of ts_headline

From: Asher Snyder <asnyder(at)noloh(dot)com>
To: sushant354(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Ellipses around result fragment of ts_headline
Date: 2009-02-14 21:46:53
Message-ID: 00a001c98eed$bfef8ab0$3fcea010$@com
Lists: pgsql-hackers

Yes, your assumption is correct: I'm looking for a single fragment to also
have the option of a fragment delimiter, added based on the fragment's
position in the document.
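
A minimal sketch of the behaviour requested here, assuming the caller already
knows the word positions of the fragment within the document. The function
wrap_fragment and all of its parameters are hypothetical names chosen for
illustration; nothing like it is claimed to exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PostgreSQL: wrap one headline fragment
 * in a delimiter only on the sides where document text was cut off.
 *
 * frag_start and frag_end are word positions of the fragment in the
 * document (frag_end is inclusive); doc_words is the total word count.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int
wrap_fragment(const char *fragment, size_t frag_start, size_t frag_end,
              size_t doc_words, const char *delim,
              char *out, size_t outsz)
{
    int cut_before = frag_start > 0;            /* text precedes the fragment */
    int cut_after = frag_end + 1 < doc_words;   /* text follows the fragment */
    int n;

    assert(out != NULL && outsz > 0);

    n = snprintf(out, outsz, "%s%s%s",
                 cut_before ? delim : "",
                 fragment,
                 cut_after ? delim : "");

    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

With this shape, a fragment that starts at word 0 gets no leading ellipsis and
a fragment that ends on the last word gets no trailing one, which is the
distinction that unconditionally concatenating '...' on both sides cannot make.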

>-----Original Message-----
>From: Sushant Sinha [mailto:sushant354(at)gmail(dot)com]
>Sent: Saturday, February 14, 2009 4:41 PM
>To: Asher Snyder
>Cc: pgsql-hackers(at)postgresql(dot)org
>Subject: RE: [HACKERS] Ellipses around result fragment of ts_headline
>
>The documentation in 8.4dev has information on FragmentDelimiter
>http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
>
>If you do not specify MaxFragments > 0, then the default headline
>generator kicks in. The default headline generator does not have any
>fragment delimiter. So it is correct that you will not see any
>delimiter.
>
>I think you are looking for the default headline generator to add
>ellipses as well depending on where the fragment is. I do not know what
>other people's opinion on this is.
>
>-Sushant.
>
>On Sat, 2009-02-14 at 16:21 -0500, Asher Snyder wrote:
>> Interesting, it could be that you already do it, but the documentation
>> makes no reference to a fragment delimiter, so there's no way that I can
>> see to add one. The documentation for ts_headline only lists StartSel,
>> StopSel, MaxWords, MinWords, ShortWord, and HighlightAll; there appears
>> to be no option for a fragment delimiter.
>>
>> In my case I do:
>>
>> SELECT v1.id, v1.type_id, v1.title,
>>        ts_headline(v1.copy, query, 'MinWords = 17') as copy,
>>        ts_rank(v1.text_search, query) AS rank
>> FROM (SELECT b1.*,
>>              (setweight(to_tsvector(coalesce(b1.title,'')), 'A') ||
>>               setweight(to_tsvector(coalesce(b1.copy,'')), 'B')) as text_search
>>       FROM search.v_searchable_content b1) v1,
>>      plainto_tsquery($1) query
>> WHERE ($2 IS NULL OR (type_id = ANY($2))) AND query @@ v1.text_search
>> ORDER BY rank DESC, title
>>
>> Now, this use of ts_headline correctly returns highlighted, fragmented
>> search results, but there is no fragment delimiter in the headline.
>> Some suggestions were to change ts_headline(v1.copy, query, 'MinWords = 17')
>> to '...' || ts_headline(v1.copy, query, 'MinWords = 17') || '...', but as
>> you can clearly see, this would always add the ellipses and would not be
>> intelligent regarding the fragments. I hope that you're correct and that it
>> is implemented, just not documented.
>>
>> >-----Original Message-----
>> >From: Sushant Sinha [mailto:sushant354(at)gmail(dot)com]
>> >Sent: Saturday, February 14, 2009 4:07 PM
>> >To: Asher Snyder
>> >Cc: pgsql-hackers(at)postgresql(dot)org
>> >Subject: Re: [HACKERS] Ellipses around result fragment of ts_headline
>> >
>> >I think we currently do that. We add ellipses only when we encounter a
>> >new fragment. So there should not be ellipses if we are at the end of
>> >the document or if that is the first fragment (includes the beginning of
>> >the document). Here is the code in generateHeadline, ts_parse.c that
>> >adds the ellipses:
>> >
>> >    if (!infrag)
>> >    {
>> >        /* start of a new fragment */
>> >        infrag = 1;
>> >        numfragments++;
>> >        /* add a fragment delimiter if this is after the first one */
>> >        if (numfragments > 1)
>> >        {
>> >            memcpy(ptr, prs->fragdelim, prs->fragdelimlen);
>> >            ptr += prs->fragdelimlen;
>> >        }
>> >    }
>> >
>> >It is possible that there is a bug that needs to be fixed. Can you show
>> >me an example where you found that?
>> >
>> >-Sushant.
>> >
>> >On Sat, 2009-02-14 at 15:13 -0500, Asher Snyder wrote:
>> >> It would be very useful if there were an option to have ts_headline
>> >> add ellipses before or after a result fragment based on the position of
>> >> the fragment in the source document. For instance, running
>> >> ts_headline(doc, query) correctly returns a fragment with words
>> >> highlighted; however, there's no easy way to determine whether the
>> >> returned fragment is at the beginning or end of the original doc and
>> >> add the necessary ellipses.
>> >>
>> >> Search pages such as postgresql.org ALWAYS add ellipses before or after
>> >> the fragment, regardless of whether ellipses are warranted. In my
>> >> opinion, always adding ellipses to the fragment is deceptive to the
>> >> user. In many of my search results the fragment is at the beginning of
>> >> the doc, and always seeing ellipses would confuse the user. So you can
>> >> see how the feature described above would benefit the accuracy of the
>> >> search result fragment.
>> >>

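The delimiter logic quoted from generateHeadline can be sketched as a
standalone function. This is an illustration only, not PostgreSQL source:
join_fragments and its parameters are hypothetical names, and the fragments
are assumed to be already-extracted strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Standalone illustration (not PostgreSQL source): join fragments the way
 * the quoted generateHeadline excerpt does, emitting the delimiter only
 * before the second and later fragments.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the output buffer is too small.
 */
int
join_fragments(const char *const *frags, size_t nfrags,
               const char *fragdelim, char *out, size_t outsz)
{
    size_t fragdelimlen = strlen(fragdelim);
    size_t numfragments = 0;
    char *ptr = out;
    size_t i;

    assert(out != NULL && outsz > 0);

    for (i = 0; i < nfrags; i++)
    {
        size_t len = strlen(frags[i]);
        size_t need = len + (numfragments > 0 ? fragdelimlen : 0);

        /* keep one byte free for the terminating NUL */
        if ((size_t) (ptr - out) + need >= outsz)
            return -1;

        numfragments++;
        /* add a fragment delimiter if this is after the first one */
        if (numfragments > 1)
        {
            memcpy(ptr, fragdelim, fragdelimlen);
            ptr += fragdelimlen;
        }
        memcpy(ptr, frags[i], len);
        ptr += len;
    }
    *ptr = '\0';
    return (int) (ptr - out);
}
```

Under this logic a headline made of a single fragment never contains the
delimiter, whatever the fragment's position in the document, which matches
the behaviour discussed in the thread.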