From: | "Catalin Marinas" <catalin(dot)marinas(at)gmail(dot)com> |
---|---|
To: | "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su> |
Cc: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org, "Teodor Sigaev" <teodor(at)sigaev(dot)ru> |
Subject: | Re: Fragments in tsearch2 headline |
Date: | 2007-10-30 10:41:27 |
Message-ID: | b0943d9e0710300341h34d68bbp4a717d681b769b3f@mail.gmail.com |
Lists: | pgsql-general |
On 28/10/2007, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> On Sat, 27 Oct 2007, Tom Lane wrote:
>
> > "Catalin Marinas" <catalin(dot)marinas(at)gmail(dot)com> writes:
> >> Is there an easy way to generate a headline from separate fragments
> >> containing the search words and maybe separated by "..."?
> >
> > Hmm, the documentation for ts_headline claims it does this already:
[...]
> > However, a quick look at the code suggests this is a lie --- I see no
> > evidence whatever that there's any smarts for putting in ellipses.
>
> Probably documentation is not correct here. 'ellipsis-separated' should be
> treated as a general wording. Default highlighting is <b>..</b> as it
> stated below in docs.
It seems that I'll have to implement the headline outside the query
(Python, in my case). I would use to_tsvector and to_tsquery to
generate the lexemes and their word positions, add them to a hash table
and use the positions of the matching lexemes to generate the headline.
I could also highlight the full text and generate the headline I want
based on it, but if I limit the number of excerpts, it gets complicated
to avoid the same lexeme being shown in all excerpts. Is a lexeme
always a substring of the corresponding token (so that I can use a
simple regexp)?
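
Roughly, the position-based approach could look like the sketch below. It
is only an illustration: build_headline and its parameters are made-up
names, the lexeme -> positions hash table is assumed to have been filled
already from the to_tsvector output for the lexemes matching the query,
and it assumes the parser's word positions line up with a plain
whitespace split of the text, which will not hold for every input.

```python
def build_headline(text, positions, context=3, max_fragments=2, sep=" ... "):
    """Build a headline from fragments around matching words.

    positions: dict mapping lexeme -> list of 1-based word positions
    (hypothetical input, assumed to come from to_tsvector output).
    """
    words = text.split()  # assumption: matches the parser's word positions

    # Invert the hash table: word position -> lexeme, ignoring bad positions.
    lexeme_at = {}
    for lexeme, plist in positions.items():
        for pos in plist:
            if 1 <= pos <= len(words):
                lexeme_at[pos] = lexeme

    fragments = []  # [start, end] windows, 0-based, inclusive
    seen = set()    # lexemes that already appear in some fragment
    for pos in sorted(lexeme_at):
        lexeme = lexeme_at[pos]
        idx = pos - 1
        end = min(len(words) - 1, idx + context)
        if fragments and idx - context <= fragments[-1][1] + 1:
            # Window overlaps or touches the previous one: extend it.
            fragments[-1][1] = end
            seen.add(lexeme)
            continue
        # Skip lexemes that were already shown, and respect the limit.
        if lexeme in seen or len(fragments) >= max_fragments:
            continue
        fragments.append([max(0, idx - context), end])
        seen.add(lexeme)

    parts = []
    for start, end in fragments:
        out = []
        for i in range(start, end + 1):
            word = words[i]
            out.append("<b>%s</b>" % word if (i + 1) in lexeme_at else word)
        parts.append(" ".join(out))
    return sep.join(parts)


if __name__ == "__main__":
    text = ("the quick brown fox jumps over the lazy dog "
            "while the cat sleeps near the warm fire")
    print(build_headline(text, {"fox": [4], "cat": [12]}, context=1))
```

Skipping lexemes that were already shown is one simple way to keep the
same word from filling every excerpt. It works on positions only, so it
does not depend on the lexeme being a substring of the token.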
Any other ideas?
Thanks.
--
Catalin