From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Catalin Marinas <catalin(dot)marinas(at)gmail(dot)com> |
Cc: | Richard Huxton <dev(at)archonet(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | Re: Fragments in tsearch2 headline |
Date: | 2007-10-30 15:39:53 |
Message-ID: | Pine.LNX.4.64.0710301834340.14368@sn.sai.msu.ru |
Lists: | pgsql-general |
On Tue, 30 Oct 2007, Catalin Marinas wrote:
> On 30/10/2007, Richard Huxton <dev(at)archonet(dot)com> wrote:
>> Oleg Bartunov wrote:
>>> Catalin,
>>>
>>> what do you need? What's wrong with this?
>>>
>>> postgres=# select ts_headline('1 2 3 4 5 3 4 abc abc 2 3 xyz',
>>>     '2'::tsquery, 'StartSel=...,StopSel=...');
>>> ts_headline
>>> -------------------------------------------
>>> 1 ...2... 3 4 5 3 4 abc abc ...2... 3 xyz
>>
>> I think he wants something like: "1 2 3 ... abc 2 3 ..."
>>
>> A few characters of context around each match and then ... between. Kind
>> of like grep -C.
>
> That's pretty much correct (with the difference that I'd like context
> of words rather than lines as in "grep" and StartSel=<b>,
> StopSel=</b>).
>
> Since the text I want a headline for might be pretty long (tens of
> lines), I'd like to only show the excerpts around the matching words.
> Similar to the above example:
>
> select ts_headline('1 2 3 4 5 3 4 abc x y z 2 3', '2 & abc'::tsquery);
>
> should give:
>
> '1 <b>2</b> 3 4 ... 3 4 <b>abc</b> x y'
>
> Currently, if you limit the maximum words so that 'abc' is too far, it
> only highlights the first match.
OK, then you have to formalize many things: how long the excerpts should be,
how many excerpts to show, etc. In tsearch2 we have the get_covers() function,
which produces all excerpts, like:
=# select get_covers(to_tsvector('1 2 3 4 5 3 4 abc x y z 2 3'), '2&3'::tsquery);
get_covers
------------------------------------------------
1 {1 2 3 }1 4 5 {2 3 4 abc x y z {3 2 }2 3 }3
(1 row)
Once you formalize your requirements, you can look at it and adapt it to your
needs (and share it with people). I think it could be a nice contrib module.
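
To make "formalize your requirements" concrete, here is a rough sketch in
Python of one possible set of rules. It is not tsearch2 code and does not use
get_covers(); the function name fragment_headline and the parameters context,
max_fragments and delimiter are made up for illustration. It assumes the query
is just a list of words (no & or | logic, no stemming) and that words are
separated by whitespace.

```python
def fragment_headline(text, query_words, context=2, max_fragments=3,
                      start_sel="<b>", stop_sel="</b>", delimiter=" ... "):
    """Join excerpts of `context` words around each matching word.

    Hypothetical helper: the parameters are exactly the decisions that
    need formalizing (excerpt length, number of excerpts, separator).
    """
    words = text.split()
    wanted = {w.lower() for w in query_words}

    # Positions of the words that match any query term.
    hits = [i for i, w in enumerate(words) if w.lower() in wanted]

    # One [start, end] window per hit; windows that overlap or touch
    # are merged so the same words are never shown twice.
    windows = []
    for i in hits:
        start = max(0, i - context)
        end = min(len(words) - 1, i + context)
        if windows and start <= windows[-1][1] + 1:
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])

    # Render at most max_fragments windows, marking the matching words.
    fragments = []
    for start, end in windows[:max_fragments]:
        fragments.append(" ".join(
            f"{start_sel}{w}{stop_sel}" if w.lower() in wanted else w
            for w in words[start:end + 1]))
    return delimiter.join(fragments)


print(fragment_headline("1 2 3 4 5 3 4 abc x y z 2 3", ["2", "abc"]))
```

With these rules the example text gives
'1 <b>2</b> 3 4 ... 3 4 <b>abc</b> x y z <b>2</b> 3', because the second '2'
falls close enough to 'abc' for the two windows to merge. Whether later
occurrences of an already shown word should extend the headline like this is
one more thing to decide.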
>
> Many of the search engines (including google) show the headline this
> way. I think Lucene can do this as well but I've never used it to be
> sure.
>
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83