From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Sushant Sinha <sushant354(at)gmail(dot)com> |
Cc: | Catalin Marinas <catalin(dot)marinas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Richard Huxton <dev(at)archonet(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | Re: Fragments in tsearch2 headline |
Date: | 2008-03-17 18:27:44 |
Message-ID: | 200803171827.m2HIRiM08492@momjian.us |
Lists: | pgsql-general |
Teodor, Oleg, do we want this?
http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php
---------------------------------------------------------------------------
Sushant Sinha wrote:
> I wrote a headline generation function for my app and I have attached
> the patch (against the cvs head). It generates multiple contexts in
> which the query appears. Essentially, it uses the cover function to
> generate all covers, chooses the smallest covers, and stretches each
> selected cover according to the chosen parameters. Ideally, I think the
> changes should be made to the prsd_headline function, but I couldn't
> understand that segment of code well.
>
> The sql interface is
>
> headline_with_fragments(text parser, tsvector docvector, text doc,
> tsquery queryin, int4 maxcoverSize, int4 mincoverSize, int4 maxWords)
> RETURNS text
>
> This will generate a headline that contains maxWords words, with each
> cover stretched to maxcoverSize. It will not add any fragment with fewer
> than mincoverSize words.
> I am running my app with maxcoverSize = 20, mincoverSize = 5, maxWords = 40.
> So it shows roughly two fragments per query.
>
> If Teodor or Oleg want to add this to the main branch, I will be happy to
> clean it up and test it better.
>
> -Sushant.
>
>
>
>
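The cover-based selection described above can be sketched in Python. This is only an illustration under stated assumptions, not the attached patch: it handles AND queries over plain tokens (no stemming or parser), skips covers that do not fit the remaining word budget, stretches roughly symmetrically, and drops overlapping fragments. The helper `find_covers` and the keyword names `max_cover`, `min_cover`, and `max_words` are hypothetical stand-ins for the SQL parameters.

```python
def find_covers(words, terms):
    """Return (start, end) index pairs of minimal windows holding every term."""
    terms = set(terms)
    covers = []
    last_seen = {}  # term -> index of its most recent occurrence
    for i, word in enumerate(words):
        if word not in terms:
            continue
        last_seen[word] = i
        if len(last_seen) == len(terms):
            start = min(last_seen.values())
            # Same start as the previous cover means this window is not minimal.
            if not covers or covers[-1][0] != start:
                covers.append((start, i))
    return covers


def headline_with_fragments(words, terms, max_cover=20, min_cover=5, max_words=40):
    """Pick the smallest covers, stretch each one, and join them with ' ... '."""
    covers = sorted(find_covers(words, terms), key=lambda c: c[1] - c[0])
    fragments = []
    used = 0
    for start, end in covers:
        budget = min(max_cover, max_words - used)
        if budget < min_cover:
            break  # not enough room left for an acceptable fragment
        length = end - start + 1
        if length > budget:
            continue  # assumption: covers that do not fit are skipped
        pad = budget - length
        lo = max(0, start - pad // 2)
        hi = min(len(words), lo + budget)
        if hi - lo < min_cover:
            continue
        # Assumption: a fragment overlapping an earlier one is dropped.
        if any(lo < f_hi and f_lo < hi for f_lo, f_hi in fragments):
            continue
        fragments.append((lo, hi))
        used += hi - lo
    fragments.sort()
    return " ... ".join(" ".join(words[lo:hi]) for lo, hi in fragments)


doc = "1 2 3 4 5 3 4 abc x y z 2 3".split()
print(headline_with_fragments(doc, ["2", "3"], max_cover=4, min_cover=2, max_words=8))
```

On the sample document used later in this thread, `find_covers` yields three covers, and the two shortest ones fill the eight-word budget.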
> On Oct 31, 2007 6:26 PM, Catalin Marinas <catalin(dot)marinas(at)gmail(dot)com> wrote:
> > On 30/10/2007, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> > > ok, then you have to formalize many things - how long excerpts should be,
> > > how many excerpts to show, etc. In tsearch2 we have the get_covers() function,
> > > which produces all excerpts like:
> > >
> > > =# select get_covers(to_tsvector('1 2 3 4 5 3 4 abc x y z 2 3'), '2&3'::tsquery);
> > > get_covers
> > > ------------------------------------------------
> > > 1 {1 2 3 }1 4 5 {2 3 4 abc x y z {3 2 }2 3 }3
> > > (1 row)
> >
> > This function generates the lexemes, so cannot be used directly, but
> > it is probably a good starting point.
> >
> > > Once you formalize your requirements, you can look at it and adapt it to your
> > > needs (and share with people). I think it could be a nice contrib module.
> >
> > It seems that Sushant already wants to implement this function. He
> > would probably be faster than me :-) (I'm relatively new to db stuff).
> > Since I mainly rely on whatever a web hosting company provides, I'll
> > probably stick with a Python implementation outside the SQL query.
> >
> > Thanks for your answers.
> >
> > --
> > Catalin
> >
> > ---------------------------(end of broadcast)---------------------------
> >
> > TIP 5: don't forget to increase your free space map settings
> >
[ Attachment, skipping... ]
>
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +