From: | "eric(at)did-it(dot)com" <eric(at)did-it(dot)com> |
---|---|
To: | Uros Gruber <uros(at)sir-mag(dot)com> |
Cc: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Tomaz Borstnar <tomaz(dot)borstnar(at)over(dot)net>, pgsql-general(at)postgresql(dot)org, Teodor Sigaev <teodor(at)stack(dot)net> |
Subject: | Re: tsearch comments |
Date: | 2003-01-29 03:51:48 |
Message-ID: | 1043812309.14066.15.camel@linuxworks |
Lists: | pgsql-general |
Good ideas,
I second them entirely.
We're using tsearch as well, and would like to see ranking in this
module. It would be great to have it at the DB level instead of in
middleware (Perl, etc.). Apparently there is built-in proximity ranking?
I guess one of the problems with using the OpenFTS front end is
integration into existing systems that use different languages and
methodologies. Most of the time, you want those results delivered
within your application, not outside of it in a general-purpose
search engine.
Also, in many typical applications the data is not necessarily
title/description based. For instance, we need to look up data from
just a title, sometimes from a content column, and maybe even from our
keywords table.
- Ericson Smith
http://www.weightlossfriends.com
On Tue, 2003-01-28 at 17:48, Uros Gruber wrote:
> Hi!
>
> OpenFTS is great so far, but consider an example: we are working
> on a directory engine and we would like to apply some ranking to
> the data we get from tsearch. There we have data like page
> caption, description, keywords, URL, page content... and then we
> have another project where we search completely different kinds
> of data.
>
> Full-text search is very easy to use in this scenario, because
> everything is in the DB and is done at the DB level. Developers
> do not need to worry about how to index something. It's great
> because you can simply say this column is full-text indexed.
>
> The second stage is ordering the data you get from tsearch, and
> that's where OpenFTS comes in. But you have to build some
> middleware, which is fine, but we need to focus on other
> problems, not on middleware.
>
> Moving this to C would be great, but it is not a solution for all
> of us who want to make our searches good.
>
> I think relkov and relor are a good start, and we should go that
> way. I think everybody can very simply accomplish highlighting
> and headline generation once they get the results ordered.
>
> As I said in my earlier mail, Oleg asked me, "Could you
> elaborate on this?" I tried to make some OpenFTS-specific changes
> in relkov and relor. But I'm not good at advanced C programming,
> so I spent a lot of time figuring out what exactly the code does.
>
> And here is my idea of what would be great, if it is possible to
> implement; I don't really know how PG works internally.
>
> Let's say we create a table on which we want to use full-text
> search.
>
> CREATE TABLE ..... (
>     ..
>     mycolumn varchar,
>     another_column varchar,
>     ....
>     fulltext(mycolumn, another_column)
> );
> The system would then create all the necessary index tables,
> where word positions would be saved whenever data is inserted. I
> don't know if this can be done somewhere in the background so
> the user doesn't actually see those tables, but that is not a
> problem.
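The background index tables described above amount to a per-column positional inverted index. Here is a minimal Python sketch, assuming nothing about how tsearch actually stores data. The class name `PositionalIndex`, its methods, and the simple `\w+` tokenizer are all hypothetical. They only show that each word occurrence becomes one (row id, position) posting, recorded per full-text column at insert time.

```python
import re
from collections import defaultdict


class PositionalIndex:
    """Hypothetical sketch: for each column, word -> list of (row_id, position)."""

    def __init__(self, columns):
        # One inverted index per column named in fulltext(mycolumn, another_column).
        self.index = {col: defaultdict(list) for col in columns}

    def insert(self, row_id, row):
        # Run on every INSERT, like the hidden index tables proposed above.
        for col, postings in self.index.items():
            words = re.findall(r"\w+", row.get(col, "").lower())
            for pos, word in enumerate(words):
                postings[word].append((row_id, pos))

    def positions(self, col, word):
        # .get() avoids creating empty entries for words that never occurred.
        return self.index[col].get(word.lower(), [])


idx = PositionalIndex(["mycolumn", "another_column"])
idx.insert(1, {"mycolumn": "PostgreSQL full text search",
               "another_column": "search engines"})
```

With this data, `idx.positions("mycolumn", "search")` gives `[(1, 3)]`, and the same word in `another_column` gives `[(1, 0)]`. Positions like these are what a ranking function would need.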
>
> Anybody can easily parse the search words in their own language,
> or use the OpenFTS functionality; I did it for PHP. Once you have
> those search words, you pass them to an SQL query, something like
> this:
>
> SELECT mycolumn, another_column FROM mytable
>     WHERE mycolumn @@ 'search string' AND another_column @@ 'search string';
>
> tsearch handles this, and we get the matching rows, but they are
> not ordered by relevance.
>
> For that, we would add something like this:
>
> SELECT mycolumn, another_column, rank() AS sumofrank FROM mytable
>     WHERE ....... ORDER BY sumofrank DESC;
>
> Here is the rank() call written out in more detail:
>
> rank({mycolumn=>0.01},{another_column=>0.001},'search string') AS sumofrank
>
> This would mean that mycolumn has a base weight of 0.01 and
> another_column 0.001. So if the search string is found at the
> beginning of another_column, that row would be ranked lower than
> one where the same string is found in the middle of mycolumn.
> Those weights could be summed. This would make it possible to
> decide which column is more important, not only in general but
> for every query we make.
>
> The syntax is just there to make it easier to understand what
> I'm trying to solve.
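To make the weighting idea concrete, here is a minimal Python sketch. It is not the tsearch or OpenFTS API. The `rank()` name and signature mirror the illustrative syntax above, and the position factor is an assumption. That factor runs from 1.0 for the first word of a column down toward 0.5 for the last, so a mycolumn match always contributes at least 0.005 per term, while an another_column match contributes at most 0.001. That reproduces the "middle of mycolumn beats beginning of another_column" behaviour. Per-column scores are summed.

```python
import re


def rank(weights, text_by_column, query):
    """Hypothetical sketch of rank({mycolumn=>0.01},{another_column=>0.001},'search string').

    For each query term found in a column, add weight * position_factor,
    where position_factor is in (0.5, 1.0]: 1.0 at the first word, falling
    linearly toward 0.5 at the end, so column weight dominates position.
    """
    terms = re.findall(r"\w+", query.lower())
    total = 0.0
    for col, weight in weights.items():
        words = re.findall(r"\w+", text_by_column.get(col, "").lower())
        if not words:
            continue
        for term in terms:
            if term in words:
                pos = words.index(term)  # first occurrence only
                factor = 1.0 - 0.5 * pos / len(words)
                total += weight * factor  # contributions are summed
    return total


weights = {"mycolumn": 0.01, "another_column": 0.001}
rows = {
    1: {"mycolumn": "a guide to full text search in postgresql", "another_column": "misc"},
    2: {"mycolumn": "misc", "another_column": "search tips"},
}
# Equivalent of ORDER BY sumofrank DESC.
ranked = sorted(rows, key=lambda r: rank(weights, rows[r], "search"), reverse=True)
```

Row 1 scores 0.01 × 0.6875 = 0.006875 (the term is the 6th of 8 words in mycolumn). Row 2 scores 0.001 (first word of another_column). So `ranked` is `[1, 2]`. A linear factor is only one possible choice; proximity or frequency could be folded in the same way.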
>
> At that point we have ordered our data, and we could then do
> highlighting and such in any language we want.
>
> I hope everybody understands my idea. I would like to help; I
> just have to learn more about the code and what is done
> internally with that data.
>
> I implemented some ranking in PHP, but it was not fast, because
> there was a lot of data and PHP is not as fast as C. But I got
> pretty good results, and also a concept of how to rank
> something.
>
> A rule engine for ranking could also be built, but I think we
> first have to start with something trivial and simple, and once
> that works, move on to more advanced features, for example
> checking whether text is bold or in CAPS...
>
> --
> bye,
> Uros
>
> Tuesday, January 28, 2003, 8:11:36 PM, you wrote:
>
> OB> On Tue, 28 Jan 2003, Tomaz Borstnar wrote:
>
> >> At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message:
> >> >We want to keep tsearch as simple as it is, and now we are just adding
> >> >better and friendlier configurability. Do we need to complicate tsearch?
> >>
> >> Sometimes you need that because some other app is putting data into the database.
> >>
>
> OB> So you'll end up with something like OpenFTS, which was designed as
> OB> an *engine* to be integrated into other apps. The real problem is that
> OB> OpenFTS is written in Perl, and porting it to other languages is a
> OB> difficult task. The new tsearch already has some features of OpenFTS,
> OB> and we're slowly moving toward the idea that we should rewrite OpenFTS
> OB> in C, so writing interfaces would be much simpler.
> OB> There is one major problem with moving ALL features of OpenFTS to
> OB> tsearch that we don't know how to resolve: generation of headlines,
> OB> i.e. text fragments with highlighted query terms. Once we resolve that,
> OB> we could concentrate on tsearch with ranking support.
>