From: | Uros Gruber <uros(at)sir-mag(dot)com> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | Tomaz Borstnar <tomaz(dot)borstnar(at)over(dot)net>, pgsql-general(at)postgresql(dot)org, Teodor Sigaev <teodor(at)stack(dot)net> |
Subject: | Re: tsearch comments |
Date: | 2003-01-28 22:48:33 |
Message-ID: | 4521990343.20030128234833@sir-mag.com |
Lists: | pgsql-general |
Hi!
OpenFTS is great so far. But here is an example: we are
working on a directory engine and we would like to rank the
data we get from tsearch. There we have data like page
caption, description, keywords, URL, page content ... and then
we have another project where we search on a completely
different kind of data.
Full text search in this scenario is very easy to use,
because everything is in the db and it is done at the db
level. The developer does not need to worry about how to
index something. It is great because you can just say that
this column is full-text indexed.
The second stage is ordering the data you get from tsearch,
and that's where OpenFTS comes in. But you have to write some
middleware, which is great, but we need to focus on other
problems, not on middleware.
Moving this to C would be great, but it is not a solution for
all of us who want to make our searches good.
I think relkov and relkor are a good start and we should keep
going that way. I think that everybody can very simply
accomplish highlighting and generation of headlines once they
get the results ordered.
As I said in my previous mail, to which Oleg asked "Could you
elaborate this ?": I tried to make some changes to OpenFTS,
especially in relkov and relkor. But I'm not good at advanced
C programming, so I spent a lot of time finding out what
exactly the code does.
Here is my idea of what would be great, if it is possible to
make; I don't really know how pg works internally.
Let's say we create some table where we want to use full text
search.
CREATE TABLE ..... (
..
mycolumn varchar,
another_column varchar,
....
fulltext(mycolumn,another_column)
);
The system then makes all the necessary index tables where
those positions would be saved when some data is inserted. I
don't know if it is possible to do this somewhere in the
background so the user doesn't actually see those tables, but
this is not a problem.
Anybody can easily parse the search words in their own
language, or use the OpenFTS functionality. I did it in PHP.
Once you have those search words, you pass them to the SQL
query, something like this:
SELECT mycolumn,another_column FROM mytable WHERE mycolumn @
'search string' AND another_column @ 'search string';
This is done by tsearch, and we get the matching data, but
not ordered by relevance.
For that we add something like this:
SELECT mycolumn,another_column,rank() AS sumofrank FROM mytable
WHERE....... ORDER BY sumofrank
I'll write out this rank call here for better understanding:
rank({mycolumn=>0.01},{another_column=>0.001},'search string') AS sumofrank
This would read as: mycolumn has base weight 0.01 and
another_column 0.001, so if the search string is found at the
beginning of another_column it would be ranked lower than the
same string found in the middle of mycolumn. Those weights
could be summed. With this it would be possible to set which
column is more important, not only in general but for every
query we make.
The syntax is just for easier understanding of what I'm
trying to solve.
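A rough sketch of that weighting idea in Python, only to illustrate the arithmetic. Everything here is an assumption made for the example, not something taken from OpenFTS or tsearch: the function name `rank`, the whitespace tokenizer, the sample rows, and in particular the position factor, which falls from 1.0 at the start of a column towards 0.5 at its end so that position can never outweigh a tenfold difference in base weight.

```python
def rank(row, weights, query):
    """Sum the per-column scores of every query term found in the row.

    row     -- dict mapping column name to its text
    weights -- dict mapping column name to its base weight
    query   -- search string, split on whitespace (assumed tokenizer)
    """
    terms = query.lower().split()
    total = 0.0
    for column, base_weight in weights.items():
        words = row.get(column, "").lower().split()
        for term in terms:
            if term not in words:
                continue
            pos = words.index(term)  # first occurrence only
            # Assumed decay: 1.0 for the first word, approaching 0.5
            # for the last word of the column.
            position_factor = 1.0 - 0.5 * pos / len(words)
            total += base_weight * position_factor
    return total


# Hypothetical sample data: the term is mid-text in mycolumn for the
# first row and at the very start of another_column for the second.
rows = [
    {"mycolumn": "cheap hotels near the postgres conference venue",
     "another_column": "travel"},
    {"mycolumn": "travel",
     "another_column": "postgres tips and tricks"},
]
weights = {"mycolumn": 0.01, "another_column": 0.001}

# Highest summed rank first.
ranked = sorted(rows, key=lambda r: rank(r, weights, "postgres"), reverse=True)
```

With these weights the first row sorts ahead of the second, even though the term appears later in its text, because the match is in the more heavily weighted column. Passing a different `weights` dict per call is what makes the column importance adjustable per query.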
So far we have ordered our data, and then we could do
highlighting and so on in any language we want.
I hope everybody understands my idea. I would like to help; I
just have to learn more from the code and about what is done
internally with that data.
I did some ranking in PHP, but it was not fast because there
was a lot of data etc. and PHP is not as fast as C. But I
got pretty good results and also the concept of how to rank
something.
There could also be some rule engine for how to rank
something, but I think that first of all we have to start
with something trivial and simple, and when this works we
move on to the advanced stuff. Let's say we check if text is
bold or in CAPS...
--
bye,
Uros
Tuesday, January 28, 2003, 8:11:36 PM, you wrote:
OB> On Tue, 28 Jan 2003, Tomaz Borstnar wrote:
>> At 13:47 28.1.2003 +0300, Oleg Bartunov wrote the following message:
>> >We want to keep tsearch as simple as it's and now we just add
>> >better and friendly configurability. Do we need complicate tsearch ?
>>
>> Sometimes you need that because some other app is putting data into database.
>>
OB> So, you'll end up with something like OpenFTS, which was designed as
OB> *engine* to be integrated into other apps. The real problem is that
OB> OpenFTS is written in perl and porting to other languages is
OB> difficult task. new tsearch already has some features of OpenFTS and
OB> we're slowly moving to idea we should rewrite OpenFTS in 'C',
OB> so writing interfaces would be much simpler.
OB> There is major problem with moving ALL features of OpenFTS to tsearch
OB> we don't know how to resolve - generation of headlines, text fragments
OB> with hilighted query terms. Once we resolve that we could concentrate
OB> on tsearch with ranking support.