Re: Search (was: Web team meeting minutes)

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-www(at)postgresql(dot)org
Subject: Re: Search (was: Web team meeting minutes)
Date: 2006-07-14 17:35:50
Message-ID: Pine.GSO.4.63.0607142053530.2921@ra.sai.msu.su
Lists: pgsql-www

On Fri, 14 Jul 2006, Dave Page wrote:

>
>
>> -----Original Message-----
>> From: Oleg Bartunov [mailto:oleg(at)sai(dot)msu(dot)su]
>> Sent: 14 July 2006 14:22
>> To: Dave Page
>> Cc: Magnus Hagander; pgsql-www(at)postgresql(dot)org
>> Subject: RE: [pgsql-www] Search (was: Web team meeting minutes)
>>
>> Dave,
>>
>> I see the main problem is not in search engine, but in the
>> site engine !
>> It's just not database driven. So, I withdraw my words :)
>
> It's entirely database /driven/, it's just the text index that's fs
> based. We run a fork of the ASPSeek code which has a few improvements
> over the official code including the XML data feed I mentioned, and
> support for PostgreSQL (as opposed to MySQL or Oracle which the standard
> code support).

If it's database driven, what prevents updating the index whenever the db
updates? It's not an easy task, and ASPseek explicitly documents this.
If, for example, 10 documents get updated in an hour, you need to update a
lot of rows: in the best case you must update as many rows as there are
unique words in those documents. That's why I don't believe you could ever run
an online index with ASPseek. Do you have a hook in the site engine to know when
something changes? Some sort of web service would be nice. I'd play with
it to make a prototype of a search engine.
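
To make the cost argument concrete, here is a minimal sketch in Python. It
assumes a simple inverted index with one row per distinct word; that layout,
the function name rows_to_touch and the sample documents are illustrative
assumptions, not ASPseek's actual schema.

```python
import re


def rows_to_touch(documents):
    """Return the best-case number of index rows an update must touch.

    Assumes one index row per distinct word, so the lower bound is the
    number of unique words across all changed documents.
    """
    unique_words = set()
    for text in documents:
        # Lowercase and split on word characters; a real indexer would
        # also apply stemming and stop-word removal.
        unique_words.update(re.findall(r"\w+", text.lower()))
    return len(unique_words)


changed = ["PostgreSQL full text search", "Search the PostgreSQL site"]
print(rows_to_touch(changed))  # prints 6
```

Even two short documents touch six rows here; real pages with hundreds of
distinct words each scale that up accordingly.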

>
>> Does web team consider changing web site engine ? I suggest not to use
>> home-made engines, since we have no power to support it, we
>> do database
>> development, and we don't want to depend on specific person. There are
>> big open-source projects with stable, mature community and we could
>> just add fts capability we need, for example, to Drupal.
>
> Hmm, well, see JD's comments on Drupal. After many years of trying
> different search engines, ASPSeek is by far the best we've found yet
> which *doesn't* require lots of custom code, and can be relatively
> easily managed by any one of us. I'd love for us to use Tsearch to do
> it, but it seems to me we'd need far too much custom code that would
> definitely be harder to manage.
>

I'm not insisting on Drupal; it was just an example. My point is that it's
better to have an engine supported by a community than to develop one ourselves.
What we need is a hook that informs the search engine of every change made
to the site. I'd prefer to consume a web service, and to provide one as well,
so the two stay "loosely coupled". That would be a nice project for a student.
I don't have time to develop a search web service myself, but I could help.
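
A rough sketch of what such a hook could look like, in Python. Everything
here is hypothetical: ChangeHook stands in for whatever the site engine
would expose, SearchIndex for the search side, and a direct callback stands
in for the web service call. The point is only that the indexer touches
just the words that changed for a page.

```python
import re
from collections import defaultdict


class ChangeHook:
    """Hypothetical site-engine hook: tells subscribers about each change."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def notify(self, url, text):
        # text is the new page content, or None if the page was deleted.
        for callback in self._subscribers:
            callback(url, text)


class SearchIndex:
    """Toy inverted index that is updated incrementally from change events."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> set of urls
        self._words_by_url = {}            # url -> words currently indexed

    def on_change(self, url, text):
        old = self._words_by_url.pop(url, set())
        new = set(re.findall(r"\w+", text.lower())) if text is not None else set()
        # Remove postings for words that disappeared from the page.
        for word in old - new:
            self._postings[word].discard(url)
            if not self._postings[word]:
                del self._postings[word]
        # Add postings for words that are new on the page.
        for word in new - old:
            self._postings[word].add(url)
        if new:
            self._words_by_url[url] = new

    def search(self, word):
        return sorted(self._postings.get(word.lower(), ()))


hook = ChangeHook()
index = SearchIndex()
hook.subscribe(index.on_change)

hook.notify("/docs/tsearch", "full text search")
hook.notify("/docs/tsearch", "text search engine")  # page edited
print(index.search("engine"))  # ['/docs/tsearch']
print(index.search("full"))    # []
```

The site engine only needs to know how to call notify; it never has to know
how the index is stored, which is the loose coupling described above.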

> So unless you're about to release pgGoogle 1.0...

I once wrote a simple Perl crawler based on OpenFTS and ran it for
several sites. It's not a lot of code; check these:
http://mira.sai.msu.su/~megera/pgsql/
http://mira.sai.msu.su/~megera/pgsql/varlena/

It's static, since there are no hooks available.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
