Re: full text indexing

From: "Poul L(dot) Christiansen" <poulc(at)cs(dot)auc(dot)dk>
To: Mitch Vincent <mitch(at)venux(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: full text indexing
Date: 2000-10-03 13:24:38
Message-ID: 39D9DE16.C9290A2D@cs.auc.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Mitch Vincent wrote:
>
> > Hi
> >
> > In my PostgreSQL database I have a lot of newspaper articles (size:
> > 100mb now, growing beyond 1gb within few months).
> > I wan't to use full text indexing so that users can search the articles
> > with a keyword and have the results in less than one second.
> >
> > How do I accomplish that?
>
> It's complicated :-)
>
> > Does PostgreSQL have this feature?
>
> Nope.
>
> > Which 3rd party indexing tools are available that easily interoperate
> > with PostgreSQL?
>
> There is some code in contrib called FTI (Full Text Index - no doubt).. I
> re-wrote it for my uses but ended up not using because I was doing so many
> sorts and joins, it made a scan (even an index scan) of a table with 3
> million rows in it very slow. However id you were just searching the keyword
> table itself it was very, very fast.
>
> The FTI trigger in the contrib breaks the words down to 2 letter bits (for
> substring searching) -- mine doesn't, it only indexes whole words without
> duplicates and looks at a list of words not to index (words like a, an, the,
> anything else you want -- I think it has the 300 most used English words in
> there already)
>
> It's drawback is speed, it does take a few seconds to index on INSERT (and
> UPDATE) -- that's if your text fields are 30ish k (very close to the max PG
> can store)..
>
> I would say you're going to run into the 32k limit pretty quick with
> newspaper articles -- I index resumes and I've run into it many times (32k
> of text really isn't all that much)..

I split the articles up. When pg 7.1 is out, I will merge them together
again :)

>
> Anyway, I'll try and get that trigger together that I did and send it to the
> PG guys to see if it's worthy of being added to contrib -- I'll send you a
> copy in private if you'd like. Note: I think there have been more people to
> re-write that trigger, I haven't seen anything else though..
>

Where can i find the FTI code?

I looked in ftp://ftp.postgresql.org/pub/contrib/ , but no luck :(

Thanks for the reply,
Poul L. Christiansen

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter Mount 2000-10-03 13:30:24 Re: Re: JDBC Performance
Previous Message chris markiewicz 2000-10-03 13:07:11 RE: trouble with trigger/function???