From: | Maarten Boekhold <maarten(dot)boekhold(at)tibcofinance(dot)com> |
---|---|
To: | Marc Tardif <intmktg(at)cam(dot)org> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: advice on indexing email |
Date: | 2000-04-28 12:04:50 |
Message-ID: | 39097E62.9FB27E81@tibcofinance.com |
Lists: | pgsql-general |
Hi,
I wrote that fti stuff in contrib...
> My problem is how to create the full word index. The actual code to
> separate the email into separate words isn't a problem, but should I be
> using INSERT, BEGIN/END or COPY? In this last case, I would have to create
> a temporary file holding each word of the email and then use COPY... all
> of which also has its fair share of overhead.
There are two ways to do this.
1. the fti stuff in contrib uses triggers, so every time you
insert/update/delete something in/from the 'fti-ed' table, the full text index
is also updated. If your coding abilities are OK, you can just replace the
word breakup code in contrib/fti with your own.
2. if you have to insert large amounts of data, it is probably faster to *not*
create the triggers at first. Bulk load all your data, then write a little perl
script that reads the data from your table, does the word breakup and inserts
those words into the full text index table. Running 'sort' on the output of
the perl script will help performance, as the fti data will then already be
pre-sorted in the database (you could also use CLUSTER on the fti table after
the index has been created). I think I described this somewhat better in the
README in contrib/fti. If you take this approach, don't forget to create the
triggers after the bulk load of the fti table!
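To give a rough idea of the word breakup step in approach 2, here is a minimal sketch. It is written in Python rather than perl, and it is not the code from contrib/fti. The names break_words, fti_rows and write_copy_file are made up for illustration, and so are the splitting rule (lowercase runs of letters and digits, at least 2 characters) and the sample rows. The output is tab-separated text, which is the default format COPY reads; the column order (word first, then row id) is an assumption, so check it against the README in contrib/fti before loading anything.

```python
import re
import sys

# Hypothetical splitting rule: runs of ASCII letters and digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def break_words(text, min_len=2):
    """Return the distinct lowercase words in text that are at least min_len long."""
    return {w for w in WORD_RE.findall(text.lower()) if len(w) >= min_len}


def fti_rows(records):
    """Return (word, row_id) pairs sorted by word, then by row id.

    records is an iterable of (row_id, text) tuples, e.g. the result of
    selecting the id and the text column from the bulk-loaded table.
    """
    pairs = set()
    for row_id, text in records:
        for word in break_words(text):
            pairs.add((word, row_id))
    # Sorting here plays the role of piping the script output through 'sort'.
    return sorted(pairs)


def write_copy_file(records, out=sys.stdout):
    """Write one tab-separated line per (word, row_id) pair, ready for COPY."""
    for word, row_id in fti_rows(records):
        out.write(f"{word}\t{row_id}\n")


if __name__ == "__main__":
    # Made-up sample rows standing in for data read from the real table.
    sample = [
        (1001, "Advice on indexing email"),
        (1002, "Indexing email with triggers"),
    ]
    write_copy_file(sample)
```

Redirect the output to a file, COPY that file into the fti table, create the index, and only then create the triggers.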
Maarten
--
Maarten Boekhold, maarten(dot)boekhold(at)tibcofinance(dot)com
TIBCO Finance Technology Inc.
"Sevilla" Building
Entrada 308
1096 ED Amsterdam, The Netherlands
tel: +31 20 6601000 (direct: +31 20 6601066)
fax: +31 20 6601005
http://www.tibcofinance.com