| From: | "Magnus Hagander" <mha(at)sollentuna(dot)net> |
|---|---|
| To: | "philip johnson" <philip(dot)johnson(at)atempo(dot)com>, <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: tsearch2 and pdf files |
| Date: | 2006-12-12 07:50:58 |
| Message-ID: | 6BCB9D8A16AC4241919521715F4D8BCEA0FDF9@algol.sollentuna.se |
| Lists: | pgsql-general |
> >> 1. Convert PDF to a text file with e.g. xpdf
> >> 2. Insert parsed text to a table of your choice.
> >> 3. Make vectors from the text.
> >
> > Actually, if you're not going to use the headline() function, you can
> > just store it directly in a vector, cutting down on the size
> > requirements.
> What size requirements?
If you store both the text and the tsvector, that's going to use up a lot
more space than if you store just the tsvector. With a proper lexer and
such, storing both will take *more* than twice the space of the tsvector
alone, given that the tsvector will be smaller than the text.
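
To make the arithmetic concrete, here is a rough Python sketch. It is only a toy model and not how tsearch2 actually builds or stores a tsvector: the stop-word list, the `toy_vector` and `vector_size` helpers, and the 2-bytes-per-position estimate are all assumptions made for illustration. The point is only that whenever the vector is smaller than the text, text plus vector is more than twice the vector.

```python
import re

# Hypothetical stop-word list; a real tsearch2 configuration has its own dictionaries.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}


def toy_vector(text):
    """Map each distinct non-stop word to the list of positions where it occurs."""
    vector = {}
    words = re.findall(r"[a-z0-9]+", text.lower())
    for pos, word in enumerate(words, start=1):
        if word not in STOP_WORDS:
            vector.setdefault(word, []).append(pos)
    return vector


def vector_size(vector):
    """Rough size estimate: lexeme characters plus an assumed 2 bytes per position."""
    return sum(len(word) + 2 * len(positions) for word, positions in vector.items())


# Stand-in for the text extracted from a PDF.
text = "the cat sat on the mat and the cat saw the dog " * 50

vec = toy_vector(text)
text_size = len(text.encode("utf-8"))
vec_size = vector_size(vec)
both_size = text_size + vec_size

print("text only:    ", text_size)
print("vector only:  ", vec_size)
print("text + vector:", both_size)
print("ratio of both to vector only:", round(both_size / vec_size, 2))
```

Real numbers depend on the documents and the dictionaries in use, so measure on your own data before deciding which columns to keep.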
//Magnus