Re: tsearch2 and pdf files

From: Henrik Zagerholm <henke(at)mac(dot)se>
To: Philip Johnson <philip(dot)johnson(at)atempo(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: tsearch2 and pdf files
Date: 2006-12-11 22:06:07
Message-ID: 179575E2-2F49-427F-9961-CEE966187950@mac.se
Lists: pgsql-general

1. Convert the PDF to a text file, e.g. with xpdf.
2. Insert the parsed text into a table of your choice.
3. Build the tsearch2 vectors (tsvector) from the text.
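
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested recipe for 8.1.5: it assumes the pdftotext tool from xpdf is on the PATH, that the database is reached through some Python DB-API driver that uses %s placeholders (psycopg2, for example), and that the tsearch2 functions are installed in the target database. The table name pdf_docs, its columns, and the helper function names are hypothetical; the thread does not prescribe any of them.

```python
import subprocess

# Hypothetical table layout; any table with a text column and a
# tsvector column would do.
CREATE_TABLE_SQL = """
CREATE TABLE pdf_docs (
    id       serial PRIMARY KEY,
    filename text NOT NULL,
    body     text NOT NULL,
    vec      tsvector
)
"""

# GiST index on the vector column so that searches do not scan every row.
CREATE_INDEX_SQL = "CREATE INDEX pdf_docs_vec_idx ON pdf_docs USING gist (vec)"

# Steps 2 and 3 in one statement: store the text and its vector together.
# The text is passed twice, once for the body and once for to_tsvector().
INSERT_SQL = (
    "INSERT INTO pdf_docs (filename, body, vec) "
    "VALUES (%s, %s, to_tsvector(%s))"
)

# Example search against the stored vectors.
SEARCH_SQL = "SELECT filename FROM pdf_docs WHERE vec @@ to_tsquery(%s)"


def pdftotext_command(pdf_path):
    """Step 1: argv for pdftotext; '-' writes the text to stdout."""
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path), check=True, stdout=subprocess.PIPE
    )
    return result.stdout.decode("utf-8", errors="replace")


def index_text(cursor, filename, text):
    """Steps 2 and 3: insert the text and its tsvector via a DB-API cursor."""
    cursor.execute(INSERT_SQL, (filename, text, text))
```

With an open connection, the flow would be to run CREATE_TABLE_SQL and CREATE_INDEX_SQL once, then call index_text(cur, path, extract_text(path)) for each PDF and commit. Scanned PDFs that contain only images yield no text this way and would need OCR first.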

Cheers,

On 11 Dec 2006, at 18:23, Philip Johnson wrote:

> Do you know what kind of table I should use?
> Is there a shell script or a PHP script that does the work?
>
> regards
>
>> -----Original Message-----
>> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Hannes Dorbath
>> Sent: Monday, 11 December 2006 12:21
>> To: pgsql-general(at)postgresql(dot)org
>> Subject: Re: [GENERAL] tsearch2 and pdf files
>>
>> You just need software that extracts the text from it. Search
>> Google for pdf2txt and others. Printer drivers that try to get
>> text from anything are available as well.
>>
>>
>> On 11.12.2006 11:41, Philip Johnson wrote:
>>> I'm using PostgreSQL 8.1.5.
>>>
>>> Tsearch2 is installed and runs well.
>>>
>>> I'd like to use tsearch2 to index PDF files.
>>>
>>> Does someone have a detailed process to implement that?
>>
>>
>> --
>> Regards,
>> Hannes Dorbath
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings
