Re: Indexing MS/Open Office and PDF documents

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander(dot)Bagerman(at)cognizant(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Indexing MS/Open Office and PDF documents
Date: 2012-03-15 21:12:56
Message-ID: 1331845976.26127.16.camel@sussancws0025
Lists: pgsql-general

On Fri, 2012-03-16 at 01:57 +0530, Alexander(dot)Bagerman(at)cognizant(dot)com
wrote:
> Hi,
>
> We are looking to use Postgres 9 for the document storing and would
> like to take advantage of the full text search capabilities. We have
> hard time identifying MS/Open Office and PDF parsers to index stored
> documents and make them available for text searching. Any advice would
> be appreciated.

The first step is to find a library that can parse such documents, or
convert them to a format that can be parsed.

After you do that, PostgreSQL allows you to load arbitrary code as
functions (in various languages), so you can call the library from
inside the database. It's hard to give more specific advice until
you've found the library you'd like to work with.
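
As a rough illustration of the "parse, then index" pattern, here is a
minimal Python sketch that pulls plain text out of an OpenDocument text
file (.odt) using only the standard library. It relies on the fact that
an .odt file is a zip archive whose body lives in content.xml. The
function name odt_to_text, the table "documents", and the column
"body_tsv" are assumptions made up for this example; MS Office and PDF
files would need a different parser, and a paragraph nested inside
another paragraph (e.g. in a text box) would appear twice.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used for text elements in OpenDocument content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_to_text(data: bytes) -> str:
    """Return the plain text of an .odt document given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    wanted = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}  # paragraphs, headings
    chunks = []
    for elem in root.iter():
        if elem.tag in wanted:
            # itertext() also collects text inside nested spans.
            text = "".join(elem.itertext()).strip()
            if text:
                chunks.append(text)
    return "\n".join(chunks)


# Hypothetical statement for storing the search vector; the table and
# column names are placeholders, and %s marks driver-side parameters.
INDEX_SQL = (
    "UPDATE documents "
    "SET body_tsv = to_tsvector('english', %s) "
    "WHERE id = %s"
)
```

The extracted text can be passed as the first parameter of a statement
like INDEX_SQL from a client program, or the same logic could be
wrapped in a server-side function written in one of the procedural
languages, such as PL/Python.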

Regards,
Jeff Davis
