
Re: Replacement for Oracle Text

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	s d <daku(dot)sandor(at)gmail(dot)com>, Postgresql General <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Replacement for Oracle Text
Date:	2016-02-19 17:28:41
Message-ID:	56C750C9.4000500@agliodbs.com
Lists:	pgsql-general

On 02/19/2016 05:49 AM, s d wrote:
> On 19 February 2016 at 14:19, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> I wonder if PLPerl could be used to extract the words from a PDF
> document and create a tsvector column from it.
>
>
> I don't know about PLPerl (I'm pretty sure it could be used for this
> purpose, though). On the other hand, I've written code for this in
> Python which should be easy to adapt for PLPython, if necessary.

I'd swear someone already built something to do this. All you need is a
library which reads PDF and transforms it into text, and then you can
FTS it. I know there's a module for OpenOffice docs somewhere as well,
but heck if I can remember where.
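A minimal sketch of that pipeline, assuming Python (the language mentioned above) and the third-party pypdf package for the PDF-to-text step; the table `documents` and column `body_tsv` are hypothetical. The extracted text is sent to the server so that PostgreSQL's own `to_tsvector()` does the parsing and stemming, rather than reimplementing FTS in client code.

```python
"""Sketch: extract text from a PDF and let PostgreSQL full-text search index it.

Assumptions (not from the thread): the third-party `pypdf` package does the
PDF-to-text step, and a hypothetical table `documents(id, body_tsv tsvector)`
stores the result. Any library that turns a PDF into plain text would do.
"""
from typing import List, Tuple


def extract_pdf_text(path: str) -> str:
    """Return the concatenated text of every page in the PDF at `path`."""
    # Imported lazily so the rest of this module works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can yield an empty string for image-only (scanned) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_fts_update(
    doc_id: int, text: str, config: str = "english"
) -> Tuple[str, List[object]]:
    """Build a parameterized UPDATE that has the server compute the tsvector.

    The text travels as a bind parameter (psycopg2-style %s placeholders),
    and to_tsvector() on the server handles tokenizing, stemming and
    stop-word removal according to the given text search configuration.
    """
    sql = (
        "UPDATE documents "
        "SET body_tsv = to_tsvector(%s::regconfig, %s) "
        "WHERE id = %s"
    )
    return sql, [config, text, doc_id]
```

With psycopg2, for example, `cur.execute(*build_fts_update(1, extract_pdf_text("report.pdf")))` would fill in the column. Moving the same logic into a PL/Python function is also possible, but then pypdf has to be installed in the Python environment the database server uses.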

--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

In response to

Re: Replacement for Oracle Text at 2016-02-19 13:49:16 from s d

Responses

Re: Replacement for Oracle Text at 2016-02-19 19:23:34 from Oleg Bartunov
