From: | Chris Travers <chris(dot)travers(at)gmail(dot)com> |
---|---|
To: | Stephen Davies <sdavies(at)sdc(dot)com(dot)au> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, s d <daku(dot)sandor(at)gmail(dot)com>, Postgresql General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Replacement for Oracle Text |
Date: | 2016-02-20 05:51:49 |
Message-ID: | CAKt_ZftXtFD-8BddWc2QwgDc5dC+Xc2_FwO7vpu0pKYRXA8jLA@mail.gmail.com |
Lists: | pgsql-general |
A more general way would be to write a function that takes a PDF and
returns its text, and mark it IMMUTABLE.
Then you can index the result of converting that text to a tsvector.
You may want to store everything in a tsvector column for ease of review,
but functional (expression) indexes make that less important.
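Here is a minimal sketch of the extraction half in Python. It assumes the poppler `pdftotext` binary is installed on the database server and that this logic would be wrapped in something like `CREATE FUNCTION pdf_to_text(bytea) RETURNS text ... LANGUAGE plpython3u IMMUTABLE`. The name `pdf_to_text` is hypothetical, and PostgreSQL takes the IMMUTABLE label on trust. The `runner` parameter exists only so the logic can be exercised without poppler. For the index itself, use the two-argument form, e.g. `to_tsvector('english', pdf_to_text(doc))`. The one-argument `to_tsvector` depends on `default_text_search_config`, so it is not immutable and cannot appear in an index expression.

```python
import os
import subprocess
import tempfile


def pdf_to_text(pdf_bytes: bytes, runner=subprocess.run) -> str:
    """Convert raw PDF bytes to plain text via poppler's pdftotext.

    Assumes the `pdftotext` executable is on PATH (an assumption, not
    something PostgreSQL provides). `runner` defaults to subprocess.run
    and can be swapped out for testing.
    """
    # pdftotext reads its input from a file path, so spool the bytes to disk.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(pdf_bytes)
        # "-" as the output file makes pdftotext write the text to stdout;
        # check=True raises CalledProcessError if the conversion fails.
        result = runner(
            ["pdftotext", "-enc", "UTF-8", path, "-"],
            capture_output=True,
            check=True,
        )
        return result.stdout.decode("utf-8", errors="replace")
    finally:
        # Always remove the temporary file, even if pdftotext failed.
        os.unlink(path)
```

Shelling out keeps the database function small and reuses the same tool Stephen mentions. A pure-Python PDF library would avoid the external binary but adds a dependency on the server.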
On Sat, Feb 20, 2016 at 1:10 AM, Stephen Davies <sdavies(at)sdc(dot)com(dot)au> wrote:
> On 20/02/16 00:24, Bruce Momjian wrote:
>
>> On Fri, Feb 19, 2016 at 02:49:16PM +0100, s d wrote:
>>
>>> On 19 February 2016 at 14:19, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> > Ah, no. That's not possible.
>>> >
>>> > ...not possible, yet.
>>> >
>>> > PostgreSQL grows by adding the features people need, and it's changing
>>> > rapidly.
>>>
>>> I wonder if PLPerl could be used to extract the words from a PDF
>>> document and create a tsvector column from it.
>>>
>>> I don't know about PL/Perl (I'm pretty sure it could be used for this
>>> purpose, though). On the other hand, I've written code for this in Python
>>> which should be easy to adapt for PL/Python, if necessary.
>>>
>>
>> Right, so you would write a PL/Perl or PL/Python trigger function that
>> would populate the tsvector column on every INSERT or UPDATE.
>>
>> FWIW, I just use pdftotext in my CGI.
>
> --
>
> =============================================================================
> Stephen Davies Consulting P/L                      Phone: 08-8177 1595
> Adelaide, South Australia.                         Mobile: 040 304 0583
>
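Bruce's trigger variant can be sketched in Python too. The function below mirrors how a PL/Python row trigger sees its data: the `TD` dictionary with `event`, `new` and `old`, the convention of returning `None` to keep the row unchanged, and returning `"MODIFY"` after changing `TD["new"]`. To keep it runnable outside the server, the PDF extractor and the `to_tsvector` call are passed in as callables. Inside PL/Python, `to_tsvector` would typically run `plpy.execute` on a prepared `SELECT to_tsvector('english', $1)`, and `extract` could run `pdftotext`. The column names `pdf` and `tsv` and the function name are hypothetical.

```python
from typing import Callable, Optional


def populate_tsvector(
    td: dict,
    extract: Callable[[bytes], str],
    to_tsvector: Callable[[str], str],
) -> Optional[str]:
    """Body of a hypothetical BEFORE INSERT OR UPDATE ... FOR EACH ROW trigger.

    `td` mimics PL/Python's TD dict; the columns "pdf" (bytea) and
    "tsv" (tsvector) are hypothetical names chosen for this sketch.
    """
    new = td["new"]
    # On UPDATE, skip the costly re-extraction when the PDF did not change.
    if td["event"] == "UPDATE" and td["old"]["pdf"] == new["pdf"]:
        return None  # None means "proceed with the row unmodified"
    if new["pdf"] is None:
        new["tsv"] = None  # no document, so nothing to index
    else:
        new["tsv"] = to_tsvector(extract(new["pdf"]))
    return "MODIFY"  # tell PostgreSQL to store the modified TD["new"]
```

Compared with the expression-index approach, the trigger pays the extraction cost once per write and stores the result, so queries and reindexing never have to re-run the converter.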
--
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.
http://www.efficito.com/learn_more