Re: Searching BLOB

From: "John Sidney-Woollett" <johnsw(at)wardbrook(dot)com>
To: "James Watson" <jdwatson1(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Searching BLOB
Date: 2006-06-13 12:28:18
Message-ID: 14323.195.152.219.3.1150201698.squirrel@mercury.wardbrook.com
Lists: pgsql-general

Save yourself some effort and use Lucene to index a directory of your 300
Word documents. I'm pretty sure that Lucene includes an extension to read
Word documents, and you can use PDFBox to read/write PDF files. Marrying
the searching and displaying of results to your web application should be
trivial since you're wanting to use Java anyway. Lucene has full character
set support and is blindingly fast.
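
To make the "index once, then search" idea concrete, here is a minimal
sketch of an inverted index in plain Java. It is an illustration only and
does NOT use Lucene's API: the class SimpleDocIndex and its methods are
hypothetical names, and it assumes the text has already been extracted from
each Word file by some other tool. Lucene does the same kind of job with
far more sophistication (ranking, phrase queries, on-disk indexes).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of an inverted index; this is not Lucene's API.
class SimpleDocIndex {
    // Maps each term to the names of the documents that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split text into lower-cased tokens made of letters and digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index one document. Assumption: `text` was already extracted
    // from the Word file elsewhere.
    void add(String docName, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
        }
    }

    // Return the documents that contain every term of the query (AND
    // semantics), sorted by name. An empty query matches nothing.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        SimpleDocIndex index = new SimpleDocIndex();
        index.add("leave-policy.doc", "Annual leave policy for staff");
        index.add("expenses.doc", "Staff expenses and travel policy");
        System.out.println(index.search("staff policy"));
        System.out.println(index.search("travel"));
    }
}
```

Note that this sketch only matches whole keywords and ignores word order, so
it cannot answer phrase queries; that is one of the things a real search
library gives you for free.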

If you're looking for a solution to this problem using Postgres, then
you'll be creating a ton of extra work for yourself. If you're wanting to
learn more about Postgres, then maybe it'll be worthwhile.

John

James Watson said:
> Hi,
> I am not 100% sure what the best solution would be, so I was hoping
> someone could point me in the right direction.
>
> I usually develop in MS tools, such as .NET, ASP, SQL Server etc.,
> but I really want to expand my skillset and learn as much about
> PostgreSQL as possible.
>
> What I need to do is design a DB that will index and store
> approximately 300 Word docs, each with a size of no more than 1MB. Users
> need to be able to search the Word documents for keywords/phrases to be
> able to identify which one to use.
>
> So, I need to write 2 web interfaces. A front end and a back end. Front
> end for the users who will search for their documents, and a backend
> for an admin person to upload new/amended documents to the DB to be
> searchable.
>
> NOW..... I could do this in the usual MS tools that I work with using
> BLOBs and the built-in full-text searching that comes with SQL Server,
> but I don't have these to work with at the moment. I am working with
> Postgres & JSP pages.
>
> What I was hoping someone could help me out with was identifying the
> best possible solution to use.
>
> 1. How can I store the Word docs in the DB? Would it be best to use a
> BLOB data type?
>
> 2. Does Postgres support full text searching of a Word document once it
> is loaded into the BLOB column & how would this work? Would I have to
> unload each BLOB object, convert it back to text to search, or does
> Postgres have the ability to complete the full-text search of a BLOB,
> like MSSQL Server & Oracle do?
>
> 3. Is there a way to export the Word doc from the BLOB column and dump
> it into a PDF format (I guess I am asking if someone has seen or
> written a PDF generator script/stored procedure for Postgres)?
>
> If someone could help me out, it would be greatly appreciated.
>
> cheers,
> James
>
