electronic-izing unicode texts

From: "A(dot) Cropi" <cropister(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: electronic-izing unicode texts
Date: 2005-04-20 18:27:13
Message-ID: 3076735805042011276a653103@mail.gmail.com

hi everyone,

I have several hundred books that were typed in Unicode, and I would
like to put them into a database so that I can search them. How does
one design a database for this?

I was planning to make a table with these columns: ID, Title, Authors,
Publishers, Content.

The Content column would contain the entire text of each book in
Unicode; then, to find out which books contain the string "blah", I'd
just do something like: select * from table where content contains "blah"
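A minimal sketch of the table and search described above, assuming Python's bundled `sqlite3` module as a stand-in for PostgreSQL so it runs without a database server. The table name `books` and the helpers `add_book` and `find_books` are hypothetical. Standard SQL has no `contains` operator; the sketch uses SQLite's `instr()` for a case-sensitive substring test. In PostgreSQL the rough equivalent would be `WHERE strpos(content, 'blah') > 0` or `WHERE content LIKE '%blah%'`. The search term is passed as a bound parameter rather than pasted into the SQL string, which is also what a PHP front end should do with user input.

```python
import sqlite3

# Stand-in only: an in-memory SQLite database in place of PostgreSQL,
# used to illustrate the proposed columns and a substring search.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE books (
        id         INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        authors    TEXT,
        publishers TEXT,
        content    TEXT NOT NULL
    )
    """
)


def add_book(title, authors, publishers, content):
    """Insert one book (hypothetical helper) and return its generated id."""
    cur = conn.execute(
        "INSERT INTO books (title, authors, publishers, content) "
        "VALUES (?, ?, ?, ?)",
        (title, authors, publishers, content),
    )
    return cur.lastrowid


def find_books(term):
    """Return (id, title) for every book whose content contains term.

    instr() gives the 1-based position of term in content, or 0 if it is
    absent. The term is bound as a parameter, never formatted into the SQL.
    """
    rows = conn.execute(
        "SELECT id, title FROM books WHERE instr(content, ?) > 0 ORDER BY id",
        (term,),
    )
    return rows.fetchall()


# Usage with made-up sample rows, including non-ASCII text.
add_book("Book One", "A. Author", "Pub", "lorem blah ipsum")
add_book("Book Two", "B. Author", "Pub", "nothing here")
add_book("Book Three", "C. Author", "Pub", "ünïcödé blah 中文")
print(find_books("blah"))
print(find_books("中文"))
```

A search like this reads the full text of every book on each query, so it gets slower as the collection grows.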

My problems are: (1) I have never done database work before, and (2) I
do not have any experience with anything like this.

My objective: allow users to run queries through the web (I guess I
will do this with PHP talking to PostgreSQL).

My questions are: (1) Is it reasonable to put the book's content into
the Content column? (2) The content of a book can be very long (some
of them have nearly 1 million words), so what kind of considerations
should I be making? (3) How should I design something like this? There
must be someone out there who has done something similar; if so,
please share your experiences.
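For question (2), a rough back-of-the-envelope estimate of how large one 1-million-word book is as a single text value, compared with PostgreSQL's documented maximum of about 1 GB for a single field value. The figures below are assumptions, not measurements: about 6 characters per word (5 letters plus a space or punctuation mark) and 1 to 3 bytes per character in UTF-8, depending on the script. The names `estimate_bytes`, `WORDS`, `CHARS_PER_WORD` and `FIELD_LIMIT` are illustrative only.

```python
# Rough size estimate for one book stored as a single text value.
# Assumed figures (adjust to your own texts): ~6 characters per word and
# 1-3 bytes per character once encoded as UTF-8.
WORDS = 1_000_000
CHARS_PER_WORD = 6            # assumption: 5 letters + 1 separator
FIELD_LIMIT = 1024 ** 3       # PostgreSQL's ~1 GB limit on one field value


def estimate_bytes(words, chars_per_word, bytes_per_char):
    """Estimated UTF-8 size in bytes of a text with the given shape."""
    return words * chars_per_word * bytes_per_char


for label, bytes_per_char in [("Latin", 1), ("Greek/Cyrillic", 2), ("CJK", 3)]:
    size = estimate_bytes(WORDS, CHARS_PER_WORD, bytes_per_char)
    print(f"{label:15} ~{size / 1024 ** 2:.1f} MiB, "
          f"{size / FIELD_LIMIT:.2%} of the field limit")

# For a real file, measure instead of estimating:
# len(open("book.txt", encoding="utf-8").read().encode("utf-8"))
```

Under these assumptions even the largest books stay in the tens of megabytes, far below the per-field limit.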

Note: these texts are not copyrighted, so I do not have to worry about
legal problems.

tia
