advice on schema for multilingual text

From: "Daniel McBrearty" <danielmcbrearty(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: advice on schema for multilingual text
Date: 2006-04-08 15:31:14
Message-ID: 9cf113670604080831s706f3e34y5577183a612218db@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi

I have a website that has multilingual text stored in the database.
Currently I just have a flat table (lets called it "translations"), one row
per text item, one column per language. This works OK, for now, but I am
looking at a redesign. Mostly I want to keep information about the languages
in teh db as well, so that look like an extra table, one row per lang.

The problem that now arises is that there is an expected correlation between
the "languages" and "translations" tables - there should be a row in
languages for each column of translations. AFAIK (could well be wrong, I am
no expert in db theory) there is no real way to express in the ddl. Not
ideal.

An alternative layout would now be to lose the "translations" table, and
have two tables in place; one called "base_text" containing the text to be
translated, and another called, say, "tx_text" which contains the
translations. Each row of "tx_text" references both "base_text" and also
"languages".

This looks like a nice layout, as there is an abstract rep of the languages,
and we lose the "translations" table which can get very wide. It's nice that
the schema doesn't actually change to add a new language.

BUT there are certain invariants that need to be enforced. The main one is
this:

There must only be one row in "site_text" for any given language referencing
a given row of "base_text".

How can I enforce this? Also have to bear in mind that there COULD easily be
two translators working on the same language. It is imperative that they are
not able to simultaneously save a translation of the same base text.

I would also like to have a column in "languages", type boolean, called
"is_base" - this says what the base language is. Here, only ONE row can have
a true value. (Obviously it has default value of false and is not null).

Another invariant now comes in - the language referenced by every row of
"site_text" MUST have "is_base" set to FALSE.

If anyone can tell me how best to express to handle this stuff in
postgresql, I'd be grateful. Also general comments on whether this is a good
schema or not are welcome.

regards, and thanks in advance

Daniel

--
Daniel McBrearty
email : danielmcbrearty at gmail.com
www.engoi.com : the multi - language vocab trainer
BTW : 0873928131

Responses

Browse pgsql-general by date

  From Date Subject
Next Message User Roman 2006-04-08 16:37:37 Re: pgAdmin3 question
Previous Message Dave Page 2006-04-08 15:21:16 Re: pgAdmin3 question