Re: advice on schema for multilingual text

From: Michael Glaesemann <grzm(at)myrealbox(dot)com>
To: Daniel McBrearty <danielmcbrearty(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: advice on schema for multilingual text
Date: 2006-04-10 00:49:35
Message-ID: 81044C87-E88F-48D4-9668-42C5485047B6@myrealbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Apr 9, 2006, at 0:31 , Daniel McBrearty wrote:

> Hi
>
> I have a website that has multilingual text stored in the database.
> Currently I just have a flat table (lets called it "translations"),
> one row per text item, one column per language. This works OK, for
> now, but I am looking at a redesign. Mostly I want to keep
> information about the languages in teh db as well, so that look
> like an extra table, one row per lang.
>
> The problem that now arises is that there is an expected
> correlation between the "languages" and "translations" tables -
> there should be a row in languages for each column of translations.
> AFAIK (could well be wrong, I am no expert in db theory) there is
> no real way to express in the ddl. Not ideal.
>
> An alternative layout would now be to lose the "translations"
> table, and have two tables in place; one called "base_text"
> containing the text to be translated, and another called, say,
> "tx_text" which contains the translations. Each row of "tx_text"
> references both "base_text" and also "languages".
>
> This looks like a nice layout, as there is an abstract rep of the
> languages, and we lose the "translations" table which can get very
> wide. It's nice that the schema doesn't actually change to add a
> new language.
>
> BUT there are certain invariants that need to be enforced. The main
> one is this:
>
> There must only be one row in "site_text" for any given language
> referencing a given row of "base_text".

You'd want a unique index on (base_text, language), like:

create table tx_text
(
base_text text not null
references base_text(base_text)
, language text not null
references languages (language)
, primary key (base_text, language)
, tx_text text not null
);

> I would also like to have a column in "languages", type boolean,
> called "is_base" - this says what the base language is. Here, only
> ONE row can have a true value. (Obviously it has default value of
> false and is not null).

Here you want a partial unique index on languages where is_base is true
create table languages
(
language text primary key
, is_base boolean not null
);

create unique index languages_only_one_true_base_idx
on languages (is_base)
where is_base;

> Another invariant now comes in - the language referenced by every
> row of "site_text" MUST have "is_base" set to FALSE.

I can think of two ways to do this: one is to write a trigger to
enforce this, something like:

create function non_base_language_translations_check
returns trigger()
language plpgsql as $$
begin
if exists (
select *
from tx_text
natural join languages
where not is_base
)
then
raise exception
'Language of translation text must not be a base language.';
end if;
end;
$$;

Then use this function on triggers that fire on insert and update on
tx_text and on update on languages.

Another is to include the is_base column in tx_text (with both
language and is_base referencing languages) and use a check
constraint to make sure is_base is false:

create table tx_text
(
base_text text not null
references base_text(base_text)
, language text not null
, is_base boolean not null check not is_base
, foreign key (language, is_base) references languages (language,
is_base)
on update cascade
, primary key (base_text, language)
, tx_text text not null
);

The latter method is denormalized, which is not something I usually
recommend. I don't know how the constraint checking overhead of using
a trigger compares with using the foreign key and check constraint.

A third idea would be to have two languages tables: one with a single
row for the base language and another for the target languages (You
might even do this using table inheritance, though I haven't thought
this completely through). tx_text would reference the
target_languages table (or child table, as the case may be).

I'm sure others have opinions on this as well.

Hope this helps.

Michael Glaesemann
grzm myrealbox com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Joshua D. Drake 2006-04-10 01:03:29 Re: Is difference between PostgreSQL and mySQL licences
Previous Message Frank 2006-04-09 22:58:20 Re: pl/perl error