From: | Michael Glaesemann <grzm(at)myrealbox(dot)com> |
---|---|
To: | Daniel McBrearty <danielmcbrearty(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: advice on schema for multilingual text |
Date: | 2006-04-10 00:49:35 |
Message-ID: | 81044C87-E88F-48D4-9668-42C5485047B6@myrealbox.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Apr 9, 2006, at 0:31 , Daniel McBrearty wrote:
> Hi
>
> I have a website that has multilingual text stored in the database.
> Currently I just have a flat table (lets called it "translations"),
> one row per text item, one column per language. This works OK, for
> now, but I am looking at a redesign. Mostly I want to keep
> information about the languages in teh db as well, so that look
> like an extra table, one row per lang.
>
> The problem that now arises is that there is an expected
> correlation between the "languages" and "translations" tables -
> there should be a row in languages for each column of translations.
> AFAIK (could well be wrong, I am no expert in db theory) there is
> no real way to express in the ddl. Not ideal.
>
> An alternative layout would now be to lose the "translations"
> table, and have two tables in place; one called "base_text"
> containing the text to be translated, and another called, say,
> "tx_text" which contains the translations. Each row of "tx_text"
> references both "base_text" and also "languages".
>
> This looks like a nice layout, as there is an abstract rep of the
> languages, and we lose the "translations" table which can get very
> wide. It's nice that the schema doesn't actually change to add a
> new language.
>
> BUT there are certain invariants that need to be enforced. The main
> one is this:
>
> There must only be one row in "site_text" for any given language
> referencing a given row of "base_text".
You'd want a unique index on (base_text, language), like:
create table tx_text
(
base_text text not null
references base_text(base_text)
, language text not null
references languages (language)
, primary key (base_text, language)
, tx_text text not null
);
> I would also like to have a column in "languages", type boolean,
> called "is_base" - this says what the base language is. Here, only
> ONE row can have a true value. (Obviously it has default value of
> false and is not null).
Here you want a partial unique index on languages where is_base is true
create table languages
(
language text primary key
, is_base boolean not null
);
create unique index languages_only_one_true_base_idx
on languages (is_base)
where is_base;
> Another invariant now comes in - the language referenced by every
> row of "site_text" MUST have "is_base" set to FALSE.
I can think of two ways to do this: one is to write a trigger to
enforce this, something like:
create function non_base_language_translations_check
returns trigger()
language plpgsql as $$
begin
if exists (
select *
from tx_text
natural join languages
where not is_base
)
then
raise exception
'Language of translation text must not be a base language.';
end if;
end;
$$;
Then use this function on triggers that fire on insert and update on
tx_text and on update on languages.
Another is to include the is_base column in tx_text (with both
language and is_base referencing languages) and use a check
constraint to make sure is_base is false:
create table tx_text
(
base_text text not null
references base_text(base_text)
, language text not null
, is_base boolean not null check not is_base
, foreign key (language, is_base) references languages (language,
is_base)
on update cascade
, primary key (base_text, language)
, tx_text text not null
);
The latter method is denormalized, which is not something I usually
recommend. I don't know how the constraint checking overhead of using
a trigger compares with using the foreign key and check constraint.
A third idea would be to have two languages tables: one with a single
row for the base language and another for the target languages (You
might even do this using table inheritance, though I haven't thought
this completely through). tx_text would reference the
target_languages table (or child table, as the case may be).
I'm sure others have opinions on this as well.
Hope this helps.
Michael Glaesemann
grzm myrealbox com
From | Date | Subject | |
---|---|---|---|
Next Message | Joshua D. Drake | 2006-04-10 01:03:29 | Re: Is difference between PostgreSQL and mySQL licences |
Previous Message | Frank | 2006-04-09 22:58:20 | Re: pl/perl error |