warning: long, Re: Database design problem: multilingual strings

From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: warning: long, Re: Database design problem: multilingual strings
Date: 2003-06-24 17:54:41
Message-ID: 20030624195441.L9075@hermes.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi !

We had this problem in GnuMed (www.gnumed.org). Eventually, we
decided that it is only really solvable automatically for "fixed"
strings. That is, strings that are known at database creation.
User supplied strings need user supplied translations as well.
The translation mechanism works for them just as well but you
depend on the user to supply a translation.

I am attaching the solution we use in GnuMed. The schema file
shows our table setup:

-----------------------------------------------------------
-- =============================================
-- GnuMed fixed string internationalisation
-- ========================================
-- $Source: /cvsroot/gnumed/gnumed/gnumed/server/sql/gmI18N.sql,v $
-- $Id: gmI18N.sql,v 1.14 2003/06/10 09:58:11 ncq Exp $
-- license: GPL
-- author: Karsten(dot)Hilbert(at)gmx(dot)net
-- =============================================
-- Import this script into any GnuMed database you create.

-- This will allow for transparent translation of 'fixed'
-- strings in the database. Simply switching the language in
-- i18n_curr_lang will enable the user to see another language.

-- For details please see the Developer's Guide.
-- =============================================
-- force terminate + exit(3) on errors if non-interactive
\set ON_ERROR_STOP 1
-- =============================================

create table i18n_curr_lang (
id serial primary key,
owner name default CURRENT_USER unique not null,
lang varchar(15) not null
);

comment on table i18n_curr_lang is
'holds the currently selected language per user for fixed strings in the database';

-- =============================================
create table i18n_keys (
id serial primary key,
orig text unique
);

comment on table i18n_keys is
'this table holds all the original strings that need translation so give this to your language teams,
the function i18n() will take care to enter relevant strings into this table,
the table table does NOT play any role in runtime translation activity';

-- =============================================
create table i18n_translations (
id serial primary key,
lang varchar(10),
orig text,
trans text,
unique (lang, orig)
);
create index idx_orig on i18n_translations(orig);

-- =============================================
create function i18n(text) returns text as '
DECLARE
original ALIAS FOR $1;
BEGIN
if not exists(select id from i18n_keys where orig = original) then
insert into i18n_keys (orig) values (original);
end if;
return original;
END;
' language 'plpgsql';

comment on function i18n(text) is
'insert original strings into i18n_keys for later translation';

-- =============================================
create function _(text) returns text as '
DECLARE
orig_str ALIAS FOR $1;
trans_str text;
my_lang varchar(10);
BEGIN
-- no translation available at all ?
if not exists(select orig from i18n_translations where orig = orig_str) then
return orig_str;
end if;

-- get language
select into my_lang lang
from i18n_curr_lang
where
owner = CURRENT_USER;
if not found then
return orig_str;
end if;

-- get translation
select into trans_str trans
from i18n_translations
where
lang = my_lang
and
orig = orig_str;
if not found then
return orig_str;
end if;
return trans_str;
END;
' language 'plpgsql';

comment on function _(text) is
'will return either the input or the translation if it exists';

-- =============================================
create function set_curr_lang(text) returns unknown as '
DECLARE
language ALIAS FOR $1;
BEGIN
if exists(select id from i18n_translations where lang = language) then
delete from i18n_curr_lang where owner = CURRENT_USER;
insert into i18n_curr_lang (lang) values (language);

delete from i18n_curr_lang where owner = (select trim(leading ''_'' from CURRENT_USER));
insert into i18n_curr_lang (lang, owner) values (language, (select trim(leading ''_'' from CURRENT_USER)));

return 1;
else
raise exception ''Cannot set current language to [%]. No translations available.'', language;
return NULL;
end if;
return NULL;
END;
' language 'plpgsql';

comment on function set_curr_lang(text) is
'set preferred language:
- for "current user" and "_current_user"
- only if translations for this language are available';

-- =============================================
create function set_curr_lang(text, name) returns unknown as '
DECLARE
language ALIAS FOR $1;
username ALIAS FOR $2;
BEGIN
if exists(select id from i18n_translations where lang = language) then
delete from i18n_curr_lang where owner = username;
insert into i18n_curr_lang (owner, lang) values (username, language);
return 1;
else
raise exception ''Cannot set current language to [%]. No translations available.'', language;
return NULL;
end if;
return NULL;
END;
' language 'plpgsql';

comment on function set_curr_lang(text, name) is
'set language to first argument for the user named in
the second argument if translations are available';

-- =============================================
-- there's most likely no harm in granting select to all
GRANT SELECT on
i18n_curr_lang,
i18n_keys,
i18n_translations
TO group "gm-public";

-- users need to be able to change this
-- FIXME: more groups need to have access here
GRANT SELECT, INSERT, UPDATE, DELETE on
i18n_curr_lang,
i18n_curr_lang_id_seq
TO group "_gm-doctors";

-- =============================================
-- do simple schema revision tracking
INSERT INTO gm_schema_revision (filename, version) VALUES('$RCSfile: gmI18N.sql,v $', '$Revision: 1.14 $');

-----------------------------------------------------------

Then, there's the relevant part from our developer's guide:

-----------------------------------------------------------

GNUMed:
Prev Chapter 3. Coding Guidelines Next
-------------------------------------------------------------------------------------------------------------------------------

3.7. Backend I18N for non-dynamic ("fixed") strings in the backend.

3.7.1. Introduction

In enumerations we often see fixed strings being stored in the backend. There's no good way a client can translate those to the
local language. Nevertheless we need to provide a translation. Consider the following example:

We want a table that enumerates family relations. The obvious table design would be

+-----------------------------------------------------------------------------------------------------------------------------+
|create table member ( |
| id serial primary key, |
| name varchar(20) |
|); |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

Other tables will obviously reference table.id but we want the frontend to be able to show a spelled-out name for the family
member type. A simple

+-----------------------------------------------------------------------------------------------------------------------------+
| select name from member where id='some ID'; |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

will, however, always return the version that was put into the database in upon installation. Typically this would be done by
statements such as

+-----------------------------------------------------------------------------------------------------------------------------+
| insert into member(name) values('sister'); |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

Hence, queries would always return the English 'sister'.

PostgreSQL does not directly support localization of database content. Therefor the following scheme has been devised:

At the top of your psql script schema definition files include the file gnumed/server/gmI18N.sql which provides a localization
infrastructure. For your convenience, just copy/paste the following two lines:

+-----------------------------------------------------------------------------------------------------------------------------+
|-- do fixed string i18n()ing |
|\i gmI18N.sql |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

The database will then contain several new tables starting with i18n_* and a few functions.

3.7.1.1. i18n_curr_lang

Here you can/should set the currently preferred language on a per-user basis. Only one language per user is allowed at any one
time. Switching the language here will enable the user to see another translation (if provided).

3.7.1.2. i18n_keys

This is just a convenience table listing all the strings that need translations. Dump this and give to translation teams. A
tool will be provided to make use of this table. It is of no importance to the actual online translation process.

3.7.1.3. i18n_translations

This is where translations actually live. As in gettext the original string is used as the key and the language code (which
should correspond with those used in i18n_curr_lang) as a discrimator.

3.7.2. How to translate strings

Make your string insertions aware of i18n issues. This is what the function i18n(text) is for. Regarding the above example
insertions need to be rewritten from

+-----------------------------------------------------------------------------------------------------------------------------+
| insert into member(name) values('sister'); |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

to

+-----------------------------------------------------------------------------------------------------------------------------+
| |
| insert into member(name) values(i18n('sister')); |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

The i18n() function will take care of inserting the string 'sister' into the i18n_keys table where translation teams will find
it and provide a translation. Later on, when a translation is available it will be inserted into i18n_translations:

+-----------------------------------------------------------------------------------------------------------------------------+
| insert into i18n_translations(lang, orig, trans) values ('de_DE', 'sister', 'Schwester'); |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

3.7.3. How to make your tables translate strings

Now that we have translations available in i18n_translations we can start making our tables aware of them. Unfortunately,
PostgreSQL does not yet support column-level select rules. We therefor have to create views wrapping the original tables. Note
that the original table will still be useable. Original tables which have translated strings should be named "_tablename" while
views translating them should be named "v_i18n_tablename". Going back to our previous example, the table

+-----------------------------------------------------------------------------------------------------------------------------+
|create table member ( |
| id serial primary key, |
| name varchar(20) |
|); |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

should be renamed to "_member" and a view created on it:

+-----------------------------------------------------------------------------------------------------------------------------+
| |
|create view v_i18n_member (id, name) as |
| select _member.id, _(_member.name) |
| from member; |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

By making sure to use the same column names in the view we minimize frontend coding changes.

You will notice how the function _() is used to access the translation for the attribute "name". This function is provided by
gmI18N.sql and provides nearly the same functionality as gettext.gettext() which is often aliased to _() in Python and other
languages. It will return a translation based on the user's currently selected language in i18n_curr_lang and the translation
for that language in i18n_translations using the original string as the key.

If no translation is available for a given string _() will return the original string. Also, if the user did not select a
language in i18n_curr_lang the original is returned.

3.7.4. How to make the frontend use translated strings

All the backend infrastructure is in place now so we can make frontends aware of translated strings. The first step is to make
frontends use the v_i18n_* views instead of the tables. If we fail to do that everything will still work. We just won't get
translations :-)

The second step is to make sure the current user has a language selected in i18n_curr_lang. Use something like

+-----------------------------------------------------------------------------------------------------------------------------+
|insert into i18n_curr_lang(lang) values ('de_DE'); |
| |
+-----------------------------------------------------------------------------------------------------------------------------+

This will default to the CURRENT_USER. The actual value need not conform to anything in particular. It can be "Klingon" for
that matter. Make sure then to have "Klingon" translations available in i18n_translations.

This i18n technique does not take care of strings that are inserted into the database dynamically (at runtime). It only makes
sense for strings that are inserted once. Such strings are often used for enumerations.

All this crap isn't necessary anymore once PostgreSQL supports native internationalization of 'fixed' strings.

-------------------------------------------------------------------------------------------------------------------------------
Prev Home Next
Client Internationalization / Up Interacting with the Backend
Localization

-----------------------------------------------------------

There are known drawbacks but this is what we currently use.
Hope that helps !

Karsten Hilbert, MD
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Carlos Oliva 2003-06-24 17:56:34 Re: Eliminating start error message: "unary operator
Previous Message Russ Brown 2003-06-24 17:44:36 Re: Database design problem: multilingual strings