Initial ugly reverse-translator

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: PgSQL General ML <pgsql-general(at)postgresql(dot)org>
Subject: Initial ugly reverse-translator
Date: 2008-04-19 14:52:07
Message-ID: 480A0717.2090609@postnewspapers.com.au
Lists: pgsql-general

Hi all

I've chucked together a quick and very ugly script to read the .po files
from the backend and produce a simple database to map translations back
to the original strings and their source locations. It's a very dirty
.po reader that doesn't try to parse the format properly, but it does
the job. There's no search interface yet; this is just intended to get
to the point where useful queries can be run on the data and the most
effective queries can be figured out.
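
To give a feel for what a "very dirty" .po reader involves, here is a minimal sketch. It is not the code in poread.py; the function name read_po_entries and the tuple layout are made up for illustration. It only understands "#:" location comments, msgid/msgstr and their continuation lines, and it ignores plural forms, msgctxt and escape sequences.

```python
def read_po_entries(text):
    """Yield (locations, msgid, msgstr) for each entry in .po file text.

    locations is a list of (srcfile, srcline) pairs taken from '#:' comments.
    Deliberately naive: no plural forms, no msgctxt, no unescaping.
    """
    locations, fields, current = [], {}, None
    # The extra empty line makes sure the last entry is flushed.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line.startswith("#:"):
            # e.g. "#: utils/adt/array_userfuncs.c:99 utils/adt/arrayfuncs.c:12"
            for ref in line[2:].split():
                srcfile, _, srcline = ref.rpartition(":")
                if srcfile and srcline.isdigit():
                    locations.append((srcfile, int(srcline)))
        elif line.startswith("msgid ") or line.startswith("msgstr "):
            current, _, rest = line.partition(" ")
            fields[current] = rest.strip()[1:-1]  # drop surrounding quotes
        elif line.startswith('"') and current:
            fields[current] += line[1:-1]  # continuation of msgid/msgstr
        elif not line:
            # A blank line ends the entry; the header has an empty msgid.
            if fields.get("msgid"):
                yield locations, fields["msgid"], fields.get("msgstr", "")
            locations, fields, current = [], {}, None
        else:
            current = None  # other comments, msgid_plural, msgstr[n], ...
```

Each yielded tuple corresponds roughly to one message row, one translation row and zero or more location rows in a schema like the one queried below.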

Right now queries against errors without format-string substitutions
work OK, if not great, with pg_trgm based lookups, e.g.:

test=# SELECT message_id, is_format, message, translation
test-# FROM po_translation
test-# INNER JOIN po_message ON po_translation.message_id = po_message.id
test-# WHERE 'el valor de array debe comenzar con «{» o información de dimensión' % translation
test-# ORDER BY similarity('el valor de array debe comenzar con «{» o información de dimensión', translation) DESC;

 message_id | is_format |                          message                           |                             translation
------------+-----------+------------------------------------------------------------+---------------------------------------------------------------------
       4470 | f         | array value must start with \"{\" or dimension information | el valor de array debe comenzar con «{» o información de dimensión"
       4437 | f         | argument must be empty or one-dimensional array            | el argumento debe ser vacío o un array unidimensional"
(2 rows)

test=# SELECT DISTINCT srcfile, srcline FROM po_location WHERE message_id = 4437;
                           srcfile                           | srcline
-------------------------------------------------------------+---------
 /a/pgsql/HEAD/pgtst/src/backend/utils/adt/array_userfuncs.c |     121
 utils/adt/array_userfuncs.c                                 |      99
 utils/adt/array_userfuncs.c                                 |     121
 utils/adt/array_userfuncs.c                                 |     124
(4 rows)

It's also useful for format-string based messages, but more thought is
needed on how best to handle them. A LIKE query using the format-string
message as the pattern (after converting the pattern syntax to SQL
style) would be (a) slow and (b) very sensitive to formatting and other
variation. I haven't spent any time on that bit yet, but if anybody has
any ideas I'd be glad to hear them.
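
For anyone who wants to experiment with the LIKE approach, the pattern conversion might look roughly like the sketch below. This is only an illustration under assumptions: the names format_to_like and _escape_literal are invented here, the regular expression covers common printf-style conversions (including %m and positional forms like %1$s) rather than everything the backend uses, and it does nothing about the speed or sensitivity problems mentioned above.

```python
import re

# "%%" is a literal percent sign; anything else matched here is a
# printf-style conversion such as %s, %d, %5.2f, %ld, %m or %1$s.
_CONVERSION = re.compile(
    r"%(?:%|(?:\d+\$)?[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z)?[a-zA-Z])"
)


def _escape_literal(text, escape):
    """Escape LIKE metacharacters in text that must match literally."""
    for ch in (escape, "%", "_"):  # the escape character itself goes first
        text = text.replace(ch, escape + ch)
    return text


def format_to_like(fmt, escape="\\"):
    """Turn a printf-style message into a SQL LIKE pattern.

    Each conversion becomes the LIKE wildcard '%'; everything else is
    escaped so that it only matches itself.
    """
    out = []
    pos = 0
    for m in _CONVERSION.finditer(fmt):
        out.append(_escape_literal(fmt[pos:m.start()], escape))
        out.append(escape + "%" if m.group() == "%%" else "%")
        pos = m.end()
    out.append(_escape_literal(fmt[pos:], escape))
    return "".join(out)
```

With this, format_to_like('could not open file "%s": %m') gives 'could not open file "%": %'. The resulting pattern would be stored or computed per translation, and the reported error text compared against it with LIKE and a matching ESCAPE clause.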

Anyway, the initial version of the script can be found at:

http://www.postnewspapers.com.au/~craig/poread.py

Consider running it in a new database as it's extremely poorly tested,
written very quickly and dirtily, and contains DDL commands. The schema
can be found inline in the script. The psycopg2 Python module is
required, and the pg_trgm contrib module must be loaded in the database
you use the script with.

Once I'm happy with the queries for translation lookups I'll bang
together a quick web interface for the script and clean it up. At that
point it might start being useful to people here.

--
Craig Ringer
