Re: Unicode base char or something like that?

From: Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc: "pgsql-de-allgemein(at)postgresql(dot)org" <pgsql-de-allgemein(at)postgresql(dot)org>
Subject: Re: Unicode base char or something like that?
Date: 2007-03-13 12:44:13
Message-ID: 876495uxcy.fsf@gate450.dyndns.org
Lists: pgsql-de-allgemein

Markus Schiltknecht writes:

> ..but surely that can be done more efficiently! Isn't Unicode
> organized so that I can get fairly directly from a Unicode code point
> to exactly this base character?
>
> C libraries are welcome too, I'm working in C anyway...

Some time ago I linked librecode (LGPL) into the backend to get
sufficiently fast quoted-printable encoding/decoding. It also
provides a "flat" charset:

,----[ (info "(recode)flat") ]
| ASCII without diacritics nor underline
| ======================================
| [...]
| This code is ASCII expunged of all diacritics and underlines[...]
`----

After I added PG_MODULE_MAGIC, the code seems to work with 8.2 as
well:

,----[ *SQL* ]
| scratch=# create function recode (bytea, cstring) returns bytea as 'recode4pg' language c immutable;
| CREATE FUNCTION
| scratch=# select recode('łçäüöèíéáóàùúìú', 'utf-8..flat');
| recode
| -----------------
| lcauoeieaoauuiu
| (1 row)
`----

Code:

--8<---------------cut here---------------start------------->8---
#include "postgres.h"
#include "funcapi.h"

#include <recodext.h>

/* librecode expects the application to define this symbol. */
const char *program_name = __FILE__;

/* Required since PostgreSQL 8.2; the guard keeps older versions building. */
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(recode);

Datum
recode(PG_FUNCTION_ARGS)
{
    bytea          *in = PG_GETARG_BYTEA_P(0);
    char           *what = PG_GETARG_CSTRING(1);    /* e.g. "utf-8..flat" */
    bytea          *out;
    bool            success;
    RECODE_OUTER    outer = recode_new_outer (true); /* argument: auto_abort */
    RECODE_REQUEST  request = recode_new_request (outer);

    /* Parse the request string. */
    success = recode_scan_request (request, what);

    if (!success) {
        /* Release librecode state first: elog(ERROR) does not return. */
        recode_delete_request (request);
        recode_delete_outer (outer);
        elog(ERROR, "%s: error scanning request: %s", __FILE__, what);
    }

    {
        RECODE_TASK task = recode_new_task (request);

        /* Feed the bytea payload (without its length header) as input. */
        task->input.limit = VARDATA(in) + VARSIZE(in) - VARHDRSZ;
        task->input.buffer = task->input.cursor = VARDATA(in);
        task->fail_level = RECODE_UNTRANSLATABLE;

        success = recode_perform_task (task);

        if (!success) {
            recode_delete_task(task);
            recode_delete_request (request);
            recode_delete_outer (outer);
            elog(ERROR, "%s: error(s) while recoding string using %s", __FILE__, what);
        } else {
            /* Copy the recoded bytes into a palloc'd bytea. */
            int len = task->output.cursor - task->output.buffer;

            out = palloc(VARHDRSZ + len);
            memcpy(VARDATA(out), task->output.buffer, len);
            /* 8.2 API; on 8.3 and later use SET_VARSIZE(out, VARHDRSZ + len). */
            VARATT_SIZEP(out) = VARHDRSZ + len;
        }
        recode_delete_task(task);
    }
    recode_delete_request (request);
    recode_delete_outer (outer);
    PG_RETURN_BYTEA_P(out);
}
--8<---------------cut here---------------end--------------->8---
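
For comparison, here is a minimal self-contained C sketch of the same kind of fold without librecode: decode UTF-8, then look each code point up in a small table of base letters. This is not how librecode implements "flat". The names fold_table, utf8_next and flat_fold_utf8 are made up for illustration, the table covers only the characters from the query above, and replacing every other non-ASCII character with '?' is an assumption of this sketch, not necessarily what recode does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping table: code point -> ASCII base letter.
 * Covers only the characters used in the SQL example. */
struct fold_entry { uint32_t cp; char base; };

static const struct fold_entry fold_table[] = {
    {0x00E0, 'a'}, {0x00E1, 'a'}, {0x00E4, 'a'}, {0x00E7, 'c'},
    {0x00E8, 'e'}, {0x00E9, 'e'}, {0x00EC, 'i'}, {0x00ED, 'i'},
    {0x00F3, 'o'}, {0x00F6, 'o'}, {0x00F9, 'u'}, {0x00FA, 'u'},
    {0x00FC, 'u'}, {0x0142, 'l'},
};

/* Decode one UTF-8 sequence starting at *p (requires *p < end) and
 * advance *p past it. Returns the code point, or U+FFFD for malformed
 * or truncated input. Overlong forms are not rejected. */
static uint32_t
utf8_next(const unsigned char **p, const unsigned char *end)
{
    const unsigned char *s = *p;
    uint32_t cp;
    int extra;

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
    else { *p = s + 1; return 0xFFFD; }

    s++;
    while (extra-- > 0) {
        if (s >= end || (*s & 0xC0) != 0x80) { *p = s; return 0xFFFD; }
        cp = (cp << 6) | (uint32_t) (*s++ & 0x3F);
    }
    *p = s;
    return cp;
}

/* Fold in[0..inlen) into out (capacity outsz, including the NUL).
 * ASCII passes through, table hits become their base letter, and
 * everything else becomes '?'. Returns the number of bytes written,
 * not counting the terminating NUL. */
size_t
flat_fold_utf8(const char *in, size_t inlen, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *) in;
    const unsigned char *end = p + inlen;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return 0;

    while (p < end && n + 1 < outsz) {
        uint32_t cp = utf8_next(&p, end);
        char c = '?';
        size_t i;

        if (cp < 0x80)
            c = (char) cp;
        else
            for (i = 0; i < sizeof fold_table / sizeof fold_table[0]; i++)
                if (fold_table[i].cp == cp) { c = fold_table[i].base; break; }

        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

Every code point yields exactly one output byte, so an output buffer of inlen + 1 bytes is always large enough. A real solution would need a complete table (or Unicode decomposition data) instead of the hand-written one here.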

HTH
Andreas
