From: | Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org> |
---|---|
To: | Markus Schiltknecht <markus(at)bluegap(dot)ch> |
Cc: | "pgsql-de-allgemein(at)postgresql(dot)org" <pgsql-de-allgemein(at)postgresql(dot)org> |
Subject: | Re: Unicode base char or something like that? |
Date: | 2007-03-13 12:44:13 |
Message-ID: | 876495uxcy.fsf@gate450.dyndns.org |
Lists: | pgsql-de-allgemein |
Markus Schiltknecht writes:
> ...but surely this can be done more efficiently! Isn't Unicode organized
> in a way that lets me get from a code point fairly directly to exactly
> this base character?
>
> C libraries are welcome too, I'm working in C anyway...
Some time ago I linked librecode (LGPL) into the backend to get
sufficiently fast quoted-printable encoding/decoding.
,----[ (info "(recode)flat") ]
| ASCII without diacritics nor underline
| ======================================
| [...]
| This code is ASCII expunged of all diacritics and underlines[...]
`----
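The `flat` charset boils down to mapping each accented letter to its base letter. As a rough illustration only (not how librecode implements it), here is a minimal standalone sketch assuming UTF-8 input: the helpers `utf8_next` and `flatten_utf8` and the tiny `base_table` are hypothetical and cover just the letters from the SQL example below; ASCII passes through and any other character becomes `?`. A real implementation would generate the table from the Unicode decomposition data; note that ł (U+0142) has no canonical decomposition, so stripping combining marks after NFD alone would not flatten it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a few accented code points to their base letter. */
struct base_map { unsigned int cp; char base; };

static const struct base_map base_table[] = {
    {0x00E0, 'a'}, {0x00E1, 'a'}, {0x00E4, 'a'},  /* à á ä */
    {0x00E7, 'c'},                                /* ç */
    {0x00E8, 'e'}, {0x00E9, 'e'},                 /* è é */
    {0x00EC, 'i'}, {0x00ED, 'i'},                 /* ì í */
    {0x00F3, 'o'}, {0x00F6, 'o'},                 /* ó ö */
    {0x00F9, 'u'}, {0x00FA, 'u'}, {0x00FC, 'u'},  /* ù ú ü */
    {0x0142, 'l'},                                /* ł: no canonical decomposition */
};

/* Decode one UTF-8 sequence of 1 to 3 bytes (no overlong/surrogate checks).
   Returns the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_next(const unsigned char *s, unsigned int *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned int)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned int)(s[0] & 0x0F) << 12)
            | ((unsigned int)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Flatten NUL-terminated UTF-8 `in` into `out` (outsize bytes incl. NUL).
   Returns 0 on success, -1 on malformed input or if `out` is too small. */
static int flatten_utf8(const char *in, char *out, size_t outsize)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    while (*s) {
        unsigned int cp;
        size_t len = utf8_next(s, &cp);
        char c = '?';                /* default for unmapped characters */
        size_t i;

        if (len == 0)
            return -1;
        if (cp < 0x80)
            c = (char)cp;            /* ASCII passes through unchanged */
        else
            for (i = 0; i < sizeof base_table / sizeof base_table[0]; i++)
                if (base_table[i].cp == cp) { c = base_table[i].base; break; }
        if (n + 1 >= outsize)        /* keep room for the terminating NUL */
            return -1;
        out[n++] = c;
        s += len;
    }
    if (outsize == 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

With this sketch, `flatten_utf8("łçäüöèíéáóàùúìú", buf, sizeof buf)` fills `buf` with `lcauoeieaoauuiu`, the same result recode gives in the SQL session below. A linear scan is fine for a dozen entries; a full table would call for a sorted array searched with `bsearch`.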
After retrofitting PG_MODULE_MAGIC, that librecode-based code seems to
work with 8.2 as well:
,----[ *SQL* ]
| scratch=# create function recode (bytea, cstring) returns bytea as 'recode4pg' language c immutable;
| CREATE FUNCTION
| scratch=# select recode('łçäüöèíéáóàùúìú', 'utf-8..flat');
| recode
| -----------------
| lcauoeieaoauuiu
| (1 row)
`----
Code:
--8<---------------cut here---------------start------------->8---
#include "postgres.h"
#include "funcapi.h"
#include <recodext.h>

/* librecode expects the host program to define program_name */
const char *program_name = __FILE__;

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

/* 8.3 and later use SET_VARSIZE instead of assigning through VARATT_SIZEP */
#ifndef SET_VARSIZE
#define SET_VARSIZE(ptr, len) (VARATT_SIZEP(ptr) = (len))
#endif

PG_FUNCTION_INFO_V1(recode);

/*
 * recode(bytea, cstring) returns bytea
 * Runs the input bytes through the librecode request given as the second
 * argument (e.g. 'utf-8..flat') and returns the recoded bytes.
 */
Datum
recode(PG_FUNCTION_ARGS)
{
    bytea *in = PG_GETARG_BYTEA_P(0);
    char *what = PG_GETARG_CSTRING(1);
    bytea *out = NULL;
    bool success;
    RECODE_OUTER outer = recode_new_outer(true);
    RECODE_REQUEST request = recode_new_request(outer);

    /* parse the request string, e.g. "utf-8..flat" */
    success = recode_scan_request(request, what);
    if (!success)
    {
        recode_delete_request(request);
        recode_delete_outer(outer);
        elog(ERROR, "%s: error scanning request: %s", __FILE__, what);
    }

    {
        RECODE_TASK task = recode_new_task(request);

        /* let recode read directly from the bytea payload */
        task->input.buffer = task->input.cursor = VARDATA(in);
        task->input.limit = VARDATA(in) + VARSIZE(in) - VARHDRSZ;
        /* report failure on untranslatable characters (or worse) */
        task->fail_level = RECODE_UNTRANSLATABLE;

        success = recode_perform_task(task);
        if (!success)
        {
            recode_delete_task(task);
            recode_delete_request(request);
            recode_delete_outer(outer);
            elog(ERROR, "%s: error(s) while recoding string using %s", __FILE__, what);
        }
        else
        {
            /* copy recode's output buffer into a palloc'd bytea */
            int len = task->output.cursor - task->output.buffer;

            out = (bytea *) palloc(VARHDRSZ + len);
            memcpy(VARDATA(out), task->output.buffer, len);
            SET_VARSIZE(out, VARHDRSZ + len);
        }
        recode_delete_task(task);
    }
    recode_delete_request(request);
    recode_delete_outer(outer);
    PG_RETURN_BYTEA_P(out);
}
--8<---------------cut here---------------end--------------->8---
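The pointer arithmetic above assumes the 8.2 layout of a detoasted bytea: a 4-byte length word that counts itself (`VARHDRSZ`), followed by the payload. To make that explicit, here is a standalone mock; `mock_varlena`, `mock_make`, `mock_limit` and the `MOCK_*` macros are hypothetical stand-ins for `struct varlena`, `VARHDRSZ`, `VARSIZE` and `VARDATA`, not the real PostgreSQL headers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an uncompressed varlena with a 4-byte header. */
typedef struct { int32_t vl_len; char vl_dat[]; } mock_varlena;

#define MOCK_VARHDRSZ   ((int32_t) sizeof(int32_t))
#define MOCK_VARSIZE(p) ((p)->vl_len)   /* total size, header included */
#define MOCK_VARDATA(p) ((p)->vl_dat)   /* start of the payload */

/* Build a varlena holding `len` payload bytes, the way recode() builds `out`. */
static mock_varlena *mock_make(const char *data, int32_t len)
{
    mock_varlena *v = malloc(MOCK_VARHDRSZ + len);
    if (v == NULL)
        return NULL;
    memcpy(MOCK_VARDATA(v), data, len);
    MOCK_VARSIZE(v) = MOCK_VARHDRSZ + len;  /* length word counts the header too */
    return v;
}

/* One-past-the-end pointer of the payload, as used for task->input.limit. */
static const char *mock_limit(const mock_varlena *v)
{
    return MOCK_VARDATA(v) + MOCK_VARSIZE(v) - MOCK_VARHDRSZ;
}
```

For a three-byte payload the length word holds 7, and `mock_limit` points exactly three bytes past the start of the data, which is why the input window and the output size both subtract or add `VARHDRSZ`.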
HTH
Andreas