From: | Steffen Macke <sdteffen(at)web(dot)de> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Correct Allocation of UNICODE string in C |
Date: | 2003-07-31 11:46:02 |
Message-ID: | 3F29017A.1070400@web.de |
Lists: | pgsql-general |
Hello All,
I'm struggling to allocate UNICODE text correctly
in a C function for PostgreSQL.
The strings are sometimes truncated, and sometimes garbage
bytes are appended at the end.
Is there a code example that takes a UNICODE (UTF-8) text
of unknown length, allocates the PostgreSQL structure, and copies
the data correctly?
You will find the function in question below;
the full sources are available from
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/dcmms/arabic/
The problem is that the arabic_reshape() function returns text
that can be longer or shorter than the original. In the PostgreSQL
sources I only found examples where texts are copied, but none
showing how to allocate a "fresh" UTF-8 string.
Best Regards,
Steffen Macke
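```c
text *
shape_arabic(text *t)
{
    glong items_read;
    glong items_written;
    long  len;
    long  i;
    text *new_t;
    text *utf8_t;

    /* character count of the input, scanning VARDATA(t) up to a NUL byte */
    len = g_utf8_strlen(VARDATA(t), -1);

    /* buffer for the reshaped text: 4 bytes per character plus 4 spare */
    new_t = (text *) palloc(VARHDRSZ + (len * 4) + 4);
    VARATT_SIZEP(new_t) = VARHDRSZ + (len * 4) + 4;

    /* result buffer: the input's size plus 4 spare bytes */
    utf8_t = (text *) palloc(VARSIZE(t) + 4);
    VARATT_SIZEP(utf8_t) = VARSIZE(t) + 4;

    memset(VARDATA(new_t), 0, (len * 4) + 4);
    memset(VARDATA(utf8_t), 0, VARSIZE(utf8_t) - VARHDRSZ);

    len = len * 2;
    arabic_reshape(&len, VARDATA(t), VARDATA(new_t), ar_unifont);

    /* intended to convert the reshaped UCS-4 data back to UTF-8 in utf8_t */
    g_ucs4_to_utf8(VARDATA(new_t), VARDATA(utf8_t), -1, &items_read,
                   &items_written);

    /* length of the result; computed but not used */
    len = g_utf8_strlen(VARDATA(utf8_t), -1);
    return utf8_t;
}
```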