From: | Reece Hart <reece(at)harts(dot)net> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Cc: | Michael Enke <michael(dot)enke(at)wincor-nixdorf(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | text datum VARDATA and strings |
Date: | 2006-08-14 18:04:30 |
Message-ID: | 1155578671.4158.45.camel@tallac.gene.com |
Lists: | pgsql-bugs pgsql-general |
Michael Enke recently asked in pgsql-bugs about VARDATA and C strings
(BUG #2574: C function: arg TEXT data corrupt). Since that's not a bug,
I've moved this follow-up to pgsql-general.

On Mon, 2006-08-14 at 11:27 -0400, Tom Lane wrote:
> The usual way to get a C string from a TEXT datum is to call textout,
> eg
>     str = DatumGetCString(DirectFunctionCall1(textout, datumval));

Yikes! I've been accessing VARDATA text data like Michael for years
(code below). I account for length and don't expect null-termination,
but I don't use anything like Tom's suggestion above. (I always try to
do what Tom says because that usually hurts less.)

I have three questions:

1) I based everything I did on examples lifted nearly verbatim from a
7.x manual, and I bet Michael did similarly. I've never heard of
DatumGetCString, DirectFunctionCall1, or textout. Are these and other
treasures documented somewhere?

2) Does DatumGetCString(DirectFunctionCall1(textout, datumval)) do
something other than null terminate a string? All of the strings are
from [-A-Z0-1*]; server_encoding has been either SQL_ASCII or UTF8 in
case that's relevant.

3) Is there any reason to believe that the code below is problematic?

Thanks,
Reece

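As background for question 2, here is a minimal standalone sketch of what turning a length-prefixed text value into a C string involves: measure the payload from the size header, copy it, and append a terminator. It is a model only, not PostgreSQL source. The names model_text, MODEL_HDRSZ, model_text_new and model_text_to_cstring are hypothetical, and malloc stands in for palloc so the sketch compiles outside the server. Whether textout does anything beyond this (detoasting, for instance) is not settled by the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a varlena text value; NOT PostgreSQL's struct. */
typedef struct {
    int32_t total_size;  /* header bytes + payload bytes, in the spirit of VARSIZE() */
    char    data[];      /* payload; deliberately not null-terminated */
} model_text;

/* Size of the length header, in the spirit of VARHDRSZ. */
#define MODEL_HDRSZ ((int32_t) offsetof(model_text, data))

/* Build a model text value from the first n bytes of `bytes`. */
model_text *model_text_new(const char *bytes, int32_t n)
{
    assert(bytes != NULL && n >= 0);
    model_text *t = malloc((size_t) MODEL_HDRSZ + (size_t) n);
    if (t == NULL)
        return NULL;
    t->total_size = MODEL_HDRSZ + n;
    memcpy(t->data, bytes, (size_t) n);
    return t;
}

/* Copy the payload into a fresh buffer and null-terminate it. */
char *model_text_to_cstring(const model_text *t)
{
    assert(t != NULL && t->total_size >= MODEL_HDRSZ);
    int32_t len = t->total_size - MODEL_HDRSZ;
    char *s = malloc((size_t) len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, t->data, (size_t) len);
    s[len] = '\0';
    return s;
}
```

The caller owns the returned buffer and frees it. Code that tracks the payload length explicitly, as the function in this message does, never needs the terminator on the input side.
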
#include <postgres.h>
#include <fmgr.h>
#include <ctype.h>
#include <string.h>

static char* clean_sequence(const char* in, int32 n);

PG_FUNCTION_INFO_V1(pg_clean_sequence);
Datum pg_clean_sequence(PG_FUNCTION_ARGS)
{
    text* t0;    /* in */
    text* t1;    /* out */
    char* tmp;
    int32 tmpl;

    if ( PG_ARGISNULL(0) )
        { PG_RETURN_NULL(); }

    t0 = PG_GETARG_TEXT_P(0);
    tmp = clean_sequence( VARDATA(t0), VARSIZE(t0)-VARHDRSZ );
    tmpl = (int32) strlen(tmp);

    /* copy temp sequence into new pg variable */
    t1 = (text*) palloc( tmpl + VARHDRSZ );
    if (!t1)
        { elog( ERROR, "couldn't palloc (%d bytes)", tmpl+VARHDRSZ ); }
    memcpy(VARDATA(t1),tmp,tmpl);
    VARATT_SIZEP(t1) = tmpl + VARHDRSZ;

    pfree(tmp);
    PG_RETURN_TEXT_P(t1);
}

/* clean_sequence -- strip non-IUPAC symbols
   The intent is to strip non-sequence data which might result from
   copy-pasting a fasta file or some such.
   in: char*, length
   out: char*, |out|<=length, NULL-TERMINATED
   out is palloc'd memory; caller must free
   allow chars from IUPAC std 20
   + selenocysteine (U) + ambiguity (BZX) + gap (-) + stop (*)
*/
#define isseq(c) ( ((c)>='A' && (c)<='Z' && (c)!='J' && (c)!='O') \
                   || ((c)=='-') \
                   || ((c)=='*') )

char* clean_sequence(const char* in, int32 n) {
    char* out;
    char* oi;
    int32 i;

    out = palloc( n + 1 );    /* w/null */
    if (!out)
        { elog( ERROR, "couldn't palloc (%d bytes)", n+1 ); }

    for( i=0, oi=out; i<=n-1; i++ ) {
        char c = toupper(in[i]);
        if ( isseq(c) ) {
            *oi++ = c;
        }
    }
    *oi = '\0';

    return(out);
}
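
For trying the filtering loop outside the server, a standalone sketch along the following lines may help. It assumes malloc in place of palloc, and the names clean_sequence_model and is_seq_char are hypothetical. The character set is the one described in the comment above: A-Z except J and O, plus '-' and '*'. The sketch casts each byte to unsigned char before calling toupper, because standard C only defines toupper for values representable as unsigned char (or EOF); that matters once the input can contain non-ASCII bytes, as it can with a UTF8 server_encoding.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Accept A-Z except J and O, plus gap (-) and stop (*). */
static int is_seq_char(int c)
{
    return (c >= 'A' && c <= 'Z' && c != 'J' && c != 'O')
        || c == '-'
        || c == '*';
}

/* Hypothetical standalone variant of the filter: reads exactly n bytes
 * from `in` (no terminator required) and returns a malloc'd,
 * null-terminated, upper-cased copy containing only accepted symbols.
 * Returns NULL if allocation fails; the caller frees the result. */
char *clean_sequence_model(const char *in, size_t n)
{
    assert(in != NULL || n == 0);

    char *out = malloc(n + 1);   /* worst case: every byte kept, plus '\0' */
    if (out == NULL)
        return NULL;

    char *oi = out;
    for (size_t i = 0; i < n; i++) {
        /* Cast first: plain char may be negative for bytes >= 0x80. */
        int c = toupper((unsigned char) in[i]);
        if (is_seq_char(c))
            *oi++ = (char) c;
    }
    *oi = '\0';

    return out;
}
```

Taking the length as a parameter keeps the function usable on payloads that carry no terminator, which is the situation with VARDATA.
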
--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0