Re: Weird behaviour of C extension function

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Amaury Bouchard <amaury(dot)bouchard(at)anasen(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Weird behaviour of C extension function
Date: 2020-04-24 13:27:12
Message-ID: 24e2c22c9f1f1bc38c5a88d3e0c8fef5b980cd9a.camel@cybertec.at
Lists: pgsql-general

On Fri, 2020-04-24 at 14:53 +0200, Amaury Bouchard wrote:
> I see really strange behaviour with a C function which takes a text parameter.
> Everything works fine when I call the function directly with a text string as the parameter, but a problem occurs when I try to read data from a table.
>
> To illustrate the problem, I stripped the function down to the minimum. The source code is below, but first, here is the behaviour:
>
> Direct call
> -----------
> > select passthru('hello world!'), passthru('utf8 çhàràtérs'), passthru(' h3110 123 456 ');
> INFO: INPUT STRING: 'hello world!' (12)
> INFO: INPUT STRING: 'utf8 çhàràtérs' (18)
> INFO: INPUT STRING: ' h3110 123 456 ' (15)
>
> (as you can see, the log messages show the correct input, with the number of bytes between parentheses)
>
> Reading data from a table
> -------------------------
> > create table mytable ( str text);
> > insert into mytable (str) values ('hello world!'), ('utf8 çhàràtérs'), (' h3110 123 456 ');
> > select passthru(str) from mytable;
> INFO: INPUT STRING: 'lo world!' (12)
> INFO: INPUT STRING: '8 çhàràtérs' (18)
> INFO: INPUT STRING: '110 123 456 �
> ' (15)
> INFO: INPUT STRING: '��' (5)
> INFO: INPUT STRING: '' (3)
>
> There, you can see that the pointer seems to be shifted 3 bytes farther.
>
> Do you have any clue for this strange behaviour?
>
>
> The source code
> ---------------
>
> #include "postgres.h"
> #include "fmgr.h"
> #include "funcapi.h"
>
> // PG module init
> #ifdef PG_MODULE_MAGIC
> PG_MODULE_MAGIC;
> #endif
> void _PG_init(void);
> Datum passthru(PG_FUNCTION_ARGS);
> PG_FUNCTION_INFO_V1(passthru);
>
> void _PG_init() {
> }
>
> Datum passthru(PG_FUNCTION_ARGS) {
>     // get the input string
>     text *input = PG_GETARG_TEXT_PP(0);
>     char *input_pt = (char*)VARDATA(input);
>     int32 input_len = VARSIZE_ANY_EXHDR(input);
>     // create a null terminated copy of the input string
>     char *str_copy = calloc(1, input_len + 1);
>     memcpy(str_copy, input_pt, input_len);
>     // log message
>     elog(INFO, "INPUT STRING: '%s' (%d)", str_copy, input_len);
>     free(str_copy);
>     PG_RETURN_NULL();
> }

You can find this in "postgres.h":

* In consumers oblivious to data alignment, call PG_DETOAST_DATUM_PACKED(),
* VARDATA_ANY(), VARSIZE_ANY() and VARSIZE_ANY_EXHDR(). Elsewhere, call
* PG_DETOAST_DATUM(), VARDATA() and VARSIZE(). Directly fetching an int16,
* int32 or wider field in the struct representing the datum layout requires
* aligned data. memcpy() is alignment-oblivious, as are most operations on
* datatypes, such as text, whose layout struct contains only char fields.

So you should use VARDATA_ANY.

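What happens is that these short text values are stored with a 1-byte (short) header,
but VARDATA() skips the first 4 bytes unconditionally, as if the value had a regular
4-byte header. That is why your data pointer ends up 3 bytes too far, while the length
from VARSIZE_ANY_EXHDR() is still correct.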
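
To make the 3-byte shift concrete, here is a small standalone sketch that needs no PostgreSQL headers. It is only a simplified model of the two header sizes: every model_* name is hypothetical and made up for this illustration, and the byte layout assumes the little-endian convention (low bit of the first byte set means a 1-byte header). The real definitions live in the PostgreSQL headers. model_data_fixed4() plays the role of VARDATA(), model_data_any() that of VARDATA_ANY(), and model_size_any_exhdr() that of VARSIZE_ANY_EXHDR().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model (assumption, not PostgreSQL's real macros): a value starts
 * with a 1-byte header (low bit of byte 0 set) or a 4-byte header (bit clear). */
#define MODEL_SHORT_HDR 1
#define MODEL_LONG_HDR  4

/* Store s with a 1-byte header: total size (header included) in the upper 7 bits. */
size_t model_pack_short(uint8_t *buf, const char *s)
{
    size_t len = strlen(s);
    assert(len + MODEL_SHORT_HDR <= 127);   /* must fit in 7 bits */
    buf[0] = (uint8_t)(((len + MODEL_SHORT_HDR) << 1) | 1);
    memcpy(buf + MODEL_SHORT_HDR, s, len);
    return len + MODEL_SHORT_HDR;
}

/* Store s with a 4-byte header: total size shifted left by 2, low byte first. */
size_t model_pack_long(uint8_t *buf, const char *s)
{
    size_t len = strlen(s);
    uint32_t hdr = (uint32_t)(len + MODEL_LONG_HDR) << 2;
    buf[0] = (uint8_t)(hdr & 0xff);
    buf[1] = (uint8_t)((hdr >> 8) & 0xff);
    buf[2] = (uint8_t)((hdr >> 16) & 0xff);
    buf[3] = (uint8_t)((hdr >> 24) & 0xff);
    memcpy(buf + MODEL_LONG_HDR, s, len);
    return len + MODEL_LONG_HDR;
}

int model_is_short(const uint8_t *v)
{
    return (v[0] & 1) != 0;
}

/* Payload length for either header size (role of VARSIZE_ANY_EXHDR). */
size_t model_size_any_exhdr(const uint8_t *v)
{
    if (model_is_short(v))
        return (size_t)(v[0] >> 1) - MODEL_SHORT_HDR;
    uint32_t hdr = (uint32_t)v[0] | ((uint32_t)v[1] << 8) |
                   ((uint32_t)v[2] << 16) | ((uint32_t)v[3] << 24);
    return (size_t)(hdr >> 2) - MODEL_LONG_HDR;
}

/* Always skips 4 bytes (role of VARDATA): wrong for 1-byte headers. */
const uint8_t *model_data_fixed4(const uint8_t *v)
{
    return v + MODEL_LONG_HDR;
}

/* Skips 1 or 4 bytes depending on the header (role of VARDATA_ANY). */
const uint8_t *model_data_any(const uint8_t *v)
{
    return v + (model_is_short(v) ? MODEL_SHORT_HDR : MODEL_LONG_HDR);
}

/* Copy len payload bytes into out and NUL-terminate it. */
void model_copy(char *out, const uint8_t *data, size_t len)
{
    memcpy(out, data, len);
    out[len] = '\0';
}
```

In this model, packing "hello world!" with model_pack_short() into a zeroed buffer and reading it back through model_data_fixed4() yields a string starting with "lo world!", just like your table case, while model_data_any() returns the full string. In your function, the corresponding change is to use VARDATA_ANY(input) instead of VARDATA(input).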

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
