Re: ESQL/C FETCH of CHAR data delivers to much data for UTF-8

From: Olivier Gautherot <ogautherot(at)gautherot(dot)net>
To: Matthias Apitz <guru(at)unixarea(dot)de>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: ESQL/C FETCH of CHAR data delivers to much data for UTF-8
Date: 2020-01-10 12:32:57
Message-ID: CAJ7S9TWosuMTy-b5Bfd0JwtpgtM=uCVOXFrW8erJZQt9_KOtWQ@mail.gmail.com
Lists: pgsql-general

Hi Matthias,

On Thu, Jan 9, 2020, 20:21 Matthias Apitz <guru(at)unixarea(dot)de> wrote:

> Hello,
>
> We have encountered the following problem with ESQL/C. Imagine a table
> with two columns: CHAR(16) and DATE.
>
> The CHAR column can hold not just 16 bytes but 16 Unicode characters,
> which take more than 16 bytes if one or more of them is encoded as a
> UTF-8 multibyte sequence.
>
> If one provides in C a host structure to FETCH the data as:
>
> EXEC SQL BEGIN DECLARE SECTION;
> struct r_d02ben_ec {
>     char string[17];
>     char date[11];
> };
> typedef struct r_d02ben_ec t_d02ben_ec;
> t_d02ben_ec *hp_d02ben, hrec_d02ben;
> EXEC SQL END DECLARE SECTION;
>
> and fetches the data with ESQL/C as:
>
> EXEC SQL FETCH hc_d02ben INTO :hrec_d02ben;
>
> The generated C-code looks like this:
>
> ...
> ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "fetch hc_d02ben", ECPGt_EOIT,
>     ECPGt_char,&(hrec_d02ben.string),(long)17,(long)1,sizeof( struct r_d02ben_ec ),
>     ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L,
>     ECPGt_char,&(hrec_d02ben.date),(long)11,(long)1,sizeof( struct r_d02ben_ec ),
>     ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L,
> ...
>
> As you can see, for the first item the length 17 is sent to the PG server
> together with the pointer to where the data should be stored,
> and for the second element the length 11 is sent (which is big enough to
> receive MM.DD.YYYY in ASCII plus a trailing \0).
>
> What we now see using GDB is that for the first element all the UTF-8 data
> is returned. Let's assume only one multibyte char, which gives 17 bytes
> instead of 16, so the trailing NUL is already placed into the element for
> the date. ECPGdo() then writes the date as MM.DD.YYYY
> into the area pointed to for the 2nd element and thereby overwrites
> the NUL terminator of the string[17] element. The result is a later
> SIGSEGV because the string in string[17] is no longer NUL
> terminated :-)
>
> I would call it a bug that ECPGdo() writes more than 17 bytes (16 bytes +
> NUL) into the place pointed to by the host variable pointer when
> the column in the database has more (UTF-8) chars than will fit into
> 16+1 bytes.
>
> Comments?
> Proposals for a solution?
>
> Thanks
>
> matthias
>
>
> --
> Matthias Apitz, ✉ guru(at)unixarea(dot)de, http://www.unixarea.de/
> +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub
>

I would be cautious about calling this a bug, as it is a classic buffer
overflow (i.e. design) issue: if you have UTF-8 characters, your text is no
longer 16 bytes long and you should plan extra space in your variables.
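
To make the sizing concrete, here is a minimal sketch in plain C (not ECPG
output). It assumes the database encoding is UTF-8, where one character takes
at most 4 bytes. The names CHAR_COL_BUFSIZE, utf8_codepoints and
utf8_copy_bounded are made up for this illustration and are not part of any
PostgreSQL or ECPG API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: UTF-8, where one code point takes at most 4 bytes. */
#define UTF8_MAX_BYTES_PER_CHAR 4

/* Hypothetical helper: worst-case buffer size for a CHAR(n) column,
 * including the terminating NUL. */
#define CHAR_COL_BUFSIZE(nchars) ((nchars) * UTF8_MAX_BYTES_PER_CHAR + 1)

/* Count code points in a NUL-terminated UTF-8 string: every byte that
 * is not a continuation byte (10xxxxxx) starts a new code point. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Copy src into dst (dst_size bytes in total), always NUL-terminating
 * and cutting only on a code point boundary. Returns the number of
 * bytes copied, not counting the NUL. */
static size_t utf8_copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && dst_size > 0);

    len = strlen(src);
    if (len >= dst_size) {
        len = dst_size - 1;
        /* If the first dropped byte is a continuation byte, back up
         * so that no multibyte sequence is split. */
        while (len > 0 && ((unsigned char) src[len] & 0xC0) == 0x80)
            len--;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With these numbers, a CHAR(16) column needs up to 16 * 4 + 1 = 65 bytes, so
the host structure would declare char string[65] rather than char string[17].
A value with 16 characters of which one is a 2-byte sequence is already 17
bytes long before the NUL, which is exactly the case described above.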
