
Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date:	2011-10-04 17:27:32
Message-ID:	CAFaPBrT9EpWWcHYrvUv5OQQz6Vgg+xQX0mE1ZaQguVUuXo6-Rg@mail.gmail.com
Lists:	pgsql-committers pgsql-hackers

On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
<amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
> On 4 October 2011 14:04, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
>> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>>
>>> When GetDatabaseEncoding() != PG_UTF8, ret will not be equal to
>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
>>
>> Consider a latin1 database where utf8_str was a string of ascii
>> characters. [...]

>> [Patch] Look ok to you?
>>
>
> + if(GetDatabaseEncoding() == PG_UTF8)
> + pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>
> In your patch, the above will again skip mb-validation if the database
> encoding is SQL_ASCII. Note that pg_do_encoding_conversion() returns
> the unconverted string if even *one* of the src and dest encodings is
> SQL_ASCII.

*scratches head* I thought the point of SQL_ASCII was that no encoding
conversion is done, so there would be nothing to verify.

Ahh, I see: it looks like pg_verify_mbstr_len() will make sure there are
no zero (NUL) bytes in the string when the encoding is single-byte.
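To make the point concrete, here is a simplified C sketch of the kind of check pg_verify_mbstr_len() performs on a UTF-8 string. It is not PostgreSQL's implementation: the name sketch_verify_utf8() is hypothetical, it returns -1 where the real function (called with noError = false) would raise an error, and it omits some of the stricter range checks PostgreSQL applies to multi-byte sequences (for example rejecting overlong three- and four-byte forms and UTF-16 surrogates). It does show that an embedded zero byte is rejected even when every other byte is plain ASCII.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for
 * pg_verify_mbstr_len(PG_UTF8, s, len, false).
 * Returns the number of characters, or -1 if the input is invalid.
 * Checks only: no zero bytes, valid lead bytes, and the right number
 * of continuation bytes after each lead byte.
 */
static int sketch_verify_utf8(const unsigned char *s, int len)
{
    int nchars = 0;
    int i = 0;

    assert(len >= 0 && (s != NULL || len == 0));
    while (i < len)
    {
        unsigned char c = s[i];
        int seqlen;

        if (c == 0)
            return -1;              /* embedded zero byte is never allowed */
        else if (c < 0x80)
            seqlen = 1;             /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF)
            seqlen = 2;
        else if (c >= 0xE0 && c <= 0xEF)
            seqlen = 3;
        else if (c >= 0xF0 && c <= 0xF4)
            seqlen = 4;
        else
            return -1;              /* stray continuation byte or invalid lead byte */

        if (seqlen > len - i)
            return -1;              /* sequence truncated by end of input */
        for (int k = 1; k < seqlen; k++)
        {
            if ((s[i + k] & 0xC0) != 0x80)
                return -1;          /* expected a continuation byte */
        }
        i += seqlen;
        nchars++;
    }
    return nchars;
}
```

For instance, "h\xC3\xA9" (h followed by e-acute) counts as two characters, while "a\0b" with a length of 3 is rejected because of the zero byte.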

> I think:
> if (ret == utf8_str)
> + {
> + pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
> ret = pstrdup(ret);
> + }
>
> This (ret == utf8_str) condition would be a reliable way for knowing
> whether pg_do_encoding_conversion() has done the conversion at all.
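The pointer-identity idiom can be sketched in self-contained C, assuming a mock converter. sketch_do_conversion(), sketch_verify() and sketch_utf8_to_server() are hypothetical stand-ins for pg_do_encoding_conversion(), pg_verify_mbstr_len() and the plperl caller; the mock models only the same-encoding and SQL_ASCII early returns, "converts" by copying, uses malloc() in place of palloc()/pstrdup(), and its verifier only rejects zero bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of encodings for this sketch. */
typedef enum { SK_SQL_ASCII, SK_LATIN1, SK_UTF8 } SkEncoding;

/*
 * Mock of the converter's contract: hands back src itself when no
 * conversion happens, otherwise a freshly allocated buffer.
 * The "conversion" here is only a copy.
 */
static char *sketch_do_conversion(char *src, int len, SkEncoding from, SkEncoding to)
{
    if (from == to || from == SK_SQL_ASCII || to == SK_SQL_ASCII)
        return src;                     /* untouched, and therefore unverified */

    char *out = malloc((size_t) len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, (size_t) len);
    out[len] = '\0';
    return out;
}

/* Stand-in verifier: only rejects embedded zero bytes. */
static int sketch_verify(const char *s, int len)
{
    return memchr(s, '\0', (size_t) len) == NULL;
}

/* Amit's proposal: verify and copy only when the same pointer comes back. */
static char *sketch_utf8_to_server(char *utf8_str, int len, SkEncoding db)
{
    assert(utf8_str != NULL && len >= 0);

    char *ret = sketch_do_conversion(utf8_str, len, SK_UTF8, db);

    if (ret == utf8_str)
    {
        if (!sketch_verify(utf8_str, len))
            return NULL;                /* the real code would ereport(ERROR) */
        ret = malloc((size_t) len + 1); /* plays the role of pstrdup() */
        if (ret != NULL)
        {
            memcpy(ret, utf8_str, (size_t) len);
            ret[len] = '\0';
        }
    }
    return ret;
}
```

Comparing pointers works because the converter's contract is to return its input unchanged exactly when it skips conversion, so the caller needs no separate knowledge of which encodings trigger the early return.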

Yes. However (and maybe I'm nitpicking here), I don't see any reason to
verify certain strings twice if we can avoid it.

What do you think about:
+ /*
+  * When the database encoding is PG_UTF8 or SQL_ASCII,
+  * pg_do_encoding_conversion() will not do any conversion or
+  * verification, so we need to do it manually instead.
+  */
+ if (GetDatabaseEncoding() == PG_UTF8 ||
+     GetDatabaseEncoding() == PG_SQL_ASCII)
+     pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
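Alex's encoding-based condition can be compared with the no-conversion rule in a small model. needs_manual_verify(), conversion_skipped() and predicates_agree() are hypothetical, the three-member enum stands in for PostgreSQL's much larger encoding list, and the model ignores any other case in which pg_do_encoding_conversion() might hand back its input. Within that model, testing the database encoding selects exactly the cases where a UTF-8 source string comes back unconverted, which is the situation Amit's ret == utf8_str test detects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of database encodings; SK_NUM_ENCODINGS is a count. */
typedef enum
{
    SK_SQL_ASCII,
    SK_LATIN1,
    SK_UTF8,
    SK_NUM_ENCODINGS
} SkEncoding;

/* Alex's proposal: decide from the database encoding alone. */
static bool needs_manual_verify(SkEncoding db)
{
    return db == SK_UTF8 || db == SK_SQL_ASCII;
}

/* Modeled early returns: same encoding, or either side is SQL_ASCII. */
static bool conversion_skipped(SkEncoding from, SkEncoding to)
{
    return from == to || from == SK_SQL_ASCII || to == SK_SQL_ASCII;
}

/* True if, for a UTF-8 source, both conditions agree for every modeled encoding. */
static bool predicates_agree(void)
{
    for (int e = 0; e < (int) SK_NUM_ENCODINGS; e++)
    {
        SkEncoding db = (SkEncoding) e;

        if (needs_manual_verify(db) != conversion_skipped(SK_UTF8, db))
            return false;
    }
    return true;
}

/* Usage: the encodings discussed in the thread. */
static void demo_cases(void)
{
    assert(needs_manual_verify(SK_UTF8));
    assert(needs_manual_verify(SK_SQL_ASCII));
    assert(!needs_manual_verify(SK_LATIN1));
    assert(predicates_agree());
}
```

The encoding test avoids relying on the converter's pointer contract, at the cost of duplicating its knowledge of which encodings are passed through untouched.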

In response to

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding. at 2011-10-04 09:09:44 from Amit Khandekar

Responses

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding. at 2011-10-05 05:46:02 from Amit Khandekar