From: | Alex Hunsaker <badalex(at)gmail(dot)com> |
---|---|
To: | David Christensen <david(at)endpoint(dot)com> |
Cc: | "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plperlu problem with utf8 |
Date: | 2010-12-20 07:39:17 |
Message-ID: | AANLkTimKvnkFirhsAtQZ6aw35vw8HfzycZ0pQeiN9_n_@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Dec 19, 2010 at 21:00, David Christensen <david(at)endpoint(dot)com> wrote:
>
> On Dec 19, 2010, at 2:20 AM, Alex Hunsaker wrote:
>
>> With the attached we:
>> - for function arguments, convert (using pg_do_encoding_conversion) to
>> utf8 from the current database encoding.
>
> How does this deal with input records of composite type?
Same way it worked before just encoded properly. :)
>> - fix errors so they properly encode their message to the current
>> database encoding
>
>[...] does it degrade to the current behavior, as opposed to fail or eat the error message without outputting?
No it fails with an "invalid encoding message". I think thats the
best we can do. It certainly seems wrong to send invalid bytes back
to the client.
>> - always do the utf8 hack so utf8.pm is loaded (fixes utf8 regexs in
>> plperl). [...]
>
> This sounds good in general, but I think should be skipped if GetDatabaseEncoding() == SQL_ASCII.
Why? What if you want to use a module that does utf8 things?
>> [ PLs handling strings as utf8 in SQL_ASCII databases ]
> I think this could/should be adequately handled by not calling the function when the DatabaseEncoding is SQL_ASCII. [...] > Also, I'd argue that pltcl and plpython should be fixed as well for the same reasons.
I don't disagree, but im not volunteering for it either. Without
having done any in depth analysis, the largest problem is they use
strings for most arguments. Strings need an encoding and pl/python
and pl/tcl want a utf8 string. So we need to convert from SQL_ASCII
to UTF8. Which like you say is nonsensical. So really in the
SQL_ASCII case we would need to switch to binary datatypes for those
languages. Perl is a bit more flexible so it should be easier to
'fix'.
>> This does *not* address the bytea issue where currently if you have
>> bytea input or output we try to encode that the same as any string. I
>> think thats going to be a bit more invasive and this patch should
>> stands on its own.
>> <plperl_fix_enc.patch.gz>
>
> Yeah, I'm not sure how invasive that will end up being, or if there are other datatypes which should skip the text processing.
Well, I don't think it will be too bad. As for examples of other
datatypes that we could skip text processing: int, float and numeric.
Right now we treat them as strings, which works fine for perl... but
its a tad inefficient. Ideally we could be using things like SvIV.
> I noticed you moved the declaration of perl_sys_init_done; was that an independent bug fix, or did something in the patch require that?
It just squashes an unused variable "perl_sys_init_done" warning when
perl is compiled with its own malloc. Of course that added another
warning "warning: ISO C90 forbids mixed declarations and code" :P
In further review over caffeine this morning I noticed there are a few
places I missed: plperl_build_tuple_result(), plperl_modify_tuple()
and Util.XS.
The first 2 pass perl hash keys for the column straight to
SPI_fnumber() without doing any conversion. In theory this means we
might not be able to look up columns with non ascii characters. I
missed this because those hash keys are declared as char* instead of
SV* and do not use any of the standard perl string macros.
Anyway I hope to have a fresh patch tomorrow.
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2010-12-20 08:17:27 | Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep |
Previous Message | Heikki Linnakangas | 2010-12-20 06:16:34 | Re: serializable lock consistency |