From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | alvherre(at)commandprompt(dot)com |
Cc: | badalex(at)gmail(dot)com, cb(at)df7cb(dot)de, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pl/perl and utf-8 in sql_ascii databases |
Date: | 2012-06-21 11:22:43 |
Message-ID: | 20120621.202243.116439861.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
> > Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
> > and SvPVUTF8() when turning a perl string into a cstring.
>
> Right.
I spent a bit longer time catching on pl/perl and now understand
what is the problem...
> So I played a bit with this patch, and touched it a bit mainly just to
> add some more comments; and while at it I noticed that some of the
> functions in Util.xs might leak some memory, so I made an attempt to
> plug them, as in the attached patch (which supersedes yours).
Ok, Is it ok to look into the newer patch including fix of leaks
at first?
-- Coding and styles.
This also seems to have polished the previous one on some codes,
styles and comments which generally look reasonable. And patch
style was corrected into unified.
-- Functions
I seems to work properly on the database the encodings of which
are SQL_ASCII and UTF8 (and EUC-JP) as below,
=================
=> create or replace function foo(text) returns text language plperlu as $$ $a = shift; return "BOO!" if ($a != "a\x80cあ"); return $a; $$;
SQL_ASCII=> select foo(E'a\200cあ') = E'a\200cあ';
?column?
----------
t
UTF8=> select foo(E'a\200cあ');
ERROR: invalid byte sequence for encoding "UTF8": 0x80
UTF8=> select foo(E'a\302\200cあ') = E'a\u0080cあ';
?column?
----------
t
=================
This looks quite valid according to the definition of the
encodings and perl's nature as far as I see.
-- The others
Variable naming in util_quote_*() seems a bit confusing,
> text *arg = sv2text(sv);
> text *ret = DatumGetTextP(..., PointerGetDatum(arg)));
> char *str;
> pfree(arg);
> str = text_to_cstring(ret);
> RETVAL = cstr2sv(str);
> pfree(str);
Renaming ret to quoted and str to ret as the patch attached might
make it easily readable.
> Now, with my version of the patch applied and using a SQL_ASCII database
> to test the problem in the original report, I notice that we now have a
> regression failure:
snip.
> I'm not really sure what to do here -- maybe have a second expected file
> for that test is a good enough answer? Or should I just take the test
> out? Opinions please.
The attached ugly patch does it. We seem should put NO_LOCALE=1
on the 'make check' command line for the encodings not compatible
with the environmental locale, although it looks work.
# UtfToLocal() seems to have a bug that always report illegal
# encoding was "UTF8" regardless of the real encoding. But
# plper_lc_*.(sql|out) increases if the bug is fixed.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
== My e-mail address has been changed since Apr. 1, 2012.
Attachment | Content-Type | Size |
---|---|---|
plperl_sql_ascii-3.patch | text/x-patch | 5.7 KB |
plperl_sql_ascii_regress.patch | text/x-patch | 4.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2012-06-21 11:41:25 | Catalog/Metadata consistency during changeset extraction from wal |
Previous Message | Etsuro Fujita | 2012-06-21 11:20:56 | Re: not null validation option in contrib/file_fdw |