From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS |
Date: | 2011-06-09 17:55:02 |
Message-ID: | BANLkTikuTMGn8=Jw6vsYEebAuHZREscNvQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jun 9, 2011 at 1:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jun 9, 2011 at 11:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Hmm ... while the above is easy enough to do in the backend, where we
>>> can look at pg_database_encoding_max_length, we have also got instances
>>> of this coding pattern in src/port/pgstrcasecmp.c. It's a lot less
>>> obvious how to make the test in frontend environments. Thoughts anyone?
>
>> I'm not sure if this helps at all, but an awful lot of those tests are
>> against hard-coded strings that are known to contain only ASCII
>> characters. Is there some way we can optimize this for that case?
>
> For the places where we're just looking for a match to a fixed all-ASCII
> string, an ASCII-only downcasing would be sufficient, and would
> eliminate the whole problem. But I doubt all the callers fall into that
> class.
>
> What I'm particularly worried about at the moment is whether we are
> assuming anywhere that the frontend side can duplicate the backend's
> identifier downcasing behavior. That seems like a complete morass,
> because (1) they might not have the same locale, (2) they might not
> have the same encoding, (3) even if they do, the "same" locale is known
> to behave differently on different platforms.
Right. Understood. So let's look at the cases (from git grep
pg_strcasecmp and pg_strncasecmp):
contrib/dict_int: Fixed strings only, and it's all backend code anyway.
contrib/dict_xsyn: Fixed strings only, and it's all backend code anyway.
contrib/hstore: Fixed strings only, and it's all backend code anyway.
contrib/pg_upgrade: Used to compare LC_COLLATE, LC_CTYPE, and encoding names.
contrib/pgbench: Definitely front-end code, but it's all fixed strings.
contrib/pgcrypto: All fixed strings except for one instance in
px_find_digest. But it's all backend code.
contrib/spi: One instance, not a fixed string, but it's backend code.
contrib/unaccent: One instance, not a fixed string, but it's backend code.
src/backend/*: Backend code, obviously.
src/bin/initdb: Strings from a constant lookup table
(tsearch_config_languages) only.
src/bin/pg_basebackup: Fixed strings only.
src/bin/pg_ctl: Fixed strings only.
src/bin/pg_dump: Fixed strings only.
src/bin/psql: Fixed strings only. In a couple of cases they are not
literal constants: help.c uses strings from the generated file
sql_help.h, and tab-complete.c uses strings from a constant array called
words_after_create[]. But these are constant lookup tables.
src/include: access/reloptions.h uses strncasecmp() as part of a
macro. That should be OK as long as no one tries to include this in
frontend code, which seems rather impractical.
src/interfaces/ecpg/ecpglib: Fixed strings.
src/interfaces/ecpg/pgtypeslib: Fixed strings, and strings from a
constant lookup table, only.
src/interfaces/ecpg/preproc: This looks a bit worrisome. It seems we
might be using it on identifiers here.
src/interfaces/libpq: This is attempting to match a wildcard
certificate name against a hostname, in two different places.
src/port/chklocale.c: Fixed strings or ones from a lookup table.
src/timezone/pgtz.c: Matches input strings against filenames read from the OS.
So mostly I think these are OK. The instance in
src/interfaces/ecpg/preproc looks like the most likely candidate for a
problem spot.
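For the fixed-string cases, the ASCII-only downcasing Tom mentions could
look roughly like the sketch below. This is only an illustration, not the
code in src/port/pgstrcasecmp.c; the names ascii_tolower and
ascii_strcasecmp are made up for the example. It folds only 'A'..'Z' and
leaves every byte with the high bit set alone, so the result does not
depend on locale or encoding.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): fold only the ASCII
 * letters 'A'..'Z' to lower case.  Bytes >= 0x80 are returned unchanged,
 * so multibyte sequences are never altered.
 */
static unsigned char
ascii_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

/*
 * Hypothetical ASCII-only case-insensitive comparison.  Returns 0 when
 * the strings match after ASCII folding, and a negative or positive
 * value otherwise, in the manner of strcmp().
 */
static int
ascii_strcasecmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    for (;;)
    {
        unsigned char ch1 = ascii_tolower((unsigned char) *s1++);
        unsigned char ch2 = ascii_tolower((unsigned char) *s2++);

        if (ch1 != ch2)
            return (int) ch1 - (int) ch2;
        if (ch1 == '\0')
            return 0;           /* both strings ended together */
    }
}
```

With something like this, matching an input against a fixed all-ASCII
string behaves the same in frontend and backend, since tolower() and the
locale are never consulted. It would not be a substitute where the caller
really needs to reproduce the backend's identifier downcasing.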
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company