From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: UTF8 with BOM support in psql |
Date: | 2009-11-17 07:02:17 |
Message-ID: | 1258441337.10724.13.camel@fsopti579.F-Secure.com |
Lists: | pgsql-hackers |

On Tue, 2009-11-17 at 14:19 +0900, Itagaki Takahiro wrote:
> The attached patch is a new proposal for the feature.
> When we find a BOM at the beginning of a file, we set "expected_encoding" to UTF8.
> Before every execution of a query, if pset.encoding is not UTF8, we check that the
> query string does not contain any non-ASCII characters and throw an error if
> any are found. Encoding declarations are typically written only in ASCII characters,
> so we can postpone encoding checking until non-ASCII characters appear.
>
> Since the default value of expected_encoding is SQL_ASCII, which passes
> through all characters, the patch does nothing to scripts without a BOM.
> (There is no code that sets expected_encoding other than the BOM detection.)
> If the client encoding is UTF8, it skips the BOM and has no effect on the script body.
> BOMs are skipped even if the client encoding is not set to UTF8, but an error can
> be thrown if there is no explicit encoding declaration.

I think I could support using the presence of the BOM as a fall-back
indicator of encoding in the absence of any other declaration. It seems to
me, however, that the description above ignores the existence of
encodings other than SQL_ASCII and UTF8.

Also, when the proposed patch to set the encoding from the locale
appears, we need to make this logic more precise. Something like:

1. set client_encoding or \encoding, otherwise
2. if BOM found, then UTF8, otherwise
3. by locale environment, otherwise
4. SQL_ASCII (= server encoding, effectively)
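
For illustration only, the fall-back order above could be sketched in C roughly as follows. This is not taken from the patch or from psql: `has_utf8_bom`, `resolve_script_encoding`, and the convention that a NULL pointer means "not set" are all hypothetical and exist only for this sketch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the buffer starts with the
 * UTF-8 byte order mark (bytes EF BB BF), 0 otherwise.
 */
static int
has_utf8_bom(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}

/*
 * Hypothetical resolver for the four-step precedence.
 *
 * explicit_enc: encoding from "set client_encoding" or \encoding,
 *               or NULL if none was given (assumed convention).
 * locale_enc:   encoding derived from the locale environment,
 *               or NULL if it could not be determined (assumed convention).
 */
static const char *
resolve_script_encoding(const char *explicit_enc,
                        const unsigned char *buf, size_t len,
                        const char *locale_enc)
{
    if (explicit_enc != NULL)       /* 1. explicit declaration wins */
        return explicit_enc;
    if (has_utf8_bom(buf, len))     /* 2. BOM implies UTF8 */
        return "UTF8";
    if (locale_enc != NULL)         /* 3. locale environment */
        return locale_enc;
    return "SQL_ASCII";             /* 4. last resort */
}
```

With this ordering, a script that starts with a BOM but also declares LATIN1 explicitly would still be treated as LATIN1, which is the point of making the BOM only a fall-back indicator.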

Also, I'm not sure if we need this logic only when we send a query. It
might be better to do this in the lexer when we find a non-ASCII
character and no client encoding other than SQL_ASCII has been set yet.
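
A rough sketch of what such a lexer-side check might look like, written as standalone C and not as actual flex rules. The names `first_non_ascii` and `must_decide_encoding` and the `encoding_is_set` flag are hypothetical; psql's real lexer is flex-based and is not reproduced here.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the offset of the first byte with the
 * high bit set (that is, the first non-ASCII byte), or -1 if the
 * buffer is pure 7-bit ASCII.
 */
static long
first_non_ascii(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        if ((unsigned char) s[i] & 0x80)
            return (long) i;
    }
    return -1;
}

/*
 * Hypothetical check: returns 1 when the scanned text contains a
 * non-ASCII byte while no client encoding other than SQL_ASCII has
 * been set (encoding_is_set == 0), which is the moment at which the
 * encoding would have to be decided or an error raised.
 */
static int
must_decide_encoding(const char *s, size_t len, int encoding_is_set)
{
    return !encoding_is_set && first_non_ascii(s, len) >= 0;
}
```

Because encoding declarations are normally pure ASCII, a check of this kind would stay silent until the first non-ASCII byte of the script body is scanned.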