From: | Kris Jurka <books(at)ejurka(dot)com> |
---|---|
To: | Giuseppe Sacco <giuseppe(at)eppesuigoccas(dot)homedns(dot)org> |
Cc: | "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org> |
Subject: | Re: DatabaseMetaData.getExtraNameCharacters |
Date: | 2005-05-25 19:59:58 |
Message-ID: | Pine.BSO.4.56.0505251452050.30233@leary.csoft.net |
Lists: | pgsql-jdbc |
On Wed, 25 May 2005, Giuseppe Sacco wrote:
> On Wed, 25-05-2005 at 13:25 -0500, Kris Jurka wrote:
> [...]
> > ident_start [A-Za-z\200-\377_]
> > ident_cont [A-Za-z\200-\377_0-9\$]
> > identifier {ident_start}{ident_cont}*
> >
> > So \200-\377 is octal for any byte with the high bit set. The list of
> > characters this could map to numbers in the tens of thousands for
> > Unicode, so it's not really feasible to return it from this method.
>
> If I understand correctly, the valid character set is the one computed by
> the attached class. It seems to me that it is 191 characters long.
> Could you please let me know where I am wrong?
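To see why the quoted lexer rules work on bytes rather than characters, here is a minimal Java sketch that applies ident_start/ident_cont/identifier to a byte array, assuming the identifier text has already been encoded as UTF-8. The class and method names are hypothetical; this only mirrors the three rules quoted above and is not the backend's actual scanner.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper mimicking the quoted flex rules, one byte at a time.
public class IdentLexSketch {

    // ident_start: [A-Za-z\200-\377_]
    static boolean isIdentStart(int b) {
        return (b >= 'A' && b <= 'Z')
            || (b >= 'a' && b <= 'z')
            || (b >= 0200 && b <= 0377)   // octal range: any byte with the high bit set
            || b == '_';
    }

    // ident_cont: [A-Za-z\200-\377_0-9\$]
    static boolean isIdentCont(int b) {
        return isIdentStart(b) || (b >= '0' && b <= '9') || b == '$';
    }

    // identifier: {ident_start}{ident_cont}*, checked over raw bytes, not chars
    static boolean isIdentifier(byte[] bytes) {
        if (bytes.length == 0) return false;
        if (!isIdentStart(bytes[0] & 0xFF)) return false;   // mask to 0..255
        for (int i = 1; i < bytes.length; i++) {
            if (!isIdentCont(bytes[i] & 0xFF)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("foo_1$".getBytes(StandardCharsets.UTF_8)));  // true
        System.out.println(isIdentifier("1foo".getBytes(StandardCharsets.UTF_8)));    // false
        // U+015B becomes the bytes C5 9B; each byte passes the \200-\377 test
        System.out.println(isIdentifier("\u015Bx".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

Because the check never decodes the bytes, any multi-byte character whose bytes all fall in \200-\377 is accepted without the rules ever "seeing" it as one character.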
You are assuming that each character is only one byte. The backend lexing
rules operate byte by byte, but the JDBC side returns a String of
characters. Consider the character "Latin Small Letter S with Acute"
(\u015B): it is encoded in UTF-8 as the two bytes C5 9B, or \305\233 in
octal. Each byte matches \200-\377, but together they are one character in
the result of getExtraNameCharacters.
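The byte/character mismatch can be checked directly in Java. This sketch (class and method names are hypothetical) prints the UTF-8 bytes of \u015B in hex and octal, then counts how many BMP characters encode to bytes that all have the high bit set. It assumes UTF-8 as the encoding and skips surrogate code units, which cannot be encoded on their own; the count is only meant to illustrate the "tens of thousands" figure, not to reproduce what the driver returns.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteDemo {

    // True if every byte has the high bit set, i.e. falls in \200-\377.
    static boolean allHighBit(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) == 0) return false;
        }
        return true;
    }

    // Counts BMP characters (surrogates excluded) whose UTF-8 bytes all match \200-\377.
    static int countHighBitBmpChars() {
        int count = 0;
        for (int c = 0x80; c <= 0xFFFF; c++) {
            char ch = (char) c;
            if (Character.isSurrogate(ch)) continue; // a lone surrogate is not encodable
            if (allHighBit(String.valueOf(ch).getBytes(StandardCharsets.UTF_8))) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "\u015B"; // Latin Small Letter S with Acute
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        System.out.println("chars in String: " + s.length());  // 1
        System.out.println("bytes in UTF-8:  " + utf8.length); // 2
        for (byte b : utf8) {
            int v = b & 0xFF;
            System.out.printf("%02X = \\%03o%n", v, v);          // C5 = \305, then 9B = \233
        }
        System.out.println(countHighBitBmpChars());              // 63360
    }
}
```

Every non-ASCII character's UTF-8 form consists only of bytes in 0x80-0xFF, so all 63360 non-surrogate BMP characters from U+0080 upward pass the byte-level rule, before even counting supplementary characters.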
Kris Jurka