From: | Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu> |
---|---|
To: | phd2(at)earthling(dot)net |
Cc: | "'Oleg Bartunov'" <oleg(at)sai(dot)msu(dot)su>, "'hackers(at)postgresql(dot)org'" <hackers(at)postgreSQL(dot)org>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
Subject: | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem |
Date: | 1999-06-11 15:23:57 |
Message-ID: | 37612A0D.BAF8B997@alumni.caltech.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> > istm that the Russian and Japanese contingents could represent the
> > needs of multibyte and locale concerns very well. So, we should ask
> > ourselves some questions to see if we can make *progress* in evolving
> > our text handling, rather than just staying the same forever.
> Ok, we are here.
> And what a pros and cons for NCHAR?
I was hoping you would tell me! :)
> > SQL92 suggests some specific text handling features to help with
> > non-ascii applications.
> What the help?
OK, SQL92 defines two kinds of native character sets: those we already
have (char, varchar) and those which can be locale customized (nchar,
national character varying, and others). char and varchar always
default to the "SQL" behavior (which I think corresponds to ascii
(called "SQL_TEXT") but I didn't bother looking for the details).
So, at its simplest, there would be two sets of character types, with
char, varchar, etc, always the same on every system (just like
Postgres w/o multibyte or locale), and nchar, nvarchar, etc configured
as your locale/multibyte environment would want.
However, there are many more features in SQL92 to deal with text
customization. I'll mention a few (well, most of them, but not in
detail):
o You can define a "character set" and, independently, a "collation".
The syntax for the type definition is
CHARACTER [ VARYING ] [ (length) ]
[ CHARACTER SET your-character-set ]
[ COLLATE your-collation-sequence ]
o You can specify a character type for string literals:
_your-character-set 'literal string' e.g. _ESPANOL 'Que pasa?'
(forgive my omission of a leading upside down question mark :)
We already have some support for this in that character string
literals can have a type specification (e.g. "DATETIME 'now'") and
presumably we can use the required underscore to convert the
"_ESPANOL" to a character set and collation, all within the existing
Postgres type system.
o You can specify collation behavior in a strange way:
'Yo dude!' COLLATE collation-method
(which we could convert in the parser to a function call).
o You can translate between character sets, *if* there is a reasonable
mapping available:
TRANSLATE(string USING method)
and you can define translations in a vague way (since no one but
Postgres implemented a type system back then):
CREATE TRANSLATION translation
FOR source-charset
TO target-charset
FROM { EXTERNAL('external-translation') | IDENTITY |
existing-translation }
DROP TRANSLATION translation
o You can convert character strings which have the same character
"repertoire" from one to the other:
CONVERT(string USING conversion-method)
(e.g. we could define a method "EBCDIC_TO_ASCII" once we have an
"EBCDIC" character set).
o You can specify identifiers (column names, etc) with a specific
character set/collation by:
_charset colname (e.g. _FRANCAIS Francais where the second "c" is
allowed to be "c-cedilla", a character in the French/latin character
set; sorry I didn't type it).
> > Would these mechanisms work for people? Or are they so fundamentally
> > flawed or non-standard (it is from a standard, but I'm not sure who
> > implements it)?
Fully implementing these features (or a reasonable subset) would give
us more capabilities than we have now, and imho can be fit into our
existing type system. *Only* implementing NCHAR etc gives us the
ability to carry SQL_TEXT and multibyte/locale types in the same
database, which may not be a huge benefit to those who never want to
mix them in the same installation. I don't know who those folks might
be but Tatsuo and yourself probably do.
Comments?
- Thomas
--
Thomas Lockhart lockhart(at)alumni(dot)caltech(dot)edu
South Pasadena, California
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Lockhart | 1999-06-11 15:38:38 | "DML" |
Previous Message | Tatsuo Ishii | 1999-06-11 15:14:55 | Re: [HACKERS] Re: locales and MB (was: Postgres 6.5 beta2 and beta3 problem) |