Re: Lexing with different charsets

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: db(at)zigo(dot)dhs(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Lexing with different charsets
Date: 2004-04-14 01:18:55
Message-ID: 20040414.101855.108739877.t-ishii@sra.co.jp
Lists: pgsql-hackers

> I've spent some more time reading specs today. Together with Peter E's
> explanation (Thanks!) I think I've got a fairly good understanding of the
> parts talking about locales now.
>
> My next question is about lexing. The spec says that one can use strings
> of different charsets in the queries, like:
>
> ... WHERE field1 = _latin1'FooBar' and field2 = _utf8'Åäö'

In my understanding this was removed as of SQL:1999. I'm not sure
about SQL:2003 though.
--
Tatsuo Ishii

> I can see that the lexer either needs to be taught about all the
> different charsets or this is not going to work very well.
>
> What if one wants to include a string in utf-16 in the query, the lexer
> can not handle that without understanding utf-16. The query can also be in
> different charsets. If it's in utf-8 for example, then we can not embed
> latin1 strings and still have a validating utf-8 query. With the above we
> can not think of the query as being in a single charset anymore. That's
> strange but okay I guess.
>
> The new wire protocol allows us to send data separately from the query
> which is nice, but the standard talked about strings as above so it's not
> a solution to the problem.
>
> Maybe I should have addressed this to Peter directly :-)
>
> --
> /Dennis Björklund
>
>
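
The validation problem raised in the quoted message can be sketched in a few lines of Python. This is an illustration only, not anything from the thread or from the PostgreSQL lexer: the query fragments are made up, the helper `is_valid_utf8` is a hypothetical name, and the latin1 literal uses non-ASCII text (unlike 'FooBar' in the example) so that the byte-level conflict actually shows up.

```python
# Sketch: why a query mixing latin1 and utf-8 literals is no longer
# a valid byte string in any single charset. All names are illustrative.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

text = "Åäö"

# Query text whose latin1 literal is really stored as latin1 bytes.
mixed_query = (
    b"... WHERE field1 = _latin1'" + text.encode("latin-1")
    + b"' and field2 = _utf8'" + text.encode("utf-8") + b"'"
)

# The same query with every literal encoded as UTF-8.
uniform_query = (
    b"... WHERE field1 = _latin1'" + text.encode("utf-8")
    + b"' and field2 = _utf8'" + text.encode("utf-8") + b"'"
)

# A byte-oriented lexer looking for the closing quote (0x27) can be
# fooled by UTF-16 data: U+0127 encodes as the bytes 0x27 0x01.
utf16_literal = "\u0127".encode("utf-16-le")
quote_byte_inside_utf16 = b"'" in utf16_literal

if __name__ == "__main__":
    print("mixed query valid UTF-8:  ", is_valid_utf8(mixed_query))
    print("uniform query valid UTF-8:", is_valid_utf8(uniform_query))
    print("quote byte inside UTF-16: ", quote_byte_inside_utf16)
```

The mixed query fails UTF-8 validation because the latin1 bytes are not well-formed UTF-8 sequences, and the UTF-16 case shows why a lexer that scans bytes for the quote character cannot find the end of such a literal without knowing its encoding.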
