| From: | Peter Eisentraut <peter_e(at)gmx(dot)net> | 
|---|---|
| To: | Marko Kreen <markokr(at)gmail(dot)com> | 
| Cc: | Postgres Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: [rfc] unicode escapes for extended strings | 
| Date: | 2009-09-21 20:36:52 | 
| Message-ID: | 1253565412.20098.5.camel@vanquo.pezone.net | 
| Lists: | pgsql-hackers | 
On Wed, 2009-09-09 at 18:26 +0300, Marko Kreen wrote:
> Unicode escapes for extended strings.
> 
> On 4/16/09, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> > Reasons:
> >
> >  - More people are familiar with \u escaping, as it's standard
> >   in Java/C#/Python and probably other languages.
> >  - U& strings will not work when standard_conforming_strings (stdstr) is off.
> >
> >  Syntax:
> >
> >   \uXXXX      - 16-bit value
> >   \UXXXXXXXX  - 32-bit value
> >
> >  Additionally, both \u and \U can be used to specify UTF-16 surrogate
> >  pairs to encode characters with values above 0xFFFF.  This is the exact
> >  behaviour used by Java/C#/Python (except that Java does not have \U).
> 
> v3 of the patch:
> 
>     - convert to new reentrant lexer API
>     - add lexer targets to avoid fallback to default
>     - completely disallow \u and \U escapes without the proper number of hex digits
>     - fix logic bug in surrogate pair handling

This looks good to me.  I'm implementing the surrogate pair handling for
the U& syntax for consistency.  Then I'll apply this.
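For readers following the thread, here is a minimal Python sketch of the escape semantics described in the quoted proposal: \uXXXX takes exactly 4 hex digits, \UXXXXXXXX exactly 8, and a high/low UTF-16 surrogate pair written with either form is combined into one code point above 0xFFFF. The helper name `decode_unicode_escapes` is hypothetical and this is not PostgreSQL's scanner code. It assumes that only \u and \U are interpreted (every other backslash sequence, including \\, is passed through unchanged) and that malformed escapes or unpaired surrogates raise ValueError; the proposal itself only says that escapes with the wrong number of digits are disallowed.

```python
import string

# Hypothetical helper illustrating the proposed \u / \U semantics.
# Not PostgreSQL's scanner: only \u and \U are interpreted here; all other
# backslash sequences (including \\) are copied through unchanged.


def _is_high_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDBFF


def _is_low_surrogate(cp: int) -> bool:
    return 0xDC00 <= cp <= 0xDFFF


def decode_unicode_escapes(s: str) -> str:
    """Decode \\uXXXX and \\UXXXXXXXX escapes, combining UTF-16 surrogate pairs."""
    out = []
    pos = 0
    pending_high = None  # high surrogate waiting for its low half
    while pos < len(s):
        if s.startswith(("\\u", "\\U"), pos):
            width = 4 if s[pos + 1] == "u" else 8
            digits = s[pos + 2 : pos + 2 + width]
            # Require exactly `width` hex digits; short escapes are rejected.
            if len(digits) != width or not all(c in string.hexdigits for c in digits):
                raise ValueError(f"invalid Unicode escape at offset {pos}")
            cp = int(digits, 16)
            pos += 2 + width
            if pending_high is not None:
                if not _is_low_surrogate(cp):
                    raise ValueError("high surrogate not followed by low surrogate")
                # Combine the pair into a single supplementary code point.
                cp = 0x10000 + ((pending_high - 0xD800) << 10) + (cp - 0xDC00)
                pending_high = None
            elif _is_high_surrogate(cp):
                pending_high = cp
                continue  # wait for the low half
            elif _is_low_surrogate(cp):
                raise ValueError("unpaired low surrogate")
            out.append(chr(cp))  # chr() raises ValueError above 0x10FFFF
        else:
            if pending_high is not None:
                raise ValueError("unpaired high surrogate")
            out.append(s[pos])
            pos += 1
    if pending_high is not None:
        raise ValueError("unpaired high surrogate at end of string")
    return "".join(out)


print(decode_unicode_escapes(r"caf\u00e9"))  # prints café
```

With this sketch, r"\uD83D\uDE00", r"\U0000D83D\uDE00" and r"\U0001F600" all decode to the same single character U+1F600, which matches the "both \u and \U can be used" rule in the proposal.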