Re: Implementing full UTF-8 support (aka supporting 0x00)

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Implementing full UTF-8 support (aka supporting 0x00)
Date:	2016-08-04 00:22:25
Message-ID:	CAMsr+YF8ua27YmYJkOD_o+DwU_mUDjEZss9Nua9s-Wo2Qs2MOw@mail.gmail.com
Lists:	pgsql-hackers

On 4 August 2016 at 05:00, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:

> On Thu, Aug 4, 2016 at 5:16 AM, Craig Ringer <craig(at)2ndquadrant(dot)com>
> wrote:
> > On 3 August 2016 at 22:54, Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>
> > wrote:
> >> What would it take to support it? Isn't the varlena header propagated
> >> everywhere, which could help infer the real length of the string? Any
> >> pointers or suggestions would be welcome.
> >
> >
> > One of the bigger pain points is that our interaction with C library
> > collation routines for sorting uses NUL-terminated C strings: strcoll,
> > strxfrm, etc.
>
> That particular bit of the problem would go away if this ever happened:
>
> https://wiki.postgresql.org/wiki/Todo:ICU
>
> ucol_strcoll takes explicit lengths (though optionally accepts -1 for
> NUL-terminated mode).
>
>
> http://userguide.icu-project.org/strings#TOC-Using-C-Strings:-NUL-Terminated-vs.-Length-Parameters
>

Yep, it does. But we've made little to no progress on integration of ICU
support and AFAIK nobody's working on it right now.
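
To make the strcoll problem concrete, here is a minimal sketch in plain C, not
PostgreSQL code. The lenstr struct and both helper functions are hypothetical
names made up for illustration; lenstr is only loosely analogous to a varlena
value that carries its own length. The first helper goes through strcoll and
so stops at the first 0x00 byte; the second uses the explicit lengths but does
a bytewise comparison only, with no locale-aware collation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string; not a PostgreSQL type. */
typedef struct {
    const char *data;   /* bytes, may contain embedded 0x00 */
    size_t      len;    /* number of valid bytes in data */
} lenstr;

/*
 * Compare via strcoll(), which only sees bytes up to the first NUL.
 * Anything after an embedded 0x00 is silently ignored.
 */
static int nul_terminated_cmp(lenstr a, lenstr b)
{
    assert(a.data != NULL && b.data != NULL);
    return strcoll(a.data, b.data);
}

/*
 * Compare using the explicit lengths. This is a plain bytewise
 * comparison (memcmp), so it is NOT a replacement for collation;
 * it only shows that the length information is enough to see
 * past an embedded NUL.
 */
static int lenstr_cmp(lenstr a, lenstr b)
{
    size_t n = a.len < b.len ? a.len : b.len;
    int    c;

    assert(a.data != NULL && b.data != NULL);
    c = memcmp(a.data, b.data, n);
    if (c != 0)
        return c;
    /* Common prefix is equal: the shorter string sorts first. */
    return (a.len > b.len) - (a.len < b.len);
}
```

With a = { "ab\0" "cd", 5 } and b = { "ab\0" "xy", 5 }, nul_terminated_cmp
sees two copies of "ab" and reports them equal, while lenstr_cmp tells them
apart. A collation API that accepts explicit lengths would avoid the
truncation without giving up locale-aware ordering.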

I wonder how MySQL implements their collation and encoding support?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
