
Re: Pre-proposal: unicode normalized text

From:	Nico Williams <nico(at)cryptonector(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-05 19:52:37
Message-ID:	ZR8UBTZcywbhK4JI@ubby21
Lists:	pgsql-hackers

On Thu, Oct 05, 2023 at 03:49:37PM -0400, Tom Lane wrote:
> Nico Williams <nico(at)cryptonector(dot)com> writes:
> > Text+encoding can be just like bytea with a one- or two-byte prefix
> > indicating what codeset+encoding it's in. That'd be how to encode
> > such text values on the wire, though on disk the column's type should
> > indicate the codeset+encoding, so no need to add a prefix to the value.
>
> The precedent of BOMs (byte order marks) suggests strongly that
> such a solution would be horrible to use.

This is just how you encode the type of the string, and there are any
number of options. The point is that PG can already encode binary data,
so if the question is how to encode text of disparate encodings on the
wire, building on top of the encoding of bytea is an option.
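The idea above (a one-byte codeset+encoding prefix on the value, carried over the wire using bytea's existing encoding) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a proposed implementation: the tag values in ENCODING_TAGS and every function name here are hypothetical and are not PostgreSQL's internal encoding IDs or APIs. Only the "\x followed by hex digits" text form mirrors PostgreSQL's hex output format for bytea.

```python
# Hypothetical one-byte codeset+encoding tags. These are illustrative
# values for this sketch only, NOT PostgreSQL's internal encoding IDs.
ENCODING_TAGS = {
    "utf-8": 0x01,
    "latin-1": 0x02,
    "shift_jis": 0x03,
}
TAG_TO_ENCODING = {tag: name for name, tag in ENCODING_TAGS.items()}


def encode_tagged(text: str, encoding: str) -> bytes:
    """Return the tag byte followed by the text encoded in `encoding`."""
    tag = ENCODING_TAGS[encoding]  # KeyError if the encoding has no tag
    return bytes([tag]) + text.encode(encoding)


def decode_tagged(value: bytes) -> tuple[str, str]:
    """Split off the tag byte and decode the payload it describes."""
    if not value:
        raise ValueError("empty value has no encoding tag")
    encoding = TAG_TO_ENCODING[value[0]]
    return value[1:].decode(encoding), encoding


def to_bytea_hex(value: bytes) -> str:
    """Render bytes like PostgreSQL's hex bytea output: \\x + hex digits."""
    return "\\x" + value.hex()


def from_bytea_hex(literal: str) -> bytes:
    """Parse a hex-format bytea literal back into raw bytes."""
    if not literal.startswith("\\x"):
        raise ValueError("expected a hex-format bytea literal")
    return bytes.fromhex(literal[2:])


if __name__ == "__main__":
    # "café" in Latin-1 is 63 61 66 e9; the 0x02 tag is prepended.
    wire = to_bytea_hex(encode_tagged("café", "latin-1"))
    print(wire)
    print(decode_tagged(from_bytea_hex(wire)))
```

Running it prints \x02636166e9 and then ('café', 'latin-1'). As the message notes, on disk the column's type would already indicate the codeset+encoding, so only the payload (value[1:] here) would need to be stored; the prefix matters only where values of different encodings share the wire.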

In response to

Re: Pre-proposal: unicode normalized text at 2023-10-05 19:49:37 from Tom Lane

Responses

Re: Pre-proposal: unicode normalized text at 2023-10-06 17:42:09 from Jeff Davis
