
Re: Pre-proposal: unicode normalized text

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Nico Williams <nico(at)cryptonector(dot)com>
Cc:	Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-06 18:37:06
Message-ID:	CA+TgmoavAkU0kqVc7mBMsdKDsj7Bq9moPJQAkW6mSw3GgpGriw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Oct 6, 2023 at 2:25 PM Nico Williams <nico(at)cryptonector(dot)com> wrote:
> > > > Well, that would be making the encoding a per-value property, rather
> > > > than a per-column property like collation as I proposed. I can't see
> > >
> > > On-disk it would be just a property of the type, not part of the value.
> >
> > I mean, that's not how it works.
>
> Sure, because TEXT in PG doesn't have codeset+encoding as part of it --
> it's whatever the database's encoding is. Collation can and should be a
> property of a column, since for Unicode it wouldn't be reasonable to
> make that part of the type. But codeset+encoding should really be a
> property of the type if PG were to support more than one. IMO.

No, what I mean is, you can't just be like "oh, the varlena will be
different in memory than on disk" as if that were no big deal.

I agree that, as an alternative to encoding being a column property,
it could instead be completely a type property, meaning that if you
want to store, say, LATIN1 text in your UTF-8 database, you first
create a latin1text data type and then use it, rather than, as in the
model I proposed, creating a text column and then applying a setting
like ENCODING latin1 to it. I think that there might be some problems
with that model, but it could also have some benefits. If someone were
going to make a run at implementing this, they might want to consider
both designs and evaluate the tradeoffs.

But, even if we were all convinced that this kind of feature was good
to add, I think it would almost certainly be wrong to invent new
varlena features along the way.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: Pre-proposal: unicode normalized text at 2023-10-06 18:25:44 from Nico Williams

Responses

Re: Pre-proposal: unicode normalized text at 2023-11-02 22:38:47 from Nico Williams
