Re: Question regarding UTF-8 data and "C" collation on definition of field of table

From: Dionisis Kontominas <dkontominas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Date: 2023-02-05 23:36:54
Message-ID: CAB4Evu30cbKTZrYFh=zf-+TizKtQe28hFcdJpMw5wrEex7++fQ@mail.gmail.com
Lists: pgsql-general

Hello Tom,

Thank you for your response.

I suppose that affects the results of ORDER BY clauses on the field, as
well as the ordering of any indexes on it. Is that right?

Assume the requirement is to store UTF-8 text from multiple languages in
a field, and that the database's default encoding is UTF8, which I
believe is the right choice (please confirm). What values should Collate
and Ctype have for the database to behave correctly? I could not find
specific guidance in the documentation.

What I did find interesting though is the below statement:

24.2.2.1. Standard Collations
"Additionally, the SQL standard collation name ucs_basic is available for
encoding UTF8. It is equivalent to C and sorts by Unicode code point."
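The claim that ucs_basic "is equivalent to C" on a UTF8 database follows from a property of UTF-8 itself: comparing encoded strings byte by byte gives the same order as comparing their Unicode code points. The sketch below checks this in Python, using byte-wise sorting of the UTF-8 encoding as a stand-in for the "C" collation; the sample words are arbitrary and not from the thread.

```python
# Sketch: byte-wise ordering of UTF-8 encodings (a stand-in for the "C"
# collation on a UTF8 database) versus ordering by Unicode code point
# (what the docs say ucs_basic does). Sample words are arbitrary.

words = ["zebra", "Zebra", "éclair", "apple", "Ωmega", "αλφα", "日本", "Äpfel"]

def c_collation_order(items):
    """Sort strings by the raw bytes of their UTF-8 encoding."""
    return sorted(items, key=lambda s: s.encode("utf-8"))

def code_point_order(items):
    """Sort strings by their sequence of Unicode code points."""
    return sorted(items, key=lambda s: [ord(ch) for ch in s])

by_bytes = c_collation_order(words)
by_code_points = code_point_order(words)

print(by_bytes)
# ['Zebra', 'apple', 'zebra', 'Äpfel', 'éclair', 'Ωmega', 'αλφα', '日本']
print(by_bytes == by_code_points)  # True: UTF-8 preserves code point order
```

Note that both orders put every uppercase ASCII letter before every lowercase one, and all accented and non-Latin text after plain ASCII.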

Is this the right collation to use when creating the database for this
use case? If so, what would be the corresponding Ctype?

Regards,
Dionisis

On Mon, 6 Feb 2023 at 00:24, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Dionisis Kontominas <dkontominas(at)gmail(dot)com> writes:
> > Let's say that the definition is for example as follows:
> > name character varying(8) COLLATE pg_catalog."C" NOT NULL
> > and also assume that the database default encoding is UTF8 and that its
> > Collate and Ctype are "C". I plan to store strings of various languages
> > in this field.
>
> > Are these the correct settings to have used when creating
> > the database?
>
> Well, it won't crash or anything, but sorting will be according
> to byte-by-byte values. So the sort order of non-ASCII text is
> likely to look odd. How much do you care about that?
>
> regards, tom lane
>
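To make "likely to look odd" concrete, the sketch below sorts a hypothetical list of names two ways: by raw UTF-8 bytes, which is what the "C" collation does, and by a crude accent- and case-insensitive key. The second key is only an illustrative approximation of a language-aware collation; it is not what ICU or glibc locales actually implement.

```python
import unicodedata

# Hypothetical sample of multilingual names stored in a C-collated column.
names = ["Émile", "Zoe", "adam", "Ängel", "eve", "Ólafur", "olga"]

def c_collation_key(s):
    # "C" collation: compare the raw UTF-8 bytes.
    return s.encode("utf-8")

def rough_linguistic_key(s):
    # Crude approximation of a language-aware collation (NOT what ICU or
    # glibc implement): strip accents and ignore case first, then break
    # ties by the original string.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)

c_sorted = sorted(names, key=c_collation_key)
rough_sorted = sorted(names, key=rough_linguistic_key)

print(c_sorted)
# ['Zoe', 'adam', 'eve', 'olga', 'Ängel', 'Émile', 'Ólafur']
print(rough_sorted)
# ['adam', 'Ängel', 'Émile', 'eve', 'Ólafur', 'olga', 'Zoe']
```

Under byte order, "Zoe" comes first and every accented name is pushed to the end, which is the kind of result most users would not expect from ORDER BY.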
