Re: Question regarding UTF-8 data and "C" collation on definition of field of table

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Dionisis Kontominas <dkontominas(at)gmail(dot)com>
Cc:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Date:	2023-02-06 00:19:01
Message-ID:	2556580.1675642741@sss.pgh.pa.us
Lists:	pgsql-general

Dionisis Kontominas <dkontominas(at)gmail(dot)com> writes:
> I suppose that affects the outcome of ORDER BY clauses on the field,
> along with the content of the indexes. Is this right?

Yeah.

> Assuming that the requirement exists, to store UTF-8 characters on a
> field that can be from multiple languages, and the database default
> encoding is UTF8 which is the right thing I suppose (please verify), what
> do you think should be the values of the Collation and Ctype for the
> database to behave correctly?

Um ... so define "correct". If you have a mishmash of languages in the
same column, it's likely that they have conflicting rules about sorting,
and there may be no ordering that's not surprising to somebody.

If there's a predominant language in the data, selecting a collation
matching that seems like your best bet. Otherwise, maybe you should
just shrug your shoulders and stick with C collation. It's likely
to be faster than any alternative.

regards, tom lane
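
To make the C collation option above concrete: as far as I understand it, C collation compares strings by their raw bytes, which in a UTF8-encoded database works out to Unicode code point order. The Python sketch below only imitates that ordering outside the database. The helper name `c_collation_sort` and the sample words are made up for illustration, and this is not PostgreSQL's actual code path.

```python
def c_collation_sort(values):
    """Sort strings by their UTF-8 byte sequences.

    This imitates a plain byte-wise comparison, which is the assumption
    made here about how "C" collation orders UTF-8 text.
    """
    return sorted(values, key=lambda s: s.encode("utf-8"))


# Hypothetical sample data mixing ASCII and accented words.
words = ["zebra", "Zebra", "apple", "Äpfel", "éclair", "Banana"]

ordered = c_collation_sort(words)
print(ordered)
# ['Banana', 'Zebra', 'apple', 'zebra', 'Äpfel', 'éclair']
```

All uppercase ASCII letters come before all lowercase ones, and every accented letter comes after "z". That ordering is cheap to compute and does not depend on any locale, but it matches no particular language's dictionary order, which is the trade-off described in the message.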

In response to

Re: Question regarding UTF-8 data and "C" collation on definition of field of table at 2023-02-05 23:36:54 from Dionisis Kontominas

Responses

Re: Question regarding UTF-8 data and "C" collation on definition of field of table at 2023-02-06 00:48:15 from Dionisis Kontominas
Re: Question regarding UTF-8 data and "C" collation on definition of field of table at 2023-02-06 01:14:44 from Peter Geoghegan
