From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Dionisis Kontominas <dkontominas(at)gmail(dot)com> |
Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Question regarding UTF-8 data and "C" collation on definition of field of table |
Date: | 2023-02-06 00:19:01 |
Message-ID: | 2556580.1675642741@sss.pgh.pa.us |
Lists: | pgsql-general |
Dionisis Kontominas <dkontominas(at)gmail(dot)com> writes:
> I suppose that affects the outcome of ORDER BY clauses on the field,
> along with the content of the indexes. Is this right?

Yeah.

> Assuming that the requirement exists, to store UTF-8 characters on a
> field that can be from multiple languages, and the database default
> encoding is UTF8 which is the right thing I suppose (please verify), what
> do you think should be the values of the Collation and Ctype for the
> database to behave correctly?

Um ... so define "correct". If you have a mishmash of languages in the
same column, it's likely that they have conflicting rules about sorting,
and there may be no ordering that's not surprising to somebody.

If there's a predominant language in the data, selecting a collation
matching that seems like your best bet. Otherwise, maybe you should
just shrug your shoulders and stick with C collation. It's likely
to be faster than any alternative.

regards, tom lane
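
As a rough illustration of the trade-off described above (not part of the original message, and not PostgreSQL code): the "C" collation compares strings byte by byte, which for UTF-8 data amounts to ordering by code point. The Python sketch below approximates that ordering outside the database. The helper name `c_collation_sort` and the sample words are made up for this example.

```python
def c_collation_sort(values):
    """Sort strings by their UTF-8 bytes.

    This approximates what ORDER BY produces under the "C" collation in a
    UTF8 database: a plain byte-wise comparison with no language rules.
    """
    return sorted(values, key=lambda s: s.encode("utf-8"))


# Hypothetical mixed-language sample data.
words = ["apple", "Zebra", "éclair", "banana", "Ωmega"]

c_order = c_collation_sort(words)
print(c_order)
# Uppercase ASCII sorts before lowercase ASCII, and every non-ASCII
# character sorts after all ASCII characters:
# ['Zebra', 'apple', 'banana', 'éclair', 'Ωmega']
```

The comparison is cheap because it involves no locale lookups, which is consistent with the remark that C collation is likely to be faster. The cost is an ordering such as "Zebra" before "apple" and "éclair" after "banana", which a reader of almost any single language may find surprising.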