RE: About Unicode IVS

From: Graham Myers <gmyers(at)retailexpress(dot)com>
To: 荒井元成 <n2029(at)ndensan(dot)co(dot)jp>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: RE: About Unicode IVS
Date: 2022-03-29 08:26:09
Message-ID: d60efdf8caa7379a7483cd530ba5098e@mail.gmail.com
Lists: pgsql-admin

Thank you for the explanation; Unicode always blows my mind 😊 The
problem is that Postgres is counting code points, and in your example
there are two.
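
To make the code-point count concrete, here is a minimal sketch outside the database. It assumes Python, whose strings are sequences of code points, as a stand-in for how a UTF8 database counts characters; the variable names are only illustrative.

```python
# Build the same string as U&'\+008FBA' || U&'\+0E0102' from the thread.
base = "\u8FBA"            # U+8FBA, the base ideograph
selector = "\U000E0102"    # U+E0102, VARIATION SELECTOR-19
s = base + selector

code_points = len(s)                  # counts code points, not glyphs
utf8_bytes = len(s.encode("utf-8"))   # 3 bytes + 4 bytes in UTF-8

print(code_points, utf8_bytes)        # prints: 2 7
```

The string renders as one glyph, but it is stored as two code points, which is the number that length() and char_length() report.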

*From:* 荒井元成 <n2029(at)ndensan(dot)co(dot)jp>
*Sent:* 29 March 2022 09:21
*To:* 'Graham Myers' <gmyers(at)retailexpress(dot)com>; 'David G. Johnston' <
david(dot)g(dot)johnston(at)gmail(dot)com>
*Cc:* pgsql-admin(at)lists(dot)postgresql(dot)org
*Subject:* RE: About Unicode IVS

Thank you for your reply.

This is because the two code points are displayed as a single character.

The same applies to Unicode variation selectors and combining characters
in general.

Moto.
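
One possible shape for a "display length" helper is sketched below. It is an approximation under stated assumptions, not a PostgreSQL feature: the names is_variation_selector and display_length are hypothetical, the code is Python rather than SQL, and it only skips variation selectors and non-spacing or enclosing marks. It does not implement full grapheme cluster segmentation, so emoji joiner sequences and similar cases are not handled.

```python
import unicodedata


def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def display_length(text: str) -> int:
    """Count code points, skipping variation selectors and combining marks.

    Assumption: marks in general categories Mn and Me attach to the
    preceding character and do not occupy a display position of their own.
    """
    return sum(
        1
        for ch in text
        if not is_variation_selector(ch)
        and unicodedata.category(ch) not in ("Mn", "Me")
    )


ivs = "\u8FBA\U000E0102"   # base ideograph + VARIATION SELECTOR-19
accented = "e\u0301"       # "e" + COMBINING ACUTE ACCENT

print(len(ivs), display_length(ivs))            # prints: 2 1
print(len(accented), display_length(accented))  # prints: 2 1
```

The same filtering idea could be ported to a server-side function, but that is left as an assumption here rather than a tested recipe.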

*From:* Graham Myers <gmyers(at)retailexpress(dot)com>
*Sent:* Tuesday, March 29, 2022 4:46 PM
*To:* 荒井元成 <n2029(at)ndensan(dot)co(dot)jp>; David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com>
*Cc:* pgsql-admin(at)lists(dot)postgresql(dot)org
*Subject:* RE: About Unicode IVS

Why do you expect the concatenation of two characters to return a length of
one?

Graham Myers

*From:* 荒井元成 <n2029(at)ndensan(dot)co(dot)jp>
*Sent:* 29 March 2022 05:35
*To:* 'David G. Johnston' <david(dot)g(dot)johnston(at)gmail(dot)com>
*Cc:* pgsql-admin(at)lists(dot)postgresql(dot)org
*Subject:* RE: About Unicode IVS

Thank you for your reply.

char_length() also returns 2 characters.

select char_length(U&'\+008FBA' || U&'\+0E0102');
 char_length
-------------
           2
(1 row)

select length('辺󠄂');
 length
--------
      2
(1 row)

select char_length('辺󠄂');
 char_length
-------------
           2
(1 row)

$ psql -l
                           List of databases
   Name    |  Owner  | Encoding | Collate | Ctype |  Access privileges
-----------+---------+----------+---------+-------+---------------------
 D209007   | D209007 | UTF8     | C       | C     |
 postgres  | D209007 | UTF8     | C       | C     |
 template0 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
 template1 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
(4 rows)

$ cat pgdata/PG_VERSION
13

Moto.

*From:* David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
*Sent:* Tuesday, March 29, 2022 12:38 PM
*To:* 荒井元成 <n2029(at)ndensan(dot)co(dot)jp>
*Cc:* pgsql-admin(at)lists(dot)postgresql(dot)org
*Subject:* Re: About Unicode IVS

Graham Myers

On Monday, March 28, 2022, 荒井元成 <n2029(at)ndensan(dot)co(dot)jp> wrote:

Hi,

The length() function returns 2 characters where I want it to return
1 character.

Is it possible to handle this by changing a setting, such as the
collation, as in SQL Server?

Also, if you know how to deal with it (e.g., by creating your own
function), it would be helpful if you could provide as much
information as you can.

Try char_length(text) instead.

David J.
