Bug with UTF-8 character

From:	Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To:	pgsql-hackers(at)postgresql(dot)org, eg(at)cybertec(dot)at
Subject:	Bug with UTF-8 character
Date:	2006-05-26 06:21:56
Message-ID:	44769E84.7000006@cybertec.at
Lists:	pgsql-hackers

Good morning,

I received a bug report about the following UTF-8 byte sequence in PostgreSQL
8.1.4: 0xedaeb8

ERROR: invalid byte sequence for encoding "UTF8": 0xedaeb8

This sequence appeared to work properly in PostgreSQL 8.0.3.

I think the following code in PostgreSQL 8.1.4 has a bug:

File: postgresql-8.1.4/src/backend/utils/mb/wchar.c

The input values passed to the function are:

source = ed ae b8 20 20 20 20 20 20 20 20 20 20 20 20

length = 3 (the length of the current UTF-8 character, in bytes)

However, the code checks that the second byte is not greater than 0x9F
when the first byte is 0xED. This does not follow the UTF-8 standard in
RFC 3629; I believe it is not a valid test.

This test fails on our string when it shouldn’t.

I believe this is a bug. Could you please confirm, or let me know what I
am doing wrong?

Many thanks,

Hans

--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

Responses

Re: Bug with UTF-8 character at 2006-05-26 13:48:33 from Martijn van Oosterhout
Re: Bug with UTF-8 character at 2006-05-26 14:33:59 from Tom Lane