Re: patch: utf8_to_unicode (trivial)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: patch: utf8_to_unicode (trivial)
Date:	2010-08-13 17:50:32
Message-ID:	3250.1281721832@sss.pgh.pa.us
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Robert Haas's message of Fri Aug 13 12:50:13 -0400 2010:
>> Oh, hey, look at that. Any thought on what to do about the fact that our
>> two existing copies of utf2ucs() don't match? (one tests against 0xf8
>> where the other against 0xf0)

> I'm not sure why it's masking 0xf8 instead of 0xf0.

Because it wants to verify that this is in fact a 4-byte UTF8 code.
Compare the code (and comments) for pg_utf_mblen.

AFAICS the version in mbprint.c is flat out wrong, and the only reason
nobody's noticed is that it should never get passed a more-than-4-byte
sequence anyway.
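
To illustrate the difference between the two masks, here is a minimal sketch. The function names are hypothetical and not taken from the PostgreSQL source; the code shows only the lead-byte test, on the assumption that the two utf2ucs() copies differ in exactly this mask.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers for illustration only; these names do not
 * appear in the PostgreSQL tree.
 */

/* Strict test: the top five bits must be 11110, the lead-byte
 * pattern of a 4-byte UTF-8 sequence (bytes 0xf0..0xf7). */
static bool
lead4_strict(unsigned char c)
{
    return (c & 0xf8) == 0xf0;
}

/* Loose test: only the top four bits are checked, so bytes
 * 0xf8..0xff are accepted as well. */
static bool
lead4_loose(unsigned char c)
{
    return (c & 0xf0) == 0xf0;
}

/* Small self-check showing where the two tests disagree. */
void
check_lead_byte_masks(void)
{
    assert(lead4_strict(0xf0) && lead4_loose(0xf0));   /* both accept */
    assert(!lead4_strict(0xf8) && lead4_loose(0xf8));  /* only loose accepts */
    assert(!lead4_strict(0xe2) && !lead4_loose(0xe2)); /* 3-byte lead: both reject */
}
```

With the 0xf0 mask, a byte such as 0xf8 (which would start a longer-than-4-byte sequence in the old UTF-8 scheme) is treated as a 4-byte lead byte; the 0xf8 mask rejects it.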

regards, tom lane

In response to

Re: patch: utf8_to_unicode (trivial) at 2010-08-13 17:40:12 from Alvaro Herrera

Responses

Re: patch: utf8_to_unicode (trivial) at 2010-08-13 18:11:27 from Robert Haas
