From: | Palle Girgensohn <girgen(at)pingpong(dot)net> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Devrim Gündüz <devrim(at)gunduz(dot)org>, Jakob Egger <jakob(at)eggerapps(dot)at>, Tobias Bussmann <t(dot)bussmann(at)gmx(dot)net> |
Subject: | Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00) |
Date: | 2016-08-11 09:15:36 |
Message-ID: | 7D9265F9-2E7B-4136-BCA1-AAE8481561B7@pingpong.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> 11 aug. 2016 kl. 03:05 skrev Peter Geoghegan <pg(at)heroku(dot)com>:
>
> On Wed, Aug 10, 2016 at 1:42 PM, Palle Girgensohn <girgen(at)pingpong(dot)net> wrote:
>> They've been used for the FreeBSD ports since 2005, and have served us well. I have of course updated them regularly. In this latest version, I've removed support for other encodings beside UTF-8, mostly since I don't know how to test them, but also, I see little point in supporting anything else using ICU.
>
> Looks like you're not using the ICU equivalent of strxfrm(). While 9.5
> is not the release that introduced its use, it did expand it
> significantly. I think you need to fix this, even though it isn't
> actually used to sort text at present, since presumably FreeBSD builds
> of 9.5 don't TRUST_STRXFRM. Since you're using ICU, though, you could
> reasonably trust the ICU equivalent of strxfrm(), so that's a missed
> opportunity. (See the wiki page on the abbreviated keys issue [1] if
> you don't know what I'm talking about.)
My plan was to get it working without TRUST_STRXFRM first, and then add that functinality. I've made some preliminary tests using ICU:s ucol_getSortKey but I will have to test it a bit more. For now, I just expect not to trust strxfrm. It is the first iteration wrt strxfrm, the plan is to use that code base.
>
> Shouldn't you really have a strxfrm() wrapper, used across the board,
> including for callers outside of varlena.c? convert_string_datum() has
> been calling strxfrm() for many releases now. These calls are still
> used in FreeBSD builds, I would think, which seems like a bug that is
> not dodged by simply not defining TRUST_STRXFRM. Isn't its assumption
> that that matching the ordering used elsewhere not really hold on
> FreeBSD builds?
I was not aware of convert_string_datum, I will check that, thanks! Using a wrapper across the board seems like a good idea for refactoring.
>
> [1] https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue
> --
> Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Artur Zakirov | 2016-08-11 09:46:28 | Re: Bug in to_timestamp(). |
Previous Message | Marko Tiikkaja | 2016-08-11 09:01:18 | Re: Assertion failure in REL9_5_STABLE |