Re: How to convert HEX to ASCII?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Torsten Zuehlsdorff <foo(at)meisterderspiele(dot)de>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: How to convert HEX to ASCII?
Date: 2011-12-23 14:14:39
Message-ID: CAFj8pRCVaoysgoqpx1eBvL42+kL_QF-1n--WwDvv_U=AYh9Ffg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello all

just note

9.1 will have a bytea_agg aggregate

regards

Pavel Stehule

2011/12/2 Marti Raudsepp <marti(at)juffo(dot)org>:
> On Fri, Dec 2, 2011 at 16:16, Torsten Zuehlsdorff
> <foo(at)meisterderspiele(dot)de> wrote:
>> But i clearly have a missunderstanding of other chars, like umlauts or utf-8
>> chars. This, for example, should return a 'ö':
>>
>> # SELECT chr(x'C3B6'::int);
>>  chr
>> -----
>>  쎶
>> (1 row)
>
> That gives you the Unicode codepoint C3B6, but %C3%B6 is UTF-8-encoded
> and actually decodes to the codepoint 00F6.
>
> There is a fundamental problem that a decoded URL may actually be a
> binary string -- it might not have a textual representation at all.
> But if text is what you want, RFC3986 strongly suggests using UTF-8
> for encoding text strings in URLs, and that works almost always in the
> real world.
>
> So the *right* way is to first convert the URL to a binary "bytea"
> type by fixing all the % escapes, then convert that to UTF-8 encoding
> to handle multibyte characters.
>
> What I came up with is far from elegant because PostgreSQL lacks
> convenient functions for bytea manipulation (no bytea_agg, etc).
> Stealing a little from Merlin, this is what it looks like:
>
> CREATE OR REPLACE FUNCTION url_decode(input text) RETURNS text
> LANGUAGE plpgsql IMMUTABLE STRICT AS $$
> DECLARE
>  bin bytea = '';
>  byte text;
> BEGIN
>  FOR byte IN (select (regexp_matches(input, '(%..|.)', 'g'))[1]) LOOP
>    IF length(byte) = 3 THEN
>      bin = bin || decode(substring(byte, 2, 2), 'hex');
>    ELSE
>      bin = bin || byte::bytea;
>    END IF;
>  END LOOP;
>  RETURN convert_from(bin, 'utf8');
> END
> $$;
>
> db=# select url_decode('Hell%C3%B6%20World%21');
>  url_decode
> --------------
>  Hellö World!
>
> db=# select url_decode('%EC%8E%B6');
>  url_decode
> ------------
>  쎶
>
> This will break for binary-encoded data in URLs, though.
> db=# select url_decode('%fa%fa%fa');
> ERROR:  invalid byte sequence for encoding "UTF8": 0xfa
> CONTEXT:  PL/pgSQL function "url_decode" line 13 at RETURN
>
> ----
>
> On Fri, Dec 2, 2011 at 17:46, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> set client_encoding to latin1;
>
>> postgres=# select unencode('Hell%C3%B6%20World%21');
>>   unencode
>> ---------------
>>  Hellö World!
>> (1 row)
>
> Sorry, but AFAICT this makes a mess of encodings and only works by
> pure luck. The server thinks it's sending the client LATIN1 text, but
> it's actually UTF8-encoded and the last decoding step is done by your
> terminal.
>
> Regards,
> Marti
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2011-12-23 14:32:48 Re: Fetch from refcursor and transactions
Previous Message Aman Gupta 2011-12-23 13:33:14 postgresql triggers - defining a global resource (java)