From: | Marc Mamin <M(dot)Mamin(at)intershop(dot)de> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Cc: | "'Pavel Stehule'" <pavel(dot)stehule(at)gmail(dot)com> |
Subject: | PostgreSQL SQL Tricks: faster urldecode |
Date: | 2013-09-20 15:26:20 |
Message-ID: | B6F6FD62F2624C4C9916AC0175D56D880CE1C118@jenmbs01.ad.intershop.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
here is a function which is about 8 x faster than the one described in the PostgreSQL SQL Tricks
( http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#Function_for_decoding_of_url_code )
The idea is to handle each encoded/not_encoded parts in bulk rather than spliting on each character.
urldecode_arr:
Seq Scan on lt_referrer (actual time=1.966..17623.979 rows=65717 loops=1)
urldecode:
Seq Scan on lt_referrer (actual time=4.846..144445.292 rows=65717 loops=1)
regards,
Marc Mamin
CREATE OR REPLACE FUNCTION urldecode_arr(url text)
RETURNS text AS
$BODY$
DECLARE ret text;
BEGIN
BEGIN
WITH STR AS (
SELECT
-- array with all non encoded parts, prepend with '' when the string start is encoded
case when $1 ~ '^%[0-9a-fA-F][0-9a-fA-F]'
then array['']
end
|| regexp_split_to_array ($1,'(%[0-9a-fA-F][0-9a-fA-F])+', 'i') plain,
-- array with all encoded parts
array(select (regexp_matches ($1,'((?:%[0-9a-fA-F][0-9a-fA-F])+)', 'gi'))[1]) encoded
)
SELECT string_agg(plain[i] || coalesce( convert_from(decode(replace(encoded[i], '%',''), 'hex'), 'utf8'),''),'')
FROM STR,
(SELECT generate_series(1, array_upper(encoded,1)+2) i FROM STR)blah
INTO ret;
EXCEPTION WHEN OTHERS THEN
raise notice 'failed: %',url;
return $1;
END;
RETURN coalesce(ret,$1); -- when the string has no encoding;
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE STRICT
From | Date | Subject | |
---|---|---|---|
Next Message | Manuel Kniep | 2013-09-20 15:29:11 | table inheritance and locks |
Previous Message | Kevin Grittner | 2013-09-20 14:49:04 | Re: recover deleted data |