Re: Hostnames, IDNs, Punycode and Unicode Case Folding

From: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Hostnames, IDNs, Punycode and Unicode Case Folding
Date: 2014-12-30 00:01:44
Message-ID: CAKFQuwY7APvDZD5Yr5cAYS+ix42Aruhf4DJ5XscagMeBUffndQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Dec 29, 2014 at 4:51 PM, Mike Cardwell [via PostgreSQL] <
ml-node+s1045698n5832364h8(at)n5(dot)nabble(dot)com> wrote:

> * on the Mon, Dec 29, 2014 at 04:23:22PM -0700, David G Johnston wrote:
>
> >>> WHERE hostname='nißan.com <http://nissan.com>'
> >>>
> >>
> >> _IF_ Postgres had a punycode function, then you could use:
> >> WHERE punycode(hostname) = punycode('nißan.com <http://nissan.com>')
> >
> > If the OP wraps what he is doing up into a function that is what you end
> up
> > getting: a memoized punycode function.
> >
> > http://en.wikipedia.org/wiki/Memoization
> >
> > It has to be defined as volatile but basically write the function to
> check
> > for the provided input on the indexed table and if it doesn't exist the
> > function will calculate the punycode value and store it onto the table
> > before returning the punycode value to the caller.
> I'm not sure all that is necessary. It could be quite a simple function,
> like the lower() function. So what I would do is this:
>
> CREATE UNIQUE INDEX hostnames_hostname_key ON hostnames
> (lower(punycode_encode(hostname)));
>
> That would prevent adding more than one representation for the same
> hostname
> to the column.

​Except two different hostname can resolve to the same
punycode_encode(hostname) value so the unique index won't work.
It was also mentioned that using the Perl encoding function was
non-performant; which is why caching the data into a memoization table has
value.

WHERE lower(punycode_encode(hostname)) =
> lower(punycode_encode('any-representation'))
>

I'm not for knowing the rules of punycode but I'm not seeing what value
lower() provides here...

>
> There doesn't need to be any extra table storage for the punycode encoded
> version.
>
>
​David J.​

--
View this message in context: http://postgresql.nabble.com/Hostnames-IDNs-Punycode-and-Unicode-Case-Folding-tp5832350p5832368.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2014-12-30 00:09:02 Re: Hostnames, IDNs, Punycode and Unicode Case Folding
Previous Message Andrew Sullivan 2014-12-30 00:00:05 Re: Hostnames, IDNs, Punycode and Unicode Case Folding