From: | "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> |
---|---|
To: | brian <brian(at)zijn-digital(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: match accented chars with ASCII-normalised version |
Date: | 2008-01-26 02:46:24 |
Message-ID: | dcc563d10801251846q4c23a893va4fb3d30d1297633@mail.gmail.com |
Lists: | pgsql-general |
On Jan 24, 2008 11:02 PM, brian <brian(at)zijn-digital(dot)com> wrote:
> The client for a web application I'm working on wants certain URLs to
> contain the full names of members ("SEO-friendly" links). Scripts would
> search on, say, a member directory entry based on the name of the
> member, rather than the row ID. I can easily join first & last names
> with an underscore (and split on that later) and replace spaces with +,
> etc. But many of the names contain multibyte characters and so the URLs
> would become URL-encoded, eg:
>
> Adelina España -> Adelina_Espa%C3%B1a
>
> The client won't like this (and neither will I).
>
> I can create a conversion array to replace certain characters with
> 'normal' ones:
>
> Adelina_Espana
>
> However, I then run into the problem of trying to match 'Espana' to
> 'España'. Searching online, I found a few ideas (soundex, intuitive
> fuzzy something-or-other) but mostly they seem like overkill for this
> application.
>
> The best I can come up with is to add a 'link_name' column to the table
> that holds the 'normalised' version of the name ('Adelina_Espana', or
> even 'adelina_espana'). The duplication bugs me a little but the table
> currently stands at a whopping ~3500 names, so I'm not too concerned.
>
> My question is: well, does this look like the way to go, considering
> it's just a web app (and isn't likely to ever top 10000 names)? Or is
> there something clever (yet not overkill) that I'm missing?
>
> If I do go this route, I'd add an insert/update trigger to call a
> function (PL/Perl, I'm looking at you) that handles the conversion to
> link_name.
You could create an immutable function that converts accented
characters to their normalized form, then create an index on that
function. Queries then filter on the same expression, something like:
select normalized_name(firstname||'_'||lastname) from sometable
where normalized_name(firstname||'_'||lastname) = 'adelina_espana';
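A minimal sketch of the function-plus-index approach described above. The body of normalized_name is an assumption (the thread does not define it): here it lower-cases the input and uses translate() with a small, hand-picked list of accented characters, which would need extending for real data. The index name is also hypothetical; sometable, firstname and lastname are taken from the example query.

```sql
-- Hypothetical implementation: lower-case the input, then map each
-- accented character in the first list to the ASCII character at the
-- same position in the second list (both lists are 25 characters long).
CREATE FUNCTION normalized_name(text) RETURNS text AS $$
    SELECT translate(lower($1),
                     'áàâäãåéèêëíìîïóòôöõúùûüñç',
                     'aaaaaaeeeeiiiiooooouuuunc');
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Expression index; PostgreSQL only accepts IMMUTABLE functions here.
-- The index name is illustrative.
CREATE INDEX sometable_normalized_name_idx
    ON sometable (normalized_name(firstname || '_' || lastname));

-- Lookup by the URL-friendly name. The WHERE expression has to match
-- the indexed expression for the planner to consider the index.
SELECT firstname, lastname
  FROM sometable
 WHERE normalized_name(firstname || '_' || lastname) = 'adelina_espana';
```

With this in place no separate link_name column or trigger is needed, since the normalized value lives only in the index. Note that the sketch does not handle spaces inside names, and a NULL firstname or lastname makes the whole concatenation NULL.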