Re: How to clean up phone-numbers with regex?

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Andreas <maps(dot)on(at)gmx(dot)net>, pgsql-sql(at)postgresql(dot)org
Subject: Re: How to clean up phone-numbers with regex?
Date: 2014-05-19 15:43:49
Message-ID: 537A26B5.3030309@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 05/19/2014 01:54 AM, Andreas wrote:
> Hi
>
> I need to clean up phone-numbers. Somehow I got a Excel list that has
> weird graphical characters trailing some of the entries.
> My DB is UTF8 so it would store this mess but I don't like to import
> it in the first place.
>
> OK, I know how to read the stuff into a temporary table to clean it up
> before the actual import.
> How can I do an update on the column that deletes every char that is
> not in a given set of chars like '+- 0123456/()'?
>

See: http://www.postgresql.org/docs/current/static/functions-matching.html

For the first case, the regexp_replace function is probably your best
bet. But note that, depending on the quality of your input, just
removing characters outside that range may still not yield the desired
result.

select regexp_replace('(12s3)-456-635/6(a+sdk', '[^0-9()+-/]', '', 'g');
regexp_replace
-------------------
(123)-456-635/6(+

You can remove all formatting by requiring only digits then check and/or
reformat later as desired.
steve=> select regexp_replace('(12s3)-456-6356(a+sdk', '[^0-9]', '', 'g');
regexp_replace
----------------
1234566356

>
> Second but similar question:
> How can I select records that have fields that contain characters not
> included in a given alphabet?
> E.G. find fields that contain some char not in 0-9,a-z,A-Z, +-()/?
>
See regexp_match on the above-referenced page.

Cheers,
Steve

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Steve Crawford 2014-05-19 15:44:38 Re: How to clean up phone-numbers with regex?
Previous Message Rob Sargent 2014-05-19 13:39:58 Re: How to clean up phone-numbers with regex?