RE: Regex Replace with 2 conditions

From: Denisa Cirstescu <Denisa(dot)Cirstescu(at)tangoe(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: RE: Regex Replace with 2 conditions
Date: 2018-02-05 16:54:18
Message-ID: CY1PR12MB00251473B9810794A05579A1E6FE0@CY1PR12MB0025.namprd12.prod.outlook.com
Lists: pgsql-general

Francisco,

I tried the version you are proposing before posting this question, but it does not work for me: it also removes characters with codes greater than 255, such as "ă", and those are characters that I need to keep.

SELECT regexp_replace(p_string, E'[^A-Za-z0-9%_]', '', 'g');

This is the request that I have: write a function that removes all characters with codes 1-255 except A-Z, a-z, 0-9, and the special characters % and _.

Tom,

I have tried what you suggested with the lookahead and it is working.
It is exactly what I needed. The final version of the function is:

CREATE OR REPLACE FUNCTION testFunction(p_string CHARACTER VARYING) RETURNS VARCHAR AS $$
SELECT regexp_replace(p_string, E'(?=[' || CHR(1) || '-' || CHR(255) || '])[^A-Za-z0-9%_]', '', 'g');
$$ LANGUAGE sql IMMUTABLE;
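
A quick check of the function, as a sketch only: the sample strings are made up for illustration, and the expected results assume a UTF8 database, where CHR(255) is the code point U+00FF and "ă" (U+0103) lies above it.

```sql
-- Hypothetical sample inputs; assumes a UTF8-encoded database.
SELECT testFunction('ab#c d');    -- '#' and the space are in 1-255 and not allowed: 'abcd'
SELECT testFunction('50%_off!');  -- '%' and '_' are kept, '!' is removed: '50%_off'
SELECT testFunction('ăb-c');      -- 'ă' is above 255 and is kept, '-' is removed: 'ăbc'
```

Note that accented letters inside the 1-255 range (for example "â", U+00E2) are still removed, since only A-Z, a-z, 0-9, % and _ are exempt there.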

Thanks a lot,
Denisa Cîrstescu

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Monday, February 5, 2018 4:43 PM
To: Denisa Cirstescu <Denisa(dot)Cirstescu(at)tangoe(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Regex Replace with 2 conditions

Denisa Cirstescu <Denisa(dot)Cirstescu(at)tangoe(dot)com> writes:
> Is there a way to specify 2 conditions in regexp_replace?
> I need an SQL function that eliminates all ASCII characters from 1-255 that are not A-Z, a-z, 0-9, and special characters % and _ so something like:
> SELECT regexp_replace(p_string, E'[' || CHR(1) || '-' || CHR(255) || '&&[^A-Za-z0-9%_]]', '', 'g');
> But this syntax is not really working.

Nope, because there's no && operator in regexes.

But I think you could get what you want by using lookahead or lookbehind to combine additional condition(s) with a basic character-class pattern.
Something like

(?=[\001-\377])[^A-Za-z0-9%_]
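
To see what the lookahead adds, the pattern can be compared with the plain negated class on the same input. This is only a sketch: it assumes a UTF8 database and standard_conforming_strings = on (so the backslashes reach the regex engine unchanged), and the sample string is invented for illustration.

```sql
-- Hypothetical sample string: a, '-', b, space, 'ă' (U+0103), '_', '%'
SELECT
  -- Plain negated class: removes everything outside the allowed set, including 'ă'
  regexp_replace('a-b ă_%', '[^A-Za-z0-9%_]', '', 'g') AS plain_class,      -- 'ab_%'
  -- Lookahead first requires the next character to be in \001-\377 (octal for 1-255),
  -- then the negated class consumes it; 'ă' fails the lookahead and is left alone
  regexp_replace('a-b ă_%', '(?=[\001-\377])[^A-Za-z0-9%_]', '', 'g') AS with_lookahead;  -- 'abă_%'
```

The lookahead consumes no input, so both conditions are tested against the same character: it must fall in the range 1-255 and it must not be in the allowed set.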

regards, tom lane
