From: | Gilles Darold <gilles(at)darold(dot)net> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace |
Date: | 2021-03-03 09:15:57 |
Message-ID: | fc160ee0-c843-b024-29bb-97b5da61971f@darold.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Oracle:
https://docs.oracle.com/en/database/oracle/oracle-database/18/adfns/regexp.html#GUID-F14733F3-B943-4BAD-8489-F9704986386B
IBM:
https://www.ibm.com/support/producthub/db2/docs/content/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0061494.html?pos=2
Z/OS:
https://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_regexplike.html
EDB:
https://www.enterprisedb.com/edb-docs/d/edb-postgres-advanced-server/reference/database-compatibility-for-oracle-developers-reference-guide/9.6/Database_Compatibility_for_Oracle_Developers_Reference_Guide.1.098.html
Hi,
I would like to suggest adding the $subject functions to PostgreSQL. We
can do lot of things using regexp_matches() and regexp_replace() but
some time it consist on building complex queries that these functions
can greatly simplify.
Look like all RDBMS that embedded a regexp engine implement these
functions (Oracle, DB2, MySQL, etc) but I don't know if they are part of
the SQL standard. Probably using regexp_matches() can be enough even if
it generates more complex statements but having these functions in
PostgreSQL could be useful for users and code coming from theses RDBMS.
- REGEXP_COUNT( string text, pattern text, [, position int] [, flags
text ] ) -> integer
Return the number of times a pattern occurs in a source string
after a certain position, default from beginning.
It can be implemented in PostgreSQL as a subquery using:
SELECT count(*) FROM regexp_matches('A1B2C3', '[A-Z][0-9]',
'g'); -> 3
To support positioning we have to use substr(), for example
starting at position 2:
SELECT count(*) FROM regexp_matches(substr('A1B2C3', 2),
'[A-Z][0-9]'); -> 2
With regexp_count() we can simply use it like this:
SELECT regexp_count('A1B2C3', '[A-Z][0-9]'); -> 3
SELECT regexp_count('A1B2C3', '[A-Z][0-9]', 2); -> 2
- REGEXP_INSTR( string text, pattern text, [, position int] [,
occurrence int] [, return_opt int ] [, flags text ] [, group int] ) ->
integer
Return the position in a string for a regular expression
pattern. It returns an integer indicating the beginning or ending
position of the matched substring, depending on the value of the
return_opt argument (default beginning). If no match is found, then the
function returns 0.
* position: indicates the character where the search should
begin.
* occurrence: indicates which occurrence of pattern found in
string should be search.
* return_opt: 0 mean returns the position of the first
character of the occurrence, 1 mean returns the position of the
character following the occurrence.
* flags: regular expression modifiers.
* group: indicates which subexpression in pattern is the
target of the function.
Example:
SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1,
0, 'i', 4); -> 7
to obtain a PostgreSQL equivalent:
SELECT position((SELECT (regexp_matches('1234567890',
'(123)(4(56)(78))', 'ig'))[4] offset 0 limit 1) IN '1234567890');
- REGEXP_SUBSTR( string text, pattern text, [, position int] [,
occurrence int] [, flags text ] [, group int] ) -> text
It is similar to regexp_instr(), but instead of returning the
position of the substring, it returns the substring itself.
Example:
SELECT regexp_substr('500 gilles''s street, 38000 Grenoble,
FR', ',[^,]+,'); -> , 38000 Grenoble,
or with a more complex extraction:
SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1,
'i', 4); -> 78
SELECT regexp_substr('1234567890 1234557890',
'(123)(4(5[56])(78))', 1, 2, 'i', 3); -> 55
To obtain the same result for the last example we have to use:
SELECT (SELECT * FROM regexp_matches('1234567890
1234557890', '(123)(4(5[56])(78))', 'g') offset 1 limit 2)[3];
I have not implemented the regexp_like() function, it is quite similar
than the ~ and ~* operators except that it can also support other
modifiers than 'i'. I can implement it easily and add it to the patch if
we want to supports all those common functions.
- REGEXP_LIKE( string text, pattern text, [, flags text ] ) -> boolean
Similar to the LIKE condition, except that it performs regular
expression matching instead of the simple pattern matching performed by
LIKE.
Example:
SELECT * FROM t1 WHERE regexp_like(col1, '^d$', 'm');
to obtain a PostgreSQL equivalent:
SELECT * FROM t1 WHERE regexp_match (col1, '^d$', 'm' ) IS
NOT NULL;
There is also a possible extension to regexp_replace() that I have not
implemented yet because it need more work than the previous functions.
- REGEXP_REPLACE( string text, pattern text, replace_string text, [,
position int] [, occurrence int] [, flags text ] )
Extend PostgreSQL regexp_replace() by adding position and occurrence
capabilities.
The patch is ready for testing with documentation and regression tests.
Best regards,
--
Gilles Darold
LzLabs GmbH
From | Date | Subject | |
---|---|---|---|
Next Message | Gilles Darold | 2021-03-03 09:22:16 | Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace |
Previous Message | Daniel Gustafsson | 2021-03-03 09:04:58 | Re: pg_upgrade version checking questions |