regexp_matches and regexp_split are inconsistent

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	regexp_matches and regexp_split are inconsistent
Date:	2007-08-11 01:25:34
Message-ID:	17867.1186795534@sss.pgh.pa.us
Lists:	pgsql-hackers

I noticed the following behavior in CVS HEAD, using a pattern that is
capable of matching no characters:

regression=# SELECT foo FROM regexp_matches('ab cde', $re$\s*$re$, 'g') AS foo;
  foo
-------
 {""}
 {""}
 {" "}
 {""}
 {""}
 {""}
 {""}
(7 rows)

regression=# SELECT foo FROM regexp_split_to_table('ab cde', $re$\s*$re$) AS foo;
 foo
-----
 a
 b
 c
 d
 e
(5 rows)

If you count carefully, you will see that regexp_matches() reports a
match of the pattern at the start of the string and at the end of the
string, and also just before 'c' (after the match to the single space).
However, regexp_split_to_table() disregards these "degenerate" matches
of the same pattern.
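
To make the counting above concrete, here is a minimal sketch in Python. It
uses Python's re module (3.7 or later) as a stand-in for the backend's regex
engine, so it illustrates the argument rather than the actual implementation.
The helper names match_spans, split_strict and split_skipping_degenerate are
hypothetical, and the skipping rule (drop an empty match at the start of the
string, at the end of the string, or immediately after the previous match) is
only an assumption chosen because it reproduces the five-row output shown
above.

```python
import re


def match_spans(pattern, text):
    """Return (start, end) offsets for every match, zero-length ones included."""
    return [m.span() for m in re.finditer(pattern, text)]


def split_strict(text, spans):
    """Split treating every match, even an empty one, as a delimiter."""
    pieces, prev = [], 0
    for start, end in spans:
        pieces.append(text[prev:start])
        prev = end
    pieces.append(text[prev:])
    return pieces


def split_skipping_degenerate(text, spans):
    """Split while ignoring "degenerate" empty matches.

    Assumed rule: an empty match is ignored when it sits at the start of the
    string, at the end of the string, or right after the previous match.
    """
    pieces, prev = [], 0
    last_end = None  # end offset of the previous match, whether kept or not
    for start, end in spans:
        is_empty = start == end
        degenerate = is_empty and (
            start == 0 or start == len(text) or start == last_end
        )
        last_end = end
        if degenerate:
            continue
        pieces.append(text[prev:start])
        prev = end
    pieces.append(text[prev:])
    return pieces


text = "ab cde"
spans = match_spans(r"\s*", text)
print(len(spans))                              # number of matches found
print(split_strict(text, spans))               # every match is a delimiter
print(split_skipping_degenerate(text, spans))  # degenerate matches ignored
```

With the strict rule the seven matches produce empty pieces at the start, at
the end, and between the space and 'c'; with the skipping rule only the five
letters remain.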

Is this what we want? Arguably regexp_split is doing the most
reasonable thing for its intended usage, but the strict definition of
regexp matching seems to require what regexp_matches does. I think
we need to either change one function to match the other, or else
document the inconsistency.

Thoughts?

regards, tom lane

Responses

Re: regexp_matches and regexp_split are inconsistent at 2007-08-11 05:44:26 from Pavel Stehule
Re: regexp_matches and regexp_split are inconsistent at 2007-08-11 16:59:03 from Stephan Szabo
