From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Mark Dilger" <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | "Postgres hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andreas Karlsson" <andreas(at)proxel(dot)se>, "David Fetter" <david(at)fetter(dot)org> |
Subject: | Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[] |
Date: | 2021-03-05 19:46:24 |
Message-ID: | 57c492b8-e8f0-44a6-926e-14a2f2f766c6@www.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote:
> I like the idea so I did a bit of testing. I think the following should not error, but does:
>
> +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
> +ERROR: range lower bound must be less than or equal to range upper bound
Doh! How stupid of me. I realize now I had a off-by-one thinko in my 0001 patch using int4range.
I didn't use the raw "so" and "eo" values in regexp.c like I should have,
instead, I incorrectly used (so + 1) as the startpos,
and just eo as the endpos.
This is what caused all the problems.
The fix is simple:
- lower.val = Int32GetDatum(so + 1);
+ lower.val = Int32GetDatum(so);
The example that gave the error now works properly:
SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
regexp_positions
------------------
{"[6,7)"}
(1 row)
I've also created a SQL PoC of the composite range type idea,
and convenience wrapper functions for int4range and int8range.
CREATE TYPE range AS (start int8, stop int8);
Helper functions:
range(start int8, stop int8) -> range
range(int8range) -> range
range(int4range) -> range
range(int8range[]) -> range[]
range(int4range[]) -> range[]
Demo:
regexp_positions() returns setof int4range[]:
SELECT r FROM regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
r
-----------------------
{"[3,7)","[6,12)"}
{"[11,17)","[16,21)"}
(2 rows)
Convert int4range[] -> range[]:
SELECT range(r) FROM regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
range
-----------------------
{"(3,6)","(6,11)"}
{"(11,16)","(16,20)"}
(2 rows)
"start" and "stop" fields:
SELECT (range(r[1])).* FROM regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
start | stop
-------+------
3 | 6
11 | 16
(2 rows)
zero-length match at beginning:
SELECT r FROM regexp_positions('','^','g') AS r;
r
-----------
{"[0,1)"}
(1 row)
SELECT (range(r[1])).* FROM regexp_positions('','^','g') AS r;
start | stop
-------+------
0 | 0
(1 row)
My conclusion is that we should use setof int4range[] as the return value for regexp_positions().
New patch attached.
The composite range type and helper functions are of course not at all necessary,
but I think they would be a nice addition, to make it easier to work with ranges
for composite types. I intentionally didn't create anyrange versions of them,
since they can only support composite types,
since they don't require the inclusive/exclusive semantics.
/Joel
Attachment | Content-Type | Size |
---|---|---|
range.sql | application/octet-stream | 1.5 KB |
0003-regexp-positions.patch | application/octet-stream | 8.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | James Coleman | 2021-03-05 20:04:50 | Re: Nicer error when connecting to standby with hot_standby=off |
Previous Message | Jacob Champion | 2021-03-05 19:11:12 | Re: PROXY protocol support |