Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, David Fetter <david(at)fetter(dot)org>
Subject: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Date: 2021-03-08 16:20:14
Message-ID: 9BFE80D2-E2C7-4A8D-BE69-B803928C136B@enterprisedb.com
Lists: pgsql-hackers

> On Mar 5, 2021, at 11:46 AM, Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
>
> /Joel
> <range.sql><0003-regexp-positions.patch>

I did a bit more testing:

+SELECT regexp_positions('foobarbequebaz', 'b', 'g');
+ regexp_positions
+------------------
+ {"[3,5)"}
+ {"[6,8)"}
+ {"[11,13)"}
+(3 rows)
+

I understand that these ranges are intended to be read as one-character matches starting at positions 3, 6, and 11, but they look like they match either two or three characters, depending on how you read them, and users will likely be confused by that.
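
To make the ambiguity concrete, here is a small sketch using only the stock range functions and operators (it does not depend on the patch, and the column aliases are mine). Read with ordinary int4range semantics, the first range above covers two integers, not one:

```sql
-- Standard int4range semantics: '[3,5)' contains 3 and 4, but not 5.
SELECT r,
       lower(r)            AS lo,          -- inclusive lower bound
       upper(r)            AS hi,          -- exclusive upper bound
       upper(r) - lower(r) AS width,       -- number of integers covered
       r @> 4              AS contains_4,
       r @> 5              AS contains_5
FROM (VALUES ('[3,5)'::int4range)) AS t(r);
```

A reader applying the usual half-open interpretation would therefore see a two-character match, and one who treats the upper bound as inclusive would see three.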

+SELECT regexp_positions('foobarbequebaz', '(?=beque)', 'g');
+ regexp_positions
+------------------
+ {"[6,7)"}
+(1 row)
+

This is a zero-length match. As above, it might be confusing that a zero-length match reads this way.

+SELECT regexp_positions('foobarbequebaz', '(?<=z)', 'g');
+ regexp_positions
+------------------
+ {"[14,15)"}
+(1 row)
+

Same here, except this time position 15 is referenced, which is beyond the end of the string.

I think a zero-length match at the end of this string should read as {"[14,14)"}. You have been forced to add one to keep that from collapsing down to "empty", but I'd prefer that you find a different datatype rather than abuse the definition of int4range.
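
For anyone who wants to see the collapse directly, this is a minimal sketch with the built-in range constructor and functions, independent of the patch (the aliases are only illustrative):

```sql
-- A half-open range whose bounds are equal is normalized to 'empty',
-- so the position information (14) cannot be recovered from it.
SELECT int4range(14, 14)            AS r,         -- displays as: empty
       isempty(int4range(14, 14))   AS is_empty,  -- true
       lower(int4range(14, 14))     AS lo,        -- NULL, the 14 is gone
       '[14,14)'::int4range = 'empty'::int4range AS same_as_empty;  -- true
```

Every zero-length match would end up as the same "empty" value regardless of where it occurred, which is presumably why the patch widens the range by one instead.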

It seems that you may have reached a similar conclusion down-thread?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
