
Re: regexp_positions()

From:	"Joel Jacobson" <joel(at)compiler(dot)org>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: regexp_positions()
Date:	2021-02-28 11:15:51
Message-ID:	de1d7b6e-537e-4407-a777-0d6f674b0677@www.fastmail.com
Lists:	pgsql-hackers

I had a bug in the function, and I see I also accidentally renamed it to regexp_ranges().

Attached is a fixed version of the PoC.

This function is useful, for example, when we're interested in patterns in metadata:
we're not matching against the data itself,
but against a string in which each character corresponds to an element
of an array that holds the actual data.

In such cases, we need to know the positions of the matches,
since they tell us which array elements matched.

For instance, let's take the UNIX diff tool we all know as an example.

Let's say we have all the raw diff lines stored in a text[] array,
and we want to produce a unified diff by finding hunks of changes,
each with up to 3 unchanged lines of context before and after.

We can produce a text string containing one character per diff line,
using "=" for unchanged, "+" for addition, and "-" for deletion.

Example: =====-=======+=====-+======
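As a minimal sketch of how such a marker string could be built, here is a small Python helper. It assumes that each raw diff line starts with "+" or "-" when it is an addition or deletion and that anything else counts as unchanged. That line format and the name diff_markers are assumptions made for illustration, not something taken from the attached PoC.

```python
def diff_markers(diff_lines):
    """Return one character per diff line: '+', '-' or '='.

    Assumption: a line whose first character is '+' or '-' is a change;
    every other line (including an empty one) is treated as unchanged.
    """
    markers = []
    for line in diff_lines:
        first = line[:1]  # empty string for an empty line
        markers.append(first if first in ("+", "-") else "=")
    return "".join(markers)


if __name__ == "__main__":
    raw = [" context", "-old line", "+new line", " more context"]
    print(diff_markers(raw))
```

The resulting string has the same length as the array, so character N of the string describes array element N.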

We could then find the hunks using this regex:

(={0,3}[-+]+={0,3})+

using regexp_positions() to find the start and end positions for each hunk:

SELECT * FROM regexp_positions('=====-=======+=====-+======','(={0,3}[-+]+={0,3})+');
 start_pos | end_pos
-----------+---------
         3 |       9
        11 |      24
(2 rows)
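To illustrate how the positions map back to array elements, the following Python sketch reproduces the result above and then slices a list with it. Python's re.finditer() stands in for the proposed regexp_positions(); the names hunk_positions and hunks are hypothetical, and the positions are assumed to be 1-based and inclusive, as the output above suggests. Python's regex engine is a backtracking one rather than PostgreSQL's, so other patterns could in principle match differently; for this pattern and input the positions agree.

```python
import re

# Same pattern as in the example, with a non-capturing group.
HUNK_RE = re.compile(r"(?:={0,3}[-+]+={0,3})+")


def hunk_positions(markers):
    """Return 1-based, inclusive (start_pos, end_pos) pairs, one per hunk."""
    # m.start() is 0-based and m.end() is exclusive, so only start needs +1.
    return [(m.start() + 1, m.end()) for m in HUNK_RE.finditer(markers)]


def hunks(diff_lines, markers):
    """Slice the array of diff lines using the positions found in markers."""
    return [diff_lines[start - 1:end] for start, end in hunk_positions(markers)]


if __name__ == "__main__":
    markers = "=====-=======+=====-+======"
    print(hunk_positions(markers))  # [(3, 9), (11, 24)]

    # Placeholder array with one element per marker character.
    lines = [f"line {i}" for i in range(1, len(markers) + 1)]
    for hunk in hunks(lines, markers):
        print(hunk[0], "...", hunk[-1])
```

In PostgreSQL itself the corresponding step would be an array slice such as lines[start_pos:end_pos], which is also 1-based and inclusive.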

/Joel

Attachment	Content-Type	Size
regexp_positions.sql	application/octet-stream	547 bytes

In response to

Re: regexp_positions() at 2021-02-28 03:58:05 from Joel Jacobson
