Re: regexp_positions()

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: regexp_positions()
Date: 2021-02-28 11:15:51
Message-ID: de1d7b6e-537e-4407-a777-0d6f674b0677@www.fastmail.com
Lists: pgsql-hackers

I had a bug in the function, and I see I also accidentally renamed it to regexp_ranges().

Attached is a fixed version of the PoC.

This function is useful, for example, when we're interested in patterns in metadata:
we're not matching against the data itself,
but against a string in which each character corresponds to an element
of an array containing the actual data.

In such cases, we need to know the positions of the matches,
since they tell us which array elements matched.

Take the UNIX diff tool we all know as an example.

Let's say we have all the raw diff lines stored in a text[] array,
and we want to produce a unified diff by finding hunks,
each consisting of a run of changed lines with up to 3 unchanged
lines of context before and after it.

Suppose we produce a text string containing one character per diff line,
using "=" for unchanged, "+" for addition, and "-" for deletion.

Example: =====-=======+=====-+======
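
To make the encoding step concrete, here is a minimal Python sketch of how such a marker string could be built. It is only an illustration: the helper name marker_string and the assumption that each raw diff line starts with "+", "-" or something else (treated as unchanged) are hypothetical and not taken from the attached PoC.

```python
def marker_string(diff_lines):
    """Map each diff line to a single marker character."""
    markers = []
    for line in diff_lines:
        prefix = line[:1]
        # Assumption: a leading '+' or '-' marks an addition or deletion;
        # any other line (including an empty one) counts as unchanged.
        markers.append(prefix if prefix in ("+", "-") else "=")
    return "".join(markers)


# Hypothetical sample input: two unchanged lines, one deletion,
# one addition, one unchanged line.
sample = [" a", " b", "-c", "+d", " e"]
print(marker_string(sample))  # ==-+=
```

Character i of the result describes element i of the input, which is what lets match positions in the string be used as array indexes later on.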

We could then find the hunks using this regex:

(={0,3}[-+]+={0,3})+

using regexp_positions() to find the start and end positions for each hunk:

SELECT * FROM regexp_positions('=====-=======+=====-+======','(={0,3}[-+]+={0,3})+');
 start_pos | end_pos
-----------+---------
         3 |       9
        11 |      24
(2 rows)
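
As a rough illustration of how these positions map back to the array, the following Python sketch mimics the call above and slices out the hunks. It is a stand-in, not the attached PoC: it assumes the positions are 1-based and inclusive (which is consistent with the rows shown), and it uses Python's re module rather than PostgreSQL's regex engine, so results could differ for other patterns. The diff_lines list is a hypothetical placeholder for the text[] array.

```python
import re


def regexp_positions(string, pattern):
    """Return a 1-based, inclusive (start_pos, end_pos) pair per match."""
    # m.start() is 0-based, so add 1; m.end() is 0-based and exclusive,
    # which is numerically the same as a 1-based inclusive end.
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, string)]


markers = "=====-=======+=====-+======"
positions = regexp_positions(markers, r"(={0,3}[-+]+={0,3})+")
print(positions)  # [(3, 9), (11, 24)]

# Hypothetical stand-in for the text[] array: one entry per marker.
diff_lines = [f"line {i}" for i in range(1, len(markers) + 1)]

# Convert each 1-based inclusive range into a Python slice.
hunks = [diff_lines[start - 1:end] for start, end in positions]
for (start, end), hunk in zip(positions, hunks):
    print(start, end, len(hunk))
```

In SQL the same slicing would presumably use array slice syntax on the text[] column, along the lines of diff_lines[start_pos:end_pos], where diff_lines is again a hypothetical name.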

/Joel

Attachment: regexp_positions.sql (application/octet-stream, 547 bytes)
