From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: regexp_positions() |
Date: | 2021-02-28 11:15:51 |
Message-ID: | de1d7b6e-537e-4407-a777-0d6f674b0677@www.fastmail.com |
Lists: | pgsql-hackers |
I had a bug in the function, and I see I also accidentally renamed it to regexp_ranges().
Attached is a fixed version of the PoC.
This function is useful, for example, when we're interested in patterns in metadata:
we're not matching against the data itself,
but against a string in which each character corresponds to an element
of an array containing the actual data.
In such a case, we need to know the positions of the matches,
since they tell us which array elements matched.
Let's take the UNIX diff tool we all know as an example.
Say we have all the raw diff lines stored in a text[] array,
and we want to produce a unified diff by finding hunks of changes,
each with up to 3 unchanged lines of context
before and after the changes.
Suppose we produce a text string containing one character per diff line,
using "=" for unchanged, "+" for addition, and "-" for deletion.
Example: =====-=======+=====-+======
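The one-character-per-line string described above could be built roughly as follows. This is only a sketch in Python: the helper name diff_status and the assumption that each raw diff line starts with " ", "+" or "-" are mine, not something taken from the attached PoC.

```python
def diff_status(lines):
    """Map each raw diff line to '=', '+' or '-' based on its first character."""
    out = []
    for line in lines:
        if line.startswith("+"):
            out.append("+")
        elif line.startswith("-"):
            out.append("-")
        else:
            # Assumption: anything else counts as an unchanged line.
            out.append("=")
    return "".join(out)


# Hypothetical raw diff lines, prefixed diff-style with ' ', '+' or '-'.
raw = [" a", " b", "-c", "+C", " d"]
print(diff_status(raw))  # ==-+=
```

Character i of the result describes element i of the array, which is what makes the match positions meaningful.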
We could then find the hunks using this regex:
(={0,3}[-+]+={0,3})+
using regexp_positions() to find the start and end positions for each hunk:
SELECT * FROM regexp_positions('=====-=======+=====-+======','(={0,3}[-+]+={0,3})+');
 start_pos | end_pos
-----------+---------
         3 |       9
        11 |      24
(2 rows)
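For readers who want to experiment without the PoC installed, the same idea can be sketched in Python. This is not the attached implementation: the Python function below merely borrows the name regexp_positions, and the 1-based, inclusive position convention is inferred from the rows shown above. Python's re module is a different regex engine from PostgreSQL's, so results may differ for other patterns; for this input it gives the same two ranges.

```python
import re


def regexp_positions(string, pattern):
    """Return 1-based, inclusive (start_pos, end_pos) pairs, one per match."""
    # m.start() is 0-based and m.end() is exclusive, hence the +1 on start only.
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, string)]


status = "=====-=======+=====-+======"
hunks = regexp_positions(status, r"(={0,3}[-+]+={0,3})+")
print(hunks)  # [(3, 9), (11, 24)]

# Map the positions back to the array holding the actual data.
# Hypothetical stand-in for the text[] array of raw diff lines.
lines = [f"line {i}" for i in range(1, len(status) + 1)]
for start_pos, end_pos in hunks:
    hunk_lines = lines[start_pos - 1:end_pos]
    print(start_pos, end_pos, len(hunk_lines))
```

The last loop is the point of the exercise: the positions are only useful because they can be turned straight into array slices.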
/Joel
Attachment | Content-Type | Size |
---|---|---|
regexp_positions.sql | application/octet-stream | 547 bytes |