Re: regexp_match() returning text

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: emre(at)hasegeli(dot)com
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: regexp_match() returning text
Date: 2016-05-30 17:39:18
Message-ID: CAKFQuwZ7buEj2YM4AUu_CJ9OZf9FuMCdHK_mfW-9TrV5Vp6Q2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, May 29, 2016 at 1:28 PM, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:

> Attached patch adds regexp_match() function which is a simple variant of
> regexp_matches() that doesn't return a set. It is based on Tom Lane's
> comment to bug #10889 [1].
>
> [1] https://www.postgresql.org/message-id/23769.1404747766@sss.pgh.pa.us

Not sure if we need two functions or what but I dislike that:

"It ignores the parenthesized subexpressions in the pattern."

​The main problem being solved is the use of a SETOF result. I'm inclined
to prefer that the final, single, result is still an array.

Even if we return a single text value instead of an array it would be nice
to return the first (only...) parenthesized subexpression so that the input
pattern isn't constrained.

​I've got a style issue with the information_schema - I like to call it
useless-use-of-E'' - but that was there long before this patch...

/* user mustn't specify 'g' for regexp_split */ - do we add "or
regexp_match" or just removed the extraneous detail?

There seems to be scope creep regarding "regexp_split_to_table" that I'm
surprised to find. Related to that is the unexpected removal of the
"force_glob" parameter to setup_regexp_matches. You took what was a single
block of code and now duplicated it without any explanation in the commit
message (a code comment wouldn't work for this kind of change). The change
to flags from passing a pointer to text to passing in a pointer to a
previously derived pg_re_flags makes more sense on its face, and it is
apparently a non-public API, but again constitutes a refactoring that at
least would ideally be a separate commit from the one the introduces the
new behavior.

Also, I see an assert for the "no subexpressions" policy but I have seemed
to have overlooked where the non-assert-enabled user is informed of the
prohibition.

Tom's opinion on all this will be needed - I am not a C code writer nor do
I follow the patch writing process that closely generally, but I have a
keen interest in this topic - but there seems to be a bit more here than
simply having a function identical to the existing regexp_matches function
that prohibits the global flag and only ever returns a single array per the
existing API.

I'm good with the name, regexp_match - even with the behavior being changed
to returning an array instead of text. If there was any active plans to
redo the API so that we'd return a composite type instead of an array I'd
be more inclined to reserve "match" for that change and make this one a bit
more verbose.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2016-05-30 18:01:39 Re: regexp_match() returning text
Previous Message Dean Rasheed 2016-05-30 17:16:03 Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls