From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Gilles Darold <gilles(at)darold(dot)net> |
Cc: | Gilles Darold <gillesdarold(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace |
Date: | 2021-08-01 17:23:13 |
Message-ID: | 1456640.1627838593@sss.pgh.pa.us |
Lists: | pgsql-hackers |

I've been working through this patch, and trying to verify
compatibility against Oracle and DB2, and I see some points that need
discussion or at least recording for the archives.

* In Oracle, while the documentation for regexp_instr says that
return_option should only be 0 or 1, experimentation with sqlfiddle
shows that any nonzero value is silently treated as 1. The patch
raises an error for other values, which I think is a good idea.
(IBM's docs say that DB2 raises an error too, though I can't test
that.) We don't need to be bug-compatible to that extent.
* What should happen when the subexpression/capture group number of
regexp_instr or regexp_substr exceeds the number of parenthesized
subexpressions of the regexp? Oracle silently returns a no-match
result (0 or NULL), as does this patch. However, IBM's docs say
that DB2 raises an error. I'm inclined to think that this is
likewise taking bug-compatibility too far, and that we should
raise an error like DB2. There are clearly cases where throwing
an error would help debug a faulty call, while I'm less clear on
a use-case where not throwing an error would be useful.
* IBM's docs say that both regexp_count and regexp_like have
arguments "string, pattern [, start] [, flags]" --- that is,
each of start and flags can be independently specified or omitted.
The patch follows Oracle, which has no start option for
regexp_like, and where you can't write flags for regexp_count
without writing start. This is fine by me, because doing these
like DB2 would introduce the same which-argument-is-this issues
as we're being forced to cope with for regexp_replace. I don't
think we need to accept ambiguity in these cases too. But it's
worth memorializing this decision in the thread.
* The patch has most of these functions silently ignoring the 'g'
flag, but I think they should raise errors instead. Oracle doesn't
accept a 'g' flag for these, so why should we? The only case where
that logic doesn't hold is regexp_replace, because depending on which
syntax you use the 'g' flag might or might not be meaningful. So
for regexp_replace, I'd vote for silently ignoring 'g' if the
occurrence-number parameter is given, while honoring it if not.
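
To make the 'g' flag rule in the last item concrete, here is a minimal sketch in Python (not the patch's C code). The helper name `check_flags`, its parameters, and the choice of `ValueError` are assumptions made purely for illustration: every function except regexp_replace rejects 'g', while regexp_replace ignores it when an occurrence number is given and honors it otherwise.

```python
def check_flags(func_name: str, flags: str, occurrence_given: bool = False) -> str:
    """Apply the proposed 'g' flag policy and return the flags to use.

    Hypothetical helper: name, signature, and error type are assumptions
    for illustration, not code taken from the patch.
    """
    if "g" not in flags:
        # Nothing special to do; other flags pass through unchanged.
        return flags
    if func_name != "regexp_replace":
        # Proposed rule: these functions raise an error instead of
        # silently ignoring 'g'.
        raise ValueError(f"{func_name}() does not support the 'g' flag")
    if occurrence_given:
        # Occurrence-number syntax of regexp_replace: 'g' is silently dropped.
        return flags.replace("g", "")
    # Traditional regexp_replace syntax: 'g' keeps its meaning.
    return flags
```

Under these assumptions, `check_flags("regexp_replace", "gi", occurrence_given=True)` drops the 'g' and keeps 'i', whereas `check_flags("regexp_count", "ig")` raises.
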

I've already made changes in my local copy per the last item,
but I've not done anything about throwing errors for out-of-range
subexpression numbers. Anybody have an opinion about that one?
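
For reference, the two candidate behaviors for an out-of-range subexpression number can be sketched as follows. This is only an illustration in Python, whose `re` syntax differs from PostgreSQL's regular expressions; the function name `regexp_substr_sketch` and its `strict` parameter are hypothetical and do not appear in the patch.

```python
import re
from typing import Optional


def regexp_substr_sketch(string: str, pattern: str, subexpr: int = 0,
                         strict: bool = True) -> Optional[str]:
    """Return the text of capture group `subexpr` in the first match, or None.

    strict=True models the DB2-style behavior (raise an error);
    strict=False models the Oracle-style behavior (silent no-match).
    Hypothetical helper for illustration only.
    """
    compiled = re.compile(pattern)
    if subexpr < 0:
        raise ValueError("subexpression number must not be negative")
    if subexpr > compiled.groups:
        # The pattern has fewer parenthesized subexpressions than requested.
        if strict:
            raise ValueError(
                f"subexpression {subexpr} is out of range: "
                f"pattern has {compiled.groups} parenthesized subexpression(s)"
            )
        return None  # lenient: indistinguishable from an ordinary no-match
    match = compiled.search(string)
    return match.group(subexpr) if match else None
```

In the lenient mode a typo in the subexpression number looks exactly like a failed match, which is the debugging hazard that argues for raising an error.
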

regards, tom lane