Re: add function argument names to regex* functions.

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Dian Fay <di(at)nmfay(dot)com>
Cc: Jim Nasby <jim(dot)nasby(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: add function argument names to regex* functions.
Date: 2024-01-08 14:26:34
Message-ID: CACJufxHTw22kOmmDBpubmju9bfFhonEekwf0s6LGvysbnWEg9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di(at)nmfay(dot)com> wrote:
>
> On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
> > On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim(dot)nasby(at)gmail(dot)com> wrote:
> > >
> > > On 1/3/24 5:05 PM, Dian Fay wrote:
> > >
> > > Another possibility is `index`, which is relatively short and not a
> > > reserved keyword ^1. `position` is not as precise but would avoid the
> > > conceptual overloading of ordinary indices.
> > >
> > > I'm not a fan of "index" since that leaves the question of
> > > whether it's 0 or 1 based. "Position" is a bit better, but I think
> > > Jian's suggestion of "occurance" is best.
> > >
> > > We do have precedent for one-based `index` in Postgres: array types are
> > > 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> > > and easy to misspell (I looked it up after typing it just now and it
> > > _still_ feels off).
> > >
> > > How's "instance"?
> > >
> > > Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so presumably misspelling wouldn't be a big issue. But I think "instance" is OK as well.
> > >
> > > --
> > > Jim Nasby, Data Architect, Austin TX
> >
> > regexp_instr: It has the syntax regexp_instr(string, pattern [, start
> > [, N [, endoption [, flags [, subexpr ]]]]])
> > oracle:
> > REGEXP_INSTR (source_char, pattern, [, position [, occurrence [,
> > return_opt [, match_param [, subexpr ]]]]] )
> >
> > "string" and "source_char" are almost the same descriptive, so maybe
> > there is no need to change.
> > "start" is better than "position", imho.
> > "return_opt" is better than "endoption", (maybe we need change, for
> > now I didn't)
> > "flags" cannot be changed to "match_param", given it quite everywhere
> > in functions-matching.html.
> >
> > similarly for function regexp_replace, oracle using "repplace_string",
> > we use "replacement"(mentioned in the doc).
> > so I don't think we need to change to "repplace_string".
> >
> > Based on how people google[0], I think `occurrence` is ok, even though
> > it's verbose.
> > to change from `N` to `occurrence`, we also need to change the doc,
> > that is why this patch is more larger.
> >
> >
> > [0]: https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8
>
> The `regexp_replace` summary in table 9.10 is mismatched and still
> specifies the first parameter name as `string` instead of `source`.
> Since all the other functions use `string`, should `regexp_replace` do
> the same or is this a case where an established "standard" diverges?
>

got it. Thanks for pointing it out.

in functions-matching.html
if I change <replaceable>source</replaceable> to
<replaceable>string</replaceable> then
there are no markup "string" and markup "string", it's kind of
slightly confusing.

So does the following refactored description of regexp_replace make sense:

The <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
match, the <replaceable>string</replaceable> is returned with the
<replaceable>replacement</replaceable> string substituted for the matching
substring. The <replaceable>replacement</replaceable> string can contain
<literal>\</literal><replaceable>n</replaceable>, where
<replaceable>n</replaceable> is 1
through 9, to indicate that the source substring matching the
<replaceable>n</replaceable>'th parenthesized subexpression of
the pattern should be
inserted, and it can contain <literal>\&amp;</literal> to indicate that the
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in
the replacement
text.

> I noticed the original documentation for some of these functions is
> rather disorganized; summaries explain `occurrence` without explaining
> the prior `start` parameter, and detailed documentation in 9.7 is
> usually a single paragraph per function running pell-mell through ifs
> and buts without section headings, so entries in table 9.10 have to
> reference the entire section 9.7.3 instead of their specific functions.
> It's out of scope here, but should I bring this up on pgsql-docs?

I got it.
in Table 9.10. Other String Functions and Operators, if we can
reference the specific function would be great.
As for now, in the browser, you need to use Ctrl+F to find the
detailed explanation in 9.7.3.
you can just bring your suggested or patch to pgsql-hackers(at)postgresql(dot)org(dot)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-01-08 14:54:00 Re: SQL:2011 application time
Previous Message Aleksander Alekseev 2024-01-08 14:04:19 Re: Escape output of pg_amcheck test