pgsql: Allow complemented character class escapes within regex brackets

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Allow complemented character class escapes within regex brackets
Date: 2021-02-25 18:29:31
Message-ID: E1lFLOF-0005DU-Fq@gemulon.postgresql.org
Lists: pgsql-committers

Allow complemented character class escapes within regex brackets.

The complement-class escapes \D, \S, \W are now allowed within
bracket expressions. There is no semantic difficulty with doing
that, but the rather hokey macro-expansion-based implementation
previously used here couldn't cope.
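
As a rough illustration of the intended semantics (a complemented class
inside brackets simply joins the set union with the other bracket
members), here is a minimal sketch using Python's "re" module, which
already accepts these escapes inside brackets. It only demonstrates the
meaning of such patterns; it is not PostgreSQL's regex engine, and the
helper name matches() is hypothetical.

```python
import re


def matches(pattern, text):
    """Return every single-character match of pattern in text."""
    return re.findall(pattern, text)


# [\D5] is the union of "any non-digit" and the literal "5".
non_digit_or_5 = matches(r"[\D5]", "a15b2")

# [\W_] is the union of "any non-word character" and the underscore.
non_word_or_underscore = matches(r"[\W_]", "a_1-")

print(non_digit_or_5)          # ['a', '5', 'b']
print(non_word_or_underscore)  # ['_', '-']
```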

Also, invent "word" as an allowed character class name, thus "\w"
is now equivalent to "[[:word:]]" outside brackets, or "[:word:]"
within brackets. POSIX allows such implementation-specific
extensions, and the same name is used in e.g. bash.

One surprising compatibility issue this raises is that constructs
such as "[\w-_]" are now disallowed, as our documentation has always
said they should be: character classes can't be endpoints of a range.
Previously, because \w was just a macro for "[:alnum:]_", such a
construct was read as "[[:alnum:]_-_]", so it was accepted so long as
the character after "-" was numerically greater than or equal to "_".
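
The old behavior can be sketched as plain text substitution followed by
a code-point comparison on the resulting range. This is an illustrative
model only, assuming ASCII endpoints; old_macro_expand() and
old_range_accepted() are hypothetical names, not functions from the
PostgreSQL source.

```python
def old_macro_expand(bracket_expr):
    """Model the former textual expansion of \\w inside brackets."""
    return bracket_expr.replace(r"\w", "[:alnum:]_")


def old_range_accepted(end_char):
    """After expansion, "[\\w-X]" contained the range "_-X".

    A range is valid only if its end is not below its start, so X had
    to have a code point >= that of "_" (0x5F).
    """
    return ord(end_char) >= ord("_")


print(old_macro_expand(r"[\w-_]"))  # [[:alnum:]_-_]
print(old_range_accepted("_"))      # True:  "_-_" is a one-character range
print(old_range_accepted("a"))      # True:  "a" (0x61) sorts after "_"
print(old_range_accepted("A"))      # False: "A" (0x41) sorts before "_"
```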

Some implementation cleanup along the way:

* Remove the lexnest() hack, and in consequence clean up wordchrs()
to not interact with the lexer.

* Fix colorcomplement() to not be O(N^2) in the number of colors
involved.

* Get rid of useless-as-far-as-I-can-see calls of element()
on single-character character element names in brackpart().
element() always maps these to the character itself, and things
would be quite broken if it didn't --- should "[a]" match something
different than "a" does? Besides, the shortcut path in brackpart()
wasn't doing this anyway, making it even more inconsistent.

Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/2a0af7fe460eb46f9af996075972bf7c2e3f211d

Modified Files
--------------
doc/src/sgml/func.sgml                         |  25 +-
src/backend/regex/re_syntax.n                  |  13 +-
src/backend/regex/regc_color.c                 |  34 ++-
src/backend/regex/regc_lex.c                   | 166 ++----------
src/backend/regex/regc_locale.c                |  97 +++----
src/backend/regex/regc_pg_locale.c             |   9 +
src/backend/regex/regcomp.c                    | 285 +++++++++++++++++----
src/include/regex/regguts.h                    |  20 +-
.../modules/test_regex/expected/test_regex.out | 250 ++++++++++++++++++
src/test/modules/test_regex/sql/test_regex.sql |  44 ++++
10 files changed, 672 insertions(+), 271 deletions(-)
