Re: Better locale-specific-character-class handling for regexps

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Bruno Wolff III <bruno(at)wolff(dot)to>
Subject: Re: Better locale-specific-character-class handling for regexps
Date: 2016-09-05 16:10:31
Message-ID: 12356.1473091831@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> On 09/04/2016 08:44 PM, Tom Lane wrote:
>> I guess I could follow the lead of collate.linux.utf8.sql and produce
>> a test that's only promised to pass on one platform with one encoding,
>> but I'm not terribly excited by that. AFAIK that test file does not
>> get run at all in the buildfarm or in the wild.

> I'm not too worried if the tests don't get run regularly, but I don't
> like the idea that it only works on one platform.

Well, it would work on any platform that reports high Unicode letters
as letters. The problem for putting this into the regular regression
tests is that the generic tests don't even assume UTF8 encoding, let
alone a Unicode-ish locale.

> Since we're now de facto maintainers of this regexp library, and our
> version could be used somewhere else than PostgreSQL too, it would
> actually be nice to have a regression suite that's independent from the
> pg_regress infrastructure, and wouldn't need a server to run.

If anyone ever really picks up the challenge of making the regexp library
a standalone project, I think one of the first orders of business would be
to pull out the Tcl project's regexp-related regression tests. There's a
pretty extensive set of tests written by Henry Spencer himself, and more
that they added over the years; it's far more comprehensive than our
tests. (I've looked at stealing that test set in toto, but it requires
some debug APIs that we don't expose in SQL, and probably don't want to.)

In any case, this is getting very far afield from the current patch.
I'm willing to add a regexp.linux.utf8.sql test file if you think it's
important to have some canned tests that exercise this new code, but
otherwise I don't see any near-term solution.

regards, tom lane
