
Re: Notes about fixing regexes and UTF-8 (yet again)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc:	NISHIYAMA Tomoaki <tomoakin(at)staff(dot)kanazawa-u(dot)ac(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Notes about fixing regexes and UTF-8 (yet again)
Date:	2012-02-18 23:45:10
Message-ID:	7392.1329608710@sss.pgh.pa.us
Lists:	pgsql-hackers

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Yeah, it's conceivable that we could implement something whereby
>> characters with codes above some cutoff point are handled via runtime
>> calls to iswalpha() and friends, rather than being included in the
>> statically-constructed DFA maps. The cutoff point could likely be a lot
>> less than U+FFFF, too, thereby saving storage and map build time all
>> round.
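To illustrate the hybrid scheme described above, here is a minimal C sketch. It assumes a single character class ("alpha") and an illustrative cutoff of 0x800. All names here (`CLASS_CUTOFF`, `alpha_map`, `build_alpha_map`, `is_alpha_char`) are hypothetical and are not PostgreSQL's regex code. Code points below the cutoff are precomputed into a static table when the regex is compiled. Anything at or above the cutoff is classified at match time with the standard `iswalpha()` from `<wctype.h>`, whose answers depend on the current `LC_CTYPE` locale.

```c
#include <stdbool.h>
#include <stdint.h>
#include <wctype.h>

/* Hypothetical cutoff: code points below this go into a static map built
 * at regex-compile time; the value 0x800 is illustrative only. */
#define CLASS_CUTOFF 0x800u

/* Precomputed "alpha" membership, one entry per code point below the cutoff. */
static bool alpha_map[CLASS_CUTOFF];

/* Build step: query iswalpha() once per code point below the cutoff.
 * Storage and build time scale with CLASS_CUTOFF, not with the full
 * Unicode range. */
static void build_alpha_map(void)
{
    for (uint32_t cp = 0; cp < CLASS_CUTOFF; cp++)
        alpha_map[cp] = iswalpha((wint_t) cp) != 0;
}

/* Match step: table lookup for small code points, runtime call otherwise. */
static bool is_alpha_char(uint32_t cp)
{
    if (cp < CLASS_CUTOFF)
        return alpha_map[cp];
    return iswalpha((wint_t) cp) != 0;
}
```

Call `build_alpha_map()` once, after `setlocale()` if a non-default locale is wanted, and then use `is_alpha_char()` per input character. The trade-off is that rare high code points pay a function call at match time instead of inflating every compiled map.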

> It's been proposed to build a regexp type in PostgreSQL which would
> store the DFA directly and provide some way to run that DFA out of its
> storage without recompiling.

> Would such a mechanism be useful here?

No, this is about what goes into the DFA representation in the first
place, not about how we store it and reuse it.

regards, tom lane

In response to

Re: Notes about fixing regexes and UTF-8 (yet again) at 2012-02-18 23:01:37 from Dimitri Fontaine
