From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jon Jensen <jon(at)endpoint(dot)com> |
Cc: | Neil Conway <neilc(at)samurai(dot)com>, wade <wade(at)wavefire(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POSIX regex performance bug in 7.3 Vs. 7.2 |
Date: | 2003-02-04 18:21:31 |
Message-ID: | 15575.1044382891@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Jon Jensen <jon(at)endpoint(dot)com> writes:
> It would be a delight to be able to use more advanced (IMHO) Perl-
> compatible regexes in PostgreSQL.
After some further research, pcre does seem like an interesting
alternative. Both pcre and Spencer's new code have essentially
Berkeley-style licenses, so there's no problem there. Some
relevant comparisons:
1. pcre tries to be exactly compatible with Perl, so details of its
regex flavor will be familiar to many more people than the Tcl flavor
(by and large the features are similar, but there are differences).
2. pcre is already distributed as a nice tidy library; we need not
extract code from the Tcl distribution.
3. pcre is actively maintained (although tracking a new release every
couple of months may not be something we really want to do, anyway).
AFAICT Henry's not doing anything much with his code, so it'd be
pretty much take-once-and-maintain-for-ourselves.
4. pcre looks like it's probably *not* as well suited to a multibyte
environment. In particular, I doubt that its UTF8 compile option was
even turned on for the performance comparison Neil cited --- and the man
page only promises "experimental, incomplete support for UTF-8 encoded
strings". The Tcl code by contrast is used only in a multibyte
environment, so that's the supported, optimized path. It doesn't even
assume null-terminated strings (yay).
5. As best I can tell so far, neither code is currently set up for
run-time choice of encoding; we'd have to do some work for that in
either case. (This probably means that tracking pcre update releases
would be problematic anyhow.)
6. According to Friedl's book, the Tcl engine (Spencer's new code)
is way faster than Perl's, and so presumably faster than pcre, though
I can't find any specific measurements of pcre in the book. It uses a
hybrid DFA/NFA approach which Friedl considers state of the art.
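To illustrate why point 4's remark about null-terminated strings matters: PostgreSQL text values carry an explicit length and need not end in a NUL byte, while the standard POSIX regexec() interface takes a NUL-terminated C string. The sketch below is a hypothetical helper (match_counted is not part of PostgreSQL or any library), built only on POSIX <regex.h>, and assumes a single-byte encoding. It shows the extra allocation and copy a caller has to make on every match; an engine that accepts a pointer plus a length, as point 4 says the Tcl code does, can skip that step.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): match a length-counted
 * buffer, such as a text value that is not NUL-terminated, against a
 * POSIX extended regex. regexec() needs a NUL-terminated string, so
 * the bytes are copied into a temporary buffer first.
 * Returns 1 on match, 0 on no match, -1 on error.
 */
static int match_counted(const char *data, size_t len, const char *pattern)
{
    regex_t re;
    char *buf;
    int rc;

    /* REG_NOSUB: we only want a yes/no answer, not submatch offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* The extra allocation and copy exist only to supply a terminator. */
    buf = malloc(len + 1);
    if (buf == NULL) {
        regfree(&re);
        return -1;
    }
    memcpy(buf, data, len);
    buf[len] = '\0';

    rc = regexec(&re, buf, 0, NULL, 0);

    free(buf);
    regfree(&re);

    if (rc == 0)
        return 1;
    if (rc == REG_NOMATCH)
        return 0;
    return -1;
}
```

For example, with the buffer "hello world" and a length of 5, the pattern ^hel+o$ matches while wor does not, because only the counted bytes are copied and searched.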
Strict Perl compatibility would be a nice feature, but right at the
moment the multibyte issue is looking like the determining factor.
If we don't get a multibyte-optimized engine out of this change, we're
wasting our time.
regards, tom lane