Re: Regex with > 32k different chars causes a backend crash

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Regex with > 32k different chars causes a backend crash
Date: 2013-04-03 17:58:30
Message-ID: 19444.1365011910@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> Attached is a patch to add the overflow check. I used the error message
> "too many distinct characters in regex". That's not totally accurate,
> because there isn't a limit on distinct characters per se, but on the
> number of colors. Conceivably, you could have a regexp with more than
> 32k different characters, but where most of them are mapped to the same
> color. In practice, it's not helpful to the user to say "too many
> colors"; he will have no clue what a color is.

Patch looks good except perhaps for wordsmithing the message text.

One thought is that we don't need to identify this as a regex error
because the PG code will report it with "invalid regular expression: %s".

I think there's a good argument for saying "too many character colors"
and relying on the "invalid regular expression" context to clue in the
clueless. After all, most of them don't know what an NFA is either, but
no one has complained about the REG_ETOOBIG message. I think if you get
to the point where you're triggering this error, you probably know
something about regexes, or even if you don't the phrase "too many" will
give you a fair idea what's wrong. There is much to be said for
specifically identifying the implementation limit that's been hit, even
if most users don't know what it is. So I'd just as soon not fall back
on something imprecise.

regards, tom lane
