Regex with > 32k different chars causes a backend crash

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Regex with > 32k different chars causes a backend crash
Date: 2013-04-03 15:11:28
Message-ID: 515C46A0.3090002@vmware.com
Lists: pgsql-hackers

While playing with Alexander's pg_trgm regexp patch, I noticed that the
regexp library trips an assertion (if assertions are enabled) or crashes
when passed a pattern that contains more than 32k different characters:

select 'foo' ~ (select string_agg(chr(x), '')
                from generate_series(100, 35000) x) as nastyregex;

This is because the library uses 'short' as the datatype for color
numbers. When the number of colors exceeds 32767, the counter overflows,
-32768 is used as an index into the colordesc array, and the backend
crashes. AFAICS this can't reliably be exploited for anything more
sinister than crashing the backend.

A regex with that many different colors is an extreme case, so I think
it's enough to turn the assertion in newcolor() into a run-time check
and throw a "too many colors in regexp" error. Alternatively, we could
widen 'color' from short to int, but that would double the memory usage
of sane regexps with fewer distinct characters.

Thoughts?

- Heikki
