Re: regexp_replace to remove sql comments

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mike <mike(at)wolman(dot)co(dot)uk>
Cc: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: regexp_replace to remove sql comments
Date: 2015-10-28 23:17:47
Message-ID: 4747.1446074267@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Mike <mike(at)wolman(dot)co(dot)uk> writes:
> Thanks with a bit of moving stuff about I think thats sorted it - in
> case anyone every needs it:
> SELECT
> query,
> trim(regexp_replace(
> regexp_replace(
> regexp_replace(query,'\/\*.+\*\/','','g'),
> '--[^\r\n]*', ' ', 'g')
> , '\s+', ' ', 'g')) as q
> FROM public.pg_stat_statements
> WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
> current_database())

This doesn't look too reliable from here:

1. Doesn't handle multiline /* comments.

2. Does wrong thing if more than one /* comment appears on one line.
(You could improve that by using .*? instead of .+, but then it'd
do the wrong thing with nested /* comments.)

3. Breaks things if either -- or /* appear inside a string literal,
double-quoted identifier, or $$ literal.

I'm not at all sure that it's possible to handle this requirement 100%
correctly with regexes; they're unable to do context-sensitive processing.

But so far as pg_stat_statements is concerned, why would you need to
do this at all? The duplicate-query elimination it does should be
insensitive to comments already.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2015-10-28 23:20:12 Re: Postgresql Installation -- Red Hat vs OpenSUSE vs Ubuntu
Previous Message Mike 2015-10-28 22:58:34 Re: regexp_replace to remove sql comments