From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Foster, Russell" <Russell(dot)Foster(at)crl(dot)com> |
Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: 9.5.3: substring: regex greedy operator not picking up chars as expected |
Date: | 2016-08-15 13:41:46 |
Message-ID: | 4424.1471268506@sss.pgh.pa.us |
Lists: | pgsql-bugs |
"Foster, Russell" <Russell(dot)Foster(at)crl(dot)com> writes:
> For the following query:
> select substring('>772' from '.*?[0-9]+')
> I would expect the output to be '>772', but it is '>7'.
As David pointed out, that's what you get because the RE as a whole is
considered to be non-greedy, i.e., you get the shortest overall match.
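To make the contrast concrete, here is a small sketch comparing the greedy and non-greedy forms of the same pattern on the reported input. It relies on the documented rule that a branch takes its overall greediness from the first quantified atom in it that has a greediness attribute. The greedy variant is added here only for comparison; it was not part of the original report.

```sql
-- First quantifier is greedy (.*), so the RE as a whole prefers the
-- longest match.
select substring('>772' from '.*[0-9]+');    -- '>772'

-- First quantifier is non-greedy (.*?), so the RE as a whole prefers the
-- shortest match, and [0-9]+ stops after a single digit.
select substring('>772' from '.*?[0-9]+');   -- '>7'
```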
However, you can adjust that by decorating the RE:
# select substring('>772' from '(.*?[0-9]+){1,1}');
substring
-----------
>772
(1 row)
Now it's longest-overall, but the .*? part is still shortest-match,
so it doesn't consume any digits. However, I suspect that still is
not quite what you want, because it consumes too much in cases like:
# select substring('>772foo444' from '(.*?[0-9]+){1,1}');
substring
------------
>772foo444
(1 row)
There's probably really no way out of that except to be less lazy about
writing the pattern:
# select substring('>772foo444' from '([^0-9]*?[0-9]+){1,1}');
substring
-----------
>772
(1 row)
and in that formulation, of course, the greediness of the [^0-9]* part
doesn't really matter, because there is only one way for it to match:
# select substring('>772foo444' from '[^0-9]*[0-9]+');
substring
-----------
>772
(1 row)
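For a side-by-side view, the patterns discussed above can be run in a single query. This is only a sketch: the aliases `t`, `pat` and `result` are made up for illustration, and the results noted in the comments are the ones reported in this message, except for the first pattern, whose result follows from the shortest-overall-match rule.

```sql
select pat, substring('>772foo444'::text from pat) as result
from (values
    ('.*?[0-9]+'),               -- non-greedy overall: '>7'
    ('(.*?[0-9]+){1,1}'),        -- greedy overall, lazy .*?: '>772foo444'
    ('([^0-9]*?[0-9]+){1,1}'),   -- leading part cannot cross digits: '>772'
    ('[^0-9]*[0-9]+')            -- plain greedy form: '>772'
) as t(pat);
```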
See
https://www.postgresql.org/docs/9.5/static/functions-matching.html#POSIX-MATCHING-RULES
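That section illustrates the same {1,1} decoration with capturing groups. The sketch below is adapted from the examples there; it assumes standard_conforming_strings is on (the default), so the backslashes need no doubling, and it uses a non-capturing (?: ) wrapper so that the decoration does not add a group of its own.

```sql
-- Non-greedy overall: the match as a whole is as short as possible.
select regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)');
-- {abc,0,""}

-- Wrapped in (?: ... ){1,1}: greedy overall, but (.*?) is still lazy,
-- so all the digits go to \d+ and the remainder to the trailing (.*).
select regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
-- {abc,01234,xyz}
```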
regards, tom lane