From: | "Rui Martins" <Rui(dot)Martins(at)PDMFC(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #4044: Incorrect RegExp substring Output |
Date: | 2008-03-20 15:21:49 |
Message-ID: | 3830.B1UHWUVdEF8=.1206026509.squirrel@www.pdmfc.com |
Lists: | pgsql-bugs |
Hi Tom
> "Rui Martins" <Rui(dot)Martins(at)PDMFC(dot)com> writes:
>> My reasoning is:
>> Why would the exact same sub-expression, return different results when
>> either preceded or followed by something.
>
> It *isn't* returning different results; you are testing for different
> things in these two cases, namely whether there is a match to the whole
> pattern or just a parenthesized subpattern. In none of these examples
> was there any match to '(something)' --- there couldn't possibly be,
> because "something" isn't in the data string.
>
> regards, tom lane
That's one way to look at it. That's why I mentioned the possibility of
different assumptions regarding the context of the word "match".
In fact, you are saying that the sub-expression did not "match" because
there wasn't "something" in the string to be matched!
I agree with you on this last part:
"there wasn't "something" in the string to be matched".
But the sub-expression did "match"!
I say this because the empty string is a valid "match" for
"(something)?", since the "?" (question mark) operator is defined as "a
sequence of 0 or 1 matches of the atom".
So we are probably just discussing semantics here!
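The two readings of "match" can be sketched in another regex engine. The following is a minimal illustration, assuming Python's `re` module; the `describe` helper and its labels are hypothetical names introduced here, and the outcome says nothing about what PostgreSQL itself returns.

```python
import re

def describe(pattern, text):
    """Classify what group 1 reports when `pattern` is searched in `text`."""
    m = re.search(pattern, text)
    if m is None:
        return "no match"                      # the whole pattern failed
    if m.group(1) is None:
        return "group did not participate"     # pattern matched, group unused
    if m.group(1) == "":
        return "group matched empty string"    # group itself matched ''
    return "group matched text"

results = {
    "anchored optional group": describe(r"^(something)?$", "TEST"),
    "optional group": describe(r"(something)?", "TEST"),
    "group with empty alternative": describe(r"(something|)", "TEST"),
}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

In this engine the whole pattern '(something)?' matches the empty string, while the parenthesized part is reported as not having participated; a group that is meant to match the empty string can be written with an explicit empty alternative.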
My concern is that many people will make the same arguably "valid"
assumptions that I do.
And if they get NULL instead of an EMPTY string, it will be awkward;
besides, they will not be able to distinguish between an EMPTY "match" and
NO "match" at all, since both will return NULL according to your definition.
But what I find odd is that you say that I'm testing different things. So
what would you say about the following case?
'(something)?'
NOTE: I only removed the anchors.
Now, is this a full string match or a sub-expression match?
We can't give a concrete answer unless we know the concrete string to be
matched.
SELECT '' ~ '(something)?'
This will be a FULL match
SELECT 'TEST' ~ '(something)?'
But this one won't! It will be a sub-expression match by your definition.
So using the EXACT same REG_EXP, we will have two different contexts,
depending on the input!
The regexp context MUST NOT depend on the string to be matched,
because if it does, then this is VERY BAD for consistency.
Do you get my point now?
Now try this:
SELECT SUBSTRING( '', '(something)?' )
SELECT SUBSTRING( 'TEST', '(something)?' )
Oddly enough, this currently returns the correct answer for both queries!
And by correct I mean an EMPTY string!
According to your assumption, the first would return an EMPTY string, but
the second would return NULL!
You should try this with other reg_exp implementations, and see what comes
up as the sub-expression result.
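As one such comparison, here is a minimal sketch assuming Python's `re` module as the "other implementation". The `substring` helper is a hypothetical stand-in that returns the first parenthesized group when the pattern has one and the whole match otherwise; it is not PostgreSQL's function, and its results describe Python's engine only.

```python
import re

def substring(text, pattern):
    """Hypothetical helper: group 1 if the pattern has groups,
    otherwise the whole match; None when nothing matched at all."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(1) if m.re.groups > 0 else m.group(0)

# The two inputs from the queries above, with the same pattern.
for text in ("", "TEST"):
    whole = re.search(r"(something)?", text).group(0)
    group = substring(text, r"(something)?")
    print(repr(text), "-> whole match:", repr(whole), "| group 1:", repr(group))
```

With this engine both inputs give the same sub-expression result, so the answer does not depend on the input string; what differs between engines is whether an unused optional group is reported as empty or as absent.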
If after this explanation I still haven't managed to convey the
problem to you, then it's probably due to my inability to explain it better,
or my not-so-good English, since it's not my native language.
I hope you understand this now, since I don't know how to explain it better.
Thank you for your feedback.
Best Regards
Rui Martins