From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | dansonlinepresence(at)gmail(dot)com, pgsql-docs(at)lists(dot)postgresql(dot)org |
Subject: | Re: substring start position behavior |
Date: | 2024-03-06 05:42:09 |
Message-ID: | 72911.1709703729@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs |
Bruce Momjian <bruce(at)momjian(dot)us> writes:
> This web page explains the feature:
> https://stackoverflow.com/questions/33462061/sql-server-substring-position-negative-value
> but also asks:
> now the only question that remains is, "why would anyone need it
> to behave this way?"
Yeah. I believe our implementation adheres to the SQL spec, which
says this for <character substring function> (in SQL:2021 6.3.2):
a) If the character encoding form of <character value expression>
is UTF8, UTF16, or UTF32, then, in the remainder of this General
Rule, the term “character” shall be taken to mean “unit specified
by <char length units>”.
b) Let C be the value of the <character value expression>, let LC
be the length in characters of C, and let S be the value of the
<start position>.
c) If <string length> is specified, then let L be the value of
<string length> and let E be S+L. Otherwise, let E be the larger
of LC+1 and S.
d) If at least one of C, S, and L is the null value, then the
result of the <character substring function> is the null value.
e) If E is less than S, then an exception condition is raised:
data exception — substring error (22011). [tgl note: given c),
this happens if and only if a negative <string length> is provided.]
f) Case:
i) If S is greater than LC or if E is less than 1 (one), then
the result of the <character substring function> is the
zero-length character string.
ii) Otherwise,
1) Let S1 be the larger of S and 1 (one). Let E1 be the
smaller of E and LC+1. Let L1 be E1–S1.
2) The result of the <character substring function> is a
character string containing the L1 characters of C starting at
character number S1 in the same order that the characters
appear in C.
That's a pretty sterling example of standards-ese that is both
unreadable and devoid of any justification. But if you trace through
the possible effects of a negative S value, it looks like
(1) if L >= 0 is specified and S+L (E) is less than one, the result
is an empty string per rule f)i).
(2) if L >= 0 is specified and S+L (E) is at least one but less than
LC+1, then E is the substring end+1 position.
(3) otherwise, a negative S is disregarded and replaced by 1 so
far as the substring end calculation is concerned.
(4) in any case, a negative S is disregarded and replaced by 1 so
far as the substring start calculation is concerned.
I'm kind of inclined to not document this weirdness. I especially
don't think it's worth giving an example that neither explains the
"disregarded" bit nor highlights the dependency on L being given.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | PG Doc comments form | 2024-03-06 23:42:15 | Non-blocking synchronization in libpq using pipeline mode |
Previous Message | Bruce Momjian | 2024-03-06 03:24:47 | Re: substring start position behavior |