From: | Sam Mason <sam(at)samason(dot)me(dot)uk> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: string_to_array with empty input |
Date: | 2009-04-01 14:00:06 |
Message-ID: | 20090401140005.GD12225@frubble.xen.chris-lamb.co.uk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
On Tue, Mar 31, 2009 at 05:08:45PM +0100, Greg Stark wrote:
> Both interpretations are clearly consistent but it depends on whether
> you think it's a bunch of text strings concatenated together or if
> it's a list of objects.
>
> The example of string_to_array('',',')::int[] is relevant to this
> point. The whole "there's one empty element" only makes sense if
> you're thinking in terms of string processing. If it's a list of any
> other kind of object it probably doesn't make sense; you can't say
> there's one empty integer or one empty composite object or one empty
> anything else.
I think this is about the only sensible option, but my reasoning is
somewhat different.
My original intuition was that array_to_string and string_to_array
should be (perfect) inverses of each other. Unfortunately I can't see
any way to get this to happen; zero length arrays or NULL elements
in the array mean than array_to_string isn't injective. This means
that the composition of the two functions won't result in an injective
function and my original premise is false. Note that as far as I can
tell string_to_array is injective. I'm assuming that the delimiter
won't appear as part of an element of the array; e.g. an array of
integers and space as a delimiter is OK, but using the same delimiter
with unconstrained text is not OK, a blank delimiter is never OK as it's
always part of a string.
"Injective" means there exists more than one array that encodes to the
same string. The examples being how do you sanely encode '{}' and
'{NULL}' in a unique way; '{""}' is a bad example because it's just
an artifact of how strings are represented. The complications needed
to allow this to happen would make it a very similar function as the
array's normal output_function function and hence wouldn't serve a
useful purpose. All of this implies that we have to make a compromise
somewhere.
The semantics that most closely match the existing behaviour would be;
for array_to_string:
1) remove NULL values from input array
2) call output_function on remaining elements
3) intersperse[1] the delimiter between the remaining elements
4) concatenate the resulting array
for string_to_array:
1) check if input is zero length; return empty array
2) split array based on delimiter and return
Note that both functions are STRICT; i.e. a NULL for either parameter
should cause the function to return NULL. Arguably in string_to_array
it could be non-strict if the input string is empty, but it's probably
worth keeping it strict to simplify the semantics.
Here are some examples:
array_to_string('{}'::TEXT[],',') => ''
array_to_string('{NULL}'::TEXT[],',') => ''
array_to_string('{NULL,NULL}'::TEXT[],',') => ''
array_to_string('{a,NULL}'::TEXT[],',') => 'a'
array_to_string('{NULL,a}'::TEXT[],',') => 'a'
array_to_string('{a,b}'::TEXT[],',') => 'a,b'
array_to_string('{a,NULL,b}'::TEXT[],',') => 'a,b'
string_to_array('',',') => '{}'
string_to_array(' ',',') => '{" "}'
string_to_array(',',',') => '{"",""}'
string_to_array('a',',') => '{a}'
string_to_array('a,',',') => '{a,""}'
string_to_array(',a',',') => '{"",a}'
string_to_array('a,b',',') => '{a,b}'
My thinking before was that it should be doing:
string_to_array('',',') => '{""}'
instead, but I now think that Greg has a point and these are nicer/more
generally useful semantics.
Hum, that all got a bit more verbose than I was expecting. Ah well, I
hope it's somewhat useful.
--
Sam http://samason.me.uk/
[1] as in the intersperse function in Haskell
http://www.haskell.org/onlinereport/list.html#sect17.3
intersperse "#" ["a", "bar"] == ["a", "#", "bar"]
note that here we're working with arrays of string, rather than
arrays of characters as in the report.
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Funk | 2009-04-01 14:14:56 | Retain PREPARE or CONNECT TRIGGER |
Previous Message | Tom Lane | 2009-04-01 13:55:44 | Re: Indexing unknown words with Tsearch2 |
From | Date | Subject | |
---|---|---|---|
Next Message | Sam Mason | 2009-04-01 14:38:50 | Re: [GENERAL] string_to_array with empty input |
Previous Message | Pavel Stehule | 2009-04-01 13:06:47 | Re: WIP: transformation hook modules and JSON support |