| From: | Peter Smith <smithpb2250(at)gmail(dot)com> | 
|---|---|
| To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> | 
| Cc: | "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: proposal - function string_to_table | 
| Date: | 2020-08-20 02:06:42 | 
| Message-ID: | CAHut+PsKQu429JKc3ynysfMXrD2G7wVtLW+EytO+GX-6Jw=UwA@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi.
I have been looking at the patch: string_to_table-20200706-2.patch
Below are some review comments for your consideration.
====
COMMENT func.sgml (style)
+       <para>
+        splits string into table using supplied delimiter and
+        optional null string.
+       </para>
The format style of the short description is inconsistent with the
other functions.
e.g. Should start with Capital letter.
e.g. Should tag the parameter names properly
Something like:
<para>
Splits <parameter>string</parameter> into a table
using supplied <parameter>delimiter</parameter>
and optional null string <parameter>nullstr</parameter>.
</para>
====
COMMENT func.sgml (what does nullstr do)
The description does not sufficiently describe the purpose/behaviour
of the nullstr.
e.g. Firstly I thought that it meant if 2 consecutive delimiters were
encountered it would substitute this string as the row value. But it
is doing the opposite of what I guessed - if the extracted row value
is the same as nullstr then a NULL row is inserted instead.
====
COMMENT func.sgml (wrong sample output)
+<programlisting>xx
+yy,
+zz</programlisting>
This output is incorrect for the sample given. There is no "yy," in
the output because there is a 'yy' nullstr substitution.
Should be:
---
xx
NULL
zz
---
====
COMMENT func.sgml (related to regexp_split_to_table)
Because this new function is similar to the existing
regexp_split_to_table, perhaps they should cross-reference each other
so a reader of this documentation is made aware of the alternative
function?
====
COMMENT (test cases)
It is impossible to tell difference in the output between empty
strings and nulls currently, so maybe you can change all the tests to
have a form like below so they can be validated properly:
# select v, v IS NULL as "is null" from
string_to_table('a,b,*,c,d,',',','*') g(v);
 v | is null
---+---------
 a | f
 b | f
   | t
 c | f
 d | f
   | f
(6 rows)
or maybe like this is even easier:
# select quote_nullable(string_to_table('a|*||c|d|','|','*'));
 quote_nullable
----------------
 'a'
 NULL
 ''
 'c'
 'd'
 ''
(6 rows)
Something similar was already proposed before [1] but that never got
put into the test code.
[1] https://www.postgresql.org/message-id/CAFj8pRDSzDYmaS06dfMXBfbr8x%2B3xjDJxA5kbL3h8%2BeOGoRUcA%40mail.gmail.com
====
COMMENT (test cases)
There are multiple combinations of the parameters to this function and
MANY different results depending on different values they can take, so
the existing tests only cover a small sample.
I have attached a lot more test scenarios that you may want to include
for better test coverage. Everything seemed to work as expected.
PSA test-cases.pdf
====
COMMENT (accum_result)
+ Datum values[1];
+ bool nulls[1];
+
+ values[0] = PointerGetDatum(result_text);
+ nulls[0] = is_null;
Why not use variables instead of arrays with only 1 element?
====
COMMENT (text_to_array_internal)
+ if (!tstate.astate)
+ PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));
Maybe the condition is more readable when expressed as:
if (tstate.astate == NULL)
====
Kind Regards,
Peter Smith.
Fujitsu Australia
| Attachment | Content-Type | Size | 
|---|---|---|
| test-cases.pdf | application/pdf | 456.9 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Fujii Masao | 2020-08-20 02:09:42 | Re: Creating a function for exposing memory usage of backend process | 
| Previous Message | Amit Langote | 2020-08-20 01:50:29 | Re: Creating foreign key on partitioned table is too slow |