From: | Hunaid Sohail <hunaidpgml(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me> |
Subject: | Re: [PATCH] Add roman support for to_number function |
Date: | 2025-01-20 07:32:51 |
Message-ID: | CAMWA6yaXq9=7rJdS9LVkmYB0d-AYt3pLC0hpBe9+stnM7W+SQw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Tom,
On Sat, Jan 18, 2025 at 5:27 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hunaid Sohail <hunaidpgml(at)gmail(dot)com> writes:
> > I’ve attached a new patch that addresses comments 2, 3, and 4 from Tomas.
>
> I'm still quite unhappy that this doesn't tolerate trailing
> whitespace. That's not how other format patterns work, and it makes
> it impossible to round-trip the output of to_char(..., 'RN').
> I think the core of the problem is that you tried to cram the logic
> in where the existing "not implemented" error is thrown. But on
> inspection there is nothing sane about that entire "Roman correction"
> stanza -- what it does is either useless (zeroing already-zeroed
> fields) or in the wrong place (if we don't want to allow other flags
> with IS_ROMAN, that should be done via an error in NUMDesc_prepare,
> like other similar cases). In the attached I moved the roman_to_int
> call into NUM_processor's main loop so it's more like other format
> patterns, and fixed it to not eat any more characters than it should.
> This might allow putting back some format-combination capabilities,
> but other than the whitespace angle I figure we can leave that for
> people to request it.
>
Thank you for the tweaks to the patch. Overall, it looks great.
Initially, when I looked at the test case:
SELECT to_number('M CC', 'RN');
I was confused about whether it should be 'MCC'. After further inspection,
I realized that the output is 1000 for 'M'. The format of the input can be a
bit unclear. Perhaps we could add a note in the documentation
or issue a warning in roman_to_int function when input is truncated,
to clarify this behavior.
+ bool truncated = false;
+
+ /*
+ * Collect and decode valid roman numerals, consuming at most
+ * MAX_ROMAN_LEN characters. We do this in a separate loop to avoid
+ * repeated decoding and because the main loop needs to know when
it's at
+ * the last numeral.
+ */
+ for (len = 0; len < MAX_ROMAN_LEN && !OVERLOAD_TEST; len++)
+ {
+ char currChar = pg_ascii_toupper(*Np->inout_p);
+ int currValue = ROMAN_VAL(currChar);
+
+ if (currValue == 0)
+ {
+ truncated = true;
+ break; /* Not a valid
roman numeral. */
+ }
+ romanChars[len] = currChar;
+ romanValues[len] = currValue;
+ Np->inout_p++;
+ }
+
+ if (len == 0)
+ return -1; /* No valid roman
numerals. */
+
+ /* Check for truncation. */
+ if (truncated)
+ {
+ ereport(WARNING,
+
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("Input truncated to \"%.*s\"",
+ len, romanChars)));
+ }
Output after change:
postgres=# SELECT to_number('MMXX ', 'RN');
WARNING: Input truncated to "MMXX"
to_number
-----------
2020
(1 row)
postgres=# SELECT to_number(' XIV ', ' RN');
WARNING: Input truncated to "XIV"
to_number
-----------
14
(1 row)
postgres=# SELECT to_number('M CC', 'RN');
WARNING: Input truncated to "M"
to_number
-----------
1000
(1 row)
Also, some modifications to the test cases will be required
if we go with these changes.
Regards,
Hunaid Sohail
From | Date | Subject | |
---|---|---|---|
Next Message | Mats Kindahl | 2025-01-20 07:33:12 | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Previous Message | Guillaume Lelarge | 2025-01-20 07:24:54 | Re: improve DEBUG1 logging of parallel workers for CREATE INDEX? |