Re: [PATCH] Add roman support for to_number function

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Hunaid Sohail <hunaidpgml(at)gmail(dot)com>
Subject: Re: [PATCH] Add roman support for to_number function
Date: 2024-09-07 23:09:40
Message-ID: 2562623.1725750580@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com> writes:
> Tested again, and the patch looks good. It does not accept leading or trailing whitespace, which seems reasonable, given the unclear behavior of to_number with other format strings. It also rejects less common Roman spellings like "IIII". I don't feel strongly about that one way or the other, but perhaps a test codifying that behavior would be useful to make it clear it's intentional.

Yeah, I don't have a strong feeling about that either, but probably
being strict is better. to_number has a big problem with "garbage in
garbage out" already, and being lax will make that worse.

A few notes from a quick read of the patch:

* roman_to_int() should have a header comment, notably explaining its
result convention. I find it fairly surprising that "0" means an
error --- I realize that Roman notation can't represent zero, but
wouldn't it be better to use -1?

* Do *not* rely on toupper(). There are multiple reasons why not,
but in this context the worst one is that in Turkish locales it's
likely to translate "i" to "İ", on which you will fail. I'd use
pg_ascii_toupper().

* I think roman_to_int() is under-commented internally too.
To start with, where did the magic "15" come from? And why
should we have such a test anyway --- what if the format allows
for trailing stuff after the roman numeral? (That is, I think
you need to fix this so that roman_to_int reports how much data
it ate, instead of assuming that it must read to the end of the
input.) The logic for detecting invalid numeral combinations
feels a little opaque too. Do you have a source for the rules
you're implementing, and if so could you cite it?

* This code will get run through pgindent before commit, so you
might want to revisit whether your comments will still look nice
afterwards. There's not a lot of consistency in them about initial
cap versus lower case or trailing period versus not, too.

* roman_result could be declared where used, no?

* I'm not really convinced that we need a new errcode
ERRCODE_INVALID_ROMAN_NUMERAL rather than using a generic one like
ERRCODE_INVALID_TEXT_REPRESENTATION. However, if we do do it
like that, why did you pick the out-of-sequence value 22P07?

* Might be good to have a few test cases demonstrating what happens
when there's other stuff combined with the RN format spec. Notably,
even if RN itself won't eat whitespace, there had better be a way
to write the format string to allow that.

* Further to Aleksander's point about lack of test coverage for
the to_char direction, I see from
https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html
that almost none of the existing roman-number code paths are covered
today. While it's not strictly within the charter of this patch
to improve that, maybe we should try to do so --- at the very
least it'd raise formatting.c's coverage score a few notches.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message David Rowley 2024-09-07 22:58:33 Re: PostgreSQL 17 release announcement draft