From: | Dag Lem <dag(at)nimrod(dot)no> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: daitch_mokotoff module |
Date: | 2021-12-14 22:34:00 |
Message-ID: | ygelf0mizon.fsf@sid.nimrod.no |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
[...]
>
> Thanks, looks interesting. A couple generic comments, based on a quick
> code review.
Thank you for the constructive review!
>
> 1) Can the extension be marked as trusted, just like fuzzystrmatch?
I have now moved the daitch_mokotoff function into the fuzzystrmatch
module, as suggested by Andrew Dunstan.
>
> 2) The docs really need an explanation of what the extension is for,
> not just a link to fuzzystrmatch. Also, a couple examples would be
> helpful, I guess - similarly to fuzzystrmatch. The last line in the
> docs is annoyingly long.
Please see the updated documentation for the fuzzystrmatch module.
>
> 3) What's daitch_mokotov_header.pl for? I mean, it generates the
> header, but when do we need to run it?
It only has to be run if the soundex rules are changed. I have now
made the dependencies explicit in the fuzzystrmatch Makefile.
>
> 4) It seems to require perl-open, which is a module we did not need
> until now. Not sure how well supported it is, but maybe we can use a
> more standard module?
I believe Perl I/O layers have been part of Perl core for two decades
now :-)
>
> 5) Do we need to keep DM_MAIN? It seems to be meant for some kind of
> testing, but our regression tests certainly don't need it (or the
> palloc mockup). I suggest to get rid of it.
Done. BTW this was modeled after dmetaphone.c
>
> 6) I really don't understand some of the comments in
> daitch_mokotov.sql, like for example:
>
> -- testEncodeBasic
> -- Tests covered above are omitted.
>
> Also, comments with names of Java methods seem pretty confusing. It'd
> be better to actually explain what rules are the tests checking.
The tests were copied from various web sites and implementations. I have
cut down on the number of tests and made the comments more to the point.
>
> 7) There are almost no comments in the .c file (ignoring the comment
> on top). Short functions like initialize_node are probably fine
> without one, but e.g. update_node would deserve one.
More comments are added to both the .h and the .c file.
>
> 8) Some of the lines are pretty long (e.g. the update_node signature
> is almost 170 chars). That should be wrapped. Maybe try running
> pgindent on the code, that'll show which parts need better formatting
> (so as not to get broken later).
Fixed. I did run pgindent earlier, however it didn't catch those long
lines.
>
> 9) I'm sure there's better way to get the number of valid chars than this:
>
> for (i = 0, ilen = 0; (c = read_char(&str[i], &ilen)) && (c < 'A' ||
> c > ']'); i += ilen)
> {
> }
>
> Say, a while loop or something?
The code gets to the next encodable character, skipping any other
characters. I have now added a comment which should hopefully make this
clearer, and broken up the for loop for readability.
Please find attached the revised patch.
Best regards
Dag Lem
Attachment | Content-Type | Size |
---|---|---|
v3-daitch_mokotoff.patch | text/x-patch | 46.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2021-12-14 23:00:34 | generalized conveyor belt storage |
Previous Message | Tom Lane | 2021-12-14 22:30:11 | Re: Remove pg_strtouint64(), use strtoull() directly |