Re: daitch_mokotoff module

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Dag Lem <dag(at)nimrod(dot)no>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: daitch_mokotoff module
Date: 2021-12-13 14:26:44
Message-ID: 0cac2d60-5b85-59c6-1ac1-77092b8688b7@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/13/21 14:38, Dag Lem wrote:
> Please find attached an updated patch, with the following fixes:
>
> * Replaced remaining malloc/free with palloc/pfree.
> * Made "make check" pass.
> * Updated notes on other implementations.
>

Thanks, looks interesting. A couple generic comments, based on a quick
code review.

1) Can the extension be marked as trusted, just like fuzzystrmatch?

2) The docs really need an explanation of what the extension is for, not
just a link to fuzzystrmatch. Also, a couple examples would be helpful,
I guess - similarly to fuzzystrmatch. The last line in the docs is
annoyingly long.

3) What's daitch_mokotov_header.pl for? I mean, it generates the header,
but when do we need to run it?

4) It seems to require perl-open, which is a module we did not need
until now. Not sure how well supported it is, but maybe we can use a
more standard module?

5) Do we need to keep DM_MAIN? It seems to be meant for some kind of
testing, but our regression tests certainly don't need it (or the palloc
mockup). I suggest to get rid of it.

6) I really don't understand some of the comments in daitch_mokotov.sql,
like for example:

-- testEncodeBasic
-- Tests covered above are omitted.

Also, comments with names of Java methods seem pretty confusing. It'd be
better to actually explain what rules are the tests checking.

7) There are almost no comments in the .c file (ignoring the comment on
top). Short functions like initialize_node are probably fine without
one, but e.g. update_node would deserve one.

8) Some of the lines are pretty long (e.g. the update_node signature is
almost 170 chars). That should be wrapped. Maybe try running pgindent on
the code, that'll show which parts need better formatting (so as not to
get broken later).

9) I'm sure there's better way to get the number of valid chars than this:

for (i = 0, ilen = 0; (c = read_char(&str[i], &ilen)) && (c < 'A' ||
c > ']'); i += ilen)
{
}

Say, a while loop or something?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-12-13 14:28:26 Re: Add sub-transaction overflow status in pg_stat_activity
Previous Message Robert Haas 2021-12-13 14:23:42 Re: WIP: WAL prefetch (another approach)