From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Mikhail Gribkov <youzhick(at)gmail(dot)com> |
Cc: | Arne Roland <A(dot)Roland(at)index(dot)de>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Permute underscore separated components of columns before fuzzy matching |
Date: | 2023-11-17 21:23:04 |
Message-ID: | 799067.1700256184@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Mikhail Gribkov <youzhick(at)gmail(dot)com> writes:
> Honestly I'm not entirely sure fixing only two switched words is worth the
> effort, but the declared goal is clearly achieved.
> I think the patch is good to go, although you need to fix code formatting.
I took a brief look at this. I concur that we shouldn't need to be
hugely concerned about the speed of this code path. However, we *do*
need to be concerned about its maintainability, and I think the patch
falls down badly there: it adds a chunk of very opaque and essentially
undocumented code, that people will need to reverse-engineer anytime
they are studying this function. That could be alleviated perhaps
with more work on comments, but I have to wonder whether it's worth
carrying this logic at all. It's a rather strange behavior to add,
and I wonder if many users will want it.
One thing that struck me is that no care is being taken for adjacent
underscores (that is, "foo__bar" and similar cases). It seems
unlikely that treating the zero-length substring between the
underscores as a word to permute is helpful; moreover, it adds
an edge case that the string-moving logic could easily get wrong.
I wonder if the code should treat any number of consecutive
underscores as a single separator. (Somewhat related: I think it
will behave oddly when the first or last character is '_', since the
outer loop ignores those positions.)
> And it would be much more convenient to work with your patch if every next
> version file will have a unique name (maybe something like "_v2", "_v3"
> etc. suffixes)
Please. It's very confusing when there are multiple identically-named
patches in a thread.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2023-11-17 21:36:44 | Re: Lifetime of commit timestamps |
Previous Message | Gurjeet Singh | 2023-11-17 21:22:57 | Re: Change GUC hashtable to use simplehash? |