Re: Add absolute value to dict_int

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add absolute value to dict_int
Date: 2020-03-08 22:48:33
Message-ID: 15029.1583707713@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> I've seen a few requests on how to make FTS search on the absolute value of
> integers. This question is usually driven by the fact that the text search
> parser interprets a separating hyphen ("partnumber-987") as a minus sign.

> There is currently no good answer for this that doesn't involve C coding.
> I think this feature has a natural and trivial home in the dict_int
> extension, so attached is a patch that does that.

Seems reasonable, so pushed with minor cleanup (I noticed it was
off-by-one for the maxlen check, which was harmless unless you had
rejectlong enabled as well). I debated whether I liked the "absval"
parameter name, which seemed a bit too abbreviative; but it's more
or less in line with the existing parameter names, so I left it alone.

> There are no changes to the extension creation scripts, so there is no need
> to bump the version. And I think the convention is that we don't bump the
> version just to signify a change which implements a new feature when that
> doesn't change the creation scripts.

Right, there's no need to update the creation script.

I noticed one odd thing which is not the fault of this patch, but
seems to need cleaned up:

regression=# ALTER TEXT SEARCH DICTIONARY intdict (absval = true);
ALTER TEXT SEARCH DICTIONARY
regression=# select ts_lexize('intdict', '-123456');
ts_lexize
-----------
{123456}
(1 row)

regression=# ALTER TEXT SEARCH DICTIONARY intdict (absval = 1);
ALTER TEXT SEARCH DICTIONARY
regression=# select ts_lexize('intdict', '-123456');
ERROR: absval requires a Boolean value

Why did ALTER accept that, if it wasn't valid? It's not like
there's no error checking at all:

regression=# ALTER TEXT SEARCH DICTIONARY intdict (absval = foo);
ERROR: absval requires a Boolean value

Upon digging into that, the reason is that defGetBoolean accepts
a T_Integer Value with value 1, but it doesn't accept a T_String
with value "1". And apparently we're satisfied to smash dictionary
parameters to strings before storing them.

There are at least three things we could do here:

1. Insist that defGetBoolean and its siblings should accept the
string equivalent of any non-string value they accept. This
would change the behavior of a whole lot of utility commands,
not only the text search commands, and I'm not exactly convinced
it's a good idea. Seems like it's losing error detection
capability.

2. Improve the catalog representation of text search parameters
so that the original Value node can be faithfully reconstructed.
I'd be for this, except it seems like a lot of work for a rather
minor benefit.

3. Rearrange text search parameter validation so we smash parameters
to strings *before* we validate them, ensuring we won't take any
settings that will be rejected later.

I'm leaning to #3 as being the most practical fix. Thoughts?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesse Zhang 2020-03-08 23:44:21 Re: Use compiler intrinsics for bit ops in hash
Previous Message James Coleman 2020-03-08 22:29:41 Re: Allow to_date() and to_timestamp() to accept localized names