Proposal: Remove "no" from the default english.stop word list

From: Peter Marreck <peter(at)marreck(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal: Remove "no" from the default english.stop word list
Date: 2018-04-12 19:25:22
Message-ID: CAC3UHA0Vc36CRXSk8k5t7C+9MfEL33rSvC4u=tjMnpJ4iDCx2g@mail.gmail.com
Lists: pgsql-hackers

I recently ran into an issue where, after implementing full-text search on
my site, a user searching real estate listings for "no pets" also got
results for "pets OK"! This was obviously a problem. After investigating,
it appears the word "no" is a stopword by default (it's in the
english.stop word list) and is therefore not indexed. I propose
that this is wrong, for the following reasons:

1) The word "yes" is not in this stopword list, which is a bizarre
omission if "no" was included because it lacks significance
(although I would argue that both are significant and recommend leaving
both out of the list).
2) The word "no" IS significant as a qualifier (such as, in my case, "no
pets", or more usefully, "no<->pets" if using to_tsquery instead of
plainto_tsquery), especially for boolean-like data that is brought into
full-text search scope. For example, if some attribute "balcony" is
false/unchecked, you could index it as "no balcony", which makes
both the presence AND the absence of a balcony searchable.
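To illustrate both points, here is a toy Python model of the behavior described above. It is an assumption-laden sketch, NOT PostgreSQL's parser, dictionaries, or stemmer: `ENGLISH_STOP` is a small hypothetical subset of english.stop, `matches_all` only loosely mimics a plainto_tsquery AND match, `matches_phrase` only loosely mimics a `no<->pets` adjacency match, and `attribute_text` is a hypothetical helper for the "no balcony" indexing idea.

```python
# Toy model only: NOT PostgreSQL's parser, dictionaries, or stemmer.
# ENGLISH_STOP is a small hypothetical subset of english.stop.
ENGLISH_STOP = frozenset({"a", "and", "no", "of", "the"})


def lexemes(text: str, stopwords: frozenset[str]) -> list[str]:
    """Lowercase, split on whitespace/commas, and drop stopwords."""
    words = text.lower().replace(",", " ").split()
    return [w for w in words if w not in stopwords]


def matches_all(doc: str, query: str, stopwords: frozenset[str]) -> bool:
    """Loose analogue of a plainto_tsquery match: every query lexeme must occur."""
    return set(lexemes(query, stopwords)) <= set(lexemes(doc, stopwords))


def matches_phrase(doc: str, first: str, second: str,
                   stopwords: frozenset[str]) -> bool:
    """Loose analogue of first<->second: the surviving lexemes must be adjacent."""
    query = lexemes(f"{first} {second}", stopwords)
    words = lexemes(doc, stopwords)
    n = len(query)
    return any(words[i:i + n] == query for i in range(len(words) - n + 1))


def attribute_text(attributes: dict[str, bool]) -> str:
    """Hypothetical helper: index a false attribute as "no <name>"."""
    return " ".join(name if present else f"no {name}"
                    for name, present in sorted(attributes.items()))


without_no = ENGLISH_STOP - {"no"}
listings = ["Pets OK", "No pets", "No smoking, pets OK",
            attribute_text({"balcony": False, "pets": False})]

# "no" dropped: the query collapses to "pets", so "Pets OK" matches too.
print([d for d in listings if matches_all(d, "no pets", ENGLISH_STOP)])
# "no" kept: "Pets OK" no longer matches, but "No smoking, pets OK" still does.
print([d for d in listings if matches_all(d, "no pets", without_no)])
# "no" kept plus adjacency: only listings where "no" directly precedes "pets".
print([d for d in listings if matches_phrase(d, "no", "pets", without_no)])
```

In this model, the first query returns all four listings, the second drops "Pets OK", and only the adjacency check excludes "No smoking, pets OK", which is why "no<->pets" is the more useful form once "no" is actually indexed.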

That's basically it. Thoughts?

-Peter
