BUG #8750: 'simple' parser in to_tsvector() splits words on underscores

From: drx(at)a-blast(dot)org
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #8750: 'simple' parser in to_tsvector() splits words on underscores
Date: 2014-01-08 19:43:12
Message-ID: E1W0z20-0007Gz-9W@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 8750
Logged by: Dragan Espenschied
Email address: drx(at)a-blast(dot)org
PostgreSQL version: 9.3.2
Operating system: Ubuntu 12.04 x86_64
Description:

When text is converted to a tsvector with the 'simple' parser, words are
split on underscores. For example:

select to_tsvector('simple', 'light_bulb');
to_tsvector
--------------------
'bulb':2 'light':1
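
To see how the input is broken into tokens before the lexemes are built, the
same text can be passed to ts_debug, which lists each token together with its
type. This is only a minimal sketch, assuming the stock 'simple' configuration;
the choice of columns is just for readability:

```sql
-- Show the token type, the raw token, and the resulting lexemes
-- for each piece of the input under the 'simple' configuration.
SELECT alias, token, lexemes
FROM ts_debug('simple', 'light_bulb');
```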

An underscore is typically used in place of a space in a term that should be
kept together, so it is an explicit signal that the term should not be
split.

At least, this is how I understand it.

I suggest that words not be split on underscores by default. This would make
typical tasks such as tagging very easy to implement, without much need to
modify the parser.

Thanks for considering my suggestion!
Dragan
