Re: BUG #13690: Full Text Search with spanish dictionary cannot find some words

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: vtamara(at)pasosdeJesus(dot)org
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #13690: Full Text Search with spanish dictionary cannot find some words
Date: 2015-10-20 16:21:38
Message-ID: 33158.1445358098@sss.pgh.pa.us
Lists: pgsql-bugs

vtamara(at)pasosdeJesus(dot)org writes:
> The following search in english succeeds (returns 1):

> SELECT COUNT(*) FROM cat
> WHERE to_tsvector('english', nombre) @@ to_tsquery('english',
> 'politi:*'
> );

> But fails using the spanish dictionary (returns 0):

> SELECT COUNT(*) FROM cat
> WHERE to_tsvector('spanish', nombre) @@ to_tsquery('spanish',
> 'politi:*'
> );

This is because you didn't adjust the wildcard search pattern for the
different stemming rules used in Spanish. Look at the to_tsvector and
to_tsquery results:

regression=# SELECT to_tsvector('english', nombre) , to_tsquery('english','politi:*') from cat;
       to_tsvector       | to_tsquery
-------------------------+------------
 'politica':1 'social':2 | 'politi':*
(1 row)

regression=# SELECT to_tsvector('spanish', nombre) , to_tsquery('spanish','politi:*') from cat;
     to_tsvector      | to_tsquery
----------------------+------------
 'polit':1 'social':2 | 'politi':*
(1 row)

I don't know enough Spanish to follow the reasoning for stemming
"politica" as "polit" rather than something else; but I do see that
"politi" is not reduced to "polit", which is fairly reasonable since
that's not a word. "politi:*" will match any lexeme that starts with
"politi", but the lexeme stored in the tsvector is the shorter "polit",
so the prefix search finds nothing.
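
One way to check this sort of mismatch is to ask the stemmer directly and
then retry with a prefix no longer than the stored lexeme. The following is
only a sketch: it assumes the stock "spanish" configuration (which uses the
spanish_stem Snowball dictionary for plain ASCII words) and uses the literal
'politica social' as a stand-in for the value of nombre, which is not shown
in the report.

```sql
-- Stem the full word and the hand-truncated prefix separately.
-- Going by the to_tsvector/to_tsquery output above, the first should
-- come back as {polit} and the second as {politi}.
SELECT ts_lexize('spanish_stem', 'politica') AS full_word,
       ts_lexize('spanish_stem', 'politi')   AS truncated;

-- A prefix that is no longer than the stored lexeme can match it.
-- 'politica social' is an assumed stand-in for cat.nombre.
SELECT to_tsvector('spanish', 'politica social')
       @@ to_tsquery('spanish', 'polit:*') AS matches_short_prefix;

-- Alternatively, give to_tsquery a whole word and let it do the stemming
-- before the prefix marker is applied; inspect the resulting query.
SELECT to_tsquery('spanish', 'politica:*') AS stemmed_prefix_query;
```

If the stems are as shown in the output above, the second query should
return true, since 'polit' is a prefix of the stored lexeme 'polit'.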

regards, tom lane
