Re: BUG #8970: ts_parse incorrectly split numbers in digit token

From: Marco Atzeri <marco(dot)atzeri(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8970: ts_parse incorrectly split numbers in digit token
Date: 2014-01-26 08:10:27
Message-ID: 52E4C2F3.5020705@gmail.com
Lists: pgsql-bugs

On 26/01/2014 03:25, Alvaro Herrera wrote:
> marco(dot)atzeri(at)gmail(dot)com wrote:
>
> To trace this, I would look at src/backend/tsearch/wparser_def.c;
> probably try compiling that file with WPARSER_TRACE defined, and compare
> the output of ts_parse() in something simple such as '345' in a working
> port with the failing one. That might give you clues as to what is
> causing the failure.
>
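To repeat the trace on another build, the switch mentioned above can be turned on with a single define at the top of src/backend/tsearch/wparser_def.c before rebuilding the backend. This is a sketch that assumes the file has no other definition of the macro; as far as I can tell the trace lines are written to stderr, so they end up in the server log rather than in the psql session.

```c
/* Top of src/backend/tsearch/wparser_def.c: enable the parser trace.
 * It prints one line per parser state transition, so remove it again
 * after debugging. */
#define WPARSER_TRACE
```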

Database created with LANG=en_US.UTF-8:

postgres=# SELECT * FROM ts_parse('default', '345');
 tokid | token
-------+-------
    12 | 3
    12 | 4
    22 | 5
(3 rows)

parsing "345"
state TPS_Base at 3 matched rule 12 flags tostate TPS_InSpace
state TPS_InSpace at 4 matched rule 8 flags BINGO tostate TPS_Base type blank
state TPS_Base at 4 matched rule 12 flags tostate TPS_InSpace
state TPS_InSpace at 5 matched rule 8 flags BINGO tostate TPS_Base type blank
state TPS_Base at 5 matched rule 5 flags tostate TPS_InUnsignedInt
state TPS_InUnsignedInt at EOF matched rule 0 flags BINGO tostate TPS_Base type uint

Database created with LANG=C:

postgres=# SELECT * FROM ts_parse('default', '345');
 tokid | token
-------+-------
    22 | 345
(1 row)

parsing "345"
state TPS_Base at 3 matched rule 5 flags tostate TPS_InUnsignedInt
state TPS_InUnsignedInt at 4 matched rule 1 flags
state TPS_InUnsignedInt at 5 matched rule 1 flags
state TPS_InUnsignedInt at EOF matched rule 0 flags BINGO tostate TPS_Base type uint

Next Message Tom Lane 2014-01-26 17:27:16 Re: BUG #8970: ts_parse incorrectly split numbers in digit token
Previous Message Alvaro Herrera 2014-01-26 02:25:35 Re: BUG #8970: ts_parse incorrectly split numbers in digit token