From: | Marco Atzeri <marco(dot)atzeri(at)gmail(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #8970: ts_parse incorrectly split numbers in digit token |
Date: | 2014-02-01 20:16:39 |
Message-ID: | 52ED5627.4070005@gmail.com |
Lists: | pgsql-bugs |
On 26/01/2014 18:27, Tom Lane wrote:
> Marco Atzeri <marco(dot)atzeri(at)gmail(dot)com> writes:
>> On 26/01/2014 03:25, Alvaro Herrera wrote:
>>> To trace this, I would look at src/backend/tsearch/wparser_def.c;
>>> probably try compiling that file with WPARSER_TRACE defined, and compare
>>> the output of ts_parse() in something simple such as '345' in a working
>>> port with the failing one. That might give you clues as to what is
>>> causing the failure.
>
>> [ trace ]
>
> As was suspected upthread, this shows that p_isdigit() is failing to
> recognize "3" as a digit. So you've got broken locale support somewhere.
>
> There are two different implementations of p_isdigit in wparser_def.c,
> depending on whether USE_WIDE_UPPER_LOWER is defined. It should be, in
> a Windows build, but maybe this is tracing back to a configure problem?
>
> regards, tom lane
>
After some debugging, I don't think the locale is broken.
The first two times p_isdigit() is called, the value passed to iswdigit()
also contains a portion of the next digit, so the result is always false.
Could the code be assuming that a wide char is always 32 bits?
"Unlike Windows UTF-16 2-byte wide chars, wchar_t on Linux and OS X is 4
bytes UTF-32 (gcc/g++ and XCode). On cygwin it is 2 (cygwin uses Windows
APIs)."
testing with "SELECT * FROM ts_parse('default', '345');"
--------------------------------------------------------------
Breakpoint 1, p_isdigit (prs=0x80100930)
at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:560
560 p_iswhat(digit)
(gdb) step
0x007036d8 in iswdigit ()
(gdb) step
Single stepping until exit from function iswdigit,
which has no line number information.
iswdigit (c=3407923)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
35 return (c >= (wint_t)'0' && c <= (wint_t)'9');
(gdb) p/x c
$77 = 0x340033
(gdb) finish
Run till exit from #0 iswdigit (c=3407923)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
0x0060c510 in TParserGet (prs=0x80100930)
at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:1834
1834 if (item->isclass(prs) != 0)
Value returned is $78 = 0
Breakpoint 1, p_isdigit (prs=0x80100930)
at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:560
560 p_iswhat(digit)
(gdb) step
0x007036d8 in iswdigit ()
(gdb) step
Single stepping until exit from function iswdigit,
which has no line number information.
iswdigit (c=3473460)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
35 return (c >= (wint_t)'0' && c <= (wint_t)'9');
(gdb) p/x c
$79 = 0x350034
(gdb) finish
Run till exit from #0 iswdigit (c=3473460)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
0x0060c510 in TParserGet (prs=0x80100930)
at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:1834
1834 if (item->isclass(prs) != 0)
Value returned is $80 = 0
Breakpoint 1, p_isdigit (prs=0x80100930)
at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:560
560 p_iswhat(digit)
(gdb) step
0x007036d8 in iswdigit ()
(gdb) step
Single stepping until exit from function iswdigit,
which has no line number information.
iswdigit (c=53)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
35 return (c >= (wint_t)'0' && c <= (wint_t)'9');
(gdb) p/x c
$81 = 0x35
(gdb) finish
Run till exit from #0 iswdigit (c=53)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
0x0060c510 in TParserGet (prs=0x80100930)
at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:1834
1834 if (item->isclass(prs) != 0)
Value returned is $82 = 1
-------------------------------------------------------------------------