From: | Euler Taveira de Oliveira <euler(at)timbira(dot)com> |
---|---|
To: | Marek Lewczuk <marek(at)lewczuk(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore |
Date: | 2009-09-23 23:31:20 |
Message-ID: | 4ABAAFC8.7030108@timbira.com |
Lists: | pgsql-bugs |
Marek Lewczuk wrote:
> Please execute the following example:
> select * from ts_debug('english', '<img width="182" height="120"
> align="right" style="margin: 0px 0px 5px 5px;" test_aa="26461"/>')
>
> As a result you will see that <img/> is not identified as an XML tag, but
> is instead split into words, blank spaces, etc. The reason is that the
> last attribute, "test_aa", contains an underscore in its name; when the
> underscore is removed, the img tag is properly identified as an XML tag.
>
> The XML definition allows underscores in tag and attribute names.
>
The problem is that the parser already allows an underscore in tag names but
not in attribute names. The proper fix is to accept the underscore while the
parser state is TPS_InTag; according to the XML spec [1], the underscore is a
valid character in attribute names.

One possible downside is that standard HTML attribute names do not contain
underscores. Should the parser reject the tag in that case? I don't think so,
but it is debatable.

The problem exists in 8.3, 8.4, and HEAD. The fix is trivial, so I see no
problem with back-patching it.
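
To illustrate the idea only, here is a minimal, self-contained sketch of a toy tag recognizer. It is not the PostgreSQL parser code and not the attached patch: the state names, the function toy_is_tag, and the allow_underscore flag are all hypothetical, and the set of accepted characters is a simplification. It shows how accepting '_' in an "in tag" state changes whether the reported input is recognized as a single tag.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical states, loosely modeled on the idea of an "in tag" state.
 * These are NOT the real parser states. */
typedef enum { ST_START, ST_IN_TAG, ST_IN_VALUE, ST_DONE, ST_FAIL } toy_state;

/* Return true if s consists of exactly one tag such as <name attr="value"/>.
 * allow_underscore (an assumption for this sketch) controls whether '_' is
 * accepted outside quoted attribute values. */
bool toy_is_tag(const char *s, bool allow_underscore)
{
    toy_state st = ST_START;

    for (; *s != '\0' && st != ST_DONE && st != ST_FAIL; s++)
    {
        unsigned char c = (unsigned char) *s;

        switch (st)
        {
            case ST_START:
                /* A tag must begin with '<'. */
                st = (c == '<') ? ST_IN_TAG : ST_FAIL;
                break;

            case ST_IN_TAG:
                if (c == '>')
                    st = ST_DONE;
                else if (c == '"')
                    st = ST_IN_VALUE;   /* enter a quoted attribute value */
                else
                {
                    /* Simplified set of characters accepted inside a tag. */
                    bool ok = isalnum(c) || isspace(c) ||
                              c == '=' || c == '/' || c == '-' || c == ':' ||
                              (c == '_' && allow_underscore);

                    if (!ok)
                        st = ST_FAIL;
                }
                break;

            case ST_IN_VALUE:
                /* Anything is allowed until the closing quote. */
                if (c == '"')
                    st = ST_IN_TAG;
                break;

            default:
                break;
        }
    }

    /* Accept only if '>' was reached and nothing follows it. */
    return st == ST_DONE && *s == '\0';
}
```

In this sketch, the tag from the report is rejected when allow_underscore is false, because scanning stops at the '_' in test_aa, and it is accepted when the flag is true. Tags without an underscore in an attribute name are accepted either way.
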
[1] http://www.w3.org/TR/REC-xml/#sec-common-syn
--
Euler Taveira de Oliveira
http://www.timbira.com/
Attachment | Content-Type | Size |
---|---|---|
ts.diff | text/plain | 687 bytes |