Re: fulltext parser strange behave

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject:	Re: fulltext parser strange behave
Date:	2007-11-09 18:53:53
Message-ID:	13471.1194634433@sss.pgh.pa.us
Lists:	pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I've just been looking at the state machine in wparser_def.c. I think
> the processing for entities is also a few bob short in the pound. It
> recognises decimal numeric character references, but not hexadecimal
> numeric character references. That's fairly silly since the HTML spec
> specifically says the latter are "particularly useful". The rules for
> named entities are also deficient w.r.t. digits, just like the case of
> tags that Tom noticed. This isn't academic: HTML features a number of
> named entities with digits in the name (sup2, frac14 for example).

> In XML at least, legal names are defined by the following rules from the
> spec:
> ...
> [A-Za-z:_][A-Za-z0-9:_.-]*

> I suggest we use that or something very close to it as the rule for
> names in these patterns.

No objections here. Who wants to patch wparser_def?

regards, tom lane
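
For illustration only, here is a minimal sketch of what the proposed name rule would accept. It is written in Python rather than as a change to the state machine in wparser_def.c, and it is not the eventual patch. The name pattern is the one quoted above; the decimal and hexadecimal patterns, ENTITY_RE, and classify_entity are hypothetical names and assumptions made for this example.

```python
import re

# Name rule quoted in the message (an ASCII-only form of the XML name rule).
NAME = r"[A-Za-z:_][A-Za-z0-9:_.-]*"

# Assumed shapes for the three kinds of reference discussed in the thread:
# named entity, decimal character reference, hexadecimal character reference.
ENTITY_RE = re.compile(
    r"&(?:(?P<name>" + NAME + r")"
    r"|#(?P<dec>[0-9]+)"
    r"|#[xX](?P<hex>[0-9A-Fa-f]+));"
)


def classify_entity(text):
    """Return 'name', 'dec', or 'hex' if text is a single reference, else None."""
    m = ENTITY_RE.fullmatch(text)
    if m is None:
        return None
    # Exactly one alternative matches, so lastgroup names the kind found.
    return m.lastgroup


if __name__ == "__main__":
    for sample in ["&sup2;", "&frac14;", "&#169;", "&#xA9;", "&2sup;"]:
        print(sample, classify_entity(sample))
```

Under this rule a name may contain digits after its first character, so sup2 and frac14 are accepted, while a name that starts with a digit is rejected.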

In response to

Re: fulltext parser strange behave at 2007-11-08 20:11:44 from Andrew Dunstan

Responses

Re: fulltext parser strange behave at 2007-11-13 19:42:18 from Andrew Dunstan
