Re: Text searching HTML

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Campbell, Lance" <lance(at)illinois(dot)edu>
Cc: "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Text searching HTML
Date: 2014-11-03 20:09:01
Message-ID: 25577.1415045341@sss.pgh.pa.us
Lists: pgsql-sql

"Campbell, Lance" <lance(at)illinois(dot)edu> writes:
> Is there a preferred way to search text within an HTML document? I have been reading up on searching via to_tsvector. You can pass the to_tsvector two parameters. The first appears to be a dictionary and the second text. Is there by chance an English HTML dictionary? That way html tags or html attributes would be ignored.

I believe all the built-in text search configurations ignore HTML tags by
default, because they have no dictionary mapping for the "tag" token type,
which is how the built-in parser reports HTML tags. You could of course
make a custom configuration that acts differently.

regards, tom lane
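
As an illustration of the reply above (not part of the original message), the following is a minimal sketch of how one might check this behaviour and, if desired, build a configuration that acts differently. It assumes the stock "english" configuration and the default parser; the sample HTML string and the configuration name html_english are hypothetical and chosen only for this example.

```sql
-- Show how the default parser classifies each token and which dictionaries
-- the 'english' configuration maps to it. Tokens reported with alias 'tag'
-- should list no dictionaries, which is why they are dropped.
SELECT alias, token, dictionaries
FROM ts_debug('english', '<p>Hello <b>world</b></p>');

-- Build a tsvector from the same (hypothetical) HTML snippet; with no
-- mapping for the 'tag' token type, the markup should not appear in it.
SELECT to_tsvector('english', '<p>Hello <b>world</b></p>');

-- A custom configuration that acts differently: copy 'english' and map
-- the 'tag' token type to the 'simple' dictionary so tags are indexed.
-- The name html_english is an assumption made for this sketch.
CREATE TEXT SEARCH CONFIGURATION html_english (COPY = english);
ALTER TEXT SEARCH CONFIGURATION html_english
    ADD MAPPING FOR tag WITH simple;

-- Compare against the earlier to_tsvector call.
SELECT to_tsvector('html_english', '<p>Hello <b>world</b></p>');
```

Note that the first argument to to_tsvector names a text search configuration rather than a single dictionary; the configuration decides which dictionaries, if any, handle each token type.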
