From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Campbell, Lance" <lance(at)illinois(dot)edu> |
Cc: | "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: Text searching HTML |
Date: | 2014-11-03 20:09:01 |
Message-ID: | 25577.1415045341@sss.pgh.pa.us |
Lists: | pgsql-sql |
"Campbell, Lance" <lance(at)illinois(dot)edu> writes:
> Is there a preferred way to search text within an HTML document? I have been reading up on searching via to_tsvector. You can pass the to_tsvector two parameters. The first appears to be a dictionary and the second text. Is there by chance an English HTML dictionary? That way html tags or html attributes would be ignored.
I believe all the built-in text search configurations ignore HTML tags by
default, since they have no mapping for the "tag" token type, which is how
the built-in parser reports them. You could of course make a custom
configuration that acts differently.
regards, tom lane
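
The behaviour described in the reply can be checked with a short sketch like the one below. It is not part of the original message. The sample HTML string and the configuration name `html_english` are made up for illustration, and it assumes the default parser and the stock `english` configuration and `simple` dictionary are available.

```sql
-- Show how the default parser classifies each piece of an HTML fragment.
-- Markup should be reported under the "tag" alias; with the stock
-- english configuration those rows are expected to have no lexemes.
SELECT alias, token, lexemes
FROM ts_debug('english',
              '<p class="intro">Searching <b>HTML</b> documents</p>');

-- Build a tsvector from the same fragment; only the words should be indexed.
SELECT to_tsvector('english',
                   '<p class="intro">Searching <b>HTML</b> documents</p>');

-- A custom configuration that acts differently (hypothetical name):
-- copy english, then map the "tag" token type to the simple dictionary
-- so that tags are indexed instead of being dropped.
CREATE TEXT SEARCH CONFIGURATION html_english (COPY = pg_catalog.english);
ALTER TEXT SEARCH CONFIGURATION html_english
    ADD MAPPING FOR tag WITH simple;

SELECT to_tsvector('html_english',
                   '<p class="intro">Searching <b>HTML</b> documents</p>');
```

Note that the first argument to `to_tsvector` is a text search configuration rather than a dictionary; the configuration decides which dictionaries, if any, handle each token type.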