| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | "Campbell, Lance" <lance(at)illinois(dot)edu> |
| Cc: | "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org> |
| Subject: | Re: Text searching HTML |
| Date: | 2014-11-03 20:09:01 |
| Message-ID: | 25577.1415045341@sss.pgh.pa.us |
| Lists: | pgsql-sql |
"Campbell, Lance" <lance(at)illinois(dot)edu> writes:
> Is there a preferred way to search text within an HTML document? I have been reading up on searching via to_tsvector. You can pass the to_tsvector two parameters. The first appears to be a dictionary and the second text. Is there by chance an English HTML dictionary? That way html tags or html attributes would be ignored.
I believe all the built-in text search configurations ignore HTML tags by
default, since they have no mapping for the "tag" token type that the
built-in parser reports those as. You could of course make a custom
configuration that acts differently.
regards, tom lane
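
A minimal sketch of how one might check the behavior described above, assuming a stock PostgreSQL installation with the built-in `english` configuration. The sample HTML string and the configuration name `html_english` are made up for illustration and do not come from the thread.

```sql
-- Show how the default parser classifies each token. Markup such as <p ...>
-- or </b> should be reported with the alias "tag" and no dictionaries.
SELECT alias, token, dictionaries
FROM ts_debug('english', '<p class="intro">Searching <b>HTML</b> documents</p>');

-- Since the "tag" token type has no mapping in the built-in configuration,
-- only the visible words should contribute lexemes to the tsvector.
SELECT to_tsvector('english', '<p class="intro">Searching <b>HTML</b> documents</p>');

-- Hypothetical custom configuration that acts differently: copy "english"
-- and map the "tag" token type to the "simple" dictionary so tags are indexed.
CREATE TEXT SEARCH CONFIGURATION html_english (COPY = english);
ALTER TEXT SEARCH CONFIGURATION html_english
    ADD MAPPING FOR tag WITH simple;
```

Running the first two queries before creating any custom configuration is a quick way to confirm that tags are dropped on a given server. The last two statements are only needed if tags should be searchable.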