From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | daniel <dochtorek(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: ts_headline and query with hyphen |
Date: | 2012-12-05 03:49:21 |
Message-ID: | 24135.1354679361@sss.pgh.pa.us |
Lists: | pgsql-general |
daniel <dochtorek(at)gmail(dot)com> writes:
> I have a question about ts_headline, when the query includes word like
> 'on-line' - only the 'line' part is highlighted, even though the whole
> phrase is indexed too, some details below.
Part of the reason is that "on" is a stop word (at least in the default
english dictionary). That's why you get
> select to_tsquery('play & on-line');
> to_tsquery
> ----------------------------
> 'play' & 'on-lin' & 'line'
and not "'play' & 'on-lin' & 'on' & 'line'". If you did get the latter
then you'd get a headline result with both parts highlighted, similar to
your "custom-built" case.
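One way to check this, offered as a rough sketch rather than anything taken from the original report: run the same query through the built-in "simple" configuration, which does no stop-word removal or stemming, and ask the english_stem dictionary directly what it does with "on". The literal strings are just the ones from this thread.

```sql
-- Same query under the english configuration and the stop-word-free
-- "simple" configuration, side by side for comparison.
SELECT to_tsquery('english', 'play & on-line') AS english_query,
       to_tsquery('simple',  'play & on-line') AS simple_query;

-- Ask the English Snowball dictionary about "on" directly.
-- An empty array means the dictionary treats the token as a stop word.
SELECT ts_lexize('english_stem', 'on') AS on_lexemes;
```

If the "simple" query keeps an 'on' term that the english one drops, the stop-word list is what removes it.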
> But maybe ts_headline understands or operates on
> single, not hyphenated words only?
Dunno. It would seem reasonable to highlight the whole compound in
these cases, but I have no idea how hard that is.
Another thing that seems a bit odd here is that we seem to be stemming
the compound word as a whole, but not the individual parts. Not sure
how sane that combination of choices is ...
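To see how the compound and its parts are each handled, ts_debug is probably the quickest tool. A small sketch, assuming the stock english configuration; the column list is only a subset of what ts_debug returns:

```sql
-- One row per token the parser emits for the hyphenated word:
-- the compound as a whole, each part, and the separator.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'on-line');
```

The row for the whole hyphenated word and the rows for its parts show which dictionary handled each token and what lexemes came out.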
regards, tom lane