From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jean-Pierre Pelletier <jppelletier(at)e-djuster(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com> |
Subject: | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? |
Date: | 2016-06-07 22:05:10 |
Message-ID: | 16167.1465337110@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Jean-Pierre Pelletier <jppelletier(at)e-djuster(dot)com> writes:
> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> matching consecutive words but it won't work for us if it cannot handle
> consecutive *duplicate* words.
> For example, the following returns true: select
> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
> Is this expected ?
I concur that that seems like a rather useless behavior. If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x". So phrase search simply should not
consider distance-zero matches.
The attached one-liner patch seems to fix this problem, though I am
uncertain whether any other places need to be changed to match.
Also, there is a regression test case that changes:
```
*** /home/postgres/pgsql/src/test/regress/expected/tstypes.out Thu May 5 19:21:17 2016
--- /home/postgres/pgsql/src/test/regress/results/tstypes.out Tue Jun 7 17:55:41 2016
***************
*** 897,903 ****
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
! 0.0714286
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
--- 897,903 ----
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
! 0
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
```
I'm not sure if this case is intentionally exhibiting the behavior that
both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
result simply wasn't thought about carefully.
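The ranking case can be traced the same way. In `' a:1 sa:2A sb:2D g'`, the prefix operand `s:*` matches `sa` and `sb`, both at position 2. The operand `sa:A` also matches `sa` at position 2. So `a <-> s:* <-> sa:A` can only succeed if two adjacent operands may share a position. The sketch below is a hypothetical toy model: the dict layout and the names `operand_positions` and `chain_matches` are illustrative, not PostgreSQL internals. It treats unlabeled positions as weight D, as PostgreSQL does. It only checks whether a match exists and does not compute a rank, so it is at best consistent with the 0.0714286 result dropping to 0 under the patch.

```python
# Toy model of the tsvector ' a:1 sa:2A sb:2D g':
# each lexeme maps to (position, weight) pairs; 'g' has no positions.
TSVECTOR = {"a": [(1, "D")], "g": [], "sa": [(2, "A")], "sb": [(2, "D")]}


def operand_positions(vec, word, prefix=False, weights=None):
    """Positions matched by an operand such as 'a', 's:*' or 'sa:A'."""
    found = set()
    for lexeme, entries in vec.items():
        if lexeme == word or (prefix and lexeme.startswith(word)):
            for pos, weight in entries:
                if weights is None or weight in weights:
                    found.add(pos)
    return found


def chain_matches(operand_sets, min_dist):
    """Toy 'x <-> y <-> z': carry forward the positions of each right
    operand that lie min_dist..1 positions after the previous operand."""
    frontier = operand_sets[0]
    for nxt in operand_sets[1:]:
        frontier = {r for r in nxt for l in frontier if min_dist <= r - l <= 1}
    return bool(frontier)


ops = [operand_positions(TSVECTOR, "a"),
       operand_positions(TSVECTOR, "s", prefix=True),
       operand_positions(TSVECTOR, "sa", weights={"A"})]
print(ops)                    # [{1}, {2}, {2}]
print(chain_matches(ops, 0))  # True: s:* and sa:A share position 2
print(chain_matches(ops, 1))  # False: distance-zero step rejected
```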
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
phrase-search-no-match-at-distance-0.patch | text/x-diff | 658 bytes |