| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | pgsql-bugs(at)postgreSQL(dot)org | 
| Subject: | Mishandling of right-associated phrase operators in FTS | 
| Date: | 2016-12-18 18:54:10 | 
| Message-ID: | 26706.1482087250@sss.pgh.pa.us | 
| Lists: | pgsql-bugs | 
What do you think a tsquery like 'x <-> (y <-> z)' should mean?
I find it hard to assign it any meaning other than the same thing
as '(x <-> y) <-> z', i.e., it should match a 3-lexeme sequence 'x y z'.
Right now, the execution engine gets this wrong:
regression=# select to_tsvector('x y z') @@ to_tsquery('x <-> y <-> z');
 ?column? 
----------
 t         -- okay
(1 row)
regression=# select to_tsvector('x y z') @@ to_tsquery('x <-> (y <-> z)');
 ?column? 
----------
 f         -- not so okay
(1 row)
This happens because the lower (righthand) <-> operator returns the
position of its righthand-side input ('z'), but that's two away from
where the 'x' is, so the upper phrase operator doesn't think there
is a match.
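The failure can be reproduced in a toy model. The sketch below is not the PostgreSQL executor; it assumes a made-up representation in which a query node is either a lexeme string or a tuple (left, distance, right), and a hypothetical phrase_positions() that, like the behavior described above, always reports the position of the righthand operand.

```python
# Toy model only, not PostgreSQL source. A node is a lexeme string or a
# tuple (left, distance, right) standing for "left <distance> right".

def positions(doc, lexeme):
    """1-based positions of lexeme in a list of words."""
    return {i for i, w in enumerate(doc, start=1) if w == lexeme}

def phrase_positions(doc, node):
    """Match positions, always reported as the right operand's position."""
    if isinstance(node, str):
        return positions(doc, node)
    left, dist, right = node
    lpos = phrase_positions(doc, left)
    rpos = phrase_positions(doc, right)
    # A match needs the right operand exactly `dist` after the left one.
    return {r for r in rpos if r - dist in lpos}

doc = ["x", "y", "z"]
left_assoc = (("x", 1, "y"), 1, "z")    # (x <-> y) <-> z
right_assoc = ("x", 1, ("y", 1, "z"))   # x <-> (y <-> z)

print(bool(phrase_positions(doc, left_assoc)))   # True
print(bool(phrase_positions(doc, right_assoc)))  # False
```

In the right-associated case the inner node reports position 3 (where 'z' is), and the outer operator then looks for 'x' at position 2 rather than 1.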
I considered trying to fix this by forcing right-associated cases into
left-associated form during tsquery parsing, but that has all the same
problems that I pointed out with respect to normalize_phrase_tree().
Really it'd be best to fix this by making the executor cope properly.
I think what we want is to pass down a flag telling recursive invocations
of TS_phrase_execute whether to return the position of the left-side or
right-side argument of a phrase match, which we would set according to
whether we are within the right or left argument of the most closely
nested upper phrase operator.  I propose to incorporate that fix into
the TS_phrase_execute rewrite I'm working on.
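A rough sketch of that idea in a toy model follows. It is not the planned TS_phrase_execute code. The node representation (a lexeme string, or a tuple (left, distance, right)) and the names spans(), phrase_positions() and want_left are all made up for illustration. As a simplifying assumption, the sketch records both ends of every match internally, so the flag only selects which end the caller sees; that keeps mixed left/right nesting correct in the toy.

```python
# Toy model only, not PostgreSQL source.

def spans(doc, node):
    """Set of (start, end) word positions (1-based) where node matches."""
    if isinstance(node, str):
        return {(i, i) for i, w in enumerate(doc, start=1) if w == node}
    left, dist, right = node
    # Measure the distance from the END of the left match to the START
    # of the right match, whatever shape the operands have.
    return {
        (l_start, r_end)
        for (l_start, l_end) in spans(doc, left)
        for (r_start, r_end) in spans(doc, right)
        if r_start - l_end == dist
    }

def phrase_positions(doc, node, want_left):
    """Report the left or right end of each match, as the flag requests."""
    return {start if want_left else end for (start, end) in spans(doc, node)}

doc = ["x", "y", "z"]
left_assoc = (("x", 1, "y"), 1, "z")    # (x <-> y) <-> z
right_assoc = ("x", 1, ("y", 1, "z"))   # x <-> (y <-> z)

print(phrase_positions(doc, left_assoc, want_left=False))   # {3}
print(phrase_positions(doc, right_assoc, want_left=False))  # {3}
print(phrase_positions(doc, right_assoc, want_left=True))   # {1}
```

With this convention an operand sitting in the right argument of an upper phrase operator would be asked for its left end, and one in the left argument for its right end, so both associations match 'x y z'.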
A related problem appears in clean_fakeval_intree()'s attempts to adjust
phrase-operator distances when it removes a stopword.  For example, 'a'
is a stopword, so we get:
regression=# select to_tsquery('(b <-> a) <-> c');
 to_tsquery  
-------------
 'b' <2> 'c'
(1 row)
That's fine, but I don't think this answer is right:
regression=# select to_tsquery('b <-> (a <-> c)');
 to_tsquery  
-------------
 'b' <-> 'c'
(1 row)
It should be 'b' <2> 'c', same as the other one.
I haven't worked this out in detail, but I think a similar solution
would work for clean_fakeval_intree: pass down a flag indicating if
we're within the left or right argument of a <-> op, and return the
appropriate adjustment distance based on that.
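To make the intended distance arithmetic concrete, here is a toy model. It is not clean_fakeval_intree and it handles phrase operators only. Instead of passing a flag down, this sketch has each call return two hypothetical adjustments, one for an operator on its left and one for an operator on its right, which is one possible way to get the same effect. The stopword set, the tuple representation (left, distance, right), and the names clean() and show() are assumptions made for illustration.

```python
# Toy model only, not PostgreSQL source.
STOPWORDS = {"a"}  # assumption: the real list depends on the configuration

def clean(node):
    """Return (tree, left_pad, right_pad).

    tree is None if every lexeme under node was a stopword. The pads are
    the extra distance an enclosing phrase operator must add on that side.
    """
    if isinstance(node, str):
        return (None, 0, 0) if node in STOPWORDS else (node, 0, 0)
    left, dist, right = node
    lt, ll, lr = clean(left)
    rt, rl, rr = clean(right)
    if lt is None and rt is None:
        width = ll + dist + rl          # whole subtree vanished
        return (None, width, width)
    if lt is None:                      # only the right side survives
        return (rt, ll + dist + rl, rr)
    if rt is None:                      # only the left side survives
        return (lt, ll, lr + dist + rl)
    return ((lt, lr + dist + rl, rt), ll, rr)

def show(tree):
    """Render a cleaned tree in tsquery-like text (no parentheses added)."""
    if isinstance(tree, str):
        return f"'{tree}'"
    left, dist, right = tree
    op = "<->" if dist == 1 else f"<{dist}>"
    return f"{show(left)} {op} {show(right)}"

print(show(clean((("b", 1, "a"), 1, "c"))[0]))   # 'b' <2> 'c'
print(show(clean(("b", 1, ("a", 1, "c")))[0]))   # 'b' <2> 'c'
```

Both associations come out with the same distance because the removed stopword's gap is charged to whichever side of the surviving operand it was on.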
regards, tom lane