Re: Typo Patch

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: CharSyam <charsyam(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Typo Patch
Date: 2016-07-05 19:18:46
Message-ID: CAKFQuwZpDuPPBLy4KEyEt0xx_X_3Qp+tbq2zrkf-dZ0mmYdBVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 5, 2016 at 11:54 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sat, Jul 2, 2016 at 12:17 PM, CharSyam <charsyam(at)gmail(dot)com> wrote:
> > I fixed typos. and attached patch for this.
> > Thanks.
> >
> > I only changed comments only in src/backend/utils/adt/tsvector_op.c
>
> Well, that's definitely a typo. However, typo or no typo, it's hard
> for me to figure out with any certainty what that sentence actually
> means.
>
​​

Given that "curitem->qoperator.​distance" is constant.

Also, both loops go from low to high position (over the same scale) with
the inner loop stopping as soon as it moves beyond the position of the
outer loop - namely that the left/inner item is always positioned to the
left of the right/outer operand within the document being search.

Regardless of whether there is a match at this meeting point increasing the
outer loop's position will not cause any of the previously checked (at the
lower positions) inner loop items to match where they did not before. I
say this but I'm concerned that for sufficiently large values of
curitem->qoperator.distance a given left operand could match multiple right
operands since the distance is a maximum extent.

Thus, in the case of a match the current Lpos needs to be checked again
during the subsequent iterator of the outer loop.

The "else if (Rpos - Lpos < distance) break" block though is oddly
commented/written since distance is a maximum - shouldn't this be treated
as a match as well?

Since, for sufficiently large values of "distance", a single left operand
could cause multiple right operands to be matched when the less-than
condition matches we need to leave Lpos alone and try the next Rpos against
it. For the greater than (default) and the equal cases we can skip
rechecking the current Lpos.

The comment in question can be removed - we're not actually resetting
anything here. The use of LposStart is not really needed - just make sure
to leave Lpos in the correct position (i.e. optional increment on all
paths) and the inner while loop will do the correct thing.

It seems a bit odd to be keying off of the RIGHT operand and returning its
position when left-to-right languages would consider the position of the
leading word to be reported.

Code Comment Suggestion:

For each physically positioned right-side operand iterate over each
instance of the left-side operand to locate one within the specified
distance. As soon as a match is found move onto the next right-operand and
continue searching starting with the last checked left-side operand. Note
that for an exact distance match the current left-side operand can be
skipped over.

For some graphical imagery consider how a Slinky operates. Hold the left
side, move the right side, release the left and let it collapse; repeat.
In this case, though, the collapsing can be stopped mid-stream since the
distance check has an inequality.

The following shouldn't be taken as an actual patch submission but rather a
codification of the above.

diff --git a/src/backend/utils/adt/tsvector_op.c
b/src/backend/utils/adt/tsvector_op.c
index 242b7e1..fefaca5 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1375,7 +1375,6 @@ TS_phrase_execute(QueryItem *curitem,
ExecPhraseData Ldata = {0, false, NULL},
Rdata = {0, false, NULL};
WordEntryPos *Lpos,
- *LposStart,
*Rpos,
*pos_iter = NULL;

@@ -1423,15 +1422,17 @@ TS_phrase_execute(QueryItem *curitem,
* ExecPhraseData->data can point to the tsvector's WordEntryPosVector
*/

+ /*
+ * For each physically positioned right-side operand iterate over each
+ * instance of the left-side operand to locate one within the specified
+ * distance. As soon as a match is found move onto the next right-operand
+ * and continue searching starting with the last checked left-side
operand.
+ * Note that for an exact distance match the current left-side operand
+ * can be skipped over.
+ */
Rpos = Rdata.pos;
- LposStart = Ldata.pos;
while (Rpos < Rdata.pos + Rdata.npos)
{
- /*
- * We need to check all possible distances, so reset Lpos
- * to guranteed not yet satisfied position.
- */
- Lpos = LposStart;
while (Lpos < Ldata.pos + Ldata.npos)
{
if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) ==
@@ -1449,7 +1450,7 @@ TS_phrase_execute(QueryItem *curitem,
* could not satisfy distance for any other right
* position
*/
- LposStart = Lpos + 1;
+ Lpos++
break;
}
else
@@ -1462,16 +1463,26 @@ TS_phrase_execute(QueryItem *curitem,
}

}
- else if (WEP_GETPOS(*Rpos) <= WEP_GETPOS(*Lpos) ||
- WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
+ else if (WEP_GETPOS(*Rpos) <= WEP_GETPOS(*Lpos)
+ {
+ /*
+ * No Increment - we are beyond the current right operand
+ * so its possible this left operand could match the
next right
+ * operand.
+ */
+ break;
+ }
+ else if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
curitem->qoperator.distance)
{
+ /* THIS SHOULD BE A MATCH? */
/*
- * Go to the next Rpos, because Lpos is ahead or on less
- * distance than required by current operator
+ * No Increment - since we are within the distance specified
+ * it is possible that adding the additional distance to the next
+ * right operand will still leave us within range of this left operand
+ * and so we need to check it again.
*/
break;
-
}

Lpos++;

David J.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-07-05 19:22:33 Re: Typo Patch
Previous Message Robert Haas 2016-07-05 19:09:42 Re: can we optimize STACK_DEPTH_SLOP