
Re: new function for tsquery creartion

From:	Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>
To:	Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
Cc:	Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: new function for tsquery creartion
Date:	2018-04-04 14:33:47
Message-ID:	a269c5f0e426174dad29c474d3669e92@postgrespro.ru
Lists:	pgsql-hackers

> I'm not sure about the different results for these queries:
> SELECT websearch_to_tsquery('simple', 'cat or ');
> websearch_to_tsquery
> ----------------------
> 'cat'
> (1 row)
> SELECT websearch_to_tsquery('simple', 'cat or');
> websearch_to_tsquery
> ----------------------
> 'cat' & 'or'
> (1 row)

I guess both queries should produce just 'cat'. I've changed the
definition of parse_or_operator().

> I found odd behavior in the query creation function in this case:
> SELECT websearch_to_tsquery('english', '"pg_class pg"');
> websearch_to_tsquery
> -----------------------------
> ( 'pg' & 'class' ) <-> 'pg'
> (1 row)
>
> This query means that the lexemes 'pg' and 'class' should be at the same
> distance from the last 'pg'. In other words, they should have the same
> position. But the default parser will interpret pg_class as two separate
> words with consecutive positions during text parsing/tsvector creation.
>
> The bug wasn't introduced by the patch and can also be reproduced on current
> master. While discussing the patch, Dmitry noticed that the
> to_tsquery() function shares the same odd behavior:
> select to_tsquery('english', ' pg_class <-> pg');
> to_tsquery
> -----------------------------
> ( 'pg' & 'class' ) <-> 'pg'
> (1 row)

I've been thinking about this for a while, and it seems that this should
be fixed somewhere near parsetext(). Perhaps 'pg' and 'class' should
share the same position. After all, somebody could implement a parser
which would split some words using an arbitrary set of rules, for
instance "split all words containing digits". I propose merging this
patch provided that there are no objections.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
websearch_to_tsquery_v8.patch	text/x-diff	43.3 KB

In response to

Re: new function for tsquery creartion at 2018-04-03 14:13:20 from Aleksandr Parfenov

Responses

Re: new function for tsquery creartion at 2018-04-04 14:49:08 from Aleksandr Parfenov
Re: new function for tsquery creartion at 2018-04-05 16:56:07 from Teodor Sigaev
