From: | Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: new function for tsquery creartion |
Date: | 2018-03-27 22:29:48 |
Message-ID: | 9153fb21831d0ba7c832960f37df2688@postgrespro.ru |
Lists: | pgsql-hackers |
Hi everyone,
I'd like to share some intermediate results. Here's what has changed:
1. The OR operator is now case-insensitive. Moreover, it no longer has to be
followed by whitespace to be recognized:
select websearch_to_tsquery('simple', 'abc or');
websearch_to_tsquery
----------------------
'abc' & 'or'
(1 row)
select websearch_to_tsquery('simple', 'abc or(def)');
websearch_to_tsquery
----------------------
'abc' | 'def'
(1 row)
select websearch_to_tsquery('simple', 'abc or!def');
websearch_to_tsquery
----------------------
'abc' | 'def'
(1 row)
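The matching rule implied by the examples above can be sketched as a standalone C check. This is a minimal illustration, not the code from websearch_to_tsquery_v1.diff. The function name is_or_operator() is hypothetical. The sketch assumes the caller invokes it only at the start of a token, and that an OR needs some non-space character after it to act as its right operand, which is why a trailing 'or' stays a lexeme.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical check (not the patch's code): does `p` point at an OR
 * operator?  "or" matches case-insensitively, must not continue as part
 * of a longer word ("order"), and must be followed by some non-space
 * character that can serve as its right operand.
 */
static bool
is_or_operator(const char *p)
{
    /* short-circuit keeps us from reading past a string shorter than 2 */
    if (tolower((unsigned char) p[0]) != 'o' ||
        tolower((unsigned char) p[1]) != 'r')
        return false;
    p += 2;

    /* a letter, digit or '_' right after "or" means it is part of a word */
    if (isalnum((unsigned char) *p) || *p == '_')
        return false;

    /* skip whitespace; if nothing follows, there is no right operand */
    while (isspace((unsigned char) *p))
        p++;
    return *p != '\0';
}
```

Under this rule "or(def)" and "OR!def" are treated as operators, while "order" and a trailing "or" are not, matching the three queries shown above.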
2. AROUND(N) has been dropped. I hope that the <N, M> operator will allow us
to implement it in a few lines of code.
3. websearch_to_tsquery() now tolerates various syntax errors, for
instance:
Misused operators:
'abc &'
'| abc'
'<- def'
Missing parentheses:
'abc & (def <-> (cat or rat'
Other sorts of nonsense:
'abc &--|| def' => 'abc' & !!'def'
'abc:def' => 'abc':D & 'ef'
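One plausible way to handle the missing-parentheses case is to close every group that is still open at the end of input and to skip any ')' that has no matching '('. The C sketch below does this at the string level purely for illustration. balance_parens() is a hypothetical helper, and the patch may well handle this inside its parser's state machine instead.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): copy `in` into `out`,
 * dropping any ')' that has no matching '(' and appending one ')'
 * for every '(' still open at the end of input.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int
balance_parens(const char *in, char *out, size_t outsz)
{
    size_t      len = strlen(in);
    size_t      n = 0;
    int         depth = 0;

    if (outsz == 0)
        return -1;

    for (size_t i = 0; i < len; i++)
    {
        if (in[i] == ')')
        {
            if (depth == 0)
                continue;       /* stray ')': skip it */
            depth--;
        }
        else if (in[i] == '(')
            depth++;

        if (n + 1 >= outsz)     /* keep room for the terminator */
            return -1;
        out[n++] = in[i];
    }

    /* implicitly close every group the user left open */
    while (depth-- > 0)
    {
        if (n + 1 >= outsz)
            return -1;
        out[n++] = ')';
    }
    out[n] = '\0';
    return 0;
}
```

Applied to the example above, 'abc & (def <-> (cat or rat' becomes 'abc & (def <-> (cat or rat))'.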
This, however, doesn't mean that the result will always be adequate (who
would have thought?). Overall, the current implementation follows the GIGO
(garbage in, garbage out) principle. In theory, this would allow us to
accept user-supplied websearch strings as-is (but see the gotchas below),
even if they don't make much sense. Better than nothing, right?
4. A small refactoring: I've replaced all WAIT* macros with an enum for
better debugging (the names look much nicer in GDB). I hope this is
acceptable.
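The kind of change described in item 4 can be sketched as follows. The state names are assumptions modelled on the WAIT* prefix, not copied from the patch, and next_state() is a hypothetical transition added only to make the snippet do something. The benefit is that GDB prints an enum-typed variable by its enumerator name (e.g. WAITOPERATOR) instead of a bare integer.

```c
/*
 * Before (illustrative): plain macros; a variable holding one of these
 * is just an int, so GDB shows it as a number.
 *   #define WAITOPERAND       1
 *   #define WAITOPERATOR      2
 *   #define WAITFIRSTOPERAND  3
 */

/* After: the same values as an enum; GDB prints the symbolic name. */
typedef enum
{
    WAITOPERAND = 1,
    WAITOPERATOR = 2,
    WAITFIRSTOPERAND = 3
} ParserState;

/* Hypothetical transition: after an operand expect an operator, and vice versa. */
static ParserState
next_state(ParserState state)
{
    return state == WAITOPERATOR ? WAITOPERAND : WAITOPERATOR;
}
```

Keeping the original numeric values means any code that compares or stores the states keeps working unchanged.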
5. Finally, I've added a few more comments and tests. I haven't checked
the code coverage, though.
A few gotchas:
I haven't touched gettoken_tsvector() yet. As a result, the following
queries produce errors:
select websearch_to_tsquery('simple', '''');
ERROR: syntax error in tsquery: "'"
select websearch_to_tsquery('simple', '\');
ERROR: there is no escaped character: "\"
Maybe there are more. The question is: should we fix those, or is it fine
as it is? I don't have a strong opinion about this.
--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
websearch_to_tsquery_v1.diff | text/x-diff | 28.1 KB |