From: | Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
---|---|
To: | Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [PROPOSAL] Text search configuration extension |
Date: | 2017-08-21 12:59:29 |
Message-ID: | 20170821125929.GA766@zakirov.localdomain |
Lists: | pgsql-hackers |
Hello,
On Fri, Aug 18, 2017 at 03:30:38PM +0300, Aleksandr Parfenov wrote:
> Hello hackers!
>
> I'm working on a new approach in text search configuration and want to
> share my thought with community in order to get some feedback and maybe
> some new ideas.
>
There are several cases where the new syntax could be useful:
https://www.postgresql.org/message-id/4733B65A.9030707@students.mimuw.edu.pl
First check whether a lexeme is a stop word, and only then normalize it.
https://www.postgresql.org/message-id/c6851b7e-da25-3d8e-a5df-022c395a11b4%40postgrespro.ru
Support the union of the outputs of several dictionaries.
https://www.postgresql.org/message-id/46D57E6F.8020009%40enterprisedb.com
Support chaining dictionaries with the MAP BY operator.
The basic idea of the approach is to give the user more control over text search configurations without writing new dictionaries or modifying existing ones.
> ALTER TEXT SEARCH CONFIGURATION en_de_search ADD MAPPING FOR asciiword,
> word WITH
> CASE
> WHEN english_hunspell IS NOT NULL THEN english_hunspell
> WHEN german_hunspell IS NOT NULL THEN german_hunspell
> ELSE
> -- stem dictionaries can't be used for language detection
> english_stem UNION german_stem
> END;
For example, the configuration above produces the following result:
=# select d @@ q, d, q from to_tsvector('german_hunspell', 'Dieser Hund wollte ihn jedoch nicht nach Hause begleiten') d, to_tsquery('en_de_search', 'hause') q;
 ?column? |                      d                       |    q
----------+----------------------------------------------+----------
 t        | 'begleiten':9 'hausen':8 'hund':2 'jedoch':5 | 'hausen'
(1 row)
This configuration is useful when the language of a query is unknown.
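To make the fallback logic of the CASE expression concrete, here is a toy Python model of it. This is only a sketch under stated assumptions: the word lists, the suffix-stripping "stemmers" and the function names are all hypothetical stand-ins, not the real hunspell or snowball dictionaries, and the model assumes a dictionary returns NULL (None here) for a token it does not recognise.

```python
# Toy model of the CASE ... UNION mapping quoted above.
# Everything here is hypothetical: word lists and stemming rules are made up.

def make_dict(entries):
    """Build a lookup returning a list of lexemes, or None for unknown tokens."""
    def lookup(token):
        return entries.get(token.lower())
    return lookup

# Stand-ins for hunspell dictionaries: they recognise only known words.
english_hunspell = make_dict({"houses": ["house"], "dog": ["dog"]})
german_hunspell = make_dict({"hause": ["hausen"], "hund": ["hund"]})

# Stand-ins for stemmers: they accept any token, so they never return None.
def english_stem(token):
    t = token.lower()
    return [t[:-1]] if t.endswith("s") else [t]

def german_stem(token):
    t = token.lower()
    return [t[:-2]] if t.endswith("en") else [t]

def en_de_search(token):
    """Mimic the CASE expression: the first dictionary that knows the token wins."""
    for dictionary in (english_hunspell, german_hunspell):
        lexemes = dictionary(token)
        if lexemes is not None:
            return lexemes
    # ELSE branch: union of both stemmers' outputs, duplicates removed, order kept.
    merged = english_stem(token) + german_stem(token)
    return list(dict.fromkeys(merged))

print(en_de_search("hause"))   # recognised by the German stand-in
print(en_de_search("Katzen"))  # unknown to both, falls through to the union
```

In this model the stemmers sit only in the ELSE branch because they accept every token, which matches the comment in the quoted configuration that stem dictionaries cannot be used for language detection.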
Best regards,
--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company