From: | Emre Hasegeli <emre(at)hasegeli(dot)com> |
---|---|
To: | Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
Subject: | Re: Flexible configuration for full-text search |
Date: | 2017-10-31 08:47:57 |
Message-ID: | CAE2gYzyHtn6OF5LnKptRRodWLkOvsepnN9YUgmLRpMTVuw0mzA@mail.gmail.com |
Lists: | pgsql-hackers |
> I'm mostly happy with mentioned modifications, but I have few questions
> to clarify some points. I will send new patch in week or two.
I am glad you liked it. Though, I think we should get approval from
more senior community members or committers about the syntax before
we put more effort into the code.
> But configuration:
>
> CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END
>
> is not (as I understand ELSE can be used only with KEEP).
>
> I think we should decide to allow or disallow usage of different
> dictionaries for match checking (between CASE and WHEN) and a result
> (after THEN). If answer is 'allow', maybe we should allow the
> third example too for consistency in configurations.
I think you are right. We had better allow this too. Then the CASE syntax becomes:
    CASE config
      WHEN [ NO ] MATCH THEN { KEEP | config }
      [ ELSE config ]
    END
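
To make the intended semantics concrete, here is a rough Python sketch of
how such a CASE could be evaluated. It is only a toy model, not the patch:
a "config" is assumed to be a callable that returns a list of lexemes, or
None when the token is not recognized. The names case, dictionary and KEEP,
and the toy dictionary contents, are made up for illustration.

```python
from typing import Callable, List, Optional

# Assumption: a "config" maps a token to a list of lexemes,
# or to None when the token is not recognized (no match).
Config = Callable[[str], Optional[List[str]]]

KEEP = object()  # sentinel standing in for the KEEP keyword


def case(cond: Config, then, else_: Optional[Config] = None,
         no_match: bool = False) -> Config:
    """Model of: CASE cond WHEN [NO] MATCH THEN {KEEP | then} [ELSE else_] END."""
    def run(token: str) -> Optional[List[str]]:
        out = cond(token)
        matched = out is not None
        if matched != no_match:
            # The WHEN branch is taken: KEEP reuses the output of cond,
            # otherwise the THEN config processes the token itself.
            return out if then is KEEP else then(token)
        # The WHEN branch is not taken: fall back to ELSE if there is one.
        return else_(token) if else_ is not None else None
    return run


def dictionary(mapping: dict) -> Config:
    """Toy dictionary: returns None for tokens it does not know."""
    return lambda token: mapping.get(token)


# Hypothetical dictionaries named after the ones in the example.
english_noun = dictionary({"cats": ["cat"]})
english_hunspell = dictionary({"cats": ["cat"], "running": ["run"]})
simple = lambda token: [token.lower()]

# CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END
cfg = case(english_noun, english_hunspell, else_=simple)
```

With this model, cfg("cats") goes through english_hunspell because
english_noun matched, while cfg("Running") falls through to simple.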
> Based on formal definition it is possible to describe this example in
> following manner:
> CASE english_noun WHEN MATCH THEN english_hunspell END
>
> The question is same as in the previous example.
I couldn't understand the question.
> Currently, stopwords increment position, for example:
> SELECT to_tsvector('english','a test message');
> ---------------------
> 'messag':3 'test':2
>
> A stopword 'a' has a position 1 but it is not in the vector.
Does this problem apply only to stopwords and the whole thing we are
inventing? Shouldn't we preserve the positions through the pipeline?
> If we want to save this behavior, we should somehow pass a stopword to
> tsvector composition function (parsetext in ts_parse.c) for counter
> increment or increment it in another way. Currently, an empty lexemes
> array is passed as a result of LexizeExec.
>
> One of possible way to do so is something like:
> CASE polish_stopword
> WHEN MATCH THEN KEEP -- stopword counting
> ELSE polish_isspell
> END
This would mean keeping the stopwords. What we want is
    CASE polish_stopword -- stopword counting
      WHEN NO MATCH THEN polish_isspell
    END
Do you think it is possible?
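
As a rough illustration of the behavior I mean, here is a toy Python model
in which every token consumes a position, and a stopword match simply emits
no lexeme. It is not how parsetext works internally; build_vector and the
toy stem table are made up for the example.

```python
from typing import Callable, Dict, Iterable, List, Set


def build_vector(tokens: Iterable[str], stopwords: Set[str],
                 lexize: Callable[[str], List[str]]) -> Dict[str, List[int]]:
    """Toy tsvector builder: positions advance for every token."""
    vector: Dict[str, List[int]] = {}
    for position, token in enumerate(tokens, start=1):
        if token in stopwords:
            # The stopword is dropped, but its position is already consumed.
            continue
        for lexeme in lexize(token):
            vector.setdefault(lexeme, []).append(position)
    return vector


# Hypothetical stem table standing in for a real stemmer or dictionary.
toy_stems = {"test": ["test"], "message": ["messag"]}

result = build_vector("a test message".split(), {"a"},
                      lambda t: toy_stems.get(t, [t]))
print(result)  # {'test': [2], 'messag': [3]}
```

This gives the same positions as the to_tsvector example quoted above: 'a'
takes position 1 without appearing in the result.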