From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Updated tsearch documentation |
Date: | 2007-06-20 20:24:11 |
Message-ID: | 200706202024.l5KKOBK11446@momjian.us |
Lists: | pgsql-advocacy pgsql-hackers |
Oleg Bartunov wrote:
> On Sun, 17 Jun 2007, Bruce Momjian wrote:
>
> > I have completed my first pass over the tsearch documentation:
> >
> > http://momjian.us/expire/fulltext/HTML/sql.html
> >
> > They are from section 14 and following.
> >
> > I have come up with a number of questions that I placed in SGML comments
> > in these files:
> >
> > http://momjian.us/expire/fulltext/SGML/
> >
> > Teodor/Oleg, let me know when you want to go over my questions.
>
> Below are my answers (marked as )
OK.
>
> Comments to editorial work of Bruce Momjian.
>
> fulltext-intro.sgml:
>
> it is useful to have a predefined list of lexemes.
>
> Bruce, a list of lexeme types should go here!
Agreed. Is the list of lexemes parser-specific?
> </para></listitem>
>
> <!--
> SEEMS UNNECESSARY
> It is useless to attempt to normalize <type>email address</type> using
> a morphological dictionary of the Russian language, but it looks reasonable to pick
> out <type>domain name</type> and be able to search for <type>domain
> name</type>.
> -->
>
> I don't understand where you got this para :)
Uh, it was in the SGML. I have removed it.
> fulltext-opfunc.sgml:
>
> All of the following functions that accept a configuration argument can
> use either an integer <!-- why an integer --> or a textual configuration
> name to select a configuration.
>
> Originally it was an integer id; it is probably better to use <type>oid</type>
Uh, my question is why are you allowing specification as an integer/oid
when the name works just fine. I don't see the value in allowing
numbers here.
> This returns the query used for searching an index. It can be used to test
> for an empty query. The <command>SELECT</> below returns <literal>'T'</>,
> <!-- lowercase? --> which corresponds to an empty query since GIN indexes
> do not support negated queries (a full index scan is inefficient):
>
> > capital case. This looks cumbersome; probably querytree() should
> > just return NULL.
Agreed.
> The integer option controls several behaviors, which are selected using bit-wise
> fields and <literal>|</literal> (for example, <literal>2|4</literal>):
> <!-- why so complex? -->
>
> > to avoid 2 arguments
But I don't see why you would want to set two of those values --- they
seem mutually exclusive, e.g.
1 divides the rank by the 1 + logarithm of the document length
2 divides the rank by the length itself
I assume you do either one, not both.
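To make the bit-field question concrete, here is a minimal Python sketch (not the tsearch C code) that models the normalization integer as a bitmask. It covers only the two flags quoted above, 1 and 2, and assumes, hypothetically, that when both bits are set the divisions are applied one after the other rather than one being chosen. The names `NORM_LOG_LENGTH`, `NORM_LENGTH` and `normalize_rank` are illustrative, and the natural logarithm is an assumption.

```python
import math

# Hypothetical names for the two normalization bits quoted above.
NORM_LOG_LENGTH = 1  # divide rank by 1 + log(document length)
NORM_LENGTH = 2      # divide rank by document length


def normalize_rank(rank: float, doc_length: int, flags: int) -> float:
    """Apply each normalization whose bit is set in `flags`.

    Assumption: when several bits are set, the divisions are applied
    in turn, so 1|2 divides by both factors instead of picking one.
    The natural logarithm is used here; the real base may differ.
    """
    if doc_length <= 0:
        return rank  # nothing sensible to divide by
    if flags & NORM_LOG_LENGTH:
        rank /= 1 + math.log(doc_length)
    if flags & NORM_LENGTH:
        rank /= doc_length
    return rank


if __name__ == "__main__":
    both = NORM_LOG_LENGTH | NORM_LENGTH  # one int carries two options
    print(both)
    print(normalize_rank(1.0, 100, NORM_LENGTH))
    print(normalize_rank(1.0, 100, both))
```

Under this model a caller who wants only one behavior simply passes 1 or 2; the bitmask exists to keep the signature to a single argument, which matches Oleg's "to avoid 2 arguments", while still permitting a combination if one is ever meaningful.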
> its <replaceable>id</replaceable> or <replaceable>ts_name</replaceable>; <!-- n
> if none is specified, then the current configuration is used.
>
> > I don't understand this question
Same issue as above --- why allow a number here when the name works just
fine. We don't allow tables to be specified by number, so why
configurations?
> <para>
> <!-- why? -->
> Note that the cascade dropping of the <function>headline</function> function
> causes dropping of the <literal>parser</literal> used in fulltext configuration
> <replaceable>tsname</replaceable>.
> </para>
>
> > hmm, probably it should be reversed - cascade dropping of the parser causes
> > dropping of the headline function.
Agreed.
>
> In example below, <literal>fulltext_idx</literal> is
> a GIN index:<!-- why isn't this automatic -->
>
> > It's explained above. The problem is that the current index API doesn't let
> > the index report whether a search was lossy or exact, so to preserve the
> > performance of the GIN index we had to introduce the @@@ operator, which is
> > the same as @@, but lossy.
Well, then we have to fix the API. Telling users to use a different
operator based on what index is defined is just bad style.
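The API fix suggested here can be sketched roughly as follows: instead of a separate operator, the index scan reports, for each candidate row, whether the match is exact or needs a recheck, and the executor re-applies the real @@ test only to the flagged rows. This is a minimal Python model under that assumption; `IndexHit`, `execute_match` and the sample rows are hypothetical and are not part of the PostgreSQL index API as it stood at the time.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class IndexHit:
    row_id: int
    recheck: bool  # True if the index can only say "maybe matches"


def execute_match(hits: Iterable[IndexHit],
                  fetch_row: Callable[[int], str],
                  matches: Callable[[str], bool]) -> List[int]:
    """Return the ids of rows that satisfy the query.

    Exact hits are accepted as-is; lossy hits are re-evaluated with the
    real predicate, so a single operator works for every index type.
    """
    result = []
    for hit in hits:
        if not hit.recheck or matches(fetch_row(hit.row_id)):
            result.append(hit.row_id)
    return result


if __name__ == "__main__":
    rows = {1: "fat cat", 2: "fat rat", 3: "thin cat"}
    # Hypothetical index output: row 1 exact, rows 2 and 3 lossy.
    hits = [IndexHit(1, False), IndexHit(2, True), IndexHit(3, True)]
    print(execute_match(hits, rows.__getitem__,
                        lambda text: "cat" in text.split()))
```

The design point is that exactness becomes a per-row property reported by the index, so users never have to pick an operator based on which index happens to exist.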
> nly the <token>lword</token> lexeme, then a <acronym>TZ</acronym>
> definition like ' one 1:11' will not work since lexeme type
> <token>digit</token> is not assigned to the <acronym>TZ</acronym>.
> <!-- what do these numbers mean? -->
> </para>
OK, I changed it to be clearer.
> > nothing special, just numbers for example.
>
> <function>ts_debug</> displays information about every token of
> <replaceable class="PARAMETER">document</replaceable> as produced by the
> parser and processed by the configured dictionaries using the configuration
> specified by <replaceable class="PARAMETER">cfgname</replaceable> or
> <replaceable class="PARAMETER">oid</replaceable>. <!-- no need for oid
>
> > I don't understand this comment. ts_debug accepts cfgname or its oid
Again, no need for oid.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +