| From: | Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl> | 
|---|---|
| To: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> | 
| Cc: | Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: gsoc, oprrest function for text search | 
| Date: | 2008-07-29 07:27:11 | 
| Message-ID: | 488EC64F.20701@students.mimuw.edu.pl | 
| Lists: | pgsql-hackers | 
Heikki Linnakangas wrote:
> Jan Urbański wrote:
>> Here's a WIP patch implementing an oprrest function for tsvector @@ 
>> tsquery and tsquery @@ tsvector.
>>
>> The idea is (quoting a comment)
>> /*
>>  *  Traverse the tsquery preorder, calculating selectivity as:
>>  *
>>  *   selec(left_oper) * selec(right_oper) in AND nodes,
>>  *
>>  *   selec(left_oper) + selec(right_oper) -
>>  *      selec(left_oper) * selec(right_oper) in OR nodes,
>>  *
>>  *   1 - selec(oper) in NOT nodes
>>  *
>>  *   freq[val] in VAL nodes, if the value is in MCELEM
>>  *   min(freq[MCELEM]) / 2 in VAL nodes, if it is not
> 
> Seems reasonable.
> 
>>  *
>>  * Implementation-wise, we sort the MCELEM array to use binary
>>  * search on it.
>>  */
> 
> Would it be possible to store the array in sorted order, to avoid 
> sorting it on every invocation of tssel?
It's being stored sorted on frequencies, like so:
[('dog', 0.9), ('cat', 0.8), ('sheep', 0.7)]
and I need it sorted on elements for bsearch().
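
To illustrate the re-sort and lookup, here is a minimal sketch. The `McelemEntry` struct and both function names are made up for this example and are not taken from the patch or from the actual PostgreSQL statistics layout.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one MCELEM entry (element plus its frequency). */
typedef struct
{
    const char *element;
    double      frequency;
} McelemEntry;

/* Order entries by element text, which is the order bsearch() needs. */
static int
compare_by_element(const void *a, const void *b)
{
    const McelemEntry *ea = (const McelemEntry *) a;
    const McelemEntry *eb = (const McelemEntry *) b;

    return strcmp(ea->element, eb->element);
}

/*
 * Return the frequency of "element", or -1.0 if it is not present.
 * "entries" must already be sorted with compare_by_element().
 */
static double
lookup_frequency(const McelemEntry *entries, size_t n, const char *element)
{
    McelemEntry key = {element, 0.0};
    const McelemEntry *hit = bsearch(&key, entries, n,
                                     sizeof(McelemEntry), compare_by_element);

    return hit ? hit->frequency : -1.0;
}
```

Calling `qsort(entries, n, sizeof(McelemEntry), compare_by_element)` on the example above reorders it to cat, dog, sheep, after which `lookup_frequency()` can be used for each value in the query.
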
I don't know if it's OK to break the rule that statistical data is 
stored sorted on frequencies. If so, then ts_typanalyze() would have to 
change and do one more qsort() before storing the result.
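
For reference, a rough sketch of the traversal from the quoted comment, assuming the array arrives already sorted on elements. All type and function names here (`McelemEntry`, `QueryNode`, `query_selectivity`, and so on) are hypothetical and do not come from the patch. One consequence of element order is that the minimum frequency is no longer the last entry, so this sketch finds it with a linear scan.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical MCELEM entry; the array is assumed sorted by element. */
typedef struct
{
    const char *element;
    double      frequency;
} McelemEntry;

typedef enum { NODE_VAL, NODE_AND, NODE_OR, NODE_NOT } NodeKind;

/* Hypothetical, simplified query tree node (not the real tsquery format). */
typedef struct QueryNode
{
    NodeKind    kind;
    const char *value;              /* NODE_VAL only */
    const struct QueryNode *left;   /* NOT operand, or left operand of AND/OR */
    const struct QueryNode *right;  /* right operand of AND/OR */
} QueryNode;

static int
compare_by_element(const void *a, const void *b)
{
    return strcmp(((const McelemEntry *) a)->element,
                  ((const McelemEntry *) b)->element);
}

/* Smallest stored frequency; requires n >= 1. */
static double
min_frequency(const McelemEntry *entries, size_t n)
{
    double min = entries[0].frequency;

    for (size_t i = 1; i < n; i++)
        if (entries[i].frequency < min)
            min = entries[i].frequency;
    return min;
}

/* Preorder traversal applying the rules from the quoted comment. */
static double
query_selectivity(const QueryNode *node, const McelemEntry *entries,
                  size_t n, double min_freq)
{
    double s1, s2;

    switch (node->kind)
    {
        case NODE_VAL:
        {
            McelemEntry key = {node->value, 0.0};
            const McelemEntry *hit = bsearch(&key, entries, n,
                                             sizeof(McelemEntry),
                                             compare_by_element);

            /* Values not in MCELEM get half the minimum frequency. */
            return hit ? hit->frequency : min_freq / 2.0;
        }
        case NODE_AND:
            s1 = query_selectivity(node->left, entries, n, min_freq);
            s2 = query_selectivity(node->right, entries, n, min_freq);
            return s1 * s2;
        case NODE_OR:
            s1 = query_selectivity(node->left, entries, n, min_freq);
            s2 = query_selectivity(node->right, entries, n, min_freq);
            return s1 + s2 - s1 * s2;
        case NODE_NOT:
            return 1.0 - query_selectivity(node->left, entries, n, min_freq);
    }
    return 0.0;                     /* not reached for valid node kinds */
}
```

With the frequencies from the example above (cat 0.8, dog 0.9, sheep 0.7), a query like dog & !cow would come out as 0.9 * (1 - 0.7 / 2) = 0.585, since cow is not in the array.
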
-- 
Jan Urbanski
GPG key ID: E583D7D2
ouden estin