From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Google Summer of Code 2008 |
Date: | 2008-03-09 02:38:20 |
Message-ID: | Pine.LNX.4.64.0803090532050.10010@sn.sai.msu.ru |
Lists: | pgsql-hackers |
On Sat, 8 Mar 2008, Jan Urbański wrote:
>
>> Unfortunately, selectivity estimation for a query is much more difficult
>> than just estimating the frequency of an individual word.
>
> Sure, given something like 'cats & dogs'::tsquery the frequency of 'cat' and
> 'dog' won't suffice. But at least it's a starting point and if we estimate
> that 80% of the documents have 'dog' and 70% have 'cat' then we can tell for
> sure that at least 50% have both and that's a lot better than 0.1% that's
> being returned now.
Certainly yes, and given that the most popular queries are single-word
queries, this would be very helpful in most cases.
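
The arithmetic behind the 50% figure quoted above can be sketched as follows. This is only an illustration of the general bound (at least the sum of the frequencies minus n - 1, and at most the smallest frequency), not code from PostgreSQL; the function name and_selectivity_bounds is made up for this example.

```python
def and_selectivity_bounds(freqs):
    """Bounds on the fraction of documents matching ALL terms.

    freqs: per-term document frequencies, each in [0, 1].
    Hypothetical helper for illustration, not a PostgreSQL function.
    """
    if not freqs:
        raise ValueError("need at least one term frequency")
    # Lower bound: documents missing some term total at most sum(1 - f).
    lower = max(0.0, sum(freqs) - (len(freqs) - 1))
    # Upper bound: the conjunction cannot match more than its rarest term.
    upper = min(freqs)
    return lower, upper


# 'dog' in 80% of documents, 'cat' in 70%
lower, upper = and_selectivity_bounds([0.8, 0.7])
print(f"between {lower:.0%} and {upper:.0%} of documents match both")
```

For a single-word query the two bounds coincide with the word's own frequency, which is why per-word statistics already cover the most common case.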
The reason I thought about improving ts_stat() is that we could use its
statistics for the incomplete search feature people have requested, where
an AND query like ( a & b & c ) is rewritten into a set of AND|OR queries
depending on how often the terms occur.
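
One possible reading of such a rewrite, purely as a sketch: keep the rare terms mandatory and turn the common ones into an OR group. The function name relax_and_query, the 0.5 threshold, and the rare/common rule are all assumptions made for this example; nothing like this exists in tsearch, and the term frequencies would have to come from something like ts_stat().

```python
def relax_and_query(term_freqs, common_threshold=0.5):
    """Rewrite an AND query over the given terms into an AND|OR query string.

    term_freqs: dict mapping term -> document frequency in [0, 1].
    Assumed rule (not from tsearch): terms below common_threshold stay
    mandatory, the remaining terms are grouped with OR.
    """
    rare = [t for t, f in term_freqs.items() if f < common_threshold]
    common = [t for t, f in term_freqs.items() if f >= common_threshold]

    parts = list(rare)
    if len(common) == 1:
        # A single common term gains nothing from an OR group.
        parts.append(common[0])
    elif common:
        parts.append("(" + " | ".join(common) + ")")
    return " & ".join(parts)


# a is rare, b and c are common
print(relax_and_query({"a": 0.01, "b": 0.7, "c": 0.8}))  # a & (b | c)
```

Other rules are equally plausible, for example generating every "any two of three" combination; the point is only that the choice needs per-term frequencies.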
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83