From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Matteo Beccati <php(at)beccati(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | Re: Enhanced containment selectivity function |
Date: | 2005-08-04 15:49:55 |
Message-ID: | 16193.1123170595@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-patches |
Matteo Beccati <php(at)beccati(dot)com> writes:
> Someone on IRC (AndrewSN if I'm not wrong) pointed out that the
> restriction selectivity function for <@ is contsel, which returns a
> constant value of 0.001. So I started digging in the source code trying
> to understand how the default behaviour could be enhanced, and ended up
> writing a little patch which adds an alternative containment selectivity
> function (called "contstatsel") which is able to deliver better results.
After looking at this a little, it doesn't seem like it has much to do
with the ordinary 2-D notion of containment. In most of the core
geometric types, the "histogram" ordering is based on area, and so
testing the histogram samples against the query doesn't seem like it's
able to give very meaningful containment results --- the items shown
in the histogram could have any locations whatever.
The approach might be sensible for ltree's isparent operator --- I don't
have a very good feeling for the behavior of that operator, but it looks
like it has at least some relationship to the ordering induced by the
ltree < operator.
So my thought is that (assuming Oleg and Teodor agree this is sensible
for ltree) we should put the selectivity function into contrib/ltree,
not directly into the core. It might be best to call it something like
"parentsel", too, to avoid giving the impression that it has something
to do with 2-D containment.
Also, you should think about using the most-common-values list as well
as the histogram. I would guess that many ltree applications would have
enough duplicate entries that the MCV list represents a significant
fraction of the total population. Keep in mind when thinking about this
that the histogram describes the population of data *exclusive of the
MCV entries*.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Woodward | 2005-08-04 15:52:35 | Re: pg_dump -- data and schema only? |
Previous Message | Tino Wildenhain | 2005-08-04 15:38:47 | Re: pg_dump -- data and schema only? |
From | Date | Subject | |
---|---|---|---|
Next Message | Matteo Beccati | 2005-08-04 17:09:30 | Re: Enhanced containment selectivity function |
Previous Message | Tom Lane | 2005-08-04 15:12:58 | Re: Enhanced containment selectivity function |