From: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: proposal : cross-column stats |
Date: | 2010-12-13 19:22:38 |
Message-ID: | 4D06727E.7030009@fuzzy.cz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dne 13.12.2010 18:59, Joshua Tolley napsal(a):
> On Sun, Dec 12, 2010 at 07:10:44PM -0800, Nathan Boley wrote:
>> Another quick note: I think that storing the full contingency table is
>> wasteful since the marginals are already stored in the single column
>> statistics. Look at copulas [2] ( FWIW I think that Josh Tolley was
>> looking at this a couple years back ).
>
> Josh Tolley still looks at it occasionally, though time hasn't permitted any
> sort of significant work for quite some time. The multicolstat branch on my
> git.postgresql.org repository will create an empirical copula each
> multi-column index, and stick it in pg_statistic. It doesn't yet do anything
> useful with that information, nor am I convinced it's remotely bug-free. In a
> brief PGCon discussion with Tom a while back, it was suggested a good place
> for the planner to use these stats would be clausesel.c, which is responsible
> for handling code such as "...WHERE foo > 4 AND foo > 5".
Well, that's good news ;-)
I've done a bit of research today, and I've found some quite interesting
papers on this topic (probably, I did not have time to read them, in
most cases I've read just the title and abstract).
[1] Selectivity estimators for multidimensional range queries over real
attributes
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.914
This seems like a very good starting point. AFAIK it precisely
describes what data need to be collected, how to do the math etc.
[2] Selectivity Estimation Without the Attribute Value Independence
Assumption
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.8126
This obviously deals with the independence problem. Haven't
investigated it further, but it seems worth to read.
[3] On Analyzing the Cost of Queries with Multi-Attribute Restrictions
and Sort Operations (A Cost Function for Uniformly Partitioned
UB-Trees)
http://mistral.in.tum.de/results/publications/MB00_ideas.pdf
Describes something called UB-Tree, and shows how it may be used to
do estimates. Might be interesting as an alternative to the
traditional histograms.
There are more details about UB-Trees at
http://mistral.in.tum.de/results/publications/
[4] http://www.dbnet.ece.ntua.gr/~nikos/edith/qopt_bibl/
A rather nice collection of papers related to estimation (including
some of the papers listed above).
Hm, I planned to finally read the "Understanding MySQL Internals" over
the Xmas ... that obviously won't happen.
regards
Tomas
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2010-12-13 19:49:34 | Re: CommitFest wrap-up |
Previous Message | David Fetter | 2010-12-13 19:19:07 | Re: CommitFest wrap-up |