From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at> |
Subject: | Re: WIP: cross column correlation ... |
Date: | 2011-02-26 18:44:52 |
Message-ID: | AANLkTinD_vPt_d5tzGKXfJURYzq5=5mCa8K8_G1Sr8+O@mail.gmail.com |
Lists: | pgsql-hackers |
2011/2/26 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
> what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on.
>
The problem is that we haven't figured out how to usefully store a
histogram for <a,b>. Consider the oft-quoted example of a
<city,postal-code> pair -- or <city,zip code> for Americans. Because
the tuples sort by city first, a histogram of the tuple is just the
same as a histogram on the city. It doesn't tell you how much extra
selectivity the postal code or zip code gives you. And if you happen
to store a histogram of <postal code, city> by mistake, then it
doesn't tell you anything at all.
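
To make the point above concrete, here is a minimal sketch in Python, not PostgreSQL code. It assumes a simple equi-depth histogram and made-up toy data; the helper name `equi_depth_bounds` is hypothetical. It shows that the boundaries of a histogram on the sorted <city, postal code> tuples carry the same city information as a histogram on city alone, and that reversing the column order just gives the postal-code histogram.

```python
def equi_depth_bounds(values, buckets):
    """Return buckets + 1 boundary values of an equi-depth histogram."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[(i * last) // buckets] for i in range(buckets + 1)]


# Toy data (made up): each city has 40 distinct postal codes, one row each.
rows = (
    [("Boston", f"021{i:02d}") for i in range(40)]
    + [("Chicago", f"606{i:02d}") for i in range(40)]
    + [("New York", f"100{i:02d}") for i in range(40)]
)

tuple_bounds = equi_depth_bounds(rows, 4)
city_bounds = equi_depth_bounds([city for city, _ in rows], 4)

# Tuples sort by city first, so the city part of each tuple boundary
# is exactly the boundary of a histogram on city alone.
assert [city for city, _ in tuple_bounds] == city_bounds

# Reversed column order: the boundaries follow the postal code alone.
swapped = [(code, city) for city, code in rows]
swapped_bounds = equi_depth_bounds(swapped, 4)
code_bounds = equi_depth_bounds([code for _, code in rows], 4)
assert [code for code, _ in swapped_bounds] == code_bounds
```

In both orderings the second column only breaks ties inside a bucket, so the boundaries say nothing about how the two columns relate.
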

We need a data structure that lets us answer the Bayesian question
"given a city of New York, how selective is zip-code = 02139?" I don't
know what that data structure would be.
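
As an illustration of the quantity being asked for, the sketch below computes it by brute force from raw rows. It is not a proposal for the missing data structure: it scans every row, which is exactly what compact statistics are meant to avoid. The function name `estimates` and the toy rows are assumptions made for this example.

```python
def estimates(rows, city, zip_code):
    """Return (independence estimate, P(zip | city), true joint selectivity)."""
    n = len(rows)
    city_count = sum(1 for c, _ in rows if c == city)
    zip_count = sum(1 for _, z in rows if z == zip_code)
    both = sum(1 for c, z in rows if c == city and z == zip_code)

    # What per-column statistics give when columns are assumed independent.
    independent = (city_count / n) * (zip_count / n)
    # The conditional question: given the city, how selective is the zip?
    conditional = both / city_count if city_count else 0.0
    # Fraction of all rows that match both predicates.
    actual = both / n
    return independent, conditional, actual


# Toy data (made up): two cities with disjoint zip codes.
rows = (
    [("New York", "10001")] * 50
    + [("New York", "10002")] * 50
    + [("Cambridge", "02139")] * 100
)

# Independence predicts 25% of rows for both queries, but the true
# selectivity is 0% for the mismatched pair and 50% for the matching one.
print(estimates(rows, "New York", "02139"))
print(estimates(rows, "Cambridge", "02139"))
```

The gap between the first and third values in each result is the estimation error that cross-column statistics would need to close.
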

Heikki and I had a wacky hand-crafted 2D histogram data structure that
I suspect doesn't actually work. And someone else did some research on
the list and came up with a fancy-sounding name for a statistics concept
that might be what we want.

--
greg