From: | "Sven R(dot) Kunze" <srkunze(at)mail(dot)de> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: multivariate statistics (v25) |
Date: | 2017-04-05 06:41:31 |
Message-ID: | 48c43f17-ecde-d582-6442-34516dd35a99@mail.de |
Lists: | pgsql-hackers |
Thanks Tomas and David for hacking on this patch.
On 04.04.2017 20:19, Tomas Vondra wrote:
> I'm not sure we still need the min_group_size when evaluating
> dependencies. It was meant to deal with 'noisy' data, but I think
> that after switching to the 'degree' it might actually be a bad idea.
>
> Consider this:
>
> create table t (a int, b int);
> insert into t select 1, 1 from generate_series(1, 10000) s(i);
> insert into t select i, i from generate_series(2, 20000) s(i);
> create statistics s with (dependencies) on (a,b) from t;
> analyze t;
>
> select stadependencies from pg_statistic_ext ;
> stadependencies
> --------------------------------------------
> [{1 => 2 : 0.333344}, {2 => 1 : 0.333344}]
> (1 row)
>
> So the degree of the dependency is just ~0.333 although it's obviously
> a perfect dependency, i.e. a knowledge of 'a' determines 'b'. The
> reason is that we discard 2/3 of rows, because those groups are only a
> single row each, except for the one large group (1/3 of rows).
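To make the arithmetic in the quoted example concrete, here is a rough sketch of how a degree of about 0.333 could arise. It is not the patch's actual C code: the function name dependency_degree, the min_group_size value of 3, and the rule "count only rows in sufficiently large, consistent groups" are assumptions chosen for illustration.

```python
from collections import defaultdict

def dependency_degree(rows, min_group_size=1):
    """Hypothetical sketch: fraction of rows that sit in a group of `a`
    which is large enough and maps to exactly one value of `b`."""
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)

    supporting = 0
    for bs in groups.values():
        # Groups below the threshold are discarded, even if consistent.
        if len(bs) >= min_group_size and len(set(bs)) == 1:
            supporting += len(bs)
    return supporting / len(rows)

# Same data as the quoted SQL: 10000 rows of (1, 1) plus (i, i) for i in 2..20000.
rows = [(1, 1)] * 10000 + [(i, i) for i in range(2, 20001)]

with_threshold = dependency_degree(rows, min_group_size=3)     # assumed threshold
without_threshold = dependency_degree(rows, min_group_size=1)  # no filtering

print(f"{with_threshold:.6f}")     # 0.333344
print(f"{without_threshold:.6f}")  # 1.000000
```

Under these assumptions only the 10000 rows of the large group count, out of 29999 rows in total, which gives the 0.333344 shown in the quoted output; without the threshold every group is consistent and the degree is 1.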
Just so I can follow the comments better: is "dependency" roughly the
same as what statisticians call "conditional probability"?
Sven