From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com |
Cc: | tomas(dot)vondra(at)2ndquadrant(dot)com, dilipbalaut(at)gmail(dot)com, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, dean(dot)a(dot)rasheed(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, ishii(at)postgresql(dot)org, david(at)pgmasters(dot)net, michael(dot)paquier(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, petr(at)2ndquadrant(dot)com, jeff(dot)janes(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: multivariate statistics (v19) |
Date: | 2017-01-26 11:01:07 |
Message-ID: | 20170126.200107.11536715.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello, I'll return on this since this should welcome more eyeballs.
At Thu, 26 Jan 2017 09:03:10 +0000, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com> wrote in <4E72940DA2BF16479384A86D54D0988A565822A9(at)G01JPEXMBKW04>
> Hi
>
> When you have time, could you rebase the pathes?
> Some patches cannot be applied to the current HEAD.
For those who are willing to look this,
352a24a1f9d6f7d4abb1175bfd22acc358f43140 breaks this. So just
before it can accept this patches cleanly.
> 0001 patch can be applied but the following 0002 patch cannot be.
>
> I've just started reading your patch (mainly docs and README, not yet source code.)
>
> Though these are minor things, I've found some typos or mistakes in the document and README.
>
> >+ statistics on the table. The statistics will be created in the in the
> >+ current database. The statistics will be owned by the user issuing
>
> Regarding line 629 at 0002-PATCH-shared-infrastructure-and-ndistinct-coeffi-v22.patch,
> there is a double "in the".
>
> >+ knowledge of a value in the first column is sufficient for detemining the
> >+ value in the other column. Then functional dependencies are built on those
>
> Regarding line 701 at 0002-PATCH,
> "determining" is mistakenly spelled "detemining".
>
>
> >@@ -0,0 +1,98 @@
> >+Multivariate statististics
> >+==========================
>
> Regarding line 2415 at 0002-PATCH, "statististics" should be statistics
>
>
> >+ <refnamediv>
> >+ <refname>CREATE STATISTICS</refname>
> >+ <refpurpose>define a new statistics</refpurpose>
> >+ </refnamediv>
>
> >+ <refnamediv>
> >+ <refname>DROP STATISTICS</refname>
> >+ <refpurpose>remove a statistics</refpurpose>
> >+ </refnamediv>
>
> Regarding line 612 and 771 at 0002-PATCH,
> I assume saying "multiple statistics" explicitly is easier to understand to users
> since these commands don't for the statistics we already have in the pg_statistics in my understanding.
>
> >+ [1] http://en.wikipedia.org/wiki/Database_normalization
>
> Regarding line 386 at 0003-PATCH, is it better to change this link to this one:
> https://en.wikipedia.org/wiki/Functional_dependency ?
> README.dependencies cites directly above link.
>
> Though I pointed out these typoes and so on,
> I believe these feedback are less priority compared to the source code itself.
>
> So please work on my feedback if you have time.
README.dependencies
> dependencies, and for each one count the number of rows rows consistent it.
"of rows rows consistent it" => "or rows consistent with it"?
> are in fact consistent with the functinal dependency, i.e. that given the a
"that given the a" => "that given a" ?
dependencies.c:
dependency_dgree():
- The k is assumed larger than 1. I think assertion is required.
- "/* end of the preceding group */" seems to be better if it
is just after the "if (multi_sort.." currently just after it.
- The following comment seems mis-edited.
> * If there is a single are no contradicting rows, count the group
> * as supporting, otherwise contradicting.
maybe this would be like the following? The varialbe counting
the first "contradiction" is named "n_violations". This seems
somewhat confusing.
> * If there are no violating rows up to here, count the group
> * as supporting, otherwise contradicting.
- "/* first columns match, but the last one does not"
else if (multi_sort_compare_dims((k - 1), (k - 1), ...
The above comparison should use multi_sort_compare_dim, not
dims
- This function counts "n_contradicting_rows" but it is not
referenced. Anyway n_contradicting_rows = numrows -
n_supporing_rows so it and n_contradicting seem
unncecessary.
build_mv_dependencies():
- In the commnet,
"* covering jut 2 columns, to the largest ones, covering all columns"
"* included int the statistics. We start from the smallest ones because we"
l1: "jut" => "just", l2: "int" => "in"
mvstats.h:
- struct MVDependencyData/ MVDependenciesData
The varialbe length member at the last of the structs should
be defined using FLEXIBLE_ARRAY_MEMBER, from the convention.
- I'm not sure how much it impacts performance, but some
struct members seems to have a bit too wide types. For
example, MVDepedenciesData.type is of int32 but it can have
only '1' for now and it won't be two-digits. Also ndeps
cannot be so large.
common.c:
multi_sort_compare_dims needs comment.
general:
This patch uses int16 as the type of attrubute number but it
might be better to use AttrNumber for the purpose.
(Specifically it seems defined as the type for an attribute
index but also used as the varialbe for number of attributes)
Sorry for the random comment in advance. I'll learn this further.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Stas Kelvich | 2017-01-26 11:34:47 | Re: logical decoding of two-phase transactions |
Previous Message | Emre Hasegeli | 2017-01-26 10:53:28 | Re: Floating point comparison inconsistencies of the geometric types |