From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date: | 2019-01-22 14:43:35 |
Message-ID: | CAKJS1f-gQVQ6Je+vA6GjK_NKcNAL8haa47=gWet6vgkNMXQ+hQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 18 Jan 2019 at 10:03, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> thanks for the review. The attached patches address most of the issues
> mentioned in the past several messages, both in the MCV and histogram parts.
I made another pass over the 0001 patch. I've not read through mcv.c
again yet. Will try to get to that soon.
0001-multivariate-MCV-lists-20190117.patch
1. The following mentions "multiple functions", but lists just 1 function.
<para>
To inspect statistics defined using <command>CREATE STATISTICS</command>
command, <productname>PostgreSQL</productname> provides multiple functions.
</para>
2. There's a mix of usages of <literal>MCV</literal> and
<acronym>MCV</acronym> around the docs. Should these be the same?
3. analyze_mcv_list() is modified to make it an external function, but
it's not used anywhere out of analyze.c
4. The following can be simplified further:
* We can also leave the record as it is if there are no statistics
* including the datum values, like for example MCV lists.
*/
if (statext_is_kind_built(oldtup, STATS_EXT_MCV))
reset_stats = true;
/*
* If we can leave the statistics as it is, just do minimal cleanup
* and we're done.
*/
if (!reset_stats)
{
ReleaseSysCache(oldtup);
return;
}
to just:
/*
* When none of the defined statistics types contain datum values
* from the table's columns then there's no need to reset the stats.
* Functional dependencies and ndistinct stats should still hold true.
*/
if (!statext_is_kind_built(oldtup, STATS_EXT_MCV))
{
ReleaseSysCache(oldtup);
return;
}
5. "so that we can ignore them below." seems misplaced now since
you've moved all the code below into clauselist_selectivity_simple().
Maybe you can change it to "so that we can inform
clauselist_selectivity_simple about clauses that it should ignore" ?
* filled with the 0-based list positions of clauses used that way, so
* that we can ignore them below.
6. README.mcv: multi-variate -> multivariate
are large the list may be quite large. This is especially true for multi-variate
7. README.mcv: similar -> a similar
it impossible to use anyarrays. It might be possible to produce similar
8. I saw you added IS NOT NULL to README.mcv, but README just mentions:
(b) MCV lists - equality and inequality clauses (AND, OR, NOT), IS NULL
Should that mention IS NOT NULL too?
9. The header comment for build_attnums_array() claims that it
"transforms an array of AttrNumber values into a bitmap", but it does
the opposite.
* Transforms an array of AttrNumber values into a bitmap.
10. The following Assert is not entirely useless. The bitmapset could
have a 0 member, but it can't store negative values.
while ((j = bms_next_member(attrs, j)) >= 0)
{
/* user-defined attributes only */
Assert(AttrNumberIsForUserDefinedAttr(j));
Just checking you thought of that when you added it?
11. XXX comments are normally reserved for things we may wish to
reconsider later, but the following seems more like a "Note:"
* XXX All the memory is allocated in a single chunk, so that the caller
* can simply pfree the return value to release all of it.
12. In statext_is_compatible_clause_internal() there's still a comment
that mentions "Var op Const", but Const op Var is also okay too.
13. This is not fall-through. Generally, such a comment is reserved
to confirm that the "break;" is meant to be missing.
default:
/* fall-through */
return false;
https://developers.redhat.com/blog/2017/03/10/wimplicit-fallthrough-in-gcc-7/
mentions various comment patterns that are used for that case. Your
case seems misplaced since it's right about a return, and not another
case.
14. The header comment for statext_is_compatible_clause() is not
accurate. It mentions only opexprs with equality operations are
allowed, but none of those are true.
* Only OpExprs with two arguments using an equality operator are supported.
* When returning True attnum is set to the attribute number of the Var within
* the supported clause.
15. statext_clauselist_selectivity(): "a number" -> "the number" ?
* Selects the best extended (multi-column) statistic on a table (measured by
* a number of attributes extracted from the clauses and covered by it), and
16. I understand you're changing this to a bitmask in the 0002 patch,
but int is the wrong type here;
/* we're interested in MCV lists */
int types = STATS_EXT_MCV;
Maybe just pass the STATS_EXT_MCV directly, or at least make it a char.
17. bms_membership(clauses_attnums) != BMS_MULTIPLE seems better here.
It can stop once it finds 2. No need to count them all.
/* We need at least two attributes for MCV lists. */
if (bms_num_members(clauses_attnums) < 2)
return 1.0;
18. The following comment in statext_is_compatible_clause_internal()
does not seem to be true. I see OpExprs are supported and NULL test,
including others too.
/* We only support plain Vars for now */
19. The header comment for clauselist_selectivity_simple() does not
mention what estimatedclauses is for.
20. New line. Also, missing "the" before "maximum"
+ * We
+ * iteratively search for multivariate n-distinct with maximum number
21. This comment seems like it's been copied from
estimate_num_groups() without being edited.
/* we're done with this relation */
varinfos = NIL;
Looks like it's using this to break out of the loop.
22. I don't see any dividing going on below this comment:
/*
* Sanity check --- don't divide by zero if empty relation.
*/
23. I see a few tests mentioning: "-- check change of unrelated column
type does not reset the MCV statistics"
Would it be better to just look at pg_statistic_ext there and do something like:
SELECT COUNT(*) FROM pg_statistic_ext WHERE stxname = 'whatever' AND
stxmcv IS NOT NULL;
Otherwise, you seem to be ensuring the stats were not reset by looking
at a query plan, so it's a bit harder to follow and likely testing
more than it needs to.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2019-01-22 14:46:59 | Re: TestForOldSnapshot() seems to be in the wrong place |
Previous Message | Kevin Grittner | 2019-01-22 14:33:48 | Re: Refactoring the checkpointer's fsync request queue |