From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date: | 2019-01-17 03:42:34 |
Message-ID: | CAKJS1f9C+=PZ4Ei-OCYTVk5xo84d6LmFQDkJwp9wGNQ38Ajoag@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 17 Jan 2019 at 01:56, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> At this stage I'm trying to get to know the patch. I read a lot of
> discussing between you and Dean ironing out how the stats should be
> used to form selectivities. At the time I'd not read the patch yet,
> so most of it went over my head.
>
> I did note down a few things on my read. I've included them below.
> Hopefully, they're useful.
>
> MCV list review
Part 2:
35. The evaluation order of this macro is wrong.
#define ITEM_SIZE(ndims) \
(ndims * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))
You'd probably want ITEM_SIZE(10) to return 170, but:
select (10 * (2 + 1) + 2 * 8);
?column?
----------
46
Unsure why this does not cause a crash.
ndims should also have parenthesis around it in case someone does
ITEM_SIZE(x + y), likewise for the other ITEM_* macros.
36. Could do with some comments in get_mincount_for_mcv_list(). What's
magic about 0.04?
37. I think statext_mcv_build() needs some comments to detail out the
arguments. For example can attrs be empty? Must it contain at least 2
members? etc.
38. Too many "it"s
* we simply treat it as a special item in the MCV list (it it makes it).
39. I don't see analyze_mcv_list() being used anywhere around this comment:
* If we can fit all the items onto the MCV list, do that. Otherwise use
* analyze_mcv_list to decide how many items to keep in the MCV list, just
* like for the single-dimensional MCV list.
40. The comment in the above item seems to indicate the condition for
when all items can fit in the number of groups, but the if condition
does not seem to allow for an exact match?
if (ngroups > nitems)
if you want to check if the number of items can fit in the number of
groups should it be: if (ngroups >= nitems) or if (nitems <= ngroups)
? Perhaps I've misunderstood. The comment is a little confusing as I'm
not sure where the "Otherwise" code is located.
41. I don't think palloc0() is required here. palloc() should be fine
since you're initialising each element in the loop.
mcvlist->items = (MCVItem * *) palloc0(sizeof(MCVItem *) * nitems);
for (i = 0; i < nitems; i++)
{
mcvlist->items[i] = (MCVItem *) palloc(sizeof(MCVItem));
mcvlist->items[i]->values = (Datum *) palloc(sizeof(Datum) * numattrs);
mcvlist->items[i]->isnull = (bool *) palloc(sizeof(bool) * numattrs);
}
I think I agree with the comment above that chunk about reducing the
number of pallocs, even if it's just allocating the initial array as
MCVItems instead of pointers to MCVItems
42. I don't think palloc0() is required in build_distinct_groups().
palloc() should be ok.
Maybe it's worth an Assert(j + 1 == ngroups) to ensure
count_distinct_groups got them all?
43. You're assuming size_t and long are the same size here.
elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length);
I know at least one platform where that's not true.
44. Should use DatumGetCString() instead of DatumGetPointer().
else if (info[dim].typlen == -2) /* cstring */
{
memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v)) + 1);
data += strlen(DatumGetPointer(v)) + 1; /* terminator */
}
45. No need to set this to NULL.
Datum *v = NULL;
Is "value" a better name than "v"?
46. What's the extra 'd' for in:
elog(ERROR, "invalid MCV magic %d (expected %dd)",
and
elog(ERROR, "invalid MCV type %d (expected %dd)",
47. Wondering about the logic behind the variation between elog() and
ereport() in statext_mcv_deserialize(). They all looks like "can't
happen" type errors.
48. format assumes size_t is the same size as long.
elog(ERROR, "invalid MCV size %ld (expected %ld)",
VARSIZE_ANY_EXHDR(data), expected_size);
49. palloc0() followed by memset(). Can just use palloc().
matches = palloc0(sizeof(char) * mcvlist->nitems);
memset(matches, (is_or) ? STATS_MATCH_NONE : STATS_MATCH_FULL,
sizeof(char) * mcvlist->nitems);
50. The coding style in mcv_get_match_bitmap I think needs to be
postgresqlified. We normally declare all our variables in a chunk then
start setting them, unless the assignment is very simple. I don't
recall places in the code where have a comment when declaring a
variable, for example.
FmgrInfo gtproc;
Var *var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
Const *cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
bool isgt = (!varonleft);
TypeCacheEntry *typecache
= lookup_type_cache(var->vartype, TYPECACHE_GT_OPR);
/* match the attribute to a dimension of the statistic */
int idx = bms_member_index(keys, var->varattno);
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2019-01-17 04:18:14 | Re: Delay locking partitions during query execution |
Previous Message | Amit Kapila | 2019-01-17 03:35:40 | Re: WIP: Avoid creation of the free space map for small tables |