From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Mark Dilger <hornschnorter(at)gmail(dot)com> |
Cc: | Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date: | 2017-11-27 16:47:13 |
Message-ID: | e410d526-ab5c-0ca3-672b-6a1876a42ea8@2ndquadrant.com |
Lists: | pgsql-hackers |
Hi,
Attached is an updated version of the patch series, fixing the issues
reported by Mark Dilger:
1) Fix fabs() issue in histogram.c.
2) Do not rely on extra_data being StdAnalyzeData, and instead look up
the LT operator explicitly. This also adds a simple regression test to
make sure ANALYZE on arrays works fine, but perhaps we should invent
some simple queries too.
3) I've removed / clarified some of the comments mentioned by Mark.
4) I haven't changed how the statistics kinds are defined in relation.h,
but I agree there should be a comment explaining how STATS_EXT_INFO_*
relate to StatisticExtInfo.kinds.
5) The most significant change happened to histograms. There used to be two
structures for histograms:
- MVHistogram - expanded (no deduplication etc.), result of histogram
build and never used for estimation
- MVSerializedHistogram - deduplicated to save space, produced from
MVHistogram before storing in pg_statistic_ext and then used for
estimation
So there wasn't really any reason to expose the "non-serialized" version
outside histogram.c. It was just confusing and unnecessary, so I've
moved MVHistogram to histogram.c (and renamed it to MVHistogramBuild),
and renamed MVSerializedHistogram to MVHistogram. The same goes for the
MVBucket stuff.
So now we only deal with MVHistogram everywhere, except in histogram.c.
6) I've also made MVHistogram include a varlena header directly (and
be packed as a bytea), which allows us to store it without having to
call any serialization functions.
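To make (6) a bit more concrete, here is a rough standalone sketch of the general idea: a struct that starts with its own total length and keeps all data inline can be written out and read back as one chunk of bytes. This is not code from the patch. FlatHist and the flat_hist_* functions are hypothetical names, and plain malloc/memcpy stand in for the PostgreSQL varlena machinery (where the header is conventionally a vl_len_ field set with SET_VARSIZE).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical flat histogram: a length header followed by inline data,
 * so the in-memory form and the stored form are the same bytes.
 */
typedef struct FlatHist
{
    uint32_t total_len;   /* size of the whole chunk in bytes, header included */
    uint32_t nbuckets;    /* number of entries in freq[] */
    double   freq[];      /* per-bucket frequencies, stored inline */
} FlatHist;

/* Build the structure as a single allocation. */
static FlatHist *
flat_hist_make(const double *freq, uint32_t nbuckets)
{
    size_t    len = sizeof(FlatHist) + nbuckets * sizeof(double);
    FlatHist *h = malloc(len);

    if (h == NULL)
        return NULL;

    h->total_len = (uint32_t) len;
    h->nbuckets = nbuckets;
    memcpy(h->freq, freq, nbuckets * sizeof(double));
    return h;
}

/* "Store": no serialization step, just copy total_len bytes. */
static void *
flat_hist_store(const FlatHist *h)
{
    void *blob = malloc(h->total_len);

    if (blob != NULL)
        memcpy(blob, h, h->total_len);
    return blob;
}

/* "Load": the stored bytes already are a valid FlatHist. */
static const FlatHist *
flat_hist_load(const void *blob)
{
    return (const FlatHist *) blob;
}
```

The point is only that store and load need no knowledge of the fields beyond the length header, which is what makes a separate serialization function unnecessary.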
I guess we should do (5) and (6) for the MCV lists too; it seems more
convenient than the current approach. And perhaps even for the
statistics added in PostgreSQL 10 (it does not change the storage format).
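For the "expanded" vs. "deduplicated" distinction in (5), a minimal sketch for a single dimension might look like the following. It assumes the deduplication means storing each distinct boundary value once and having buckets refer to it by index; the actual layout in the patch may differ. BuildBucket, CompactHist and compact_hist_build are hypothetical names, and error handling is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical build-time bucket: carries its own boundary values. */
typedef struct BuildBucket
{
    double min;
    double max;
} BuildBucket;

/* Hypothetical compact form: boundaries stored once, buckets hold indexes. */
typedef struct CompactHist
{
    int       nvalues;   /* number of distinct boundary values */
    double   *values;    /* sorted, distinct boundary values */
    int       nbuckets;
    uint16_t *min_idx;   /* per bucket: index of the lower bound in values[] */
    uint16_t *max_idx;   /* per bucket: index of the upper bound in values[] */
} CompactHist;

static int
cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Binary search; the key is known to be present in values[]. */
static uint16_t
value_index(const CompactHist *h, double key)
{
    const double *p = bsearch(&key, h->values, (size_t) h->nvalues,
                              sizeof(double), cmp_double);

    return (uint16_t) (p - h->values);
}

/* Assumes 1 <= nbuckets < 32768 and no NaN boundaries. */
static CompactHist *
compact_hist_build(const BuildBucket *b, int nbuckets)
{
    CompactHist *h = malloc(sizeof(CompactHist));
    int          i;
    int          n = 1;

    h->nbuckets = nbuckets;
    h->values = malloc(sizeof(double) * 2 * nbuckets);
    h->min_idx = malloc(sizeof(uint16_t) * nbuckets);
    h->max_idx = malloc(sizeof(uint16_t) * nbuckets);

    /* collect all boundary values, then sort them */
    for (i = 0; i < nbuckets; i++)
    {
        h->values[2 * i] = b[i].min;
        h->values[2 * i + 1] = b[i].max;
    }
    qsort(h->values, (size_t) (2 * nbuckets), sizeof(double), cmp_double);

    /* squeeze out duplicates in place */
    for (i = 1; i < 2 * nbuckets; i++)
    {
        if (h->values[i] != h->values[n - 1])
            h->values[n++] = h->values[i];
    }
    h->nvalues = n;

    /* replace each boundary by its index into values[] */
    for (i = 0; i < nbuckets; i++)
    {
        h->min_idx[i] = value_index(h, b[i].min);
        h->max_idx[i] = value_index(h, b[i].max);
    }
    return h;
}
```

Adjacent buckets share boundaries, so three buckets covering [0,10], [10,20] and [20,30] need only four stored values instead of six, plus small integer indexes.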
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-multivariate-MCV-lists.patch.gz | application/gzip | 31.5 KB |
0002-multivariate-histograms.patch.gz | application/gzip | 60.1 KB |