From: | Mark Dilger <hornschnorter(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date: | 2017-12-19 19:17:41 |
Message-ID: | 7F76E900-9A20-4640-A431-83B0602C3213@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Nov 27, 2017, at 8:47 AM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> Attached is an updated version of the patch series, fixing the issues
> reported by Mark Dilger:
>
> 1) Fix fabs() issue in histogram.c.
>
> 2) Do not rely on extra_data being StdAnalyzeData, and instead lookup
> the LT operator explicitly. This also adds a simple regression tests to
> make sure ANALYZE on arrays works fine, but perhaps we should invent
> some simple queries too.
>
> 3) I've removed / clarified some of the comments mentioned by Mark.
>
> 4) I haven't changed how the statistics kinds are defined in relation.h,
> but I agree there should be a comment explaining how STATS_EXT_INFO_*
> relate to StatisticExtInfo.kinds.
>
> 5) The most significant change happened histograms. There used to be two
> structures for histograms:
>
> - MVHistogram - expanded (no deduplication etc.), result of histogram
> build and never used for estimation
>
> - MVSerializedHistogram - deduplicated to save space, produced from
> MVHistogram before storing in pg_statistic_ext and never used for
> estimation
>
> So there wasn't really any reason to expose the "non-serialized" version
> outside histogram.c. It was just confusing and unnecessary, so I've
> moved MVHistogram to histogram.c (and renamed it to MVHistogramBuild),
> and renamed MVSerializedHistogram. And same for the MVBucket stuff.
>
> So now we only deal with MVHistogram everywhere, except in histogram.c.
>
> 6) I've also made MVHistogram to include a varlena header directly (and
> be packed as a bytea), which allows us to store it without having to
> call any serialization functions).
>
> I guess if we should do (5) and (6) for the MCV lists too, it seems more
> convenient than the current approach. And perhaps even for the
> statistics added to 9.6 (it does not change the storage format).
I tested your latest patches on my mac os x laptop and got one test
failure due to the results of 'explain' coming up differently. For the record,
I followed these steps:
cd postgresql/
git pull
# this got my directory up to 8526bcb2df76d5171b4f4d6dc7a97560a73a5eff with no local changes
patch -p 1 < ../0001-multivariate-MCV-lists.patch
patch -p 1 < ../0002-multivariate-histograms.patch
./configure --prefix=/Users/mark/master/testinstall --enable-cassert --enable-tap-tests --enable-depend && make -j4 && make check-world
mark
Attachment | Content-Type | Size |
---|---|---|
regression.diffs | application/octet-stream | 10.9 KB |
stats_ext.out | application/octet-stream | 31.0 KB |
unknown_filename | text/plain | 2 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Dilger | 2017-12-19 19:38:08 | Re: WIP: BRIN multi-range indexes |
Previous Message | David Fetter | 2017-12-19 18:59:18 | Re: WIP: a way forward on bootstrap data |