Re: multivariate statistics v14

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: jeff(dot)janes(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: multivariate statistics v14
Date: 2016-03-16 13:45:46
Message-ID: 743161cc-771d-6e2b-c43f-4f2bd1fe6d42@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 03/16/2016 03:58 AM, Tatsuo Ishii wrote:
> I apologize if this has already been discussed. I am new to this patch.
>
>> Attached is v15 of the patch series, fixing this and also doing quite a
>> few additional improvements:
>>
>> * added some basic examples into the SGML documentation
>>
>> * addressing the objectaddress omissions, as pointed out by Alvaro
>>
>> * support for ALTER STATISTICS ... OWNER TO / RENAME / SET SCHEMA
>>
>> * significant refactoring of MCV and histogram code, particularly
>> serialization, deserialization and building
>>
>> * reworking the functional dependencies to support more complex
>> dependencies, with multiple columns as 'conditions'
>>
>> * the reduction using functional dependencies is also significantly
>> simplified (I decided to get rid of computing the transitive closure
>> for now - it got too complex after the multi-condition dependencies,
>> so I'll leave that for the future)
>
> Do you have any other missing parts in this work? I am asking
> because I wonder if you want to push this into 9.6 or rather 9.7.

I think the first few parts of the patch series, namely:

* shared infrastructure (0002)
* functional dependencies (0003)
* MCV lists (0004)
* histograms (0005)

might make it into 9.6. I believe the code for building and storing the
different kinds of stats is reasonably solid. What probably needs more
thorough review are the changes in clauselist_selectivity(), but the
code in these parts is reasonably simple as it only supports using a
single multivariate statistic per relation.
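
For readers new to the thread, a minimal sketch of why the estimation side matters. It is not code from the patch: the table, the column names (zip, city) and the helper function are hypothetical. It only shows that multiplying per-column selectivities, as if the columns were independent, underestimates the match rate when one column functionally determines the other.

```python
# Illustration only, not code from the patch.
# Hypothetical table: 10,000 rows of (zip, city), where city is fully
# determined by zip (a functional dependency zip -> city).
rows = [(i % 100, (i % 100) // 10) for i in range(10_000)]


def selectivity(pred):
    """Fraction of rows satisfying the predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)


# Per-column selectivities, as per-column statistics alone would give them.
sel_zip = selectivity(lambda r: r[0] == 42)
sel_city = selectivity(lambda r: r[1] == 4)

# Estimate under the independence assumption: multiply the two.
independent_estimate = sel_zip * sel_city

# True selectivity of "zip = 42 AND city = 4".
actual = selectivity(lambda r: r[0] == 42 and r[1] == 4)

print(f"independent estimate: {independent_estimate:.4f}")
print(f"actual selectivity:   {actual:.4f}")
```

Because zip = 42 already implies city = 4 in this data, the second clause filters nothing, and the independence-based estimate comes out ten times too low.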

The part (0006) that allows using multiple statistics (i.e. selects
which of the available stats to use and in what order) is probably the
most complex part of the whole patch, and I myself do have some
questions about some aspects of it. I don't think this part will get
into 9.6 at this point (although it'd be nice if we managed to do that).

I can also imagine moving the ndistinct pieces forward, in front of 0006,
if that helps get them into 9.6. There's a bit more work to do on making
them more flexible, though, to allow handling subsets of columns
(currently we need a perfect match).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
