From: | "Tomas Vondra" <tv(at)fuzzy(dot)cz> |
---|---|
To: | "Katharina Büchse" <katharina(dot)buechse(at)uni-jena(dot)de> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: WIP: multivariate statistics / proof of concept |
Date: | 2014-11-13 16:42:25 |
Message-ID: | 8106e11197849725375a933e1cc1409f.squirrel@2.emaily.eu |
Lists: | pgsql-hackers |
On 13 November 2014, 16:51, Katharina Büchse wrote:
> On 13.11.2014 14:11, Tomas Vondra wrote:
>
>> The only place where I think this might work are the associative rules.
>> It's simple to specify rules like ("ZIP code" implies "city") and we could
>> even do some simple check against the data to see if it actually makes
>> sense (and 'disable' the rule if not).
>
> And even this simple example has its limits: at least in Germany, ZIP
> codes are not unique in rural areas, where several villages share the
> same ZIP code.
>
> I guess there are just a few examples where columns are completely
> functionally dependent without any exceptions.
> But of course, if the user gives this information just for optimizing
> the statistics, some exceptions don't matter.
> If this information should be used for creating different execution
> plans (e.g. there is an index on column A and column B is functionally
> dependent, so one could think about using this index on A and the
> dependency instead of scanning the whole table to find all tuples
> that match the query on column B), exceptions are a very important issue.
Yes, exactly. The aim of this patch is "only" improving estimates, not
removing conditions from the plan (e.g. checking only the ZIP code and not
the city name). That certainly can't be done solely based on approximate
statistics, and as you point out, most real-world data either contain bugs
or are inherently imperfect (we have the same kind of ZIP/city
inconsistencies in the Czech Republic). That's not a big issue for estimates
though, assuming only a small fraction of rows violate the rule.
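
To make the "simple check against the data" idea concrete, here is a minimal Python sketch, not code from the patch. It measures how many rows contradict a rule such as "ZIP code implies city" by treating the most common city for each ZIP code as the expected one. The function names, the sample rows and the 5% threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def dependency_violation_fraction(rows):
    """Return the fraction of (a, b) rows that contradict the rule "a implies b".

    For each value of a, the most common b is taken as the expected value;
    every row with a different b counts as a violation.
    """
    b_counts_by_a = defaultdict(Counter)
    for a, b in rows:
        b_counts_by_a[a][b] += 1

    total = sum(sum(counts.values()) for counts in b_counts_by_a.values())
    if total == 0:
        return 0.0

    consistent = sum(counts.most_common(1)[0][1] for counts in b_counts_by_a.values())
    return (total - consistent) / total


def rule_is_usable(rows, max_violation_fraction=0.05):
    """Hypothetical policy: keep the rule only if few enough rows violate it."""
    return dependency_violation_fraction(rows) <= max_violation_fraction


# Made-up sample: ZIP "Z3" is shared by two villages, so one row violates the rule.
sample_rows = (
    [("Z1", "Town A")] * 4
    + [("Z2", "Town B")] * 3
    + [("Z3", "Village C")] * 2
    + [("Z3", "Village D")]
)

if __name__ == "__main__":
    print(dependency_violation_fraction(sample_rows))
    print(rule_is_usable(sample_rows))
```

A check like this only says whether the rule is good enough for estimation. It gives no guarantee that would justify dropping a condition from the plan.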
Tomas