Re: Using multiple extended statistics for estimates

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using multiple extended statistics for estimates
Date: 2019-11-30 23:01:31
Message-ID: e9b8a0a2-024c-8819-ca09-9773d3719844@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/14/19 12:04 PM, Tomas Vondra wrote:
> On Thu, Nov 14, 2019 at 10:23:44AM -0800, Mark Dilger wrote:
>>
>>
>> On 11/14/19 7:55 AM, Tomas Vondra wrote:
>>> On Wed, Nov 13, 2019 at 10:04:36AM -0800, Mark Dilger wrote:
>>>>
>>>>
>>>> On 11/13/19 7:28 AM, Tomas Vondra wrote:
>>>>> Hi,
>>>>>
>>>>> here's an updated patch, with some minor tweaks based on the review
>>>>> and
>>>>> added tests (I ended up reworking those a bit, to make them more like
>>>>> the existing ones).
>>>>
>>>> Thanks, Tomas, for the new patch set!
>>>>
>>>> Attached are my review comments so far, in the form of a patch
>>>> applied on top of yours.
>>>>
>>>
>>> Thanks.
>>>
>>> 1) It's not clear to me why adding 'const' to the List parameters would
>>>   be useful? Can you explain?
>>
>> When I first started reviewing the functions, I didn't know if those
>> lists were intended to be modified by the function.  Adding 'const'
>> helps document that the function does not intend to change them.
>>
>
> Hmmm, ok. I'll think about it, but we're not really using const* in this
> way very much I think - at least not in the surrounding code.
>
>>> 2) I think you're right we can change find_strongest_dependency to do
>>>
>>>    /* also skip weaker dependencies when attribute count matches */
>>>    if (strongest->nattributes == dependency->nattributes &&
>>>        strongest->degree >= dependency->degree)
>>>        continue;
>>>
>>>   That'll skip some additional dependencies, which seems OK.
>>>
>>> 3) It's not clear to me what you mean by
>>>
>>>     * TODO: Improve this code comment.  Specifically, why would we
>>>     * ignore that no rows will match?  It seems that such a discovery
>>>     * would allow us to return an estimate of 0 rows, and that would
>>>     * be useful.
>>>
>>>   added to dependencies_clauselist_selectivity. Are you saying we
>>>   should also compute selectivity estimates for individual clauses and
>>>   use Min() as a limit? Maybe, but that seems unrelated to the patch.
>>
>> I mean that the comment right above that TODO is hard to understand.
>> You seem to be saying that it is good and proper to only take the
>> selectivity estimate from the final clause in the list, but then go on
>> to say that other clauses might prove that no rows will match.  So
>> that implies that by ignoring all but the last clause, we're ignoring
>> such other clauses that prove no rows can match.  But why would we be
>> ignoring those?
>>
>> I am not arguing that your code is wrong.  I'm just critiquing the
>> hard-to-understand phrasing of that code comment.
>>
>
> Aha, I think I understand now - thanks for the explanation. You're right
> the comment is trying to explain why just taking the last clause for a
> given attnum is fine. I'll try to make the comment clearer.

Are you planning to submit a revised patch for this?

--
Mark Dilger

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2019-11-30 23:22:26 Re: Should we add xid_current() or a int8->xid cast?
Previous Message David Fetter 2019-11-30 22:23:15 Re: Make autovacuum sort tables in descending order of xid_age