From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | John Naylor <jcnaylor(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: MCV lists for highly skewed distributions |
Date: | 2018-03-17 18:40:03 |
Message-ID: | e23480dd-ae3d-561b-458d-5ac2931596ff@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 03/17/2018 07:28 PM, Dean Rasheed wrote:
> On 16 March 2018 at 15:26, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Actually, one question - when deciding whether to keep the item in the
>> MCV list, analyze_mcv_list only compares it's frequency with an average
>> of the rest. But as we're removing items from the MCV list, the average
>> frequency of the non-MCV items is growing (we're removing items with
>> higher and higher frequencies). That means the estimates for the least
>> common items will get higher and higher - essentially overestimates. So,
>> couldn't/shouldn't analyze_mcv_list consider this too?
>>
>
> Yes, that's the intention. At the start, sumcount is set to the count
> of all but the last (least common) MCV item, so it can estimate the
> frequency of the non-MCV items if the last MCV item were to be
> removed. Then each time through the loop, sumcount is decreased by the
> removed item's count, increasing the estimated frequency of the
> non-MCV items accordingly.
>
I know it's updated like that, but that's not quite what I meant. Sorry
for not making the question clearer, so let me rephrase it.
Currently, analyze_mcv_list only checks if the frequency of the current
item is significantly higher than the non-MCV selectivity. My question
is if it shouldn't also consider if removing the item from MCV would not
increase the non-MCV selectivity too much.
But perhaps that would be a silly idea, because the non-MCV items may
also be seen as a random noise.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2018-03-17 18:59:29 | Re: strange failure in plpgsql_control tests (on fulmar, ICC 14.0.3) |
Previous Message | Andres Freund | 2018-03-17 18:33:01 | Re: strange failure in plpgsql_control tests (on fulmar, ICC 14.0.3) |