From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: <> join selectivity estimate question |
Date: | 2017-07-20 12:00:35 |
Message-ID: | CAEepm=0kPC1pLBiNnBQgs6AOYM6dXxEepCb1SFgvhRrh6WwcjA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jul 20, 2017 at 11:47 PM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Thu, Jul 20, 2017 at 11:04 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't think it does really. The thing about a <> semijoin is that it
>>> will succeed unless *every* join key value from the inner query is equal
>>> to the outer key value (or is null). That's something we should consider
>>> to be of very low probability typically, so that the <> selectivity should
>>> be estimated as nearly 1.0. If the regular equality selectivity
>>> approaches 1.0, or when there are expected to be very few rows out of the
>>> inner query, then maybe the <> estimate should start to drop off from 1.0,
>>> but it surely doesn't move linearly with the equality selectivity.
>>
>> Ok, here I go like a bull in a china shop: please find attached a
>> draft patch. Is this getting warmer?
>>
>> In the comment for JOIN_SEMI I mentioned a couple of refinements I
>> thought of but my intuition was that we don't go for such sensitive
>> and discontinuous treatment of stats; so I made the simplifying
>> assumption that RHS always has more than 1 distinct value in it.
>>
>> Anti-join <> returns all the nulls from the LHS, and then it only
>> returns other LHS rows if there is exactly one distinct non-null value
>> in RHS and it happens to be that one. But if we make the same
>> assumption I described above, namely that there are always at least 2
>> distinct values on the RHS, then the join selectivity is just
>> nullfrac.
>>
>
> The patch looks good to me.
>
> + /*
> + * For semi-joins, if there is more than one distinct key in the RHS
> + * relation then every non-null LHS row must find a match since it can
> + * only be equal to one of them.
> The word "match" confusing. Google's dictionary entry gives "be equal
> to (something) in quality or strength." as its meaning. May be we want
> to reword it as "... LHS row must find a joining row in RHS ..."?
Thanks! Yeah, here's a version with better comments.
Does anyone know how to test a situation where the join is reversed according to
get_join_variables, or "complicated cases where we can't tell for sure"?
--
Thomas Munro
http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
neqjoinsel-fix-v2.patch | application/octet-stream | 2.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | tushar | 2017-07-20 12:09:43 | pg_upgrade failed if view contain natural left join condition |
Previous Message | Craig Ringer | 2017-07-20 11:52:38 | Re: [PATCH] pageinspect function to decode infomasks |