Re: Question on explain

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "info(at)enricopirozzi(dot)info" <info(at)enricopirozzi(dot)info>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question on explain
Date: 2013-06-10 02:19:57
Message-ID: CAMkU=1wPOHktPNTz-k9bR7OGCBuM9ieyu=qd6p3n+LA7rj9mkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Saturday, June 8, 2013, Jeff Janes wrote:

> On Sat, Jun 8, 2013 at 8:06 AM, Enrico Pirozzi <sscotty71(at)gmail(dot)com<javascript:_e({}, 'cvml', 'sscotty71(at)gmail(dot)com');>
> > wrote:
>
>> Hi all,
>>
>> I have 2 tables:
>>
>> table1 (field1,field2,.....)
>> table2 (field1,field2,field3,.....)
>>
>> field1 is an uuid type
>>
>
> Is it a primary key? Is there a foreign key constraint between them?
>
>
>> field2 is a timestamp with time zone type.
>>
>> If I execute:
>>
>> explain (analyze on, timing off)
>> select B.field3,A.field1,A.field2
>> FROM table1 A INNER JOIN table2 B
>> using (field1,field2)
>>
>> the query plan is
>>
>>
>> QUERY
>> PLAN
>>
>>
>> ----------------------------------------------------------------------------------------------------------------
>> Hash Join (cost=137324.20..247200.77 rows=1 width=31) (actual
>> rows=1136175 loops=1)
>> Hash Cond: ((b.field1 = a.field1) AND (b.field2 = a.field2))
>> -> Seq Scan on table2 b (cost=0.00..49694.75 rows=1136175 width=31)
>> (actual rows=1136175 loops
>> =1)
>> -> Hash (cost=89610.68..89610.68 rows=2287368 width=24) (actual
>> rows=1143684 loops=1)
>> Buckets: 65536 Batches: 4 Memory Usage: 15699kB
>> -> Seq Scan on table1 a (cost=0.00..89610.68 rows=2287368
>> width=24) (actual rows=1143684 loops=1
>> )
>> Total runtime: 5055.118 ms
>> (7 rows)
>>
>> My question is
>> Why Have I a rows=1 in the explain and rows=1136175 in the explain
>> analyze?
>>
>
> Yes, that seems quite strange.
>

On further thought, that is not strange at all. You have two very
selective join conditions, and the planner assumes they are independent, so
that it can multiply the selectivities. But in reality they are completely
(or almost completely) dependent. If the planner knew about cross column
correlations, that might not even help as you can have complete statistical
dependence without having correlation.

It seems unlikely to me that the timestamp belongs in both tables, since
it's value seems to be completely dependent on the value of the UUID.

In any event, it is unlikely the planner would pick a different plan were
it to correctly understand the selectivities, so no harm is done. Although
it is easy to imagine similar queries where that would not be the case.

Cheers,

Jeff

>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2013-06-10 02:43:20 Re: Sample database with difficult SQL questions
Previous Message Markus Renner 2013-06-09 21:15:17 multilib environment with postgresql92 from CentOS-repos? (9.2.4)