Re: Estimation row error

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Mathieu VINCENT <mathieu(dot)vincent(at)pmsipilot(dot)com>
Cc: Matteo Grolla <matteo(dot)grolla(at)gmail(dot)com>, Gunnar Nick Bluth <gunnar(dot)bluth(dot)extern(at)elster(dot)de>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Estimation row error
Date: 2015-12-18 15:33:11
Message-ID: CAFj8pRBv0Jhukw_z1s8TZUa8dSGDx3S9BDLb0w6uf_Ecxy5Rtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi

2015-12-18 16:21 GMT+01:00 Mathieu VINCENT <mathieu(dot)vincent(at)pmsipilot(dot)com>:

> Hello,
>
> No one to help me to understand this bad estimation rows ?
> It's *NOT* caused by :
>
> - correlation between columns (cross-correlation)
> - bad statistics (i tried with default_statistics_target to 10 000)
> - bad number of distinct values
> - complexe join conditions
>
> I have no more ideas.
>

PostgreSQL has not cross tables statistics - so expect uniform distribution
of foreign keys. This expectation is broken in your example.

You can find some prototype solutions by Tomas Vondra in hackars mailing
list.

Regards

Pavel

>
> thank you for your help.
> Mathieu VINCENT
>
> 2015-12-17 11:58 GMT+01:00 Mathieu VINCENT <mathieu(dot)vincent(at)pmsipilot(dot)com>
> :
>
>> Adding foreign key between on t2 and t3, does not change the plan.
>>
>> drop table if exists t1;
>> drop table if exists t2;
>> drop table if exists t3;
>>
>> create table t1 as select generate_Series(1,200000) as c1;
>> create table t2 as select generate_Series(1,200000)%100+1 as c1;
>> create table t3 as select generate_Series(1,1500)%750+1 as c1;
>>
>> alter table t1 add PRIMARY KEY (c1);
>> create index on t2 (c1);
>> create index on t3 (c1);
>> ALTER TABLE t2 ADD CONSTRAINT t2_fk FOREIGN KEY (c1) REFERENCES t1(c1);
>> ALTER TABLE t3 ADD CONSTRAINT t3_fk FOREIGN KEY (c1) REFERENCES t1(c1);
>>
>> analyze verbose t1;
>> analyze verbose t2;
>> analyze verbose t3;
>>
>> EXPLAIN (analyze on, buffers on, verbose on)
>> select
>> *
>> from
>> t1 t1
>> inner join t2 on t1.c1=t2.c1
>> inner join t3 on t1.c1=t3.c1
>>
>> Cordialement,
>> <http://www.psih.fr/>PSIH Décisionnel en santé
>> Mathieu VINCENT
>> Data Analyst
>> PMSIpilot - 61 rue Sully - 69006 Lyon - France
>>
>> 2015-12-17 11:37 GMT+01:00 Mathieu VINCENT <mathieu(dot)vincent(at)pmsipilot(dot)com
>> >:
>>
>>> Here, another issue with row estimate.
>>> And, in this example, there is not correlation beetween columns in a
>>> same table.
>>>
>>> drop table if exists t1;
>>> drop table if exists t2;
>>> drop table if exists t3;
>>>
>>> create table t1 as select generate_Series(1,200000) as c1;
>>> create table t2 as select generate_Series(1,200000)%100 as c1;
>>> create table t3 as select generate_Series(1,1500)%750 as c1;
>>>
>>> alter table t1 add PRIMARY KEY (c1);
>>> create index on t2 (c1);
>>> create index on t3 (c1);
>>>
>>> analyze verbose t1;
>>> analyze verbose t2;
>>> analyze verbose t3;
>>>
>>> EXPLAIN (analyze on, buffers on, verbose on)
>>> select
>>> *
>>> from
>>> t1 t1
>>> inner join t2 on t1.c1=t2.c1
>>> inner join t3 on t2.c1=t3.c1
>>> the explain plan : http://explain.depesz.com/s/YVw
>>> Do you understand how postgresql calculate the row estimate ?
>>>
>>> BR
>>> Mathieu VINCENT
>>>
>>> 2015-12-17 10:14 GMT+01:00 Matteo Grolla <matteo(dot)grolla(at)gmail(dot)com>:
>>>
>>>> Thank you both for the help!
>>>> happy holidays
>>>>
>>>> 2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <
>>>> mathieu(dot)vincent(at)pmsipilot(dot)com>:
>>>>
>>>>> thks Gunnar,
>>>>>
>>>>> I removed the correlation between t3.c1 and t3.c2 in this sql script :
>>>>>
>>>>> drop table if exists t1;
>>>>> drop table if exists t2;
>>>>> drop table if exists t3;
>>>>> drop table if exists t4;
>>>>>
>>>>> create table t1 as select generate_Series(1,300000) as c1;
>>>>> create table t2 as select generate_Series(1,400) as c1;
>>>>> create table t3 as select floor(random()*100+1) as c1, c2 from
>>>>> generate_Series(1,200000) c2;
>>>>> create table t4 as select generate_Series(1,200000) as c1;
>>>>>
>>>>> alter table t1 add PRIMARY KEY (c1);
>>>>> alter table t2 add PRIMARY KEY (c1);
>>>>> alter table t3 add PRIMARY KEY (c1,c2);
>>>>> create index on t3 (c1);
>>>>> create index on t3 (c2);
>>>>> alter table t4 add PRIMARY KEY (c1);
>>>>>
>>>>> analyze verbose t1;
>>>>> analyze verbose t2;
>>>>> analyze verbose t3;
>>>>> analyze verbose t4;
>>>>>
>>>>> EXPLAIN (analyze on, buffers on, verbose on)
>>>>> select
>>>>> *
>>>>> from
>>>>> t1 t1
>>>>> inner join t2 on t1.c1=t2.c1
>>>>> inner join t3 on t2.c1=t3.c1
>>>>> inner join t4 on t3.c2=t4.c1
>>>>>
>>>>> Now, the estimate is good : http://explain.depesz.com/s/gCX
>>>>>
>>>>> Have a good day
>>>>>
>>>>> Mathieu VINCENT
>>>>>
>>>>> 2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <
>>>>> gunnar(dot)bluth(dot)extern(at)elster(dot)de>:
>>>>>
>>>>>> Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
>>>>>> > Gunnar Nick Bluth <gunnar(dot)bluth(dot)extern(at)elster(dot)de> wrote:
>>>>>> >
>>>>>> >> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>>>>> >>> Hello,
>>>>>> >>>
>>>>>> >>> No one to help me to understand this bad estimation rows ?
>>>>>> >>
>>>>>> >> Well,
>>>>>> >>
>>>>>> >> on a rather beefy machine, I'm getting quite a different plan:
>>>>>> >> http://explain.depesz.com/s/3y5r
>>>>>> >
>>>>>> > you are using 9.5, right? Got the same plan with 9.5.
>>>>>>
>>>>>> Nope...:
>>>>>> version
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------------------------------------
>>>>>> PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
>>>>>> (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
>>>>>>
>>>>>> So much for those correlation improvements then ;-/
>>>>>>
>>>>>>
>>>>>> > Btw.: Hi Gunnar ;-)
>>>>>>
>>>>>> Hi :)
>>>>>>
>>>>>> --
>>>>>> Gunnar "Nick" Bluth
>>>>>> DBA ELSTER
>>>>>>
>>>>>> Tel: +49 911/991-4665
>>>>>> Mobil: +49 172/8853339
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent via pgsql-performance mailing list (
>>>>>> pgsql-performance(at)postgresql(dot)org)
>>>>>> To make changes to your subscription:
>>>>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jim Nasby 2015-12-20 23:08:04 Re: Can't explain db size
Previous Message Mathieu VINCENT 2015-12-18 15:21:04 Re: Estimation row error