From: | Віталій Тимчишин <tivv00(at)gmail(dot)com> |
---|---|
To: | Yeb Havinga <yebhavinga(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Planner not using column limit specified for one column for another column equal to first |
Date: | 2010-04-16 12:49:45 |
Message-ID: | k2u331e40661004160549td6bf86d5pb6fde70e6d1ab5d@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
16 квітня 2010 р. 11:31 Yeb Havinga <yebhavinga(at)gmail(dot)com> написав:
> Віталій Тимчишин wrote:
>
>> Hello.
>>
>> I have a query that performs very poor because there is a limit on join
>> column that is not applied to other columns:
>>
>> select * from company this_ left outer join company_tag this_1_ on
>> this_.id=this_1_.company_id left outer join company_measures companymea2_ on
>> this_.id=companymea2_.company_id left outer join company_descr ces3_ on
>> this_.id=ces3_.company_id where this_1_.tag_id = 7 and this_.id>50000000 and
>> this_1_.company_id>50000000
>> order by this_.id asc limit 1000;
>>
>> (plan1.txt)
>> Total runtime: 7794.692 ms
>>
>> At the same time if I apply the limit (>50000000) to other columns in
>> query itself it works like a charm:
>>
>> select * from company this_ left outer join company_tag this_1_ on
>> this_.id=this_1_.company_id left outer join company_measures companymea2_ on
>> this_.id=companymea2_.company_id left outer join company_descr ces3_ on
>> this_.id=ces3_.company_id where this_1_.tag_id = 7 and this_.id>50000000 and
>> this_1_.company_id>50000000
>> and companymea2_.company_id>50000000 and ces3_.company_id>50000000
>> order by this_.id asc limit 1000;
>>
>> (plan2.txt)
>> Total runtime: 27.547 ms
>>
>> I've thought and someone in this list've told me that this should be done
>> automatically.
>>
> Yes, if you have in a query a=b and b=c, then the optimizer figures out
> that a=c as well. (a,b and c are then member of the same equivalence class).
>
> However both queries are not the same, since the joins you're using are
> outer joins. In the first it's possible that records are returned for
> company records with no matching ces3_ records, the ces3_ records is null in
> that case. In the second query no NULL ces3_ information may be returned.
>
OK, but when I move limit to join condition the query is still fast:
select * from company this_ left outer join company_tag this_1_ on
this_.id=this_1_.company_id
left outer join company_measures companymea2_ on
this_.id=companymea2_.company_id and companymea2_.company_id>50000000
left outer join company_descr ces3_ on this_.id=ces3_.company_id and
ces3_.company_id>50000000
where this_1_.tag_id = 7 and this_.id>50000000
and this_1_.company_id>50000000
order by this_.id asc limit 1000;
(plan3.txt),
Total runtime: 26.327 ms
BTW: Changing slow query to inner joins do not make it fast
>
> Another thing is it seems that the number of rows guessed is far off from
> the actual number of rows, is the number 5000000 artificial or are you're
> statistics old or too small histogram/mcv's?
>
Nope, I suppose this is because of limit. If I remove the limit, the
estimations are quite correct. There are ~6 millions of row in each table.
Attachment | Content-Type | Size |
---|---|---|
plan3.txt | text/plain | 1.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Віталій Тимчишин | 2010-04-16 12:59:50 | Re: Planner not using column limit specified for one column for another column equal to first |
Previous Message | Yeb Havinga | 2010-04-16 08:31:06 | Re: Planner not using column limit specified for one column for another column equal to first |