Re: Performance regression from 8.3.7 to 9.0.3

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance regression from 8.3.7 to 9.0.3
Date: 2011-03-14 17:59:59
Message-ID: AANLkTimCRMGyOU3Oq6RnonGY5Zib-gqDSSmp+E-XmH+6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Mar 14, 2011 at 12:46 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Mon, Mar 14, 2011 at 2:34 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> changes to:
>>
>> SELECT member_statistics.member_id
>>         FROM member_statistics
>>         WHERE EXISTS
>>         (
>>           SELECT mat1.tag_id
>>           FROM member_all_tags_v AS mat1
>>           WHERE mat1.member_id = member_statistics.member_id
>>             AND mat1.tag_id
>>             IN (640, 641, 3637, 3638, 637, 638, 639) AND
>> mat1.polarity >= 90
>>             AND mat1.member_id  IN ( <<400 ids>> )
>>         )
>
> It isn't easy to get the ORM to spit that kind of queries, but I could
> try them by hand.
>
>> also, always try to compare vs straight join version:
>>
>>
>> SELECT member_statistics.member_id
>>         FROM member_statistics
>>         JOIN VALUES ( <<400 ids>> ) q(member_id) using (member_id)
>>         JOIN
>>         (
>>           SELECT mat1.member_id
>>           FROM member_all_tags_v AS mat1
>>           WHERE mat1.tag_id  IN (640, 641, 3637, 3638, 637, 638, 639)
>>             AND mat1.polarity >= 90) p
>>            USING(member_id)
>>          ) p using(member_id);
>>
>> merlin
>
> The straight join like that was used long ago, but it replicates rows
> unacceptably: for each row in the subquery, one copy of member_id is
> output, which create an unacceptable overhead in the application and
> network side. It could be perhaps fixed with distinct, but then
> there's sorting overhead.

ah -- right. my mistake. well, you could always work around with
'distinct', although the exists version should be better (semi vs full
join). what options *do* you have in terms of coaxing the ORM to
produce particular sql? :-). This is likely 100% work-aroundable via
tweaking the SQL. I don't have the expertise to suggest a solution
with your exact sql, if there is one.

merlin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-03-14 18:50:36 Re: Performance regression from 8.3.7 to 9.0.3
Previous Message Merlin Moncure 2011-03-14 17:34:31 Re: Performance regression from 8.3.7 to 9.0.3