Re: Troubleshooting query performance issues

From: bricklen <bricklen(at)gmail(dot)com>
To: Jim Garrison <jim(dot)garrison(at)nwea(dot)org>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Troubleshooting query performance issues
Date: 2013-09-25 16:30:30
Message-ID: CAGrpgQ-V5+4CA3ag3g12onNjBjaPVX3Z2Edi8ycecegKjRqxSw@mail.gmail.com
Lists: pgsql-performance

On Wed, Sep 25, 2013 at 8:58 AM, Jim Garrison <jim(dot)garrison(at)nwea(dot)org> wrote:

> I spent about a week optimizing a query in our performance-testing
> environment, which has hardware similar to production.
>
> I was able to refactor the query and reduce the runtime from hours to
> about 40 seconds, through the use of CTEs and a couple of new indexes.
>
> The database was rebuilt and refreshed with very similar data from
> production, but now the query takes hours again.
>
> In the query plan, it is clear that the row count estimates are WAY too
> low, even though the statistics are up to date. Here's a sample query plan:
>
> CTE Scan on stef  (cost=164.98..165.00 rows=1 width=38)
>   CTE terms
>     ->  Nested Loop  (cost=0.00..62.40 rows=1 width=12)
>           ->  Index Scan using term_idx1 on term t  (cost=0.00..52.35 rows=1 width=12)
>                 Index Cond: (partner_id = 497)
>                 Filter: (recalculate_district_averages_yn AND (NOT is_deleted_yn))
>           ->  Index Scan using growth_measurement_window_fk1 on growth_measurement_window gw  (cost=0.00..10.04 rows=1 width=4)
>                 Index Cond: (term_id = t.term_id)
>                 Filter: (test_window_complete_yn AND (NOT is_deleted_yn) AND ((growth_window_type)::text = 'DISTRICT'::text))
>   CTE stef
>     ->  Nested Loop  (cost=0.00..102.58 rows=1 width=29)
>           Join Filter: ((ssef.student_id = terf.student_id) AND (ssef.grade_id = terf.grade_id))
>           ->  Nested Loop  (cost=0.00..18.80 rows=3 width=28)
>                 ->  CTE Scan on terms t  (cost=0.00..0.02 rows=1 width=8)
>                 ->  Index Scan using student_school_enrollment_fact_idx2 on student_school_enrollment_fact ssef  (cost=0.00..18.74 rows=3 width=20)
>                       Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
>                       Filter: primary_yn
>           ->  Index Scan using test_event_result_fact_idx3 on test_event_result_fact terf  (cost=0.00..27.85 rows=4 width=25)
>                 Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
>                 Filter: growth_event_yn
>
> The estimates in the first CTE are correct, but in the second, the scan on
> student_school_enrollment_fact will return about 1.5 million rows, and the
> scan on test_event_result_fact actually returns about 1.1 million. The top
> level join should return about 900K rows. I believe the fundamental issue
> is that the CTE stef outer nested loop should be a merge join instead, but
> I cannot figure out why the optimizer is estimating one row when it has the
> statistics to correctly estimate the count.
>
> What would cause PG to so badly estimate the row counts?
>
> I've already regenerated the indexes and re-analyzed the tables involved.
>
> What else can I do to find out why it's running so slowly?
>
>
More details about the environment would probably be helpful:
https://wiki.postgresql.org/wiki/Slow_Query_Questions
Are you able to swap out the CTE for a temp table, index it, and ANALYZE it,
to compare against the CTE version?
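For illustration, the temp-table variant might look like the sketch below. The
CTE and query bodies are not shown in the thread, so the column names are
guesses taken from the plan's Join Filter; adjust to the real query:

```sql
-- Materialize the CTE's result so the planner can gather real
-- statistics for it, instead of estimating rows=1 across the fence.
CREATE TEMP TABLE stef_tmp AS
SELECT ... ;   -- body of the "stef" CTE goes here

-- Index the join columns the plan filters on, then collect stats.
CREATE INDEX ON stef_tmp (student_id, grade_id);
ANALYZE stef_tmp;

-- Re-run the outer query against stef_tmp in place of the CTE
-- and compare the plans.
EXPLAIN ANALYZE
SELECT ...     -- original top-level query, referencing stef_tmp
FROM stef_tmp;
```

With accurate row counts on stef_tmp, the planner is free to pick a merge or
hash join where the CTE version was locked into a nested loop.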
