Re: Inaccurate Explain Cost

From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Robert Sosinski <rsosinski(at)ticketevolution(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>, Spike Grobstein <spike(at)ticketevolution(dot)com>
Subject: Re: Inaccurate Explain Cost
Date: 2012-09-26 20:03:03
Message-ID: 50635F77.60008@optionshouse.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On 09/26/2012 01:38 PM, Robert Sosinski wrote:

> I seem to be getting an inaccurate cost from explain. Here are two
> examples for one query with two different query plans:

Well, there's this:

Nested Loop (cost=0.00..151986.53 rows=2817 width=4) (actual
time=163.275..186869.844 rows=43904 loops=1)

If anything's a smoking gun, that is. I could see why you'd want to turn
off nested loops to get better execution time. But the question is: why
did it think it would match so few rows in the first place? The planner
probably would have thrown away this query plan had it known it would
loop 20x more than it thought.

I think we need to know what your default_statistics_target is set at,
and really... all of your relevant postgresql settings.

Please see this:

http://wiki.postgresql.org/wiki/Slow_Query_Questions

But you also may need to look a lot more into your query itself. The
difference between a 2 or a 3 minute query isn't going to help you
much. Over here, we tend to spend more of our time turning 2 or 3 minute
queries into 20 or 30ms queries. But judging by your date range, getting
the last 2-months of data from a table that large generally won't be
fast by any means.

That said, looking at your actual query:

SELECT COUNT(DISTINCT eu.id)
FROM exchange_uploads eu
JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
LEFT JOIN uploads u ON u.id = eu.upload_id
LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
LEFT JOIN exchanges e ON e.id = ud.exchange_id
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
AND ud.office_id = 6;

Doesn't need half of these joins. They're left joins, and never used in
the query results or where criteria. You could just use this:

SELECT COUNT(DISTINCT eu.id)
FROM exchange_uploads eu
JOIN upload_destinations ud ON (ud.id = eu.upload_destination_id)
WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
AND ud.office_id = 6;

Though I presume this is just a count precursor to a query that fetches
the actul results and does need the left join. Either way, the index
scan from your second example matches 3.3M rows by using the created_at
index on exchange_uploads. That's not really very restrictive, and so
you have two problems:

1. Your nested loop stats from office_id are somehow wrong. Try
increasing your stats on that column, or just default_statistics_target
in general, and re-analyze.
2. Your created_at criteria above match way too many rows, and will also
take a long time to process.

Those are your two actual problems. We can probably get your query to
run faster, but those are pretty significant hurdles.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Edson Richter 2012-09-26 20:20:01 Re: [PERFORM] Inaccurate Explain Cost
Previous Message Jim Wilson 2012-09-26 19:55:56 Odd Invalid type name error in postgresql 9.1

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2012-09-26 20:11:49 Re: Guide to Posting Slow Query Questions
Previous Message Robert Sosinski 2012-09-26 18:38:09 Inaccurate Explain Cost