From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | dan(at)sidhe(dot)org |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Trying to track down weird query stalls |
Date: | 2009-03-30 19:20:15 |
Message-ID: | 603c8f070903301220i26469be1mcbb6ab758fe78bb8@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Mon, Mar 30, 2009 at 2:42 PM, <dan(at)sidhe(dot)org> wrote:
>> On Mon, Mar 30, 2009 at 1:50 PM, <dan(at)sidhe(dot)org> wrote:
>>> I'm running a 64-bit build of Postgres 8.3.5 on AIX 5.3, and have a
>>> really
>>> strange, annoying transient problem with one particular query stalling.
>>>
>>> The symptom here is that when this query is made with X or more records
>>> in
>>> a temp table involved in the join (where X is constant when the problem
>>> manifests, but is different between manifestations) the query takes
>>> about
>>> 20 minutes. When the query is made with X-1 records it takes less than a
>>> second. It's just this one query -- for everything else the system's
>>> nice
>>> and snappy. The machine load's never above 9 (it's a 32 CPU box) and
>>> hasn't had less than 60G of free system memory on it.
>>>
>>> An EXPLAIN ANALYZE of the two queries (with X-1 and X records) is even
>>> more bizarre. Not only are the two query plans identical (save trivial
>>> differences because of the record count differences) but the explain
>>> EXPLAIN ANALYZE total runtimes are near-identical -- the fast case
>>> showed
>>> 259ms, the slow case 265ms.
>>
>> log_min_duration_statement is a good place to start, but it sounds
>> like the query plan you're getting when you test it by hand isn't the
>> same one that's actually executing, so I'm not sure how far that's
>> going to get you. contrib/auto_explain sounds like it would be just
>> the thing, but unfortunately that's an 8.4 feature which hasn't been
>> released yet. I'm not sure whether it could be built and run against
>> 8.3.
>
> I'm not executing any of the EXPLAINs by hand, because I didn't want to
> have to worry about typos or filling in temp tables with test data. Inside
> the app the SQL for the problematic query's stored in a variable -- when
> the task runs with debugging enabled it first executes the query with
> EXPLAIN ANALYZE prepended and dumps the output, then it executes the query
> itself. It's possible something's going wrong in that, but the code's
> pretty simple.
>
> Arguably in this case the actual query should run faster than the EXPLAIN
> ANALYZE version, since the cache is hot. (Though that'd only likely shave
> a few dozen ms off the runtime)
Well... yeah. Also EXPLAIN ANALYZE has a non-trivial amount of
overhead, so that is quite bizarre. I have to suspect there is some
subtle difference between the way the EXPLAIN ANALYZE is done and the
way the actual query is done... like maybe one uses parameter
substitution and the other doesn't or, well, I don't know. But I
don't see how turning on debugging (which is essentially what EXPLAIN
ANALYZE is) can prevent the query from being slow.
...Robert
From | Date | Subject | |
---|---|---|---|
Next Message | dan | 2009-03-30 19:35:10 | Re: Trying to track down weird query stalls |
Previous Message | dan | 2009-03-30 18:42:13 | Re: Trying to track down weird query stalls |