From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Mike Charnoky <noky(at)nextbus(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: intermittant performance problem
Date: 2009-03-10 03:35:56
Message-ID: dcc563d10903092035gf777ef4y893c2ba20fe8e9c8@mail.gmail.com
Lists: pgsql-general
On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky <noky(at)nextbus(dot)com> wrote:
> The random sampling query is normally pretty snappy. It usually takes on
> the order of 1 second to sample a few thousand rows of data out of a few
> million. The sampling is consistently quick, too. However, on some days,
> the sampling starts off quick, then when the process starts sampling from a
> different subset of data (different range of times for the same day), the
> sampling query takes a couple minutes.
Then definitely look at saving the EXPLAIN plans right before execution
so you can compare the fast runs against the slow ones. So far this
sounds like occasional bad query plans to me.
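Something like this, issued right before the real query (a rough sketch
based on the query quoted below; the literal timestamps, item name and
limit are placeholders for whatever values the sampling program is about
to bind):

  -- Sketch only: run with the same values the real query will use and
  -- log the output, so plans from fast and slow days can be compared.
  EXPLAIN
  SELECT item_name, timestmp
  FROM raw_data
  WHERE timestmp >= '2009-03-09 00:00' AND timestmp < '2009-03-09 01:00'
    AND item_name = 'some_item'
    AND some_data_field IS NOT NULL
  ORDER BY random()
  LIMIT 2000;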
> Regarding the concurrent vacuuming, this is definitely not happening. I
> always check pg_stat_activity whenever the sampling process starts to lag
> behind. I have never seen a vacuum running during this time.
And if autovac is getting in the way, try adjusting the various autovac
options, specifically autovacuum_vacuum_cost_delay set to 10 or 20 (ms).
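For example, in postgresql.conf (a sketch only; 20ms is just a starting
point, and a reload is enough for autovacuum to pick it up):

  # Throttle autovacuum I/O so it competes less with foreground queries.
  autovacuum_vacuum_cost_delay = 20ms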
>
> Interesting idea to issue the EXPLAIN first... I will see if I can
> instrument the sampling program to do this.
>
> Thanks for your help Tom.
>
>
> Mike
>
> Tom Lane wrote:
>>
>> Mike Charnoky <noky(at)nextbus(dot)com> writes:
>>>
>>> The sampling query which runs really slow on some days looks something
>>> like this:
>>
>>> INSERT INTO sampled_data
>>> (item_name, timestmp, ... )
>>> SELECT item_name, timestmp, ...
>>> FROM raw_data
>>> WHERE timestmp >= ? and timestmp < ?
>>> AND item_name=?
>>> AND some_data_field NOTNULL
>>> ORDER BY random()
>>> LIMIT ?;
>>
>> Hmph, I'd expect that that would run pretty slowly *all* the time :-(.
>> There's no good way to optimize "ORDER BY random()". However, it seems
>> like the first thing you should do is modify the program so that it
>> issues an EXPLAIN for that right before actually doing the query, and
>> then you could see if the plan is different on the slow days.
>>
>>> We have done a great deal of PG tuning, including the autovacuum for the
>>> "raw_data" table. Autovacuum kicks like clockwork every day on that
>>> table after the sampling process finishes (after one day's worth of data
>>> is deleted from "raw_data" table, a roughly 7% change in size).
>>
>> Also, are you sure you have ruled out the possibility that the problem
>> comes from autovac kicking in *while* the update is running?
>>
>> regards, tom lane
>>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
--
When fascism comes to America, it will be the intolerant selling it as
diversity.