From: Karl Wright <kwright(at)metacarta(dot)com>
To: Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance query about large tables, lots of concurrent access
Date: 2007-06-19 14:17:07
Message-ID: 4677E563.5070900@metacarta.com
Lists: pgsql-performance
Bill Moran wrote:
> In response to Karl Wright <kwright(at)metacarta(dot)com>:
>
>> Alvaro Herrera wrote:
>>> Karl Wright wrote:
>>>> Alvaro Herrera wrote:
>>>>> Karl Wright wrote:
>>>>>
>>>>>> This particular run lasted four days before a VACUUM became essential.
>>>>>> The symptom that indicates that VACUUM is needed seems to be that the
>>>>>> CPU usage of any given PostgreSQL query skyrockets. Is this essentially
>>>>>> correct?
>>>>> Are you saying you weren't running VACUUM all the time? If so,
>>>>> that's where the problem lies.
>>>> PostgreSQL 7.4's VACUUM runs for so long that starting it from a cron job
>>>> even every 24 hours eventually left multiple instances of VACUUM
>>>> running at once in my case. So I tried to find a VACUUM schedule that permitted
>>>> each individual vacuum to finish before the next one started. A vacuum
>>>> seemed to require 4-5 days with this particular database - or at least
>>>> it did for 7.4. So I had the VACUUM schedule set to run every six days.
>>> How large is the database? I must admit I have never seen a database
>>> that took 4 days to vacuum. This could mean that your database is
>>> humongous, or that the vacuum strategy is wrong for some reason.
>> The database is humongous, and the machine is under intense load. On the
>> instance where this long vacuum occurred, there were several large
>> tables - one with 7,000,000 rows, one with 14,000,000, one with
>> 140,000,000, and one with 250,000,000.
>
> Don't rule out the possibility that the only way to fix this _might_ be to
> throw more hardware at it. Proper configuration can buy you a lot, but if
> your usage is exceeding the available bandwidth of the IO subsystem, the
> only way you're going to get better performance is to put in a faster IO
> subsystem.
>
>>> You know that you can run vacuum on particular tables, right? It would
>>> probably be a good idea to run vacuum on the most heavily updated tables
>>> and leave alone those that are rarely or never updated (hopefully the
>>> biggest ones; that would mean an almost-complete vacuum run would take
>>> much less than a whole day).
>> Yeah, sorry, that doesn't apply here.
>
> Why not? I see no reason why an appropriate autovacuum schedule would not
> apply to your scenario. I'm not saying it does, only that your response
> does not indicate that it doesn't, and thus I'm concerned that you're
> writing autovacuum off without proper research.
>
I'm not writing off autovacuum - just the concept that the large tables
aren't the ones that are changing. Unfortunately, they *are* the most
dynamically updated.
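
To make the per-table suggestion concrete, it would look roughly like the
sketch below; the table names are placeholders rather than our real schema,
and the stats query assumes the statistics collector is enabled:

  -- Vacuum only the heavily-updated tables instead of the whole database.
  -- largetable1/largetable2 are hypothetical names for illustration.
  VACUUM ANALYZE largetable1;
  VACUUM ANALYZE largetable2;

  -- pg_stat_user_tables tracks per-table update/delete counts, which is
  -- one way to see which tables actually need frequent vacuuming:
  SELECT relname, n_tup_upd, n_tup_del
    FROM pg_stat_user_tables
   ORDER BY n_tup_upd + n_tup_del DESC;

In our case that doesn't buy much, because the 140,000,000- and
250,000,000-row tables are precisely the ones taking the updates.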
Karl
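
P.S. For what it's worth, the cron-overlap problem I described is easy to
spot from pg_stat_activity; this assumes stats_command_string is turned on:

  -- Show any VACUUMs currently in progress (7.4-era columns).
  SELECT procpid, current_query
    FROM pg_stat_activity
   WHERE current_query ILIKE 'vacuum%';

If more than one row comes back, the previous run is still going.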