Re: Vacuuming strategy

From: Elanchezhiyan Elango <elanelango(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Vacuuming strategy
Date: 2014-04-30 17:49:27
Message-ID: CALqA5kiJGPJ_8iXjwjM0g95rmq2cbO3cHS0_g5gyuTh+Sv1r3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
>
> Each table has 4 indices. The updates are to the indexed columns.
>
> Here is the schema of a table: http://pastebin.com/ffu0dUjS All tables
> have this same schema except that some tables don't have a port column and
> so will have one less index

What indexes exist? Are the updates to indexed columns?

Sorry I was wrong when I said the updates are to the indexed columns. The
updates are to the 'data[]' column which doesn't have any index.

On Wed, Apr 30, 2014 at 10:45 AM, Elanchezhiyan Elango <elanelango(at)gmail(dot)com
> wrote:

> Missed to answer this one:
>
>> Is the 5GB for the table plus indexes, or just the table itself?
>
> No it's not including the the indices. Including indices it's actually
> 17GB!!
>
>
> On Wed, Apr 30, 2014 at 10:40 AM, Elanchezhiyan Elango <
> elanelango(at)gmail(dot)com> wrote:
>
>> Sergey,
>> Thanks for the aggressive settings. I have listed some settings I am
>> planning to try below. Please review and let me know your feedback.
>>
>> Francisco,
>> Thanks for the partitioning idea. I used to have the tables partitioned.
>> But now that I have moved to a schema where data is split across about ~90
>> tables I have moved away from partitioning. But it's something I have to
>> reconsider at least for the high traffic tables.
>>
>> What indexes exist? Are the updates to indexed columns?
>>
>> Each table has 4 indices. The updates are to the indexed columns.
>> Here is the schema of a table: http://pastebin.com/ffu0dUjS All tables
>> have this same schema except that some tables don't have a port column and
>> so will have one less index.
>>
>> Which day's tuples are deleted every night? Is it the day a week ago, or
>>> a month ago, or something else?
>>
>> Tables1, Tables2: Anything older than 1 month is deleted.
>> Tables3, Tables4: Anything older than 3 months is deleted.
>>
>> If only 50,000 get inserted daily and 1.2M get deleted, this table will
>>> soon be empty! I think you said daily when you meant hourly somewhere in
>>> there.
>>
>> Sorry that was wrong.
>> For Tables3 it should have been 50000 rows (not 1.2M) getting deleted
>> daily.
>> And for Tables4 it should have been 2000 rows (not 48000) getting deleted
>> daily.
>>
>> Why do you have a 4 minute timeout? That seems counter-productive.
>>
>> Oh, Is it less or more?
>>
>> You delete a bunch of tuples every night, so of course a vacuum full
>>> after that is going to return a lot of space. But that space is probably
>>> just going to be needed again the next day. If you don't do the vacuum
>>> full, does the *peak* space keep increasing, or does it stabilize?
>>
>> I haven't tested to see if the space keeps on increasing.
>>
>> I did pgstattupe() on one of the tables:
>>
>> managed_target_stats=# select * from pgstattuple('xyz');
>>
>> table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count
>> | dead_tuple_len | dead_tuple_percent | free_space | free_percent
>>
>>
>> ------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
>>
>> 5642526720 | 18880283 | 4042874387 | 71.65 |
>> 122037 | 21495560 | 0.38 | 1239598044 | 21.97
>>
>> This is one of Tables1 tables and this is after running for a week or so
>> with the default autovacuum settings. The dead_tuple_percent look good. But
>> the free_percent looks high. Is this normal? Also when I enabled autovacuum
>> logs, I saw the autovacuums triggering very rarely. May be that's the
>> reason for too much free space? I am going to try with the following
>> settings:
>>
>> log_autovacuum_min_duration = 0
>>
>> autovacuum_vacuum_scale_factor = 0
>>
>> autovacuum_vacuum_threshold = 40000
>>
>> autovacuum_vacuum_cost_delay = 10ms
>>
>> autovacuum_max_workers = 5
>>
>> autovacuum_analyze_scale_factor = 0
>>
>> autovacuum_analyze_threshold = 40000
>>
>> Do these settings look ok? I am setting autovacuum_vacuum_scale_factor
>> and autovacuum_analyze_threshold to 0 and the thresholds to a constant
>> 40000. My reasoning is updates happen in a burst every 5 minutes and the
>> upper bound of updates is 50000 every 5 minutes. So I have just harcoded a
>> threshold of 40000. Also I am leaving autovacuum_naptime at the default of
>> 1 min because updates anyways happen only every 5 minutes at the max. So I
>> don't see a point in running autovacuum more frequently than 1min. Correct
>> me if my settings look wrong.
>>
>> Thanks,
>>
>> Elan.
>>
>>
>> On Wed, Apr 30, 2014 at 8:36 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>>> On Tue, Apr 29, 2014 at 4:59 PM, Elanchezhiyan Elango <
>>> elanelango(at)gmail(dot)com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I need help on deciding my vacuuming strategy. I need to know if I ever
>>>> need to do 'vacuum full' for my tables.
>>>>
>>>> Tables1: Following is the query patterns on 4 high traffic table in my
>>>> database:
>>>> 1. Every 5 minutes about 50000 rows in the table are updated. And for a
>>>> given clock hour the same 50000 rows are updated again and again.
>>>> 2. When a new hour begins, another 50000 rows get inserted and they get
>>>> updated every 5 minutes.
>>>> 3. Every night a days worth of old stats are deleted. So this would be
>>>> 24 * 50000 = 1.2M records that get deleted every night.
>>>>
>>>
>>> What indexes exist? Are the updates to indexed columns? Which day's
>>> tuples are deleted every night? Is it the day a week ago, or a month ago,
>>> or something else?
>>>
>>>
>>>
>>>> Tables3: Another 4 tables have the following pattern:
>>>> 1. Every 1 hour 50000 rows get updated. For a given day the same 50000
>>>> rows are updated again and again.
>>>> 2. When a new day begins, another 50000 rows get inserted and they get
>>>> updated every hour.
>>>> 3. Every night 1.2M records get deleted.
>>>>
>>>
>>> If only 50,000 get inserted daily and 1.2M get deleted, this table will
>>> soon be empty! I think you said daily when you meant hourly somewhere in
>>> there.
>>>
>>>
>>>>
>>>> Tables4: Another 39 tables have the following pattern:
>>>> 1. Every 1 hour 2000 rows gets updated. For a given day the same 2000
>>>> rows are updated again and again.
>>>> 2. When a new day begins, another 2000 rows get inserted and they get
>>>> updated every hour.
>>>> 3. Every night 48000 rows get deleted.
>>>>
>>>> With the above query pattern with intensive updates and deletes, I need
>>>> to do some aggressive vacuuming.
>>>>
>>>> Current strategy:I am running with default autovacuum settings
>>>> (postgres 9.1.9) and I tried doing a 'vacuum full' for the 8 high traffic
>>>> tables (Tables1 and Tables3) every night. But after a point, the 'vacuum
>>>> full's started timing out (with 4min timeout) every night.
>>>>
>>>
>>> Why do you have a 4 minute timeout? That seems counter-productive.
>>>
>>>
>>>
>>>> I think this is because the table is growing bigger (~5GB) and doing a
>>>> vacuum full every night is probably not feasible.
>>>>
>>>
>>> It is probably not necessary, but it certainly seems feasible. 4 min *
>>> 8 tables = 32 minutes. Call it one hour, since the 4 minute timeout has
>>> started not being enough. Is the 5GB for the table plus indexes, or just
>>> the table itself?
>>>
>>>
>>>>
>>>> Going with the default autovacuum settings and not doing 'vacuum full'
>>>> at all is also not enough for my usecase. Whenever vacuum full succeeded
>>>> every night, it did seem to reclaim a considerable amount of space.
>>>>
>>>
>>> You delete a bunch of tuples every night, so of course a vacuum full
>>> after that is going to return a lot of space. But that space is probably
>>> just going to be needed again the next day. If you don't do the vacuum
>>> full, does the *peak* space keep increasing, or does it stabilize?
>>>
>>> Cheers,
>>>
>>> Jeff
>>>
>>
>>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Torsten Förtsch 2014-04-30 17:52:01 arrays of rows and dblink
Previous Message Elanchezhiyan Elango 2014-04-30 17:45:43 Re: Vacuuming strategy