From: | Prakash Itnal <prakash074(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, rasna(dot)t(at)nokia(dot)com, sandhya(dot)k_s(at)nokia(dot)com |
Subject: | Re: Auto-vacuum is not running in 9.1.12 |
Date: | 2015-06-20 13:32:25 |
Message-ID: | CAHC5u79X9z5v3fVDHeTwaAm_qBKx_fRvWKG7miw7yBiVhGFTxw@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
Sorry for the late response. The current patch fixes only scenario-1
listed below; it does not address scenario-2. We also need a fix in
unix_latch.c, where the remaining sleep time is evaluated if the latch is
woken by other events (or result=0). Here too, the latch might go into a
long sleep if the time shifts to the past.
*Scenario-1:* current_time (2015) -> changed_to_past (1995) ->
stays-here-for-half-day -> corrected to current_time (2015)
*Scenario-2:* current_time (2015) -> changed_to_future (2020) ->
stays-here-for-half-day -> corrected to current_time (2015)
*Results:*
Scenario-1: Auto-vacuuming is not done from the time the system time is
changed to 1995 until it is corrected to the current time (half a day in this
example).
Scenario-2: Auto-vacuuming keeps running when the system time shifts to the
future. However, after the time is corrected back to the current time
(2020 -> 2015), auto-vacuuming goes into a 5-year sleep. Though the current
patch fixes waking up from the sleep, it will not allow an auto-vacuum worker
to be launched, as the dblist still holds the previously set time, i.e. 2020.
*Proposed Fixes:*
*autovacuum.c:* I call rebuild_database_list if a time shift is detected. A
time shift is detected if the evaluated sleep time is zero or greater than
autovacuum_naptime. Currently the list is rebuilt only if the time shifts to
the future. I added a check to rebuild it if the sleep time is greater than
autovacuum_naptime. Secondly, I included the patch from Alvaro and changed
the default 300-second value to autovacuum_naptime. This avoids multiple
wakeups if autovacuum_naptime is set to more than 300 seconds.
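To make the autovacuum.c condition concrete, here is a minimal standalone sketch. It is not the code from the attached patch: the function names and the millisecond units are assumptions made up for illustration, and an overdue (negative) sleep is treated the same as zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from autovacuum.c or the attached patch).
 * Returns true when the computed sleep looks like the result of a clock
 * jump: the next scheduled worker time is already due (sleep is zero or
 * negative) or it lies further away than one naptime.
 */
bool
sleep_suggests_time_shift(int64_t next_worker_ms, int64_t now_ms,
                          int64_t naptime_ms)
{
    int64_t sleep_ms = next_worker_ms - now_ms;

    assert(naptime_ms > 0);
    return sleep_ms <= 0 || sleep_ms > naptime_ms;
}

/*
 * Hypothetical helper: clamp a computed sleep into [0, naptime], so the
 * launcher never sleeps longer than one naptime in a single wait.
 */
int64_t
capped_sleep_ms(int64_t sleep_ms, int64_t naptime_ms)
{
    assert(naptime_ms > 0);
    if (sleep_ms < 0)
        return 0;
    if (sleep_ms > naptime_ms)
        return naptime_ms;
    return sleep_ms;
}
```

In this sketch the caller would rebuild the database list whenever sleep_suggests_time_shift() returns true, and use capped_sleep_ms() as the upper bound for the wait.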
*unix_latch.c:* The current implementation evaluates the remaining sleep time
using "cur_timeout = timeout - (cur_time - start_time)". If the time is
shifted back to the past, cur_timeout evaluates to a very long time (e.g.
with start_time=2015 and cur_time=1995, cur_timeout = timeout - (-20
years) = timeout + 20 years). To avoid this wrong calculation I added a
check that treats this case as a timeout.
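The check can be sketched in isolation as follows. This is only an illustration under assumptions, not the actual unix_latch.c code or the attached patch: the function name is hypothetical and the times are plain wall-clock milliseconds rather than instr_time values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from unix_latch.c or the attached patch).
 * Computes how long the latch should still sleep after an early wakeup.
 * If the clock moved backwards, elapsed is negative and the naive
 * "timeout - elapsed" would grow beyond the requested timeout, so that
 * case is reported as a timeout (0 ms remaining) instead.
 */
long
remaining_timeout_ms(int64_t start_ms, int64_t now_ms, long timeout_ms)
{
    int64_t elapsed;

    assert(timeout_ms >= 0);

    elapsed = now_ms - start_ms;
    if (elapsed < 0)
        return 0;               /* clock went backwards: treat as timeout */
    if (elapsed >= timeout_ms)
        return 0;               /* timeout already reached */
    return (long) (timeout_ms - elapsed);
}
```

For example, with start_ms=1000, now_ms=500 and timeout_ms=1000 the unguarded formula would give 1500 ms, while this sketch returns 0.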
With the above fixes, auto-vacuuming will be robust enough to handle any
system time changes. We tested the scenarios in our setup and they seem to
work fine. I hope these are valid fixes and that they do not affect any
other flows.
Please review and share your comments/suggestions.
PS: In our product the database is used in update-heavy mode with limited
disk space, so we need to be robust against such time changes to avoid
system failures due to a full disk.
On Fri, Jun 19, 2015 at 10:28 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2015-06-17 18:10:42 -0300, Alvaro Herrera wrote:
> > Yeah, the case is pretty weird and I'm not really sure that the server
> > ought to be expected to behave. But if this is actually the only part
> > of the server that misbehaves because of sudden gigantic time jumps, I
> > think it's fair to patch it. Here's a proposed patch.
>
> We probably should go through the server and look at the various sleeps
> and make sure they all have an upper limit. I doubt this is the only
> location without one.
>
> Greetings,
>
> Andres Freund
>
--
Cheers,
Prakash
Attachment | Content-Type | Size |
---|---|---|
time_shift_fixes_in_autovacuum.patch | application/octet-stream | 3.1 KB |