Re: autovacuum holds exclusive lock on table preventing it from to be updated

From: Dmitry O Litvintsev <litvinse(at)fnal(dot)gov>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Andreas Kretschmer <andreas(at)a-kretschmer(dot)de>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: autovacuum holds exclusive lock on table preventing it from to be updated
Date: 2017-06-19 19:53:55
Message-ID: BL2PR09MB100982187092C1BCB00434EBB9C40@BL2PR09MB1009.namprd09.prod.outlook.com
Lists: pgsql-general

Yes, we had to restart the database 4 days ago (and the vacuum resumed on start).
I checked the log files and discovered that autovacuum on this table takes 6 days:

pages: 0 removed, 14072307 remain
tuples: 43524292 removed, 395006545 remain
buffer usage: -1493114028 hits, 107664973 misses, 30263658 dirtied
avg read rate: 1.604 MB/s, avg write rate: 0.451 MB/s
system usage: CPU 2055.81s/17710.94u sec elapsed 524356.57 sec

So it is perpetually being autovacuumed (which I assumed to be a good thing).

Table has 400M entries, 115 GB.
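
As a rough cross-check of the figures above, here is a small Python sketch that converts the logged numbers into days, gigabytes, and a dead-tuple share. It assumes the default 8 kB PostgreSQL block size and that "pages" counts heap pages only; neither is stated in this thread, and the variable names are purely illustrative.

```python
# Figures copied from the autovacuum log excerpt above.
ELAPSED_SEC = 524356.57
PAGES_REMAIN = 14072307
TUPLES_REMOVED = 43524292
TUPLES_REMAIN = 395006545

# Assumption: default PostgreSQL block size of 8 kB (not stated in the thread).
BLOCK_SIZE = 8192

elapsed_days = ELAPSED_SEC / 86400                 # seconds per day
heap_gb = PAGES_REMAIN * BLOCK_SIZE / 1e9          # decimal gigabytes
dead_fraction = TUPLES_REMOVED / (TUPLES_REMOVED + TUPLES_REMAIN)

print(f"elapsed: {elapsed_days:.2f} days")
print(f"heap size: {heap_gb:.1f} GB")
print(f"removed tuples: {dead_fraction:.1%} of removed + remaining")
```

Under those assumptions the run lasted a little over 6 days and the heap comes to about 115 GB, which is consistent with the table size quoted above.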

I will try your suggestions in the test environment.

Thank you,
Dmitry
________________________________________
From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Sent: Monday, June 19, 2017 1:16 PM
To: Dmitry O Litvintsev
Cc: Andreas Kretschmer; pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from to be updated

On Mon, Jun 19, 2017 at 10:33 AM, Dmitry O Litvintsev <litvinse(at)fnal(dot)gov> wrote:
Hi

Since I have posted this nothing really changed. I am starting to panic (mildly).

The source (production) runs :

relname | mode | granted | substr | query_start | age
----------------------------+--------------------------+---------+----------------------------------------------------------------------+-------------------------------+------------------------
t_inodes_iio_idx | RowExclusiveLock | t | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound) | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559

This is close to unreadable. You can use \x to get output from psql which survives email more readably.

Your first report was 6 days ago. Why is the job only 4 days old? Are you frequently restarting your production server, so that the vacuum job never gets a chance to finish? If so, that would explain your predicament.

And how big is this table, that it takes at least 4 days to VACUUM?

vacuum_cost_delay = 50ms

That is a lot. The default value for this is 0. The default value for autovacuum_vacuum_cost_delay is 20ms, which is usually too high for giant databases.

I think you are changing this in the wrong direction. Rather than increase vacuum_cost_delay, you need to decrease autovacuum_vacuum_cost_delay, so that you won't keep having problems in the future.
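
To illustrate why the delay matters so much, here is a hedged Python sketch of the ceiling that cost-based throttling puts on vacuum read speed. It assumes settings that are not given in this thread: vacuum_cost_limit = 200, vacuum_cost_page_miss = 10, an 8 kB block size, and the worst case where every page read is a buffer miss. The function name and parameters are hypothetical, and the model ignores hit and dirty costs as well as the time spent on the work itself, so real rates will be lower.

```python
BLOCK_SIZE = 8192  # bytes; assumed default block size


def max_miss_read_rate_mb_s(cost_delay_ms, cost_limit=200, page_miss_cost=10):
    """Upper bound on read rate in MB/s if every page is a buffer miss.

    Vacuum accumulates cost until it reaches cost_limit, then sleeps for
    cost_delay_ms. Returns None when the delay is 0 (throttling disabled).
    """
    if cost_delay_ms <= 0:
        return None
    pages_per_round = cost_limit / page_miss_cost   # pages read before each sleep
    rounds_per_second = 1000.0 / cost_delay_ms      # sleeps per second, at most
    return pages_per_round * rounds_per_second * BLOCK_SIZE / (1024 * 1024)


for delay_ms in (50, 20, 0):
    print(delay_ms, max_miss_read_rate_mb_s(delay_ms))
```

With these assumed settings the ceiling is roughly 3 MB/s at 50ms and roughly 8 MB/s at 20ms, while a delay of 0 removes the ceiling altogether. That is the same order of magnitude as the 1.604 MB/s average read rate in the log above.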

On your test server, change vacuum_cost_delay to zero and then initiate a manual vacuum of the table. It will block on the autovacuum's lock, so then kill the autovacuum (best to have the manual vacuum queued up first, otherwise it will be a race between when you start the manual vacuum, and when the autovacuum automatically restarts, to see who gets the lock). See how long it takes this unthrottled vacuum to run, and how much effect the IO it causes has on the performance of other tasks. If acceptable, repeat this on production (although really, I don't think that you have much of a choice on whether the effect is acceptable or not--it needs to be done.)

Cheers,

Jeff
