Re: BUG #14938: ALTER TABLE hang/ poor performance

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dipesh Kamdar <dipesh(dot)kamdar(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14938: ALTER TABLE hang/ poor performance
Date: 2017-12-01 18:30:33
Message-ID: 9e33cd3a-cada-ee2d-5f4f-160be257d6a3@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


On 12/01/2017 06:37 PM, Dipesh Kamdar wrote:
> #1    9.5.3 was released more than a year ago, you might be hitting one
> of the
> bugs fixed since then.
>
>>>> Do you know any bug related ?
>

No. Considering how little we know about the issue you're observing,
it's quite impossible to point to any particular bugfix.

But it's a good practice to run current version, and it's rather
annoying to help people only to discover they've been hitting an issue
fixed ages ago. To the extent that it's likely discouraging people from
investigating the bug report.

> #2 What do you mean by deadlock?Moreover, autovacuum certainly should not
> deadlock with anything (certainly not with DELETE)
>
>>>> Please take a look of SEQ
>    deleting  TAB1  data of    50K   ------   T1
>    autovacuum  check TAB1      -------  T2  ( Since delete is running
> skipping table )
>    delete completed    ---- T3
>     before starting next delete on same table autovacuum started on
> TAB1    ---- T4
>     DELETE statement is waiting for autovacuum to release lock.    
>     after some time later reported deadlock in log and killed DELETE
> process.
>

DELETE acquires ROW EXCLUSIVE lock
VACUUM acquires SHARE UPDATE EXCLUSIVE lock

These two lock modes do not conflict with each other, i.e. VACUUM (or
autovacuum) will not wait for DELETE (and vice versa).

Perhaps you're running the DELETE in a transaction, and that transaction
has acquired other (stronger) locks? Those locks will be held until the
end of the transaction, but that has nothing to do with DELETE.

Show us what locks the two sessions hold.

> #3  I also don't quite understand why you do the delete in batches of 50k
> rows, to be honest
>
>>> It is client facing application with close to 1.5 billion records and has multiple FK other and indexes removing again millions of record in single statement means 
>      putting entire online application on fire.
>

OK, so you're using this to throttle the DELETE process.

> #4 I am running following SQL as monitoring point of every 10min. 
> Let me know if anything is missing above monitor script.
>
> SELECT
>
>   COALESCE(blockingl.relation*::*regclass*::*text,blockingl.locktype) as
> locked_item,
>
>   now() *-* blockeda.query_start AS waiting_duration, blockeda.pid AS
> blocked_pid,
>
>   blockeda.query as blocked_query, blockedl.mode as blocked_mode,
>
>   blockinga.pid AS blocking_pid, blockinga.query as blocking_query,
>
>   blockingl.mode as blocking_mode
>
> FROM pg_catalog.pg_locks blockedl
>
>     INNER JOIN pg_stat_activity blockeda ON blockedl.pid *=* blockeda.pid
>
>     INNER JOIN pg_catalog.pg_locks blockingl ON(
>
>                 ( (blockingl.transactionid*=*blockedl.transactionid) OR
>
>                (blockingl.relation*=*blockedl.relation AND
> blockingl.locktype*=*blockedl.locktype)
>
>       ) AND blockedl.pid *!=* blockingl.pid)
>
>     INNER JOIN pg_stat_activity blockinga ON blockingl.pid *=*
> blockinga.pid  AND blockinga.datid *=* blockeda.datid
>
> WHERE NOT blockedl.granted
>
>       AND blockinga.datname *=* current_database()
>

I don't care about a monitoring query that runs every 10 minutes. The
deadlock detection runs after 1 second lock wait by default, which means
the monitoring query has pretty much no chance of observing that.

You need to simulate the issue and collect data from pg_locks while it's
happening. And collect everything from pg_locks - the lock dependencies
may easily be more complex.

It's also a good idea to share the deadlock log message, and info what
the processes were doing.

>
> #5 It is production , We will not able to attach process to gdb.
>

OK. Assuming it's a locking issue, we should be able to debug it using
just the pg_locks data.

>
> #6 My main concern is not coming in pg_locks table. 
>
> Is it advice to use update statement on pg_class.reloptions column
> rather than using ALTER TABLE SET option.
>

No. Touching the catalogs directly is pretty bad idea. The locking is
there for a reason.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Bossart, Nathan 2017-12-01 18:53:03 Re: BUG #14941: Vacuum crashes
Previous Message Dipesh Kamdar 2017-12-01 17:37:39 Re: BUG #14938: ALTER TABLE hang/ poor performance