Re: Somebody has not thought through subscription locking considerations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: Somebody has not thought through subscription locking considerations
Date: 2017-03-31 06:08:38
Message-ID: CAD21AoBTo3LY-+5mc-Xs1p_c0FjvcDNcg_YCn=xdwOFHiffRGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 31, 2017 at 9:53 AM, Petr Jelinek
<petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> On 30/03/17 07:25, Tom Lane wrote:
>> I noticed this failure report:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi&dt=2017-03-29%2019%3A45%3A27
>>
>> in which we find
>>
>> *** /home/buildfarm/data/buildroot/HEAD/pgsql.build/src/test/regress/expected/updatable_views.out Thu Mar 30 04:45:43 2017
>> --- /home/buildfarm/data/buildroot/HEAD/pgsql.build/src/test/regress/results/updatable_views.out Thu Mar 30 05:32:37 2017
>> ***************
>> *** 349,354 ****
>> --- 349,358 ----
>> DROP VIEW ro_view10, ro_view12, ro_view18;
>> DROP SEQUENCE seq CASCADE;
>> NOTICE: drop cascades to view ro_view19
>> + ERROR: deadlock detected
>> + DETAIL: Process 7576 waits for AccessShareLock on relation 1259 of database 16384; blocked by process 7577.
>> + Process 7577 waits for ShareRowExclusiveLock on relation 6102 of database 16384; blocked by process 7576.
>> + HINT: See server log for query details.
>> -- simple updatable view
>> CREATE TABLE base_tbl (a int PRIMARY KEY, b text DEFAULT 'Unspecified');
>> INSERT INTO base_tbl SELECT i, 'Row ' || i FROM generate_series(-2, 2) g(i);
>>
>> and the referenced bit of log is
>>
>> [58dc19dd.1d98:175] ERROR: deadlock detected
>> [58dc19dd.1d98:176] DETAIL: Process 7576 waits for AccessShareLock on relation 1259 of database 16384; blocked by process 7577.
>> Process 7577 waits for ShareRowExclusiveLock on relation 6102 of database 16384; blocked by process 7576.
>> Process 7576: DROP SEQUENCE seq CASCADE;
>> Process 7577: VACUUM FULL pg_class;
>> [58dc19dd.1d98:177] HINT: See server log for query details.
>> [58dc19dd.1d98:178] STATEMENT: DROP SEQUENCE seq CASCADE;
>>
>> Of course, 1259 is pg_class and 6102 is pg_subscription_rel.
>>
>> I await with interest an explanation of what "VACUUM FULL pg_class" is
>> doing trying to acquire ShareRowExclusiveLock on pg_subscription_rel, not
>> to mention why a DROP SEQUENCE is holding some fairly strong lock on that
>> relation. *Especially* in a situation where no subscriptions exist ---
>> but even if any did, this seems unacceptable on its face. Access to core
>> catalogs like pg_class cannot depend on random other stuff.
>>
>
> Hmm, the DROP SEQUENCE is result of not having dependency info for
> relations/subscriptions I think. I was told during review it's needless
> bloat of dependency catalog. I guess we should revisit that. It's also
> likely that RemoveSubscriptionRel will work fine with lower lock level.
>
> I have no idea why VACUUM FULL of pg_class would touch the
> pg_subscription_rel though, I'll have to dig into that.

VACUUM FULL of any table acquires ShareRowExclusiveLock on
pg_subscription_rel because when doDeletion removes old heap the
RemoveSubscriptionRel is called in heap_drop_with_catalog. The
following stack trace is what I got when run VACUUM FULL pg_class.

#2 0x000000000084b9d2 in LockRelationOid (relid=6102, lockmode=6) at lmgr.c:115
#3 0x00000000004cec37 in relation_open (relationId=6102, lockmode=6)
at heapam.c:1122
#4 0x00000000004ceef9 in heap_open (relationId=6102, lockmode=6) at
heapam.c:1288
#5 0x0000000000599a02 in RemoveSubscriptionRel (subid=0, relid=16409)
at pg_subscription.c:361
#6 0x000000000056037f in heap_drop_with_catalog (relid=16409) at heap.c:1842
#7 0x000000000055b420 in doDeletion (object=0x1bca758, flags=1) at
dependency.c:1125
#8 0x000000000055b192 in deleteOneObject (object=0x1bca758,
depRel=0x7fff000716a0, flags=1) at dependency.c:1028
#9 0x000000000055a169 in deleteObjectsInList
(targetObjects=0x1b99708, depRel=0x7fff000716a0, flags=1) at
dependency.c:263
#10 0x000000000055a21a in performDeletion (object=0x7fff00071750,
behavior=DROP_RESTRICT, flags=1) at dependency.c:344
#11 0x0000000000615597 in finish_heap_swap (OIDOldHeap=1259,
OIDNewHeap=16409, is_system_catalog=1 '\001', swap_toast_by_content=0
'\000', check_constraints=0 '\000', is_internal=1 '\001',
frozenXid=571, cutoffMulti=1, newrelpersistence=112 'p') at
cluster.c:1574
#12 0x0000000000613bd3 in rebuild_relation (OldHeap=0x7f541f0d24a0,
indexOid=0, verbose=0 '\000') at cluster.c:590
#13 0x000000000061362a in cluster_rel (tableOid=1259, indexOid=0,
recheck=0 '\000', verbose=0 '\000') at cluster.c:404
#14 0x000000000069f40f in vacuum_rel (relid=1259, relation=0x1b9a770,
options=17, params=0x7fff00071a80) at vacuum.c:1441
#15 0x000000000069db9b in vacuum (options=17, relation=0x1b9a770,
relid=0, params=0x7fff00071a80, va_cols=0x0, bstrategy=0x1bf2f50,
isTopLevel=1 '\001') at vacuum.c:304
#16 0x000000000069d7ec in ExecVacuum (vacstmt=0x1b9a7c8, isTopLevel=1
'\001') at vacuum.c:122
#17 0x0000000000873a9c in standard_ProcessUtility (pstmt=0x1b9ab28,
queryString=0x1b99d50 "vacuum FULL pg_class;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x1b9ac20,
completionTag=0x7fff00071eb0 "") at utility.c:670
#18 0x0000000000873222 in ProcessUtility (pstmt=0x1b9ab28,
queryString=0x1b99d50 "vacuum FULL pg_class;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x1b9ac20,
completionTag=0x7fff00071eb0 "") at utility.c:353
#19 0x000000000087220c in PortalRunUtility (portal=0x1b35540,
pstmt=0x1b9ab28, isTopLevel=1 '\001', setHoldSnapshot=0 '\000',
dest=0x1b9ac20, completionTag=0x7fff00071eb0 "") at pquery.c:1174
#20 0x0000000000872419 in PortalRunMulti (portal=0x1b35540,
isTopLevel=1 '\001', setHoldSnapshot=0 '\000', dest=0x1b9ac20,
altdest=0x1b9ac20, completionTag=0x7fff00071eb0 "") at pquery.c:1317
#21 0x0000000000871945 in PortalRun (portal=0x1b35540,
count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001',
dest=0x1b9ac20, altdest=0x1b9ac20, completionTag=0x7fff00071eb0 "") at
pquery.c:795
#22 0x000000000086ba9e in exec_simple_query (query_string=0x1b99d50
"vacuum FULL pg_class;") at postgres.c:1101
#23 0x000000000086fbf8 in PostgresMain (argc=1, argv=0x1b44220,
dbname=0x1b44080 "postgres", username=0x1b44058 "masahiko") at
postgres.c:4071
#24 0x00000000007d6e46 in BackendRun (port=0x1b3c5d0) at postmaster.c:4317
#25 0x00000000007d65c6 in BackendStartup (port=0x1b3c5d0) at postmaster.c:3989
#26 0x00000000007d2bde in ServerLoop () at postmaster.c:1729
#27 0x00000000007d22a2 in PostmasterMain (argc=5, argv=0x1b15ed0) at
postmaster.c:1337
#28 0x0000000000710cfa in main (argc=5, argv=0x1b15ed0) at main.c:228

> I can see that locking for example pg_trigger or pg_depend in
> ShareRowExclusiveLock will also block VACUUM FULL on pg_class (and other
> related tables like pg_attribute). So maybe we just need to be careful
> about not taking such a strong lock...
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-03-31 06:27:18 Re: REINDEX CONCURRENTLY 2.0
Previous Message Arthur Zakirov 2017-03-31 05:58:50 Re: [PATCH] Generic type subscripting