From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD |
Date: | 2017-12-11 02:51:19 |
Message-ID: | CAEepm=2xZUcOGP9V0O_G0=2P2wwXwPrkF=upWTCJSisUxMnuSg@mail.gmail.com |
Lists: | pgsql-hackers |
Hi hackers,
I heard a report of a 10.1 cluster hanging with several 'BtreePage'
wait_events showing in pg_stat_activity. The query plan involved
Parallel Index Only Scan, and the table is concurrently updated quite
heavily. I tried and failed to make a reproducer, but from the clues
available it seemed clear that somehow *all* participants in a
Parallel Index Scan must be waiting for someone else to advance the
scan. The report came with a back trace[1] that was the same in all 3
backends (leader + 2 workers), which I'll summarise here:
    ConditionVariableSleep
    _bt_parallel_seize
    _bt_readnextpage
    _bt_steppage
    _bt_next
    btgettuple
    index_getnext_tid
    IndexOnlyNext
I think _bt_steppage() called _bt_parallel_seize() and then
_bt_readnextpage(), which I guess must have encountered a page marked
BTP_DELETED or BTP_HALF_DEAD, so it skipped this block and never
reached the early break out of the loop:
    /* check for deleted page */
    if (!P_IGNORE(opaque))
    {
        PredicateLockPage(rel, blkno, scan->xs_snapshot);
        /* see if there are any matches on this page */
        /* note that this will clear moreRight if we can stop */
        if (_bt_readpage(scan, dir, P_FIRSTDATAKEY(opaque)))
            break;
    }
... and then it called _bt_parallel_seize() itself, in violation of
the rule (by my reading of the code) that you must call
_bt_parallel_release() (via _bt_readpage()) or _bt_parallel_done()
after seizing the scan. If you call _bt_parallel_seize() again
without doing that first, you'll finish up waiting for yourself
forever. Does this theory make sense?
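To make the suspected rule concrete, here is a small toy model of it.
This is only a sketch and not PostgreSQL code: every Toy*/toy_* name is
made up for illustration, the real _bt_parallel_seize() sleeps on a
condition variable where the toy returns TOY_WOULD_BLOCK, and the loop
is simplified so that reading any live page releases the scan and stops.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the shared parallel scan state. */
typedef enum
{
    TOY_IDLE,        /* nobody is advancing; next_page is valid */
    TOY_ADVANCING,   /* some participant has seized the scan */
    TOY_DONE         /* scan has finished */
} ToyScanState;

typedef struct
{
    ToyScanState state;
    int          next_page;
} ToyParallelScan;

typedef enum { TOY_SEIZED, TOY_WOULD_BLOCK, TOY_FINISHED } ToySeizeResult;

/* Stand-in for _bt_parallel_seize(); reports a wait instead of sleeping. */
static ToySeizeResult
toy_seize(ToyParallelScan *scan, int *page)
{
    if (scan->state == TOY_DONE)
        return TOY_FINISHED;
    if (scan->state == TOY_ADVANCING)
        return TOY_WOULD_BLOCK;     /* the real code would sleep here */
    scan->state = TOY_ADVANCING;
    *page = scan->next_page;
    return TOY_SEIZED;
}

/* Stand-in for _bt_parallel_release(); only valid after a seize. */
static void
toy_release(ToyParallelScan *scan, int next_page)
{
    assert(scan->state == TOY_ADVANCING);
    scan->state = TOY_IDLE;
    scan->next_page = next_page;
}

/* Stand-in for _bt_parallel_done(). */
static void
toy_done(ToyParallelScan *scan)
{
    scan->state = TOY_DONE;
}

/*
 * Simplified stand-in for the loop in _bt_readnextpage().  The caller has
 * already seized the scan and passes the page it was given.  A live page
 * is "read", which releases the scan.  An ignored (deleted or half-dead)
 * page is skipped without a release, so the following seize finds the
 * scan still TOY_ADVANCING.  Returns true if we would wait on ourselves.
 */
static bool
toy_readnextpage_hangs(ToyParallelScan *scan, int page,
                       const bool *ignore, int npages)
{
    while (page < npages)
    {
        if (!ignore[page])
        {
            toy_release(scan, page + 1);    /* done via _bt_readpage() */
            return false;                   /* the early break */
        }
        /* skipped page: nothing released the scan */
        if (toy_seize(scan, &page) == TOY_WOULD_BLOCK)
            return true;
    }
    toy_done(scan);
    return false;
}
```

In this model, seizing the scan and then calling
toy_readnextpage_hangs() on a page flagged in ignore[] returns true,
which corresponds to every participant sitting in
ConditionVariableSleep; with a live page it releases the scan and
returns false.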
--
Thomas Munro
http://www.enterprisedb.com