From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: recent failures on lorikeet |
Date: | 2021-06-14 16:33:18 |
Message-ID: | ce94774f-0583-7be2-8ec3-2bb161b959fd@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 6/14/21 9:39 AM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> I've been looking at the recent spate of intermittent failures on my
>> Cygwin animal lorikeet. Most of them look something like this, where
>> there's 'VACUUM FULL pg_class' and an almost simultaneous "CREATE TABLE'
>> which fails.
> Do you have any idea what "exit code 127" signifies on that platform?
> (BTW, not all of them look like that; many are reported as plain
> segfaults.) I hadn't spotted the association with a concurrent "VACUUM
> FULL pg_class" before, that does seem interesting.
>
>> Getting stack traces in this platform can be very difficult. I'm going
>> to try forcing complete serialization of the regression tests
>> (MAX_CONNECTIONS=1) to see if the problem goes away. Any other
>> suggestions might be useful. Note that we're not getting the same issue
>> on REL_13_STABLE, where the same group pf tests run together (inherit
>> typed_table, vacuum)
> If it does go away, that'd be interesting, but I don't see how it gets
> us any closer to a fix. Seems like a stack trace is a necessity to
> narrow it down.
>
>
Some have given stack traces and some not, not sure why. The one from
June 13 has this:
---- backtrace ----
??
??:0
WaitOnLock
src/backend/storage/lmgr/lock.c:1831
LockAcquireExtended
src/backend/storage/lmgr/lock.c:1119
LockRelationOid
src/backend/storage/lmgr/lmgr.c:135
relation_open
src/backend/access/common/relation.c:59
table_open
src/backend/access/table/table.c:43
ScanPgRelation
src/backend/utils/cache/relcache.c:322
RelationBuildDesc
src/backend/utils/cache/relcache.c:1039
RelationIdGetRelation
src/backend/utils/cache/relcache.c:2045
relation_open
src/backend/access/common/relation.c:59
table_open
src/backend/access/table/table.c:43
ExecInitPartitionInfo
src/backend/executor/execPartition.c:510
ExecPrepareTupleRouting
src/backend/executor/nodeModifyTable.c:2311
ExecModifyTable
src/backend/executor/nodeModifyTable.c:2559
ExecutePlan
src/backend/executor/execMain.c:1557
The line in lmgr.c is where the process title gets changed to "waiting".
I recently stopped setting process title on this animal on REL_13_STABLE
and its similar errors have largely gone away. I can do the same on
HEAD. But it does make me wonder what the heck has changed to make this
code fragile.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2021-06-14 16:37:58 | Re: PG 14 release notes, first draft |
Previous Message | Tomas Vondra | 2021-06-14 16:28:01 | Re: Use extended statistics to estimate (Var op Var) clauses |