Re: Something is broken about connection startup

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Something is broken about connection startup
Date: 2016-11-10 23:04:34
Message-ID: 23418.1478819074@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> A quick look through the sources confirms that this error implies that
> SearchSysCache on the RELOID cache must have failed to find a tuple for
> pg_proc --- there are many occurrences of this text, but they all are
> reporting that. Which absolutely should not be happening now that we use
> MVCC catalog scans, concurrent updates or no. So I think this is a bug,
> and possibly a fairly-recently-introduced one, because I can't remember
> seeing buildfarm failures like this one before.

After tweaking elog.c to promote FATAL to PANIC, I got stack traces
confirming that the error occurs here:

#0 0x0000003779a325e5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0000003779a33dc5 in abort () at abort.c:92
#2 0x000000000080d177 in errfinish (dummy=<value optimized out>) at elog.c:560
#3 0x000000000080df94 in elog_finish (elevel=<value optimized out>,
    fmt=<value optimized out>) at elog.c:1381
#4 0x0000000000801859 in RelationCacheInitializePhase3 () at relcache.c:3444
#5 0x000000000081a145 in InitPostgres (in_dbname=<value optimized out>, dboid=0,
    username=<value optimized out>, useroid=<value optimized out>, out_dbname=0x0)
    at postinit.c:982
#6 0x0000000000710c81 in PostgresMain (argc=1, argv=<value optimized out>,
    dbname=0x24d4c40 "regression", username=0x24abc88 "postgres") at postgres.c:3728
#7 0x00000000006a6eae in BackendRun (argc=<value optimized out>,
    argv=<value optimized out>) at postmaster.c:4271
#8 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
    at postmaster.c:3945
#9 ServerLoop (argc=<value optimized out>, argv=<value optimized out>)
    at postmaster.c:1701
#10 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
    at postmaster.c:1309
#11 0x00000000006273d8 in main (argc=3, argv=0x24a9b20) at main.c:228

So it's happening when RelationCacheInitializePhase3 is trying to replace
a fake pg_class row for pg_proc (made by formrdesc) with the real one.
That's even odder, because that's late enough that this should be a pretty
ordinary catalog lookup. Now I wonder if it's possible that this can be
seen during ordinary relation opens after connection startup. If so, it
would almost surely be a recently-introduced bug, else we'd have heard
about this from the field.
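
For readers who want the failing step spelled out, here is a minimal toy
model in C of the placeholder-replacement logic described above. It is a
sketch, not the relcache.c code: ToyClassRow, ToyRelation, toy_formrdesc,
toy_search_reloid and toy_phase3_fixup are hypothetical names, a plain array
stands in for pg_class, and marking a placeholder entry with an invalid
owner OID is an assumption of this sketch. The "false" return marks the
point where the real code would report the cache lookup failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Toy stand-in for a pg_class row (hypothetical, heavily simplified). */
typedef struct ToyClassRow
{
    Oid relid;      /* OID of the relation this row describes */
    Oid relowner;   /* InvalidOid marks a placeholder row in this sketch */
} ToyClassRow;

/* Toy stand-in for a relcache entry. */
typedef struct ToyRelation
{
    Oid         relid;
    ToyClassRow rd_rel;
} ToyRelation;

/* Build a placeholder entry without consulting the catalog at all. */
static ToyRelation
toy_formrdesc(Oid relid)
{
    ToyRelation rel;

    rel.relid = relid;
    rel.rd_rel.relid = relid;
    rel.rd_rel.relowner = InvalidOid;   /* marks the row as fake */
    return rel;
}

/* Linear search standing in for a catalog cache lookup by relation OID. */
static const ToyClassRow *
toy_search_reloid(const ToyClassRow *catalog, size_t nrows, Oid relid)
{
    for (size_t i = 0; i < nrows; i++)
    {
        if (catalog[i].relid == relid)
            return &catalog[i];
    }
    return NULL;
}

/*
 * Replace a placeholder row with the real one.  Returns false when the
 * lookup finds nothing, which is the case that should never happen.
 */
static bool
toy_phase3_fixup(ToyRelation *rel, const ToyClassRow *catalog, size_t nrows)
{
    const ToyClassRow *row;

    assert(rel != NULL);

    if (rel->rd_rel.relowner != InvalidOid)
        return true;            /* already holds a real row */

    row = toy_search_reloid(catalog, nrows, rel->relid);
    if (row == NULL)
    {
        fprintf(stderr, "cache lookup failed for relation %u\n", rel->relid);
        return false;
    }

    rel->rd_rel = *row;         /* swap in the real row */
    return true;
}
```

In this model the lookup can only fail if the row is missing from the array
handed to toy_phase3_fixup, which mirrors why the real failure is so
surprising: the catalog row for pg_proc always exists.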

regards, tom lane
