Re: Unable to Connect to DB Instance

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Boblitz John <john(dot)boblitz(at)bertschi(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Unable to Connect to DB Instance
Date: 2018-07-04 15:56:20
Message-ID: 9e42f409-2e90-636a-772d-9ea6ca34d25a@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-support pgsql-general

On 07/04/2018 04:08 AM, Boblitz John wrote:

>
> Good Morning,
>
> Beginning yesterday morning, users have been unable to fully connect
> to our DB Instance.
>
> 1.At the time of the initial report – I was connected to the DB via
> pgAdmin and could perform queries without problem.
>
> 2.Users reported messages similar to “could not open file
> "global/11801": No such file or directory”
>
> 3.At that time, connection logging was turned off and there were no
> messages in the log files.
>
> 4.As this is a development environment, I turned logging on in the
> config and restarted the DB
>
> 5.After restart, neither I, nor the Users could fully reconnect.
>
> 6.I have performed a SYS Level backup (tar of the whole postgres
> directory tree)
>
> 7.I cannot perform a DB level Backup (same errors occur)
>
> System Details
>
>                 Linux Debian      7.11
>
>                 Postgres              9.1 (9.1.24lts-0+d)
>

> It appears that we can connect to the DB Server itself as I get
> “connection received” and “connection authorized” – but when trying
> to access the DB itself, several errors are raised (see below).
>
> I am assuming that some internals are no longer consistent – the
> file “global/11801” for instance really does not exist on the system.
>
> Questions:
>
> 1.Is there any way to recover from this (backup is unfortunately
> rather old) >
> 2.What are possible causes?  I’d like to prevent this from happening
> on my production servers.

Looks like something/someone deleted files from portions of the $DATA
directory. In particular from ~/global and ~/pg_tblspc. Without those
files you can't really proceed.

Can you recover by creating a new instance and restoring from a dump of
the production server?

As to exact cause the only thing I can think of is to to look at the
system logs at the time of the initial failure and see if there is
anything there that would shed light.

>
> ** I am aware that we are on older releases, and yes, we plan to
> migrate to more current releases “soon” ™ …

>
> Thanks in advance.
>
> John Boblitz
>
> Exceprt from Log:
>
> 2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG:
> connection received: host=192.168.250.50 port=28559
>
> 2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG:
> connection authorized: user=dbadmin database=postgres
>
> 2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR:  could
> not open file "global/11801": No such file or directory
>
> 2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT:
> SELECT usecreatedb, usesuper, CASE WHEN usesuper THEN
> pg_postmaster_start_time() ELSE NULL END as upsince, CASE WHEN
> usesuper THEN pg_conf_load_time() ELSE NULL END as confloadedsince,
> CASE WHEN usesuper THEN pg_is_in_recovery() ELSE NULL END as
> inrecovery, CASE WHEN usesuper THEN pg_last_xlog_receive_location()
> ELSE NULL END as receiveloc, CASE WHEN usesuper THEN
> pg_last_xlog_replay_location() ELSE NULL END as replayloc, CASE WHEN
> usesuper THEN pg_last_xact_replay_timestamp() ELSE NULL END as
> replay_timestamp, CASE WHEN usesuper AND pg_is_in_recovery() THEN
> pg_is_xlog_replay_paused() ELSE NULL END as isreplaypaused
>
>                   FROM pg_user WHERE usename=current_user
>
> 2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR:  could
> not open file "global/11801": No such file or directory
>
> 2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT:
> SELECT rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname =
> current_user;
>
> 2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG:
> connection received: host=192.168.250.50 port=28561
>
> 2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG:
> connection authorized: user=dbadmin database=g11Base
>
> 2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR:  could
> not open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No
> such file or directory
>
> 2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT:
> SELECT CASE WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1
>
>                             WHEN (nspname LIKE E'pg\\_%') THEN 0
>
>                             ELSE 3 END AS nsptyp,
>
>                        nsp.nspname, nsp.oid,
> pg_get_userbyid(nspowner) AS namespaceowner, nspacl,
> description,       has_schema_privilege(nsp.oid, 'CREATE') as cancreate,
>
>                 (SELECT array_agg(label) FROM pg_seclabels sl1
> WHERE sl1.objoid=nsp.oid) AS labels,
>
>                 (SELECT array_agg(provider) FROM pg_seclabels sl2
> WHERE sl2.objoid=nsp.oid) AS providers
>
>                   FROM pg_namespace nsp
>
>                   LEFT OUTER JOIN pg_description des ON
> (des.objoid=nsp.oid AND des.classoid='pg_namespace'::regclass)
>
>                 WHERE NOT ((nspname = 'pg_catalog' AND EXISTS
> (SELECT 1 FROM pg_class WHERE relname = 'pg_class' AND relnamespace
> = nsp.oid LIMIT 1)) OR
>
>                 (nspname = 'pgagent' AND EXISTS (SELECT 1 FROM
> pg_class WHERE relname = 'pga_job' AND relnamespace = nsp.oid LIMIT
> 1)) OR
>
>                 (nspname = 'information_schema' AND EXISTS (SELECT
> 1 FROM pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid
> LIMIT 1)) OR
>
>                 (nspname LIKE '_%' AND EXISTS (SELECT 1 FROM
> pg_proc WHERE proname='slonyversion' AND pronamespace = nsp.oid
> LIMIT 1))
>
>                 )  AND nspname NOT LIKE E'pg\\_temp\\_%'AND nspname
> NOT LIKE E'pg\\_toast_temp\\_%' ORDER BY 1, nspname
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Browse pgadmin-support by date

  From Date Subject
Next Message Boblitz John 2018-07-04 17:07:19 RE: Unable to Connect to DB Instance
Previous Message Tom Lane 2018-07-04 15:49:57 Re: Unable to Connect to DB Instance

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-07-04 16:01:53 Re: Postgres sometimes stalling on 'percentile_cont'
Previous Message Tom Lane 2018-07-04 15:49:57 Re: Unable to Connect to DB Instance