Re: Error: no connection to the server

From: "William B(dot) Clay" <william(dot)b(dot)clay(at)acm(dot)org>
To: Marco Bambini <marco(at)sqlabs(dot)com>, pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Error: no connection to the server
Date: 2016-04-26 14:13:56
Message-ID: 76efa9df-ca28-59a6-35c6-2ff58210333c@acm.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-interfaces

On 04/25/2016 09:58 AM, Marco Bambini wrote:
> I have a multithreaded C client and sometimes I receive the "no connection to the server" error message.
> I haven't found any documentation about it and about how to fix this issue.

Marco, what interface to PGSQL are you using? I haven't seen this exact
message coming from from libpq, so I wonder if it's coming from an
intermediate SQL library that is calling PostgreSQL.

Presuming your client established a session with the server, then this
probably means you've lost network connectivity (link down;
server/service down; TCP session abandoned due to an inactivity timer
somewhere). If your client is intended to be long-running and reliable,
it needs to recognize and recover from such outages.

Below is an example of insert-only code that has reliably recovered from
the inevitable temporary outages in transatlantic Internet connectivity
(not to mention scheduled and unscheduled server bounces) while running
7x24 for several years.

Good luck,
Bill Clay
Modena, Italy

==== code snippets ====

The "lost connection" detection snippet:

rc = pselect(maxfds, &rfds, &rfds, &rfds, NULL, &nousr2_smask);

/* what woke us up? */
if (rc < -1 || rc > 2 || (rc == -1 && errno != EINTR && errno != EBADF)) {

/* unexpected result from pselect(): die immediately */
motion_log(LOG_ERR, 1, "Unexpected pgwrite.c pselect() result=%d; PGSQL writer terminating", rc);
pgwcnt.quit = PGSQL_TERM_FORCE;
return; /* things are totally broken, give up */

} else if (errno == EBADF) {
/* invalid FD means DBMS session died; attempt recovery */
if ( pgwrite_init() < 0 ) {
/* unrecoverable error while awaiting session recovery; abort */
pgwcnt.quit = PGSQL_TERM_FORCE;
return;
} else {
/* all transactions in flight failed; don't wait for them to complete */
qelem_last = NULL;
}

} else if (rc>0) {
/* PGSQL operation completion: invite PGSQL client library to */
/* process any pending work back from server */
if (!PQconsumeInput(pgwcnt.conn)) {
/* did we lose the DBMS connection? */
if (PQstatus(pgwcnt.conn) != PGSQL_CONNECTION_OK) {
/* attempt to open new DBMS session */
if ( pgwrite_init() < 0 ) {
/* unrecoverable error while awaiting session recovery; abort */
pgwcnt.quit = PGSQL_TERM_FORCE;
return;
} else {
/* we lost all transactions in flight; don't expect them to clear */
qelem_last = NULL;
}
} else {
/* not sure what went wrong, keep trying */
motion_log(LOG_ERR, 0, "PQconsumeInput failed, pgwrite.c: %s.", PQerrorMessage(pgwcnt.conn));
}
} else {
/* check result of just-completed operation */

The (re-)connection snippet:

/**
*
* pgwrite_init: connect to PGSQL DBMS and PREPARE configured SQL statements
* used for initial connection and recovery of broken sessions
*
* on entry: * SIGUSR2 should be masked; we restore it on exit
* * pgwcnt.mutex NOT held, except startup, when queue always empty
*
* returns: -1 on unrecoverable error
* 0 on successful first open attempt
* +1 on successful open after failed sessions or open attempts
*
*/
int pgwrite_init()
{
int ret = 0; // return value
int retries = 0; // connection attempt count
int qeaband = 0; // local count abandoned queue elements
int i, j;
int so_val; // getsockopt() parms
socklen_t so_len = sizeof(so_val);

struct context *cnt0 = cnt[0]; // for easy reference

if (pgwcnt.conn) {
/* clean up failed connection attempt or broken session */
motion_log(LOG_ERR, 0, "PGSQL database '%s' connection failed: %s",
cnt0->conf.pgsql_db, PQerrorMessage(pgwcnt.conn));
pgwcnt.sessloss++; // increment lost session count
retries++;
PQfinish(pgwcnt.conn);
pgwcnt.conn = NULL;
}

/* PGSQL writer timestamps are UTC, not local time */
setenv("PGTZ", "UTC", 1);

pthread_sigmask(SIG_UNBLOCK, &usr2_smask, NULL);
/* repeat DBMS session open attempt until success or termination order */
/* NB: PGSQL enum value CONNECTION_OK is redefined as PGSQL_CONNECTION_OK in motion.h */
while (PQstatus(pgwcnt.conn)!=PGSQL_CONNECTION_OK && pgwcnt.quit<PGSQL_TERM_REQ) {
/*
* Open session to DBMS. Note:
* 1. ONE PGSQL connection is shared by all video threads;
* 2. thus, only MAIN thread PGSQL connection parameters are used.
* 3. Parameter "dbname" may be a PostgreSQL-native connect string.
* 4. Connect parms may be blank, null, or contain special
* characters (including quotes and backslashes).
* 5. Changes in PGSQL connection parameters via Web UI after the
* first PGSQL session open attempt are IGNORED.
* 6. Changes in SQL statement text via Web UI after the first
* successful session open are IGNORED.
*/
char pstring[6];
snprintf(pstring, sizeof(pstring), "%i", cnt0->conf.pgsql_port);
if ( !(pgwcnt.conn = PQconnectdbParams(
(const char *[]) {"dbname", "host",
"user", "password", "port", "sslmode",
"application_name", "connect_timeout", "keepalives",
"keepalives_idle", "keepalives_interval", "keepalives_count", NULL},
(const char *[]) {cnt0->conf.pgsql_db, cnt0->conf.pgsql_host,
cnt0->conf.pgsql_user, cnt0->conf.pgsql_password, pstring, "disable",
"motion", PGSQL_KEEPALIVE_QSECS, "1",
PGSQL_KEEPALIVE_QSECS, PGSQL_KEEPALIVE_QSECS, "1", NULL}, 0)) ) {
motion_log(LOG_CRIT, 0, "LOGIC ERROR: PQconnectdbParams returned NULL");
ret = -1;
}
/*
* PQconnectdbParams() promptly detects lack of connectivity, but the
* obvious recovery API, PQreset(), does not block after a failed
* connection attempt (at least on my system), causing an infinite
* loop. Thus, this less elegant recovery for a failed connection.
*/
if (PQstatus(pgwcnt.conn)!=PGSQL_CONNECTION_OK) {
/* connection attempt failed; first time? */
if (!retries) {
/* allow video threads to proceed , even if no DB connection */
if (pthread_mutex_unlock(&pgwcnt.mutex)) {
motion_log(LOG_ERR, 1, "PGSQL mutex unlock 2 failed, pgwrite.c; aborting");
ret = -1;
break;
}
motion_log(LOG_ERR, 0, "PGSQL attempting recovery of failed connect attempt: host '%s' database '%s' user '%s': %s",
cnt0->conf.pgsql_host, cnt0->conf.pgsql_db, cnt0->conf.pgsql_user, PQerrorMessage(pgwcnt.conn));
pgwcnt.sessloss++; // increment lost session count
}
PQfinish(pgwcnt.conn); // gotta do this even for failed connections
pgwcnt.conn = NULL;
retries ++;
/* if too many INSERTs queued while we retried, reduce queue */
qeaband += pgwrite_qflush(PGSQL_MAX_DISCONN_Q);
/* is PGSQL transport via Unix socket? */
if (!getsockopt(PQsocket(pgwcnt.conn), SOL_SOCKET, SO_DOMAIN, &so_val, &so_len) &&
so_val == PF_LOCAL ) {
/* yes, wait a while to retry */
sleep(PGSQL_KEEPALIVE_SECS);
} else {
/* no; we probably waited for connect_timeout, but just in case ... */
sleep(5);
}
}
}

/* we are here when a new DBMS session has opened or termination has been forced */
if (pgwcnt.conn && qeaband) {
/* report work abandoned prior to session recovery */
motion_log(LOG_ERR, 0, "PGSQL writer abandoned %d inserts pending link recovery", qeaband);
qeaband = 0;
}

/* unless forced termination, PREPARE all configured SQL statements */
if (pgwcnt.conn && pgwcnt.quit<PGSQL_TERM_REQ) {

/* set up file descriptor for PGSQL session */
pqfd = PQsocket(pgwcnt.conn);
maxfds = pqfd+1;
motion_log(LOG_INFO, 0, "PGSQL connected to host '%s' database '%s' as user '%s'",
PQhost(pgwcnt.conn), PQdb(pgwcnt.conn), PQuser(pgwcnt.conn));

In response to

Browse pgsql-interfaces by date

  From Date Subject
Next Message Anthony LaTorre 2016-05-20 19:50:21 psql close
Previous Message Marco Bambini 2016-04-25 07:58:48 Error: no connection to the server