checkpointer and other server processes crashing

From: Joe Abbate <jma(at)freedomcircle(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: checkpointer and other server processes crashing
Date: 2021-02-15 21:15:44
Message-ID: e426330d-bdbb-98d8-5e74-09998258503d@freedomcircle.com
Lists: pgsql-general

Hello,

We've been experiencing PG server process crashes about every other week
on a mostly read-only website (except for a single insert/update on page
access). Typical log entries look like this:

LOG: checkpointer process (PID 11200) was terminated by signal 9: Killed
LOG: terminating any other active server processes

Other than the checkpointer, the server process that was terminated was
either doing a "BEGIN READ WRITE", a "COMMIT" or executing a specific
SELECT.

The database is always recovered within a second and everything else
appears to resume normally. We're not certain about what triggers this,
but in several instances the web logs show an external bot issuing
multiple HEAD requests on what is logically a single page. The web
server logs show "broken pipe" and EOF errors, and the PG logs sometimes
show a number of "incomplete startup packet" messages before the
termination message.

This started roughly when the site was migrated to Go, whose web
"processes" run as "goroutines", scheduled by Go's runtime (previously
the site used Python and Gunicorn to serve the pages, which probably
isolated the PG processes from a barrage of nearly simultaneous requests).

As I understand it, the PG server processes doing a SELECT are spawned
as children of the Go process, so presumably if a "goroutine" dies, the
associated PG process would die too, but I'm not sure I grasp why that
would cause a recovery/restart. I also don't understand where the
checkpointer process fits in the picture (and what would cause it to die).

For the record, this is on PG 11.9 running on Debian.

TIA,

Joe
