Hung Postgres Processes

From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Hung Postgres Processes
Date: 2002-11-23 04:43:39
Message-ID: web-1838340@davinci.ethosmedia.com
Lists: pgsql-general

Folks,

Just had a very unpleasant experience for the first time.
I had an overnight series of data transformations running ... usually,
they run from 12:30am to 1:20 am ... and the process hung. Badly.
Requiring a "fast" system shutdown and restoring the database from
backup.

Here are the details:
Platform: Hand-built dual Athlon MP/Molex RAID 5 (UW SCSI) system.
PostgreSQL 7.2.3
SuSE Linux 7.3

Data imports started normally at 12:00am and apparently completed.
Data transformation process (16-35 UPDATEs and INSERTs affecting a
combined 1,300,000 records) started at about 12:30am after the import
ended. The data transformations are a series of functions called by a
Perl script through cron as the root user.

Sometime during the transformation process, a statement hung. The
procedure continued running for at least 2 hours, at which point
another script, set up to detect such problems, ran a "pg_ctl -m fast
stop". Instead of stopping, the postgresql server hung.
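
For readers who want to picture the kind of watchdog described above, here
is a minimal Python sketch. It is not the script from this message, which
is not shown. The two-hour limit comes from the text; the data directory
path, the 60-second wait, and all function names are hypothetical. The one
idea it adds is a timeout around pg_ctl itself, so the watchdog can report
that the shutdown did not return instead of waiting on it.

```python
import subprocess
import time

# Hypothetical values: the message gives neither the exact limit nor the data directory.
MAX_RUNTIME_SECONDS = 2 * 60 * 60
DATA_DIR = "/var/lib/pgsql/data"


def is_overdue(started_at, now, limit=MAX_RUNTIME_SECONDS):
    """Return True once the job has run for at least `limit` seconds."""
    return (now - started_at) >= limit


def build_stop_command(data_dir=DATA_DIR, mode="fast"):
    """Build the pg_ctl invocation as an argument list (no shell involved)."""
    return ["pg_ctl", "stop", "-D", data_dir, "-m", mode]


def stop_if_overdue(started_at, timeout=60):
    """Run pg_ctl if the job is overdue; give up waiting after `timeout` seconds."""
    if not is_overdue(started_at, time.time()):
        return "not overdue"
    try:
        result = subprocess.run(build_stop_command(), timeout=timeout)
    except subprocess.TimeoutExpired:
        # pg_ctl itself did not come back; someone needs to look at the machine.
        return "pg_ctl did not return"
    return "stopped" if result.returncode == 0 else "pg_ctl failed"
```

A cron job could call stop_if_overdue() with the start time recorded by the
transformation script and send an alert on any result other than "stopped"
or "not overdue".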

When I got to the machine in the morning, there were 3 frozen processes:
one query, one checkpoint process, and the postmaster.
SIGHUP and SIGTERM were ignored by these; SIGKILL was able to kill
the postmaster process, but the two other processes went to "D" status
and were untouchable.
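
As an aside, "D" is the Linux state for uninterruptible sleep, usually a
process waiting inside the kernel on I/O; signals, including SIGKILL, are
not delivered until that wait ends. The sketch below shows one way to list
such processes by reading /proc/<pid>/stat. It assumes a Linux /proc
layout, and the function names are made up for illustration.

```python
import os


def parse_state(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so take the first field after the LAST ")".
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def uninterruptible_pids(proc_root="/proc"):
    """Return the PIDs under proc_root that are currently in state D."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "meminfo"
        stat_path = os.path.join(proc_root, name, "stat")
        try:
            with open(stat_path, errors="replace") as f:
                if parse_state(f.read()) == "D":
                    pids.append(int(name))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)
```

Running uninterruptible_pids() from a monitoring script gives an early sign
that processes are stuck in the kernel, which points at the disk, driver,
or filesystem rather than at PostgreSQL itself.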

I was forced to fast-shutdown the server. While Postgres did restart OK
after restarting the machine, I did not trust the data integrity, and
restored from backup.

Has anyone else encountered this kind of situation? Is there a way to
prevent it, or a less drastic way to resolve it? What are likely
causes?

-Josh Berkus
