Why corruption memory in one database affects all the cluster?

From: Ru Devel <rudevel(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Why corruption memory in one database affects all the cluster?
Date: 2014-07-13 19:07:48
Message-ID: CAO9=PZtzKE73-FUmrys2WT2Azq+FkBu0Db5zAOPCug9D-5YJZA@mail.gmail.com
Lists: pgsql-general

Hello,

I have PostgreSQL 9.3.4 running on Linux, with ~20 databases in the cluster.

The whole cluster was migrated from 9.2 using pg_upgradecluster.

After the migration, autovacuum started to fail in one database, crashing the
entire cluster:

2014-07-13 21:16:24 MSK [5665]: [1-1] db=,user= PANIC: corrupted item pointer: offset = 5292, size = 24
2014-07-13 21:16:24 MSK [29131]: [417-1] db=,user= LOG: server process (PID 5665) was terminated by signal 6: Aborted
2014-07-13 21:16:24 MSK [29131]: [418-1] db=,user= DETAIL: Failed process was running: autovacuum: VACUUM public.postfix_stat0 (to prevent wraparound)
2014-07-13 21:16:24 MSK [29131]: [419-1] db=,user= LOG: terminating any other active server processes
2014-07-13 21:16:24 MSK [29597]: [1-1] db=,user= WARNING: terminating connection because of crash of another server process
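
To see which backend and which statement are behind each restart, the log
entries above can be matched up by PID. The following is only a minimal sketch,
not a PostgreSQL tool: it assumes the line layout seen in this excerpt
(timestamp, time zone, [pid], [line-number], db=,user=, level, message) and
that each entry sits on a single line, as in the server's own log file. The
names LOG_LINE, parse_line and summarize_crashes are made up for illustration.

```python
import re

# Assumed layout, inferred from the excerpt only:
# "<date> <time> <tz> [<pid>]: [<n>-<m>] db=<db>,user=<user> <LEVEL>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<tz>\S+) "
    r"\[(?P<pid>\d+)\]: \[\d+-\d+\] db=(?P<db>[^,]*),user=(?P<user>\S*) "
    r"(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)
TERMINATED = re.compile(r"server process \(PID (\d+)\) was terminated by signal (\d+)")
RUNNING = re.compile(r"Failed process was running: (.*)")


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize_crashes(lines):
    """Collect, per crashed PID, the PANIC text, the signal and the statement."""
    crashes = {}
    last_pid = None  # PID named by the most recent "was terminated" entry
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        msg = entry["msg"]
        if entry["level"] == "PANIC":
            # The PANIC is logged by the failing backend itself.
            crashes.setdefault(entry["pid"], {})["panic"] = msg
        elif entry["level"] == "LOG":
            found = TERMINATED.search(msg)
            if found:
                last_pid = found.group(1)
                crashes.setdefault(last_pid, {})["signal"] = int(found.group(2))
        elif entry["level"] == "DETAIL" and last_pid is not None:
            found = RUNNING.search(msg)
            if found:
                crashes[last_pid]["statement"] = found.group(1)
    return crashes
```

Feeding it the lines of the log file (for example, summarize_crashes(open(path))
with path pointing at the server log) gives one record per crashed PID, which
makes it easy to check whether every restart comes from the same table.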

I have two questions:

1) Why does a problem in just one database, in just one piece of corrupted
memory, become a problem for the entire server? The affected database is not
important, but the corruption inside it causes frequent cluster-wide restarts,
so the whole server suffers from this local problem. Why must the postmaster
restart all backends when only one of them dies?

2) What is the best current way to analyze and fix such an issue?

Thank you.
