Quick Links

Re: Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption?

From:	Arjun Ranade <ranade(at)nodalexchange(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption?
Date:	2017-11-02 14:44:37
Message-ID:	CANrrCRxzFYtJLmZSkpUVFh_2ZUcGnsO2nh3Cg78t2OLQEwfzGg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-admin

Hi Tom,

I am positive that there are no prepared transactions, certainly none that
would persist past a DB shutdown and the single user mode work I did.

However, your point about the replication slots seems to be valid. Also, a
leaking replication slot seems more likely since it was a replication slot
issue with pglogical that caused the DB to crash before it was restarted a
few months ago.

Below are the items I got from querying the replication view. It seems
that you are correct,
https://www.postgresql.org/docs/9.4/static/catalog-pg-replication-slots.html
says that 'catalog_xmin' is "The oldest transaction affecting the system
catalogs that this slot needs the database to retain. VACUUM cannot remove
catalog tuples deleted by any later transaction."

On Wed, Nov 1, 2017 at 10:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Arjun Ranade <ranade(at)nodalexchange(dot)com> writes:
> > So we had a maintenance window scheduled today and I restarted the DB.
> > When it came back up, it reported the same toast error when vacuum'ing
> > pg_statistic. It also reported the "oldest xmin is far in the past"
> error.
>
> > I even brought the DB into single user mode and tried to manually vacuum
> > pg_statistic but got the same error. I also tried to TRUNCATE
> pg_statistic
> > in single user mode which failed (see below).
>
> > At this point, I am not sure what to make of the issue. The DB is still
> > complaining about old XMIN even when there are no users connected to it
> > after a fresh restart:
>
> Clearly, something is holding back the xmin horizon, and if the behavior
> persisted past a DB restart then it surely isn't a regular transaction.
>
> Are you *really* sure there are no prepared transactions?
>
> It now occurs to me that there's one other mechanism that could hold back
> the xmin horizon, and that is replication slots. Looking into the
> pg_replication_slots view should show you what's up there. I'm not
> an expert on replication slots, but I do know that they were new as of
> 9.4. I am thinking maybe your replication setup has "leaked" a slot
> that's not being used but is still preventing global xmin from advancing.
>
> regards, tom lane
>

In response to

Re: Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption? at 2017-11-02 02:23:37 from Tom Lane

Responses

Re: Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption? at 2017-11-02 14:52:23 from Arjun Ranade

Browse pgsql-admin by date

	From	Date	Subject
Next Message	Arjun Ranade	2017-11-02 14:52:23	Re: Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption?
Previous Message	Stephen Frost	2017-11-02 11:40:29	Re: Bad recovery: no pg_xlog/RECOVERYXLOG