From: | Arjun Ranade <ranade(at)nodalexchange(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-admin(at)postgresql(dot)org |
Subject: | Re: Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption? |
Date: | 2017-11-02 14:44:37 |
Message-ID: | CANrrCRxzFYtJLmZSkpUVFh_2ZUcGnsO2nh3Cg78t2OLQEwfzGg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Hi Tom,
I am positive that there are no prepared transactions, certainly none that
would persist past a DB shutdown and the single user mode work I did.
However, your point about the replication slots seems to be valid. Also, a
leaking replication slot seems more likely since it was a replication slot
issue with pglogical that caused the DB to crash before it was restarted a
few months ago.
Below are the items I got from querying the replication view. It seems
that you are correct,
https://www.postgresql.org/docs/9.4/static/catalog-pg-replication-slots.html
says that 'catalog_xmin' is "The oldest transaction affecting the system
catalogs that this slot needs the database to retain. VACUUM cannot remove
catalog tuples deleted by any later transaction."
db1=# select * from pg_replication_slots;
slot_name | plugin |
slot_type | datoid | database | active | xmin | catalog_xmin |
restart_lsn
-----------------------------------------------+------------------+-----------+--------+----------+--------+-----------+--------------+--------------
repmgr_slot_3 | |
physical | | | t | 252861646 | |
157/E1755E90
repmgr_slot_4 | |
physical | | | t | 324101780 | *162630233* |
157/E1755E90
repmgr_slot_2 | |
physical | | | t | | |
157/E1755E90
pgl_tcs_dadfbc8d5_tcspg01_large_tab_ch356c67c | pglogical_output |
logical | 21705 | prod | t | | 324101794 |
157/E16D62D8
pgl_tcs_dadfbc8d5_tcspg01_all_tables | pglogical_output |
logical | 21705 | prod | t | | 324101794 |
157/E16D62D8
(5 rows)
db1=# SELECT datname, datfrozenxid FROM pg_database;
datname | datfrozenxid
-----------+--------------
template1 | *162630233*
template0 | *162630233*
repmgr_db | 138378717
postgres | 138378717
rundeck | 138378717
db1 | *162630233*
(6 rows)
On Wed, Nov 1, 2017 at 10:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Arjun Ranade <ranade(at)nodalexchange(dot)com> writes:
> > So we had a maintenance window scheduled today and I restarted the DB.
> > When it came back up, it reported the same toast error when vacuum'ing
> > pg_statistic. It also reported the "oldest xmin is far in the past"
> error.
>
> > I even brought the DB into single user mode and tried to manually vacuum
> > pg_statistic but got the same error. I also tried to TRUNCATE
> pg_statistic
> > in single user mode which failed (see below).
>
> > At this point, I am not sure what to make of the issue. The DB is still
> > complaining about old XMIN even when there are no users connected to it
> > after a fresh restart:
>
> Clearly, something is holding back the xmin horizon, and if the behavior
> persisted past a DB restart then it surely isn't a regular transaction.
>
> Are you *really* sure there are no prepared transactions?
>
> It now occurs to me that there's one other mechanism that could hold back
> the xmin horizon, and that is replication slots. Looking into the
> pg_replication_slots view should show you what's up there. I'm not
> an expert on replication slots, but I do know that they were new as of
> 9.4. I am thinking maybe your replication setup has "leaked" a slot
> that's not being used but is still preventing global xmin from advancing.
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | Arjun Ranade | 2017-11-02 14:52:23 | Re: Missing Chunk Error when doing a VACUUM FULL operation - DB Corruption? |
Previous Message | Stephen Frost | 2017-11-02 11:40:29 | Re: Bad recovery: no pg_xlog/RECOVERYXLOG |