Re: Could not open file "pg_subtrans/01EB"

From: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Could not open file "pg_subtrans/01EB"
Date: 2018-08-27 12:05:55
Message-ID: CA+t6e1=oxPx5J7PiCQpeWCAqWL7xCekJZNRipBMB5Ljthqmp+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

So As Tom Lane suggested I upgraded the database to the 9.2.24 version. I
vacuumed (vacuum verbose) all the databases and for one of the databases
(that has a duplicated record in pg_database) I got the next meesages for
two of the obejcts :
2018-08-27 16:57:59 +08 db4 22026 WARNING: relation "a" page 1560 is
uninitialized --- fixing
2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1561 is
uninitialized --- fixing
2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1562 is
uninitialized --- fixing
2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1563 is
uninitialized --- fixing
2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1564 is
uninitialized --- fixing
2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1565 is
uninitialized --- fixing
2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1566 is
uninitialized --- fixing
2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1567 is
uninitialized --- fixing
2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1568 is
uninitialized --- fixing
2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1569 is
uninitialized --- fixing
2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1031 is
uninitialized --- fixing
2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1032 is
uninitialized --- fixing
2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1033 is
uninitialized --- fixing
2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1034 is
uninitialized --- fixing
2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1035 is
uninitialized --- fixing
2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1036 is
uninitialized --- fixing
2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1037 is
uninitialized --- fixing

I tried re indexing the problematic databases :
db1>reindex system db1
2018-08-27 17:09:44 +08 db1 23218 ERROR: could not access status of
transaction 32212695
2018-08-27 17:09:44 +08 db1 23218 DETAIL: Could not open file
"pg_subtrans/01EB": No such file or directory.

I tried to delete the duplicated rows in pg_database but the delete doesnt
delete anything :

select ctid,xmin,xmax,datname from pg_database order by datname;
ctid | xmin | xmax | datname
--------+-------+----------+----------------
(0,21) | 2351 | 0 | db1
(0,4) | 1809 | 21093518 | db1
(0,3) | 1806 | 0 | postgres
(0,24) | 12594 | 0 | db2
(0,2) | 1803 | 0 | template0
(0,1) | 1802 | 0 | template1
(0,22) | 3590 | 0 | db3
(0,23) | 3592 | 0 | db4
(0,5) | 1811 | 21077312 | db4
(9 rows)

As you can see db1 and db4 have duplicated records. I tried to delete them
:
delete from pg_database where ctid='(0,4)';
DELETE 0

but the record does exist :
select ctid,datname from pg_database where ctid='(0,4)';
ctid | datname
-------+---------
(0,4) | db1
(1 row)

I set the zero_damaged_pages to on but it didnt help either.

How can I continue ?

‫בתאריך יום א׳, 26 באוג׳ 2018 ב-19:42 מאת ‪Mariel Cherkassky‬‏ <‪
mariel(dot)cherkassky(at)gmail(dot)com‬‏>:‬

> 1.I'm not really sure if they have one but its not my business.. I'm just
> trying to help those guys with an application that is based on postgres...
> 2.Yeah I realized that it is an option, but do you really think that it
> should be the first solution ? I tried to search for bugs that seems
> identical to my case but I didnt find any.
> 3.I set the vacuum_freeze_table_age to 0 and vacuumed all the duplicated
> databases but it seems that it didn't solve their problem.
> 4.I tried to delete the records according to the value in xmin/xmax and
> the result was that after the delete postgresql didnt recognize that I have
> those databases. I queried the ctid and the xmin/xman :
> postgres=# select xmin,xmax,datname,ctid from pg_Database;
> xmin | xmax | datname | ctid
> -------+----------+----------------+--------
> 1802 | 0 | template1 | (0,1)
> 1803 | 0 | template0 | (0,2)
> 1806 | 0 | postgres | (0,3)
> 1809 | 21093518 | db1 | (0,4)
> 1811 | 21077312 | db2 | (0,5)
> 2351 | 0 | db1 | (0,21)
> 3590 | 0 | db3 | (0,22)
> 3592 | 0 | db2 | (0,23)
> 12594 | 0 | db4 | (0,24)
>
> I tried to delete db1 with xmax 0 and db2 with xmax 2 (the opposite of
> what you suggested with the ctid). Now, I thought that if the xmax is set
> to 0 it means that I "didnt have" any update / delete operations and
> therefore I should delete those databases. Does it matters which one to
> delete ? I read about the xmin/xmax/ctid columns but I thought that what I
> did was legit.
>
> Moreover I realized that I have duplicated rows in more system tables so i
> don`t really like this solution.
>
> My next attempt will be to upgrade to 9.2.24.
>
> Thanks Tom ! ,
>
> Regards Mariel.
>
> ‫בתאריך יום א׳, 26 באוג׳ 2018 ב-18:51 מאת ‪Tom Lane‬‏ <‪tgl(at)sss(dot)pgh(dot)pa(dot)us
> ‬‏>:‬
>
>> Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com> writes:
>> > I'm trying to investigate a database of one of our clients. The database
>> > version is 9.2.5.
>>
>> 1. Fire their DBA for dereliction of duty.
>>
>> 2. Update to the last available release of 9.2.x (9.2.24, looks like).
>>
>> 3. Vacuum everything and see if it gets better.
>>
>> Vacuuming may or may not fix the observed data corruption problems, but
>> it's silly to ignore the fact that they're missing four years worth
>> of bug fixes in that branch. In particular I'm noticing the first
>> entry in the change notes for 9.2.6, which recommends "vacuuming all
>> tables in all databases while having vacuum_freeze_table_age set to zero":
>> https://www.postgresql.org/docs/9.2/static/release-9-2-6.html
>> That problem as-described seems to involve rows disappearing, rather than
>> getting duplicated, but I wouldn't rule out that it could also have
>> symptoms like this.
>>
>> If that doesn't fix things, you could then try:
>>
>> 4. Manually eliminate duplicate rows by identifying which one of each pair
>> seems older and deleting it with DELETE ... WHERE ctid = '(x,y)'. Then
>> reindex to confirm no duplicates remain.
>>
>> But you still need step 2, to reduce the odds the problem will recur.
>>
>> regards, tom lane
>>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Mark Williams 2018-08-27 18:22:44 FW: Setting up SSL for postgre
Previous Message Julien Riou 2018-08-27 08:31:00 Re: ERROR: no known snapshots