Re: Orphan files filling root partition after crash

From: Sergey Fukanchik <fukanchik(at)gmail(dot)com>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Orphan files filling root partition after crash
Date: 2024-02-28 12:17:50
Message-ID: CAB+D6kVyqpFSgkVeS87ttg7rZ8gg+nk71cCUVeK1LSM-fJV6qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Dimitrios,
Do you have wal archiving enabled?
$PGDATA/pg_wal/ is a bit different from tablespaces (including
"default" one). It stores transaction journal.
Instructions are here:
https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-ARCHIVING-WAL
Some more info here:
https://www.postgresql.org/docs/current/wal-intro.html and here
https://www.postgresql.org/docs/current/wal-configuration.html
---
Sergey

On Wed, 28 Feb 2024 at 14:18, Dimitrios Apostolou <jimis(at)gmx(dot)net> wrote:
>
> Hello list,
>
> yesterday I was doing:
>
> ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;
>
> The table is almost a billion rows long but lies in its own TABLESPACE
> that has plenty of space. But apparently the ALTER TABLE command is
> writing a lot to the default tablespace (not the temp_tablespace, that is
> already moved to a different partition).
>
> That quickly filled up the 50GB free space in my root partition:
>
>
> 20:18:04.222 UTC [94144] PANIC: could not write to file "pg_wal/xlogtemp.94144": No space left on device
> [...]
> 20:19:11.578 UTC [94140] LOG: WAL writer process (PID 94144) was terminated by signal 6: Aborted
> 20:19:11.578 UTC [94140] LOG: terminating any other active server processes
>
>
> After postgresql crashed and restarted, the disk space in the root
> partition was still not freed! I believe this is because of "orphaned
> files" as discussed in mailing list thread [1].
>
> [1] https://www.postgresql.org/message-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG%3DoNJ%2B%3DsxLQew%40mail.gmail.com
>
> I ended up doing some risky actions to remediate the problem: Find the
> filenames that have no identically named "oid" in pg_class, and delete
> (move to backup) the biggest ones while the database is stopped.
> Fortunately the database started up fine after that!
>
> So what is the moral of the story? How to guard against this?
>
> Why did the database write so much to the default tablespace, even when
> both the table and the temp tablespace are elsewhere? Also should one
> always keep the default tablespace away from the wal partition? (I guess
> it would have helped to avoid the crash, but the ALTER TABLE command would
> have still run out of space, so I'm not sure if the orphan files would
> have been avoided)?
>
> Needless to say, I would have hoped the database cleaned-up after itself
> even after an uncontrolled crash, or that it provided some tools for the
> job. (I tried VACUUM FULL on the table, but the orphaned files did not go
> away).
>
> My postgresql version is 16.2 installed on Ubuntu.
>
> Thank you,
> Dimitris
>
>
>

--
Sergey

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jason Long 2024-02-28 12:34:30 PostgreSQL Read-only mode usage
Previous Message Dimitrios Apostolou 2024-02-28 11:16:43 Orphan files filling root partition after crash