Re: Orphaned files in base/[oid]

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Orphaned files in base/[oid]
Date: 2017-08-14 18:18:50
Message-ID: CAN-RpxAshr3Nmsht7Bva6cgBXJdFg6amXeBetZ9L4hVq=-y+vQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 14, 2017 at 6:33 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2017-08-14 14:12:22 +0200, Chris Travers wrote:
> > Problem:
> > The system this came up on is PostgreSQL 9.6.3 and has had repeated
> trouble
> > with disk space. Querying pg_database_size, as well as du on the
> > subdirectory of base/ show total usage to be around 3.8TB. Summing up
> the
> > size of the relations in pg_class though shows around 2.1TB.
> >
> > Initial troubleshooting found around 150 GB of space in pg_temp which had
> > never been cleared and was at least several days old. Restarting the
> > server cleared these up.
> >
> > Poking around the base/[oid] directory, I found a large number of files
> > which did not correspond with a pg_class entry. One of the apparent
> > relations was nearly 1TB in size.
> >
> > What I think happened:
> > I think various pg_temp/* and orphaned relation files (In base/[oid])
> were
> > created when PostgreSQL crashed due to running out of space in various
> > operations including creating materialised views.
> >
> > So my question is if there is a way we can safely clean these up on
> server
> > restart? If not does it make sense to try to create a utility that can
> > connect to PostgreSQL, seek out valid files, and delete the rest?
>
> I think the fix here is to call RemovePgTempFiles() during
> crash-restarts, instead of just full starts. The previously stated need
> to be able to inspect temp files after a crash can be less impactfully
> fulfilled with restart_after_crash = false.
>
> But that only clears temp files right?

I am less concerned about the temp files because a restart clears them.

The bigger issue I see are with the orphaned base files. It looks like
files in base/[oid] don't get cleaned up either if I read my output
correctly and it would explain why we saw 1.7TB of discrepancy between
relations and database size. Safety-wise it seems like the best way out of
that is a dump/restore but doing that with a 2.1TB database is annoying.

> Greetings,
>
> Andres Freund
>

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-08-14 18:32:11 Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Previous Message David Fetter 2017-08-14 18:17:05 Re: shared memory based stat collector (was: Sharing record typmods between backends)