Re: filesystem full during vacuum - space recovery issues

From: Thomas Simpson <ts(at)talentstack(dot)to>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: filesystem full during vacuum - space recovery issues
Date: 2024-07-18 18:59:35
Message-ID: d4cdd945-a48b-41e9-8c25-1b198d86b91e@talentstack.to
Lists: pgsql-admin pgsql-hackers


On 18-Jul-2024 11:19, Paul Smith wrote:
> On 15/07/2024 19:47, Thomas Simpson wrote:
>>
>> My problem now is how do I recover this space so that my free
>> space returns to where it should be?
>>
>> I tried some scripts to map the data files to relations, but this
>> didn't work: removing some files led to startup failure even though
>> they appeared unrelated to anything in the database, and I had to
>> put them back before the server would start.
>>
> I don't know what you tried to do.
>
> What would normally happen when a VACUUM FULL fails by filling the
> disk (crashing the server) is that you are left with loads of data
> files containing the partially rebuilt table. Nothing 'internal' to
> PostgreSQL will point to those files, since the internal pointers
> only switch over to the new table atomically on commit, so you
> should be able to delete them.
>
> You can usually find these relatively easily by looking in the
> relevant tablespace directory for the base filename of a huge new
> table (lots and lots of files with the same base name; e.g. looking
> for files called *.1000 will find the base filenames of relations
> over about 1TB, since each segment file is 1GB) and then checking
> whether pg_filenode_relation() fails to turn the filenode into a
> relation. If it does, the files are not currently in use by any
> relation, and you should be able to just delete them all.
>
> Is this what you tried, or did your 'script to map data files to
> relations' do something else? You were a bit ambiguous about that part
> of things.
>
[BTW, this server is on v9.6, which I know is old, but it's stuck
there.]

Yes, I was querying relfilenode from pg_class to get the file names
(integers), then comparing against a directory listing; files that did
not match any relfilenode were my candidates for removal.

I moved these elsewhere (i.e. did not delete them, just moved them out
of the way so I could put them back in case of trouble).

Without these apparently unrelated files, the database did not start,
complaining that they were missing, so I had to put them back.  This
was despite finding no reference to their file numbers in pg_class.
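
In hindsight, I suspect (an assumption on my part, not something I
verified) that the catch is mapped relations: pg_class itself, for
example, stores relfilenode = 0 in pg_class, with the real file number
held in the relation mapper, so a pg_class-only comparison wrongly
flags such files as orphans. The pg_filenode_relation() check Paul
describes does resolve those as well, e.g.:

    -- Map a file number back to a relation; a tablespace of 0 means
    -- the database's default tablespace. 12345 is just a placeholder
    -- file number. A non-NULL result means the file is in use.
    SELECT pg_filenode_relation(0, 12345);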

At that point I gave up since I cannot afford to make the problem worse!
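
If I do work up the nerve to try again, it looks like the whole
cross-check can be run from SQL in one go, something like this
(untested on my side; assumes superuser access and that everything is
in the database's default tablespace):

    -- List files in this database's directory whose file number does
    -- not resolve to any relation, i.e. candidate leftovers from the
    -- failed VACUUM FULL. Taking the leading digits handles the
    -- 12345.1 / 12345_fsm / 12345_vm style suffixes.
    SELECT f AS candidate_file
    FROM pg_ls_dir('base/' || (SELECT oid::text FROM pg_database
                               WHERE datname = current_database()))
         AS t(f)
    WHERE f ~ '^[0-9]+'
      AND pg_filenode_relation(0,
            substring(f from '^[0-9]+')::oid) IS NULL;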

I know I'm stuck with the slow rebuild at this point.  However, I
doubt I am the only person in the world who needs to dump and reload a
large database.  My thought is that this is a weak point for
PostgreSQL, so it makes sense to consider ways to improve the
dump/reload process, especially as it's the last-resort upgrade path
recommended in the upgrade guide and the general fail-safe route out
of trouble.

Thanks

Tom

> Paul