Re: Unexpectedly high disk space usage

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Lists <lists(at)benjamindsmith(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Postgresql Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Unexpectedly high disk space usage
Date: 2012-11-07 17:01:56
Message-ID: CAMkU=1yJijYk0viSqpBNmVJNAY8eTCJt-ZBR5dDpPrKYvWCXvw@mail.gmail.com
Lists: pgsql-general

On Tue, Nov 6, 2012 at 11:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> On Tue, Nov 6, 2012 at 10:49 AM, Lists <lists(at)benjamindsmith(dot)com> wrote:
>>> I followed your example; the result is at the bottom. Based on this it
>>> would seem that 3-4 databases are the culprit. How could I get more
>>> depth/detail on what specifically is the problem?
>
>> If you have installed the contrib modules (oid2name specifically), you
>> can use that to get the name of the bloated database:
>> oid2name | fgrep 607471
>
> Or, if you didn't install contrib, try
>
> select datname from pg_database where oid = 607471

Thanks, I knew there had to be a more direct way to do that.
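
For reference, the two lookups quoted above can be combined into a small
shell sketch (my wrapper, not from the thread; it assumes superuser access
to a local cluster, and 607471 is the OID under discussion):

```shell
#!/bin/sh
# Resolve a database OID to its name, two ways (sketch).
oid=607471

# Route 1: contrib's oid2name, if installed (prints oid/name pairs).
command -v oid2name >/dev/null && oid2name | grep "$oid"

# Route 2: a plain catalog query, no contrib modules needed.
lookup="SELECT datname FROM pg_database WHERE oid = ${oid}"
command -v psql >/dev/null && psql -At -c "$lookup"
```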

>
>> If the name of the database doesn't give you any insight, then look
>> for large files in the directory base/607471 whose names all start
>> with the same digits, and use oid2name to get the names of the
>> relations for those files.
>
>> oid2name -d <name of database> -o <base name of large files>
>
> For this you can try
>
> select relname from pg_class where relfilenode = <whatever>
>
> Or let the database do the work:
>
> select relname, pg_relation_size(oid) from pg_class order by 2 desc;

Ben described using something like this method originally and not
finding the space, so I wanted to work backwards from certain
knowledge of where the OS says the space is being used.
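
One aside (my addition, not from the thread): pg_relation_size counts only
the relation's main fork, so a variant using pg_total_relation_size, which
folds in indexes and TOAST data, can surface space the simpler query
misses. A sketch, assuming a local cluster; "mydb" is a placeholder
database name:

```shell
#!/bin/sh
# Sketch: largest relations in one database, counting indexes and TOAST
# via pg_total_relation_size (pg_relation_size alone omits them).
total_q="SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
FROM pg_class ORDER BY pg_total_relation_size(oid) DESC LIMIT 20"

command -v psql >/dev/null && psql -At -d mydb -c "$total_q"
```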

But now I think maybe his scripts to aggregate table sizes over all
databases (and also his script to load pg_dumps of those databases
into a new cluster) are accidentally omitting some databases--the
largest ones.

Is there a simple query for a super-user to get a list of all relation
sizes over all databases cluster-wide?

If "\l+" can get the size of databases other than the one currently
connected to, maybe there is a way to extend that to tables in those
other databases.
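
Lacking that, one workaround (my sketch, not something from the thread) is
to drive the per-database query from the shell, since a single SQL session
is confined to one database. Assumes superuser access to a local cluster:

```shell
#!/bin/sh
# Sketch: per-relation sizes across every connectable database, by
# looping over pg_database from the shell and reconnecting each time.
size_q="SELECT current_database(), relname,
pg_size_pretty(pg_total_relation_size(oid))
FROM pg_class ORDER BY pg_total_relation_size(oid) DESC LIMIT 10"

for db in $(psql -At -c "SELECT datname FROM pg_database WHERE datallowconn"); do
    psql -At -d "$db" -c "$size_q"
done
```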

It would at least be nice to be able to get the sizes of all
databases. Since '\l+' doesn't sort by size and I don't know how to
make it do so, I pulled the query from psql source code and modified
it:

SELECT d.datname AS Name,
       pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
       CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
            THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
            ELSE 'No Access'
       END AS Size
FROM pg_catalog.pg_database d
ORDER BY CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
              THEN pg_catalog.pg_database_size(d.datname)
              ELSE NULL
         END DESC NULLS FIRST;

(And I discovered a long-forgotten, unused database I had sitting around
taking up space.)

Ben, did you ever figure out where the space was going?

Cheers,

Jeff
