From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Adrian Moisey <adrian(at)careerjunction(dot)co(dot)za>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: db size
Date: 2008-04-14 08:37:54
Message-ID: 480317E2.1050900@postnewspapers.com.au
Lists: pgsql-performance
Adrian Moisey wrote:
> Hi
>
> We currently have a 16CPU 32GB box running postgres 8.2.
>
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.
>
> But the database is 110GB in size on the disk. Why the big difference
> in size? Does this have anything to do with performance?
Reasons:
You're using a compact format designed to limit size and provide fast
dump/restore. The database, by contrast, is designed for fast access.
The database can contain "dead space" (old versions of updated and deleted
rows) that hasn't been reclaimed by a VACUUM. A plain VACUUM makes that
space available for reuse; VACUUM FULL actually shrinks the files and
returns the space to the operating system. Dead space can really add up,
but it's the price of fast updates, inserts and deletes.
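For example, VACUUM's verbose output shows how much dead space each table
is carrying (mytable is a made-up name; the exact output varies by
version):
  VACUUM VERBOSE mytable;
  -- the INFO lines report how many dead row versions were found/removed
  VACUUM FULL VERBOSE mytable;
  -- compacts the table and truncates the file, giving space back to the OS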
Your indexes take up disk space in the database, but they are not dumped
and so take up no space in the dump file. Indexes can get very large,
especially if you have lots of multi-column indexes.
I'm told that under certain loads indexes can grow full of mostly empty
pages, and a REINDEX every now and then can be useful to shrink them -
see "\h reindex" in psql. That won't affect your dump sizes as indexes
aren't dumped, but will affect the database size.
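For instance (mytable and dbname are placeholders, and note that REINDEX
blocks writes to the table while it runs, so pick a quiet time):
  REINDEX TABLE mytable;    -- rebuild all indexes on one table
  REINDEX DATABASE dbname;  -- rebuild indexes across the whole database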
You can examine index (and relation) sizes using a query like:
select * from pg_class order by relpages desc;
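On 8.1 and later you can also ask for human-readable sizes. A sketch
(untested here), restricted to just tables and indexes:
  select relname, relkind, pg_size_pretty(pg_relation_size(oid)) as size
  from pg_class
  where relkind in ('r', 'i')
  order by relpages desc
  limit 20;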
Data in the database is either not compressed or, for larger fields, is
compressed (via TOAST) with an algorithm that's very fast but doesn't
achieve high compression ratios. By contrast, the dumps are quite
efficiently compressed.
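If you're curious, you can control how an individual column is toasted;
purely illustrative, with made-up table and column names:
  ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTENDED;
  -- compress and store out of line as needed (the default for text)
  ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;
  -- store out of line but skip compression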
One of my database clusters is 571MB on disk at the moment, just after
being dropped, recreated, and populated from another data source. The
repopulation process is quite complex. I found that running VACUUM FULL
followed by REINDEX DATABASE dbname knocked 50MB off the database size,
pushing it down to 521MB. That's on a basically brand new DB. Note,
however, that 130MB of that space is in pg_xlog, and much of it will be
wasted as the DB has been under very light load but uses large xlogs
because it needs to perform well under huge load spikes. The size of the
`base' directory (the "real data", indexes, etc) is only 392MB.
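You can get much the same numbers from SQL without poking around the data
directory; a quick check (dbname is a placeholder, and pg_database_size
counts the database's own files, not pg_xlog):
  select pg_size_pretty(pg_database_size('dbname'));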
If I dump that database using the same options you dumped yours with, I
end up with a hilariously small 29MB dump file, less than 10% of the size
of the main DB. The difference will be due entirely to compression, to the
more compact storage layout in the dump files, and to the lack of index
data in the dumps. The database has quite a few indexes, some of which are
multi-column indexes on tables with large numbers of tuples, so they bloat
the "live" version a lot.
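For reference, that was the same invocation you used (dbname is a
placeholder):
  pg_dump -E UTF8 -F c -b -f dbname.dump dbname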
--
Craig Ringer