Re: Determining size of a database before dumping

From: Alexander Staubo <alex(at)purefiction(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Madison Kelly <linux(at)alteeve(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Determining size of a database before dumping
Date: 2006-10-02 22:42:31
Message-ID: F3044600-A99E-4A32-BE3B-4063EB8A5DD8@purefiction.net
Lists: pgsql-general

On Oct 2, 2006, at 23:19, Tom Lane wrote:

> Alexander Staubo <alex(at)purefiction(dot)net> writes:
>> You could count the disk space usage of the actual stored tuples,
>> though this will necessarily be inexact:
>> http://www.postgresql.org/docs/8.1/static/diskusage.html
>> Or you could count the size of the physical database files (/var/lib/
>> postgresql or wherever). While these would be estimates, you could at
>> least guarantee that the dump would not *exceed* the estimate.
>
> You could guarantee no such thing; consider compression of TOAST
> values.
> Even for uncompressed data, datatypes such as int and float can easily
> print as more bytes than they occupy on-disk.
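
Fair enough. For a rough on-disk figure, though, the 8.1 size
functions are handy (a sketch; "mydb" and "mytable" are stand-in
names):

    # whole database, as stored on disk
    psql -d mydb -c "SELECT pg_size_pretty(pg_database_size('mydb'));"
    # one table, including its TOAST data and indexes
    psql -d mydb -c "SELECT pg_size_pretty(pg_total_relation_size('mytable'));"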

Why does pg_dump serialize data less efficiently than PostgreSQL
itself when using the "custom" format? (pg_dump arguably has greater
freedom here, since it can apply space-saving optimizations to the
output format; for example, it could use table statistics to
selectively apply something like Rice coding to numeric data, or
vertically decompose the tuples and emit sorted vectors with delta
compression.) As for TOAST, shouldn't pg_dump's own compression do
just as well, or better?
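
To make the comparison concrete, one could dump with the custom
format at maximum zlib compression and set the result against the
on-disk figure above (again a sketch; "mydb" is a stand-in name):

    # -Fc = custom format, -Z 9 = highest zlib compression level
    pg_dump -Fc -Z 9 -f mydb.dump mydb
    ls -lh mydb.dump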

Alexander.
