From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Alexander Staubo <alex(at)purefiction(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Madison Kelly <linux(at)alteeve(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Determining size of a database before dumping |
Date: | 2006-10-02 23:12:29 |
Message-ID: | 1159830749.25557.41.camel@dogma.v10.wvs |
Lists: | pgsql-general |
On Tue, 2006-10-03 at 00:42 +0200, Alexander Staubo wrote:
> Why does pg_dump serialize data less efficiently than PostgreSQL when
> using the "custom" format? (Pg_dump arguably has greater freedom in
> being able to apply space-saving optimizations to the output format.
> For example, one could use table statistics to selectively apply
> something like Rice coding for numeric data, or vertically decompose
> the tuples and emit sorted vectors using delta compression.) As for
> TOAST, should not pg_dump's compression compress just as well, or
> better?
It would be a strange set of data that had a larger representation as a
compressed pg_dump than the data directory itself. However, one could
imagine a contrived case where that might happen.
Let's say you had a single table with 10,000 columns of type INT4, 100M
records, all with random numbers in the columns. pg_dump writes each
value out as text before compressing it, and I don't think standard
gzip compression will get random INT4s back down to the 32 bits each
that they occupy on disk.
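A rough way to check the random INT4 case is sketched below. It is not
how pg_dump itself works internally: it assumes the dump data is
decimal text, one value per line, compressed with zlib at the highest
level, and it compares that against a plain 4-byte binary encoding.
The function name, value count and seed are made up for illustration.

```python
import random
import struct
import zlib


def compare_sizes(n_values=100_000, seed=0):
    """Return (binary_bytes, compressed_text_bytes) for random INT4 values."""
    rng = random.Random(seed)
    # Uniformly random signed 32-bit integers (the INT4 range).
    values = [rng.randint(-2**31, 2**31 - 1) for _ in range(n_values)]

    # Fixed-width binary form: exactly 4 bytes per value.
    binary = struct.pack(f"<{n_values}i", *values)

    # Text form (assumed dump-like layout): one decimal value per line,
    # then zlib-compressed at the maximum compression level.
    text = "".join(f"{v}\n" for v in values).encode("ascii")
    compressed = zlib.compress(text, 9)

    return len(binary), len(compressed)


binary_bytes, compressed_bytes = compare_sizes()
print("binary bytes:         ", binary_bytes)
print("compressed text bytes:", compressed_bytes)
print("bytes per value:      ", compressed_bytes / 100_000)
```

Uniformly random 32-bit values carry 32 bits of information each, so no
general-purpose compressor can average fewer than 4 bytes per value on
this input; the compressed text should come out somewhat larger.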
Another example is NULLs. What if only a few of those records had
non-NULL values? If I understand correctly, PostgreSQL will represent
each of those NULLs with just one bit in the row's null bitmap.
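The arithmetic for the NULL case can be sketched as follows. It
compares a one-bit-per-column bitmap against an all-NULL row written as
tab-separated text with `\N` as the NULL marker (the default COPY text
conventions), before any compression. It ignores the rest of the tuple
header and alignment padding, and the column count is a hypothetical
value picked for illustration.

```python
def null_bitmap_bytes(n_columns):
    """Bytes needed for a bitmap with one bit per column, rounded up."""
    return (n_columns + 7) // 8


def all_null_text_row_bytes(n_columns):
    """Bytes for one all-NULL row as tab-separated text, uncompressed."""
    null_marker = len(r"\N")     # 2 bytes per NULL value
    separators = n_columns - 1   # one tab between adjacent columns
    newline = 1                  # row terminator
    return n_columns * null_marker + separators + newline


n_columns = 1000  # hypothetical column count, for illustration only
print("null bitmap bytes per row:  ", null_bitmap_bytes(n_columns))
print("all-NULL text bytes per row:", all_null_text_row_bytes(n_columns))
```

This only covers the uncompressed sizes. Long runs of identical NULL
markers are highly repetitive, so gzip will shrink the text form a great
deal, and which side ends up smaller depends on the actual data.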
What you're saying is more theoretical. If pg_dump used specialized
compression based on the data type of the columns, and everything was
optimal, you're correct: there's no situation in which the dump *must*
be bigger. But since there is no practical demand for such compression,
and it would be a lot of work, there is no *guarantee* that the data
directory will be bigger than the dump. It probably is, though.
Regards,
Jeff Davis