Re: Increased size of database dump even though LESS consumed storage

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Thorsten Schöning <tschoening(at)am-soft(dot)de>
Cc: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Increased size of database dump even though LESS consumed storage
Date: 2021-02-10 09:38:51
Message-ID: CA+bJJby3Senj3yYGW0j8rYTScf=AmGzL_yGvQMWrfz=AYApo=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thorsten:

On Wed, Feb 10, 2021 at 9:58 AM Thorsten Schöning <tschoening(at)am-soft(dot)de> wrote:
...
> I've changed the new table "datagram" to be a partitioned one with
> partitions containing rows per year and per half-year. Each partition
> contains far less rows than before this way and while I only tested
> with "--compress=9" this time, the numbers are quite interesting:
>
> > unpartitioned: 6,4 GiB
> > half-yearly parts: 4,8 GiB
> > yearly parts: 4,8 GiB
>
> The interesting part this time is that the table layout for all
> partitions is the same like before, only the number of rows per table
> is different. Though, the number of rows overall is the same like
> before, the same data, IDs etc. Though, this time the dump really is
> smaller than with the OLD schema containing far more data because of
> duplicate IDs and stuff.
> I wouldn't have expected table layout to be that important.

Compresion is dependent on detectable redundancy on the input. pg_dump
more or less gzips per-table "copy to stdout" dumps. If your data
evolves in a determined way having it sorted by ingestion time may
increase detectable redundancy a lot, and partitioning sorts partially
by date ( or fully if you have made the partitions by range-querying
via index scan ). In this case it may not be the layout, but the
order.

Given you seem to be able to test, you may try sorting the full table
by the column you use for partitioning. IIRC cluster will do the trick
if it is indexed.

( This has happened to me compressing document dumps, presorting by
some chosen fields improved my compression ratio a bit, IIRC it was
due to the compressor searching for duplicates on a limited window
only, this is why I use "detectable" redundancy )

Francisco Olarte.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message cen 2021-02-10 09:42:02 Re: Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased
Previous Message Daniele Varrazzo 2021-02-10 09:29:07 Re: Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased