Re: Increased size of database dump even though LESS consumed storage

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thorsten Schöning <tschoening(at)am-soft(dot)de>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Increased size of database dump even though LESS consumed storage
Date: 2021-02-09 15:05:05
Message-ID: 861906.1612883105@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

=?utf-8?B?VGhvcnN0ZW4gU2Now7ZuaW5n?=<tschoening(at)am-soft(dot)de> writes:
> for various reasons I've migrated my database schema from 4 tables
> with some additional indexes to keep integrity and stuff to 1 table
> only. That made some of the former used indexes obsolete and resulted
> in overall less consumed storage:
> The old schema consumed ~42 GiB, while the new is ~16 GiB without the
> formerly available indexes and ~25 GiB with the same logical indexes.
> Though, a created dump of the new schema has increased from ~5,52 GiB
> to 6,38 GiB. Of course I'm using the same settings to create both
> dumps:
>> pg_dump.exe "--username=%USERNAME%" "--encoding=UTF-8" "--compress=9" "--format=c" "--dbname=%DB_NAME%" > "%DMP_PATH%"
> My expectation was that the dump would be smaller as well, because the
> data itself is the same, while lots of duplicate IDs, obsolete indexes
> etc. in not available tables anymore have been removed.

Removing indexes won't in itself make any noticeable difference in the
size of pg_dump output, since an index is just represented by a CREATE
INDEX (or equivalent) command.

My guess is that the rearrangement somehow made the table data less
amenable to compression. gzip depends on finding similar substrings
within a fairly narrow window (a few KB), so at least in principle,
just changing the order of rows could make a difference. I'd sort
of expect compression opportunity losses to more or less balance out
with opportunity gains over such a large data volume, but maybe you
were unlucky. Or perhaps the move into a single table was done in
such a way that it actually decreased locality-of-reference, eg maybe
similar rows were grouped before and now they're not.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message hubert depesz lubaczewski 2021-02-09 15:10:24 Re: How to I select value of GUC that has - in its name?
Previous Message Edward Macnaghten 2021-02-09 14:49:19 Re: How to I select value of GUC that has - in its name?