From: | Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: File based Incremental backup v8 |
Date: | 2015-03-07 15:20:17 |
Message-ID: | 54FB1731.9030305@2ndquadrant.it |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Il 05/03/15 05:42, Bruce Momjian ha scritto:
> On Thu, Mar 5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
>>>> Yeah, it might make the situation better than today. But I'm afraid that
>>>> many users might get disappointed about that behavior of an incremental
>>>> backup after the release...
>>>
>>> I don't get what do you mean here. Can you elaborate this point?
>>
>> The proposed version of LSN-based incremental backup has some limitations
>> (e.g., every database files need to be read even when there is no modification
>> in database since last backup, and which may make the backup time longer than
>> users expect) which may disappoint users. So I'm afraid that users who can
>> benefit from the feature might be very limited. IOW, I'm just sticking to
>> the idea of timestamp-based one :) But I should drop it if the majority in
>> the list prefers the LSN-based one even if it has such limitations.
>
> We need numbers on how effective each level of tracking will be. Until
> then, the patch can't move forward.
>
I've written a little test script to estimate how much space can be saved by file level incremental, and I've run it on some real backups I have access to.
The script takes two basebackup directory and simulate how much data can be saved in the 2nd backup using incremental backup (using file size/time and LSN)
It assumes that every file in base, global and pg_tblspc which matches both size and modification time will also match from the LSN point of view.
The result is that many databases can take advantage of incremental, even if not do big, and considering LSNs yield a result almost identical to the approach based on filesystem metadata.
== Very big geographic database (similar to openstreetmap main DB), it contains versioned data, interval two months
First backup size: 13286623850656 (12.1TiB)
Second backup size: 13323511925626 (12.1TiB)
Matching files count: 17094
Matching LSN count: 14580
Matching files size: 9129755116499 (8.3TiB, 68.5%)
Matching LSN size: 9128568799332 (8.3TiB, 68.5%)
== Big on-line store database, old data regularly moved to historic partitions, interval one day
First backup size: 1355285058842 (1.2TiB)
Second backup size: 1358389467239 (1.2TiB)
Matching files count: 3937
Matching LSN count: 2821
Matching files size: 762292960220 (709.9GiB, 56.1%)
Matching LSN size: 762122543668 (709.8GiB, 56.1%)
== Ticketing system database, interval one day
First backup size: 144988275 (138.3MiB)
Second backup size: 146135155 (139.4MiB)
Matching files count: 3124
Matching LSN count: 2641
Matching files size: 76908986 (73.3MiB, 52.6%)
Matching LSN size: 67747928 (64.6MiB, 46.4%)
== Online store, interval one day
First backup size: 20418561133 (19.0GiB)
Second backup size: 20475290733 (19.1GiB)
Matching files count: 5744
Matching LSN count: 4302
Matching files size: 4432709876 (4.1GiB, 21.6%)
Matching LSN size: 4388993884 (4.1GiB, 21.4%)
== Heavily updated database, interval one week
First backup size: 3203198962 (3.0GiB)
Second backup size: 3222409202 (3.0GiB)
Matching files count: 1801
Matching LSN count: 1273
Matching files size: 91206317 (87.0MiB, 2.8%)
Matching LSN size: 69083532 (65.9MiB, 2.1%)
Regards,
Marco
--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it
Attachment | Content-Type | Size |
---|---|---|
estimate_incremental.py | text/x-python-script | 2.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Petr Jelinek | 2015-03-07 17:07:27 | Re: TABLESAMPLE patch |
Previous Message | Stephen Frost | 2015-03-07 14:24:41 | Re: CATUPDATE confusion? |