Re: finding changed blocks using WAL scanning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-20 04:17:01
Message-ID: CA+TgmobjopcytHN35czB9PG1vqwHcW3mwzoTwF7HMVdH+7WU9Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 19, 2019 at 8:39 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> While I do think we should at least be thinking about the load caused
> from scanning the WAL to generate a list of blocks that are changed, the
> load I was more concerned with in the other thread is the effort
> required to actually merge all of those changes together over a large
> amount of WAL. I'm also not saying that we couldn't have either of
> those pieces done as a background worker, just that it'd be really nice
> to have an external tool (or library) that can be used on an independent
> system to do that work.

Oh. Well, I already explained my algorithm for doing that upthread,
which I believe would be quite cheap.

1. When you generate the .modblock files, stick all the block
references into a buffer. qsort(). Dedup. Write out in sorted
order. (See the first sketch below.)

2. When you want to use a bunch of .modblock files, do the same thing
MergeAppend does, or what merge-sort does when it does a merge pass.
Read the first 1MB of each file (or whatever amount). Repeatedly pull
an item from whichever file has the lowest remaining value, using a
binary heap. When no buffered data remains for a particular file,
read another chunk from that file. (See the second sketch below.)
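
To make step 1 concrete, here's a rough, untested sketch. The
BlockRef layout is made up for illustration -- in core you'd
presumably use RelFileNode, ForkNumber, and BlockNumber rather than
bare uint32s:

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct BlockRef
    {
        uint32_t    spcnode;    /* tablespace OID */
        uint32_t    relnode;    /* relfilenode */
        uint32_t    forknum;    /* fork number */
        uint32_t    blkno;      /* block number */
    } BlockRef;

    static int
    blockref_cmp(const void *a, const void *b)
    {
        const BlockRef *x = a;
        const BlockRef *y = b;

        if (x->spcnode != y->spcnode)
            return x->spcnode < y->spcnode ? -1 : 1;
        if (x->relnode != y->relnode)
            return x->relnode < y->relnode ? -1 : 1;
        if (x->forknum != y->forknum)
            return x->forknum < y->forknum ? -1 : 1;
        if (x->blkno != y->blkno)
            return x->blkno < y->blkno ? -1 : 1;
        return 0;
    }

    /* Sort refs in place, squeeze out duplicates, return new count. */
    static size_t
    sort_and_dedup(BlockRef *refs, size_t n)
    {
        size_t      out = 0;

        if (n == 0)
            return 0;
        qsort(refs, n, sizeof(BlockRef), blockref_cmp);
        for (size_t i = 1; i < n; i++)
        {
            if (blockref_cmp(&refs[out], &refs[i]) != 0)
                refs[++out] = refs[i];
        }
        return out + 1;
    }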
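
And a similarly untested sketch of the merge pass in step 2, reusing
BlockRef and blockref_cmp from above. The per-file buffer size and
the notion that a .modblock file is just a sorted array of BlockRefs
are assumptions for illustration, not a real file format:

    #include <stdbool.h>
    #include <stdio.h>

    /* ~128kB of buffered refs per open .modblock file (assumption) */
    #define BUFREFS (128 * 1024 / sizeof(BlockRef))

    typedef struct ModBlockReader
    {
        FILE       *fp;
        BlockRef    buf[BUFREFS];
        size_t      nbuf;       /* refs currently buffered */
        size_t      pos;        /* next buffered ref to hand out */
    } ModBlockReader;

    /* Refill a reader's buffer; returns false at EOF. */
    static bool
    reader_refill(ModBlockReader *r)
    {
        r->nbuf = fread(r->buf, sizeof(BlockRef), BUFREFS, r->fp);
        r->pos = 0;
        return r->nbuf > 0;
    }

    /* Min-heap of readers, ordered by each reader's next ref. */
    static bool
    heap_less(ModBlockReader **heap, size_t a, size_t b)
    {
        return blockref_cmp(&heap[a]->buf[heap[a]->pos],
                            &heap[b]->buf[heap[b]->pos]) < 0;
    }

    static void
    heap_siftdown(ModBlockReader **heap, size_t n, size_t i)
    {
        for (;;)
        {
            size_t  l = 2 * i + 1;
            size_t  r = 2 * i + 2;
            size_t  s = i;

            if (l < n && heap_less(heap, l, s))
                s = l;
            if (r < n && heap_less(heap, r, s))
                s = r;
            if (s == i)
                break;
            ModBlockReader *tmp = heap[i];
            heap[i] = heap[s];
            heap[s] = tmp;
            i = s;
        }
    }

    /* Merge nfiles sorted inputs, writing each distinct ref once. */
    static void
    merge_modblock_files(ModBlockReader **heap, size_t nfiles, FILE *out)
    {
        size_t      n = 0;
        BlockRef    last;
        bool        have_last = false;

        /* prime the buffers, dropping any empty files */
        for (size_t i = 0; i < nfiles; i++)
            if (reader_refill(heap[i]))
                heap[n++] = heap[i];
        for (size_t i = n / 2; i-- > 0;)
            heap_siftdown(heap, n, i);

        while (n > 0)
        {
            ModBlockReader *r = heap[0];
            BlockRef    ref = r->buf[r->pos];

            /* emit, skipping duplicates that appear across files */
            if (!have_last || blockref_cmp(&last, &ref) != 0)
            {
                fwrite(&ref, sizeof(BlockRef), 1, out);
                last = ref;
                have_last = true;
            }
            /* advance the winner, refilling or dropping it at EOF */
            if (++r->pos >= r->nbuf && !reader_refill(r))
                heap[0] = heap[--n];
            heap_siftdown(heap, n, 0);
        }
    }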

If each .modblock file covers 1GB of WAL, you could merge the data from
across 1TB of WAL using only 1GB of memory, and that's assuming you
have a 1MB buffer for each .modblock file. You probably don't need
such a large buffer. If you use, say, a 128kB buffer, you could merge
the data from across 8TB of WAL using 1GB of memory. And if you have
8TB of WAL and you can't spare 1GB for the task of computing which
blocks need to be included in your incremental backup, it's time for a
hardware upgrade.
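
Spelled out, the arithmetic is just:

    1GB / 1MB per file   = 1024 files open at once * 1GB of WAL each = 1TB
    1GB / 128kB per file = 8192 files open at once * 1GB of WAL each = 8TB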

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
