From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-15 20:31:14
Message-ID: 20190415203114.pb4e2vgbtbhopcdw@momjian.us
Lists: pgsql-hackers
On Wed, Apr 10, 2019 at 08:11:11PM -0400, Robert Haas wrote:
> On Wed, Apr 10, 2019 at 5:49 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > There is one thing that does worry me about the file-per-LSN-range
> > approach, and that is memory consumption when trying to consume the
> > information. Suppose you have a really high velocity system. I don't
> > know exactly what the busiest systems around are doing in terms of
> > data churn these days, but let's say just for kicks that we are
> > dirtying 100GB/hour. That means, roughly 12.5 million block
> > references per hour. If each block reference takes 12 bytes, that's
> > maybe 150MB/hour in block reference files. If you run a daily
> > incremental backup, you've got to load all the block references for
> > the last 24 hours and deduplicate them, which means you're going to
> > need about 3.6GB of memory. If you run a weekly incremental backup,
> > you're going to need about 25GB of memory. That is not ideal. One
> > can keep the memory consumption to a more reasonable level by using
> > temporary files. For instance, say you realize you're going to need
> > 25GB of memory to store all the block references you have, but you
> > only have 1GB of memory that you're allowed to use. Well, just
> > hash-partition the data 32 ways by dboid/tsoid/relfilenode/segno,
> > writing each batch to a separate temporary file, and then process each
> > of those 32 files separately. That does add some additional I/O, but
> > it's not crazily complicated and doesn't seem too terrible, at least
> > to me. Still, it's something not to like.
>
> Oh, I'm being dumb. We should just have the process that writes out
> these files sort the records first. Then when we read them back in to
> use them, we can just do a merge pass like MergeAppend would do. Then
> you never need very much memory at all.
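For concreteness, here is a rough sketch of that merge-and-deduplicate pass (nothing that exists in the tree; the 12-byte BlockRef layout is only an assumption matching the estimate above): each block-reference file is written pre-sorted, and reading them back is a MergeAppend-style merge that keeps one record per input file in memory and drops duplicates as it goes.

/*
 * Hypothetical sketch only: merge-deduplicate N already-sorted
 * block-reference files, MergeAppend-style, so memory use is
 * proportional to the number of files, not the number of references.
 * The 12-byte BlockRef layout is an assumption, not an on-disk
 * format PostgreSQL defines.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct BlockRef
{
    uint32_t    dbOid;          /* database OID */
    uint32_t    relNumber;      /* relfilenode */
    uint32_t    blkno;          /* block number within the relation */
} BlockRef;                     /* 12 bytes, matching the estimate above */

static int
blockref_cmp(const BlockRef *a, const BlockRef *b)
{
    if (a->dbOid != b->dbOid)
        return a->dbOid < b->dbOid ? -1 : 1;
    if (a->relNumber != b->relNumber)
        return a->relNumber < b->relNumber ? -1 : 1;
    if (a->blkno != b->blkno)
        return a->blkno < b->blkno ? -1 : 1;
    return 0;
}

/* Merge nfiles sorted inputs into "out", dropping duplicate references. */
static void
merge_dedup(FILE **in, int nfiles, FILE *out)
{
    BlockRef   *cur = malloc(nfiles * sizeof(BlockRef));
    bool       *valid = malloc(nfiles * sizeof(bool));
    BlockRef    last;
    bool        have_last = false;

    for (int i = 0; i < nfiles; i++)
        valid[i] = (fread(&cur[i], sizeof(BlockRef), 1, in[i]) == 1);

    for (;;)
    {
        int         best = -1;

        /* pick the smallest current record across all inputs */
        for (int i = 0; i < nfiles; i++)
            if (valid[i] && (best < 0 || blockref_cmp(&cur[i], &cur[best]) < 0))
                best = i;
        if (best < 0)
            break;              /* all inputs exhausted */

        /* emit it only if it differs from the last record written */
        if (!have_last || blockref_cmp(&cur[best], &last) != 0)
        {
            fwrite(&cur[best], sizeof(BlockRef), 1, out);
            last = cur[best];
            have_last = true;
        }
        valid[best] = (fread(&cur[best], sizeof(BlockRef), 1, in[best]) == 1);
    }

    free(cur);
    free(valid);
}

With the inputs pre-sorted, even a weekly incremental backup never needs the 25GB mentioned above; it only needs one record per file being merged.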
Can I throw out a simple idea? What if, when we finish writing a WAL
file, we create a new file 000000010000000000000001.modblock which
lists all the heap/index files and block numbers modified in that WAL
file? How much does that help with the list I posted earlier?
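As a straw-man, one hypothetical layout for such a .modblock file could be a small header followed by a sorted array of block references, something like the structs below. None of this is an existing format; it only illustrates the shape of the data a per-segment scan could emit.

/*
 * Hypothetical layout for 000000010000000000000001.modblock: a small
 * header followed by a sorted array of the block references touched
 * by that WAL segment.  Purely illustrative; no such format exists.
 */
#include <stdint.h>

typedef struct ModBlockHeader
{
    uint32_t    magic;          /* file identification */
    uint32_t    version;        /* format version */
    uint64_t    start_lsn;      /* first LSN covered by the WAL segment */
    uint64_t    end_lsn;        /* end LSN of the WAL segment */
    uint32_t    nrefs;          /* number of ModBlockRef entries following */
} ModBlockHeader;

typedef struct ModBlockRef
{
    uint32_t    dbOid;          /* database */
    uint32_t    tsOid;          /* tablespace */
    uint32_t    relNumber;      /* relfilenode */
    uint32_t    forkNum;        /* main, fsm, vm, init */
    uint32_t    blkno;          /* modified block number */
} ModBlockRef;

If the entries are sorted before the file is written, consumers could combine many .modblock files with the same kind of merge pass sketched above.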
I think there is some interesting complexity brought up in this thread.
Which options are going to minimize storage I/O and network I/O, have
only background overhead, allow parallel operation, and integrate with
pg_basebackup? Eventually we will need to evaluate the incremental
backup options against these criteria.
I am thinking tools could retain modblock files along with the WAL, and
could pull full-page writes from the WAL or from PGDATA. This avoids the
need to scan 16MB WAL files, and the WAL files and modblock files could
be expired independently.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +