From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Subject: Re: should crash recovery ignore checkpoint_flush_after ?
Date: 2020-01-18 20:52:21
Message-ID: CA+hUKGLSx52vsSkEMN68hTf=ZKp_CJ0JuaduQXNG7L4RF9Ameg@mail.gmail.com
Lists: pgsql-hackers
On Sun, Jan 19, 2020 at 3:08 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> As I understand it, the first thing that happens is syncing every file in the
> data dir, as in initdb --sync. These instances were both 5+ TB on ZFS with
> compression, so that's slow, but tolerable, at least understandable, and
> with visible progress in ps.
>
> The 2nd stage replays WAL. strace shows it's occasionally running
> sync_file_range, and I think recovery might have been several times faster if
> we'd just dumped the data to the OS ASAP and fsynced once per file. In fact,
> I've just kill -9'd the recovery process and edited the config to disable
> this, lest it spend all night in recovery.
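[The two flush strategies contrasted above can be sketched as follows. This is a minimal, Linux-only illustration, not PostgreSQL source: the function name flush_demo, the temp-file path, the 8 kB chunk size, and the chunk count are all my own invention. With use_hints set, it issues a sync_file_range(SYNC_FILE_RANGE_WRITE) writeback hint after every chunk, the kind of behaviour checkpoint_flush_after enables; without it, the data just accumulates in the OS and is flushed by the single fsync() at the end.]

```c
#define _GNU_SOURCE             /* for sync_file_range() */
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

/* Write 16 x 8 kB chunks to a fresh temp file.  If use_hints is set,
 * ask the kernel to start writeback after each chunk (incremental
 * hinting); either way, finish with one fsync().  Returns 0 on
 * success, -1 on any syscall failure. */
int flush_demo(int use_hints)
{
    char path[] = "/tmp/flush_demo_XXXXXX";   /* arbitrary demo path */
    char buf[8192];
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file vanishes when fd is closed */
    memset(buf, 'x', sizeof(buf));

    for (int i = 0; i < 16; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
            goto fail;
        if (use_hints &&
            sync_file_range(fd, (off_t) i * sizeof(buf), sizeof(buf),
                            SYNC_FILE_RANGE_WRITE) != 0)
            goto fail;
    }

    /* The alternative suggested above: one fsync per file at the end. */
    if (fsync(fd) != 0)
        goto fail;

    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

[Note that on a page-cache filesystem both variants succeed; whether the hints actually help or hurt depends on the filesystem, which is the question the reply below raises for ZFS.]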
Does sync_file_range() even do anything for non-mmap'd files on ZFS?
Non-mmap'd ZFS data is not in the Linux page cache, and I think
sync_file_range() works at that level. At a guess, there'd need to be
a new VFS file_operation so that ZFS could get a callback to handle
data in its ARC.
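[For reference, the workaround described above corresponds to zeroing the GUC; per the PostgreSQL documentation, a value of 0 disables forced writeback entirely, e.g. in postgresql.conf:]

```
# Disable kernel writeback hints (sync_file_range on Linux) for
# checkpoint writes; 0 turns the feature off.
checkpoint_flush_after = 0
```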