Re: should crash recovery ignore checkpoint_flush_after ?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Subject: Re: should crash recovery ignore checkpoint_flush_after ?
Date: 2020-01-18 20:52:21
Message-ID: CA+hUKGLSx52vsSkEMN68hTf=ZKp_CJ0JuaduQXNG7L4RF9Ameg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 19, 2020 at 3:08 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> As I understand it, the first thing that happens is syncing every file in the data
> dir, like in initdb --sync. These instances were both 5+TB on zfs, with
> compression, so that's slow, but tolerable, and at least understandable, and
> with visible progress in ps.
>
> The 2nd stage replays WAL. strace shows it's occasionally running
> sync_file_range, and I think recovery might've been several times faster if
> we'd just dumped the data at the OS ASAP and fsync'd once per file. In fact, I
> just kill -9'd the recovery process and edited the config to disable this lest it
> spend all night in recovery.
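The two flushing strategies contrasted in the quoted message can be sketched in C (Linux-specific). This is a minimal, hypothetical sketch, not PostgreSQL's implementation: `write_blocks()` and its parameters are invented for illustration. With a non-zero `flush_after` it calls sync_file_range() with SYNC_FILE_RANGE_WRITE each time that many new bytes have been written, loosely in the spirit of checkpoint_flush_after; with zero it issues no hints. In both cases durability comes only from the single fsync() at the end.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): write `nblocks` copies of
 * `block` to `fd`, starting at offset 0.
 *
 * flush_after > 0: every flush_after bytes, ask the kernel to *start*
 *   writeback of the newly dirtied range with sync_file_range().
 * flush_after == 0: issue no hints and let the OS buffer freely.
 *
 * Either way, durability comes only from the final fsync().
 * Returns the number of writeback hints issued, or -1 on write/fsync error.
 */
static int write_blocks(int fd, const char *block, size_t blksz,
                        size_t nblocks, size_t flush_after)
{
    off_t offset = 0;      /* end of the data written so far */
    off_t hinted_upto = 0; /* start of the range not yet hinted */
    int hints = 0;

    for (size_t i = 0; i < nblocks; i++) {
        if (pwrite(fd, block, blksz, offset) != (ssize_t) blksz)
            return -1;
        offset += (off_t) blksz;

        if (flush_after > 0 && (size_t) (offset - hinted_upto) >= flush_after) {
            /* SYNC_FILE_RANGE_WRITE initiates writeback without waiting.
             * It is only a hint, so this sketch ignores failures (e.g. ENOSYS). */
            (void) sync_file_range(fd, hinted_upto, offset - hinted_upto,
                                   SYNC_FILE_RANGE_WRITE);
            hinted_upto = offset;
            hints++;
        }
    }

    /* One fsync per file at the end. */
    return fsync(fd) == 0 ? hints : -1;
}
```

Writing 64 blocks of 8kB with `flush_after` set to 256kB issues two hints; with `flush_after` set to 0 it issues none. SYNC_FILE_RANGE_WRITE only initiates writeback and does not write file metadata, so the hints smooth out I/O at best; they never replace the final fsync().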

Does sync_file_range() even do anything for non-mmap'd files on ZFS?
Non-mmap'd ZFS data is not in the Linux page cache, and I think
sync_file_range() works at that level. At a guess, there'd need to be
a new VFS file_operation so that ZFS could get a callback to handle
data in its ARC.
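If that guess is right, a caller could at least detect when the hint is likely to be useless. The C sketch below is hypothetical, neither PostgreSQL code nor something proposed in this thread: `path_is_zfs()` and `want_writeback_hints()` are invented names. It assumes OpenZFS on Linux reports the magic number 0x2fc12fc1 (its `ZFS_SUPER_MAGIC`) in the `f_type` field filled in by statfs().

```c
#include <sys/vfs.h>

/* Assumed: the magic number OpenZFS reports in statfs.f_type on Linux
 * (its ZFS_SUPER_MAGIC). ZFS is out of tree, so <linux/magic.h> lacks it. */
#define OPENZFS_SUPER_MAGIC 0x2fc12fc1UL

/*
 * Hypothetical check: is `path` on ZFS?
 * Returns 1 for ZFS, 0 for any other filesystem, -1 if statfs() fails.
 */
static int path_is_zfs(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0)
        return -1;
    return (unsigned long) sfs.f_type == OPENZFS_SUPER_MAGIC;
}

/*
 * Hypothetical policy: only issue sync_file_range() hints when the file's
 * data is expected to live in the Linux page cache. If statfs() fails,
 * keep hinting, since a hint that turns out to be a no-op is harmless.
 */
static int want_writeback_hints(const char *path)
{
    return path_is_zfs(path) != 1;
}
```

This only identifies the filesystem type; it does not show whether sync_file_range() actually has any effect there, which would still need to be measured.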
