Re: WAL prefetch

From: Andres Freund <andres(at)anarazel(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-19 16:44:15
Message-ID: 20180619164415.ta6q47vwvyzcjwjo@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-06-19 19:34:22 +0300, Konstantin Knizhnik wrote:
> On 19.06.2018 18:50, Andres Freund wrote:
> > On 2018-06-19 12:08:27 +0300, Konstantin Knizhnik wrote:
> > > I do not think that prefetching into shared buffers requires much more effort
> > > or makes the patch more invasive...
> > > It even simplifies it somewhat, because there is no need to maintain its own cache of
> > > prefetched pages...
> > > But it will definitely have much more impact on Postgres performance:
> > > contention for buffer locks, throwing away pages accessed by read-only
> > > queries,...
> > These arguments seem bogus to me. Otherwise the startup process is going
> > to do that work.
>
> There is just one process replaying WAL. Certainly it has some impact on hot
> standby query execution.
> But if there are several prefetch workers (128???), this impact will
> increase dramatically.

Hence me suggesting how you can do that with one process (re locking). I
still entirely fail to see how "throwing away pages accessed by
read-only queries" is meaningful here - the startup process is going to
read the data anyway, and we *do not* want to use a ringbuffer as that'd
make the situation dramatically worse.

> Well, originally it was proposed by Sean - the author of pg-prefaulter. I
> just ported it from Go to C using the standard PostgreSQL WAL iterator.
> Then I performed some measurements and didn't find any dramatic improvement
> in performance (in the case of synchronous replication) or reduction in replication
> lag for asynchronous replication, neither on my desktop (SSD, 16GB RAM, local
> replication within the same computer, pgbench scale 1000) nor on a pair of
> two powerful servers connected by
> InfiniBand, with 3TB NVMe (pgbench with scale 100000).
> Also I noticed that the read rate on the replica is almost zero.
>
> What does it mean:
> 1. I am doing something wrong.
> 2. posix_fadvise() prefetch is not so efficient.
> 3. pgbench is not right workload to demonstrate effect of prefetch.
> 4. Hardware which I am using is not typical.
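
For reference, a prefetch hint of the kind discussed here is typically issued
through posix_fadvise() with POSIX_FADV_WILLNEED. The following is only a
minimal sketch, not the code from the patch or from pg-prefaulter: the helper
names prefetch_block and prefetch_block_range are hypothetical, and the
8192-byte block size is an assumption matching PostgreSQL's default BLCKSZ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise() */
#endif

#include <fcntl.h>
#include <sys/types.h>

/* Assumption: 8 kB blocks, PostgreSQL's default BLCKSZ. */
#define SKETCH_BLCKSZ 8192

/*
 * Ask the kernel to start reading `nblocks` blocks, beginning at
 * `first_block`, into the OS page cache.  Returns 0 on success or an error
 * number (posix_fadvise() returns the error directly instead of setting
 * errno).
 */
int
prefetch_block_range(int fd, unsigned int first_block, unsigned int nblocks)
{
    off_t offset = (off_t) first_block * SKETCH_BLCKSZ;
    off_t length = (off_t) nblocks * SKETCH_BLCKSZ;

    /* A length of 0 would mean "to end of file", so treat it as a no-op. */
    if (nblocks == 0)
        return 0;

    return posix_fadvise(fd, offset, length, POSIX_FADV_WILLNEED);
}

/* Convenience wrapper (hypothetical): hint a single block. */
int
prefetch_block(int fd, unsigned int block_no)
{
    return prefetch_block_range(fd, block_no, 1);
}
```

The hint is advisory and returns without waiting for the I/O. It only warms
the OS page cache; it does not place the page in shared buffers, so the
startup process still has to issue its own read afterwards.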

I think it's probably largely a mix of 3 and 4. pgbench with a random
distribution is probably indeed a bad test case, because either
everything is in cache or just about every write ends up as a full page
write because of the scale. You might want to try a) turning off full page
writes, b) using a less random distribution.
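
The "just about every write ends up as a full page write" effect can be made
concrete with a rough estimate. The sketch below assumes uniformly random page
selection and that every first modification of a page after a checkpoint
emits a full page image; the function name expected_fpw_fraction is
hypothetical and any figures fed into it are illustrative assumptions, not
measurements.

```c
/*
 * Expected fraction of page modifications that need a full page write,
 * given `total_pages` equally likely target pages and `updates`
 * modifications since the last checkpoint.
 *
 * The expected number of distinct pages touched, d, follows
 *     d(k+1) = d(k) + (1 - d(k) / total_pages)
 * because update k+1 hits a not-yet-touched page with probability
 * 1 - d(k) / total_pages.
 */
double
expected_fpw_fraction(double total_pages, unsigned long updates)
{
    double        distinct = 0.0;
    unsigned long k;

    if (total_pages < 1.0 || updates == 0)
        return 0.0;

    for (k = 0; k < updates; k++)
        distinct += 1.0 - distinct / total_pages;

    /* Each first touch of a page costs one full page image. */
    return distinct / (double) updates;
}
```

When the table has far more pages than there are updates per checkpoint
cycle, the result approaches 1, i.e. nearly every update carries a full page
image. With few pages and many updates it approaches 0, which is the case
where the working set is likely to be cached anyway.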

> So it makes me wonder when such prefetch may be needed... And it raised new
> questions:
> I wonder how frequently the checkpoint interval is much larger than the OS
> cache?

Extremely common.

> If we enforce full page writes (let's say after each 1GB), how would it
> affect WAL size and performance?

Extremely badly. If you look at stats of production servers (using
pg_waldump) you can see that a large percentage of the total WAL volume is
FPWs, that FPWs are a storage / bandwidth / write issue, and that higher
FPW rates after a checkpoint correlate strongly negatively with performance.

Greetings,

Andres Freund
