From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WALWriteLock contention |
Date: | 2015-05-18 17:57:17 |
Message-ID: | CAMkU=1w7nwz89FQWhbetDgOctjxOSBvRo0hDg+6mCSmCA4B1iA@mail.gmail.com |
Lists: | pgsql-hackers |
>
> >
> > My goal there was to further improve group commit. When running
> > pgbench -j10 -c10, it was common to see fsyncs that alternated between
> > flushing 1 transaction and 9 transactions, because the first one to the
> > gate would go through it and slam it on all the others, and it would take
> > one fsync cycle for it to reopen.
>
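A toy model of the 1-versus-9 alternation described in the quoted text, as a sketch under loudly stated assumptions: a single lock serializes flushes, the lock holder flushes every commit record inserted at or before the moment it takes the lock, and each released client inserts its next commit record a tiny, client-specific delay after its fsync ends. The function name and all parameters are hypothetical; this is not PostgreSQL's XLogFlush logic.

```python
def simulate_group_commit(n_clients=10, n_flushes=6, fsync_time=1.0, step=0.001):
    """Return how many commit records each successive fsync flushes (toy model).

    Hypothetical assumptions, not PostgreSQL code:
    - one lock serializes flushes; the holder flushes every record inserted
      at or before the moment it acquires the lock;
    - a client whose record was flushed inserts its next commit record
      (c + 1) * step seconds after that fsync ends (step << fsync_time).
    """
    # Pending commit records as (insert_time, client_id); all clients start at once.
    pending = [((c + 1) * step, c) for c in range(n_clients)]
    lock_free_at = 0.0
    batch_sizes = []
    for _ in range(n_flushes):
        # The next flusher starts as soon as the lock is free and a record is waiting.
        start = max(lock_free_at, min(t for t, _ in pending))
        batch = [(t, c) for t, c in pending if t <= start]
        pending = [(t, c) for t, c in pending if t > start]
        end = start + fsync_time
        batch_sizes.append(len(batch))
        # Flushed clients are released at `end` and queue their next commit shortly after.
        pending.extend((end + (c + 1) * step, c) for _, c in batch)
        lock_free_at = end
    return batch_sizes


print(simulate_group_commit())  # [1, 9, 1, 9, 1, 9]
```

In this model the lone record that arrives just after a flush has started is flushed by itself on the next cycle, while the nine just-released clients miss that cycle and pile up for the one after. Ten commits thus take two fsyncs, an average of (1 + 9) / 2 = 5 per fsync instead of 10.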
> Hmm, yeah. I remember someone (Peter Geoghegan, I think) mentioning
> behavior like that before, but I had not made the connection to this
> issue at that time. This blog post is pretty depressing:
>
> http://oldblog.antirez.com/post/fsync-different-thread-useless.html
>
> It suggests that an fsync in progress blocks out not only other
> fsyncs, but other writes to the same file, which for our purposes is
> just awful. More Googling around reveals that this is apparently
> well-known to Linux kernel developers and that they don't seem excited
> about fixing it. :-(
>
I think they already did. I don't see the effect in ext4, even on a rather
old kernel like 2.6.32, using the code from the link above.
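A rough sketch of the kind of experiment described above, written for illustration and not the code from the linked post: one thread fsyncs a file with lots of dirty data while the main thread times small appends to the same file. The function name and parameters are hypothetical. If appends were blocked for the duration of the fsync, the worst append latency would approach the fsync time; results depend heavily on the filesystem, kernel, and device (on tmpfs the fsync is effectively free).

```python
import os
import tempfile
import threading
import time


def max_write_latency_during_fsync(dirty_mb=32, pause=0.001, max_writes=10_000):
    """Time small appends made while another thread fsyncs the same file.

    Returns (max_latency_seconds, number_of_timed_writes).
    """
    fd, path = tempfile.mkstemp()
    try:
        chunk = b"x" * (1024 * 1024)
        for _ in range(dirty_mb):  # create dirty page-cache data for fsync to flush
            os.write(fd, chunk)

        done = threading.Event()

        def fsync_worker():
            os.fsync(fd)  # CPython releases the GIL while this blocks
            done.set()

        worker = threading.Thread(target=fsync_worker)
        worker.start()
        latencies = []
        # Time at least one append, then keep going until the fsync returns.
        while len(latencies) < max_writes:
            t0 = time.perf_counter()
            os.write(fd, b"y" * 100)  # small append, roughly commit-record sized
            latencies.append(time.perf_counter() - t0)
            if done.is_set():
                break
            time.sleep(pause)
        worker.join()
        return max(latencies), len(latencies)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    worst, n = max_write_latency_during_fsync()
    print(f"{n} writes, worst latency {worst * 1000:.2f} ms")
```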
>
> <crazy-idea>I wonder if we could write WAL to two different files in
> alternation, so that we could be writing to one file while fsync-ing
> the other.</crazy-idea>
>
I thought the most promising thing, once there were timers and sleeps with
resolution much better than a centisecond, was to record the time at which
each fsync finished, and then sleep until "then + commit_delay". That way
you do no harm to the sleeper, since the write head is not positioned to
process the fsync until then anyway, and you give other workers the chance
to get their commit records in.
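A minimal sketch of that pacing rule, assuming a single process-wide record of when the last WAL fsync finished (a real server would need it in shared memory). `FlushPacer` and its methods are hypothetical names; only `commit_delay` comes from the text, and it is taken in microseconds as PostgreSQL's setting is. The clock and sleep functions are injectable so the logic can be checked without real waiting.

```python
import time


class FlushPacer:
    """Wait until last_fsync_end + commit_delay before starting a new flush.

    Hypothetical helper for illustration, not PostgreSQL code.
    """

    def __init__(self, commit_delay_us, clock=time.monotonic, sleep=time.sleep):
        self.commit_delay = commit_delay_us / 1_000_000  # microseconds -> seconds
        self.clock = clock
        self.sleep = sleep
        self.last_fsync_end = None  # no fsync observed yet

    def fsync_finished(self):
        """Call right after each WAL fsync returns."""
        self.last_fsync_end = self.clock()

    def remaining_delay(self):
        """Seconds left in the window; 0.0 if it has already passed."""
        if self.last_fsync_end is None:
            return 0.0
        target = self.last_fsync_end + self.commit_delay
        return max(0.0, target - self.clock())

    def wait_before_flush(self):
        """Sleep out the rest of the window so other commits can join the batch."""
        delay = self.remaining_delay()
        if delay > 0.0:
            self.sleep(delay)
        return delay


pacer = FlushPacer(commit_delay_us=500)
pacer.fsync_finished()
pacer.wait_before_flush()  # sleeps for whatever is left of the 0.5 ms window
```

Compared with sleeping a fixed commit_delay whenever a backend is about to flush, measuring from the previous fsync's completion means a backend that arrives late in the window only waits for the remainder, and one that arrives after the window has closed does not wait at all.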
But then I kind of lost interest, because anyone who cares very much about
commit performance will probably get a nonvolatile write cache, and
anything we did would be too hardware/platform dependent.
Of course, a BBU isn't magic; the kernel still has to spend time scrubbing
the buffer pool and sending the dirty buffers to the disk/controller when it
gets an fsync, even if the confirmation does come back quickly. But it
still seems too hardware/platform dependent to find a general-purpose
optimization for.
Cheers,
Jeff