Re: measuring lwlock-related latency spikes

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: measuring lwlock-related latency spikes
Date: 2012-04-02 21:13:00
Message-ID: CAMkU=1xa9DBbscMUVyD+2KNAFLAAmTO+1dqQwPzJvei97SA7YQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 2, 2012 at 12:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Long story short, when a CLOG-related stall happens,
>> essentially all the time is being spent in this here section of code:
>
>>     /*
>>      * If not part of Flush, need to fsync now.  We assume this happens
>>      * infrequently enough that it's not a performance issue.
>>      */
>>     if (!fdata) // fsync and close the file
>
> Seems like basically what you've proven is that this code path *is* a
> performance issue, and that we need to think a bit harder about how to
> avoid doing the fsync while holding locks.

And why is the fsync needed at all upon merely evicting a dirty page
so a replacement can be loaded?

If the system crashes between the write and the (eventual) fsync, you
are in the same position as if the system crashed while the page was
dirty in shared memory. Either way, you have to be able to recreate
it from WAL, right?
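
To make the question concrete, here is a minimal sketch of the alternative I have in mind: evicting a dirty page only writes it out and marks the file as needing a sync, and the fsync itself is left to a later flush (a checkpoint, say). This is not the existing SLRU code. SegmentFile, evict_page, checkpoint_flush and SKETCH_PAGE_SIZE are names made up for illustration, and the sketch assumes the page really can be rebuilt from WAL until that later flush completes.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_PAGE_SIZE 8192   /* hypothetical page size for this sketch */

/* Hypothetical bookkeeping for one on-disk segment file. */
typedef struct
{
    int  fd;             /* open descriptor for the segment */
    bool fsync_pending;  /* written since the last fsync? */
} SegmentFile;

/*
 * Evict a dirty page: write it to its slot in the file but do NOT fsync.
 * Assumption: until the deferred fsync happens, the page contents are
 * still recoverable from WAL, exactly as if it were dirty in memory.
 */
static bool
evict_page(SegmentFile *seg, int pageno, const char *page)
{
    off_t   offset = (off_t) pageno * SKETCH_PAGE_SIZE;
    ssize_t written = pwrite(seg->fd, page, SKETCH_PAGE_SIZE, offset);

    if (written != SKETCH_PAGE_SIZE)
        return false;

    seg->fsync_pending = true;  /* remember that a sync is owed */
    return true;
}

/*
 * Later flush (e.g. at checkpoint time): pay for the fsync here,
 * outside the eviction path, and only if something was written.
 */
static bool
checkpoint_flush(SegmentFile *seg)
{
    if (!seg->fsync_pending)
        return true;
    if (fsync(seg->fd) != 0)
        return false;

    seg->fsync_pending = false;
    return true;
}
```

The point of the split is only that the expensive call moves out of the path that runs while the lock is held. Whether the deferred sync can always be guaranteed to happen before the WAL it depends on is recycled is exactly the part I am asking about.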

Cheers,

Jeff
