Re: Strange query stalls on replica in 9.3.9

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Strange query stalls on replica in 9.3.9
Date: 2015-08-14 16:54:05
Message-ID: CAMkU=1wTUB5OabX2ntOj3=AW-FXHay_rWY8C7Z1cRw+1kBjxhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Aug 14, 2015 at 9:34 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

>
> On 08/13/2015 01:59 PM, Jeff Janes wrote: execute on the
>
> > Once the commit of the whole-table update has replayed, the problem
> > should go way instantly because at that point each backend doing the
> > seqscan will find the the transaction has committed and so will set the
> > hint bit that means all of the other seqscan backends that come after it
> > can skip the proc array scan for that tuple.
>
> Yes ... and given that the commit on the master took < 3 seconds, it's
> not likely to take 30 seconds on the replica. That aside, the pattern
> of behavior does look similar to the planner issue.
>

Another thought. Who actually sets the hint bits on a replica?

Do the read-only processes on the replica which discovers a tuple to have
been securely committed set the hint bits?

My benchmarking suggests not.

Or does it wait for the hint bits to get set on master, and then for a
checkpoint to occur on the master, and then for that page to get changed
again and FPW to the log, and then for the log to get replayed? If so,
that explains why the issue doesn't clear up on the replica immediately
after the commit gets replayed.

>
> > So perhaps the commit of the whole-table update is delayed because the
> > startup process as also getting bogged down on the same contended lock?
> > I don't know how hard WAL replay hits the proc array lock.
>
> I don't know; we don't have any visibility into the replay process, and
> no way to tell if replay is waiting on some kind of lock. A regular
> UPDATE should not block against any select activity on the replay, though.
>
> Also, why would this affect *only* the query which does seq scans? Is
> there some difference between seqscan and index scan here, or is it
> simply because they take longer, and since this issue is timing-based,
> they're more likely to be hit?

An index scan only has to check the commit status of rows which meet the
index quals, which is presumably a small fraction of the rows.

A seq scan checks the visibility of every row first, before checking the
where clause.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2015-08-18 17:44:41 Re: Strange query stalls on replica in 9.3.9
Previous Message Josh Berkus 2015-08-14 16:34:32 Re: Strange query stalls on replica in 9.3.9