Re: "soft lockup" in kernel

From: Dennis Jenkins <dennis(dot)jenkins(dot)75(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: "soft lockup" in kernel
Date: 2013-07-05 16:31:37
Message-ID: CAAEzAp8Ym1PPvLsf4YdMZbX0PXLpFApFCjm6yvs47yRnPq6zJA@mail.gmail.com
Lists: pgsql-general

On Fri, Jul 5, 2013 at 8:58 AM, Stuart Ford <stuart(dot)ford(at)glide(dot)uk(dot)com> wrote:

> On Fri, Jul 5, 2013 at 7:00 AM, Dennis Jenkins wrote:
>
> No. iSCSI traffic between the VMWare hosts and the SAN uses completely
> separate NICs and different switches to the "production" LAN.
>
> I've had a look at the task activity in VCenter and found these two events
> at almost the same time as the kernel messages. In both cases the start
> time (the first time below) is 5-6 seconds after the kernel message, and
> I've seen that the clock on the Postgres VM and the VCenter server, at
> least, are in sync (it may not, of course, be the VCenter server's clock
> that these logs get the time from).
>
> Remove snapshot
> GLIBMPDB001_replica
> Completed
> GLIDE\Svc_vcenter
> 05/07/2013 11:58:41
> 05/07/2013 11:58:41
> 05/07/2013 11:59:03
>
> Remove snapshot
> GLIBMPDB001_replica
> Completed
> GLIDE\Svc_vcenter
> 05/07/2013 10:11:10
> 05/07/2013 10:11:10
> 05/07/2013 10:11:23
>
>
I would not blame Veeam.

I suspect that when a snapshot is deleted, all iSCSI activity either
halts or slows SIGNIFICANTLY. This depends on your NAS.

I've seen an Oracle 7320 ZFS Storage appliance, misconfigured to use
RAID-Z2 (raid6) to store terabytes of essentially random-access data, pause
for minutes when deleting a snapshot containing a few dozen gigabytes
(the snapshot deletion kernel threads get IO priority over "nfsd" file
IO). This caused enough latency for VMWare (over NFS) that VMWare gave up
on the IO and returned a generic SCSI error to the guests. Linux guests
will semi-panic and remount their file-systems read-only. FreeBSD will
just freak out, panic and reboot. The flaw here was using the wrong raid
type (since replaced with triple-parity raid-10, which is working great).

What NAS are you using?

How busy are its disks when deleting a snapshot?

What is the RAID type under the hood?
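
If you want to see the stall from inside the guest while a snapshot is being
removed, something like the following rough sketch may help. It is only an
illustration, not something taken from this thread: the function names, the
probe file path and the 1-second threshold are all made up for the example.
It writes a small block, calls fsync, and records how long each round trip
takes, so a storage pause should show up as an unusually slow sample.

```python
import os
import time


def probe_fsync_latency(path, samples=5, interval=1.0, payload=b"x" * 4096):
    """Write a small block and fsync it `samples` times.

    Returns a list of per-sample latencies in seconds. `path` is a
    hypothetical probe file; put it on the volume you want to test.
    """
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for i in range(samples):
            start = time.monotonic()
            os.write(fd, payload)
            os.fsync(fd)  # force the write down to the storage layer
            latencies.append(time.monotonic() - start)
            if i < samples - 1:
                time.sleep(interval)
    finally:
        os.close(fd)
    return latencies


def slow_samples(latencies, threshold=1.0):
    """Return (index, latency) pairs at or above `threshold` seconds."""
    return [(i, lat) for i, lat in enumerate(latencies) if lat >= threshold]


# Example use (assumed values): probe once a second for a minute,
# then list the samples that took a second or longer.
#   lats = probe_fsync_latency("fsync_probe.tmp", samples=60, interval=1.0)
#   for i, lat in slow_samples(lats):
#       print(f"sample {i}: fsync took {lat:.3f}s")
```

Run it on the Postgres VM across a snapshot removal and compare the slow
samples against the kernel message times. If fsync latency jumps at the same
moment, that points at the storage rather than at the guest.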
