Re: "soft lockup" in kernel

From: Dennis Jenkins <dennis(dot)jenkins(dot)75(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: "soft lockup" in kernel
Date: 2013-07-15 19:40:23
Message-ID: CAAEzAp9uqkzA1ztKsfDa1CND5fGTwLqU+hdeAVAYWfEg+ahSXg@mail.gmail.com
Lists: pgsql-general

Stuart,

I'm simply curious - did you resolve your issue? What NAS
(vendor/model/config) are you using?
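
If it is still happening, one way to check whether guest I/O really stalls while a snapshot is being removed is a small write-latency probe run inside the Postgres VM. The following is only a rough sketch, not something from the original exchange: the function names, the probe file path and the one-second threshold are all assumptions made for illustration.

```python
import os
import time
from datetime import datetime


def fsync_latency(path, payload=b"x" * 4096):
    """Write one small block to `path`, fsync it, and return elapsed seconds."""
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # force the write down to the (virtual) disk
    finally:
        os.close(fd)
    return time.monotonic() - start


def probe(path, samples, interval=1.0, threshold=1.0):
    """Take `samples` measurements; return (timestamp, seconds) for slow ones.

    `threshold` is an arbitrary illustrative cut-off in seconds.
    """
    slow = []
    for _ in range(samples):
        elapsed = fsync_latency(path)
        if elapsed >= threshold:
            stamp = datetime.now().isoformat(timespec="seconds")
            slow.append((stamp, elapsed))
        time.sleep(interval)
    return slow
```

Calling something like `probe("/var/lib/pgsql/probe.tmp", 3600)` (a hypothetical path on the same volume as the database) over a backup window would give a list of timestamps that can be compared with the "Remove snapshot" tasks and the kernel messages.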

On Fri, Jul 5, 2013 at 11:31 AM, Dennis Jenkins <dennis(dot)jenkins(dot)75(at)gmail(dot)com> wrote:

> On Fri, Jul 5, 2013 at 8:58 AM, Stuart Ford <stuart(dot)ford(at)glide(dot)uk(dot)com> wrote:
>
>> On Fri, Jul 5, 2013 at 7:00 AM, Dennis Jenkins wrote:
>>
>> No. iSCSI traffic between the VMWare hosts and the SAN uses completely
>> separate NICs and different switches to the "production" LAN.
>>
>> I've had a look at the task activity in VCenter and found these two events
>> at almost the same time as the kernel messages. In both cases the start
>> time (the first time below) is 5-6 seconds after the kernel message, and
>> I've seen that the clocks on the Postgres VM and the VCenter server, at
>> least, are in sync (it may not, of course, be the VCenter server's clock
>> that these logs get the time from).
>>
>> Remove snapshot
>> GLIBMPDB001_replica
>> Completed
>> GLIDE\Svc_vcenter
>> 05/07/2013 11:58:41
>> 05/07/2013 11:58:41
>> 05/07/2013 11:59:03
>>
>> Remove snapshot
>> GLIBMPDB001_replica
>> Completed
>> GLIDE\Svc_vcenter
>> 05/07/2013 10:11:10
>> 05/07/2013 10:11:10
>> 05/07/2013 10:11:23
>>
>>
> I would not blame Veeam.
>
> I suspect that when a snapshot is deleted, all iSCSI activity either
> halts or slows SIGNIFICANTLY. This depends on your NAS.
>
> I've seen an Oracle 7320 ZFS Storage Appliance, misconfigured to use
> RAID-Z2 (raid6) to store terabytes of essentially random-access data, pause
> for minutes when deleting a snapshot containing a few dozen gigabytes
> (the snapshot deletion kernel threads get IO priority over "nfsd" file
> IO). This caused enough latency for VMWare (over NFS) that VMWare gave up
> on the IO and returned a generic SCSI error to the guests. Linux guests
> will semi-panic and remount their file-systems read-only. FreeBSD will
> just freak out, panic and reboot. The flaw here was using the wrong raid
> type (since replaced with triple-parity raid-10, which is working great).
>
> What NAS are you using?
>
> How busy are its disks when deleting a snapshot?
>
> What is the RAID type under the hood?
>
>
