From: | Dennis Jenkins <dennis(dot)jenkins(dot)75(at)gmail(dot)com> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: "soft lockup" in kernel |
Date: | 2013-07-15 19:40:23 |
Message-ID: | CAAEzAp9uqkzA1ztKsfDa1CND5fGTwLqU+hdeAVAYWfEg+ahSXg@mail.gmail.com |
Lists: | pgsql-general |
Stuart,
I'm simply curious - did you resolve your issue? What NAS
(vendor/model/config) are you using?
On Fri, Jul 5, 2013 at 11:31 AM, Dennis Jenkins <dennis(dot)jenkins(dot)75(at)gmail(dot)com> wrote:
>
> On Fri, Jul 5, 2013 at 8:58 AM, Stuart Ford <stuart(dot)ford(at)glide(dot)uk(dot)com> wrote:
>
>> On Fri, Jul 5, 2013 at 7:00 AM, Dennis Jenkins wrote:
>>
>> No. iSCSI traffic between the VMWare hosts and the SAN uses completely
>> separate NICs and different switches to the "production" LAN.
>>
>> I've had a look at the task activity in VCenter and found these two events
>> at almost the same time as the kernel messages. In both cases the start
>> time (the first time below) is 5-6 seconds after the kernel message, and
>> I've seen that the clock on the Postgres VM and the VCenter server, at
>> least, are in sync (it may not, of course, be the VCenter server's clock
>> that these logs get the time from).
>>
>> Remove snapshot
>> GLIBMPDB001_replica
>> Completed
>> GLIDE\Svc_vcenter
>> 05/07/2013 11:58:41
>> 05/07/2013 11:58:41
>> 05/07/2013 11:59:03
>>
>> Remove snapshot
>> GLIBMPDB001_replica
>> Completed
>> GLIDE\Svc_vcenter
>> 05/07/2013 10:11:10
>> 05/07/2013 10:11:10
>> 05/07/2013 10:11:23
>>
>>
> I would not blame Veeam.
>
> I suspect that when a snapshot is deleted, all iSCSI activity either
> halts or slows SIGNIFICANTLY. This depends on your NAS.
>
> I've seen an Oracle 7320 ZFS Storage appliance, misconfigured to use
> RAID-Z2 (raid6) to store terabytes of essentially random-access data, pause
> for minutes when deleting a snapshot containing a few dozen gigabytes
> (the snapshot deletion kernel threads get IO priority over "nfsd" file
> IO). This caused enough latency for VMWare (over NFS) that VMWare gave up
> on the IO and returned a generic SCSI error to the guests. Linux guests
> will semi-panic and remount their file-systems read-only. FreeBSD will
> just freak out, panic and reboot. The flaw here was using the wrong raid
> type (since replaced with triple-parity raid-10, which is working great).
>
> What NAS are you using?
>
> How busy are its disks when deleting a snapshot?
>
> What is the RAID type under the hood?
>
>
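For anyone who wants to check the timing correlation described in the quoted message (snapshot removal tasks starting a few seconds after the kernel messages), here is a rough Python sketch. The two task start times are taken from the quoted task list; the kernel message times are hypothetical placeholders and should be replaced with the real timestamps from the guest's logs. The helper name nearest_task and the 30-second window are my own choices for illustration, not anything from VCenter or the kernel.

```python
from datetime import datetime, timedelta

# Day/month/year, matching the task list quoted above (5 July 2013).
FMT = "%d/%m/%Y %H:%M:%S"


def nearest_task(kernel_time, task_starts, window=timedelta(seconds=30)):
    """Return (task_start, offset) for the task start closest to kernel_time,
    or None if no task start lies within the window."""
    best = min(task_starts, key=lambda t: abs(t - kernel_time), default=None)
    if best is None or abs(best - kernel_time) > window:
        return None
    return best, best - kernel_time


# Start times of the two "Remove snapshot" tasks from the quoted message.
task_starts = [datetime.strptime(s, FMT)
               for s in ("05/07/2013 11:58:41", "05/07/2013 10:11:10")]

# HYPOTHETICAL kernel message times, for illustration only.
kernel_times = [datetime.strptime(s, FMT)
                for s in ("05/07/2013 11:58:35", "05/07/2013 10:11:05")]

for k in kernel_times:
    match = nearest_task(k, task_starts)
    if match is not None:
        start, offset = match
        print(k.strftime(FMT), "-> task started", start.strftime(FMT),
              "offset", int(offset.total_seconds()), "s")
    else:
        print(k.strftime(FMT), "-> no task within window")
```

A consistently small positive offset would suggest the stall begins around the time the snapshot removal is triggered, though it says nothing by itself about whether the storage or the hypervisor is responsible.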