Opened 13 years ago
Last modified 3 years ago
#9975 closed defect
Virtual HDD becomes unavailable for guest after a Canceled write is logged — at Initial Version
Reported by: | Ariel | Owned by: | |
---|---|---|---|
Component: | virtual disk | Version: | VirtualBox 4.1.6 |
Keywords: | | Cc: | |
Guest type: | Windows | Host type: | Linux |
Description
The guest is a Windows SBS Server 2008 with 2 virtual CPUs, 4 GB of virtual RAM and two virtual HDDs (C: with 40 GB and D: with 20 GB, both with lots of free space). This SBS server acts as an SBS primary server (i.e., AD controller); it has had its MS Exchange and MS SharePoint removed and an SQL Server 2008 has been installed (besides the SQL Server 2005 Express that comes with SBS and which cannot be removed). This is a supported configuration for SBS, even if it's a bit unusual. It has not been customized significantly yet because we were just testing feasibility. The guest is running the latest Guest Additions (matching the host's VirtualBox version).
The problem is that the guest works fine... until it doesn't. When the problem appears, guest applications start hanging pretty fast. A few tests have shown that the (virtual) C: drive has stopped responding, and applications die only when they try to access it. The failing drive can also be the D: drive instead. The guest system cannot be shut down in this situation: the only option is powering it off or waiting for the Windows kernel to bluescreen and reset.
When the guest system is hung like that, one or more messages like this one can be seen in the VBox.log file:
53:11:46.112 AHCI#0: Canceled write at offset 29189742592 (4096 bytes left) returned rc=VINF_SUCCESS
The return code, if that's what it is, is always VINF_SUCCESS. The offset and the number of bytes left may vary.
Thinking it could be a problem with flushing, I have tried setting:
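For anyone comparing several occurrences, a small script can pull the varying fields out of these lines. This is only a sketch: the pattern is modeled on the single log line quoted above, and the assumption that other controllers or request types use the same wording is mine, not something confirmed by the log. The names `CanceledRequest` and `parse_canceled` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class CanceledRequest(NamedTuple):
    timestamp: str    # first field of the log line, kept as text
    controller: str   # e.g. "AHCI#0"
    direction: str    # e.g. "write"
    offset: int       # byte offset of the canceled request
    bytes_left: int   # bytes still outstanding when it was canceled
    rc: str           # status name as logged, e.g. "VINF_SUCCESS"


# Pattern derived from the one sample line in this report (assumption:
# other occurrences follow the same layout).
_PATTERN = re.compile(
    r"^(?P<ts>\d+:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<ctl>\w+#\d+): Canceled (?P<dir>\w+) at offset (?P<off>\d+) "
    r"\((?P<left>\d+) bytes left\) returned rc=(?P<rc>\w+)"
)


def parse_canceled(line: str) -> Optional[CanceledRequest]:
    """Return the parsed fields, or None if the line does not match."""
    m = _PATTERN.match(line.strip())
    if m is None:
        return None
    return CanceledRequest(
        m["ts"], m["ctl"], m["dir"], int(m["off"]), int(m["left"]), m["rc"]
    )


sample = (
    "53:11:46.112 AHCI#0: Canceled write at offset 29189742592 "
    "(4096 bytes left) returned rc=VINF_SUCCESS"
)
hit = parse_canceled(sample)
print(hit)
# Express the offset in GiB and check whether it is 4 KiB aligned.
print(f"offset = {hit.offset / 2**30:.2f} GiB, "
      f"4 KiB aligned: {hit.offset % 4096 == 0}")
```

Running `parse_canceled` over every line of the log and collecting the non-`None` results would show whether the offsets cluster in one region of the disk or are spread out.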
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#1/Config/IgnoreFlush" 0
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#0/Config/FlushInterval" 1000000
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#1/Config/FlushInterval" 1000000
And:
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#1/Config/IgnoreFlush" 0
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#0/Config/FlushInterval" 1
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#1/Config/FlushInterval" 1
But both ended up failing with the same error.
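Both attempts use the same key layout and differ only in the FlushInterval value. As a sketch of that layout, the helper below builds the equivalent `VBoxManage setextradata` argument lists for a set of LUNs. It only constructs and prints the commands and does not run anything; the function name `flush_settings` and its parameters are hypothetical, and the commands come out grouped per LUN rather than in the order listed above.

```python
import shlex
from typing import List


def flush_settings(
    vm: str,
    luns: List[int],
    flush_interval: int,
    ignore_flush: int = 0,
    controller: str = "ahci",
    instance: int = 0,
) -> List[List[str]]:
    """Build setextradata commands for IgnoreFlush and FlushInterval."""
    cmds: List[List[str]] = []
    for lun in luns:
        # Same key prefix as the commands in this report.
        base = f"VBoxInternal/Devices/{controller}/{instance}/LUN#{lun}/Config"
        cmds.append(["VBoxManage", "setextradata", vm,
                     f"{base}/IgnoreFlush", str(ignore_flush)])
        cmds.append(["VBoxManage", "setextradata", vm,
                     f"{base}/FlushInterval", str(flush_interval)])
    return cmds


# The two configurations tried in this report: FlushInterval 1000000 and 1.
for interval in (1000000, 1):
    for cmd in flush_settings("VM002", [0, 1], interval):
        # shlex.join (Python 3.8+) quotes arguments for safe pasting in a shell.
        print(shlex.join(cmd))
    print()
```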
The problem is not easily reproducible. It sometimes fails several times in a row, then sometimes it works for a couple of days before failing again. It might be a VirtualBox bug somehow related to host load (but this server is nowhere near full utilization).
This problem has been found on an HP ML370 G6 host with dual quad-core Xeons, lots of RAM and four 15000 RPM SAS HDDs working in a RAID10 array. The host runs CentOS 5.7 and is, as you might guess, unusually fast. This host also runs another VirtualBox guest: a Windows Server 2008 R2 system that works fine and hasn't suffered from this problem (fingers crossed). This other guest runs under a different host user.