Opened 6 years ago
Closed 5 years ago
#18365 closed defect (invalid)
exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
| Reported by: | VirtualBarista | Owned by: | |
| --- | --- | --- | --- |
| Component: | virtual disk | Version: | VirtualBox 6.0.2 |
| Keywords: | | Cc: | |
| Guest type: | Linux | Host type: | Linux |
Description
Hello,
I am using Fedora 29 as a host, which runs 5 VMs (a mix of Linux and FreeBSD). I never had problems with VirtualBox 5.x; I recently upgraded to 6.0.2.
Yesterday, while working on one of the VMs, something strange happened that I've never seen before.
ALL the VM consoles suddenly printed SATA errors at the same time.
Here are the relevant parts of /var/log/messages from three of those VMs.
VM1
Jan 26 03:03:07 vm1 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 26 03:03:07 vm1 kernel: ata1.00: failed command: FLUSH CACHE
Jan 26 03:03:07 vm1 kernel: ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 7#012 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 26 03:03:07 vm1 kernel: ata1.00: status: { DRDY }
Jan 26 03:03:07 vm1 kernel: ata1: hard resetting link
Jan 26 03:03:07 vm1 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 26 03:03:07 vm1 kernel: ata1.00: configured for UDMA/133
Jan 26 03:03:07 vm1 kernel: ata1.00: retrying FLUSH 0xe7 Emask 0x4
Jan 26 03:03:08 vm1 kernel: ata1.00: device reported invalid CHS sector 0
Jan 26 03:03:08 vm1 kernel: ata1: EH complete
VM2
Jan 26 03:09:10 vm2 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 26 03:09:10 vm2 kernel: ata1.00: failed command: FLUSH CACHE
Jan 26 03:09:10 vm2 kernel: ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0#012 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 26 03:09:10 vm2 kernel: ata1.00: status: { DRDY }
Jan 26 03:09:10 vm2 kernel: ata1: hard resetting link
Jan 26 03:09:11 vm2 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 26 03:09:11 vm2 kernel: ata1.00: configured for UDMA/133
Jan 26 03:09:11 vm2 kernel: ata1.00: retrying FLUSH 0xe7 Emask 0x4
Jan 26 03:09:11 vm2 kernel: ata1.00: device reported invalid CHS sector 0
Jan 26 03:09:11 vm2 kernel: ata1: EH complete
VM3
Jan 26 03:03:08 vm3 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 26 03:03:08 vm3 kernel: ata1.00: failed command: FLUSH CACHE
Jan 26 03:03:08 vm3 kernel: ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 17#012 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 26 03:03:08 vm3 kernel: ata1.00: status: { DRDY }
Jan 26 03:03:08 vm3 kernel: ata1: hard resetting link
Jan 26 03:03:08 vm3 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 26 03:03:08 vm3 kernel: ata1.00: configured for UDMA/133
Jan 26 03:03:08 vm3 kernel: ata1.00: retrying FLUSH 0xe7 Emask 0x4
Jan 26 03:03:08 vm3 kernel: ata1.00: device reported invalid CHS sector 0
Jan 26 03:03:08 vm3 kernel: ata1: EH complete
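The failed command in each excerpt is FLUSH CACHE, reported as a timeout, which matches the guests' default 30-second command timer expiring before the host completed the flush. As a stopgap inside the Linux guests (not a fix for whatever stalled the host I/O), that timer can be inspected and raised via sysfs; a minimal sketch, assuming the virtual disk appears as sda:

```
# Show the current command timeout (in seconds) for the virtual disk;
# the default is 30, which lines up with the timeouts in the excerpts above.
cat /sys/block/sda/device/timeout

# Raise it to 180 seconds so a slow host-side flush does not immediately
# trigger the ATA error handler (needs root, does not survive a reboot).
echo 180 > /sys/block/sda/device/timeout
```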
I tried to send a reboot command via ssh; out of the five VMs only two responded, and the other three wouldn't accept connections. I tried the "ACPI Shutdown" option directly, but they still wouldn't respond or reboot. Eventually I had to force them off.
On reboot, everything seems back to normal.
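For reference, the same shutdown attempts can also be driven from the host with VBoxManage; a sketch, assuming a VM named "vm1" (the name is a placeholder):

```
# Ask the guest to shut down via ACPI (equivalent to the GUI's "ACPI Shutdown").
VBoxManage controlvm "vm1" acpipowerbutton

# Hard power-off if the guest no longer responds (equivalent to pulling the plug).
VBoxManage controlvm "vm1" poweroff
```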
Here is part of the VBox log:
00:00:01.048370 VD#0: Cancelling all active requests
00:00:01.048446 VD#0: Cancelling all active requests
00:00:10.824386 VD#0: Cancelling all active requests
00:00:10.828057 VD#0: Cancelling all active requests
74:34:19.969034 VD#0: Flush request was active for 29 seconds
74:40:22.518388 VD#0: Cancelling all active requests
74:40:22.518414 VD#0: Request{0x007fcddc19d740}:
74:40:22.970057 VD#0: Flush request was active for 61 seconds
74:40:22.970084 VD#0: Aborted flush returned rc=VERR_PDM_MEDIAEX_IOREQ_CANCELED
91:03:11.731269 FPUIP=00000000 CS=0000 Rsrvd1=0000 FPUDP=00000000 DS=0000 Rsvrd2=0000
91:03:11.731485 FPUIP=00000000 CS=0000 Rsrvd1=0000 FPUDP=00000000 DS=0000 Rsvrd2=0000
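The aborted flush (VERR_PDM_MEDIAEX_IOREQ_CANCELED) after 61 seconds suggests the request never completed on the host side. One knob that changes how guest flushes reach the host storage is the controller's host I/O cache setting; whether it helps here depends on the host I/O pattern, but a sketch for toggling it, assuming the controller is named "SATA" and the VM is powered off:

```
# Show the VM's storage configuration, including the "Host I/O Cache" state.
VBoxManage showvminfo "vm1" | grep -i storage

# Enable the host I/O cache on the SATA controller (VM must be powered off).
VBoxManage storagectl "vm1" --name "SATA" --hostiocache on
```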
All VM disk images are stored on the host on a RAID1 array (ext4).
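Since the errors hit all VMs within a few minutes of 03:00, it may be worth ruling out a periodic array check or resync as the source of the host I/O stall; a quick look on the host, assuming Linux software RAID (md) with the array at md0:

```
# Is a check/resync/recovery in progress, and how far along is it?
cat /proc/mdstat
cat /sys/block/md0/md/sync_action

# How much bandwidth is the md layer allowed to use for syncing?
sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max

# Fedora typically schedules a weekly array check via a systemd timer.
systemctl list-timers 'raid-check*'
```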
Please reopen if this is still relevant and attach the full VBox.log files from your VMs, not just small excerpts. To me this looks like the host OS couldn't cope with the I/O load induced by the VMs, resulting in long-running I/O requests which triggered timeouts in the guests.
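To gather the requested logs, the current VBox.log of every registered VM can be collected in one pass; a minimal sketch, assuming the default machine folder layout under ~/VirtualBox VMs:

```
#!/bin/bash
# Copy the current VBox.log of each registered VM into ~/vbox-logs.
# Assumes the default layout: ~/VirtualBox VMs/<VM name>/Logs/VBox.log.
mkdir -p ~/vbox-logs
VBoxManage list vms | sed -e 's/ {[^}]*}$//' -e 's/^"//' -e 's/"$//' | while read -r vm; do
    src="$HOME/VirtualBox VMs/$vm/Logs/VBox.log"
    if [ -f "$src" ]; then
        cp "$src" ~/vbox-logs/"${vm}-VBox.log"
    else
        echo "no VBox.log found for $vm" >&2
    fi
done
```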