Opened 6 years ago
Last modified 22 months ago
#18176 new defect
USB resets and hangs during high I/O
Reported by: | Jimson | Owned by: | |
---|---|---|---|
Component: | USB | Version: | VirtualBox 5.2.22 |
Keywords: | usb reset superspeed | Cc: | |
Guest type: | Linux | Host type: | Mac OS X |
Description (last modified by )
I'm experiencing USB resets and hangs with VirtualBox 5.2.22 on a 2017 MacBook Pro (macOS 10.13.6), in Oracle Linux 7.5 and Ubuntu 18.04 VMs. I can reproduce the problem very quickly by running the following:
$ sudo dd if=/dev/zero of=./testfile status=progress bs=1024k
639631360 bytes (640 MB) copied, 5.448752 s, 117 MB/s
It starts out pretty fast, but once the reset occurs, the speeds taper down to < 50MB/s. The OL system log shows the resets and I/O errors as:
Dec 10 15:55:28 jdnissen-lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 10 15:55:38 jdnissen-lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 10 15:55:44 jdnissen-lobi7 systemd: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 1858 (cma)
Dec 10 15:55:44 jdnissen-lobi7 systemd: Mounting Arbitrary Executable File Formats File System...
Dec 10 15:55:45 jdnissen-lobi7 systemd: Mounted Arbitrary Executable File Formats File System.
Dec 10 15:55:46 jdnissen-lobi7 kernel: EXT4-fs (dm-3): Delayed block allocation failed for inode 12 at logical offset 546816 with max blocks 2048 with error 5
Dec 10 15:55:46 jdnissen-lobi7 kernel: EXT4-fs (dm-3): This should not happen!! Data will be lost
Dec 10 15:55:46 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:55:51 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:55:57 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:01 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:07 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:12 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:17 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:22 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:27 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
Dec 10 15:56:32 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8
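For anyone comparing runs across hosts or VM settings, it may help to count these events rather than eyeball them. The following is only a minimal sketch: the regular expressions are assumptions modeled on the kernel messages quoted in this ticket, and the `summarize` helper and category names are hypothetical, not part of any VirtualBox or kernel tooling.

```python
import re
from collections import Counter

# Patterns modeled on the kernel messages quoted in this ticket (assumed, not exhaustive).
PATTERNS = {
    "usb_reset": re.compile(r"usb \S+: reset \w+ USB device number \d+ using \w+"),
    "ext4_error": re.compile(r"EXT4-fs (?:warning )?\([^)]*\): "),
    "jbd2_io_error": re.compile(r"JBD2: Detected IO errors while flushing file data"),
    "buffer_io_error": re.compile(r"Buffer I/O error on device"),
}


def summarize(lines):
    """Count how many log lines match each pattern (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# A few lines copied from the log excerpt above.
sample = [
    "Dec 10 15:55:28 jdnissen-lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd",
    "Dec 10 15:55:38 jdnissen-lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd",
    "Dec 10 15:55:46 jdnissen-lobi7 kernel: EXT4-fs (dm-3): This should not happen!! Data will be lost",
    "Dec 10 15:55:46 jdnissen-lobi7 kernel: JBD2: Detected IO errors while flushing file data on dm-3-8",
]
counts = summarize(sample)
print(dict(counts))
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize(open("/var/log/messages"))` on Oracle Linux; the path differs on other distributions.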
I've attempted to work around this with the new 6.0.0 RC1 test build, and it didn't help. OL and Ubuntu updates don't help, either.
It happens with two different USB drives: a Seagate 2TB and a Western Digital 2TB. The Seagate is formatted as NTFS and the WD as Linux EXT4.
In addition, I can attach the drives to an old laptop running bare-metal OL7.2, and it works just fine: no resets, hangs, or slow I/O during the 'dd' (or file copies).
I work for Oracle, and this problem is preventing me from being able to copy large patch bundles, used during customer installs.
Change History (10)
by , 6 years ago
comment:1 by , 6 years ago
comment:2 by , 6 years ago
The original errors in code-blocks for readability...
Dec 7 10:09:50 lobi7 kernel: usb 2-1: new SuperSpeed USB device number 4 using xhci_hcd
Dec 7 10:09:50 lobi7 kernel: usb 2-1: New USB device found, idVendor=1058, idProduct=25a2
Dec 7 10:09:50 lobi7 kernel: usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Dec 7 10:09:50 lobi7 kernel: usb 2-1: Product: Elements 25A2
Dec 7 10:09:50 lobi7 kernel: usb 2-1: Manufacturer: Western Digital
Dec 7 10:09:50 lobi7 kernel: usb 2-1: SerialNumber: 575854314536384353524154
Dec 7 10:09:50 lobi7 kernel: usb-storage 2-1:1.0: USB Mass Storage device detected
Dec 7 10:09:50 lobi7 kernel: scsi host6: usb-storage 2-1:1.0
Dec 7 10:09:50 lobi7 mtp-probe: checking bus 2, device 4: "/sys/devices/pci0000:00/0000:00:0c.0/usb2/2-1"
Dec 7 10:09:50 lobi7 mtp-probe: bus: 2, device: 4 was not an MTP device
Dec 7 10:09:51 lobi7 kernel: scsi 6:0:0:0: Direct-Access WD Elements 25A2 1021 PQ: 0 ANSI: 6
Dec 7 10:09:51 lobi7 kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0
Dec 7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] 3906963456 512-byte logical blocks: (2.00 TB/1.81 TiB)
Dec 7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] Write Protect is off
Dec 7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] No Caching mode page found
Dec 7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] Assuming drive cache: write through
Dec 7 10:09:51 lobi7 kernel: sd 6:0:0:0: [sdc] Attached SCSI disk
...
Dec 7 09:47:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:23 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:24 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:24 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:24 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:24 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:24 lobi7 kernel: sd 5:0:0:0: [sdb] FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec 7 09:47:24 lobi7 kernel: sd 5:0:0:0: [sdb] CDB: Write(10) 2a 00 07 4e c4 00 00 04 00 00
Dec 7 09:47:24 lobi7 kernel: blk_update_request: I/O error, dev sdb, sector 122602496
Dec 7 09:47:24 lobi7 kernel: EXT4-fs warning (device dm-3): ext4_end_bio:316: I/O error -5 writing to inode 12 (offset 41783656448 size 8388608 starting block 15324800)
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324800
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324801
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324802
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324803
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324804
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324805
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324806
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324807
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324808
Dec 7 09:47:24 lobi7 kernel: Buffer I/O error on device dm-3, logical block 15324809
…
Dec 7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:26 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
Dec 7 09:47:26 lobi7 kernel: sd 5:0:0:0: [sdb] FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec 7 09:47:26 lobi7 kernel: sd 5:0:0:0: [sdb] CDB: Write(10) 2a 00 07 4e d0 00 00 04 00 00
Dec 7 09:47:26 lobi7 kernel: blk_update_request: I/O error, dev sdb, sector 122605568
Dec 7 09:47:26 lobi7 kernel: EXT4-fs warning (device dm-3): ext4_end_bio:316: I/O error -5 writing to inode 12 (offset 41792045056 size 8388608 starting block 15325184)
My concern isn't so much the performance impact as the fact that large file copies stop abruptly. With hundreds of files to copy at a time, that means restarting the process where it left off.
by , 6 years ago
Attachment: | Oracle Linux 7.2 OBI-2018-12-12-09-33-55.log added |
---|
comment:3 by , 6 years ago
I was able to reproduce this... and then I wasn't. But it certainly didn't happen because anything got intentionally fixed.
Unfortunately, the underlying problem appears to be some race condition, which is inherently very configuration-specific and, at least for me, currently so hard to reproduce that I can't meaningfully try to fix it. I can read hundreds of gigabytes at ~180 MB/s with no problem; the resets just aren't happening. That doesn't mean the resets won't happen on other machines, but at the moment I don't have those other machines.
Please try again with a Windows host, and report any issues encountered. I believe the problem is much less severe on a Windows host because it only slows things down but does not cause I/O errors.
If you find something that makes the errors more frequent (number of virtual CPUs? idle/busy host? VM memory size?), please add a comment here.
comment:4 by , 6 years ago
Summary: | USB 3.0 resets and hangs during high I/O → USB resets and hangs during high I/O |
---|
I'll add that the problem is not USB 3.0 specific. There is some probability involved, and the error may be more likely to show up as the number of USB transactions goes up, so with fast devices the error probably just shows up quicker.
There is definitely a problem in the VirtualBox Linux USB proxy in that it does not allow USB devices to be truly reset. It looks like the USB device may get sufficiently confused that it simply stops responding (that could in fact be what triggers the initial reset attempt), and because we do not really reset it, the USB device stays confused until the guest OS gives up on it.
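The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions and is not VirtualBox code: `ToyDevice`, `guest_transfer`, and the `proxy_really_resets` flag are hypothetical names, and the limit of six reset attempts is an arbitrary choice that loosely mirrors the six reset messages preceding each FAILED result in the logs.

```python
class ToyDevice:
    """Hypothetical USB device that stops responding once it is wedged."""

    def __init__(self, wedged=False):
        self.wedged = wedged

    def transfer(self):
        # A wedged device never completes a transfer.
        return not self.wedged

    def hardware_reset(self):
        # Only a real reset clears the wedged state.
        self.wedged = False


def guest_transfer(device, proxy_really_resets, max_resets=6):
    """Return (succeeded, resets_attempted) for one guest I/O request."""
    resets = 0
    while not device.transfer():
        if resets == max_resets:
            return False, resets  # guest gives up and reports an I/O error
        resets += 1
        if proxy_really_resets:
            device.hardware_reset()
        # Otherwise the proxy only pretends to reset: the device stays wedged.
    return True, resets


real = guest_transfer(ToyDevice(wedged=True), proxy_really_resets=True)
fake = guest_transfer(ToyDevice(wedged=True), proxy_really_resets=False)
print("real reset:", real)
print("pretend reset:", fake)
```

In this model a real reset lets the transfer succeed after one attempt, while a pretended reset exhausts every attempt and ends in an error, which is the pattern of repeated reset messages followed by a failed write seen in the guest logs.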
comment:5 by , 6 years ago
My attempts at reproducing this problem on two home Windows 10 computers failed at first, I think due to bug 84741. However, an upgrade to Vbox 6.0 fixed that, and interestingly, I have been unable to reproduce this bug on Windows 10 hosts running Vbox 6.0 with the same Oracle Linux VM. I read a large 450GB file and wrote a large 1.7TB file, until the device filled up. However, there were a few minor USB resets:
Dec 21 11:21:38 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
Dec 21 12:33:36 lobi7 kernel: EXT4-fs (dm-3): error count since last fsck: 5
Dec 21 12:33:36 lobi7 kernel: EXT4-fs (dm-3): initial error at time 1544198049: ext4_journal_check_start:56
Dec 21 12:33:36 lobi7 kernel: EXT4-fs (dm-3): last error at time 1544478866: ext4_wait_block_bitmap:497
Dec 21 12:33:56 lobi7 kernel: usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
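The "initial error" and "last error" times in the EXT4-fs lines are raw numbers. Assuming they are ordinary Unix epoch seconds, a short sketch like the following decodes them; the `ext4_error_time` helper name is hypothetical.

```python
from datetime import datetime, timezone


def ext4_error_time(epoch_seconds):
    """Convert an EXT4-fs error timestamp (assumed Unix epoch seconds) to UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Values copied from the log lines above.
initial = ext4_error_time(1544198049)
last = ext4_error_time(1544478866)
print("initial error:", initial.isoformat())  # 2018-12-07T15:54:09+00:00
print("last error:   ", last.isoformat())     # 2018-12-10T21:54:26+00:00
```

Both times fall on 7 and 10 December (UTC), the same dates as the earlier log excerpts in this ticket, so the recorded filesystem errors appear to predate the 21 December run on the Windows host.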
However, the upgrade to Vbox 6.0 doesn't fix this bug on the Mac host. The problem still occurs there, rather quickly.
comment:6 by , 6 years ago
Actually, you did reproduce half of the problem -- it's that "reset SuperSpeed USB device number 2 using xhci_hcd" message. The difference is that on a Windows host, the VM / USB device / VirtualBox can recover from that situation, and on Linux or OS X hosts it can't.
I know exactly why the problem exists on Linux: we only pretend to reset the USB device but don't really. On OS X I'm not sure why it's happening. I thought we should be really resetting the device, but perhaps we aren't. I haven't looked at the OS X behavior in detail yet.
On Linux hosts, the question is how to fix it, and I don't have the answer yet though I have some ideas.
comment:7 by , 6 years ago
Michal,
Do you need any more data points? I'm on OSX 10.11.6 with plenty of VMs to test this on. And 2 USB3 HDs.
Also, since you have the power, could you edit the messages from 'Jimson' to include the logs/messages in {{{ ... }}} tags, to make reading a little bit easier? TIA.
Or you could do it too Jimson, they're your messages, you have the authority. Except the original one, the ticket report itself. As the poet once said, "U Can't Touch This!" ;)
comment:8 by , 6 years ago
Description: | modified (diff) |
---|
I made several attempts at reproducing this issue on a non-Mac host, using the same VM exported/imported. However, my attempts on two Windows 10 hosts failed, as the device wouldn't even mount in the VM on either, with the following mount attempt errors...
I ran out of time to troubleshoot this Windows 10 Vbox USB issue, so made another attempt on an old Dell E7440 running Oracle Linux 7.6, and though it took much longer, the USB resets occurred there, too...
Log is attached.