Opened 3 years ago
Last modified 3 years ago
#20725 new defect
VirtualBox OHCI-controller randomly skips sending URBs to the host
Reported by: | lo1ol | Owned by: | |
---|---|---|---|
Component: | USB | Version: | VirtualBox 6.1.30 |
Keywords: | OHCI usb 1.1 | Cc: | |
Guest type: | Linux | Host type: | all |
Description
Hello!
I have noticed some problems with USB 1.1 devices under VirtualBox.
First, I noticed that when I work with smart cards on VMs, sending some random APDU commands fails. For the test, I generate and destroy key pairs in a loop. I looked at the libccidand libusb log and found that at some point sending of the command gets stuck and fails when send operation timeout exceeds:
00000003 sending: 00 40 05 80 71 00 00 00 B4 00000005 -> 000000 6F 09 00 00 00 00 8E 00 00 00 00 40 05 80 71 00 00 00 B4 [133.818293] [00029ac3] libusb: debug [libusb_submit_transfer] transfer 0x7f04880010d8 [133.818297] [00029ac3] libusb: debug [add_to_flying_list] arm timer for timeout in 5000ms (first in line) [133.818302] [00029ac3] libusb: debug [submit_bulk_transfer] need 1 urbs for new transfer with length 19 [133.818479] [00029ac3] libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling [133.818486] [00029ac3] libusb: debug [usbi_wait_for_events] poll() 3 fds with timeout in 60000ms [138.818506] [00029ac3] libusb: debug [usbi_wait_for_events] poll() returned 1 [138.818545] [00029ac3] libusb: debug [libusb_cancel_transfer] transfer 0x7f04880010d8 [138.835720] [00029ac3] libusb: debug [arm_timer_for_next_timeout] no timeouts, disarming timer [138.835771] [00029ac3] libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling [138.835775] [00029ac3] libusb: debug [usbi_wait_for_events] poll() 3 fds with timeout in 60000ms [138.835781] [00029ac3] libusb: debug [usbi_wait_for_events] poll() returned 1 [138.835787] [00029ac3] libusb: debug [reap_for_handle] urb type=3 status=-2 transferred=0 [138.835790] [00029ac3] libusb: debug [handle_bulk_completion] handling completion status -2 of bulk urb 1/1 [138.835792] [00029ac3] libusb: debug [handle_bulk_completion] abnormal reap: urb status -2 [138.835794] [00029ac3] libusb: debug [handle_bulk_completion] abnormal reap: last URB handled, reporting [138.835797] [00029ac3] libusb: debug [usbi_handle_transfer_cancellation] detected timeout cancellation [138.835799] [00029ac3] libusb: debug [arm_timer_for_next_timeout] no timeouts, disarming timer [138.835802] [00029ac3] libusb: debug [usbi_handle_transfer_completion] transfer 0x7f04880010d8 has callback 0x7f049c94c850 [138.835804] [00029ac3] libusb: debug [sync_transfer_cb] actual_length=0 [138.835807] [00029ac3] libusb: debug [libusb_free_transfer] transfer 0x7f04880010d8 05017523 ccid_usb.c:857:WriteUSB() write failed (2/17): -7 LIBUSB_ERROR_TIMEOUT 00000004 openct/proto-t1.c:215:t1_transceive() fatal: transmit/receive failed 00000012 ifdhandler.c:1821:IFDHICCPresence() usb:072f/90cc:libudev:0:/dev/bus/usb/002/017 (lun: 0) 00000027 -> 000000 65 00 00 00 00 00 8F 00 00 00 [138.835862] [000299fb] libusb: debug [libusb_submit_transfer] transfer 0x7f04900013f8 [138.835865] [000299fb] libusb: debug [add_to_flying_list] arm timer for timeout in 5000ms (first in line) [138.835870] [000299fb] libusb: debug [submit_bulk_transfer] need 1 urbs for new transfer with length 10 [138.835901] [000299fb] libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling [138.835904] [000299fb] libusb: debug [usbi_wait_for_events] poll() 3 fds with timeout in 60000ms 00000057 SW: 00000002 ifdwrapper.c:543:IFDTransmit() Card not transacted: 612 00000002 winscard.c:1620:SCardTransmit() Card not transacted: 0x80100016
After, I decided to look at the usbmon log (at the virtual machine) and found that sending of some URBs fail with -2 callback status:
ffff9e538d484e40 2827121885 S Bo:2:017:2 -115 19 = 6f090000 00008d00 00000000 05804000 0000c5 ffff9e538d484e40 2827124488 C Bo:2:017:2 0 19 > ffff9e538d484e40 2827124733 S Bi:2:017:2 -115 65556 < ffff9e538d484e40 2827128522 C Bi:2:017:2 0 16 = 80060000 00008d00 00000000 02900092 ffff9e538d484e40 2827180820 S Bo:2:017:2 -115 19 = 6f090000 00008e00 00000040 05807100 0000b4 ffff9e538d484e40 2832198155 C Bo:2:017:2 -2 0 ffff9e538d484e40 2832198387 S Bo:2:017:2 -115 10 = 65000000 00008f00 0000 ffff9e538d484e40 2832219064 C Bo:2:017:2 0 10 > ffff9e538d484e40 2832219163 S Bi:2:017:2 -115 10 <
To make sure that the problem is not caused by a small timeout (libccid uses 5 second timeout), I disabled it. After that sending failed URBs gets stuck forever and usbmon log looks like this:
ffff9e538d484e40 2827121885 S Bo:2:017:2 -115 19 = 6f090000 00008d00 00000000 05804000 0000c5 ffff9e538d484e40 2827124488 C Bo:2:017:2 0 19 > ffff9e538d484e40 2827124733 S Bi:2:017:2 -115 65556 < ffff9e538d484e40 2827128522 C Bi:2:017:2 0 16 = 80060000 00008d00 00000000 02900092 ffff9e538d484e40 2827180820 S Bo:2:017:2 -115 19 = 6f090000 00008e00 00000040 05807100 0000b4
I decided to look at usbmon log of the host machine and found that the failed URBs aren't sent at all, and the log looks like this:
ffff9e538d484e40 2827121885 S Bo:2:017:2 -115 19 = 6f090000 00008d00 00000000 05804000 0000c5 ffff9e538d484e40 2827124488 C Bo:2:017:2 0 19 > ffff9e538d484e40 2827124733 S Bi:2:017:2 -115 65556 < ffff9e538d484e40 2827128522 C Bi:2:017:2 0 16 = 80060000 00008d00 00000000 02900092
Also, I found how to get usbmon log of an emulated root hub controller. Failed commands don't appear there too.
I decided that maybe there is a problem with OHCI controller emulation. During debugging I noticed that after failure HeadP and TailP of Bulk List ED (Endpoint Description) are set and aren't equal, but the OHCI controller doesn't process it, because BLF (Bulk List Filled) status is not set.
After discovering the problem, I found there is some problem with cache. To prove it I added logging of bulk_head, status at the beginning of the StartFrame function and before calling the ServiceBulkList function. I also added logging of cached and real values of TailP and HeadP at the ServiceBulkList function.
Extra log patch:
diff --git a/src/VBox/Devices/USB/DevOHCI.cpp b/src/VBox/Devices/USB/DevOHCI.cpp index 45b23dd9..713a84a5 100644 --- a/src/VBox/Devices/USB/DevOHCI.cpp +++ b/src/VBox/Devices/USB/DevOHCI.cpp @@ -1564,6 +1564,14 @@ DECLINLINE(void) ohciR3ReadEd(PPDMDEVINS pDevIns, uint32_t EdAddr, POHCIED pEd) #endif } +/** + * Reads an OHCIED without caching. + */ +DECLINLINE(void) ohciR3ReadEdWithoutCaching(PPDMDEVINS pDevIns, uint32_t EdAddr, POHCIED pEd) +{ + ohciR3GetDWords(pDevIns, EdAddr, (uint32_t *)pEd, sizeof(*pEd) >> 2); +} + /** * Reads an OHCITD. */ @@ -3746,12 +3754,18 @@ static void ohciR3ServiceBulkList(PPDMDEVINS pDevIns, POHCI pThis, POHCICC pThis && (cIterations-- > 0)) { OHCIED Ed; + OHCIED RealEd; /* Bail if previous processing ended up in the unrecoverable error state. */ if (pThis->intr_status & OHCI_INTR_UNRECOVERABLE_ERROR) break; + ohciR3ReadEd(pDevIns, EdAddr, &Ed); + ohciR3ReadEdWithoutCaching(pDevIns, EdAddr, &RealEd); + if (Ed.TailP != RealEd.TailP || Ed.HeadP != RealEd.HeadP) + Log(("Cached and real Ed mismatch: EdAddr: 0x%x. Ed.HeadP: cached=0x%x real=0x%x. Ed.TailP: cached=0x%x real=0x%x\n", EdAddr, Ed.HeadP, RealEd.HeadP, Ed.TailP, RealEd.TailP)); + Assert(!(Ed.hwinfo & ED_HWINFO_ISO)); /* the guest is screwing us */ if (ohciR3IsEdReady(&Ed)) { @@ -4286,6 +4300,8 @@ static void ohciR3StartOfFrame(PPDMDEVINS pDevIns, POHCI pThis, POHCICC pThisCC) /* * Bulk EPs. */ + + Log(("StartOfFrame: Bulk head: 0x%x. Status: 0x%x\n", pThis->bulk_head, pThis->status)); if ( (pThis->ctl & OHCI_CTL_BLE) && (pThis->status & OHCI_STATUS_BLF)) ohciR3ServiceBulkList(pDevIns, pThis, pThisCC); @@ -4350,6 +4366,8 @@ static DECLCALLBACK(bool) ohciR3StartFrame(PVUSBIROOTHUBPORT pInterface, uint32_ ohciR3Lock(pThisCC); + Log(("StartFrame: Bulk head: 0x%x. Status: 0x%x\n", pThis->bulk_head, pThis->status)); + /* Reset idle detection flag */ pThis->fIdle = true;
I got the following logs:
- usbmon_token.log -- usbmon log on the VM
- vb_token.log -- VirtualBox log with enabled dev_ohci (lvl 2, 3, 4) logging
As you can see cached and real value of TailP are mismatched before data interchange stuck. Also, BLF status is set before reading mismatched cache:
StartFrame: Bulk head: 0x349d1150. Status: 0x0 ohciR3ReadTd(,0x349d3120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x02f72000 NextTD=0x349d30c0 BE=0x02f72007 UNK=0x0 ohciR3ReadTd(,0x349d3120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x02f72000 NextTD=0x349d30c0 BE=0x02f72007 UNK=0x0 ohciR3ReadTd(,0x349d3120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x02f72000 NextTD=0x349d30c0 BE=0x02f72007 UNK=0x0 ohciR3ReadTd(,0x349d3120,): R=1 DP=2 DI=0 T=0 EC=0 CC=0xf CBP=0x02f72000 NextTD=0x349d30c0 BE=0x02f72007 UNK=0x0 ohci: Updating HCCA on frame 0x3c4d StartOfFrame: Bulk head: 0x349d1150. Status: 0x4 Cached and real Ed mismatch: EdAddr: 0x349d1150. Ed.HeadP: cached=0x349d31e0 real=0x349d31e0. Ed.TailP: cached=0x349d31e0 real=0x349d3300 ohciR3StartOfFrame: HcCommandStatus=0x00000000: HCR=0 CLF=0 *BLF=0 OCR=0 SOC=0 *FLUSH* HcBulkHeadED_w(0x349d1150) - old=0x349d11c0 new=0x349d1150 HcBulkCurrentED_w(0x00000000) - old=0x00000000 new=0x00000000 HcControl_w(0x000002af) => CBSR=3 PLE=1 IE=1 CLE=0 *BLE=1 HCFS=0x2 IR=0 RWC=1 RWE=0 HcCommandStatus_w(0x00000004) => HCR=0 CLF=0 *BLF=1 OCR=0 SOC=0 *END*
I tried to describe the condition which leads to stuck. I think the following happens:
- HC enters the StartFrame function;
- HC updates cache;
- HCD fills TD list and sets HeadP and TailP for Bulk List;
- HCD sets Bulk List pointer via registers;
- HCD set BLF status;
- HC sees that BLF status is set and enters the ServiceBulkList function;
- HC reads old cached values of HeadP and TailP functions and sees that the TD list is not ready. Therefore it unchecks BLF status.
- HC enters the StartFrame again at the next iteration. It sees that BLF status is unchecked and doesn't process the Bulk List (but cached values of TailP and TailP are already equal).
- HCD waits for the Bulk List will be serviced.
I think about some solutions for this problem:
- Set the invalidate cache flag inside some register setter functions and invalidate cache inside R3 if it's set. But it may lead to a decrease in performance.
- Disable caching at all. It decreases performance too, but helps to avoid headaches.
- Avoid some symptoms of the problem. If cached EDs are not ready, but the corresponding status is set, then check ED directly from memory. It's probably a bad solution, because there are some cases where cached ED is ready, but real is not. Nevertheless, it may be better for performance.
I wrote a patch for the last solution and it helped me (it's draft):
diff --git a/src/VBox/Devices/USB/DevOHCI.cpp b/src/VBox/Devices/USB/DevOHCI.cpp index 45b23dd9..e34c0ebb 100644 --- a/src/VBox/Devices/USB/DevOHCI.cpp +++ b/src/VBox/Devices/USB/DevOHCI.cpp @@ -1550,20 +1550,25 @@ DECLINLINE(void) ohciR3CacheTdUpdate(POHCICC pThisCC, RTGCPHYS32 TdAddr, PCOHCIT /** * Reads an OHCIED. */ -DECLINLINE(void) ohciR3ReadEd(PPDMDEVINS pDevIns, uint32_t EdAddr, POHCIED pEd) +DECLINLINE(void) ohciR3ReadEdEx(PPDMDEVINS pDevIns, uint32_t EdAddr, POHCIED pEd, bool cached) { # ifdef VBOX_WITH_OHCI_PHYS_READ_STATS physReadStatsUpdateDesc(&g_PhysReadState.ed, EdAddr); physReadStatsUpdateDesc(&g_PhysReadState.all, EdAddr); # endif #ifdef VBOX_WITH_OHCI_PHYS_READ_CACHE - POHCICC pThisCC = PDMDEVINS_2_DATA_CC(pDevIns, POHCICC); - ohciR3PhysReadCacheRead(pDevIns, &pThisCC->CacheED, EdAddr, pEd, sizeof(*pEd)); + if (cached) { + POHCICC pThisCC = PDMDEVINS_2_DATA_CC(pDevIns, POHCICC); + ohciR3PhysReadCacheRead(pDevIns, &pThisCC->CacheED, EdAddr, pEd, sizeof(*pEd)); + } #else ohciR3GetDWords(pDevIns, EdAddr, (uint32_t *)pEd, sizeof(*pEd) >> 2); #endif } +#define ohciR3ReadEd(pDevIns, EdAddr, pEd) \ + ohciR3ReadEdEx(pDevIns, EdAddr, pEd, 1) + /** * Reads an OHCITD. */ @@ -3701,6 +3706,31 @@ DECLINLINE(bool) ohciR3IsEdReady(PCOHCIED pEd) && !(pEd->hwinfo & ED_HWINFO_SKIP); } +/** + * Tries to get ready an OHCIED from cache and on fail dirrectly from memory. + */ +DECLINLINE(bool) ohciR3TryReadReadyEd(PPDMDEVINS pDevIns, uint32_t EdAddr, POHCIED pEd) +{ +#ifdef VBOX_WITH_OHCI_PHYS_READ_CACHE + ohciR3ReadEdEx(pDevIns, EdAddr, pEd, 1); + if (ohciR3IsEdReady(pEd)) + return true; +#endif + + ohciR3ReadEdEx(pDevIns, EdAddr, pEd, 0); + if (ohciR3IsEdReady(pEd)) { +#ifdef VBOX_WITH_OHCI_PHYS_READ_CACHE + // Cache contains old page. Invalidate it. + POHCICC pThisCC = PDMDEVINS_2_DATA_CC(pDevIns, POHCICC); + ohciR3PhysReadCacheInvalidate(&pThisCC->CacheED); + ohciR3PhysReadCacheInvalidate(&pThisCC->CacheTD); +#endif + return true; + } + + return false; +} + /** * Checks if an endpoint has TDs queued (not necessarily ready to have them processed). @@ -3751,9 +3781,9 @@ static void ohciR3ServiceBulkList(PPDMDEVINS pDevIns, POHCI pThis, POHCICC pThis if (pThis->intr_status & OHCI_INTR_UNRECOVERABLE_ERROR) break; - ohciR3ReadEd(pDevIns, EdAddr, &Ed); + bool ready = ohciR3TryReadReadyEd(pDevIns, EdAddr, &Ed); Assert(!(Ed.hwinfo & ED_HWINFO_ISO)); /* the guest is screwing us */ - if (ohciR3IsEdReady(&Ed)) + if (ready) { pThis->status |= OHCI_STATUS_BLF; pThis->fBulkNeedsCleaning = true; @@ -3909,9 +3939,9 @@ static void ohciR3ServiceCtrlList(PPDMDEVINS pDevIns, POHCI pThis, POHCICC pThis if (pThis->intr_status & OHCI_INTR_UNRECOVERABLE_ERROR) break; - ohciR3ReadEd(pDevIns, EdAddr, &Ed); + bool ready = ohciR3TryReadReadyEd(pDevIns, EdAddr, &Ed); Assert(!(Ed.hwinfo & ED_HWINFO_ISO)); /* the guest is screwing us */ - if (ohciR3IsEdReady(&Ed)) + if (ready) { # if 1 /* @@ -3928,8 +3958,7 @@ static void ohciR3ServiceCtrlList(PPDMDEVINS pDevIns, POHCI pThis, POHCICC pThis pThis->status |= OHCI_STATUS_CLF; break; } - ohciR3ReadEd(pDevIns, EdAddr, &Ed); /* It might have been updated on URB completion. */ - } while (ohciR3IsEdReady(&Ed)); + } while (ohciR3TryReadReadyEd(pDevIns, EdAddr, &Ed)); # else /* Simplistic, for debugging. */ ohciR3ServiceHeadTd(pThis, VUSBXFERTYPE_CTRL, &Ed, EdAddr, "Control"); @@ -3991,8 +4020,7 @@ static void ohciR3ServicePeriodicList(PPDMDEVINS pDevIns, POHCI pThis, POHCICC p if (pThis->intr_status & OHCI_INTR_UNRECOVERABLE_ERROR) break; - ohciR3ReadEd(pDevIns, EdAddr, &Ed); - if (ohciR3IsEdReady(&Ed)) + if (ohciR3TryReadReadyEd(pDevIns, EdAddr, &Ed)) { /* * "There is no separate head pointer of isochronous transfers. The first
I admit that you may have no USB 1.1 smart card to reproduce the bug. So, I found how to reproduce the bug via USB 1.1 flash drive (or via any other flash drive connected via USB 1.1 hub).
I wrote a test thatmounts and unmounts a flash drive in a loop, and noticed that sometimes the execution of these operations takes a lot of time.
test.sh:
i=0 while true do i=$(($i+1)) echo "Iteration $i" date +"%T" mount /dev/sdb1 /media/kek || break umount /dev/sdb1 || break done
output:
Iteration 1 18:06:49 Iteration 2 18:06:51 Iteration 3 18:07:26 Iteration 4 18:07:28 Iteration 5 18:07:30
Inside the OHCI log I found the same problem as for a smart card: usbmon_falsh.log -- usbmon log for looped mount and unmount vb_flash.log -- VirtualBox output
What do you think about that? May you confirm the bug? Will it be fixed?
Also, I found some problems with EHCI controller emulation, but will write about that in another bug.
Waiting for your response! Thanks!
Attachments (6)
Change History (8)
by , 3 years ago
Attachment: | vb_flash.zip added |
---|
by , 3 years ago
Attachment: | vb_flash.2.zip added |
---|
Virtual box log for flash drive test (mount/unmount)
by , 3 years ago
Attachment: | vb_flash.3.zip added |
---|
Virtual box log for flash drive test (mount/unmount)
by , 3 years ago
Attachment: | usbmon_token.log added |
---|
usbmon log for smart card test (generate/destroy key)
by , 3 years ago
Attachment: | vb_token.zip added |
---|
Virtualbox log for smart card test (generate/destroy key)
comment:2 by , 3 years ago
Resolving of this bug is still important for me. Somebody answer to me please...
Virtual box test for flash drive test (mount/unmount)