Opened 4 years ago
Closed 4 years ago
#20182 closed defect (fixed)
in VirtualBox 6.x, vxworks vm will crash - cause Guru Meditation
Reported by: | JayK | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 6.1.18 |
Keywords: | vxworks | Cc: | |
Guest type: | other | Host type: | Linux |
Description
My company is trying to move from virtualbox 5.x to 6.x. Since then we have been unable to open our vxworks virtual machines. These machines all open in our virtualbox 5.x environment, but crash upon opening in 6.x environment.
Attachments (7)
Change History (25)
by , 4 years ago
comment:1 by , 4 years ago
You don't happen to have a vxworks VM exhibiting the issue you can supply to us for further investigation? Will be hard to get down to the issue otherwise.
follow-up: 4 comment:2 by , 4 years ago
Please provide a VBox.log from a working 5.1.x installation, so that we can compare the environments. The 5.1.x setup could have been vastly different from all we know.
Is there any reason why your VM has nested paging disabled?
As aeichner already said, we don't have VxWorks and the chance this will get fixed without being able to reproduce the problem is effectively zero. There is no obvious problem, the guest OS is triple faulting (i.e. crashing really hard) but the list of possible causes is endless.
by , 4 years ago
Attachment: | VBox_win7_vb5.0.32.log added |
---|
Log file from successful launch of vxworks 6.8 (from win7) using virtualbox 5.0.32
by , 4 years ago
Attachment: | VBoxHardening_win7_vb_5.0.32.log added |
---|
VBoxHardening file from successful launch of vxworks 6.8 in virtualbox 5.0.32
by , 4 years ago
Attachment: | Crash_win10_vb6.1.18.png added |
---|
image of crash screen using win10 and virtualbox 6.1.18
by , 4 years ago
Attachment: | VBox_win10_vb6.1.18.log added |
---|
log file of crash on win10 using vxworks vm in virtualbox 6.1.18
comment:3 by , 4 years ago
Sorry for my delay in getting back to this issue. It will be difficult (maybe impossible) for me to share the vxworks vm, due to security concerns. All I could post were the log files from a win7 execution that worked and a win10 execution that crashed. I'm still hopeful that one of you experts can offer some guidance here. Here are a few more clues that I've dug up:
The crash occurs on many versions of fedora using a variety of VirtualBox versions. The crash occurs on windows as well (except for that 32 bit win7 running vb 5.0.32).
If I explicitly disable the network adapter (bridged adapter), then the vxworks vm works just fine on all platforms/versions (but is unusable to us).
vxworks 6.9 vms suffer a similar fate to vxworks 6.8. They sometimes crash/hang but run fine if I disable their network adapters.
Many thanks in advance if anyone can share any thoughts.
comment:4 by , 4 years ago
Replying to michaln:
Is there any reason why your VM has nested paging disabled?
I'm afraid I'm relatively new to virtualbox, so I relied on the default settings that were set when I imported the ova.
comment:5 by , 4 years ago
All I can suggest is trying a different emulated NIC, including different E1000 variants.
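As a rough sketch of what trying different emulated NICs could look like from the command line, the snippet below only builds the `VBoxManage modifyvm` command lines and prints them; it does not run anything. The VM name `vxworks-vm` and the helper `nic_type_commands` are made up for illustration, and the list of NIC type names is taken from the VirtualBox 6.1 `--nictype<N>` option as I understand it, so check it against `VBoxManage modifyvm` help output for your version. The VM has to be powered off before the setting can be changed.

```python
# Emulated NIC types to try, E1000 variants first, then the PCnet ones.
# Assumption: these are the names accepted by `--nictype<N>` in VirtualBox 6.1.
NIC_TYPES = ["82540EM", "82543GC", "82545EM", "Am79C970A", "Am79C973"]


def nic_type_commands(vm_name, slot=1):
    """Build one `VBoxManage modifyvm` command per NIC type for adapter `slot`.

    The commands are returned as argument lists and are NOT executed here.
    """
    return [
        ["VBoxManage", "modifyvm", vm_name, f"--nictype{slot}", nic_type]
        for nic_type in NIC_TYPES
    ]


if __name__ == "__main__":
    # "vxworks-vm" is a placeholder VM name.
    for command in nic_type_commands("vxworks-vm"):
        print(" ".join(command))
```

Each printed line can be run by hand, followed by a boot of the guest, so that a crash can be attributed to one specific emulated adapter.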
This really does not look like something that can be fixed without seeing the problem up close, so if you can't provide a reproduction scenario, we can't work on it.
You could in theory go through the VirtualBox OSE revisions and figure out exactly which change broke it. We can't even do that much. If it broke between point updates of a VirtualBox major version that would narrow it down a lot, but if it's something like "5.0.x latest works, 5.1.x does not" then that's almost useless information.
Although even if you found exactly what broke it, it might be very difficult for us to fix it without being able to fully analyze the problem and testing the fix. So it's really up to you.
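For anyone attempting the search through revisions described above, a binary search keeps the number of builds to test small. The following is only a sketch under assumptions: `first_bad` and the revision labels are hypothetical, the revisions must be ordered oldest to newest, and the breakage is assumed to persist once introduced. The `is_bad` callback stands in for "install this build, boot the guest, report whether it crashed", which in practice is a manual step.

```python
def first_bad(versions, is_bad):
    """Return the earliest entry of `versions` for which is_bad() is true.

    Assumes `versions` is ordered oldest to newest and that every entry
    after the first bad one is also bad. Returns None if the newest
    entry is good (or the list is empty), i.e. nothing to find.
    """
    if not versions or not is_bad(versions[-1]):
        return None
    lo, hi = 0, len(versions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid          # breakage is at mid or earlier
        else:
            lo = mid + 1      # breakage is after mid
    return versions[lo]


if __name__ == "__main__":
    # Placeholder labels, not real VirtualBox revisions.
    revisions = ["rev-a", "rev-b", "rev-c", "rev-d", "rev-e", "rev-f"]
    broken = {"rev-d", "rev-e", "rev-f"}   # made-up test outcome
    print(first_bad(revisions, lambda rev: rev in broken))
```

Because the guest is reported to crash only some of the time, a single clean boot does not prove a build is good; each candidate would need several boots before `is_bad` is answered.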
comment:6 by , 4 years ago
Hi All, We're working with our management to get permission to share a vxworks 6.9 vm with you all. My company's management wants to have an NDA in place. Is there a single person that I can work with to get this done? Here are the things our legal folks would want to know:
Company Information
- Name of the company with whom we are initiating the NDA (please include the corporate suffix, whether “LLC” or “Corp.” or “Inc.”)
- State where the company is incorporated
- Full corporate address
Technical Point-of-contact (POC) Information
- Name of the person at the company who is our primary POC for the exchange of proprietary information
- E-mail address
- Phone number
- Address
Administrative POC Information
- Name of the person at the company who is our primary POC for the exchange of proprietary information
- E-mail address
- Phone number
- Address
Signatory Information
- Name of the person at the company who is signing the NDA
- Title of the person at the company who is signing the NDA
Thanks in advance!
follow-up: 8 comment:7 by , 4 years ago
While signing an NDA is not out of the question, it would probably take at least six months to get the legal side sorted out and only then we could start looking for actual developer time to work on the problem.
Is there really no evaluation version of VxWorks or anything else we could work with? Do you have no contacts at Wind River, or are they hostile to virtualization?
Please understand that since there is no customer involved, this is automatically low priority. On the one hand we'd like to fix the bug, on the other hand there are very many bugs that are vastly less difficult to work on.
follow-up: 9 comment:8 by , 4 years ago
Replying to michaln:
While signing an NDA is not out of the question, it would probably take at least six months to get the legal side sorted out and only then we could start looking for actual developer time to work on the problem.
Is there really no evaluation version of VxWorks or anything else we could work with? Do you have no contacts at Wind River, or are they hostile to virtualization?
Please understand that since there is no customer involved, this is automatically low priority. On the one hand we'd like to fix the bug, on the other hand there are very many bugs that are vastly less difficult to work on.
Thanks michaln, for your honest feedback and guidance. I'm working with our management to see if we can create an evaluation vm that we can share. Once again - thanks!
comment:9 by , 4 years ago
I convinced my management to allow me to build a vxworks vm that is stripped of all of our intellectual property. After about a week of work, I finally have an ova file that I can give you folks that will reproduce the virtualbox crash when the network adapter is enabled.
How can I get this ova in your hands? The .ova file is 252MB, and is too big to add it here as an attachment.
Thanks, Jay
follow-up: 11 comment:10 by , 4 years ago
That sounds like good news!
We unfortunately don't have an anonymous file drop at the moment. Would you be able to host the OVA on your end and e-mail me the download link and/or a password for the archive? It can be Google Drive or anything really.
My e-mail address is michal.necasek@…. Alternatively I should be able to dig up your e-mail address from your bug tracker registration information but that might take a little while.
comment:11 by , 4 years ago
Hi Michal, I'm going to try sending an email to what I think is your email address. I'm not very experienced with these forums, so I assume that complete email addresses may be clobbered when mentioned in comments. If my email does not get to you, please look up my email in my profile and reach out to me! Thanks, Jay
comment:12 by , 4 years ago
Somehow I missed this before: The VBox.log from a supposedly working VirtualBox 5.0.32 setup is not using the Intel E1000 emulation! It uses the AMD PCnet emulation instead. Did it actually work with the E1000 emulation in older VirtualBox versions?
comment:13 by , 4 years ago
It looks like the VxWorks e1000 driver relies on an undocumented feature which was never present in VirtualBox. You must have used PCNet in 5.0.32. Apparently the VxWorks driver reads ICS instead of ICR to obtain an interrupt cause without acknowledging the interrupt. The fix will be included in the next maintenance release.
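To make the ICR/ICS distinction concrete, here is a toy model of the two registers. It is not VirtualBox's device code: the class `ToyE1000` and its methods are invented for illustration, interrupt masking is ignored, and the assumption that an emulation without the undocumented behavior returns 0 on an ICS read is mine. What it does reflect from Intel's documentation is that reading ICR returns the pending causes and clears them, and that the link status change cause is bit 2.

```python
ICR_LSC = 0x04  # link status change cause bit (bit 2 of ICR)


class ToyE1000:
    """Minimal model of the E1000 interrupt cause registers (masks ignored)."""

    def __init__(self, ics_readable):
        self.icr = 0                      # pending interrupt causes
        self.ics_readable = ics_readable  # True models the undocumented behavior

    def raise_cause(self, bits):
        """Device side: latch one or more interrupt causes."""
        self.icr |= bits

    def read_icr(self):
        """Documented path: return pending causes and clear them."""
        value = self.icr
        self.icr = 0
        return value

    def read_ics(self):
        """Undocumented path: peek at pending causes without clearing them.

        Assumption: without that behavior the read yields 0.
        """
        return self.icr if self.ics_readable else 0

    @property
    def irq_asserted(self):
        """The interrupt line stays asserted while any cause is pending."""
        return self.icr != 0


if __name__ == "__main__":
    nic = ToyE1000(ics_readable=False)
    nic.raise_cause(ICR_LSC)
    # The driver peeks via ICS, sees nothing, and the line stays asserted.
    print(hex(nic.read_ics()), nic.irq_asserted)
```

In this model a driver that only reads ICS never learns why it was interrupted unless `ics_readable` is true, which matches the description of the problem above.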
comment:14 by , 4 years ago
There's now a VirtualBox 6.1.x test build including the change at https://www.virtualbox.org/wiki/Testbuilds -- please give it a shot and let us know if it made a difference.
At any rate we're pretty certain this particular guest never worked with the VirtualBox Intel E1000 emulation. It seems like the guest ended up with a "stuck" interrupt that was never cleared and eventually caused some kind of overflow and a fatal crash.
comment:15 by , 4 years ago
Hi Michal, Afraid that I had missed something in my research, I went back to check on the virtualbox vxworks vm that used to work. If I'm reading it right, it *does* use the Intel E1000 emulation. I've attached a screenshot (VXworks_VM_settings.png) of the settings. As a test, I tried toggling it to the two PCNet settings and restarting. The vxworks vm launched both times, but we are unable to ping that VM unless we go back to the Intel settings. So I don't believe that we ever used the PCNet settings.
In any event, I'm anxious to try out your fix! (and grateful for all of your help)
Regards, Jay
by , 4 years ago
Attachment: | VXworks_VM_settings.png added |
---|
settings from old vxworks vm that used to work
comment:16 by , 4 years ago
The VBox_win7_vb5.0.32.log file attached here definitely is not configured with the E1000 emulation. But that was perhaps an omission.
We now believe that VxWorks previously did work, and what changed was that VirtualBox added a link-state change interrupt for the emulated E1000 chip, which is something the Windows XP driver for the Intel 82543GC (PRO/1000 T Server) requires.
What we found is that VxWorks relies on behavior not documented by Intel. However, the VxWorks driver also includes a workaround (an alternative interrupt handler) that was designed to work with VMware etc., whose E1000 emulation was based on the available Intel documentation. That alternative handler is why it worked with VirtualBox 5.0.x at all.
Our best guess is that the link state change interrupt can sometimes arrive before the alternative interrupt handler is in place. If it arrives late enough, the guest works, but if it's "too soon", the original interrupt handler is used and it will cause the guest to crash because it'll never clear the adapter interrupt and run out of stack space. There is definitely some non-determinism involved, which could well be caused by the link state change interrupt arriving with some delay after the NIC was initialized.
With the fix included, both interrupt handlers should work, so it shouldn't matter whether the original or alternative interrupt handler is used in the guest.
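The timing hypothesis above can be written down as a small simulation. Everything here is a simplification built on the best guess described in this comment, not on the VxWorks driver source: the function names, the abstract time units, and `STACK_LIMIT` are made up, and "the original handler clears the interrupt only if ICS reads work" is an assumption standing in for whatever the real handler does.

```python
STACK_LIMIT = 8  # hypothetical number of nested handler frames the guest can hold


def service_interrupt(handler_clears_cause):
    """Re-enter the handler while the interrupt stays asserted.

    Returns the nesting depth reached, or None if STACK_LIMIT was exceeded.
    """
    asserted = True
    depth = 0
    while asserted:
        depth += 1
        if depth > STACK_LIMIT:
            return None                   # guest ran out of stack
        if handler_clears_cause:
            asserted = False              # cause acknowledged, line drops
    return depth


def boot(irq_time, alt_handler_time, ics_readable):
    """Model one guest boot; times are in arbitrary units.

    irq_time:         when the link state change interrupt arrives
    alt_handler_time: when the guest installs its alternative handler
    ics_readable:     True if the emulation supports reading ICS
    """
    uses_alternative = irq_time >= alt_handler_time
    # Assumption: the alternative handler always clears the cause, the
    # original one only manages to when ICS reads return the cause.
    clears = uses_alternative or ics_readable
    return "runs" if service_interrupt(clears) is not None else "crashes"


if __name__ == "__main__":
    print(boot(irq_time=5, alt_handler_time=3, ics_readable=False))
    print(boot(irq_time=1, alt_handler_time=3, ics_readable=False))
    print(boot(irq_time=1, alt_handler_time=3, ics_readable=True))
```

The only inputs that change the outcome are the relative order of the two events and whether ICS is readable, which is why the same VM could boot on one attempt and crash on the next.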
comment:17 by , 4 years ago
The fix is included in VirtualBox 6.1.20. Please let us know if it works.
comment:18 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
This has been confirmed fixed, closing this ticket.
Thanks for providing a reproduction scenario! Without that we'd never have guessed what the problem was.
image of virtual machine just before crash