#11610 closed defect (fixed)
BUG: unable to handle kernel paging request
Reported by: | csreynolds | Owned by: | |
---|---|---|---|
Component: | other | Version: | VirtualBox 4.2.10 |
Keywords: | kernel | Cc: | |
Guest type: | all | Host type: | Linux |
Description
VM will start up, function for a random amount of time and then freeze. Has to be killed from command line.
Attachments (24)
Change History (85)
comment:1 by , 12 years ago
comment:2 by , 12 years ago
I can boot into an older kernel and I have no problems. Is there a way I can get more detailed information on why the crash is happening? I'd like to help resolve this issue if I can.
[creynolds@localhost trunk]$ uname -a
Linux localhost.localdomain 3.6.10-4.fc18.x86_64 #1 SMP Tue Dec 11 18:01:27 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
The kernel listed above shows no problems at all. 3.7 also worked before I upgraded to 3.8.
comment:3 by , 12 years ago
I had a similar experience with upgrading Fedora 17 from 3.6.10 to 3.8.3 -- VirtualBox 4.2.10 issued "unable to handle kernel paging" while writing to the virtual disk (was installing Fedora 17 Live CD to hard drive). So I tried VirtualBox 4.1.24 -- same problem. I changed to another computer with 3.8.3 kernel -- same problem. Reverted kernel to 3.6.10, and no problems were encountered.
Reported problem to Red Hat (Bug 929339), who said it was VirtualBox's problem.
comment:4 by , 12 years ago
I encountered the same problem today on ArchLinux. I was running virtualbox-4.2.10-2 and linux-3.8.5-1. I had to downgrade back to virtualbox-4.2.8-1 and linux-3.7.10-1 in order to use my virtual machines again. I will upload the relevant snippet from /var/log/messages.log.
by , 12 years ago
Attachment: | virtualbox-4.2.10-2-linux-3.8.5-1-oops.log added |
---|
by , 12 years ago
Attachment: | VBox.log.1 added |
---|
This log is from when I couldn't boot a Windows virtual machine after upgrading my ArchLinux system to virtualbox-4.2.10-2 and linux-3.8.5-1.
by , 12 years ago
Attachment: | VBox.2.log added |
---|
This log is from successfully booting the same Windows virtual machine after downgrading back to virtualbox-4.2.8-1 and linux-3.7.10-1
by , 12 years ago
Attachment: | fedora-18-oops.txt added |
---|
Kernel oops when starting a VM on an up-to-date Fedora 18:
Linux bureau 3.8.5-201.fc18.i686.PAE #1 SMP Thu Mar 28 21:50:08 UTC 2013 i686 i686 i386 GNU/Linux
VirtualBox-4.2-4.2.10_84104_fedora18-1.i686
follow-up: 6 comment:5 by , 12 years ago
actually, as I read again the description of this ticket, I realized that my issue is probably not the same as this one since VMs don't start. I get a kernel Oops trying to start them.
comment:6 by , 12 years ago
Replying to rsalmon:
actually, as I read again the description of this ticket, I realized that my issue is probably not the same as this one since VMs don't start. I get a kernel Oops trying to start them.
Actually, the same is true in my case as well. The VM starts but the Oops happens at some random point while the guest is booting. I tried with both Windows XP and RHEL 6.3 guests. None ever booted into a usable state before the Oops occurred.
follow-ups: 8 9 comment:7 by , 12 years ago
Trying to find a pattern. It seems that Fedora 18 hosts are affected with Linux 3.8. I have a 64-bit Fedora 18 system running with a Linux kernel 3.8.5-201 installed. I have no problems starting 64-bit guests (e.g. Debian 6.0) or 32-bit guests (e.g. Windows XP).
You don't run KVM in parallel by any chance?
comment:8 by , 12 years ago
Replying to frank:
Trying to find a pattern. It seems that Fedora 18 hosts are affected with Linux 3.8. I have a 64-bit Fedora 18 system running with a Linux kernel 3.8.5-201 installed. I have no problems starting 64-bit guests (e.g. Debian 6.0) or 32-bit guests (e.g. Windows XP).
You don't run KVM in parallel by any chance?
In my case, no.
The host was running Fedora 17 x86_64 3.8.3-103 (10GB RAM).
I had created a Fedora VM with 2GB RAM, a 15GB virtual disk and configured networking to be bridged to em2. I connected the Fedora 17 Live CD (x86) .iso file, and clicked on "install to hard drive". It was during the last step -- the installation of packages to the virtual disk -- that I would encounter the kernel paging error. The point at which it was encountered varied -- one time it was fairly early in the package installation process, while at another time it was near the end.
I don't believe there were any other VMs active at the time.
When I changed the host's kernel back to 3.6.10, I didn't encounter any problems.
I have many other hosts running 3.8.3-103, but with an existing Windows 7 VM, and I haven't seen any problems running them. It seemed to be tied to writing a lot to the virtual disk.
comment:9 by , 12 years ago
Replying to frank:
Trying to find a pattern. It seems that Fedora 18 hosts are affected with Linux 3.8. I have a 64-bit Fedora 18 system running with a Linux kernel 3.8.5-201 installed. I have no problems starting 64-bit guests (e.g. Debian 6.0) or 32-bit guests (e.g. Windows XP).
You don't run KVM in parallel by any chance?
I don't run KVM. I'm not sure what I did, but I no longer get a kernel Oops when starting a VM. I forced a reinstall of the kernel and the kernel header/devel files, then reran vboxdrv setup. Maybe I had a problem with the devel files.
The kernel is 3.8.5-201.fc18.i686.PAE and I was able to start a 32-bit Debian guest.
by , 12 years ago
Attachment: | fc-18-oops.txt added |
---|
Still happening with kernel 3.8.6-203.fc18.x86_64 and VirtualBox 4.2.12. I tried reinstalling the header/devel RPMs and re-running vboxdrv setup to see if that cleared it up, as reported above; same issue.
comment:10 by , 12 years ago
I noticed at least one difference in the /var/log/messages between the last version of VirtualBox that worked on my machine and all of the versions that failed: just before the kernel Oops message there is a line logging that a network device entered promiscuous mode.
In the working versions of VirtualBox the device is vboxnet0 or vboxnet1, but in the versions that don't work the device is eth0. You can see an example of this at line 1 in the log snippet I originally posted: https://www.virtualbox.org/attachment/ticket/11610/virtualbox-4.2.10-2-linux-3.8.5-1-oops.log#L1
The same can be seen here in the original attachment posted by csreynolds: https://www.virtualbox.org/attachment/ticket/11610/messages#L17
Unfortunately, the other snippets of /var/log/messages posted in this thread trimmed the "device XYZ entered promiscuous mode" lines.
I wonder if this is consistent for others experiencing this issue. Do the versions of VirtualBox that fail always log the name of the physical interface before the kernel Oops, and the versions of VirtualBox that work fine always log the name of one of the vboxnet interfaces?
Does anyone know of any changes in VirtualBox 4.2.10+ or Linux 3.8+ that would affect which device the vboxdrv, vboxnetadp or vboxnetflt kernel modules try to switch into promiscuous mode?
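For anyone who wants to check their own logs for this pattern, here is a minimal sketch. It assumes the kernel message has the usual form "device XYZ entered promiscuous mode", as quoted above; the function name `promisc_devices` is only an illustration, and the log path differs between distributions.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel/system log on stdin and print the name of
# every interface that was reported as entering promiscuous mode, in log order.
promisc_devices() {
    sed -n 's/.*device \([^ ]*\) entered promiscuous mode.*/\1/p'
}

# Example (path is an assumption, adjust for your distribution):
#   promisc_devices < /var/log/messages
```

If the last name printed before the Oops is a physical interface such as eth0 rather than vboxnet0 or vboxnet1, that would match the pattern described in this comment.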
follow-up: 14 comment:12 by , 12 years ago
Hi all, I think I was able to narrow down the problem. I played with different configurations for many virtual machines and found some working configurations.
It all comes down to three settings:
- under System settings, "Acceleration" tab: nested paging (AMD-V) or EPT (Intel VT-x)
- under System settings, "Processor" tab: Enable PAE/NX
- under System settings, "Acceleration" tab: hardware virtualization (AMD-V) (Intel VT-x) (first checkbox)
Generally, the problem arises when you have activated hardware virtualization or have multiple CPUs (which automatically activates it).
I was doing some tests with the System Rescue CD ISO file and an installed Chromebook OS.
Working without problems: System Rescue CD and Chrome OS boot, run, and wait for user input at the prompt or interface.
- PAE/NX on
- 1 processor
- VT-x/AMD-V off
- nested paging off
System Rescue CD works as above; Chrome OS does not, but only because it needs a PAE kernel.
- PAE/NX off
- 1 processor
- VT-x/AMD-V off
- nested paging off
Not working: System Rescue CD shows a fully working boot menu, but the default kernel hangs after the third line, "Probing EDD (edd=off to disable)... ok". Chrome OS fails silently.
- PAE/NX on
- 1 processor
- VT-x/AMD-V on
- nested paging off
System Rescue CD: same as above. Chrome OS fails silently.
- PAE/NX on
- 1 processor
- VT-x/AMD-V on
- nested paging on
System Rescue CD: same as above. Chrome OS fails silently.
- PAE/NX on
- 1 processor
- VT-x/AMD-V on
- nested paging on
System Rescue CD boots but fails before reaching the user login; multiple runs crash at different places. Chrome OS fails silently.
- PAE/NX off
- 2 processors
- VT-x/AMD-V on
- nested paging off
System Rescue CD boots but fails before reaching the user login; multiple runs crash at different places. Chrome OS fails silently.
- PAE/NX on
- 2 processors
- VT-x/AMD-V on
- nested paging on
I could not see any difference between NAT and bridged networking (bridged networking puts the interface into promiscuous mode).
The affected Arch Linux kernel is tracked at https://bugs.archlinux.org/task/34399; the problem also occurs with 3.9.2-1-ARCH and VirtualBox 4.2.12_OSE_r84980.
My cpuinfo is attached; I have an Intel Westmere CPU.
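For anyone who wants to repeat these combinations without clicking through the GUI, the same four settings can be set with `VBoxManage modifyvm`. This is only a sketch: the helper name `vbox_combo_cmd` and the VM name "MyVM" are made up for illustration, and the helper just prints the command so it can be reviewed before it is run against a powered-off VM.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the VBoxManage call for one combination.
# Arguments: VM name, PAE/NX (on|off), CPU count, VT-x/AMD-V (on|off),
# nested paging (on|off).
vbox_combo_cmd() {
    printf 'VBoxManage modifyvm "%s" --pae %s --cpus %s --hwvirtex %s --nestedpaging %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Example: the first combination that worked in the tests above.
#   vbox_combo_cmd "MyVM" on 1 off off
```

Printing instead of executing keeps the sketch harmless; pipe the output to `sh` once the command looks right.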
follow-up: 15 comment:13 by , 12 years ago
Replying to sergiomb:
Hi, the kernel on the host or the kernel on the guest?
On the host, but I believe that is a red herring. timemaster's workaround works for me, too.
comment:14 by , 12 years ago
Replying to timemaster:
Generally, the problem arises when you have activated hardware virtualization or have multiple CPUs (which automatically activates it).
I was doing some tests with the System Rescue CD ISO file and an installed Chromebook OS.
Disabling VT-x/AMD-V worked around the problem with my 32-bit Windows XP virtual machine. Unfortunately, that won't work for any 64-bit virtual machines. Disabling PAE/NX for a 64-bit VM seems to help it run a little longer before the kernel Oops occurs (for example, with PAE/NX enabled my 64-bit VM trips over consistently while booting, but with PAE/NX disabled it boots fine and is usable), but the Oops does eventually happen every time for me.
This is with my host system running Linux 3.9.2 and VirtualBox 4.2.12. I'll post the output of /proc/cpuinfo as well.
by , 12 years ago
Output of /proc/cpuinfo from my host machine (Archlinux, kernel v3.9.2, VirtualBox 4.2.12)
follow-up: 17 comment:15 by , 12 years ago
follow-up: 18 comment:16 by , 12 years ago
We still cannot reproduce this problem. I would be interested to see more kernel logs after a VM crashed like described above.
follow-up: 19 comment:18 by , 12 years ago
Hi, frank-
Replying to frank:
We still cannot reproduce this problem. I would be interested to see more kernel logs after a VM crashed like described above.
What kernel version are you running on the host? What kind of processor does the host have?
I exchanged some private emails with the maintainer of the Archlinux package. He says he can't reproduce the problem, either. I'm not sure what the common trigger is between all of the affected systems...
by , 12 years ago
Attachment: | virtualbox-4.2.12-3-linux-3.9.3-1-oops.log added |
---|
Here is another kernel log from my system running Linux 3.9.3 and VirtualBox 4.2.12.
follow-up: 25 comment:19 by , 12 years ago
@sl4mmy: Care to post the whole dmesg log instead of just the oops portion? And the number of guests running simultaneously when you got the oops?
Also you might want to see if you can reproduce this issue with the below test build (major rewrite of the VT-x code including many bug fixes and performance improvements)
http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85607-Linux_amd64.run
follow-up: 26 comment:20 by , 12 years ago
Hi all,
I'm experiencing the same behavior: VMs boot up and freeze later, which can easily be reproduced by writing large amounts of data to the virtual disk. In my case I do a big "svn co " and the hang usually happens after svn has written ~1 GB.
Interestingly enough, it happens only on my server hardware: an HP ProLiant with 64 GB of memory and 24 Intel Xeon cores. A very similar setup (same VirtualBox version, same kernel, same VM) on my workstation (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz) works fine.
I can provide more details if necessary.
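For others trying to reproduce this without a large Subversion checkout, a rough stand-in is to write about 1 GB to a scratch file inside the guest. This is a sketch only: it assumes GNU dd (for `bs=1M`), the helper name `write_load` is made up, and writing zeros is not necessarily equivalent to the svn workload described above.

```shell
#!/bin/sh
# Hypothetical helper: write N MiB of zeros to a scratch file and flush to disk.
# $1 = target file inside the guest
# $2 = size in MiB (assumed default: 1024, roughly the ~1 GB mentioned above)
write_load() {
    dd if=/dev/zero of="$1" bs=1M count="${2:-1024}" 2>/dev/null && sync
}

# Example (run inside the guest; the path is an assumption):
#   write_load /tmp/vbox-write-test.bin
```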
comment:21 by , 12 years ago
I've done some testing and am beginning to see patterns. On our side, this issue arises under the following circumstances:
- 64-bit guest (tested with Linux 3.8 and Windows 7)
- 64-bit Linux host (Ubuntu Server in our case). Other host OSes not tested.
- Intel Xeon hardware. Yes: it doesn't trigger on an Intel Core i7, with host OS, guest OS, and virtualizer all being equal.
Switching the following settings on and off doesn't matter:
- PAE/NX
- Nested Paging
- VT-x/AMD-V (this one cannot be switched off for 64-bit guests, of course)
Also, the behavior under the posted test build (4.2.51) is still the same.
Hope that helps.
follow-ups: 24 27 comment:22 by , 12 years ago
Oops. The above is not quite right: I just experienced the same issue with a 32-bit guest (Windows 7) with hardware acceleration enabled. I will disable it and recheck now.
follow-up: 31 comment:24 by , 12 years ago
Replying to wenns:
Oops. The above is not quite right: I just experienced the same issue with a 32-bit guest (Windows 7) with hardware acceleration enabled. I will disable it and recheck now.
Can you specify what you disabled? Graphics? FYI, I found on one machine with a Linux host that a Linux guest crashes and X won't start when it loads /usr/lib/modules/*/extra/VirtualBox/vboxvideo.ko. If I remove the module before launching X, everything works; it just disables "seamless mode".
follow-up: 30 comment:25 by , 12 years ago
Hi, quickbooks-
Replying to quickbooks:
@sl4mmy: Care to post the whole dmesg log instead of just the oops portion? And the number of guests running simultaneously when you got the oops?
Also you might want to see if you can reproduce this issue with the below test build (major rewrite of the VT-x code including many bug fixes and performance improvements)
I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).
by , 12 years ago
Attachment: | virtualbox-4.2.51-linux-3.9.3-oops.txt added |
---|
Complete dmesg of kernel oops produced using test build 4.2.51.
comment:26 by , 12 years ago
Hi, wenns-
Replying to wenns:
Interestingly enough, it happens only on my server hardware: an HP ProLiant with 64 GB of memory and 24 Intel Xeon cores. A very similar setup (same VirtualBox version, same kernel, same VM) on my workstation (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz) works fine.
That is interesting. My desktop suffering from this problem has an Intel Xeon X5675 @ 3.07GHz.
comment:27 by , 12 years ago
Hi, wenns-
Replying to wenns:
Oops. The above is not quite right: I just experienced the same issue with a 32-bit guest (Windows 7) with hardware acceleration enabled. I will disable it and recheck now.
Did disabling hardware acceleration work around the problem with your 32-bit guests? With hardware acceleration enabled my 32-bit guests reliably trigger the kernel oops while they are booting, but with hardware acceleration disabled my 32-bit guests are usable (well... they're noticeably slower ;)). Again, this is with my desktop machine with a Xeon X5675 @ 3.07GHz.
follow-up: 34 comment:28 by , 12 years ago
So, to summarize the CPUs in use:
- csreynolds: VBox.log shows a Xeon X5675 @ 3.07GHz
- sl4mmy: VBox.log.1 and cpuinfo.2 show a Xeon X5675 @ 3.07GHz
- timemaster (me): cpuinfo shows a Xeon E5620 @ 2.40GHz
- wenns: says he is using a Xeon processor, and that his Core i7 does not have this problem
- rmflight: VirtualBox-dies.log shows a Xeon X5650 @ 2.67GHz
p5n? wenns? Details, please.
That's a lot of Xeon processors in the ?56?? range... plus wenns says that i7 processors are not affected...
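To make it easier for others to report the same detail, here is a small sketch that pulls the distinct CPU model names out of /proc/cpuinfo. The function name `cpu_models` is only an illustration; the "model name" field is what x86 Linux kernels print.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/cpuinfo-style text on stdin and print each
# distinct "model name" value once.
cpu_models() {
    sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' | sort -u
}

# Example:
#   cpu_models < /proc/cpuinfo
```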
comment:29 by , 12 years ago
Not sure which guest caused this, as I had 3+ guests running: 1 linux, 2 windows. 1 was installing a new copy of Win 7 64bit.
I have an Intel i3-3225.
May 24 19:09:19 localhost kernel: [11647.766537] EMT-0: page allocation failure: order:9, mode:0x344d2
May 24 19:09:19 localhost kernel: [11647.766541] Pid: 5366, comm: EMT-0 Tainted: PF C O 3.9.3-201.fc18.x86_64 #1
May 24 19:09:19 localhost kernel: [11647.766542] Call Trace:
May 24 19:09:19 localhost kernel: [11647.766547] [<ffffffff81139509>] warn_alloc_failed+0xe9/0x150
May 24 19:09:19 localhost kernel: [11647.766551] [<ffffffff81658ae4>] ? __alloc_pages_direct_compact+0x182/0x194
May 24 19:09:19 localhost kernel: [11647.766553] [<ffffffff8113d806>] __alloc_pages_nodemask+0x856/0xae0
May 24 19:09:19 localhost kernel: [11647.766557] [<ffffffff8117c0c8>] alloc_pages_current+0xb8/0x190
May 24 19:09:19 localhost kernel: [11647.766570] [<ffffffffa02bbd60>] rtR0MemObjLinuxAllocPages+0xc0/0x260 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766577] [<ffffffffa02bbf3a>] rtR0MemObjLinuxAllocPhysSub2+0x3a/0xe0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766583] [<ffffffffa02bc0aa>] rtR0MemObjLinuxAllocPhysSub+0xca/0xd0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766589] [<ffffffffa02bc479>] rtR0MemObjNativeAllocPhys+0x19/0x20 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766595] [<ffffffffa02ba314>] VBoxHost_RTR0MemObjAllocPhysExTag+0x64/0xb0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766608] [<ffffffffa02bb89d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766613] [<ffffffffa02bb89d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766618] [<ffffffffa02b2db4>] ? supdrvIOCtl+0x1664/0x2be0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766623] [<ffffffffa02bb89d>] ? rtR0MemAllocEx+0x17d/0x250 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766628] [<ffffffffa02ad47c>] ? VBoxDrvLinuxIOCtl_4_2_51+0x10c/0x1f0 [vboxdrv]
May 24 19:09:19 localhost kernel: [11647.766631] [<ffffffff811b17e7>] ? do_vfs_ioctl+0x97/0x580
May 24 19:09:19 localhost kernel: [11647.766634] [<ffffffff812a157a>] ? inode_has_perm.isra.32.constprop.62+0x2a/0x30
May 24 19:09:19 localhost kernel: [11647.766635] [<ffffffff812a2c07>] ? file_has_perm+0x97/0xb0
May 24 19:09:19 localhost kernel: [11647.766637] [<ffffffff811b1d61>] ? sys_ioctl+0x91/0xb0
May 24 19:09:19 localhost kernel: [11647.766640] [<ffffffff81669f59>] ? system_call_fastpath+0x16/0x1b
May 24 19:09:19 localhost kernel: [11647.766641] Mem-Info:
May 24 19:09:19 localhost kernel: [11647.766642] Node 0 DMA per-cpu:
May 24 19:09:19 localhost kernel: [11647.766643] CPU 0: hi: 0, btch: 1 usd: 0
May 24 19:09:19 localhost kernel: [11647.766644] CPU 1: hi: 0, btch: 1 usd: 0
May 24 19:09:19 localhost kernel: [11647.766645] CPU 2: hi: 0, btch: 1 usd: 0
May 24 19:09:19 localhost kernel: [11647.766646] CPU 3: hi: 0, btch: 1 usd: 0
May 24 19:09:19 localhost kernel: [11647.766646] Node 0 DMA32 per-cpu:
May 24 19:09:19 localhost kernel: [11647.766648] CPU 0: hi: 186, btch: 31 usd: 0
May 24 19:09:19 localhost kernel: [11647.766648] CPU 1: hi: 186, btch: 31 usd: 0
May 24 19:09:19 localhost kernel: [11647.766649] CPU 2: hi: 186, btch: 31 usd: 0
May 24 19:09:19 localhost kernel: [11647.766650] CPU 3: hi: 186, btch: 31 usd: 0
May 24 19:09:19 localhost kernel: [11647.766650] Node 0 Normal per-cpu:
May 24 19:09:19 localhost kernel: [11647.766651] CPU 0: hi: 186, btch: 31 usd: 0
May 24 19:09:19 localhost kernel: [11647.766652] CPU 1: hi: 186, btch: 31 usd: 0
May 24 19:09:19 localhost kernel: [11647.766653] CPU 2: hi: 186, btch: 31 usd: 0
May 24 19:09:19 localhost kernel: [11647.766653] CPU 3: hi: 186, btch: 31 usd: 0
May 24 19:09:19 localhost kernel: [11647.766656] active_anon:194178 inactive_anon:4307 isolated_anon:0
May 24 19:09:19 localhost kernel: [11647.766656] active_file:378144 inactive_file:835082 isolated_file:0
May 24 19:09:19 localhost kernel: [11647.766656] unevictable:879 dirty:29 writeback:0 unstable:0
May 24 19:09:19 localhost kernel: [11647.766656] free:58654 slab_reclaimable:32788 slab_unreclaimable:31894
May 24 19:09:19 localhost kernel: [11647.766656] mapped:1056803 shmem:6914 pagetables:11294 bounce:0
May 24 19:09:19 localhost kernel: [11647.766656] free_cma:0
May 24 19:09:19 localhost kernel: [11647.766658] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
May 24 19:09:19 localhost kernel: [11647.766660] lowmem_reserve[]: 0 3436 15947 15947
May 24 19:09:19 localhost kernel: [11647.766662] Node 0 DMA32 free:64824kB min:14548kB low:18184kB high:21820kB active_anon:7812kB inactive_anon:0kB active_file:116kB inactive_file:96kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3631648kB managed:3518864kB mlocked:0kB dirty:0kB writeback:0kB mapped:8356kB shmem:4kB slab_reclaimable:376kB slab_unreclaimable:3972kB kernel_stack:48kB pagetables:1200kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
May 24 19:09:19 localhost kernel: [11647.766665] lowmem_reserve[]: 0 0 12510 12510
May 24 19:09:19 localhost kernel: [11647.766667] Node 0 Normal free:153900kB min:52968kB low:66208kB high:79452kB active_anon:768900kB inactive_anon:17228kB active_file:1512460kB inactive_file:3340232kB unevictable:3516kB isolated(anon):0kB isolated(file):0kB present:13074432kB managed:12811044kB mlocked:3516kB dirty:116kB writeback:0kB mapped:4218856kB shmem:27652kB slab_reclaimable:130776kB slab_unreclaimable:123596kB kernel_stack:2992kB pagetables:43976kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
May 24 19:09:19 localhost kernel: [11647.766670] lowmem_reserve[]: 0 0 0 0
May 24 19:09:19 localhost kernel: [11647.766671] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15892kB
May 24 19:09:19 localhost kernel: [11647.766677] Node 0 DMA32: 109*4kB (UEM) 87*8kB (UEM) 131*16kB (UEM) 56*32kB (UEM) 83*64kB (UEM) 60*128kB (UM) 33*256kB (UM) 13*512kB (UM) 15*1024kB (UEM) 8*2048kB (UM) 0*4096kB = 64860kB
May 24 19:09:19 localhost kernel: [11647.766684] Node 0 Normal: 5990*4kB (UEM) 2991*8kB (UEM) 1510*16kB (UEM) 783*32kB (EM) 341*64kB (EM) 74*128kB (UM) 30*256kB (UEM) 15*512kB (UEM) 10*1024kB (UM) 0*2048kB 0*4096kB = 154000kB
May 24 19:09:19 localhost kernel: [11647.766691] 1220696 total pagecache pages
May 24 19:09:19 localhost kernel: [11647.766692] 0 pages in swap cache
May 24 19:09:19 localhost kernel: [11647.766693] Swap cache stats: add 0, delete 0, find 0/0
May 24 19:09:19 localhost kernel: [11647.766693] Free swap = 0kB
May 24 19:09:19 localhost kernel: [11647.766694] Total swap = 0kB
May 24 19:09:19 localhost kernel: [11647.795919] 4186111 pages RAM
May 24 19:09:19 localhost kernel: [11647.795922] 2599506 pages reserved
May 24 19:09:19 localhost kernel: [11647.795923] 1370459 pages shared
May 24 19:09:19 localhost kernel: [11647.795923] 1307461 pages non-shared
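A note on reading the first line of this trace: in the Linux page allocator an order-n request asks for 2^n physically contiguous pages, so "order:9" is a request for 512 contiguous pages. The sketch below just does that arithmetic; the 4 kB page size is an assumption (the usual x86_64 default) and the helper name `order_to_kb` is made up.

```shell
#!/bin/sh
# Hypothetical helper: size in kB of a buddy-allocator request of a given order.
# $1 = allocation order, $2 = page size in kB (assumed 4 if omitted)
order_to_kb() {
    echo $(( (1 << $1) * ${2:-4} ))
}

# Example:
#   order_to_kb 9
```

Under that assumption the request is for one contiguous 2048 kB block, and the free-list lines in the same log show "0*2048kB 0*4096kB" for the Normal zone. That would be consistent with a fragmentation-related allocation failure rather than the paging Oops this ticket is about, but this is only a reading of the log, not a confirmed diagnosis.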
follow-ups: 33 43 comment:30 by , 12 years ago
Replying to sl4mmy:
I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).
Test Build (May 22)
Linux 64 Host: http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run
Extension pack: http://www.virtualbox.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack
Can you upload a coredump of the guest + guest log file: https://www.virtualbox.org/wiki/Core_dump
comment:31 by , 12 years ago
Replying to sergiomb:
Replying to wenns:
Oops. The above is not quite right: I just experienced the same issue with a 32-bit guest (Windows 7) with hardware acceleration enabled. I will disable it and recheck now.
Can you specify what you disabled? Graphics?
I disabled VT-x/AMD-V, and now it works reliably. I'll post an overview in a couple of minutes.
FYI, I found on one machine with a Linux host that a Linux guest crashes and X won't start when it loads /usr/lib/modules/*/extra/VirtualBox/vboxvideo.ko. If I remove the module before launching X, everything works; it just disables "seamless mode".
by , 12 years ago
Attachment: | host_uname added |
---|
by , 12 years ago
Attachment: | host_dmesg added |
---|
by , 12 years ago
Attachment: | host_cpuinfo added |
---|
by , 12 years ago
Attachment: | host_lsmod added |
---|
by , 12 years ago
Attachment: | host_meminfo added |
---|
by , 12 years ago
Attachment: | host_vb_version added |
---|
comment:32 by , 12 years ago
I'm glad there are people who care about this issue and are interested in the details, so here they are. In short: I'm able to trigger this issue reliably under the following conditions:
- A guest (the OS doesn't seem to matter) with VT-x/AMD-V enabled is running on
- an Intel Xeon X5670 @ 2.93GHz with 64-bit Ubuntu Server on top.
I tried a couple of Linux systems and Windows 7 (64-bit and 32-bit) as guests; all behave the same. I *didn't* try another host OS.
See attachments for further details on the host platform.
follow-ups: 35 39 comment:33 by , 12 years ago
Replying to quickbooks:
Replying to sl4mmy:
I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).
Test Build (May 22)
Linux 64 Host: http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run
Extension pack: http://www.virtualbox.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack
Can you upload a coredump of the guest + guest log file: https://www.virtualbox.org/wiki/Core_dump
I have a core dump now, but it's quite big (350 MB gzipped). How can I pass it to you? A file that big cannot be attached to this thread.
comment:34 by , 12 years ago
Replying to timemaster:
p5n ?
wenns ? detail please
CPU: Dual Xeon E5506
MB: Intel S5500BC
OS: ArchLinux
comment:35 by , 12 years ago
follow-ups: 41 44 comment:36 by , 12 years ago
wenns and others, I'm also interested in another set of data: When this happens, please attach the VBox.log file from the VM session you are currently running together with the output of 'dmesg' from the host. I need both files from the same time for investigation. Thank you!
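One way to make sure both files really come from the same moment is to grab them in a single step right after the crash. A minimal sketch, assuming the path to the session's VBox.log is known; the helper name `collect_crash_logs` and the output file names are made up, and `dmesg` may need root on hosts that restrict it.

```shell
#!/bin/sh
# Hypothetical helper: copy the VM's VBox.log and the current host dmesg into
# one directory, stamped with the same time.
# $1 = path to the VM's VBox.log, $2 = output directory (created if missing)
collect_crash_logs() {
    mkdir -p "$2" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    cp "$1" "$2/VBox-$stamp.log" || return 1
    dmesg > "$2/dmesg-$stamp.txt"
}

# Example (both paths are assumptions):
#   collect_crash_logs "$HOME/VirtualBox VMs/MyVM/Logs/VBox.log" /tmp/vbox-crash
```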
comment:37 by , 12 years ago
Ugly workaround (run as root):
ls -1d /sys/devices/system/cpu/cpu?/online | while read -r a; do echo 0 > "$a"; done
Yes, it dramatically slows down your host.
follow-up: 48 comment:38 by , 12 years ago
Actually switching off one of two CPUs helped me.
(I switched off all odd cores: 1,3,5,7)
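For reference, a sketch of the "odd cores only" variant described here. The helper name `offline_odd_cpus` is made up, and the optional directory argument exists only so the loop can be tried against a scratch directory first. Which core numbers belong to which physical CPU differs between machines, so "odd cores = second CPU" is an assumption that should be checked before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: take every odd-numbered CPU offline.
# $1 = sysfs CPU directory (defaults to the real one; writing there needs root)
offline_odd_cpus() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue     # skip if the glob matched nothing
        n=${f%/online}              # strip the trailing /online
        n=${n##*/cpu}               # keep only the CPU number
        if [ $(( n % 2 )) -eq 1 ]; then
            echo 0 > "$f"
        fi
    done
}

# Example (as root):
#   offline_odd_cpus
```

Writing 1 to the same files brings the cores back online.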
comment:39 by , 12 years ago
Replying to wenns:
Replying to quickbooks:
Replying to sl4mmy:
I can reliably reproduce the issue with only a single guest running. Also, the test build you linked to still suffers from the same problem. I will post a complete dmesg from my test using that test build (4.2.51).
Test Build (May 22)
Linux 64 Host: http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run
Extension pack: http://www.virtualbox.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack
Can you upload a coredump of the guest + guest log file: https://www.virtualbox.org/wiki/Core_dump
I have a core dump now, but it's quite big (350 MB gzipped). How can I pass it to you? A file that big cannot be attached to this thread.
Upload it to ftp://ftp.oracle.com/appsdev/incoming together with the log file, and then just post the file name here.
That way only Oracle developers can look at the core dump, as core dumps sometimes contain sensitive information.
You will probably need FTP upload software such as FileZilla or gFTP.
comment:40 by , 12 years ago
Hi, I just checked and this is not my bug: I disabled all CPU acceleration, and my laptop still hangs when resuming a VM. Sometimes it seems that my 6 GB of swap is not enough. If you know of other tickets that may address my problem, I would be grateful if you pointed me to them.
Thanks,
comment:41 by , 12 years ago
Replying to frank:
wenns and others, I'm also interested in another set of data: When this happens, please attach the VBox.log file from the VM session you are currently running together with the output of 'dmesg' from the host. I need both files from the same time for investigation. Thank you!
Here they are: see attachments, file vb_crash_dataset.tar.gz
by , 12 years ago
Attachment: | vb_crash_dataset.tar.gz added |
---|
follow-up: 45 comment:42 by , 12 years ago
Thanks wenns! We now see where it crashes but don't know yet why it crashes. Did you ever run an older kernel on your Xeon box with the same setup, so can you confirm that this is a Linux 3.8 regression? Or did you see the same crashes with older Linux kernels?
comment:43 by , 12 years ago
Hi, quickbooks-
Replying to quickbooks:
Test Build (May 22)
Linux 64 Host: http://www.virtualbox.org/download/testcase/VirtualBox-4.2.51-85953-Linux_amd64.run
Extension pack: http://www.virtualbox.org/download/testcase/Oracle_VM_VirtualBox_Extension_Pack-4.2.51-85953.vbox-extpack
Can you upload a coredump of the guest + guest log file: https://www.virtualbox.org/wiki/Core_dump
I was able to reproduce the problem with the 85953 build. I uploaded a tarball with logs and coredumps named sl4mmy-virtualbox-4.2.51-linux-3.9.4-oops.tar.gz to the FTP site.
by , 12 years ago
Attachment: | virtualbox-4.2.51-linux-3.9.4-oops.txt added |
---|
Full system log of crash with VirtualBox-4.2.51-85953 and Linux 3.9.4
by , 12 years ago
Attachment: | VBoxSVC.log added |
---|
VirtualBox service log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4
comment:44 by , 12 years ago
Hi, frank-
Replying to frank:
wenns and others, I'm also interested in another set of data: When this happens, please attach the VBox.log file from the VM session you are currently running together with the output of 'dmesg' from the host. I need both files from the same time for investigation. Thank you!
I uploaded a tarball named sl4mmy-virtualbox-4.2.51-linux-3.9.4-oops.tar.gz to the FTP site that includes both logs plus coredumps of VirtualBox, VBoxSVC and VBoxXPCOMIPCD. I also attached both log files separately to this ticket:
comment:45 by , 12 years ago
Hi, frank-
Replying to frank:
Thanks wenns! We now see where it crashes but don't know yet why it crashes. Did you ever run an older kernel on your Xeon box with the same setup, so can you confirm that this is a Linux 3.8 regression? Or did you see the same crashes with older Linux kernels?
I ran VirtualBox on this workstation without problems since October 2012. The last working version for me was VirtualBox 4.2.8 with Linux 3.7.10.
Unfortunately, the official VirtualBox 4.2.10+ packages for Arch require Linux 3.8+ so I can't easily test VirtualBox 4.2.12 with Linux 3.7.10. It also makes it difficult to identify the regression: was the problem introduced in VirtualBox 4.2.10 or Linux 3.8?
follow-up: 47 comment:46 by , 12 years ago
sl4mmy, thanks for the logs. But one log is missing: The VBox.log file from the VM. You provided VBoxSVC.log which is from the VBoxSVC server. The VBox.log file can be found either from the VM selector window / Machine / Show Log ... or can also be found in the VM configuration directory under Logs.
by , 12 years ago
Attachment: | sl4mmy-virtualbox-4.2.51-linux-3.9.4-vbox.log added |
---|
VBox.log from crash with VirtualBox 4.2.51-85953 and Linux 3.9.4
comment:47 by , 12 years ago
Hi, frank-
Replying to frank:
sl4mmy, thanks for the logs. But one log is missing: The VBox.log file from the VM. You provided VBoxSVC.log which is from the VBoxSVC server. The VBox.log file can be found either from the VM selector window / Machine / Show Log ... or can also be found in the VM configuration directory under Logs.
D'oh! Sorry... I've just attached the VBox.log from the same session yesterday as the other log files.
comment:48 by , 12 years ago
Hi, p5n-
Replying to p5n:
Actually switching off one of two CPUs helped me.
(I switched off all odd cores: 1,3,5,7)
Wow, that's a really interesting observation! I've been able to work around the issue on my machine by doing the same, thanks!
comment:49 by , 12 years ago
Howdy-
Thanks to p5n's observations (https://www.virtualbox.org/ticket/11610#comment:37 and https://www.virtualbox.org/ticket/11610#comment:38) I came up with a work-around that doesn't require disabling hardware virtualization acceleration:
$ numactl --cpunodebind=0 --localalloc -- /opt/VirtualBox/VirtualBox
First of all, here is what the numa topology of my workstation looks like:
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 6143 MB
node 0 free: 340 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 6127 MB
node 1 free: 154 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
So with this work-around VirtualBox can only run on the CPUs of node 0, and all of the memory used by VirtualBox should be allocated on the same node running the process.
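For anyone who wants to script this, here is a minimal sketch in POSIX shell. It assumes the `numactl --hardware` output format shown above. The function names `cpus_of_node` and `run_on_node0` are made up for illustration and are not part of numactl or VirtualBox; only the `numactl` flags are taken from the command in this comment.

```sh
# Print the CPU list of one NUMA node, reading `numactl --hardware`
# output on stdin. Returns non-zero if the node is not listed.
# (cpus_of_node is a hypothetical helper name.)
cpus_of_node() {
    wanted=$1
    while read -r word node field cpus; do
        if [ "$word" = "node" ] && [ "$node" = "$wanted" ] && [ "$field" = "cpus:" ]; then
            echo "$cpus"
            return 0
        fi
    done
    return 1
}

# Run any command restricted to the CPUs and memory of node 0,
# using the same numactl flags as the work-around above.
# (run_on_node0 is a hypothetical wrapper name.)
run_on_node0() {
    numactl --cpunodebind=0 --localalloc -- "$@"
}

# Example usage (not executed here):
#   numactl --hardware | cpus_of_node 0
#   run_on_node0 /opt/VirtualBox/VirtualBox
```

The parser only looks at the `node N cpus:` lines, so the size, free and distance lines are ignored.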
By the way, this is with the VirtualBox 4.2.51-85953 test build frank and others linked to, and Linux 3.9.4.
Interestingly, when I first tried playing with numactl after reading p5n's comments I tried binding to the CPUs on node 1, not node 0, but I encountered the same kernel oops. I tried putzing with a few more options to numactl but to no avail. Before giving up, however, I decided to try binding to node 0 instead, and sure enough it appears to work!
What is it about node 0 that is special? Is it in any way related to the fact that node 0 is the initial boot node?
I tested with a 32-bit Windows guest with 2 CPUs and a 64-bit RHEL 6.3 guest with 2 CPUs. I even tested with both running simultaneously, watching YouTube videos in the Windows guest while running some builds in the RHEL guest. :) Zero kernel Oops so far...
Yay! Big thanks to p5n!
follow-up: 51 comment:50 by , 12 years ago
Thanks sl4mmy, also for your additional log. This helps further...
comment:51 by , 12 years ago
Hi, frank-
Replying to frank:
Thanks sl4mmy, also for your additional log. This helps further...
Sure, no problem!
Also, I can confirm that the work-around also works with the official Arch packages for VirtualBox 4.2.12 (virtualbox-4.2.12-3 and virtualbox-host-modules-4.2.12-6) on Linux 3.9.4.
comment:52 by , 12 years ago
I think the reason for this problem is CONFIG_NUMA_BALANCING, which was introduced in Linux 3.8. I'm currently looking for a way to prevent pages from being migrated between NUMA nodes, probably by setting a VM area flag...
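To see whether a host kernel was built with this option, something like the following sketch may help. It reads a kernel config on stdin. The helper name `has_numa_balancing` is hypothetical, and where the config lives is an assumption that varies by distribution: many ship `/boot/config-$(uname -r)`, and some kernels expose `/proc/config.gz` instead.

```sh
# Succeed if the kernel config on stdin enables CONFIG_NUMA_BALANCING.
# Matches the exact line only, so "# CONFIG_NUMA_BALANCING is not set"
# and CONFIG_NUMA_BALANCING_DEFAULT_ENABLED do not count.
# (has_numa_balancing is a hypothetical helper name.)
has_numa_balancing() {
    while read -r line; do
        case $line in
            CONFIG_NUMA_BALANCING=y) return 0 ;;
        esac
    done
    return 1
}

# Example usage (not executed here; paths are assumptions):
#   has_numa_balancing < "/boot/config-$(uname -r)" && echo "NUMA balancing built in"
#   zcat /proc/config.gz | has_numa_balancing && echo "NUMA balancing built in"
```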
comment:54 by , 12 years ago
Just an update: We know what's wrong but it will be difficult to fix. Actually, we are over-stretching the Linux kernel API a bit. We plan a workaround for 4.2.x and a better fix for the next major release. As written above, this problem affects only people who have more than one NUMA node in their system (see the output of numactl --hardware).
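A quick way to check whether a host falls into the affected group is to look at the node count that `numactl --hardware` reports. The sketch below assumes the first field of interest is the `available: N nodes (...)` line, as in the output posted earlier in this ticket. Both function names are hypothetical.

```sh
# Read `numactl --hardware` output on stdin and print the node count
# from the "available: N nodes (...)" line.
# (count_numa_nodes is a hypothetical helper name.)
count_numa_nodes() {
    while read -r key value rest; do
        if [ "$key" = "available:" ]; then
            echo "$value"
            return 0
        fi
    done
    return 1
}

# Succeed only if the output on stdin reports more than one NUMA node.
# (has_multiple_numa_nodes is a hypothetical helper name.)
has_multiple_numa_nodes() {
    nodes=$(count_numa_nodes) || return 1
    [ "$nodes" -gt 1 ]
}

# Example usage (not executed here):
#   numactl --hardware | has_multiple_numa_nodes && echo "more than one NUMA node"
```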
follow-up: 56 comment:55 by , 12 years ago
The following patch will be included in the next maintenance release (expected very soon). To fix the problem now, go to /usr/src/vboxhost-4.2.12/vboxdrv/r0drv/linux and apply the patch below manually. Then make sure that all VMs are terminated and recompile the host kernel driver (/etc/init.d/vboxdrv setup), and that's it. Or just wait a bit for the release.
This is actually a workaround but we cannot do a more fundamental fix. The simple fix will require a Linux kernel change, the difficult fix will require many many code changes in VBox so this will have to wait.
--- memobj-r0drv-linux.c (revision 86600)
+++ memobj-r0drv-linux.c (revision 86601)
@@ -1527,6 +1527,21 @@
             }
         }
 
+#ifdef CONFIG_NUMA_BALANCING
+        if (RT_SUCCESS(rc))
+        {
+            /** @todo Ugly hack! But right now we have no other means to disable
+             *        automatic NUMA page balancing. */
+# ifdef RT_OS_X86
+            pTask->mm->numa_next_reset = jiffies + 0x7fffffffUL;
+            pTask->mm->numa_next_scan  = jiffies + 0x7fffffffUL;
+# else
+            pTask->mm->numa_next_reset = jiffies + 0x7fffffffffffffffUL;
+            pTask->mm->numa_next_scan  = jiffies + 0x7fffffffffffffffUL;
+# endif
+        }
+#endif
+
         up_write(&pTask->mm->mmap_sem);
 
         if (RT_SUCCESS(rc))
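A back-of-the-envelope check of what the constants in the patch mean in practice. One reading of the patch (an assumption, not something the developers state here) is that it pushes the next NUMA scan as far into the future as the kernel's signed jiffies comparisons allow, which is half the range of an unsigned long. How long that is in wall-clock time depends on the kernel tick rate HZ, a build-time setting; the HZ values below are just common examples, and `jiffies_to_days` is a hypothetical helper.

```sh
# Convert a jiffies offset to whole days for a given tick rate (HZ).
# Integer division, so the result is rounded down.
# (jiffies_to_days is a hypothetical helper, not a kernel interface.)
jiffies_to_days() {
    echo $(( $1 / $2 / 86400 ))
}

jiffies_to_days 0x7fffffff 1000   # 32-bit constant at HZ=1000, prints 24
jiffies_to_days 0x7fffffff 250    # 32-bit constant at HZ=250, prints 99
```

So with the 32-bit constant the scan is postponed for weeks to months rather than forever, which fits the patch's own description of itself as a hack. The 64-bit constant gives a delay that is astronomically long at any realistic HZ.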
comment:56 by , 12 years ago
Replying to frank:
This is actually a workaround but we cannot do a more fundamental fix. The simple fix will require a Linux kernel change, the difficult fix will require many many code changes in VBox so this will have to wait.
Could you please post a trunk build for 64-bit Linux? Thanks.
comment:59 by , 12 years ago
Hi, Frank-
I can confirm that the problem no longer occurs on my host system with VirtualBox 4.2.16. Thanks!
comment:60 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Hi, sl4mmy, thanks for the feedback and thanks again for helping to debug this problem. I will close this ticket. A better fix is required but this one will do for the moment.
This problem started after I upgraded to kernel 3.8.x. 3.7.x functions properly.