#1840 closed defect (fixed)
VBoxSVC core dumps on post snv_94 opensolaris
Reported by: | jkeil | Owned by: | |
---|---|---|---|
Component: | VM control | Version: | VirtualBox 1.6.2 |
Keywords: | | Cc: | |
Guest type: | other | Host type: | Solaris |
Description
I'm using SXCE build 91, bfu'ed to post snv_94 opensolaris (2008-07-11).
Trying to start virtualbox fails:
```
% virtualbox
ERROR: 0 bytes read from child process
```
The VBoxSVC process crashes with a SIGSEGV:
dmesg:
```
Jul 14 00:18:01 max genunix: [ID 603404 kern.notice] NOTICE: core_log: VBoxSVC[869] core dumped: /cores/VBoxSVC-869
```
```
# pflags /cores/VBoxSVC-869
core '/cores/VBoxSVC-869' of 869: /opt/VirtualBox//VBoxSVC --automate
        data model = _ILP32  flags = ORPHAN|MSACCT|MSFORK
 /1:    flags = 0
        sigmask = 0xffffbefc,0x0000ffff  cursig = SIGSEGV
# pstack /cores/VBoxSVC-869
core '/cores/VBoxSVC-869' of 869: /opt/VirtualBox//VBoxSVC --automate
 afc83d14 PR_EnterMonitor (0) + 23
 af41133a _PR_InitLinker (af57b000, 80418f8, af56493c, 0, 1, 0) + 3e
 af41a4e1 PR_Init (0, 1, 0) + 195
 af56493c prldap_nspr_init (afffc7dc, af5a0a00, 9, 8041944, affd38fc, afffc178) + 74
 af5658e9 _init (afffc178, af7e0948, afffc7dc, 804196c, affd671e, af156e5c) + 25
 affd38fc call_init (af1a0ae8, 1) + f8
 affd3e9f load_completion (af7e0948) + ef
 affd90f6 dlsym_intn (af7e0e48, afc9c94a, afb70018, 8041a24) + 19a
 affd9172 dlsym_check (af7e0e48, afc9c94a, afb70018, 8041a24) + 6e
 affd91ea dlsym (af7e0e48, afc9c94a, af7e0948, afb9ee94, afbf16d4, afffc350) + 4e
 afc6d758 pr_FindSymbolInProg (afc9c94a, affc7ed4, afb70018, 2f08) + 38
 afc6d790 _PR_InitZones (afc84b79, afcb43e4, 8041aa8, afc7254b, 8041ab8, afb70018) + 21
 afc723b7 _PR_InitStuff (8041ab8, afb70018, 8041ab8, afc84b79, 8041ad8, afcb43e4) + 2d
 afc7254b _PR_ImplicitInitialization (8041ad8, afcb43e4, 8041ad8, afc50d32, afc50e22, afcf86c8) + b
 afc84b79 PR_GetCurrentThread (afc50e22, afcf86c8, 0, afcf86c8, 8041af8, afcb43e4) + 23
 afc50d32 _ZN9nsIThread10GetCurrentEPPS_ (afcf86c8, 0, 0, afc50dfe) + 1e
 afc50e22 _ZN9nsIThread13SetMainThreadEv (af722a00, 8041b20, af946f74, 3, 8041b2c, affd003d) + 30
 afc891a0 NS_InitXPCOM2 (8046cf0, 8278980, 82771a8, 0) + 30
 0818c585 _ZN3com10InitializeEv (0, 0, 8273a18, 1) + 4c1
 08179c8f main (2, 8046e80, 8046e8c) + 20b
 080b26e8 _start (2, 8047020, 8047039, 0, 8047044, 8047064) + 80
```
In mdb we see that libnspr4.so`_PR_InitLinker+0x39 calls PR_EnterMonitor, and that call was resolved to "VBoxXPCOM.so`PR_EnterMonitor". There is also a "libnspr4.so`PR_EnterMonitor" symbol, and presumably the intended target is libnspr4.so`PR_EnterMonitor.
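The observed binding can be pictured with a deliberately simplified model of a default (non-direct) symbol lookup, in which objects are searched in load order and the first definition wins. The `loaded_object` type and `resolve_first()` are hypothetical illustration names, not ld.so.1 internals, and the real Solaris runtime linker does considerably more (direct bindings, lazy loading, symbol scopes). The symbol names come from the pstack above; the load order follows the snv_95 pldd listing in this report.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a mapped object: its name plus the global
 * symbols it exports. Not a real ld.so.1 data structure. */
struct loaded_object {
    const char *name;
    const char *const *defines; /* NULL-terminated list of symbol names */
};

/* Default (non-direct) lookup in this model: walk the objects in load
 * order and return the name of the first one defining `symbol`, or NULL. */
static const char *resolve_first(const struct loaded_object *objs, size_t n,
                                 const char *symbol)
{
    for (size_t i = 0; i < n; i++) {
        for (const char *const *s = objs[i].defines; *s != NULL; s++) {
            if (strcmp(*s, symbol) == 0)
                return objs[i].name;
        }
    }
    return NULL;
}

/* VBoxXPCOM.so is mapped long before libnspr4.so, and both export
 * PR_EnterMonitor. The executable contributes nothing relevant here. */
static const char *const vboxsvc_syms[]   = { NULL };
static const char *const vboxxpcom_syms[] = { "NS_InitXPCOM2", "PR_EnterMonitor", NULL };
static const char *const libnspr4_syms[]  = { "_PR_InitLinker", "PR_EnterMonitor", NULL };

static const struct loaded_object snv95_order[] = {
    { "VBoxSVC",      vboxsvc_syms },
    { "VBoxXPCOM.so", vboxxpcom_syms },
    { "libnspr4.so",  libnspr4_syms },
};
static const size_t snv95_count = sizeof snv95_order / sizeof snv95_order[0];
```

In this model, looking up PR_EnterMonitor yields VBoxXPCOM.so even though the caller lives in libnspr4.so, which matches what mdb shows.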
After booting an unmodified snv_91, virtualbox runs just fine. I also tried an OpenSolaris installation using the "matrix-unstable" kernel (based on snv_93), and virtualbox starts OK there, too. So it seems that some onnv-gate change between snv_93 and snv_95 (2008-07-11) broke virtualbox.
On matrix-unstable, VBoxSVC has these shared libraries loaded (note: libnspr4.so is not among them):
```
# pldd core.941
core 'core.941' of 941: /opt/VirtualBox//VBoxSVC --automate
/lib/amd64/libadm.so.1
/lib/amd64/libdevinfo.so.1
/opt/VirtualBox/VBoxDDU.so
/opt/VirtualBox/VBoxSettings.so
/opt/VirtualBox/VBoxRT.so
/lib/amd64/librt.so.1
/opt/VirtualBox/VBoxXPCOM.so
/usr/sfw/lib/amd64/libstdc++.so.6.0.3
/lib/amd64/libm.so.2
/usr/sfw/lib/amd64/libgcc_s.so.1
/lib/amd64/libc.so.1
/lib/amd64/libsocket.so.1
/lib/amd64/libz.so.1
/lib/amd64/libsendfile.so.1
/lib/amd64/libnvpair.so.1
/lib/amd64/libnsl.so.1
/usr/lib/locale/de_DE.ISO8859-1/amd64/de_DE.ISO8859-1.so.3
/lib/amd64/libsec.so.1
/lib/amd64/libgen.so.1
/lib/amd64/libmp.so.2
/lib/amd64/libmd.so.1
/lib/amd64/libscf.so.1
/lib/amd64/libavl.so.1
/lib/amd64/libuutil.so.1
/opt/VirtualBox/components/VBoxXPCOMIPCC.so
```
On snv_95 the list is longer (libnspr4.so is now loaded, and it defines many symbols that are already present in VBoxXPCOM.so):
```
# pldd /cores/VBoxSVC-869
core '/cores/VBoxSVC-869' of 869: /opt/VirtualBox//VBoxSVC --automate
/lib/libadm.so.1
/lib/libdevinfo.so.1
/opt/VirtualBox/VBoxDDU.so
/opt/VirtualBox/VBoxSettings.so
/opt/VirtualBox/VBoxRT.so
/lib/librt.so.1
/opt/VirtualBox/VBoxXPCOM.so
/usr/sfw/lib/libstdc++.so.6.0.3
/lib/libm.so.2
/usr/sfw/lib/libgcc_s.so.1
/lib/libc.so.1
/lib/libsocket.so.1
/lib/libz.so.1
/lib/libsendfile.so.1
/lib/libnvpair.so.1
/lib/libnsl.so.1
/usr/lib/locale/de_DE.ISO8859-1/de_DE.ISO8859-1.so.3
/lib/libsec.so.1
/lib/libgen.so.1
/lib/libmp.so.2
/lib/libmd.so.1
/lib/libscf.so.1
/lib/libavl.so.1
/usr/lib/libidmap.so.1
/lib/libuutil.so.1
/usr/lib/libldap.so.5
/lib/libresolv.so.2
/usr/lib/libsldap.so.1
/usr/lib/libsasl.so.1
/usr/lib/mps/libnspr4.so  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
/lib/libpthread.so.1
/lib/libthread.so.1
/lib/libdl.so.1
/usr/lib/mps/libplc4.so
/usr/lib/mps/libnss3.so
/usr/lib/mps/libsoftokn3.so
/usr/lib/mps/libplds4.so
/lib/libbsm.so.1
/usr/lib/mps/libssl3.so
/lib/libsecdb.so.1
/lib/libtsol.so.2
```
libnspr4.so appears to be pulled in by /usr/lib/libldap.so.5 as a lazy-loaded dependency.
Workaround:
===========
Start VBoxSVC with environment variable LD_NODIRECT=1:
```
# cd /opt/VirtualBox
# mv VBoxSVC VBoxSVC.real
# cat > VBoxSVC <<'EOF'
#!/bin/sh
LD_NODIRECT=1
export LD_NODIRECT
exec /opt/VirtualBox/VBoxSVC.real "$@"
EOF
# chmod +x VBoxSVC
```
Change History (7)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
The root cause appears to be changeset 7057, "PSARC/2008/342 Further SID support". When I back out that changeset, the original VBoxSVC no longer crashes on startup.
CS 7057 added a new dependency to libsec.so: libidmap.so.1.
libidmap.so.1 in turn needs libldap.so.5, which needs libnspr4.so.
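The chain in this comment is a reachability question on the dependency graph. A minimal sketch, assuming a hand-written edge list taken from the comment (the names `dep_edge` and `find_dep_path` are hypothetical and not part of any Solaris tool), shows that dropping the single edge added by CS 7057 makes libnspr4.so unreachable from libsec.so.1:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One "needs" edge of a dependency graph (hypothetical model). */
struct dep_edge {
    const char *from;
    const char *to;
};

/* Edges from the comment above; CS 7057 added the first one. */
static const struct dep_edge snv95_deps[] = {
    { "libsec.so.1",   "libidmap.so.1" },
    { "libidmap.so.1", "libldap.so.5"  },
    { "libldap.so.5",  "libnspr4.so"   },
};

#define MAX_DEPTH 16

/* Depth-first search from `from` to `to`. The caller sets *len = 0.
 * On success path[0..*len-1] holds the chain and true is returned.
 * Assumes an acyclic graph (no visited set is kept). */
static bool find_dep_path(const struct dep_edge *edges, size_t n_edges,
                          const char *from, const char *to,
                          const char **path, size_t *len)
{
    if (*len >= MAX_DEPTH)
        return false;
    path[(*len)++] = from;
    if (strcmp(from, to) == 0)
        return true;
    for (size_t i = 0; i < n_edges; i++) {
        if (strcmp(edges[i].from, from) == 0 &&
            find_dep_path(edges, n_edges, edges[i].to, to, path, len))
            return true;
    }
    (*len)--; /* backtrack: `from` is not on a path to `to` */
    return false;
}
```

Searching all three edges gives libsec.so.1, libidmap.so.1, libldap.so.5, libnspr4.so; searching only the last two edges (the pre-7057 graph) finds no path.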
comment:3 by , 16 years ago
I've filed:
6733880 6677411/6677801/PSARC/2008/342 broke VirtualBox by polluting its namespace
to track this in OpenSolaris.
comment:4 by , 16 years ago
Why does VBoxXPCOM.so have symbols with the same names as in NSPR?
Is it just an accident? Or is VBoxXPCOM.so using a private copy of NSPR?
If the former, can the conflicting symbols be renamed? If the latter, can it be modified to use the NSPR delivered by Solaris?
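The first option raised here, renaming the conflicting symbols, is often done with a header that maps every public name to a prefixed one at compile time. A minimal sketch, assuming a hypothetical prefix `MyApp_` and a toy monitor type (this is neither VirtualBox nor NSPR code, and the ticket does not say which approach the eventual fix took):

```c
#include <string.h>

/* Hypothetical private header: map the NSPR entry point to a prefixed
 * name so a bundled copy exports a symbol that cannot clash with the
 * system libnspr4.so. "MyApp_" is an assumed prefix. */
#define PR_EnterMonitor MyApp_PR_EnterMonitor

#define STR_(x) #x
#define STR(x)  STR_(x)

/* Toy stand-in for a monitor; not NSPR's real PRMonitor. */
typedef struct toy_monitor {
    int depth;
} toy_monitor;

/* Source keeps using the familiar name; after preprocessing this
 * defines (and the object file exports) MyApp_PR_EnterMonitor. */
void PR_EnterMonitor(toy_monitor *mon)
{
    mon->depth++;
}

/* Name the linker actually sees for PR_EnterMonitor in this build. */
static const char *exported_enter_monitor_name(void)
{
    return STR(PR_EnterMonitor);
}
```

Because the rename happens in the preprocessor, callers inside the bundled copy need no source changes, and a system libnspr4.so loaded into the same process no longer shares any of these names.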
comment:5 by , 16 years ago
We've discussed this at Sun and have concluded that the best solution is for VirtualBox to use the copy of NSPR that is delivered by Solaris. We do not support multiple instances of NSPR in one process.
comment:6 by , 16 years ago
The fix for this has been integrated in what will be VirtualBox 1.6.6, so that version won't have this weird linking issue any more.
comment:7 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
An alternative workaround is to LD_PRELOAD into VBoxSVC a shared library that defines the global variable "nspr_use_zone_allocator":
```
% cat nspr_alloc.c
int nspr_use_zone_allocator = 0;
% cc -m64 -G -h nspr_alloc.so -o nspr_alloc.so nspr_alloc.c
% /opt/VirtualBox/VBoxSVC
Segmentation Fault (core dumped)
% env LD_PRELOAD=/tmp/nspr_alloc.so /opt/VirtualBox/VBoxSVC
* Sun xVM VirtualBox XPCOM Server Version 1.6.2
(C) 2008 Sun Microsystems, Inc.
All rights reserved.

Starting event loop.... [press Ctrl-C to quit]
```
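The pr_FindSymbolInProg("nspr_use_zone_allocator") call from _PR_InitZones() seems to force all lazy-loaded shared libraries (including libnspr4.so) into memory, which breaks VBoxSVC. Presumably, when the preloaded library already defines the variable, the lookup succeeds without loading any lazy dependency, so libnspr4.so is never pulled in.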
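This effect can be reproduced in a small model of the lookup, assuming (as this comment suggests) that a failed search through the already-loaded objects makes the runtime linker load every pending lazy dependency and search again. The `model_obj` type and `lookup_default()` are hypothetical; the real ld.so.1 also runs each newly loaded object's _init, which per the pstack is where libldap initializes libnspr4.so and the crash occurs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model object: name, exported symbols, and whether it is
 * already mapped or still a pending lazy dependency. */
struct model_obj {
    const char *name;
    const char *const *defines; /* NULL-terminated */
    bool loaded;
};

static bool obj_defines(const struct model_obj *o, const char *sym)
{
    for (const char *const *s = o->defines; *s != NULL; s++)
        if (strcmp(*s, sym) == 0)
            return true;
    return false;
}

/* Search the loaded objects in order; only if that fails, force every
 * pending lazy dependency in and search again. Returns the defining
 * object's name, or NULL if no object defines `sym`. */
static const char *lookup_default(struct model_obj *objs, size_t n,
                                  const char *sym)
{
    for (size_t i = 0; i < n; i++)
        if (objs[i].loaded && obj_defines(&objs[i], sym))
            return objs[i].name;
    for (size_t i = 0; i < n; i++)
        objs[i].loaded = true; /* in ld.so.1 this would also run _init */
    for (size_t i = 0; i < n; i++)
        if (obj_defines(&objs[i], sym))
            return objs[i].name;
    return NULL;
}

/* Symbol sets for the scenario in this ticket. */
static const char *const preload_syms[] = { "nspr_use_zone_allocator", NULL };
static const char *const xpcom_syms[]   = { "PR_EnterMonitor", NULL };
static const char *const nspr4_syms[]   = { "PR_EnterMonitor", NULL };
```

Without the preload, the lookup of nspr_use_zone_allocator fails but leaves libnspr4.so loaded; with nspr_alloc.so first in the search order, the lookup succeeds immediately and libnspr4.so stays unloaded.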