VirtualBox

Opened 16 years ago

Closed 16 years ago

Last modified 15 years ago

#1840 closed defect (fixed)

VBoxSVC core dumps on post snv_94 opensolaris

Reported by: jkeil
Owned by:
Component: VM control
Version: VirtualBox 1.6.2
Keywords:
Cc:
Guest type: other
Host type: Solaris

Description

I'm using SXCE build 91, bfu'ed to post snv_94 opensolaris (2008-07-11).

Trying to start virtualbox fails:


% virtualbox
ERROR: 0 bytes read from child process

The VBoxSVC process crashes with a SIGSEGV:

dmesg:

Jul 14 00:18:01 max genunix: [ID 603404 kern.notice] NOTICE: core_log: VBoxSVC[869] core dumped: /cores/VBoxSVC-869

# pflags /cores/VBoxSVC-869
core '/cores/VBoxSVC-869' of 869:	/opt/VirtualBox//VBoxSVC --automate
	data model = _ILP32  flags = ORPHAN|MSACCT|MSFORK
 /1:	flags = 0
	sigmask = 0xffffbefc,0x0000ffff  cursig = SIGSEGV

# pstack /cores/VBoxSVC-869
core '/cores/VBoxSVC-869' of 869:	/opt/VirtualBox//VBoxSVC --automate
 afc83d14 PR_EnterMonitor (0) + 23
 af41133a _PR_InitLinker (af57b000, 80418f8, af56493c, 0, 1, 0) + 3e
 af41a4e1 PR_Init  (0, 1, 0) + 195
 af56493c prldap_nspr_init (afffc7dc, af5a0a00, 9, 8041944, affd38fc, afffc178) + 74
 af5658e9 _init    (afffc178, af7e0948, afffc7dc, 804196c, affd671e, af156e5c) + 25
 affd38fc call_init (af1a0ae8, 1) + f8
 affd3e9f load_completion (af7e0948) + ef
 affd90f6 dlsym_intn (af7e0e48, afc9c94a, afb70018, 8041a24) + 19a
 affd9172 dlsym_check (af7e0e48, afc9c94a, afb70018, 8041a24) + 6e
 affd91ea dlsym    (af7e0e48, afc9c94a, af7e0948, afb9ee94, afbf16d4, afffc350) + 4e
 afc6d758 pr_FindSymbolInProg (afc9c94a, affc7ed4, afb70018, 2f08) + 38
 afc6d790 _PR_InitZones (afc84b79, afcb43e4, 8041aa8, afc7254b, 8041ab8, afb70018) + 21
 afc723b7 _PR_InitStuff (8041ab8, afb70018, 8041ab8, afc84b79, 8041ad8, afcb43e4) + 2d
 afc7254b _PR_ImplicitInitialization (8041ad8, afcb43e4, 8041ad8, afc50d32, afc50e22, afcf86c8) + b
 afc84b79 PR_GetCurrentThread (afc50e22, afcf86c8, 0, afcf86c8, 8041af8, afcb43e4) + 23
 afc50d32 _ZN9nsIThread10GetCurrentEPPS_ (afcf86c8, 0, 0, afc50dfe) + 1e
 afc50e22 _ZN9nsIThread13SetMainThreadEv (af722a00, 8041b20, af946f74, 3, 8041b2c, affd003d) + 30
 afc891a0 NS_InitXPCOM2 (8046cf0, 8278980, 82771a8, 0) + 30
 0818c585 _ZN3com10InitializeEv (0, 0, 8273a18, 1) + 4c1
 08179c8f main     (2, 8046e80, 8046e8c) + 20b
 080b26e8 _start   (2, 8047020, 8047039, 0, 8047044, 8047064) + 80

In mdb we see that libnspr4.so`_PR_InitLinker+0x39 is calling PR_EnterMonitor, which got resolved as "VBoxXPCOM.so`PR_EnterMonitor". There is also a "libnspr4.so`PR_EnterMonitor" symbol, and most likely the expected behaviour is for libnspr4.so to call its own libnspr4.so`PR_EnterMonitor.
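To check for this kind of name clash outside the debugger, one option is to compare what a global-scope lookup returns for a symbol with what a lookup starting from a specific library returns. The following is a minimal sketch using only the POSIX dlopen/dlsym interface; the function name symbol_is_shadowed is hypothetical, and the sketch does not model Solaris direct bindings, so treat its answer as a hint rather than proof of how a given call site was bound. Note also that it loads the library it inspects, which is itself what triggers the crash described here, so it is meant for a separate test program, not for VBoxSVC.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of VirtualBox or NSPR).
 *
 * Compares the address a global-scope lookup returns for `sym` with the
 * address found when searching from the library at `lib_path`.
 *
 * Returns  1 if the addresses differ (some other object wins the global lookup),
 *          0 if they are the same,
 *         -1 if the library cannot be loaded or the symbol is not found.
 */
int symbol_is_shadowed(const char *lib_path, const char *sym)
{
    void *prog;
    void *lib;
    int result = -1;

    assert(lib_path != NULL && sym != NULL);

    prog = dlopen(NULL, RTLD_LAZY);      /* handle for the program's global scope */
    if (prog == NULL)
        return -1;

    lib = dlopen(lib_path, RTLD_LAZY);   /* loads the library if not yet loaded */
    if (lib != NULL) {
        void *global_addr = dlsym(prog, sym);
        void *lib_addr = dlsym(lib, sym);

        if (global_addr != NULL && lib_addr != NULL)
            result = (global_addr != lib_addr) ? 1 : 0;
        dlclose(lib);
    }
    dlclose(prog);
    return result;
}
```

Called as symbol_is_shadowed("/usr/lib/mps/libnspr4.so", "PR_EnterMonitor") from a program that also links a library with its own PR_EnterMonitor, a result of 1 would suggest the duplicate definition is the one found first.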

After booting an unmodified snv_91, virtualbox runs just fine. I also tried an opensolaris installation using the "matrix-unstable" kernel (based on snv_93), and virtualbox starts fine there, too. So it seems that some onnv-gate change between snv_93 and snv_95 (2008-07-11) broke virtualbox.

On matrix-unstable, VBoxSVC has these shared libraries loaded (note: libnspr4.so is not loaded):

# pldd core.941
core 'core.941' of 941:	/opt/VirtualBox//VBoxSVC --automate
/lib/amd64/libadm.so.1
/lib/amd64/libdevinfo.so.1
/opt/VirtualBox/VBoxDDU.so
/opt/VirtualBox/VBoxSettings.so
/opt/VirtualBox/VBoxRT.so
/lib/amd64/librt.so.1
/opt/VirtualBox/VBoxXPCOM.so
/usr/sfw/lib/amd64/libstdc++.so.6.0.3
/lib/amd64/libm.so.2
/usr/sfw/lib/amd64/libgcc_s.so.1
/lib/amd64/libc.so.1
/lib/amd64/libsocket.so.1
/lib/amd64/libz.so.1
/lib/amd64/libsendfile.so.1
/lib/amd64/libnvpair.so.1
/lib/amd64/libnsl.so.1
/usr/lib/locale/de_DE.ISO8859-1/amd64/de_DE.ISO8859-1.so.3
/lib/amd64/libsec.so.1
/lib/amd64/libgen.so.1
/lib/amd64/libmp.so.2
/lib/amd64/libmd.so.1
/lib/amd64/libscf.so.1
/lib/amd64/libavl.so.1
/lib/amd64/libuutil.so.1
/opt/VirtualBox/components/VBoxXPCOMIPCC.so

snv_95 has these (libnspr4.so is there, and defines lots of symbols that are already present in VBoxXPCOM.so):

# pldd /cores/VBoxSVC-869
core '/cores/VBoxSVC-869' of 869:	/opt/VirtualBox//VBoxSVC --automate
/lib/libadm.so.1
/lib/libdevinfo.so.1
/opt/VirtualBox/VBoxDDU.so
/opt/VirtualBox/VBoxSettings.so
/opt/VirtualBox/VBoxRT.so
/lib/librt.so.1
/opt/VirtualBox/VBoxXPCOM.so
/usr/sfw/lib/libstdc++.so.6.0.3
/lib/libm.so.2
/usr/sfw/lib/libgcc_s.so.1
/lib/libc.so.1
/lib/libsocket.so.1
/lib/libz.so.1
/lib/libsendfile.so.1
/lib/libnvpair.so.1
/lib/libnsl.so.1
/usr/lib/locale/de_DE.ISO8859-1/de_DE.ISO8859-1.so.3
/lib/libsec.so.1
/lib/libgen.so.1
/lib/libmp.so.2
/lib/libmd.so.1
/lib/libscf.so.1
/lib/libavl.so.1
/usr/lib/libidmap.so.1
/lib/libuutil.so.1
/usr/lib/libldap.so.5
/lib/libresolv.so.2
/usr/lib/libsldap.so.1
/usr/lib/libsasl.so.1
/usr/lib/mps/libnspr4.so	<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
/lib/libpthread.so.1
/lib/libthread.so.1
/lib/libdl.so.1
/usr/lib/mps/libplc4.so
/usr/lib/mps/libnss3.so
/usr/lib/mps/libsoftokn3.so
/usr/lib/mps/libplds4.so
/lib/libbsm.so.1
/usr/lib/mps/libssl3.so
/lib/libsecdb.so.1
/lib/libtsol.so.2

libnspr4.so seems to be pulled in by /usr/lib/libldap.so.5, as a lazy-loaded dependency.


Workaround:
===========

Start VBoxSVC with environment variable LD_NODIRECT=1:

# cd /opt/VirtualBox

# mv VBoxSVC VBoxSVC.real

# cat > VBoxSVC
#!/bin/sh

LD_NODIRECT=1
export LD_NODIRECT 

exec /opt/VirtualBox/VBoxSVC.real "$@"

# chmod +x VBoxSVC

Change History (7)

comment:1 by jkeil, 16 years ago

An alternate workaround is to LD_PRELOAD a shared library into VBoxSVC which defines the global variable "nspr_use_zone_allocator":

% cat nspr_alloc.c
int nspr_use_zone_allocator = 0;

% cc -m64 -G -h nspr_alloc.so -o nspr_alloc.so nspr_alloc.c

% /opt/VirtualBox/VBoxSVC
Segmentation Fault (core dumped)

% env LD_PRELOAD=/tmp/nspr_alloc.so /opt/VirtualBox/VBoxSVC
* Sun xVM VirtualBox XPCOM Server Version 1.6.2 (C) 2008 Sun Microsystems, Inc. All rights reserved.

Starting event loop.... [press Ctrl-C to quit]

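The pr_FindSymbolInProg("nspr_use_zone_allocator") call from _PR_InitZones() seems to force all lazy-loaded shared libraries into memory (including libnspr4.so), which breaks VBoxSVC. With the preloaded library defining the symbol, the lookup presumably succeeds before any lazy loading is needed.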
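The kind of lookup involved can be sketched as follows. This is an illustration of looking up an optional global variable in the running program with dlopen(NULL)/dlsym, which matches the dlsym frame in the stack trace above; it is not NSPR's actual source, and the function name lookup_int_tunable is hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper (not NSPR code).
 *
 * Looks up an optional global `int` variable by name in the running program
 * and the objects already visible in its global scope. Returns the variable's
 * value if it is defined, otherwise `fallback`.
 *
 * Per the ticket, on the affected Solaris builds a lookup like this for a
 * name that nothing defines appears to pull in every pending lazy-loaded
 * dependency while the runtime linker searches for it.
 */
int lookup_int_tunable(const char *name, int fallback)
{
    void *prog;
    int value = fallback;

    assert(name != NULL);

    prog = dlopen(NULL, RTLD_LAZY);   /* handle for the program itself */
    if (prog != NULL) {
        int *p = (int *)dlsym(prog, name);

        if (p != NULL)
            value = *p;               /* symbol is defined: use its value */
        dlclose(prog);
    }
    return value;
}
```

With nspr_alloc.so preloaded, a call such as lookup_int_tunable("nspr_use_zone_allocator", 0) would find the variable in the preloaded object instead of exhausting the search.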

comment:2 by jkeil, 16 years ago

The root cause is apparently changeset 7057, "PSARC/2008/342 Further SID support". When I back out that changeset, the original VBoxSVC no longer crashes on startup.

CS 7057 added a new dependency for libsec.so: libidmap.so.1

And libidmap.so.1 needs libldap.so.5, which needs libnspr4.so.
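The dependency chain above can be written down as a tiny graph to make the reachability explicit. The sketch below hard-codes only the three edges named in this comment; the table and the function name depends_on are hypothetical illustrations, not output from any Solaris tool.

```c
#include <assert.h>
#include <string.h>

/* One "A needs B" edge between shared objects. */
struct dep {
    const char *from;
    const char *to;
};

/* Only the edges stated in this ticket; real libraries have many more. */
static const struct dep deps[] = {
    { "libsec.so.1",   "libidmap.so.1" },   /* added by changeset 7057 */
    { "libidmap.so.1", "libldap.so.5"  },
    { "libldap.so.5",  "libnspr4.so"   },
};

/*
 * Returns 1 if `target` is reachable from `from` by following the edges
 * above, 0 otherwise. Plain recursion is sufficient because the table
 * contains no cycles.
 */
int depends_on(const char *from, const char *target)
{
    size_t i;

    assert(from != NULL && target != NULL);

    for (i = 0; i < sizeof deps / sizeof deps[0]; i++) {
        if (strcmp(deps[i].from, from) != 0)
            continue;
        if (strcmp(deps[i].to, target) == 0)
            return 1;
        if (depends_on(deps[i].to, target))
            return 1;
    }
    return 0;
}
```

Removing the first edge, which is what backing out the changeset amounts to, leaves libnspr4.so unreachable from libsec.so.1 in this model.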

comment:3 by Nicolas Williams, 16 years ago

I've filed:

6733880 6677411/6677801/PSARC/2008/342 broke VirtualBox by polluting its namespace

to track this in OpenSolaris.

comment:4 by Nicolas Williams, 16 years ago

Why does VBoxXPCOM.so have symbols with the same names as in NSPR?

Is it just an accident? Or is VBoxXPCOM.so using a private copy of NSPR?

If the former, can the conflicting symbols be renamed? If the latter, can it be modified to use the NSPR delivered by Solaris?

comment:5 by Nicolas Williams, 16 years ago

We've discussed this at Sun and have concluded that the best solution is for VirtualBox to use the copy of NSPR that is delivered by Solaris. We do not support multiple instances of NSPR in one process.

comment:6 by Klaus Espenlaub, 16 years ago

The fix for this has been integrated in what will be VirtualBox 1.6.6, so that version won't have this weird linking issue any more.

comment:7 by Frank Mehnert, 16 years ago

Resolution: fixed
Status: new → closed