GMMR0.cpp@ 55889

Last change on this file since 55889 was 51940, checked in by vboxsync, 10 years ago
GMMR0: Switched from fast mutex to critical section for the giant GMMR0 lock to avoid running into unnecessary trouble with the windows driver verifier. Required making the critical section code compile and link in the ring-0 environment.

1	/* $Id: GMMR0.cpp 51940 2014-07-08 17:45:51Z vboxsync $ */
2	/** @file
3	* GMM - Global Memory Manager.
4	*/
5
6	/*
7	* Copyright (C) 2007-2013 Oracle Corporation
8	*
9	* This file is part of VirtualBox Open Source Edition (OSE), as
10	* available from http://www.virtualbox.org. This file is free software;
11	* you can redistribute it and/or modify it under the terms of the GNU
12	* General Public License (GPL) as published by the Free Software
13	* Foundation, in version 2 as it comes in the "COPYING" file of the
14	* VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15	* hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16	*/
17
18
19	/** @page pg_gmm GMM - The Global Memory Manager
20	*
21	* As the name indicates, this component is responsible for global memory
22	* management. Currently only guest RAM is allocated from the GMM, but this
23	* may change to include shadow page tables and other bits later.
24	*
25	* Guest RAM is managed as individual pages, but allocated from the host OS
26	* in chunks for reasons of portability / efficiency. To minimize the memory
27	* footprint all tracking structures must be as small as possible without
28	* unnecessary performance penalties.
29	*
30	* The allocation chunks have a fixed size, defined at compile time
31	* by the #GMM_CHUNK_SIZE \#define.
32	*
33	* Each chunk is given a unique ID. Each page also has a unique ID. The
34	* relationship between the two IDs is:
35	* @code
36	* GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37	* idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38	* @endcode
39	* Where iPage is the index of the page within the chunk. This ID scheme
40	* permits efficient chunk and page lookup, but it relies on the chunk size
41	* being set at compile time. The chunks are organized in an AVL tree with their
42	* IDs being the keys.
43	*
44	* The physical address of each page in an allocation chunk is maintained by
45	* the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46	* need to duplicate this information (it would cost 8 bytes per page if we did).
47	*
48	* So what do we need to track per page? Most importantly we need to know
49	* which state the page is in:
50	* - Private - Allocated for (eventually) backing one particular VM page.
51	* - Shared - Readonly page that is used by one or more VMs and treated
52	* as COW by PGM.
53	* - Free - Not used by anyone.
54	*
55	* For the page replacement operations (sharing, defragmenting and freeing)
56	* to be somewhat efficient, private pages need to be associated with a
57	* particular page in a particular VM.
58	*
59	* Tracking the usage of shared pages is impractical and expensive, so we'll
60	* settle for a reference counting system instead.
61	*
62	* Free pages will be chained on LIFOs.
63	*
64	* On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65	* systems a 32-bit bitfield will have to suffice because of address space
66	* limitations. The #GMMPAGE structure shows the details.
67	*
68	*
69	* @section sec_gmm_alloc_strat Page Allocation Strategy
70	*
71	* The strategy for allocating pages has to take fragmentation and shared
72	* pages into account, or we may end up with 2000 chunks with only
73	* a few pages in each. Shared pages cannot easily be reallocated because
74	* of the inaccurate usage accounting (see above). Private pages can be
75	* reallocated by a defragmentation thread in the same manner that sharing
76	* is done.
77	*
78	* The first approach is to manage the free pages in two sets depending on
79	* whether they are mainly for the allocation of shared or private pages.
80	* In the initial implementation there will be almost no possibility for
81	* mixing shared and private pages in the same chunk (only if we're really
82	* stressed on memory), but when we implement forking of VMs and have to
83	* deal with lots of COW pages it'll start getting kind of interesting.
84	*
85	* The sets are lists of chunks with approximately the same number of
86	* free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87	* consists of 16 lists. So, the first list will contain the chunks with
88	* 1-7 free pages, the second covers 8-15, and so on. The chunks will be
89	* moved between the lists as pages are freed up or allocated.
90	*
91	*
92	* @section sec_gmm_costs Costs
93	*
94	* The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95	* entails. In addition there is the chunk cost of approximately
96	* (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97	*
98	* On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit windows
99	* and 64-bit on 64-bit windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100	* The cost on Linux is identical, but here it's because of sizeof(struct page *).
101	*
102	*
103	* @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104	*
105	* In legacy mode the page source is locked user pages and not
106	* #RTR0MemObjAllocPhysNC, this means that a page can only be allocated
107	* by the VM that locked it. We will make no attempt at implementing
108	* page sharing on these systems, just do enough to make it all work.
109	*
110	*
111	* @subsection sub_gmm_locking Serializing
112	*
113	* One simple fast mutex will be employed in the initial implementation, not
114	* two as mentioned in @ref subsec_pgmPhys_Serializing.
115	*
116	* @see @ref subsec_pgmPhys_Serializing
117	*
118	*
119	* @section sec_gmm_overcommit Memory Over-Commitment Management
120	*
121	* The GVM will have to do the system wide memory over-commitment
122	* management. My current ideas are:
123	* - Per VM oc policy that indicates how much to initially commit
124	* to it and what to do in an out-of-memory situation.
125	* - Prevent overtaxing the host.
126	*
127	* There are some challenges here, the main ones are configurability and
128	* security. Should we for instance permit anyone to request 100% memory
129	* commitment? Who should be allowed to do runtime adjustments of the
130	* config. And how to prevent these settings from being lost when the last
131	* VM process exits? The solution is probably to have an optional root
132	* daemon that will keep VMMR0.r0 in memory and enable the security measures.
133	*
134	*
135	*
136	* @section sec_gmm_numa NUMA
137	*
138	* NUMA considerations will be designed and implemented a bit later.
139	*
140	* The preliminary guess is that we will have to try to allocate memory as
141	* close as possible to the CPUs the VM is executed on (EMT and additional CPU
142	* threads). Which means it's mostly about allocation and sharing policies.
143	* Both the scheduler and allocator interface will have to supply some NUMA info
144	* and we'll need to have a way to calc access costs.
145	*
146	*/
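The chunk ID / page ID relationship described in the page documentation above can be sketched as follows. This is a minimal illustration only: the 2 MB chunk size, the 4 KB page size and all `EX_`/`ex` names are assumptions made for the example, not values or helpers taken from the VirtualBox headers.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only (hypothetical, not from the real headers).
static const uint32_t EX_PAGE_SHIFT       = 12;  // 4 KB pages
static const uint32_t EX_CHUNK_SIZE_SHIFT = 21;  // 2 MB chunks
// log2(chunk size / page size): the number of ID bits used for the page index.
static const uint32_t EX_CHUNK_SHIFT      = EX_CHUNK_SIZE_SHIFT - EX_PAGE_SHIFT;
static const uint32_t EX_PAGES_PER_CHUNK  = UINT32_C(1) << EX_CHUNK_SHIFT;

// Combine a chunk ID and a page index into a page ID.
static inline uint32_t exMakePageId(uint32_t idChunk, uint32_t iPage)
{
    assert(iPage < EX_PAGES_PER_CHUNK);
    return (idChunk << EX_CHUNK_SHIFT) | iPage;
}

// Recover the chunk ID (the tree key) from a page ID.
static inline uint32_t exPageIdToChunkId(uint32_t idPage)
{
    return idPage >> EX_CHUNK_SHIFT;
}

// Recover the index of the page within its chunk.
static inline uint32_t exPageIdToIndex(uint32_t idPage)
{
    return idPage & (EX_PAGES_PER_CHUNK - 1);
}
```

Because the shift is a compile-time constant, both directions reduce to a shift and a mask, which is why the scheme depends on the chunk size being fixed at build time.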
147
148
149	/*******************************************************************************
150	* Header Files *
151	*******************************************************************************/
152	#define LOG_GROUP LOG_GROUP_GMM
153	#include <VBox/rawpci.h>
154	#include <VBox/vmm/vm.h>
155	#include <VBox/vmm/gmm.h>
156	#include "GMMR0Internal.h"
157	#include <VBox/vmm/gvm.h>
158	#include <VBox/vmm/pgm.h>
159	#include <VBox/log.h>
160	#include <VBox/param.h>
161	#include <VBox/err.h>
162	#include <iprt/asm.h>
163	#include <iprt/avl.h>
164	#ifdef VBOX_STRICT
165	# include <iprt/crc.h>
166	#endif
167	#include <iprt/critsect.h>
168	#include <iprt/list.h>
169	#include <iprt/mem.h>
170	#include <iprt/memobj.h>
171	#include <iprt/mp.h>
172	#include <iprt/semaphore.h>
173	#include <iprt/string.h>
174	#include <iprt/time.h>
175
176
177	/*******************************************************************************
178	* Defined Constants And Macros *
179	*******************************************************************************/
180	/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
181	* Use a critical section instead of a fast mutex for the giant GMM lock.
182	*
183	* @remarks This is primarily a way of avoiding the deadlock checks in the
184	* windows driver verifier. */
185	#if defined(RT_OS_WINDOWS) || defined(DOXYGEN_RUNNING)
186	# define VBOX_USE_CRIT_SECT_FOR_GIANT
187	#endif
188
189
190	/*******************************************************************************
191	* Structures and Typedefs *
192	*******************************************************************************/
193	/** Pointer to set of free chunks. */
194	typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
195
196	/**
197	* The per-page tracking structure employed by the GMM.
198	*
199	* On 32-bit hosts some trickery is necessary to compress all
200	* the information into 32-bits. When the fSharedFree member is set,
201	* the 30th bit decides whether it's a free page or not.
202	*
203	* Because of the different layout on 32-bit and 64-bit hosts, macros
204	* are used to get and set some of the data.
205	*/
206	typedef union GMMPAGE
207	{
208	#if HC_ARCH_BITS == 64
209	/** Unsigned integer view. */
210	uint64_t u;
211
212	/** The common view. */
213	struct GMMPAGECOMMON
214	{
215	uint32_t uStuff1 : 32;
216	uint32_t uStuff2 : 30;
217	/** The page state. */
218	uint32_t u2State : 2;
219	} Common;
220
221	/** The view of a private page. */
222	struct GMMPAGEPRIVATE
223	{
224	/** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
225	uint32_t pfn;
226	/** The GVM handle. (64K VMs) */
227	uint32_t hGVM : 16;
228	/** Reserved. */
229	uint32_t u16Reserved : 14;
230	/** The page state. */
231	uint32_t u2State : 2;
232	} Private;
233
234	/** The view of a shared page. */
235	struct GMMPAGESHARED
236	{
237	/** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
238	uint32_t pfn;
239	/** The reference count (64K VMs). */
240	uint32_t cRefs : 16;
241	/** Used for debug checksumming. */
242	uint32_t u14Checksum : 14;
243	/** The page state. */
244	uint32_t u2State : 2;
245	} Shared;
246
247	/** The view of a free page. */
248	struct GMMPAGEFREE
249	{
250	/** The index of the next page in the free list. UINT16_MAX is NIL. */
251	uint16_t iNext;
252	/** Reserved. Checksum or something? */
253	uint16_t u16Reserved0;
254	/** Reserved. Checksum or something? */
255	uint32_t u30Reserved1 : 30;
256	/** The page state. */
257	uint32_t u2State : 2;
258	} Free;
259
260	#else /* 32-bit */
261	/** Unsigned integer view. */
262	uint32_t u;
263
264	/** The common view. */
265	struct GMMPAGECOMMON
266	{
267	uint32_t uStuff : 30;
268	/** The page state. */
269	uint32_t u2State : 2;
270	} Common;
271
272	/** The view of a private page. */
273	struct GMMPAGEPRIVATE
274	{
275	/** The guest page frame number. (Max addressable: 2 ^ 36) */
276	uint32_t pfn : 24;
277	/** The GVM handle. (127 VMs) */
278	uint32_t hGVM : 7;
279	/** The top page state bit, MBZ. */
280	uint32_t fZero : 1;
281	} Private;
282
283	/** The view of a shared page. */
284	struct GMMPAGESHARED
285	{
286	/** The reference count. */
287	uint32_t cRefs : 30;
288	/** The page state. */
289	uint32_t u2State : 2;
290	} Shared;
291
292	/** The view of a free page. */
293	struct GMMPAGEFREE
294	{
295	/** The index of the next page in the free list. UINT16_MAX is NIL. */
296	uint32_t iNext : 16;
297	/** Reserved. Checksum or something? */
298	uint32_t u14Reserved : 14;
299	/** The page state. */
300	uint32_t u2State : 2;
301	} Free;
302	#endif
303	} GMMPAGE;
304	AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
305	/** Pointer to a GMMPAGE. */
306	typedef GMMPAGE *PGMMPAGE;
307
308
309	/** @name The Page States.
310	* @{ */
311	/** A private page. */
312	#define GMM_PAGE_STATE_PRIVATE 0
313	/** A private page - alternative value used on the 32-bit implementation.
314	* This will never be used on 64-bit hosts. */
315	#define GMM_PAGE_STATE_PRIVATE_32 1
316	/** A shared page. */
317	#define GMM_PAGE_STATE_SHARED 2
318	/** A free page. */
319	#define GMM_PAGE_STATE_FREE 3
320	/** @} */
321
322
323	/** @def GMM_PAGE_IS_PRIVATE
324	*
325	* @returns true if private, false if not.
326	* @param pPage The GMM page.
327	*/
328	#if HC_ARCH_BITS == 64
329	# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
330	#else
331	# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
332	#endif
333
334	/** @def GMM_PAGE_IS_SHARED
335	*
336	* @returns true if shared, false if not.
337	* @param pPage The GMM page.
338	*/
339	#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
340
341	/** @def GMM_PAGE_IS_FREE
342	*
343	* @returns true if free, false if not.
344	* @param pPage The GMM page.
345	*/
346	#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
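The idea behind the 64-bit GMMPAGE layout (a 2-bit state tag in the top bits, with the remaining bits reinterpreted per state) can be sketched with explicit shifts and masks. Bit-field allocation order is compiler-specific, so this sketch assumes the fields are laid out from the least significant bit upwards; that assumption and all `EX_`/`ex` names are illustrative, not part of the real code.

```cpp
#include <cassert>
#include <cstdint>

// State values mirroring the 64-bit page states (private = 0, shared = 2, free = 3).
static const unsigned EX_STATE_PRIVATE = 0;
static const unsigned EX_STATE_SHARED  = 2;
static const unsigned EX_STATE_FREE    = 3;

static const unsigned EX_STATE_SHIFT = 62;  // top two bits hold the state
static const unsigned EX_HIGH_SHIFT  = 32;  // 16-bit field above the 32-bit pfn

// Private page: guest pfn (32 bits), owning VM handle (16 bits), state.
static inline uint64_t exPackPrivate(uint32_t pfn, uint16_t hGVM)
{
    return (uint64_t)pfn
         | ((uint64_t)hGVM << EX_HIGH_SHIFT)
         | ((uint64_t)EX_STATE_PRIVATE << EX_STATE_SHIFT);
}

// Shared page: host pfn (32 bits), reference count (16 bits), state.
static inline uint64_t exPackShared(uint32_t pfn, uint16_t cRefs)
{
    return (uint64_t)pfn
         | ((uint64_t)cRefs << EX_HIGH_SHIFT)
         | ((uint64_t)EX_STATE_SHARED << EX_STATE_SHIFT);
}

// Free page: index of the next free page in the chunk (16 bits), state.
static inline uint64_t exPackFree(uint16_t iNext)
{
    return (uint64_t)iNext | ((uint64_t)EX_STATE_FREE << EX_STATE_SHIFT);
}

// The state is readable without knowing which view applies.
static inline unsigned exState(uint64_t u)
{
    return (unsigned)(u >> EX_STATE_SHIFT);
}

static inline uint32_t exPfn(uint64_t u)
{
    assert(exState(u) != EX_STATE_FREE);  // free pages carry no pfn
    return (uint32_t)u;
}

// Either the VM handle (private) or the reference count (shared).
static inline uint16_t exHighField(uint64_t u)
{
    return (uint16_t)(u >> EX_HIGH_SHIFT);
}
```

The point of the common state field is that a predicate such as "is this page free?" needs only one comparison on the tag, whichever view the rest of the word uses.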
347
348	/** @def GMM_PAGE_PFN_LAST
349	* The last valid guest pfn range.
350	* @remark Some of the values outside the range have special meaning,
351	* see GMM_PAGE_PFN_UNSHAREABLE.
352	*/
353	#if HC_ARCH_BITS == 64
354	# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
355	#else
356	# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
357	#endif
358	AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
359
360	/** @def GMM_PAGE_PFN_UNSHAREABLE
361	* Indicates that this page isn't used for normal guest memory and thus isn't shareable.
362	*/
363	#if HC_ARCH_BITS == 64
364	# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
365	#else
366	# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
367	#endif
368	AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
369
370
371	/**
372	* A GMM allocation chunk ring-3 mapping record.
373	*
374	* This should really be associated with a session and not a VM, but
375	* it's simpler to associate it with a VM and clean up when the VM object
376	* is destroyed.
377	*/
378	typedef struct GMMCHUNKMAP
379	{
380	/** The mapping object. */
381	RTR0MEMOBJ hMapObj;
382	/** The VM owning the mapping. */
383	PGVM pGVM;
384	} GMMCHUNKMAP;
385	/** Pointer to a GMM allocation chunk mapping. */
386	typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
387
388
389	/**
390	* A GMM allocation chunk.
391	*/
392	typedef struct GMMCHUNK
393	{
394	/** The AVL node core.
395	* The Key is the chunk ID. (Giant mtx.) */
396	AVLU32NODECORE Core;
397	/** The memory object.
398	* Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
399	* what the host can dish up. (Chunk mtx protects mapping accesses
400	* and related frees.) */
401	RTR0MEMOBJ hMemObj;
402	/** Pointer to the next chunk in the free list. (Giant mtx.) */
403	PGMMCHUNK pFreeNext;
404	/** Pointer to the previous chunk in the free list. (Giant mtx.) */
405	PGMMCHUNK pFreePrev;
406	/** Pointer to the free set this chunk belongs to. NULL for
407	* chunks with no free pages. (Giant mtx.) */
408	PGMMCHUNKFREESET pSet;
409	/** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
410	RTLISTNODE ListNode;
411	/** Pointer to an array of mappings. (Chunk mtx.) */
412	PGMMCHUNKMAP paMappingsX;
413	/** The number of mappings. (Chunk mtx.) */
414	uint16_t cMappingsX;
415	/** The mapping lock this chunk is using. UINT8_MAX if nobody is
416	* mapping or freeing anything. (Giant mtx.) */
417	uint8_t volatile iChunkMtx;
418	/** Flags field reserved for future use (like eliminating enmType).
419	* (Giant mtx.) */
420	uint8_t fFlags;
421	/** The head of the list of free pages. UINT16_MAX is the NIL value.
422	* (Giant mtx.) */
423	uint16_t iFreeHead;
424	/** The number of free pages. (Giant mtx.) */
425	uint16_t cFree;
426	/** The GVM handle of the VM that first allocated pages from this chunk, this
427	* is used as a preference when there are several chunks to choose from.
428	* When in bound memory mode this isn't a preference any longer. (Giant
429	* mtx.) */
430	uint16_t hGVM;
431	/** The ID of the NUMA node the memory mostly resides on. (Reserved for
432	* future use.) (Giant mtx.) */
433	uint16_t idNumaNode;
434	/** The number of private pages. (Giant mtx.) */
435	uint16_t cPrivate;
436	/** The number of shared pages. (Giant mtx.) */
437	uint16_t cShared;
438	/** The pages. (Giant mtx.) */
439	GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
440	} GMMCHUNK;
441
442	/** Indicates that the NUMA properties of the memory are unknown. */
443	#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
444
445	/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
446	* @{ */
447	/** Indicates that the chunk is a large page (2MB). */
448	#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
449	/** @} */
450
451
452	/**
453	* An allocation chunk TLB entry.
454	*/
455	typedef struct GMMCHUNKTLBE
456	{
457	/** The chunk id. */
458	uint32_t idChunk;
459	/** Pointer to the chunk. */
460	PGMMCHUNK pChunk;
461	} GMMCHUNKTLBE;
462	/** Pointer to an allocation chunk TLB entry. */
463	typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
464
465
466	/** The number of entries in the allocation chunk TLB. */
467	#define GMM_CHUNKTLB_ENTRIES 32
468	/** Gets the TLB entry index for the given Chunk ID. */
469	#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
470
471	/**
472	* An allocation chunk TLB.
473	*/
474	typedef struct GMMCHUNKTLB
475	{
476	/** The TLB entries. */
477	GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
478	} GMMCHUNKTLB;
479	/** Pointer to an allocation chunk TLB. */
480	typedef GMMCHUNKTLB *PGMMCHUNKTLB;
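A direct-mapped lookup cache of the kind GMMCHUNKTLB and GMM_CHUNKTLB_IDX suggest might be used as sketched below. The lookup routine itself is not shown in this part of the file, so this is an assumption about its shape; the slow path here is a plain linear search standing in for the AVL tree, and all `Ex`/`ex` names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static const uint32_t EX_TLB_ENTRIES = 32;  // must be a power of two for the mask
static const uint32_t EX_NIL_CHUNKID = 0;

struct ExChunk    { uint32_t idChunk; };
struct ExTlbEntry { uint32_t idChunk; ExChunk *pChunk; };
struct ExChunkTlb
{
    ExTlbEntry aEntries[EX_TLB_ENTRIES];
    unsigned   cMisses;  // statistics for the example only
};

static void exTlbInit(ExChunkTlb *pTlb)
{
    for (uint32_t i = 0; i < EX_TLB_ENTRIES; i++)
    {
        pTlb->aEntries[i].idChunk = EX_NIL_CHUNKID;
        pTlb->aEntries[i].pChunk  = NULL;
    }
    pTlb->cMisses = 0;
}

// Slow path stand-in: linear search over an array of chunks.
static ExChunk *exSlowLookup(ExChunk *paChunks, size_t cChunks, uint32_t idChunk)
{
    for (size_t i = 0; i < cChunks; i++)
        if (paChunks[i].idChunk == idChunk)
            return &paChunks[i];
    return NULL;
}

// Check the slot selected by the low ID bits first, refill it on a miss.
static ExChunk *exLookup(ExChunkTlb *pTlb, ExChunk *paChunks, size_t cChunks, uint32_t idChunk)
{
    assert(idChunk != EX_NIL_CHUNKID);
    ExTlbEntry *pEntry = &pTlb->aEntries[idChunk & (EX_TLB_ENTRIES - 1)];
    if (pEntry->idChunk == idChunk && pEntry->pChunk)
        return pEntry->pChunk;  // hit
    pTlb->cMisses++;
    ExChunk *pChunk = exSlowLookup(paChunks, cChunks, idChunk);
    if (pChunk)
    {
        pEntry->idChunk = idChunk;
        pEntry->pChunk  = pChunk;
    }
    return pChunk;
}
```

Two IDs that differ by a multiple of the entry count share a slot and evict each other, which is the usual trade-off of a direct-mapped design.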
481
482
483	/**
484	* The GMM instance data.
485	*/
486	typedef struct GMM
487	{
488	/** Magic / eye catcher. GMM_MAGIC */
489	uint32_t u32Magic;
490	/** The number of threads waiting on the mutex. */
491	uint32_t cMtxContenders;
492	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
493	/** The critical section protecting the GMM.
494	* More fine grained locking can be implemented later if necessary. */
495	RTCRITSECT GiantCritSect;
496	#else
497	/** The fast mutex protecting the GMM.
498	* More fine grained locking can be implemented later if necessary. */
499	RTSEMFASTMUTEX hMtx;
500	#endif
501	#ifdef VBOX_STRICT
502	/** The current mutex owner. */
503	RTNATIVETHREAD hMtxOwner;
504	#endif
505	/** The chunk tree. */
506	PAVLU32NODECORE pChunks;
507	/** The chunk TLB. */
508	GMMCHUNKTLB ChunkTLB;
509	/** The private free set. */
510	GMMCHUNKFREESET PrivateX;
511	/** The shared free set. */
512	GMMCHUNKFREESET Shared;
513
514	/** Shared module tree (global).
515	* @todo separate trees for distinctly different guest OSes. */
516	PAVLLU32NODECORE pGlobalSharedModuleTree;
517	/** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
518	uint32_t cShareableModules;
519
520	/** The chunk list. For simplifying the cleanup process. */
521	RTLISTANCHOR ChunkList;
522
523	/** The maximum number of pages we're allowed to allocate.
524	* @gcfgm 64-bit GMM/MaxPages Direct.
525	* @gcfgm 32-bit GMM/PctPages Relative to the number of host pages. */
526	uint64_t cMaxPages;
527	/** The number of pages that have been reserved.
528	* The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
529	uint64_t cReservedPages;
530	/** The number of pages that we have over-committed in reservations. */
531	uint64_t cOverCommittedPages;
532	/** The number of actually allocated (committed if you like) pages. */
533	uint64_t cAllocatedPages;
534	/** The number of pages that are shared. A subset of cAllocatedPages. */
535	uint64_t cSharedPages;
536	/** The number of pages that are actually shared between VMs. */
537	uint64_t cDuplicatePages;
538	/** The number of pages that are shared that have been left behind by
539	* VMs not doing proper cleanups. */
540	uint64_t cLeftBehindSharedPages;
541	/** The number of allocation chunks.
542	* (The number of pages we've allocated from the host can be derived from this.) */
543	uint32_t cChunks;
544	/** The number of current ballooned pages. */
545	uint64_t cBalloonedPages;
546
547	/** The legacy allocation mode indicator.
548	* This is determined at initialization time. */
549	bool fLegacyAllocationMode;
550	/** The bound memory mode indicator.
551	* When set, the memory will be bound to a specific VM and never
552	* shared. This is always set if fLegacyAllocationMode is set.
553	* (Also determined at initialization time.) */
554	bool fBoundMemoryMode;
555	/** The number of registered VMs. */
556	uint16_t cRegisteredVMs;
557
558	/** The number of freed chunks ever. This is used as a list generation to
559	* avoid restarting the cleanup scanning when the list wasn't modified. */
560	uint32_t cFreedChunks;
561	/** The previous allocated Chunk ID.
562	* Used as a hint to avoid scanning the whole bitmap. */
563	uint32_t idChunkPrev;
564	/** Chunk ID allocation bitmap.
565	* Bits of allocated IDs are set, free ones are clear.
566	* The NIL id (0) is marked allocated. */
567	uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
568
569	/** The index of the next mutex to use. */
570	uint32_t iNextChunkMtx;
571	/** Chunk locks for reducing lock contention without having to allocate
572	* one lock per chunk. */
573	struct
574	{
575	/** The mutex */
576	RTSEMFASTMUTEX hMtx;
577	/** The number of threads currently using this mutex. */
578	uint32_t volatile cUsers;
579	} aChunkMtx[64];
580	} GMM;
581	/** Pointer to the GMM instance. */
582	typedef GMM *PGMM;
583
584	/** The value of GMM::u32Magic (Katsuhiro Otomo). */
585	#define GMM_MAGIC UINT32_C(0x19540414)
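The `idChunkPrev` and `bmChunkId` members of the instance data describe a bitmap-based ID allocator with a "previous ID" hint. The allocation routine is not part of this section, so the following is only a guess at how such a scheme can work; the small ID limit and all `Ex`/`ex` names are assumptions for the example.

```cpp
#include <cassert>
#include <cstdint>

static const uint32_t EX_CHUNKID_LAST = 255;  // assumed small limit for the example
static const uint32_t EX_NIL_CHUNKID  = 0;

struct ExIdAllocator
{
    uint32_t idPrev;                               // hint: last ID handed out
    uint32_t bm[(EX_CHUNKID_LAST + 1 + 31) / 32];  // set bit = allocated
};

static bool exBitTest(const uint32_t *pbm, uint32_t i) { return ((pbm[i / 32] >> (i % 32)) & 1) != 0; }
static void exBitSet(uint32_t *pbm, uint32_t i)        { pbm[i / 32] |= UINT32_C(1) << (i % 32); }
static void exBitClear(uint32_t *pbm, uint32_t i)      { pbm[i / 32] &= ~(UINT32_C(1) << (i % 32)); }

static void exIdInit(ExIdAllocator *pAlloc)
{
    for (uint32_t i = 0; i < sizeof(pAlloc->bm) / sizeof(pAlloc->bm[0]); i++)
        pAlloc->bm[i] = 0;
    exBitSet(pAlloc->bm, EX_NIL_CHUNKID);  // the NIL ID is never handed out
    pAlloc->idPrev = EX_NIL_CHUNKID;
}

// Scan from the ID after the hint, wrapping around once.
// Returns EX_NIL_CHUNKID when every ID is in use.
static uint32_t exIdAlloc(ExIdAllocator *pAlloc)
{
    for (uint32_t c = 0; c <= EX_CHUNKID_LAST; c++)
    {
        uint32_t id = (pAlloc->idPrev + 1 + c) % (EX_CHUNKID_LAST + 1);
        if (!exBitTest(pAlloc->bm, id))
        {
            exBitSet(pAlloc->bm, id);
            pAlloc->idPrev = id;
            return id;
        }
    }
    return EX_NIL_CHUNKID;
}

static void exIdFree(ExIdAllocator *pAlloc, uint32_t id)
{
    assert(id != EX_NIL_CHUNKID && id <= EX_CHUNKID_LAST && exBitTest(pAlloc->bm, id));
    exBitClear(pAlloc->bm, id);
}
```

Starting the scan at the hint means a freshly freed low ID is not reused immediately, and in the common case the first probe already finds a clear bit.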
586
587
588	/**
589	* GMM chunk mutex state.
590	*
591	* This is returned by gmmR0ChunkMutexAcquire and is used by the other
592	* gmmR0ChunkMutex* methods.
593	*/
594	typedef struct GMMR0CHUNKMTXSTATE
595	{
596	PGMM pGMM;
597	/** The index of the chunk mutex. */
598	uint8_t iChunkMtx;
599	/** The relevant flags (GMMR0CHUNK_MTX_XXX). */
600	uint8_t fFlags;
601	} GMMR0CHUNKMTXSTATE;
602	/** Pointer to a chunk mutex state. */
603	typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
604
605	/** @name GMMR0CHUNK_MTX_XXX
606	* @{ */
607	#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
608	#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
609	#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
610	#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
611	#define GMMR0CHUNK_MTX_END UINT32_C(4)
612	/** @} */
613
614
615	/** The maximum number of shared modules per-vm. */
616	#define GMM_MAX_SHARED_PER_VM_MODULES 2048
617	/** The maximum number of shared modules GMM is allowed to track. */
618	#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
619
620
621	/**
622	* Argument packet for gmmR0SharedModuleCleanup.
623	*/
624	typedef struct GMMR0SHMODPERVMDTORARGS
625	{
626	PGVM pGVM;
627	PGMM pGMM;
628	} GMMR0SHMODPERVMDTORARGS;
629
630	/**
631	* Argument packet for gmmR0CheckSharedModule.
632	*/
633	typedef struct GMMCHECKSHAREDMODULEINFO
634	{
635	PGVM pGVM;
636	VMCPUID idCpu;
637	} GMMCHECKSHAREDMODULEINFO;
638
639	/**
640	* Argument packet for gmmR0FindDupPageInChunk by GMMR0FindDuplicatePage.
641	*/
642	typedef struct GMMFINDDUPPAGEINFO
643	{
644	PGVM pGVM;
645	PGMM pGMM;
646	uint8_t *pSourcePage;
647	bool fFoundDuplicate;
648	} GMMFINDDUPPAGEINFO;
649
650
651	/*******************************************************************************
652	* Global Variables *
653	*******************************************************************************/
654	/** Pointer to the GMM instance data. */
655	static PGMM g_pGMM = NULL;
656
657	/** Macro for obtaining and validating the g_pGMM pointer.
658	*
659	* On failure it will return from the invoking function with the specified
660	* return value.
661	*
662	* @param pGMM The name of the pGMM variable.
663	* @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
664	* status codes.
665	*/
666	#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
667	do { \
668	(pGMM) = g_pGMM; \
669	AssertPtrReturn((pGMM), (rc)); \
670	AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
671	} while (0)
672
673	/** Macro for obtaining and validating the g_pGMM pointer, void function
674	* variant.
675	*
676	* On failure it will return from the invoking function.
677	*
678	* @param pGMM The name of the pGMM variable.
679	*/
680	#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
681	do { \
682	(pGMM) = g_pGMM; \
683	AssertPtrReturnVoid((pGMM)); \
684	AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
685	} while (0)
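The two macros above follow a common pattern: fetch a global instance pointer, validate it, and return from the calling function on failure. A stripped-down sketch without the IPRT assertion macros is given below; the error code, the structure and all `EX_`/`ex` names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static const uint32_t EX_MAGIC        = UINT32_C(0x19540414);
static const int      EX_OK           = 0;
static const int      EX_ERR_INSTANCE = -1;  // hypothetical error code

struct ExInstance { uint32_t u32Magic; uint32_t cPages; };
static ExInstance *g_pExInstance = NULL;

// Wrapped in do { } while (0) so that the macro behaves like a single
// statement, e.g. inside an unbraced if/else. The 'return' leaves the
// function that invokes the macro, not the macro itself.
#define EX_GET_VALID_INSTANCE(pInst, rc) \
    do { \
        (pInst) = g_pExInstance; \
        if (!(pInst) || (pInst)->u32Magic != EX_MAGIC) \
            return (rc); \
    } while (0)

static int exQueryPages(uint32_t *pcPages)
{
    assert(pcPages != NULL);
    ExInstance *pInst;
    EX_GET_VALID_INSTANCE(pInst, EX_ERR_INSTANCE);
    *pcPages = pInst->cPages;
    return EX_OK;
}
```

The magic check catches both an uninitialized instance and one that has already been torn down, provided the teardown path overwrites the magic value before freeing.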
686
687
688	/** @def GMM_CHECK_SANITY_UPON_ENTERING
689	* Checks the sanity of the GMM instance data before making changes.
690	*
691	* This macro is a stub by default and must be enabled manually in the code.
692	*
693	* @returns true if sane, false if not.
694	* @param pGMM The name of the pGMM variable.
695	*/
696	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
697	# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
698	#else
699	# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
700	#endif
701
702	/** @def GMM_CHECK_SANITY_UPON_LEAVING
703	* Checks the sanity of the GMM instance data after making changes.
704	*
705	* This macro is a stub by default and must be enabled manually in the code.
706	*
707	* @returns true if sane, false if not.
708	* @param pGMM The name of the pGMM variable.
709	*/
710	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
711	# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
712	#else
713	# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
714	#endif
715
716	/** @def GMM_CHECK_SANITY_IN_LOOPS
717	* Checks the sanity of the GMM instance in the allocation loops.
718	*
719	* This macro is a stub by default and must be enabled manually in the code.
720	*
721	* @returns true if sane, false if not.
722	* @param pGMM The name of the pGMM variable.
723	*/
724	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
725	# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
726	#else
727	# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
728	#endif
729
730
731	/*******************************************************************************
732	* Internal Functions *
733	*******************************************************************************/
734	static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
735	static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
736	DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
737	DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
738	DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
739	#ifdef GMMR0_WITH_SANITY_CHECK
740	static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
741	#endif
742	static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
743	DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
744	DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
745	static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
746	#ifdef VBOX_WITH_PAGE_SHARING
747	static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
748	# ifdef VBOX_STRICT
749	static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
750	# endif
751	#endif
752
753
754
755	/**
756	* Initializes the GMM component.
757	*
758	* This is called when the VMMR0.r0 module is loaded and protected by the
759	* loader semaphore.
760	*
761	* @returns VBox status code.
762	*/
763	GMMR0DECL(int) GMMR0Init(void)
764	{
765	LogFlow(("GMMInit:\n"));
766
767	/*
768	* Allocate the instance data and the locks.
769	*/
770	PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
771	if (!pGMM)
772	return VERR_NO_MEMORY;
773
774	pGMM->u32Magic = GMM_MAGIC;
775	for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
776	pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
777	RTListInit(&pGMM->ChunkList);
778	ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
779
780	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
781	int rc = RTCritSectInit(&pGMM->GiantCritSect);
782	#else
783	int rc = RTSemFastMutexCreate(&pGMM->hMtx);
784	#endif
785	if (RT_SUCCESS(rc))
786	{
787	unsigned iMtx;
788	for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
789	{
790	rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
791	if (RT_FAILURE(rc))
792	break;
793	}
794	if (RT_SUCCESS(rc))
795	{
796	/*
797	* Check and see if RTR0MemObjAllocPhysNC works.
798	*/
799	#if 0 /* later, see @bufref{3170}. */
800	RTR0MEMOBJ MemObj;
801	rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
802	if (RT_SUCCESS(rc))
803	{
804	rc = RTR0MemObjFree(MemObj, true);
805	AssertRC(rc);
806	}
807	else if (rc == VERR_NOT_SUPPORTED)
808	pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
809	else
810	SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
811	#else
812	# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
813	pGMM->fLegacyAllocationMode = false;
814	# if ARCH_BITS == 32
815	/* Don't reuse possibly partial chunks because of the virtual
816	address space limitation. */
817	pGMM->fBoundMemoryMode = true;
818	# else
819	pGMM->fBoundMemoryMode = false;
820	# endif
821	# else
822	pGMM->fLegacyAllocationMode = true;
823	pGMM->fBoundMemoryMode = true;
824	# endif
825	#endif
826
827	/*
828	* Query system page count and guess a reasonable cMaxPages value.
829	*/
830	pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
831
832	g_pGMM = pGMM;
833	LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
834	return VINF_SUCCESS;
835	}
836
837	/*
838	* Bail out.
839	*/
840	while (iMtx-- > 0)
841	RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
842	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
843	RTCritSectDelete(&pGMM->GiantCritSect);
844	#else
845	RTSemFastMutexDestroy(pGMM->hMtx);
846	#endif
847	}
848
849	pGMM->u32Magic = 0;
850	RTMemFree(pGMM);
851	SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
852	return rc;
853	}
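The bail-out path in GMMR0Init relies on the loop index surviving the creation loop: when creation fails at index `iMtx`, exactly the entries below it exist and are destroyed in reverse order. The same pattern in isolation, with hypothetical stand-ins for the lock create/destroy calls (all `Ex`/`ex` names and the failure-injection global are assumptions for the example):

```cpp
#include <cassert>
#include <cstddef>

static const unsigned EX_NUM_LOCKS = 4;

struct ExLock { bool fCreated; };

static unsigned g_cExLive   = 0;    // number of locks currently alive
static unsigned g_iExFailAt = ~0u;  // index at which creation is made to fail

// Stand-in for a lock creation call; fails at the configured index.
static int exLockCreate(ExLock *pLock, unsigned i)
{
    if (i == g_iExFailAt)
        return -1;
    pLock->fCreated = true;
    g_cExLive++;
    return 0;
}

static void exLockDestroy(ExLock *pLock)
{
    assert(pLock->fCreated);
    pLock->fCreated = false;
    g_cExLive--;
}

static int exInit(ExLock *paLocks)
{
    int      rc = 0;
    unsigned i;
    for (i = 0; i < EX_NUM_LOCKS; i++)
    {
        rc = exLockCreate(&paLocks[i], i);
        if (rc != 0)
            break;  // 'i' now equals the number of locks that were created
    }
    if (rc == 0)
        return 0;

    // Bail out: undo only what was actually created, newest first.
    while (i-- > 0)
        exLockDestroy(&paLocks[i]);
    return rc;
}
```

Declaring the index outside the `for` statement is what makes the unwind loop possible without a separate counter.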
854
855
856	/**
857	* Terminates the GMM component.
858	*/
859	GMMR0DECL(void) GMMR0Term(void)
860	{
861	LogFlow(("GMMTerm:\n"));
862
863	/*
864	* Take care / be paranoid...
865	*/
866	PGMM pGMM = g_pGMM;
867	if (!VALID_PTR(pGMM))
868	return;
869	if (pGMM->u32Magic != GMM_MAGIC)
870	{
871	SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
872	return;
873	}
874
875	/*
876	* Undo what init did and free all the resources we've acquired.
877	*/
878	/* Destroy the fundamentals. */
879	g_pGMM = NULL;
880	pGMM->u32Magic = ~GMM_MAGIC;
881	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
882	RTCritSectDelete(&pGMM->GiantCritSect);
883	#else
884	RTSemFastMutexDestroy(pGMM->hMtx);
885	pGMM->hMtx = NIL_RTSEMFASTMUTEX;
886	#endif
887
888	/* Free any chunks still hanging around. */
889	RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
890
891	/* Destroy the chunk locks. */
892	for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
893	{
894	Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
895	RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
896	pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
897	}
898
899	/* Finally the instance data itself. */
900	RTMemFree(pGMM);
901	LogFlow(("GMMTerm: done\n"));
902	}
903
904
905	/**
906	* RTAvlU32Destroy callback.
907	*
908	* @returns 0
909	* @param pNode The node to destroy.
910	* @param pvGMM The GMM handle.
911	*/
912	static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
913	{
914	PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
915
916	if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
917	SUPR0Printf("GMMR0Term: %p/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
918	pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
919
920	int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
921	if (RT_FAILURE(rc))
922	{
923	SUPR0Printf("GMMR0Term: %p/%#x: RTR0MemObjFree(%p,true) -> %d (cMappings=%d)\n", pChunk,
924	pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
925	AssertRC(rc);
926	}
927	pChunk->hMemObj = NIL_RTR0MEMOBJ;
928
929	RTMemFree(pChunk->paMappingsX);
930	pChunk->paMappingsX = NULL;
931
932	RTMemFree(pChunk);
933	NOREF(pvGMM);
934	return 0;
935	}
936
937
938	/**
939	* Initializes the per-VM data for the GMM.
940	*
941	* This is called from within the GVMM lock (from GVMMR0CreateVM)
942	* and should only initialize the data members so GMMR0CleanupVM
943	* can deal with them. We reserve no memory or anything here,
944	* that's done later in GMMR0InitVM.
945	*
946	* @param pGVM Pointer to the Global VM structure.
947	*/
948	GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
949	{
950	AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
951
952	pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
953	pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
954	pGVM->gmm.s.Stats.fMayAllocate = false;
955	}
956
957
958	/**
959	* Acquires the GMM giant lock.
960	*
961	* @returns Assert status code from RTSemFastMutexRequest.
962	* @param pGMM Pointer to the GMM instance.
963	*/
964	static int gmmR0MutexAcquire(PGMM pGMM)
965	{
966	ASMAtomicIncU32(&pGMM->cMtxContenders);
967	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
968	int rc = RTCritSectEnter(&pGMM->GiantCritSect);
969	#else
970	int rc = RTSemFastMutexRequest(pGMM->hMtx);
971	#endif
972	ASMAtomicDecU32(&pGMM->cMtxContenders);
973	AssertRC(rc);
974	#ifdef VBOX_STRICT
975	pGMM->hMtxOwner = RTThreadNativeSelf();
976	#endif
977	return rc;
978	}
979
980
981	/**
982	* Releases the GMM giant lock.
983	*
984	* @returns Assert status code from RTSemFastMutexRelease.
985	* @param pGMM Pointer to the GMM instance.
986	*/
987	static int gmmR0MutexRelease(PGMM pGMM)
988	{
989	#ifdef VBOX_STRICT
990	pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
991	#endif
992	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
993	int rc = RTCritSectLeave(&pGMM->GiantCritSect);
994	#else
995	int rc = RTSemFastMutexRelease(pGMM->hMtx);
996	#endif
997	AssertRC(rc);
998	return rc;
999	}
1000
1001
1002	/**
1003	* Yields the GMM giant lock if there is contention and a certain minimum time
1004	* has elapsed since we took it.
1005	*
1006	* @returns @c true if the mutex was yielded, @c false if not.
1007	* @param pGMM Pointer to the GMM instance.
1008	* @param puLockNanoTS Where the lock acquisition time stamp is kept
1009	* (in/out).
1010	*/
1011	static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1012	{
1013	/*
1014	* If nobody is contending the mutex, don't bother checking the time.
1015	*/
1016	if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1017	return false;
1018
1019	/*
1020	* Don't yield if we haven't executed for at least 2 milliseconds.
1021	*/
1022	uint64_t uNanoNow = RTTimeSystemNanoTS();
1023	if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1024	return false;
1025
1026	/*
1027	* Yield the mutex.
1028	*/
1029	#ifdef VBOX_STRICT
1030	pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1031	#endif
1032	ASMAtomicIncU32(&pGMM->cMtxContenders);
1033	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1034	int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1035	#else
1036	int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1037	#endif
1038
1039	RTThreadYield();
1040
1041	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1042	int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1043	#else
1044	int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1045	#endif
1046	*puLockNanoTS = RTTimeSystemNanoTS();
1047	ASMAtomicDecU32(&pGMM->cMtxContenders);
1048	#ifdef VBOX_STRICT
1049	pGMM->hMtxOwner = RTThreadNativeSelf();
1050	#endif
1051
1052	return true;
1053	}
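
/*
 * A minimal sketch of the yield decision made by gmmR0MutexYield, assuming
 * the contender count and the two timestamps are passed in as plain values.
 * The names shouldYield and kMinHoldNs are hypothetical and exist only for
 * this illustration; the real code reads pGMM->cMtxContenders atomically and
 * takes the time from RTTimeSystemNanoTS.
 */
```cpp
#include <cassert>
#include <cstdint>

// Minimum time the lock must have been held before yielding (2 ms, as above).
constexpr uint64_t kMinHoldNs = 2000000;

// Returns true when the holder should drop and retake the lock:
// somebody is waiting AND the holder has run for at least kMinHoldNs.
bool shouldYield(uint32_t cContenders, uint64_t nowNs, uint64_t lockTakenNs)
{
    if (cContenders == 0)
        return false;                       // nobody waiting, skip the time check
    return nowNs - lockTakenNs >= kMinHoldNs;
}
```
/*
 * Checking the contender count first keeps the common uncontended path free
 * of clock reads, which is presumably why the original orders the tests the
 * same way.
 */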
1054
1055
1056	/**
1057	* Acquires a chunk lock.
1058	*
1059	* The caller must own the giant lock.
1060	*
1061	* @returns Assert status code from RTSemFastMutexRequest.
1062	* @param pMtxState The chunk mutex state info. (Avoids
1063	* passing the same flags and stuff around
1064	* for subsequent release and drop-giant
1065	* calls.)
1066	* @param pGMM Pointer to the GMM instance.
1067	* @param pChunk Pointer to the chunk.
1068	* @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1069	*/
1070	static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1071	{
1072	Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1073	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1074
1075	pMtxState->pGMM = pGMM;
1076	pMtxState->fFlags = (uint8_t)fFlags;
1077
1078	/*
1079	* Get the lock index and reference the lock.
1080	*/
1081	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1082	uint32_t iChunkMtx = pChunk->iChunkMtx;
1083	if (iChunkMtx == UINT8_MAX)
1084	{
1085	iChunkMtx = pGMM->iNextChunkMtx++;
1086	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1087
1088	/* Try get an unused one... */
1089	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1090	{
1091	iChunkMtx = pGMM->iNextChunkMtx++;
1092	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1093	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1094	{
1095	iChunkMtx = pGMM->iNextChunkMtx++;
1096	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1097	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1098	{
1099	iChunkMtx = pGMM->iNextChunkMtx++;
1100	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1101	}
1102	}
1103	}
1104
1105	pChunk->iChunkMtx = iChunkMtx;
1106	}
1107	AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1108	pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1109	ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1110
1111	/*
1112	* Drop the giant?
1113	*/
1114	if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1115	{
1116	/** @todo GMM life cycle cleanup (we may race someone
1117	* destroying and cleaning up GMM)? */
1118	gmmR0MutexRelease(pGMM);
1119	}
1120
1121	/*
1122	* Take the chunk mutex.
1123	*/
1124	int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1125	AssertRC(rc);
1126	return rc;
1127	}
1128
1129
1130	/**
1131	* Releases the chunk mutex and reacquires the giant lock if requested.
1132	*
1133	* @returns Assert status code from RTSemFastMutexRelease.
1134	* @param pMtxState Pointer to the chunk mutex state.
1135	* @param pChunk Pointer to the chunk if it's still
1136	* alive, NULL if it isn't. This is used to disassociate
1137	* the chunk from the mutex on the way out so a new one
1138	* can be selected next time, thus avoiding contended
1139	* mutexes.
1140	*/
1141	static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1142	{
1143	PGMM pGMM = pMtxState->pGMM;
1144
1145	/*
1146	* Release the chunk mutex and reacquire the giant if requested.
1147	*/
1148	int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1149	AssertRC(rc);
1150	if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1151	rc = gmmR0MutexAcquire(pGMM);
1152	else
1153	Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1154
1155	/*
1156	* Drop the chunk mutex user reference and deassociate it from the chunk
1157	* when possible.
1158	*/
1159	if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1160	&& pChunk
1161	&& RT_SUCCESS(rc) )
1162	{
1163	if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1164	pChunk->iChunkMtx = UINT8_MAX;
1165	else
1166	{
1167	rc = gmmR0MutexAcquire(pGMM);
1168	if (RT_SUCCESS(rc))
1169	{
1170	if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1171	pChunk->iChunkMtx = UINT8_MAX;
1172	rc = gmmR0MutexRelease(pGMM);
1173	}
1174	}
1175	}
1176
1177	pMtxState->pGMM = NULL;
1178	return rc;
1179	}
1180
1181
1182	/**
1183	* Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1184	* chunk locked.
1185	*
1186	* This only works if gmmR0ChunkMutexAcquire was called with
1187	* GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1188	* mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1189	*
1190	* @returns VBox status code (assuming success is ok).
1191	* @param pMtxState Pointer to the chunk mutex state.
1192	*/
1193	static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1194	{
1195	AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1196	Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1197	pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1198	/** @todo GMM life cycle cleanup (we may race someone
1199	* destroying and cleaning up GMM)? */
1200	return gmmR0MutexRelease(pMtxState->pGMM);
1201	}
1202
1203
1204	/**
1205	* For experimenting with NUMA affinity and such.
1206	*
1207	* @returns The current NUMA Node ID.
1208	*/
1209	static uint16_t gmmR0GetCurrentNumaNodeId(void)
1210	{
1211	#if 1
1212	return GMM_CHUNK_NUMA_ID_UNKNOWN;
1213	#else
1214	return RTMpCpuId() / 16;
1215	#endif
1216	}
1217
1218
1219
1220	/**
1221	* Cleans up when a VM is terminating.
1222	*
1223	* @param pGVM Pointer to the Global VM structure.
1224	*/
1225	GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1226	{
1227	LogFlow(("GMMR0CleanupVM: pGVM=%p:{.pVM=%p, .hSelf=%#x}\n", pGVM, pGVM->pVM, pGVM->hSelf));
1228
1229	PGMM pGMM;
1230	GMM_GET_VALID_INSTANCE_VOID(pGMM);
1231
1232	#ifdef VBOX_WITH_PAGE_SHARING
1233	/*
1234	* Clean up all registered shared modules first.
1235	*/
1236	gmmR0SharedModuleCleanup(pGMM, pGVM);
1237	#endif
1238
1239	gmmR0MutexAcquire(pGMM);
1240	uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1241	GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1242
1243	/*
1244	* The policy is 'INVALID' until the initial reservation
1245	* request has been serviced.
1246	*/
1247	if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1248	&& pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1249	{
1250	/*
1251	* If it's the last VM around, we can skip walking all the chunks looking
1252	* for the pages owned by this VM and instead flush the whole shebang.
1253	*
1254	* This takes care of the eventuality that a VM has left shared page
1255	* references behind (shouldn't happen of course, but you never know).
1256	*/
1257	Assert(pGMM->cRegisteredVMs);
1258	pGMM->cRegisteredVMs--;
1259
1260	/*
1261	* Walk the entire pool looking for pages that belong to this VM
1262	* and leftover mappings. (This'll only catch private pages,
1263	* shared pages will be 'left behind'.)
1264	*/
1265	/** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1266	uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1267
1268	unsigned iCountDown = 64;
1269	bool fRedoFromStart;
1270	PGMMCHUNK pChunk;
1271	do
1272	{
1273	fRedoFromStart = false;
1274	RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1275	{
1276	uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1277	if ( ( !pGMM->fBoundMemoryMode
1278	|| pChunk->hGVM == pGVM->hSelf)
1279	&& gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1280	{
1281	/* We left the giant mutex, so reset the yield counters. */
1282	uLockNanoTS = RTTimeSystemNanoTS();
1283	iCountDown = 64;
1284	}
1285	else
1286	{
1287	/* Didn't leave it, so do normal yielding. */
1288	if (!iCountDown)
1289	gmmR0MutexYield(pGMM, &uLockNanoTS);
1290	else
1291	iCountDown--;
1292	}
1293	if (pGMM->cFreedChunks != cFreeChunksOld)
1294	{
1295	fRedoFromStart = true;
1296	break;
1297	}
1298	}
1299	} while (fRedoFromStart);
1300
1301	if (pGVM->gmm.s.Stats.cPrivatePages)
1302	SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1303
1304	pGMM->cAllocatedPages -= cPrivatePages;
1305
1306	/*
1307	* Free empty chunks.
1308	*/
1309	PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1310	do
1311	{
1312	fRedoFromStart = false;
1313	iCountDown = 10240;
1314	pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1315	while (pChunk)
1316	{
1317	PGMMCHUNK pNext = pChunk->pFreeNext;
1318	Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1319	if ( !pGMM->fBoundMemoryMode
1320	|| pChunk->hGVM == pGVM->hSelf)
1321	{
1322	uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1323	if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1324	{
1325	/* We've left the giant mutex, restart? (+1 for our unlink) */
1326	fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1327	if (fRedoFromStart)
1328	break;
1329	uLockNanoTS = RTTimeSystemNanoTS();
1330	iCountDown = 10240;
1331	}
1332	}
1333
1334	/* Advance and maybe yield the lock. */
1335	pChunk = pNext;
1336	if (--iCountDown == 0)
1337	{
1338	uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1339	fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1340	&& pPrivateSet->idGeneration != idGenerationOld;
1341	if (fRedoFromStart)
1342	break;
1343	iCountDown = 10240;
1344	}
1345	}
1346	} while (fRedoFromStart);
1347
1348	/*
1349	* Account for shared pages that weren't freed.
1350	*/
1351	if (pGVM->gmm.s.Stats.cSharedPages)
1352	{
1353	Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1354	SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1355	pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1356	}
1357
1358	/*
1359	* Clean up balloon statistics in case the VM process crashed.
1360	*/
1361	Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1362	pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1363
1364	/*
1365	* Update the over-commitment management statistics.
1366	*/
1367	pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1368	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
1369	+ pGVM->gmm.s.Stats.Reserved.cShadowPages;
1370	switch (pGVM->gmm.s.Stats.enmPolicy)
1371	{
1372	case GMMOCPOLICY_NO_OC:
1373	break;
1374	default:
1375	/** @todo Update GMM->cOverCommittedPages */
1376	break;
1377	}
1378	}
1379
1380	/* zap the GVM data. */
1381	pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1382	pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1383	pGVM->gmm.s.Stats.fMayAllocate = false;
1384
1385	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1386	gmmR0MutexRelease(pGMM);
1387
1388	LogFlow(("GMMR0CleanupVM: returns\n"));
1389	}
1390
1391
1392	/**
1393	* Scan one chunk for private pages belonging to the specified VM.
1394	*
1395	* @note This function may drop the giant mutex!
1396	*
1397	* @returns @c true if we've temporarily dropped the giant mutex, @c false if
1398	* we didn't.
1399	* @param pGMM Pointer to the GMM instance.
1400	* @param pGVM The global VM handle.
1401	* @param pChunk The chunk to scan.
1402	*/
1403	static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1404	{
1405	Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1406
1407	/*
1408	* Look for pages belonging to the VM.
1409	* (Perform some internal checks while we're scanning.)
1410	*/
1411	#ifndef VBOX_STRICT
1412	if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1413	#endif
1414	{
1415	unsigned cPrivate = 0;
1416	unsigned cShared = 0;
1417	unsigned cFree = 0;
1418
1419	gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1420
1421	uint16_t hGVM = pGVM->hSelf;
1422	unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1423	while (iPage-- > 0)
1424	if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1425	{
1426	if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1427	{
1428	/*
1429	* Free the page.
1430	*
1431	* The reason for not using gmmR0FreePrivatePage here is that we
1432	* must not cause the chunk to be freed from under us - we're in
1433	* an AVL tree walk here.
1434	*/
1435	pChunk->aPages[iPage].u = 0;
1436	pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1437	pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1438	pChunk->iFreeHead = iPage;
1439	pChunk->cPrivate--;
1440	pChunk->cFree++;
1441	pGVM->gmm.s.Stats.cPrivatePages--;
1442	cFree++;
1443	}
1444	else
1445	cPrivate++;
1446	}
1447	else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1448	cFree++;
1449	else
1450	cShared++;
1451
1452	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1453
1454	/*
1455	* Did it add up?
1456	*/
1457	if (RT_UNLIKELY( pChunk->cFree != cFree
1458	|| pChunk->cPrivate != cPrivate
1459	|| pChunk->cShared != cShared))
1460	{
1461	SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %p/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1462	pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1463	pChunk->cFree = cFree;
1464	pChunk->cPrivate = cPrivate;
1465	pChunk->cShared = cShared;
1466	}
1467	}
1468
1469	/*
1470	* If not in bound memory mode, we should reset the hGVM field
1471	* if it has our handle in it.
1472	*/
1473	if (pChunk->hGVM == pGVM->hSelf)
1474	{
1475	if (!g_pGMM->fBoundMemoryMode)
1476	pChunk->hGVM = NIL_GVM_HANDLE;
1477	else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1478	{
1479	SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1480	pChunk, pChunk->Core.Key, pChunk->cFree);
1481	AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1482
1483	gmmR0UnlinkChunk(pChunk);
1484	pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1485	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1486	}
1487	}
1488
1489	/*
1490	* Look for a mapping belonging to the terminating VM.
1491	*/
1492	GMMR0CHUNKMTXSTATE MtxState;
1493	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1494	unsigned cMappings = pChunk->cMappingsX;
1495	for (unsigned i = 0; i < cMappings; i++)
1496	if (pChunk->paMappingsX[i].pGVM == pGVM)
1497	{
1498	gmmR0ChunkMutexDropGiant(&MtxState);
1499
1500	RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1501
1502	cMappings--;
1503	if (i < cMappings)
1504	pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1505	pChunk->paMappingsX[cMappings].pGVM = NULL;
1506	pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1507	Assert(pChunk->cMappingsX - 1U == cMappings);
1508	pChunk->cMappingsX = cMappings;
1509
1510	int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1511	if (RT_FAILURE(rc))
1512	{
1513	SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: mapping #%x: RTR0MemObjFree(%p,false) -> %d \n",
1514	pChunk, pChunk->Core.Key, i, hMemObj, rc);
1515	AssertRC(rc);
1516	}
1517
1518	gmmR0ChunkMutexRelease(&MtxState, pChunk);
1519	return true;
1520	}
1521
1522	gmmR0ChunkMutexRelease(&MtxState, pChunk);
1523	return false;
1524	}
1525
1526
1527	/**
1528	* The initial resource reservations.
1529	*
1530	* This will make memory reservations according to policy and priority. If there aren't
1531	* sufficient resources available to sustain the VM this function will fail and all
1532	* future allocation requests will fail as well.
1533	*
1534	* These are just the initial reservations made very early during the VM creation
1535	* process and will be adjusted later in the GMMR0UpdateReservation call after the
1536	* ring-3 init has completed.
1537	*
1538	* @returns VBox status code.
1539	* @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1540	* @retval VERR_GMM_
1541	*
1542	* @param pVM Pointer to the VM.
1543	* @param idCpu The VCPU id.
1544	* @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1545	* This does not include MMIO2 and similar.
1546	* @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1547	* @param cFixedPages The number of pages that may be allocated for fixed objects like the
1548	* hyper heap, MMIO2 and similar.
1549	* @param enmPolicy The OC policy to use on this VM.
1550	* @param enmPriority The priority in an out-of-memory situation.
1551	*
1552	* @thread The creator thread / EMT.
1553	*/
1554	GMMR0DECL(int) GMMR0InitialReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages,
1555	GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1556	{
1557	LogFlow(("GMMR0InitialReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1558	pVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1559
1560	/*
1561	* Validate, get basics and take the semaphore.
1562	*/
1563	PGMM pGMM;
1564	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1565	PGVM pGVM;
1566	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1567	if (RT_FAILURE(rc))
1568	return rc;
1569
1570	AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1571	AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1572	AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1573	AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1574	AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1575
1576	gmmR0MutexAcquire(pGMM);
1577	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1578	{
1579	if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1580	&& !pGVM->gmm.s.Stats.Reserved.cFixedPages
1581	&& !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1582	{
1583	/*
1584	* Check if we can accommodate this.
1585	*/
1586	/* ... later ... */
1587	if (RT_SUCCESS(rc))
1588	{
1589	/*
1590	* Update the records.
1591	*/
1592	pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1593	pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1594	pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1595	pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1596	pGVM->gmm.s.Stats.enmPriority = enmPriority;
1597	pGVM->gmm.s.Stats.fMayAllocate = true;
1598
1599	pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1600	pGMM->cRegisteredVMs++;
1601	}
1602	}
1603	else
1604	rc = VERR_WRONG_ORDER;
1605	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1606	}
1607	else
1608	rc = VERR_GMM_IS_NOT_SANE;
1609	gmmR0MutexRelease(pGMM);
1610	LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1611	return rc;
1612	}
1613
1614
1615	/**
1616	* VMMR0 request wrapper for GMMR0InitialReservation.
1617	*
1618	* @returns see GMMR0InitialReservation.
1619	* @param pVM Pointer to the VM.
1620	* @param idCpu The VCPU id.
1621	* @param pReq Pointer to the request packet.
1622	*/
1623	GMMR0DECL(int) GMMR0InitialReservationReq(PVM pVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1624	{
1625	/*
1626	* Validate input and pass it on.
1627	*/
1628	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1629	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1630	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1631
1632	return GMMR0InitialReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1633	}
1634
1635
1636	/**
1637	* This updates the memory reservation with the additional MMIO2 and ROM pages.
1638	*
1639	* @returns VBox status code.
1640	* @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1641	*
1642	* @param pVM Pointer to the VM.
1643	* @param idCpu The VCPU id.
1644	* @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1645	* This does not include MMIO2 and similar.
1646	* @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1647	* @param cFixedPages The number of pages that may be allocated for fixed objects like the
1648	* hyper heap, MMIO2 and similar.
1649	*
1650	* @thread EMT.
1651	*/
1652	GMMR0DECL(int) GMMR0UpdateReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages)
1653	{
1654	LogFlow(("GMMR0UpdateReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1655	pVM, cBasePages, cShadowPages, cFixedPages));
1656
1657	/*
1658	* Validate, get basics and take the semaphore.
1659	*/
1660	PGMM pGMM;
1661	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1662	PGVM pGVM;
1663	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1664	if (RT_FAILURE(rc))
1665	return rc;
1666
1667	AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1668	AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1669	AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1670
1671	gmmR0MutexAcquire(pGMM);
1672	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1673	{
1674	if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1675	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
1676	&& pGVM->gmm.s.Stats.Reserved.cShadowPages)
1677	{
1678	/*
1679	* Check if we can accommodate this.
1680	*/
1681	/* ... later ... */
1682	if (RT_SUCCESS(rc))
1683	{
1684	/*
1685	* Update the records.
1686	*/
1687	pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1688	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
1689	+ pGVM->gmm.s.Stats.Reserved.cShadowPages;
1690	pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1691
1692	pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1693	pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1694	pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1695	}
1696	}
1697	else
1698	rc = VERR_WRONG_ORDER;
1699	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1700	}
1701	else
1702	rc = VERR_GMM_IS_NOT_SANE;
1703	gmmR0MutexRelease(pGMM);
1704	LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1705	return rc;
1706	}
1707
1708
1709	/**
1710	* VMMR0 request wrapper for GMMR0UpdateReservation.
1711	*
1712	* @returns see GMMR0UpdateReservation.
1713	* @param pVM Pointer to the VM.
1714	* @param idCpu The VCPU id.
1715	* @param pReq Pointer to the request packet.
1716	*/
1717	GMMR0DECL(int) GMMR0UpdateReservationReq(PVM pVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1718	{
1719	/*
1720	* Validate input and pass it on.
1721	*/
1722	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1723	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1724	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1725
1726	return GMMR0UpdateReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1727	}
1728
1729	#ifdef GMMR0_WITH_SANITY_CHECK
1730
1731	/**
1732	* Performs sanity checks on a free set.
1733	*
1734	* @returns Error count.
1735	*
1736	* @param pGMM Pointer to the GMM instance.
1737	* @param pSet Pointer to the set.
1738	* @param pszSetName The set name.
1739	* @param pszFunction The function from which it was called.
1740	* @param uLineNo The line number.
1741	*/
1742	static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1743	const char *pszFunction, unsigned uLineNo)
1744	{
1745	uint32_t cErrors = 0;
1746
1747	/*
1748	* Count the free pages in all the chunks and match it against pSet->cFreePages.
1749	*/
1750	uint32_t cPages = 0;
1751	for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1752	{
1753	for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1754	{
1755	/** @todo check that the chunk is hashed into the right set. */
1756	cPages += pCur->cFree;
1757	}
1758	}
1759	if (RT_UNLIKELY(cPages != pSet->cFreePages))
1760	{
1761	SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1762	cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1763	cErrors++;
1764	}
1765
1766	return cErrors;
1767	}
1768
1769
1770	/**
1771	* Performs some sanity checks on the GMM while owning lock.
1772	*
1773	* @returns Error count.
1774	*
1775	* @param pGMM Pointer to the GMM instance.
1776	* @param pszFunction The function from which it is called.
1777	* @param uLineNo The line number.
1778	*/
1779	static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1780	{
1781	uint32_t cErrors = 0;
1782
1783	cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1784	cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1785	/** @todo add more sanity checks. */
1786
1787	return cErrors;
1788	}
1789
1790	#endif /* GMMR0_WITH_SANITY_CHECK */
1791
1792	/**
1793	* Looks up a chunk in the tree and fills in the TLB entry for it.
1794	*
1795	* This is not expected to fail and will bitch if it does.
1796	*
1797	* @returns Pointer to the allocation chunk, NULL if not found.
1798	* @param pGMM Pointer to the GMM instance.
1799	* @param idChunk The ID of the chunk to find.
1800	* @param pTlbe Pointer to the TLB entry.
1801	*/
1802	static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1803	{
1804	PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1805	AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1806	pTlbe->idChunk = idChunk;
1807	pTlbe->pChunk = pChunk;
1808	return pChunk;
1809	}
1810
1811
1812	/**
1813	* Finds an allocation chunk.
1814	*
1815	* This is not expected to fail and will bitch if it does.
1816	*
1817	* @returns Pointer to the allocation chunk, NULL if not found.
1818	* @param pGMM Pointer to the GMM instance.
1819	* @param idChunk The ID of the chunk to find.
1820	*/
1821	DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1822	{
1823	/*
1824	* Do a TLB lookup, branch if not in the TLB.
1825	*/
1826	PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1827	if ( pTlbe->idChunk != idChunk
1828	|| !pTlbe->pChunk)
1829	return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1830	return pTlbe->pChunk;
1831	}
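
/*
 * A minimal sketch of the lookup pattern used by gmmR0GetChunk and
 * gmmR0GetChunkSlow: a small direct-mapped cache in front of a slower tree
 * lookup. Everything here is hypothetical and simplified for illustration:
 * std::map stands in for the AVL tree, the cache size of 32 is an assumption,
 * and the Chunk, TlbEntry and ChunkCache types are not the real GMM types.
 */
```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct Chunk    { uint32_t id; };
struct TlbEntry { uint32_t idChunk = 0; Chunk *pChunk = nullptr; };

struct ChunkCache
{
    static constexpr uint32_t kEntries = 32;   // assumed size, power of two
    TlbEntry                    aEntries[kEntries];
    std::map<uint32_t, Chunk *> tree;          // stand-in for the AVL tree
    unsigned                    cSlowLookups = 0;

    Chunk *get(uint32_t idChunk)
    {
        // Fast path: the slot for this ID already holds the chunk.
        TlbEntry &entry = aEntries[idChunk & (kEntries - 1)];
        if (entry.idChunk == idChunk && entry.pChunk)
            return entry.pChunk;

        // Slow path: consult the tree and refill the slot on success.
        ++cSlowLookups;
        auto it = tree.find(idChunk);
        if (it == tree.end())
            return nullptr;
        entry.idChunk = idChunk;
        entry.pChunk  = it->second;
        return it->second;
    }
};
```
/*
 * Note that a cached entry must be invalidated when its chunk is freed; the
 * sketch leaves that out.
 */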
1832
1833
1834	/**
1835	* Finds a page.
1836	*
1837	* This is not expected to fail and will bitch if it does.
1838	*
1839	* @returns Pointer to the page, NULL if not found.
1840	* @param pGMM Pointer to the GMM instance.
1841	* @param idPage The ID of the page to find.
1842	*/
1843	DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1844	{
1845	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1846	if (RT_LIKELY(pChunk))
1847	return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1848	return NULL;
1849	}
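
/*
 * A minimal sketch of how a page ID splits into a chunk ID and a page index,
 * following the relation idPage = (idChunk << shift) | iPage given in the
 * file header. The constants are assumptions chosen for illustration (4 KiB
 * pages, 2 MiB chunks); the real values come from GMM_CHUNKID_SHIFT and
 * GMM_PAGEID_IDX_MASK. The helper names are hypothetical.
 */
```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPageShift    = 12;                        // assumed: 4 KiB pages
constexpr uint32_t kChunkShift   = 21;                        // assumed: 2 MiB chunks
constexpr uint32_t kChunkIdShift = kChunkShift - kPageShift;  // log2(pages per chunk)
constexpr uint32_t kPageIdxMask  = (UINT32_C(1) << kChunkIdShift) - 1;

// Compose a page ID from a chunk ID and a page index within the chunk.
constexpr uint32_t makePageId(uint32_t idChunk, uint32_t iPage)
{
    return (idChunk << kChunkIdShift) | (iPage & kPageIdxMask);
}

// Split a page ID back into its two parts.
constexpr uint32_t chunkIdOf(uint32_t idPage)   { return idPage >> kChunkIdShift; }
constexpr uint32_t pageIndexOf(uint32_t idPage) { return idPage & kPageIdxMask; }
```
/*
 * With these assumed values, chunk 3 / page 5 gives page ID (3 << 9) | 5.
 */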
1850
1851
1852	/**
1853	* Gets the host physical address for a page given by its ID.
1854	*
1855	* @returns The host physical address or NIL_RTHCPHYS.
1856	* @param pGMM Pointer to the GMM instance.
1857	* @param idPage The ID of the page to find.
1858	*/
1859	DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1860	{
1861	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1862	if (RT_LIKELY(pChunk))
1863	return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1864	return NIL_RTHCPHYS;
1865	}
1866
1867
1868	/**
1869	* Selects the appropriate free list given the number of free pages.
1870	*
1871	* @returns Free list index.
1872	* @param cFree The number of free pages in the chunk.
1873	*/
1874	DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1875	{
1876	unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1877	AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1878	("%d (%u)\n", iList, cFree));
1879	return iList;
1880	}
1881
1882
1883	/**
1884	* Unlinks the chunk from the free list it's currently on (if any).
1885	*
1886	* @param pChunk The allocation chunk.
1887	*/
1888	DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1889	{
1890	PGMMCHUNKFREESET pSet = pChunk->pSet;
1891	if (RT_LIKELY(pSet))
1892	{
1893	pSet->cFreePages -= pChunk->cFree;
1894	pSet->idGeneration++;
1895
1896	PGMMCHUNK pPrev = pChunk->pFreePrev;
1897	PGMMCHUNK pNext = pChunk->pFreeNext;
1898	if (pPrev)
1899	pPrev->pFreeNext = pNext;
1900	else
1901	pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1902	if (pNext)
1903	pNext->pFreePrev = pPrev;
1904
1905	pChunk->pSet = NULL;
1906	pChunk->pFreeNext = NULL;
1907	pChunk->pFreePrev = NULL;
1908	}
1909	else
1910	{
1911	Assert(!pChunk->pFreeNext);
1912	Assert(!pChunk->pFreePrev);
1913	Assert(!pChunk->cFree);
1914	}
1915	}
1916
1917
1918	/**
1919	* Links the chunk onto the appropriate free list in the specified free set.
1920	*
1921	* If no free entries, it's not linked into any list.
1922	*
1923	* @param pChunk The allocation chunk.
1924	* @param pSet The free set.
1925	*/
1926	DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1927	{
1928	Assert(!pChunk->pSet);
1929	Assert(!pChunk->pFreeNext);
1930	Assert(!pChunk->pFreePrev);
1931
1932	if (pChunk->cFree > 0)
1933	{
1934	pChunk->pSet = pSet;
1935	pChunk->pFreePrev = NULL;
1936	unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1937	pChunk->pFreeNext = pSet->apLists[iList];
1938	if (pChunk->pFreeNext)
1939	pChunk->pFreeNext->pFreePrev = pChunk;
1940	pSet->apLists[iList] = pChunk;
1941
1942	pSet->cFreePages += pChunk->cFree;
1943	pSet->idGeneration++;
1944	}
1945	}
1946
1947
1948	/**
1949	* Selects the free set (per-VM private, shared, or global private) and links the chunk onto the appropriate free list in it.
1950	*
1951	* If no free entries, it's not linked into any list.
1952	*
1953	* @param pChunk The allocation chunk.
1954	*/
1955	DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1956	{
1957	PGMMCHUNKFREESET pSet;
1958	if (pGMM->fBoundMemoryMode)
1959	pSet = &pGVM->gmm.s.Private;
1960	else if (pChunk->cShared)
1961	pSet = &pGMM->Shared;
1962	else
1963	pSet = &pGMM->PrivateX;
1964	gmmR0LinkChunk(pChunk, pSet);
1965	}
1966
1967
1968	/**
1969	* Frees a Chunk ID.
1970	*
1971	* @param pGMM Pointer to the GMM instance.
1972	* @param idChunk The Chunk ID to free.
1973	*/
1974	static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1975	{
1976	AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1977	AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1978	ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1979	}
1980
1981
1982	/**
1983	* Allocates a new Chunk ID.
1984	*
1985	* @returns The Chunk ID.
1986	* @param pGMM Pointer to the GMM instance.
1987	*/
1988	static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1989	{
1990	AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
1991	AssertCompile(NIL_GMM_CHUNKID == 0);
1992
1993	/*
1994	* Try the next sequential one.
1995	*/
1996	int32_t idChunk = ++pGMM->idChunkPrev;
1997	#if 0 /** @todo enable this code */
1998	if ( idChunk <= GMM_CHUNKID_LAST
1999	&& idChunk > NIL_GMM_CHUNKID
2000	&& !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2001	return idChunk;
2002	#endif
2003
2004	/*
2005	* Scan sequentially from the last one.
2006	*/
2007	if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2008	&& idChunk > NIL_GMM_CHUNKID)
2009	{
2010	idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2011	if (idChunk > NIL_GMM_CHUNKID)
2012	{
2013	AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2014	return pGMM->idChunkPrev = idChunk;
2015	}
2016	}
2017
2018	/*
2019	* Ok, scan from the start.
2020	* We're not racing anyone, so there is no need to expect failures or have restart loops.
2021	*/
2022	idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2023	AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2024	AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2025
2026	return pGMM->idChunkPrev = idChunk;
2027	}
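
The allocation strategy above (scan the bitmap forward from the previously issued ID, then fall back to a scan from the start) can be sketched without the IPRT bit operations. This is an illustrative model only: `ChunkIdAllocator` is a hypothetical type, `std::vector<bool>` replaces the `bmChunkId` bitmap, and it assumes bit 0 is permanently set so that ID 0 can serve as the NIL value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a bitmap-backed ID allocator; ID 0 is reserved as NIL.
struct ChunkIdAllocator
{
    std::vector<bool> used;      // used[i] == true means ID i is taken
    uint32_t          idPrev = 0; // last ID handed out

    explicit ChunkIdAllocator(uint32_t cIds) : used(cIds, false)
    {
        assert(cIds >= 1);
        used[0] = true;          // assumption: the NIL ID is never handed out
    }

    // Returns a free ID, or 0 (NIL) when every ID is in use.
    uint32_t allocate()
    {
        const uint32_t cIds = static_cast<uint32_t>(used.size());
        // Pass 0 scans forward from the previous ID, pass 1 rescans from the start.
        for (unsigned iPass = 0; iPass < 2; ++iPass)
            for (uint32_t id = (iPass == 0 ? idPrev + 1 : 1); id < cIds; ++id)
                if (!used[id])
                {
                    used[id] = true;
                    return idPrev = id;
                }
        return 0;
    }

    // Returns an ID to the pool.
    void release(uint32_t id)
    {
        assert(id != 0 && id < used.size() && used[id]);
        used[id] = false;
    }
};
```

Scanning forward first tends to hand out fresh IDs before recycling recently freed ones. The sketch does not model locking; in the listing the caller holds the giant GMM lock.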
2028
2029
2030	/**
2031	* Allocates one private page.
2032	*
2033	* Worker for gmmR0AllocatePages.
2034	*
2035	* @param pChunk The chunk to allocate it from.
2036	* @param hGVM The GVM handle of the VM requesting memory.
2037	* @param pPageDesc The page descriptor.
2038	*/
2039	static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2040	{
2041	/* update the chunk stats. */
2042	if (pChunk->hGVM == NIL_GVM_HANDLE)
2043	pChunk->hGVM = hGVM;
2044	Assert(pChunk->cFree);
2045	pChunk->cFree--;
2046	pChunk->cPrivate++;
2047
2048	/* unlink the first free page. */
2049	const uint32_t iPage = pChunk->iFreeHead;
2050	AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2051	PGMMPAGE pPage = &pChunk->aPages[iPage];
2052	Assert(GMM_PAGE_IS_FREE(pPage));
2053	pChunk->iFreeHead = pPage->Free.iNext;
2054	Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2055	pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2056	pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2057
2058	/* make the page private. */
2059	pPage->u = 0;
2060	AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2061	pPage->Private.hGVM = hGVM;
2062	AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2063	AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2064	if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2065	pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2066	else
2067	pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2068
2069	/* update the page descriptor. */
2070	pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2071	Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2072	pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2073	pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2074	}
2075
2076
2077	/**
2078	* Picks the free pages from a chunk.
2079	*
2080	* @returns The new page descriptor table index.
2081	* @param pGMM Pointer to the GMM instance data.
2082	* @param hGVM The global VM handle.
2083	* @param pChunk The chunk.
2084	* @param iPage The current page descriptor table index.
2085	* @param cPages The total number of pages to allocate.
2086	* @param paPages The page descriptor table (input + output).
2087	*/
2088	static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2089	PGMMPAGEDESC paPages)
2090	{
2091	PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2092	gmmR0UnlinkChunk(pChunk);
2093
2094	for (; pChunk->cFree && iPage < cPages; iPage++)
2095	gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2096
2097	gmmR0LinkChunk(pChunk, pSet);
2098	return iPage;
2099	}
2100
2101
2102	/**
2103	* Registers a new chunk of memory.
2104	*
2105	* This is called by both gmmR0AllocateOneChunk and GMMR0SeedChunk.
2106	*
2107	* @returns VBox status code. On success, the giant GMM lock will be held, the
2108	* caller must release it (ugly).
2109	* @param pGMM Pointer to the GMM instance.
2110	* @param pSet Pointer to the set.
2111	* @param MemObj The memory object for the chunk.
2112	* @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2113	* affinity.
2114	* @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2115	* @param ppChunk Chunk address (out). Optional.
2116	*
2117	* @remarks The caller must not own the giant GMM mutex.
2118	* The giant GMM mutex will be acquired and returned acquired in
2119	* the success path. On failure, no locks will be held.
2120	*/
2121	static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ MemObj, uint16_t hGVM, uint16_t fChunkFlags,
2122	PGMMCHUNK *ppChunk)
2123	{
2124	Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2125	Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2126	Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2127
2128	int rc;
2129	PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2130	if (pChunk)
2131	{
2132	/*
2133	* Initialize it.
2134	*/
2135	pChunk->hMemObj = MemObj;
2136	pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2137	pChunk->hGVM = hGVM;
2138	/*pChunk->iFreeHead = 0;*/
2139	pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2140	pChunk->iChunkMtx = UINT8_MAX;
2141	pChunk->fFlags = fChunkFlags;
2142	for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2143	{
2144	pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2145	pChunk->aPages[iPage].Free.iNext = iPage + 1;
2146	}
2147	pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2148	pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2149
2150	/*
2151	* Allocate a Chunk ID and insert it into the tree.
2152	* This has to be done behind the mutex of course.
2153	*/
2154	rc = gmmR0MutexAcquire(pGMM);
2155	if (RT_SUCCESS(rc))
2156	{
2157	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2158	{
2159	pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2160	if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2161	&& pChunk->Core.Key <= GMM_CHUNKID_LAST
2162	&& RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2163	{
2164	pGMM->cChunks++;
2165	RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2166	gmmR0LinkChunk(pChunk, pSet);
2167	LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2168
2169	if (ppChunk)
2170	*ppChunk = pChunk;
2171	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2172	return VINF_SUCCESS;
2173	}
2174
2175	/* bail out */
2176	rc = VERR_GMM_CHUNK_INSERT;
2177	}
2178	else
2179	rc = VERR_GMM_IS_NOT_SANE;
2180	gmmR0MutexRelease(pGMM);
2181	}
2182
2183	RTMemFree(pChunk);
2184	}
2185	else
2186	rc = VERR_NO_MEMORY;
2187	return rc;
2188	}
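
Registration chains all pages of a new chunk into a free list through their `iNext` indices, and allocating a page later pops the head of that list. A minimal sketch of this index-linked free list follows; `PageFreeList` and `kNilIndex` are hypothetical names, and the page state bits, ownership fields and chunk statistics of the real structures are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint16_t kNilIndex = UINT16_MAX; // end-of-list marker, as in the listing

// Hypothetical model of a chunk's page free list, linked by array index.
template <std::size_t N>
struct PageFreeList
{
    static_assert(N >= 1 && N < UINT16_MAX, "page count must fit a 16-bit index");

    std::array<uint16_t, N> iNext{};   // iNext[i] = index of the next free page
    uint16_t                iFreeHead = 0;
    uint32_t                cFree     = N;

    // Chain page i to page i + 1 and terminate the list at the last page.
    PageFreeList()
    {
        for (std::size_t i = 0; i + 1 < N; ++i)
            iNext[i] = static_cast<uint16_t>(i + 1);
        iNext[N - 1] = kNilIndex;
    }

    // Unlinks and returns the first free page, or kNilIndex if none is left.
    uint16_t pop()
    {
        if (cFree == 0)
            return kNilIndex;
        const uint16_t iPage = iFreeHead;
        assert(iPage < N);
        iFreeHead = iNext[iPage];
        --cFree;
        return iPage;
    }
};
```

Linking by 16-bit index rather than by pointer keeps each per-page tracking entry small, in line with the stated goal of minimizing the tracking structures.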
2189
2190
2191	/**
2192	* Allocates a new chunk, immediately picks the requested pages from it, and adds
2193	* what remains to the specified free set.
2194	*
2195	* @note This will leave the giant mutex while allocating the new chunk!
2196	*
2197	* @returns VBox status code.
2198	* @param pGMM Pointer to the GMM instance data.
2199	* @param pGVM Pointer to the kernel-only VM instance data.
2200	* @param pSet Pointer to the free set.
2201	* @param cPages The number of pages requested.
2202	* @param paPages The page descriptor table (input + output).
2203	* @param piPage The pointer to the page descriptor table index
2204	* variable. This will be updated.
2205	*/
2206	static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2207	PGMMPAGEDESC paPages, uint32_t *piPage)
2208	{
2209	gmmR0MutexRelease(pGMM);
2210
2211	RTR0MEMOBJ hMemObj;
2212	int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2213	if (RT_SUCCESS(rc))
2214	{
2215	/** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2216	* free pages first and then unchaining them right afterwards. Instead
2217	* do as much work as possible without holding the giant lock. */
2218	PGMMCHUNK pChunk;
2219	rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2220	if (RT_SUCCESS(rc))
2221	{
2222	*piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2223	return VINF_SUCCESS;
2224	}
2225
2226	/* bail out */
2227	RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2228	}
2229
2230	int rc2 = gmmR0MutexAcquire(pGMM);
2231	AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2232	return rc;
2233
2234	}
2235
2236
2237	/**
2238	* As a last resort we'll pick any page we can get.
2239	*
2240	* @returns The new page descriptor table index.
2241	* @param pSet The set to pick from.
2242	* @param pGVM Pointer to the global VM structure.
2243	* @param iPage The current page descriptor table index.
2244	* @param cPages The total number of pages to allocate.
2245	* @param paPages The page descriptor table (input + output).
2246	*/
2247	static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2248	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2249	{
2250	unsigned iList = RT_ELEMENTS(pSet->apLists);
2251	while (iList-- > 0)
2252	{
2253	PGMMCHUNK pChunk = pSet->apLists[iList];
2254	while (pChunk)
2255	{
2256	PGMMCHUNK pNext = pChunk->pFreeNext;
2257
2258	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2259	if (iPage >= cPages)
2260	return iPage;
2261
2262	pChunk = pNext;
2263	}
2264	}
2265	return iPage;
2266	}
2267
2268
2269	/**
2270	* Pick pages from empty chunks on the same NUMA node.
2271	*
2272	* @returns The new page descriptor table index.
2273	* @param pSet The set to pick from.
2274	* @param pGVM Pointer to the global VM structure.
2275	* @param iPage The current page descriptor table index.
2276	* @param cPages The total number of pages to allocate.
2277	* @param paPages The page descriptor table (input + output).
2278	*/
2279	static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2280	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2281	{
2282	PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2283	if (pChunk)
2284	{
2285	uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2286	while (pChunk)
2287	{
2288	PGMMCHUNK pNext = pChunk->pFreeNext;
2289
2290	if (pChunk->idNumaNode == idNumaNode)
2291	{
2292	pChunk->hGVM = pGVM->hSelf;
2293	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2294	if (iPage >= cPages)
2295	{
2296	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2297	return iPage;
2298	}
2299	}
2300
2301	pChunk = pNext;
2302	}
2303	}
2304	return iPage;
2305	}
2306
2307
2308	/**
2309	* Pick pages from non-empty chunks on the same NUMA node.
2310	*
2311	* @returns The new page descriptor table index.
2312	* @param pSet The set to pick from.
2313	* @param pGVM Pointer to the global VM structure.
2314	* @param iPage The current page descriptor table index.
2315	* @param cPages The total number of pages to allocate.
2316	* @param paPages The page descriptor table (input + output).
2317	*/
2318	static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2319	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2320	{
2321	/** @todo start by picking from chunks with about the right size first? */
2322	uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2323	unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2324	while (iList-- > 0)
2325	{
2326	PGMMCHUNK pChunk = pSet->apLists[iList];
2327	while (pChunk)
2328	{
2329	PGMMCHUNK pNext = pChunk->pFreeNext;
2330
2331	if (pChunk->idNumaNode == idNumaNode)
2332	{
2333	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2334	if (iPage >= cPages)
2335	{
2336	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2337	return iPage;
2338	}
2339	}
2340
2341	pChunk = pNext;
2342	}
2343	}
2344	return iPage;
2345	}
2346
2347
2348	/**
2349	* Pick pages that are in chunks already associated with the VM.
2350	*
2351	* @returns The new page descriptor table index.
2352	* @param pGMM Pointer to the GMM instance data.
2353	* @param pGVM Pointer to the global VM structure.
2354	* @param pSet The set to pick from.
2355	* @param iPage The current page descriptor table index.
2356	* @param cPages The total number of pages to allocate.
2357	* @param paPages The page descriptor table (input + output).
2358	*/
2359	static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2360	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2361	{
2362	uint16_t const hGVM = pGVM->hSelf;
2363
2364	/* Hint. */
2365	if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2366	{
2367	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2368	if (pChunk && pChunk->cFree)
2369	{
2370	iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2371	if (iPage >= cPages)
2372	return iPage;
2373	}
2374	}
2375
2376	/* Scan. */
2377	for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2378	{
2379	PGMMCHUNK pChunk = pSet->apLists[iList];
2380	while (pChunk)
2381	{
2382	PGMMCHUNK pNext = pChunk->pFreeNext;
2383
2384	if (pChunk->hGVM == hGVM)
2385	{
2386	iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2387	if (iPage >= cPages)
2388	{
2389	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2390	return iPage;
2391	}
2392	}
2393
2394	pChunk = pNext;
2395	}
2396	}
2397	return iPage;
2398	}
2399
2400
2401
2402	/**
2403	* Pick pages in bound memory mode.
2404	*
2405	* @returns The new page descriptor table index.
2406	* @param pGVM Pointer to the global VM structure.
2407	* @param iPage The current page descriptor table index.
2408	* @param cPages The total number of pages to allocate.
2409	* @param paPages The page descriptor table (input + output).
2410	*/
2411	static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2412	{
2413	for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2414	{
2415	PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2416	while (pChunk)
2417	{
2418	Assert(pChunk->hGVM == pGVM->hSelf);
2419	PGMMCHUNK pNext = pChunk->pFreeNext;
2420	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2421	if (iPage >= cPages)
2422	return iPage;
2423	pChunk = pNext;
2424	}
2425	}
2426	return iPage;
2427	}
2428
2429
2430	/**
2431	* Checks if we should start picking pages from chunks of other VMs because
2432	* we're getting close to the system memory or reserved limit.
2433	*
2434	* @returns @c true if we should, @c false if we should first try to allocate more
2435	* chunks.
2436	*/
2437	static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2438	{
2439	/*
2440	* Don't allocate a new chunk if we're within four chunks' worth of pages of the VM's reservation.
2441	*/
2442	uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2443	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
2444	- pGVM->gmm.s.Stats.cBalloonedPages
2445	/** @todo what about shared pages? */;
2446	uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2447	+ pGVM->gmm.s.Stats.Allocated.cFixedPages;
2448	uint64_t cPgDelta = cPgReserved - cPgAllocated;
2449	if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2450	return true;
2451	/** @todo make the threshold configurable, also test the code to see if
2452	* this ever kicks in (we might be reserving too much or smth). */
2453
2454	/*
2455	* Check how close we're to the max memory limit and how many fragments
2456	* there are?...
2457	*/
2458	/** @todo. */
2459
2460	return false;
2461	}
2462
2463
2464	/**
2465	* Checks if we should start picking pages from chunks of other VMs because
2466	* there is a lot of free pages around.
2467	*
2468	* @returns @c true if we should, @c false if we should first try to allocate more
2469	* chunks.
2470	*/
2471	static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2472	{
2473	/*
2474	* Setting the limit at 16 chunks (32 MB) at the moment.
2475	*/
2476	if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2477	return true;
2478	return false;
2479	}
2480
2481
2482	/**
2483	* Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2484	*
2485	* @returns VBox status code:
2486	* @retval VINF_SUCCESS on success.
2487	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2488	* gmmR0AllocateMoreChunks is necessary.
2489	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2490	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2491	* that is we're trying to allocate more than we've reserved.
2492	*
2493	* @param pGMM Pointer to the GMM instance data.
2494	* @param pGVM Pointer to the VM.
2495	* @param cPages The number of pages to allocate.
2496	* @param paPages Pointer to the page descriptors.
2497	* See GMMPAGEDESC for details on what is expected on input.
2498	* @param enmAccount The account to charge.
2499	*
2500	* @remarks The caller must own the giant GMM lock; it may be left temporarily while new chunks are allocated.
2501	*/
2502	static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2503	{
2504	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2505
2506	/*
2507	* Check allocation limits.
2508	*/
2509	if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2510	return VERR_GMM_HIT_GLOBAL_LIMIT;
2511
2512	switch (enmAccount)
2513	{
2514	case GMMACCOUNT_BASE:
2515	if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2516	> pGVM->gmm.s.Stats.Reserved.cBasePages))
2517	{
2518	Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2519	pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2520	pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2521	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2522	}
2523	break;
2524	case GMMACCOUNT_SHADOW:
2525	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2526	{
2527	Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2528	pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2529	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2530	}
2531	break;
2532	case GMMACCOUNT_FIXED:
2533	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2534	{
2535	Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2536	pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2537	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2538	}
2539	break;
2540	default:
2541	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2542	}
2543
2544	/*
2545	* If we're in legacy memory mode, it's easy to figure out up-front
2546	* whether we have a sufficient number of pages.
2547	*/
2548	if ( pGMM->fLegacyAllocationMode
2549	&& pGVM->gmm.s.Private.cFreePages < cPages)
2550	{
2551	Assert(pGMM->fBoundMemoryMode);
2552	return VERR_GMM_SEED_ME;
2553	}
2554
2555	/*
2556	* Update the accounts before we proceed because we might be leaving the
2557	* protection of the global mutex and thus run the risk of permitting
2558	* too much memory to be allocated.
2559	*/
2560	switch (enmAccount)
2561	{
2562	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2563	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2564	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2565	default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2566	}
2567	pGVM->gmm.s.Stats.cPrivatePages += cPages;
2568	pGMM->cAllocatedPages += cPages;
2569
2570	/*
2571	* Part two of it's-easy-in-legacy-memory-mode.
2572	*/
2573	uint32_t iPage = 0;
2574	if (pGMM->fLegacyAllocationMode)
2575	{
2576	iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2577	AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2578	return VINF_SUCCESS;
2579	}
2580
2581	/*
2582	* Bound mode is also relatively straightforward.
2583	*/
2584	int rc = VINF_SUCCESS;
2585	if (pGMM->fBoundMemoryMode)
2586	{
2587	iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2588	if (iPage < cPages)
2589	do
2590	rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2591	while (iPage < cPages && RT_SUCCESS(rc));
2592	}
2593	/*
2594	* Shared mode is trickier as we should try to achieve the same locality as
2595	* in bound mode, but smartly make use of non-full chunks allocated by
2596	* other VMs if we're low on memory.
2597	*/
2598	else
2599	{
2600	/* Pick the most optimal pages first. */
2601	iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2602	if (iPage < cPages)
2603	{
2604	/* Maybe we should try getting pages from chunks "belonging" to
2605	other VMs before allocating more chunks? */
2606	bool fTriedOnSameAlready = false;
2607	if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2608	{
2609	iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2610	fTriedOnSameAlready = true;
2611	}
2612
2613	/* Allocate memory from empty chunks. */
2614	if (iPage < cPages)
2615	iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2616
2617	/* Grab empty shared chunks. */
2618	if (iPage < cPages)
2619	iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2620
2621	/* If there are a lot of free pages spread around, try not to waste
2622	system memory on more chunks. (Should trigger defragmentation.) */
2623	if ( !fTriedOnSameAlready
2624	&& gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2625	{
2626	iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2627	if (iPage < cPages)
2628	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2629	}
2630
2631	/*
2632	* OK, try to allocate new chunks.
2633	*/
2634	if (iPage < cPages)
2635	{
2636	do
2637	rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2638	while (iPage < cPages && RT_SUCCESS(rc));
2639
2640	/* If the host is out of memory, take whatever we can get. */
2641	if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2642	&& pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2643	{
2644	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2645	if (iPage < cPages)
2646	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2647	AssertRelease(iPage == cPages);
2648	rc = VINF_SUCCESS;
2649	}
2650	}
2651	}
2652	}
2653
2654	/*
2655	* Clean up on failure. Since this is bound to be a low-memory condition
2656	* we will give back any empty chunks that might be hanging around.
2657	*/
2658	if (RT_FAILURE(rc))
2659	{
2660	/* Update the statistics. */
2661	pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2662	pGMM->cAllocatedPages -= cPages - iPage;
2663	switch (enmAccount)
2664	{
2665	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2666	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2667	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2668	default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2669	}
2670
2671	/* Release the pages. */
2672	while (iPage-- > 0)
2673	{
2674	uint32_t idPage = paPages[iPage].idPage;
2675	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2676	if (RT_LIKELY(pPage))
2677	{
2678	Assert(GMM_PAGE_IS_PRIVATE(pPage));
2679	Assert(pPage->Private.hGVM == pGVM->hSelf);
2680	gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2681	}
2682	else
2683	AssertMsgFailed(("idPage=%#x\n", idPage));
2684
2685	paPages[iPage].idPage = NIL_GMM_PAGEID;
2686	paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2687	paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2688	}
2689
2690	/* Free empty chunks. */
2691	/** @todo */
2692
2693	/* return the fail status on failure */
2694	return rc;
2695	}
2696	return VINF_SUCCESS;
2697	}
2698
2699
2700	/**
2701	* Updates the previous allocations and allocates more pages.
2702	*
2703	* The handy pages are always taken from the 'base' memory account.
2704	* The allocated pages are not cleared and will contain random garbage.
2705	*
2706	* @returns VBox status code:
2707	* @retval VINF_SUCCESS on success.
2708	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2709	* @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2710	* @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2711	* private page.
2712	* @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2713	* shared page.
2714	* @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2715	* owned by the VM.
2716	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2717	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2718	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2719	* that is we're trying to allocate more than we've reserved.
2720	*
2721	* @param pVM Pointer to the VM.
2722	* @param idCpu The VCPU id.
2723	* @param cPagesToUpdate The number of pages to update (starting from the head).
2724	* @param cPagesToAlloc The number of pages to allocate (starting from the head).
2725	* @param paPages The array of page descriptors.
2726	* See GMMPAGEDESC for details on what is expected on input.
2727	* @thread EMT.
2728	*/
2729	GMMR0DECL(int) GMMR0AllocateHandyPages(PVM pVM, VMCPUID idCpu, uint32_t cPagesToUpdate, uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2730	{
2731	LogFlow(("GMMR0AllocateHandyPages: pVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2732	pVM, cPagesToUpdate, cPagesToAlloc, paPages));
2733
2734	/*
2735	* Validate, get basics and take the semaphore.
2736	* (This is a relatively busy path, so make predictions where possible.)
2737	*/
2738	PGMM pGMM;
2739	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2740	PGVM pGVM;
2741	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2742	if (RT_FAILURE(rc))
2743	return rc;
2744
2745	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2746	AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2747	|| (cPagesToAlloc && cPagesToAlloc < 1024),
2748	("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2749	VERR_INVALID_PARAMETER);
2750
2751	unsigned iPage = 0;
2752	for (; iPage < cPagesToUpdate; iPage++)
2753	{
2754	AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2755	&& !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2756	|| paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2757	|| paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2758	("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2759	VERR_INVALID_PARAMETER);
2760	AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2761	/*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2762	("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2763	AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2764	/*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2765	("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2766	}
2767
2768	for (; iPage < cPagesToAlloc; iPage++)
2769	{
2770	AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2771	AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2772	AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2773	}
2774
2775	gmmR0MutexAcquire(pGMM);
2776	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2777	{
2778	/* No allocations before the initial reservation has been made! */
2779	if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2780	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
2781	&& pGVM->gmm.s.Stats.Reserved.cShadowPages))
2782	{
2783	/*
2784	* Perform the updates.
2785	* Stop on the first error.
2786	*/
2787	for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2788	{
2789	if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2790	{
2791	PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2792	if (RT_LIKELY(pPage))
2793	{
2794	if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2795	{
2796	if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2797	{
2798	AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2799	if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2800	pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2801	else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2802	pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2803	/* else: NIL_RTHCPHYS nothing */
2804
2805	paPages[iPage].idPage = NIL_GMM_PAGEID;
2806	paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2807	}
2808	else
2809	{
2810	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2811	iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2812	rc = VERR_GMM_NOT_PAGE_OWNER;
2813	break;
2814	}
2815	}
2816	else
2817	{
2818	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2819	rc = VERR_GMM_PAGE_NOT_PRIVATE;
2820	break;
2821	}
2822	}
2823	else
2824	{
2825	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2826	rc = VERR_GMM_PAGE_NOT_FOUND;
2827	break;
2828	}
2829	}
2830
2831	if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2832	{
2833	PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2834	if (RT_LIKELY(pPage))
2835	{
2836	if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2837	{
2838	AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2839	Assert(pPage->Shared.cRefs);
2840	Assert(pGVM->gmm.s.Stats.cSharedPages);
2841	Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2842
2843	Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2844	pGVM->gmm.s.Stats.cSharedPages--;
2845	pGVM->gmm.s.Stats.Allocated.cBasePages--;
2846	if (!--pPage->Shared.cRefs)
2847	gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2848	else
2849	{
2850	Assert(pGMM->cDuplicatePages);
2851	pGMM->cDuplicatePages--;
2852	}
2853
2854	paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2855	}
2856	else
2857	{
2858	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2859	rc = VERR_GMM_PAGE_NOT_SHARED;
2860	break;
2861	}
2862	}
2863	else
2864	{
2865	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2866	rc = VERR_GMM_PAGE_NOT_FOUND;
2867	break;
2868	}
2869	}
2870	} /* for each page to update */
2871
2872	if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2873	{
2874	#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2875	for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2876	{
2877	Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2878	Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2879	Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2880	}
2881	#endif
2882
2883	/*
2884	* Join paths with GMMR0AllocatePages for the allocation.
2885	* Note! gmmR0AllocateMoreChunks may leave the protection of the mutex!
2886	*/
2887	rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2888	}
2889	}
2890	else
2891	rc = VERR_WRONG_ORDER;
2892	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2893	}
2894	else
2895	rc = VERR_GMM_IS_NOT_SANE;
2896	gmmR0MutexRelease(pGMM);
2897	LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
2898	return rc;
2899	}
2900
2901
2902	/**
2903	* Allocate one or more pages.
2904	*
2905	* This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
2906	* The allocated pages are not cleared and will contain random garbage.
2907	*
2908	* @returns VBox status code:
2909	* @retval VINF_SUCCESS on success.
2910	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2911	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2912	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2913	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2914	* that is we're trying to allocate more than we've reserved.
2915	*
2916	* @param pVM Pointer to the VM.
2917	* @param idCpu The VCPU id.
2918	* @param cPages The number of pages to allocate.
2919	* @param paPages Pointer to the page descriptors.
2920	* See GMMPAGEDESC for details on what is expected on input.
2921	* @param enmAccount The account to charge.
2922	*
2923	* @thread EMT.
2924	*/
2925	GMMR0DECL(int) GMMR0AllocatePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2926	{
2927	LogFlow(("GMMR0AllocatePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
2928
2929	/*
2930	* Validate, get basics and take the semaphore.
2931	*/
2932	PGMM pGMM;
2933	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2934	PGVM pGVM;
2935	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2936	if (RT_FAILURE(rc))
2937	return rc;
2938
2939	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2940	AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
2941	AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
2942
2943	for (unsigned iPage = 0; iPage < cPages; iPage++)
2944	{
2945	AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2946	|| paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
2947	|| ( enmAccount == GMMACCOUNT_BASE
2948	&& paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2949	&& !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
2950	("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
2951	VERR_INVALID_PARAMETER);
2952	AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2953	AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2954	}
2955
2956	gmmR0MutexAcquire(pGMM);
2957	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2958	{
2959
2960	/* No allocations before the initial reservation has been made! */
2961	if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2962	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
2963	&& pGVM->gmm.s.Stats.Reserved.cShadowPages))
2964	rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
2965	else
2966	rc = VERR_WRONG_ORDER;
2967	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2968	}
2969	else
2970	rc = VERR_GMM_IS_NOT_SANE;
2971	gmmR0MutexRelease(pGMM);
2972	LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
2973	return rc;
2974	}
2975
2976
2977	/**
2978	* VMMR0 request wrapper for GMMR0AllocatePages.
2979	*
2980	* @returns see GMMR0AllocatePages.
2981	* @param pVM Pointer to the VM.
2982	* @param idCpu The VCPU id.
2983	* @param pReq Pointer to the request packet.
2984	*/
2985	GMMR0DECL(int) GMMR0AllocatePagesReq(PVM pVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
2986	{
2987	/*
2988	* Validate input and pass it on.
2989	*/
2990	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
2991	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
2992	AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
2993	("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
2994	VERR_INVALID_PARAMETER);
2995	AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
2996	("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
2997	VERR_INVALID_PARAMETER);
2998
2999	return GMMR0AllocatePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3000	}
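
The request wrapper above accepts a variable-length packet only when its stated size matches the size implied by its page count. The following is a minimal sketch of that check, not the VirtualBox implementation: `ReqHdr`, `PageDesc`, `AllocPagesReq`, `expectedReqSize` and `isReqSizeValid` are hypothetical stand-ins, and the field layouts are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; the real header and descriptor layouts are not shown here.
struct ReqHdr   { uint32_t cbReq; uint32_t u32Magic; };
struct PageDesc { uint64_t HCPhysGCPhys; uint32_t idPage; uint32_t idSharedPage; };

struct AllocPagesReq
{
    ReqHdr   Hdr;
    int32_t  enmAccount;
    uint32_t cPages;
    PageDesc aPages[1];     // Really cPages entries.
};

// Size of the fixed part, i.e. the offset of the first descriptor.
constexpr std::size_t kFixedSize = offsetof(AllocPagesReq, aPages);

// Size a request must have to carry exactly cPages descriptors.
static std::size_t expectedReqSize(uint32_t cPages)
{
    return kFixedSize + static_cast<std::size_t>(cPages) * sizeof(PageDesc);
}

static bool isReqSizeValid(const AllocPagesReq &req)
{
    if (req.Hdr.cbReq < kFixedSize)     // Too small to hold the fixed part.
        return false;
    return req.Hdr.cbReq == expectedReqSize(req.cPages);
}
```

Checking the lower bound first mirrors the order of the two assertions in the wrapper: the page count is only trusted once the fixed part is known to fit.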
3001
3002
3003	/**
3004	* Allocate a large page to represent guest RAM.
3005	*
3006	* The allocated pages are not cleared and will contain random garbage.
3007	*
3008	* @returns VBox status code:
3009	* @retval VINF_SUCCESS on success.
3010	* @retval VERR_NOT_OWNER if the caller is not an EMT.
3011	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3012	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3013	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3014	* that is we're trying to allocate more than we've reserved.
3015	*
3016	* @param pVM Pointer to the VM.
3017	* @param idCpu The VCPU id.
3018	* @param cbPage Large page size.
3019	*/
3020	GMMR0DECL(int) GMMR0AllocateLargePage(PVM pVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3021	{
3022	LogFlow(("GMMR0AllocateLargePage: pVM=%p cbPage=%x\n", pVM, cbPage));
3023
3024	AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3025	AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3026	AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3027
3028	/*
3029	* Validate, get basics and take the semaphore.
3030	*/
3031	PGMM pGMM;
3032	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3033	PGVM pGVM;
3034	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3035	if (RT_FAILURE(rc))
3036	return rc;
3037
3038	/* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3039	if (pGMM->fLegacyAllocationMode)
3040	return VERR_NOT_SUPPORTED;
3041
3042	*pHCPhys = NIL_RTHCPHYS;
3043	*pIdPage = NIL_GMM_PAGEID;
3044
3045	gmmR0MutexAcquire(pGMM);
3046	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3047	{
3048	const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3049	if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3050	> pGVM->gmm.s.Stats.Reserved.cBasePages))
3051	{
3052	Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3053	pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3054	gmmR0MutexRelease(pGMM);
3055	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3056	}
3057
3058	/*
3059	* Allocate a new large page chunk.
3060	*
3061	* Note! We leave the giant GMM lock temporarily as the allocation might
3062	* take a long time. gmmR0RegisterChunk will retake it (ugly).
3063	*/
3064	AssertCompile(GMM_CHUNK_SIZE == _2M);
3065	gmmR0MutexRelease(pGMM);
3066
3067	RTR0MEMOBJ hMemObj;
3068	rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3069	if (RT_SUCCESS(rc))
3070	{
3071	PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3072	PGMMCHUNK pChunk;
3073	rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3074	if (RT_SUCCESS(rc))
3075	{
3076	/*
3077	* Allocate all the pages in the chunk.
3078	*/
3079	/* Unlink the new chunk from the free list. */
3080	gmmR0UnlinkChunk(pChunk);
3081
3082	/** @todo rewrite this to skip the looping. */
3083	/* Allocate all pages. */
3084	GMMPAGEDESC PageDesc;
3085	gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3086
3087	/* Return the first page as we'll use the whole chunk as one big page. */
3088	*pIdPage = PageDesc.idPage;
3089	*pHCPhys = PageDesc.HCPhysGCPhys;
3090
3091	for (unsigned i = 1; i < cPages; i++)
3092	gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3093
3094	/* Update accounting. */
3095	pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3096	pGVM->gmm.s.Stats.cPrivatePages += cPages;
3097	pGMM->cAllocatedPages += cPages;
3098
3099	gmmR0LinkChunk(pChunk, pSet);
3100	gmmR0MutexRelease(pGMM);
3101	}
3102	else
3103	RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3104	}
3105	}
3106	else
3107	{
3108	gmmR0MutexRelease(pGMM);
3109	rc = VERR_GMM_IS_NOT_SANE;
3110	}
3111
3112	LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3113	return rc;
3114	}
3115
3116
3117	/**
3118	* Free a large page.
3119	*
3120	* @returns VBox status code:
3121	* @param pVM Pointer to the VM.
3122	* @param idCpu The VCPU id.
3123	* @param idPage The large page id.
3124	*/
3125	GMMR0DECL(int) GMMR0FreeLargePage(PVM pVM, VMCPUID idCpu, uint32_t idPage)
3126	{
3127	LogFlow(("GMMR0FreeLargePage: pVM=%p idPage=%x\n", pVM, idPage));
3128
3129	/*
3130	* Validate, get basics and take the semaphore.
3131	*/
3132	PGMM pGMM;
3133	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3134	PGVM pGVM;
3135	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3136	if (RT_FAILURE(rc))
3137	return rc;
3138
3139	/* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3140	if (pGMM->fLegacyAllocationMode)
3141	return VERR_NOT_SUPPORTED;
3142
3143	gmmR0MutexAcquire(pGMM);
3144	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3145	{
3146	const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3147
3148	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3149	{
3150	Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3151	gmmR0MutexRelease(pGMM);
3152	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3153	}
3154
3155	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3156	if (RT_LIKELY( pPage
3157	&& GMM_PAGE_IS_PRIVATE(pPage)))
3158	{
3159	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3160	Assert(pChunk);
3161	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3162	Assert(pChunk->cPrivate > 0);
3163
3164	/* Release the memory immediately. */
3165	gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3166
3167	/* Update accounting. */
3168	pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3169	pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3170	pGMM->cAllocatedPages -= cPages;
3171	}
3172	else
3173	rc = VERR_GMM_PAGE_NOT_FOUND;
3174	}
3175	else
3176	rc = VERR_GMM_IS_NOT_SANE;
3177
3178	gmmR0MutexRelease(pGMM);
3179	LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3180	return rc;
3181	}
3182
3183
3184	/**
3185	* VMMR0 request wrapper for GMMR0FreeLargePage.
3186	*
3187	* @returns see GMMR0FreeLargePage.
3188	* @param pVM Pointer to the VM.
3189	* @param idCpu The VCPU id.
3190	* @param pReq Pointer to the request packet.
3191	*/
3192	GMMR0DECL(int) GMMR0FreeLargePageReq(PVM pVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3193	{
3194	/*
3195	* Validate input and pass it on.
3196	*/
3197	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3198	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3199	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3200	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3201	VERR_INVALID_PARAMETER);
3202
3203	return GMMR0FreeLargePage(pVM, idCpu, pReq->idPage);
3204	}
3205
3206
3207	/**
3208	* Frees a chunk, giving it back to the host OS.
3209	*
3210	* @param pGMM Pointer to the GMM instance.
3211	* @param pGVM This is set when called from GMMR0CleanupVM so we can
3212	* unmap and free the chunk in one go.
3213	* @param pChunk The chunk to free.
3214	* @param fRelaxedSem Whether we can release the semaphore while doing the
3215	* freeing (@c true) or not.
3216	*/
3217	static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3218	{
3219	Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3220
3221	GMMR0CHUNKMTXSTATE MtxState;
3222	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3223
3224	/*
3225	* Cleanup hack! Unmap the chunk from the caller's address space.
3226	* This shouldn't happen, so screw lock contention...
3227	*/
3228	if ( pChunk->cMappingsX
3229	&& !pGMM->fLegacyAllocationMode
3230	&& pGVM)
3231	gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3232
3233	/*
3234	* If there are current mappings of the chunk, then request the
3235	* VMs to unmap them. Reposition the chunk in the free list so
3236	* it won't be a likely candidate for allocations.
3237	*/
3238	if (pChunk->cMappingsX)
3239	{
3240	/** @todo R0 -> VM request */
3241	/* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3242	Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3243	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3244	return false;
3245	}
3246
3247
3248	/*
3249	* Save and trash the handle.
3250	*/
3251	RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3252	pChunk->hMemObj = NIL_RTR0MEMOBJ;
3253
3254	/*
3255	* Unlink it from everywhere.
3256	*/
3257	gmmR0UnlinkChunk(pChunk);
3258
3259	RTListNodeRemove(&pChunk->ListNode);
3260
3261	PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3262	Assert(pCore == &pChunk->Core); NOREF(pCore);
3263
3264	PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3265	if (pTlbe->pChunk == pChunk)
3266	{
3267	pTlbe->idChunk = NIL_GMM_CHUNKID;
3268	pTlbe->pChunk = NULL;
3269	}
3270
3271	Assert(pGMM->cChunks > 0);
3272	pGMM->cChunks--;
3273
3274	/*
3275	* Free the Chunk ID before dropping the locks and freeing the rest.
3276	*/
3277	gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3278	pChunk->Core.Key = NIL_GMM_CHUNKID;
3279
3280	pGMM->cFreedChunks++;
3281
3282	gmmR0ChunkMutexRelease(&MtxState, NULL);
3283	if (fRelaxedSem)
3284	gmmR0MutexRelease(pGMM);
3285
3286	RTMemFree(pChunk->paMappingsX);
3287	pChunk->paMappingsX = NULL;
3288
3289	RTMemFree(pChunk);
3290
3291	int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3292	AssertLogRelRC(rc);
3293
3294	if (fRelaxedSem)
3295	gmmR0MutexAcquire(pGMM);
3296	return fRelaxedSem;
3297	}
3298
3299
3300	/**
3301	* Free page worker.
3302	*
3303	* The caller does all the statistic decrementing, we do all the incrementing.
3304	*
3305	* @param pGMM Pointer to the GMM instance data.
3306	* @param pGVM Pointer to the GVM instance.
3307	* @param pChunk Pointer to the chunk this page belongs to.
3308	* @param idPage The Page ID.
3309	* @param pPage Pointer to the page.
3310	*/
3311	static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3312	{
3313	Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3314	pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3315
3316	/*
3317	* Put the page on the free list.
3318	*/
3319	pPage->u = 0;
3320	pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3321	Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3322	pPage->Free.iNext = pChunk->iFreeHead;
3323	pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3324
3325	/*
3326	* Update statistics (the cShared/cPrivate stats are up to date already),
3327	* and relink the chunk if necessary.
3328	*/
3329	unsigned const cFree = pChunk->cFree;
3330	if ( !cFree
3331	|| gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3332	{
3333	gmmR0UnlinkChunk(pChunk);
3334	pChunk->cFree++;
3335	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3336	}
3337	else
3338	{
3339	pChunk->cFree = cFree + 1;
3340	pChunk->pSet->cFreePages++;
3341	}
3342
3343	/*
3344	* If the chunk becomes empty, consider giving memory back to the host OS.
3345	*
3346	* The current strategy is to try to give it back if there are other chunks
3347	* in this free list, meaning if there are at least 240 free pages in this
3348	* category. Note that since there are probably mappings of the chunk,
3349	* it won't be freed up instantly, which probably screws up this logic
3350	* a bit...
3351	*/
3352	/** @todo Do this on the way out. */
3353	if (RT_UNLIKELY( pChunk->cFree == GMM_CHUNK_NUM_PAGES
3354	&& pChunk->pFreeNext
3355	&& pChunk->pFreePrev /** @todo this is probably misfiring, see reset... */
3356	&& !pGMM->fLegacyAllocationMode))
3357	gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3358
3359	}
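
The free page worker above pushes the page onto a per-chunk free list that is linked by page index rather than by pointer, with UINT16_MAX marking the end of the list. The following is a minimal sketch of that push under simplified assumptions: `Page`, `Chunk`, `kNilIndex` and `chunkFreePage` are hypothetical names, and the chunk relinking and host-release steps are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint16_t kNilIndex = UINT16_MAX;   // End-of-list marker.

struct Page
{
    bool     fFree = false;
    uint16_t iNext = kNilIndex;              // Next free page in the chunk, valid if fFree.
};

struct Chunk
{
    std::vector<Page> aPages;
    uint16_t          iFreeHead = kNilIndex; // Index of the first free page.
    uint16_t          cFree     = 0;         // Number of free pages.
    explicit Chunk(std::size_t cPages) : aPages(cPages) {}
};

// Push page iPage onto the head of the chunk's free list.
static void chunkFreePage(Chunk &chunk, uint16_t iPage)
{
    assert(iPage < chunk.aPages.size());
    Page &page = chunk.aPages[iPage];
    assert(!page.fFree);
    page.fFree      = true;
    page.iNext      = chunk.iFreeHead;       // Old head becomes our successor.
    chunk.iFreeHead = iPage;                 // We become the new head.
    chunk.cFree++;
}
```

Index links keep each tracking entry small, which fits the file's stated goal of minimizing the footprint of the tracking structures.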
3360
3361
3362	/**
3363	* Frees a shared page, the page is known to exist and be valid and such.
3364	*
3365	* @param pGMM Pointer to the GMM instance.
3366	* @param pGVM Pointer to the GVM instance.
3367	* @param idPage The page id.
3368	* @param pPage The page structure.
3369	*/
3370	DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3371	{
3372	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3373	Assert(pChunk);
3374	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3375	Assert(pChunk->cShared > 0);
3376	Assert(pGMM->cSharedPages > 0);
3377	Assert(pGMM->cAllocatedPages > 0);
3378	Assert(!pPage->Shared.cRefs);
3379
3380	pChunk->cShared--;
3381	pGMM->cAllocatedPages--;
3382	pGMM->cSharedPages--;
3383	gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3384	}
3385
3386
3387	/**
3388	* Frees a private page, the page is known to exist and be valid and such.
3389	*
3390	* @param pGMM Pointer to the GMM instance.
3391	* @param pGVM Pointer to the GVM instance.
3392	* @param idPage The page id.
3393	* @param pPage The page structure.
3394	*/
3395	DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3396	{
3397	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3398	Assert(pChunk);
3399	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3400	Assert(pChunk->cPrivate > 0);
3401	Assert(pGMM->cAllocatedPages > 0);
3402
3403	pChunk->cPrivate--;
3404	pGMM->cAllocatedPages--;
3405	gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3406	}
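
Both free helpers above locate the owning chunk with `idPage >> GMM_CHUNKID_SHIFT`, the inverse of the relation `idPage = (idChunk << shift) | iPage` given in the file header. The following is a minimal sketch of that encoding, assuming 2 MiB chunks and 4 KiB pages (so a shift of 9); the constants and the names `makePageId` and `splitPageId` are hypothetical, not taken from the GMM headers.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kChunkSize     = 2u * 1024u * 1024u;      // 2 MiB chunk (assumed)
constexpr uint32_t kPageSize      = 4096u;                   // 4 KiB page (assumed)
constexpr uint32_t kPagesPerChunk = kChunkSize / kPageSize;  // 512
constexpr uint32_t kChunkIdShift  = 9;                       // log2(512), assumption
static_assert((1u << kChunkIdShift) == kPagesPerChunk, "shift must match pages per chunk");

struct PageLocation
{
    uint32_t idChunk;   // Which chunk the page lives in.
    uint32_t iPage;     // Index of the page within that chunk.
};

// Combine a chunk ID and a page index into a page ID.
constexpr uint32_t makePageId(uint32_t idChunk, uint32_t iPage)
{
    return (idChunk << kChunkIdShift) | iPage;
}

// Split a page ID back into its chunk ID and page index.
static PageLocation splitPageId(uint32_t idPage)
{
    return { idPage >> kChunkIdShift, idPage & (kPagesPerChunk - 1) };
}
```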
3407
3408
3409	/**
3410	* Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3411	*
3412	* @returns VBox status code:
3413	* @retval xxx
3414	*
3415	* @param pGMM Pointer to the GMM instance data.
3416	* @param pGVM Pointer to the VM.
3417	* @param cPages The number of pages to free.
3418	* @param paPages Pointer to the page descriptors.
3419	* @param enmAccount The account this relates to.
3420	*/
3421	static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3422	{
3423	/*
3424	* Check that the request isn't impossible with respect to the account status.
3425	*/
3426	switch (enmAccount)
3427	{
3428	case GMMACCOUNT_BASE:
3429	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3430	{
3431	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3432	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3433	}
3434	break;
3435	case GMMACCOUNT_SHADOW:
3436	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3437	{
3438	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3439	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3440	}
3441	break;
3442	case GMMACCOUNT_FIXED:
3443	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3444	{
3445	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3446	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3447	}
3448	break;
3449	default:
3450	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3451	}
3452
3453	/*
3454	* Walk the descriptors and free the pages.
3455	*
3456	* Statistics (except the account) are being updated as we go along,
3457	* unlike the alloc code. Also, stop on the first error.
3458	*/
3459	int rc = VINF_SUCCESS;
3460	uint32_t iPage;
3461	for (iPage = 0; iPage < cPages; iPage++)
3462	{
3463	uint32_t idPage = paPages[iPage].idPage;
3464	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3465	if (RT_LIKELY(pPage))
3466	{
3467	if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3468	{
3469	if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3470	{
3471	Assert(pGVM->gmm.s.Stats.cPrivatePages);
3472	pGVM->gmm.s.Stats.cPrivatePages--;
3473	gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3474	}
3475	else
3476	{
3477	Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3478	pPage->Private.hGVM, pGVM->hSelf));
3479	rc = VERR_GMM_NOT_PAGE_OWNER;
3480	break;
3481	}
3482	}
3483	else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3484	{
3485	Assert(pGVM->gmm.s.Stats.cSharedPages);
3486	Assert(pPage->Shared.cRefs);
3487	#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3488	if (pPage->Shared.u14Checksum)
3489	{
3490	uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3491	uChecksum &= UINT32_C(0x00003fff);
3492	AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3493	("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3494	}
3495	#endif
3496	pGVM->gmm.s.Stats.cSharedPages--;
3497	if (!--pPage->Shared.cRefs)
3498	gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3499	else
3500	{
3501	Assert(pGMM->cDuplicatePages);
3502	pGMM->cDuplicatePages--;
3503	}
3504	}
3505	else
3506	{
3507	Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3508	rc = VERR_GMM_PAGE_ALREADY_FREE;
3509	break;
3510	}
3511	}
3512	else
3513	{
3514	Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3515	rc = VERR_GMM_PAGE_NOT_FOUND;
3516	break;
3517	}
3518	paPages[iPage].idPage = NIL_GMM_PAGEID;
3519	}
3520
3521	/*
3522	* Update the account.
3523	*/
3524	switch (enmAccount)
3525	{
3526	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3527	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3528	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3529	default:
3530	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3531	}
3532
3533	/*
3534	* Any threshold stuff to be done here?
3535	*/
3536
3537	return rc;
3538	}
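
The worker above stops at the first bad descriptor but still subtracts the number of pages it did free from the account, so a partial failure leaves the counters consistent. The following is a minimal sketch of that pattern with a flat page table; `PageState`, `freePages` and the status constants are hypothetical simplifications, and ownership and shared-page reference counts are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PageState { Free, Private, Shared };

// Hypothetical status codes standing in for the VERR_GMM_* values.
constexpr int kOk = 0, kErrFreeTooMuch = -1, kErrNotFound = -2, kErrAlreadyFree = -3;

static int freePages(std::vector<PageState> &pages, const std::vector<uint32_t> &ids,
                     uint64_t &cAllocated)
{
    // Reject requests that would free more than the account holds.
    if (cAllocated < ids.size())
        return kErrFreeTooMuch;

    int         rc = kOk;
    std::size_t i;
    for (i = 0; i < ids.size(); i++)
    {
        if (ids[i] >= pages.size())           { rc = kErrNotFound;    break; }
        if (pages[ids[i]] == PageState::Free) { rc = kErrAlreadyFree; break; }
        pages[ids[i]] = PageState::Free;
    }

    // Only the pages actually freed are taken off the account.
    cAllocated -= i;
    return rc;
}
```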
3539
3540
3541	/**
3542	* Free one or more pages.
3543	*
3544	* This is typically used at reset time or power off.
3545	*
3546	* @returns VBox status code:
3547	* @retval xxx
3548	*
3549	* @param pVM Pointer to the VM.
3550	* @param idCpu The VCPU id.
3551	* @param cPages The number of pages to free.
3552	* @param paPages Pointer to the page descriptors containing the Page IDs for each page.
3553	* @param enmAccount The account this relates to.
3554	* @thread EMT.
3555	*/
3556	GMMR0DECL(int) GMMR0FreePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3557	{
3558	LogFlow(("GMMR0FreePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
3559
3560	/*
3561	* Validate input and get the basics.
3562	*/
3563	PGMM pGMM;
3564	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3565	PGVM pGVM;
3566	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3567	if (RT_FAILURE(rc))
3568	return rc;
3569
3570	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3571	AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3572	AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3573
3574	for (unsigned iPage = 0; iPage < cPages; iPage++)
3575	AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3576	/*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3577	("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3578
3579	/*
3580	* Take the semaphore and call the worker function.
3581	*/
3582	gmmR0MutexAcquire(pGMM);
3583	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3584	{
3585	rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3586	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3587	}
3588	else
3589	rc = VERR_GMM_IS_NOT_SANE;
3590	gmmR0MutexRelease(pGMM);
3591	LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3592	return rc;
3593	}
3594
3595
3596	/**
3597	* VMMR0 request wrapper for GMMR0FreePages.
3598	*
3599	* @returns see GMMR0FreePages.
3600	* @param pVM Pointer to the VM.
3601	* @param idCpu The VCPU id.
3602	* @param pReq Pointer to the request packet.
3603	*/
3604	GMMR0DECL(int) GMMR0FreePagesReq(PVM pVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3605	{
3606	/*
3607	* Validate input and pass it on.
3608	*/
3609	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3610	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3611	AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3612	("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3613	VERR_INVALID_PARAMETER);
3614	AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3615	("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3616	VERR_INVALID_PARAMETER);
3617
3618	return GMMR0FreePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3619	}
3620
3621
3622	/**
3623	* Report back on a memory ballooning request.
3624	*
3625	* The request may or may not have been initiated by the GMM. If it was initiated
3626	* by the GMM it is important that this function is called even if no pages were
3627	* ballooned.
3628	*
3629	* @returns VBox status code:
3630	* @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3631	* @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3632	* @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3633	* indicating that we won't necessarily have sufficient RAM to boot
3634	* the VM again and that it should pause until this changes (we'll try to
3635	* balloon some other VM). (For standard deflate we have little choice
3636	* but to hope the VM won't use the memory that was returned to it.)
3637	*
3638	* @param pVM Pointer to the VM.
3639	* @param idCpu The VCPU id.
3640	* @param enmAction Inflate/deflate/reset.
3641	* @param cBalloonedPages The number of pages that were ballooned.
3642	*
3643	* @thread EMT.
3644	*/
3645	GMMR0DECL(int) GMMR0BalloonedPages(PVM pVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3646	{
3647	LogFlow(("GMMR0BalloonedPages: pVM=%p enmAction=%d cBalloonedPages=%#x\n",
3648	pVM, enmAction, cBalloonedPages));
3649
3650	AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3651
3652	/*
3653	* Validate input and get the basics.
3654	*/
3655	PGMM pGMM;
3656	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3657	PGVM pGVM;
3658	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3659	if (RT_FAILURE(rc))
3660	return rc;
3661
3662	/*
3663	* Take the semaphore and do some more validations.
3664	*/
3665	gmmR0MutexAcquire(pGMM);
3666	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3667	{
3668	switch (enmAction)
3669	{
3670	case GMMBALLOONACTION_INFLATE:
3671	{
3672	if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3673	<= pGVM->gmm.s.Stats.Reserved.cBasePages))
3674	{
3675	/*
3676	* Record the ballooned memory.
3677	*/
3678	pGMM->cBalloonedPages += cBalloonedPages;
3679	if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3680	{
3681	/* Codepath never taken. It might be interesting in the future to request ballooned memory from guests in low memory conditions. */
3682	AssertFailed();
3683
3684	pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3685	pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3686	Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3687	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3688	pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3689	}
3690	else
3691	{
3692	pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3693	Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3694	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3695	}
3696	}
3697	else
3698	{
3699	Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3700	pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3701	pGVM->gmm.s.Stats.Reserved.cBasePages));
3702	rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3703	}
3704	break;
3705	}
3706
3707	case GMMBALLOONACTION_DEFLATE:
3708	{
3709	/* Deflate. */
3710	if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3711	{
3712	/*
3713	* Record the ballooned memory.
3714	*/
3715	Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3716	pGMM->cBalloonedPages -= cBalloonedPages;
3717	pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3718	if (pGVM->gmm.s.Stats.cReqDeflatePages)
3719	{
3720	AssertFailed(); /* This path is for later. */
3721	Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3722	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3723
3724	/*
3725	* Anything we need to do here now when the request has been completed?
3726	*/
3727	pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3728	}
3729	else
3730	Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3731	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3732	}
3733	else
3734	{
3735	Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3736	rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3737	}
3738	break;
3739	}
3740
3741	case GMMBALLOONACTION_RESET:
3742	{
3743	/* Reset to an empty balloon. */
3744	Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3745
3746	pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3747	pGVM->gmm.s.Stats.cBalloonedPages = 0;
3748	break;
3749	}
3750
3751	default:
3752	rc = VERR_INVALID_PARAMETER;
3753	break;
3754	}
3755	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3756	}
3757	else
3758	rc = VERR_GMM_IS_NOT_SANE;
3759
3760	gmmR0MutexRelease(pGMM);
3761	LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3762	return rc;
3763	}
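/*
 * Illustration only (not part of GMMR0.cpp): a minimal sketch of the deflate
 * and reset bookkeeping handled above. BalloonCounters, balloonDeflate and
 * balloonReset are hypothetical names; the two fields merely stand in for
 * pGMM->cBalloonedPages and pGVM->gmm.s.Stats.cBalloonedPages. Locking,
 * logging and the pending-request path are left out.
 */
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the global and per-VM balloon counters.
struct BalloonCounters
{
    uint64_t cGlobal; // models pGMM->cBalloonedPages
    uint64_t cVm;     // models pGVM->gmm.s.Stats.cBalloonedPages
};

// Deflate: refuse when the VM tries to return more pages than it ballooned.
inline bool balloonDeflate(BalloonCounters &c, uint64_t cPages)
{
    if (c.cVm < cPages)
        return false; // corresponds to VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
    assert(c.cGlobal >= cPages); // the global count includes this VM's pages
    c.cGlobal -= cPages;
    c.cVm     -= cPages;
    return true;
}

// Reset: remove the VM's entire balloon from both counters.
inline void balloonReset(BalloonCounters &c)
{
    assert(c.cGlobal >= c.cVm);
    c.cGlobal -= c.cVm;
    c.cVm = 0;
}
```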
3764
3765
3766	/**
3767	* VMMR0 request wrapper for GMMR0BalloonedPages.
3768	*
3769	* @returns see GMMR0BalloonedPages.
3770	* @param pVM Pointer to the VM.
3771	* @param idCpu The VCPU id.
3772	* @param pReq Pointer to the request packet.
3773	*/
3774	GMMR0DECL(int) GMMR0BalloonedPagesReq(PVM pVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3775	{
3776	/*
3777	* Validate input and pass it on.
3778	*/
3779	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3780	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3781	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3782	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3783	VERR_INVALID_PARAMETER);
3784
3785	return GMMR0BalloonedPages(pVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3786	}
3787
3788	/**
3789	* Return memory statistics for the hypervisor
3790	*
3791	* @returns VBox status code:
3792	* @param pVM Pointer to the VM.
3793	* @param pReq Pointer to the request packet.
3794	*/
3795	GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PVM pVM, PGMMMEMSTATSREQ pReq)
3796	{
3797	/*
3798	* Validate input and pass it on.
3799	*/
3800	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3801	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3802	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3803	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3804	VERR_INVALID_PARAMETER);
3805
3806	/*
3807	* Validate input and get the basics.
3808	*/
3809	PGMM pGMM;
3810	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3811	pReq->cAllocPages = pGMM->cAllocatedPages;
3812	pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3813	pReq->cBalloonedPages = pGMM->cBalloonedPages;
3814	pReq->cMaxPages = pGMM->cMaxPages;
3815	pReq->cSharedPages = pGMM->cDuplicatePages;
3816	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3817
3818	return VINF_SUCCESS;
3819	}
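/*
 * Illustration only (not part of GMMR0.cpp): a minimal sketch of the
 * free-page arithmetic used above. Each chunk holds
 * 2^(GMM_CHUNK_SHIFT - PAGE_SHIFT) pages, so shifting the chunk count gives
 * the total page capacity. The constant values below (2 MiB chunks, 4 KiB
 * pages) and all names are assumptions made for the example.
 */
```cpp
#include <cstdint>

constexpr unsigned kSketchChunkShift = 21; // assumption: 2 MiB chunks
constexpr unsigned kSketchPageShift  = 12; // assumption: 4 KiB pages

// Total pages backed by cChunks chunks, minus the pages already handed out.
inline uint64_t sketchFreePages(uint64_t cChunks, uint64_t cAllocatedPages)
{
    return (cChunks << (kSketchChunkShift - kSketchPageShift)) - cAllocatedPages;
}
```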
3820
3821	/**
3822	* Return memory statistics for the VM
3823	*
3824	* @returns VBox status code:
3825	* @param pVM Pointer to the VM.
3826	* @param idCpu The VCPU id.
3827	* @param pReq Pointer to the request packet.
3828	*/
3829	GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PVM pVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3830	{
3831	/*
3832	* Validate input and pass it on.
3833	*/
3834	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3835	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3836	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3837	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3838	VERR_INVALID_PARAMETER);
3839
3840	/*
3841	* Validate input and get the basics.
3842	*/
3843	PGMM pGMM;
3844	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3845	PGVM pGVM;
3846	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3847	if (RT_FAILURE(rc))
3848	return rc;
3849
3850	/*
3851	* Take the semaphore and do some more validations.
3852	*/
3853	gmmR0MutexAcquire(pGMM);
3854	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3855	{
3856	pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
3857	pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
3858	pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
3859	pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3860	}
3861	else
3862	rc = VERR_GMM_IS_NOT_SANE;
3863
3864	gmmR0MutexRelease(pGMM);
3865	LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
3866	return rc;
3867	}
3868
3869
3870	/**
3871	* Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3872	*
3873	* Don't call this in legacy allocation mode!
3874	*
3875	* @returns VBox status code.
3876	* @param pGMM Pointer to the GMM instance data.
3877	* @param pGVM Pointer to the Global VM structure.
3878	* @param pChunk Pointer to the chunk to be unmapped.
3879	*/
3880	static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3881	{
3882	Assert(!pGMM->fLegacyAllocationMode);
3883
3884	/*
3885	* Find the mapping and try unmapping it.
3886	*/
3887	uint32_t cMappings = pChunk->cMappingsX;
3888	for (uint32_t i = 0; i < cMappings; i++)
3889	{
3890	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3891	if (pChunk->paMappingsX[i].pGVM == pGVM)
3892	{
3893	/* unmap */
3894	int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
3895	if (RT_SUCCESS(rc))
3896	{
3897	/* update the record. */
3898	cMappings--;
3899	if (i < cMappings)
3900	pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
3901	pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
3902	pChunk->paMappingsX[cMappings].pGVM = NULL;
3903	Assert(pChunk->cMappingsX - 1U == cMappings);
3904	pChunk->cMappingsX = cMappings;
3905	}
3906
3907	return rc;
3908	}
3909	}
3910
3911	Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3912	return VERR_GMM_CHUNK_NOT_MAPPED;
3913	}
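/*
 * Illustration only (not part of GMMR0.cpp): a minimal sketch of the
 * "move the last entry into the hole" removal used by the worker above. It
 * keeps the array dense without preserving order. Mapping and removeMapping
 * are hypothetical names, and a std::vector stands in for the
 * paMappingsX / cMappingsX pair.
 */
```cpp
#include <cstddef>
#include <vector>

// Hypothetical mapping record: which VM owns the mapping, plus its handle.
struct Mapping
{
    int owner;
    int handle;
};

// Removes the first mapping owned by 'owner'; returns false if none exists.
inline bool removeMapping(std::vector<Mapping> &mappings, int owner)
{
    for (std::size_t i = 0; i < mappings.size(); i++)
    {
        if (mappings[i].owner == owner)
        {
            std::size_t const iLast = mappings.size() - 1;
            if (i < iLast)
                mappings[i] = mappings[iLast]; // fill the hole with the last entry
            mappings.pop_back();
            return true;
        }
    }
    return false; // corresponds to VERR_GMM_CHUNK_NOT_MAPPED
}
```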
3914
3915
3916	/**
3917	* Unmaps a chunk previously mapped into the address space of the current process.
3918	*
3919	* @returns VBox status code.
3920	* @param pGMM Pointer to the GMM instance data.
3921	* @param pGVM Pointer to the Global VM structure.
3922	* @param pChunk Pointer to the chunk to be unmapped.
3923	*/
3924	static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3925	{
3926	if (!pGMM->fLegacyAllocationMode)
3927	{
3928	/*
3929	* Lock the chunk and if possible leave the giant GMM lock.
3930	*/
3931	GMMR0CHUNKMTXSTATE MtxState;
3932	int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3933	fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3934	if (RT_SUCCESS(rc))
3935	{
3936	rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3937	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3938	}
3939	return rc;
3940	}
3941
3942	if (pChunk->hGVM == pGVM->hSelf)
3943	return VINF_SUCCESS;
3944
3945	Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3946	return VERR_GMM_CHUNK_NOT_MAPPED;
3947	}
3948
3949
3950	/**
3951	* Worker for gmmR0MapChunk.
3952	*
3953	* @returns VBox status code.
3954	* @param pGMM Pointer to the GMM instance data.
3955	* @param pGVM Pointer to the Global VM structure.
3956	* @param pChunk Pointer to the chunk to be mapped.
3957	* @param ppvR3 Where to store the ring-3 address of the mapping.
3958	* In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3959	* contain the address of the existing mapping.
3960	*/
3961	static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3962	{
3963	/*
3964	* If we're in legacy mode this is simple.
3965	*/
3966	if (pGMM->fLegacyAllocationMode)
3967	{
3968	if (pChunk->hGVM != pGVM->hSelf)
3969	{
3970	Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3971	return VERR_GMM_CHUNK_NOT_FOUND;
3972	}
3973
3974	*ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
3975	return VINF_SUCCESS;
3976	}
3977
3978	/*
3979	* Check to see if the chunk is already mapped.
3980	*/
3981	for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
3982	{
3983	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3984	if (pChunk->paMappingsX[i].pGVM == pGVM)
3985	{
3986	*ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
3987	Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3988	#ifdef VBOX_WITH_PAGE_SHARING
3989	/* The ring-3 chunk cache can be out of sync; don't fail. */
3990	return VINF_SUCCESS;
3991	#else
3992	return VERR_GMM_CHUNK_ALREADY_MAPPED;
3993	#endif
3994	}
3995	}
3996
3997	/*
3998	* Do the mapping.
3999	*/
4000	RTR0MEMOBJ hMapObj;
4001	int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4002	if (RT_SUCCESS(rc))
4003	{
4004	/* reallocate the array? assumes few users per chunk (usually one). */
4005	unsigned iMapping = pChunk->cMappingsX;
4006	if ( iMapping <= 3
4007	|| (iMapping & 3) == 0)
4008	{
4009	unsigned cNewSize = iMapping <= 3
4010	? iMapping + 1
4011	: iMapping + 4;
4012	Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4013	if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4014	{
4015	rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4016	return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4017	}
4018
4019	void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4020	if (RT_UNLIKELY(!pvMappings))
4021	{
4022	rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4023	return VERR_NO_MEMORY;
4024	}
4025	pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4026	}
4027
4028	/* insert new entry */
4029	pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4030	pChunk->paMappingsX[iMapping].pGVM = pGVM;
4031	Assert(pChunk->cMappingsX == iMapping);
4032	pChunk->cMappingsX = iMapping + 1;
4033
4034	*ppvR3 = RTR0MemObjAddressR3(hMapObj);
4035	}
4036
4037	return rc;
4038	}
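/*
 * Illustration only (not part of GMMR0.cpp): a minimal sketch of the array
 * growth policy in the worker above. The array grows one entry at a time up
 * to four entries and then in steps of four, which suits the stated
 * assumption of few users per chunk. newMappingCapacity is a hypothetical
 * helper name; the UINT16_MAX limit check is not modelled.
 */
```cpp
// Returns the capacity to reallocate to before adding one entry to an array
// that currently holds cMappings entries, or 0 when no reallocation is needed.
inline unsigned newMappingCapacity(unsigned cMappings)
{
    if (cMappings <= 3)
        return cMappings + 1;   // small arrays grow by exactly one entry
    if ((cMappings & 3) == 0)
        return cMappings + 4;   // at a multiple of four: add four more slots
    return 0;                   // spare slots from the last growth remain
}
```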
4039
4040
4041	/**
4042	* Maps a chunk into the user address space of the current process.
4043	*
4044	* @returns VBox status code.
4045	* @param pGMM Pointer to the GMM instance data.
4046	* @param pGVM Pointer to the Global VM structure.
4047	* @param pChunk Pointer to the chunk to be mapped.
4048	* @param fRelaxedSem Whether we can release the semaphore while doing the
4049	* mapping (@c true) or not.
4050	* @param ppvR3 Where to store the ring-3 address of the mapping.
4051	* In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4052	* contain the address of the existing mapping.
4053	*/
4054	static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4055	{
4056	/*
4057	* Take the chunk lock and leave the giant GMM lock when possible, then
4058	* call the worker function.
4059	*/
4060	GMMR0CHUNKMTXSTATE MtxState;
4061	int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4062	fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4063	if (RT_SUCCESS(rc))
4064	{
4065	rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4066	gmmR0ChunkMutexRelease(&MtxState, pChunk);
4067	}
4068
4069	return rc;
4070	}
4071
4072
4073
4074	#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4075	/**
4076	* Check if a chunk is mapped into the specified VM
4077	*
4078	* @returns mapped yes/no
4079	* @param pGMM Pointer to the GMM instance.
4080	* @param pGVM Pointer to the Global VM structure.
4081	* @param pChunk Pointer to the chunk to be mapped.
4082	* @param ppvR3 Where to store the ring-3 address of the mapping.
4083	*/
4084	static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4085	{
4086	GMMR0CHUNKMTXSTATE MtxState;
4087	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4088	for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4089	{
4090	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4091	if (pChunk->paMappingsX[i].pGVM == pGVM)
4092	{
4093	*ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4094	gmmR0ChunkMutexRelease(&MtxState, pChunk);
4095	return true;
4096	}
4097	}
4098	*ppvR3 = NULL;
4099	gmmR0ChunkMutexRelease(&MtxState, pChunk);
4100	return false;
4101	}
4102	#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4103
4104
4105	/**
4106	* Map a chunk and/or unmap another chunk.
4107	*
4108	* The mapping and unmapping applies to the current process.
4109	*
4110	* This API does two things because it saves a kernel call per mapping
4111	* when the ring-3 mapping cache is full.
4112	*
4113	* @returns VBox status code.
4114	* @param pVM The VM.
4115	* @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4116	* @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4117	* @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4118	* @thread EMT
4119	*/
4120	GMMR0DECL(int) GMMR0MapUnmapChunk(PVM pVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4121	{
4122	LogFlow(("GMMR0MapUnmapChunk: pVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4123	pVM, idChunkMap, idChunkUnmap, ppvR3));
4124
4125	/*
4126	* Validate input and get the basics.
4127	*/
4128	PGMM pGMM;
4129	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4130	PGVM pGVM;
4131	int rc = GVMMR0ByVM(pVM, &pGVM);
4132	if (RT_FAILURE(rc))
4133	return rc;
4134
4135	AssertCompile(NIL_GMM_CHUNKID == 0);
4136	AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4137	AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4138
4139	if ( idChunkMap == NIL_GMM_CHUNKID
4140	&& idChunkUnmap == NIL_GMM_CHUNKID)
4141	return VERR_INVALID_PARAMETER;
4142
4143	if (idChunkMap != NIL_GMM_CHUNKID)
4144	{
4145	AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4146	*ppvR3 = NIL_RTR3PTR;
4147	}
4148
4149	/*
4150	* Take the semaphore and do the work.
4151	*
4152	* The unmapping is done last since it's easier to undo a mapping than
4153	* undoing an unmapping. The ring-3 mapping cache cannot be so big
4154	* that it pushes the user virtual address space to within a chunk of
4155	* its limits, so no problem here.
4156	*/
4157	gmmR0MutexAcquire(pGMM);
4158	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4159	{
4160	PGMMCHUNK pMap = NULL;
4161	if (idChunkMap != NIL_GMM_CHUNKID)
4162	{
4163	pMap = gmmR0GetChunk(pGMM, idChunkMap);
4164	if (RT_LIKELY(pMap))
4165	rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4166	else
4167	{
4168	Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4169	rc = VERR_GMM_CHUNK_NOT_FOUND;
4170	}
4171	}
4172	/** @todo split this operation, the bail out might (theoretically) not be
4173	* entirely safe. */
4174
4175	if ( idChunkUnmap != NIL_GMM_CHUNKID
4176	&& RT_SUCCESS(rc))
4177	{
4178	PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4179	if (RT_LIKELY(pUnmap))
4180	rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4181	else
4182	{
4183	Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4184	rc = VERR_GMM_CHUNK_NOT_FOUND;
4185	}
4186
4187	if (RT_FAILURE(rc) && pMap)
4188	gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4189	}
4190
4191	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4192	}
4193	else
4194	rc = VERR_GMM_IS_NOT_SANE;
4195	gmmR0MutexRelease(pGMM);
4196
4197	LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4198	return rc;
4199	}
4200
4201
4202	/**
4203	* VMMR0 request wrapper for GMMR0MapUnmapChunk.
4204	*
4205	* @returns see GMMR0MapUnmapChunk.
4206	* @param pVM Pointer to the VM.
4207	* @param pReq Pointer to the request packet.
4208	*/
4209	GMMR0DECL(int) GMMR0MapUnmapChunkReq(PVM pVM, PGMMMAPUNMAPCHUNKREQ pReq)
4210	{
4211	/*
4212	* Validate input and pass it on.
4213	*/
4214	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4215	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4216	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4217
4218	return GMMR0MapUnmapChunk(pVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4219	}
4220
4221
4222	/**
4223	* Legacy mode API for supplying pages.
4224	*
4225	* The specified user address points to an allocation chunk sized block that
4226	* will be locked down and used by the GMM when the GM asks for pages.
4227	*
4228	* @returns VBox status code.
4229	* @param pVM Pointer to the VM.
4230	* @param idCpu The VCPU id.
4231	* @param pvR3 Pointer to the chunk size memory block to lock down.
4232	*/
4233	GMMR0DECL(int) GMMR0SeedChunk(PVM pVM, VMCPUID idCpu, RTR3PTR pvR3)
4234	{
4235	/*
4236	* Validate input and get the basics.
4237	*/
4238	PGMM pGMM;
4239	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4240	PGVM pGVM;
4241	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4242	if (RT_FAILURE(rc))
4243	return rc;
4244
4245	AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4246	AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4247
4248	if (!pGMM->fLegacyAllocationMode)
4249	{
4250	Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4251	return VERR_NOT_SUPPORTED;
4252	}
4253
4254	/*
4255	* Lock the memory and add it as new chunk with our hGVM.
4256	* (The GMM locking is done inside gmmR0RegisterChunk.)
4257	*/
4258	RTR0MEMOBJ MemObj;
4259	rc = RTR0MemObjLockUser(&MemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4260	if (RT_SUCCESS(rc))
4261	{
4262	rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, MemObj, pGVM->hSelf, 0 /*fChunkFlags*/, NULL);
4263	if (RT_SUCCESS(rc))
4264	gmmR0MutexRelease(pGMM);
4265	else
4266	RTR0MemObjFree(MemObj, false /* fFreeMappings */);
4267	}
4268
4269	LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4270	return rc;
4271	}
4272
4273	#ifdef VBOX_WITH_PAGE_SHARING
4274
4275	# ifdef VBOX_STRICT
4276	/**
4277	* For checksumming shared pages in strict builds.
4278	*
4279	* The purpose is making sure that a page doesn't change.
4280	*
4281	* @returns Checksum, 0 on failure.
4282	* @param pGMM The GMM instance data.
4283	* @param idPage The page ID.
4284	*/
4285	static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4286	{
4287	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4288	AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4289
4290	uint8_t *pbChunk;
4291	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4292	return 0;
4293	uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4294
4295	return RTCrc32(pbPage, PAGE_SIZE);
4296	}
4297	# endif /* VBOX_STRICT */
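/*
 * Illustration only (not part of GMMR0.cpp): a minimal sketch of how a page
 * ID is split into a chunk ID and a byte offset inside that chunk, as done
 * in gmmR0StrictPageChecksum. The shift and mask values below assume 4 KiB
 * pages and 512 pages per chunk; the real values come from GMM_CHUNKID_SHIFT,
 * GMM_PAGEID_IDX_MASK and PAGE_SHIFT. PageLocation and locatePage are
 * hypothetical names.
 */
```cpp
#include <cstdint>

constexpr unsigned kAssumedPageShift    = 12; // assumption: 4 KiB pages
constexpr unsigned kAssumedChunkIdShift = 9;  // assumption: 512 pages per chunk
constexpr uint32_t kAssumedPageIdxMask  = (1u << kAssumedChunkIdShift) - 1;

struct PageLocation
{
    uint32_t idChunk;    // which chunk the page lives in
    uint32_t offInChunk; // byte offset of the page within the chunk mapping
};

inline PageLocation locatePage(uint32_t idPage)
{
    PageLocation loc;
    loc.idChunk    = idPage >> kAssumedChunkIdShift;
    loc.offInChunk = (idPage & kAssumedPageIdxMask) << kAssumedPageShift;
    return loc;
}
```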
4298
4299
4300	/**
4301	* Calculates the module hash value.
4302	*
4303	* @returns Hash value.
4304	* @param pszModuleName The module name.
4305	* @param pszVersion The module version string.
4306	*/
4307	static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4308	{
4309	return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4310	}
4311
4312
4313	/**
4314	* Finds a global module.
4315	*
4316	* @returns Pointer to the global module on success, NULL if not found.
4317	* @param pGMM The GMM instance data.
4318	* @param uHash The hash as calculated by gmmR0ShModCalcHash.
4319	* @param cbModule The module size.
4320	* @param enmGuestOS The guest OS type.
4321	* @param pszModuleName The module name.
4322	* @param pszVersion The module version.
4323	*/
4324	static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4325	uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4326	struct VMMDEVSHAREDREGIONDESC const *paRegions)
4327	{
4328	for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4329	pGblMod;
4330	pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4331	{
4332	if (pGblMod->cbModule != cbModule)
4333	continue;
4334	if (pGblMod->enmGuestOS != enmGuestOS)
4335	continue;
4336	if (pGblMod->cRegions != cRegions)
4337	continue;
4338	if (strcmp(pGblMod->szName, pszModuleName))
4339	continue;
4340	if (strcmp(pGblMod->szVersion, pszVersion))
4341	continue;
4342
4343	uint32_t i;
4344	for (i = 0; i < cRegions; i++)
4345	{
4346	uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4347	if (pGblMod->aRegions[i].off != off)
4348	break;
4349
4350	uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4351	if (pGblMod->aRegions[i].cb != cb)
4352	break;
4353	}
4354
4355	if (i == cRegions)
4356	return pGblMod;
4357	}
4358
4359	return NULL;
4360	}
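/*
 * Illustration only (not part of GMMR0.cpp): a minimal sketch of the region
 * normalisation that gmmR0ShModFindGlobal compares against. A guest region
 * is reduced to its offset within the first page plus a page-aligned byte
 * count, so two registrations match even when their base addresses differ.
 * The 4 KiB page size and the names NormalizedRegion and normalizeRegion are
 * assumptions made for the example.
 */
```cpp
#include <cstdint>

constexpr uint32_t kRegionPageSize   = 4096;                // assumption: 4 KiB pages
constexpr uint32_t kRegionOffsetMask = kRegionPageSize - 1; // models PAGE_OFFSET_MASK

struct NormalizedRegion
{
    uint32_t off; // offset of the region start within its first page
    uint32_t cb;  // size from the start of the first page, rounded up to whole pages
};

inline NormalizedRegion normalizeRegion(uint64_t GCRegionAddr, uint32_t cbRegion)
{
    NormalizedRegion r;
    r.off = (uint32_t)(GCRegionAddr & kRegionOffsetMask);
    // Same effect as RT_ALIGN_32(cbRegion + off, PAGE_SIZE) for a power-of-two page size.
    r.cb  = (cbRegion + r.off + kRegionPageSize - 1) & ~(kRegionPageSize - 1);
    return r;
}
```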
4361
4362
4363	/**
4364	* Creates a new global module.
4365	*
4366	* @returns VBox status code.
4367	* @param pGMM The GMM instance data.
4368	* @param uHash The hash as calculated by gmmR0ShModCalcHash.
4369	* @param cbModule The module size.
4370	* @param enmGuestOS The guest OS type.
4371	* @param cRegions The number of regions.
4372	* @param pszModuleName The module name.
4373	* @param pszVersion The module version.
4374	* @param paRegions The region descriptions.
4375	* @param ppGblMod Where to return the new module on success.
4376	*/
4377	static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4378	uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4379	struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4380	{
4381	Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4382	if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4383	{
4384	Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4385	return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4386	}
4387
4388	PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULE, aRegions[cRegions]));
4389	if (!pGblMod)
4390	{
4391	Log(("gmmR0ShModNewGlobal: No memory\n"));
4392	return VERR_NO_MEMORY;
4393	}
4394
4395	pGblMod->Core.Key = uHash;
4396	pGblMod->cbModule = cbModule;
4397	pGblMod->cRegions = cRegions;
4398	pGblMod->cUsers = 1;
4399	pGblMod->enmGuestOS = enmGuestOS;
4400	strcpy(pGblMod->szName, pszModuleName);
4401	strcpy(pGblMod->szVersion, pszVersion);
4402
4403	for (uint32_t i = 0; i < cRegions; i++)
4404	{
4405	Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4406	pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4407	pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4408	pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4409	pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4410	}
4411
4412	bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4413	Assert(fInsert); NOREF(fInsert);
4414	pGMM->cShareableModules++;
4415
4416	*ppGblMod = pGblMod;
4417	return VINF_SUCCESS;
4418	}
4419
4420
4421	/**
4422	* Deletes a global module which is no longer referenced by anyone.
4423	*
4424	* @param pGMM The GMM instance data.
4425	* @param pGblMod The module to delete.
4426	*/
4427	static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4428	{
4429	Assert(pGblMod->cUsers == 0);
4430	Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4431
4432	void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4433	Assert(pvTest == pGblMod); NOREF(pvTest);
4434	pGMM->cShareableModules--;
4435
4436	uint32_t i = pGblMod->cRegions;
4437	while (i-- > 0)
4438	{
4439	if (pGblMod->aRegions[i].paidPages)
4440	{
4441	/* We don't do anything to the pages as they are handled by the
4442	copy-on-write mechanism in PGM. */
4443	RTMemFree(pGblMod->aRegions[i].paidPages);
4444	pGblMod->aRegions[i].paidPages = NULL;
4445	}
4446	}
4447	RTMemFree(pGblMod);
4448	}
4449
4450
4451	static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4452	PGMMSHAREDMODULEPERVM *ppRecVM)
4453	{
4454	if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4455	return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4456
4457	PGMMSHAREDMODULEPERVM pRecVM;
4458	pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4459	if (!pRecVM)
4460	return VERR_NO_MEMORY;
4461
4462	pRecVM->Core.Key = GCBaseAddr;
4463	for (uint32_t i = 0; i < cRegions; i++)
4464	pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4465
4466	bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4467	Assert(fInsert); NOREF(fInsert);
4468	pGVM->gmm.s.Stats.cShareableModules++;
4469
4470	*ppRecVM = pRecVM;
4471	return VINF_SUCCESS;
4472	}
4473
4474
4475	static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4476	{
4477	/*
4478	* Free the per-VM module.
4479	*/
4480	PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4481	pRecVM->pGlobalModule = NULL;
4482
4483	if (fRemove)
4484	{
4485	void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4486	Assert(pvTest == &pRecVM->Core);
4487	}
4488
4489	RTMemFree(pRecVM);
4490
4491	/*
4492	* Release the global module.
4493	* (In the registration bailout case, it might not be.)
4494	*/
4495	if (pGblMod)
4496	{
4497	Assert(pGblMod->cUsers > 0);
4498	pGblMod->cUsers--;
4499	if (pGblMod->cUsers == 0)
4500	gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4501	}
4502	}
4503
4504	#endif /* VBOX_WITH_PAGE_SHARING */
4505
4506	/**
4507	* Registers a new shared module for the VM.
4508	*
4509	* @returns VBox status code.
4510	* @param pVM Pointer to the VM.
4511	* @param idCpu The VCPU id.
4512	* @param enmGuestOS The guest OS type.
4513	* @param pszModuleName The module name.
4514	* @param pszVersion The module version.
4515	* @param GCPtrModBase The module base address.
4516	* @param cbModule The module size.
4517	* @param cRegions The number of shared region descriptors.
4518	* @param paRegions Pointer to an array of shared region(s).
4519	*/
4520	GMMR0DECL(int) GMMR0RegisterSharedModule(PVM pVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4521	char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4522	uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4523	{
4524	#ifdef VBOX_WITH_PAGE_SHARING
4525	/*
4526	* Validate input and get the basics.
4527	*
4528	* Note! Turns out the module size does not necessarily match the size of the
4529	* regions. (iTunes on XP)
4530	*/
4531	PGMM pGMM;
4532	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4533	PGVM pGVM;
4534	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4535	if (RT_FAILURE(rc))
4536	return rc;
4537
4538	if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4539	return VERR_GMM_TOO_MANY_REGIONS;
4540
4541	if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4542	return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4543
4544	uint32_t cbTotal = 0;
4545	for (uint32_t i = 0; i < cRegions; i++)
4546	{
4547	if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4548	return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4549
4550	cbTotal += paRegions[i].cbRegion;
4551	if (RT_UNLIKELY(cbTotal > _1G))
4552	return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4553	}
4554
4555	AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4556	if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4557	return VERR_GMM_MODULE_NAME_TOO_LONG;
4558
4559	AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4560	if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4561	return VERR_GMM_MODULE_NAME_TOO_LONG;
4562
4563	uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4564	Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4565
4566	/*
4567	* Take the semaphore and do some more validations.
4568	*/
4569	gmmR0MutexAcquire(pGMM);
4570	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4571	{
4572	/*
4573	* Check if this module is already locally registered and register
4574	* it if it isn't. The base address is a unique module identifier
4575	* locally.
4576	*/
4577	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4578	bool fNewModule = pRecVM == NULL;
4579	if (fNewModule)
4580	{
4581	rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4582	if (RT_SUCCESS(rc))
4583	{
4584	/*
4585	* Find a matching global module, register a new one if needed.
4586	*/
4587	PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4588	pszModuleName, pszVersion, paRegions);
4589	if (!pGblMod)
4590	{
4591	Assert(fNewModule);
4592	gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4593	pszModuleName, pszVersion, paRegions, &pGblMod);
4594	if (RT_SUCCESS(rc))
4595	{
4596	pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4597	Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4598	}
4599	else
4600	gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /fRemove/);
4601	}
4602	else
4603	{
4604	Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4605	pGblMod->cUsers++;
4606	pRecVM->pGlobalModule = pGblMod;
4607
4608	Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4609	}
4610	}
4611	}
4612	else
4613	{
4614	/*
4615	* Attempt to re-register an existing module.
4616	*/
4617	PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4618	pszModuleName, pszVersion, paRegions);
4619	if (pRecVM->pGlobalModule == pGblMod)
4620	{
4621	Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4622	rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4623	}
4624	else
4625	{
4626	/** @todo may have to unregister+register when this happens in case it's caused
4627	* by VBoxService crashing and being restarted... */
4628	Log(("GMMR0RegisterSharedModule: Address clash!\n"
4629	" incoming at %RGvLB%#x %s %s rgns %u\n"
4630	" existing at %RGvLB%#x %s %s rgns %u\n",
4631	GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4632	pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4633	pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4634	rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4635	}
4636	}
4637	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4638	}
4639	else
4640	rc = VERR_GMM_IS_NOT_SANE;
4641
4642	gmmR0MutexRelease(pGMM);
4643	return rc;
4644	#else
4645
4646	NOREF(pVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4647	NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4648	return VERR_NOT_IMPLEMENTED;
4649	#endif
4650	}
4651
4652
4653	/**
4654	* VMMR0 request wrapper for GMMR0RegisterSharedModule.
4655	*
4656	* @returns see GMMR0RegisterSharedModule.
4657	* @param pVM Pointer to the VM.
4658	* @param idCpu The VCPU id.
4659	* @param pReq Pointer to the request packet.
4660	*/
4661	GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4662	{
4663	/*
4664	* Validate input and pass it on.
4665	*/
4666	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4667	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4668	AssertMsgReturn(pReq->Hdr.cbReq >= sizeof(*pReq) && pReq->Hdr.cbReq == RT_UOFFSETOF(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4669
4670	/* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4671	pReq->rc = GMMR0RegisterSharedModule(pVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4672	pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4673	return VINF_SUCCESS;
4674	}
4675
4676
4677	/**
4678	* Unregisters a shared module for the VM
4679	*
4680	* @returns VBox status code.
4681	* @param pVM Pointer to the VM.
4682	* @param idCpu The VCPU id.
4683	* @param pszModuleName The module name.
4684	* @param pszVersion The module version.
4685	* @param GCPtrModBase The module base address.
4686	* @param cbModule The module size.
4687	*/
4688	GMMR0DECL(int) GMMR0UnregisterSharedModule(PVM pVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4689	RTGCPTR GCPtrModBase, uint32_t cbModule)
4690	{
4691	#ifdef VBOX_WITH_PAGE_SHARING
4692	/*
4693	* Validate input and get the basics.
4694	*/
4695	PGMM pGMM;
4696	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4697	PGVM pGVM;
4698	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4699	if (RT_FAILURE(rc))
4700	return rc;
4701
4702	AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4703	AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4704	if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4705	return VERR_GMM_MODULE_NAME_TOO_LONG;
4706	if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4707	return VERR_GMM_MODULE_NAME_TOO_LONG;
4708
4709	Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4710
4711	/*
4712	* Take the semaphore and do some more validations.
4713	*/
4714	gmmR0MutexAcquire(pGMM);
4715	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4716	{
4717	/*
4718	* Locate and remove the specified module.
4719	*/
4720	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4721	if (pRecVM)
4722	{
4723	/** @todo Do we need to do more validations here, like that the
4724	* name + version + cbModule matches? */
4725	Assert(pRecVM->pGlobalModule);
4726	gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4727	}
4728	else
4729	rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4730
4731	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4732	}
4733	else
4734	rc = VERR_GMM_IS_NOT_SANE;
4735
4736	gmmR0MutexRelease(pGMM);
4737	return rc;
4738	#else
4739
4740	NOREF(pVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4741	return VERR_NOT_IMPLEMENTED;
4742	#endif
4743	}
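The name and version checks in GMMR0UnregisterSharedModule rely on `memchr` to confirm that a terminator exists within a fixed number of bytes, without ever reading past that bound. The sketch below isolates the idiom; `exampleIsTerminatedWithin` is a hypothetical helper name, and the caller is assumed to guarantee that `cbMax` bytes are readable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// True if a NUL terminator occurs within the first cbMax bytes of psz.
// Assumption: the caller guarantees that cbMax bytes are readable at psz.
inline bool exampleIsTerminatedWithin(const char *psz, size_t cbMax)
{
    return std::memchr(psz, '\0', cbMax) != nullptr;
}
```

Unlike `strlen`, this never scans beyond the buffer when the terminator is missing, which matters for strings that arrive in a fixed-size request field.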
4744
4745
4746	/**
4747	* VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4748	*
4749	* @returns see GMMR0UnregisterSharedModule.
4750	* @param pVM Pointer to the VM.
4751	* @param idCpu The VCPU id.
4752	* @param pReq Pointer to the request packet.
4753	*/
4754	GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4755	{
4756	/*
4757	* Validate input and pass it on.
4758	*/
4759	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4760	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4761	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4762
4763	return GMMR0UnregisterSharedModule(pVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4764	}
4765
4766	#ifdef VBOX_WITH_PAGE_SHARING
4767
4768	/**
4769	* Increase the use count of a shared page, the page is known to exist and be valid and such.
4770	*
4771	* @param pGMM Pointer to the GMM instance.
4772	* @param pGVM Pointer to the GVM instance.
4773	* @param pPage The page structure.
4774	*/
4775	DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4776	{
4777	Assert(pGMM->cSharedPages > 0);
4778	Assert(pGMM->cAllocatedPages > 0);
4779
4780	pGMM->cDuplicatePages++;
4781
4782	pPage->Shared.cRefs++;
4783	pGVM->gmm.s.Stats.cSharedPages++;
4784	pGVM->gmm.s.Stats.Allocated.cBasePages++;
4785	}
4786
4787
4788	/**
4789	* Converts a private page to a shared page, the page is known to exist and be valid and such.
4790	*
4791	* @param pGMM Pointer to the GMM instance.
4792	* @param pGVM Pointer to the GVM instance.
4793	* @param HCPhys Host physical address
4794	* @param idPage The Page ID
4795	* @param pPage The page structure.
4796	*/
4797	DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4798	PGMMSHAREDPAGEDESC pPageDesc)
4799	{
4800	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4801	Assert(pChunk);
4802	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4803	Assert(GMM_PAGE_IS_PRIVATE(pPage));
4804
4805	pChunk->cPrivate--;
4806	pChunk->cShared++;
4807
4808	pGMM->cSharedPages++;
4809
4810	pGVM->gmm.s.Stats.cSharedPages++;
4811	pGVM->gmm.s.Stats.cPrivatePages--;
4812
4813	/* Modify the page structure. */
4814	pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4815	pPage->Shared.cRefs = 1;
4816	#ifdef VBOX_STRICT
4817	pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
4818	pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
4819	#else
4820	pPage->Shared.u14Checksum = 0;
4821	#endif
4822	pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
4823	}
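The two inline helpers above split the bookkeeping between the first VM that shares a page and every later VM that reuses it. The following sketch models only that counter movement, under the assumption that freeing a later VM's private copy is a separate step and not shown. All `Example*` types and functions are hypothetical and far simpler than the real GMM structures.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical counters; the real structures track much more state.
struct ExampleGlobalStats { uint64_t cSharedPages = 0; uint64_t cDuplicatePages = 0; };
struct ExampleVmStats     { uint64_t cPrivatePages = 0; uint64_t cSharedPages = 0; };
struct ExamplePage        { bool fShared = false; uint32_t cRefs = 0; };

// First VM to register the page: its private page becomes the shared copy.
inline void exampleConvertToShared(ExampleGlobalStats &g, ExampleVmStats &vm, ExamplePage &page)
{
    assert(!page.fShared && vm.cPrivatePages > 0);
    vm.cPrivatePages--;
    vm.cSharedPages++;
    g.cSharedPages++;
    page.fShared = true;
    page.cRefs   = 1;
}

// A later VM with identical content: take one more reference on the shared copy.
// (Freeing that VM's own private copy is a separate step and is not modelled here.)
inline void exampleUseShared(ExampleGlobalStats &g, ExampleVmStats &vm, ExamplePage &page)
{
    assert(page.fShared && page.cRefs > 0);
    page.cRefs++;
    vm.cSharedPages++;
    g.cDuplicatePages++;
}
```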
4824
4825
4826	static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
4827	unsigned idxRegion, unsigned idxPage,
4828	PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
4829	{
4830	NOREF(pModule);
4831
4832	/* Easy case: just change the internal page type. */
4833	PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
4834	AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
4835	pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
4836	VERR_PGM_PHYS_INVALID_PAGE_ID);
4837
4838	AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->HCPhys, (pPage->Private.pfn << 12)));
4839
4840	gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
4841
4842	/* Keep track of these references. */
4843	pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
4844
4845	return VINF_SUCCESS;
4846	}
4847
4848	/**
4849	* Checks specified shared module range for changes
4850	*
4851	* Performs the following tasks:
4852	* - If a shared page is new, then it changes the GMM page type to shared and
4853	* returns it in the pPageDesc descriptor.
4854	* - If a shared page already exists, then it checks if the VM page is
4855	* identical and if so frees the VM page and returns the shared page in
4856	* pPageDesc descriptor.
4857	*
4858	* @remarks ASSUMES the caller has acquired the GMM semaphore!!
4859	*
4860	* @returns VBox status code.
4861	* @param pGMM Pointer to the GMM instance data.
4862	* @param pGVM Pointer to the GVM instance data.
4863	* @param pModule Module description
4864	* @param idxRegion Region index
4865	* @param idxPage Page index
4866	* @param pPageDesc Page descriptor (input and output).
4867	*/
4868	GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
4869	PGMMSHAREDPAGEDESC pPageDesc)
4870	{
4871	int rc;
4872	PGMM pGMM;
4873	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4874	pPageDesc->u32StrictChecksum = 0;
4875
4876	AssertMsgReturn(idxRegion < pModule->cRegions,
4877	("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
4878	VERR_INVALID_PARAMETER);
4879
4880	uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
4881	AssertMsgReturn(idxPage < cPages,
4882	("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
4883	VERR_INVALID_PARAMETER);
4884
4885	LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
4886
4887	/*
4888	* First time; create a page descriptor array.
4889	*/
4890	PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
4891	if (!pGlobalRegion->paidPages)
4892	{
4893	Log(("Allocate page descriptor array for %d pages\n", cPages));
4894	pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
4895	AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
4896
4897	/* Invalidate all descriptors. */
4898	uint32_t i = cPages;
4899	while (i-- > 0)
4900	pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
4901	}
4902
4903	/*
4904	* We've seen this shared page for the first time?
4905	*/
4906	if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
4907	{
4908	Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
4909	return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4910	}
4911
4912	/*
4913	* We've seen it before...
4914	*/
4915	Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
4916	pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
4917	Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
4918
4919	/*
4920	* Get the shared page source.
4921	*/
4922	PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
4923	AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
4924	VERR_PGM_PHYS_INVALID_PAGE_ID);
4925
4926	if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
4927	{
4928	/*
4929	* Page was freed at some point; invalidate this entry.
4930	*/
4931	/** @todo this isn't really bullet proof. */
4932	Log(("Old shared page was freed -> create a new one\n"));
4933	pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
4934	return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4935	}
4936
4937	Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
4938
4939	/*
4940	* Calculate the virtual address of the local page.
4941	*/
4942	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
4943	AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
4944	VERR_PGM_PHYS_INVALID_PAGE_ID);
4945
4946	uint8_t *pbChunk;
4947	AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
4948	("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
4949	VERR_PGM_PHYS_INVALID_PAGE_ID);
4950	uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4951
4952	/*
4953	* Calculate the virtual address of the shared page.
4954	*/
4955	pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
4956	Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
4957
4958	/*
4959	* Get the virtual address of the physical page; map the chunk into the VM
4960	* process if not already done.
4961	*/
4962	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4963	{
4964	Log(("Map chunk into process!\n"));
4965	rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
4966	AssertRCReturn(rc, rc);
4967	}
4968	uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4969
4970	#ifdef VBOX_STRICT
4971	pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
4972	uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
4973	AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
4974	("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
4975	pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
4976	#endif
4977
4978	/** @todo write ASMMemComparePage. */
4979	if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
4980	{
4981	Log(("Unexpected differences found between local and shared page; skip\n"));
4982	/* Signal to the caller that this one hasn't changed. */
4983	pPageDesc->idPage = NIL_GMM_PAGEID;
4984	return VINF_SUCCESS;
4985	}
4986
4987	/*
4988	* Free the old local page.
4989	*/
4990	GMMFREEPAGEDESC PageDesc;
4991	PageDesc.idPage = pPageDesc->idPage;
4992	rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
4993	AssertRCReturn(rc, rc);
4994
4995	gmmR0UseSharedPage(pGMM, pGVM, pPage);
4996
4997	/*
4998	* Pass along the new physical address & page id.
4999	*/
5000	pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5001	pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5002
5003	return VINF_SUCCESS;
5004	}
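GMMR0SharedModuleCheckPage locates both the local and the shared page by splitting a page ID into a chunk ID (`idPage >> GMM_CHUNKID_SHIFT`) and a page index (`idPage & GMM_PAGEID_IDX_MASK`), then shifting the index by `PAGE_SHIFT` to get a byte offset into the chunk mapping. The sketch below reproduces that arithmetic. The three constants are assumed values chosen for illustration (4 KiB pages, 512 pages per chunk); the real ones come from the GMM headers and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only; the real constants live in the GMM headers.
constexpr uint32_t kExamplePageShift    = 12;  // 4 KiB pages
constexpr uint32_t kExampleChunkIdShift = 9;   // 512 pages per chunk
constexpr uint32_t kExamplePageIdxMask  = (UINT32_C(1) << kExampleChunkIdShift) - 1;

// Compose a page ID from a chunk ID and a page index within that chunk.
constexpr uint32_t exampleMakePageId(uint32_t idChunk, uint32_t iPage)
{
    return (idChunk << kExampleChunkIdShift) | iPage;
}

// Upper bits of the page ID identify the chunk.
constexpr uint32_t exampleChunkIdFromPageId(uint32_t idPage)
{
    return idPage >> kExampleChunkIdShift;
}

// Lower bits of the page ID are the page index within the chunk.
constexpr uint32_t examplePageIndexFromPageId(uint32_t idPage)
{
    return idPage & kExamplePageIdxMask;
}

// Byte offset of the page inside its chunk mapping.
constexpr uint32_t exampleOffsetInChunk(uint32_t idPage)
{
    return examplePageIndexFromPageId(idPage) << kExamplePageShift;
}
```

Adding the offset to the chunk's mapped base address yields the page's virtual address, which is what the function compares with `memcmp`.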
5005
5006
5007	/**
5008	* RTAvlGCPtrDestroy callback.
5009	*
5010	* @returns 0 or VERR_GMM_INSTANCE.
5011	* @param pNode The node to destroy.
5012	* @param pvArgs Pointer to an argument packet.
5013	*/
5014	static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5015	{
5016	gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5017	((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5018	(PGMMSHAREDMODULEPERVM)pNode,
5019	false /*fRemove*/);
5020	return VINF_SUCCESS;
5021	}
5022
5023
5024	/**
5025	* Used by GMMR0CleanupVM to clean up shared modules.
5026	*
5027	* This is called without taking the GMM lock so that it can be yielded as
5028	* needed here.
5029	*
5030	* @param pGMM The GMM handle.
5031	* @param pGVM The global VM handle.
5032	*/
5033	static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5034	{
5035	gmmR0MutexAcquire(pGMM);
5036	GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5037
5038	GMMR0SHMODPERVMDTORARGS Args;
5039	Args.pGVM = pGVM;
5040	Args.pGMM = pGMM;
5041	RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5042
5043	AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5044	pGVM->gmm.s.Stats.cShareableModules = 0;
5045
5046	gmmR0MutexRelease(pGMM);
5047	}
5048
5049	#endif /* VBOX_WITH_PAGE_SHARING */
5050
5051	/**
5052	* Removes all shared modules for the specified VM
5053	*
5054	* @returns VBox status code.
5055	* @param pVM Pointer to the VM.
5056	* @param idCpu The VCPU id.
5057	*/
5058	GMMR0DECL(int) GMMR0ResetSharedModules(PVM pVM, VMCPUID idCpu)
5059	{
5060	#ifdef VBOX_WITH_PAGE_SHARING
5061	/*
5062	* Validate input and get the basics.
5063	*/
5064	PGMM pGMM;
5065	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5066	PGVM pGVM;
5067	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
5068	if (RT_FAILURE(rc))
5069	return rc;
5070
5071	/*
5072	* Take the semaphore and do some more validations.
5073	*/
5074	gmmR0MutexAcquire(pGMM);
5075	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5076	{
5077	Log(("GMMR0ResetSharedModules\n"));
5078	GMMR0SHMODPERVMDTORARGS Args;
5079	Args.pGVM = pGVM;
5080	Args.pGMM = pGMM;
5081	RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5082	pGVM->gmm.s.Stats.cShareableModules = 0;
5083
5084	rc = VINF_SUCCESS;
5085	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5086	}
5087	else
5088	rc = VERR_GMM_IS_NOT_SANE;
5089
5090	gmmR0MutexRelease(pGMM);
5091	return rc;
5092	#else
5093	NOREF(pVM); NOREF(idCpu);
5094	return VERR_NOT_IMPLEMENTED;
5095	#endif
5096	}
5097
5098	#ifdef VBOX_WITH_PAGE_SHARING
5099
5100	/**
5101	* Tree enumeration callback for checking a shared module.
5102	*/
5103	static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5104	{
5105	GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO *)pvUser;
5106	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5107	PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5108
5109	Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5110	pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5111
5112	int rc = PGMR0SharedModuleCheck(pArgs->pGVM->pVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5113	if (RT_FAILURE(rc))
5114	return rc;
5115	return VINF_SUCCESS;
5116	}
5117
5118	#endif /* VBOX_WITH_PAGE_SHARING */
5119	#ifdef DEBUG_sandervl
5120
5121	/**
5122	* Setup for a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
5123	*
5124	* @returns VBox status code.
5125	* @param pVM Pointer to the VM.
5126	*/
5127	GMMR0DECL(int) GMMR0CheckSharedModulesStart(PVM pVM)
5128	{
5129	/*
5130	* Validate input and get the basics.
5131	*/
5132	PGMM pGMM;
5133	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5134
5135	/*
5136	* Take the semaphore and do some more validations.
5137	*/
5138	gmmR0MutexAcquire(pGMM);
5139	int rc = VINF_SUCCESS;
5140	if (!GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5141	rc = VERR_GMM_IS_NOT_SANE;
5142	NOREF(pVM);
5143
5144	return rc;
5145	}
5146
5147	/**
5148	* Clean up after a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
5149	*
5150	* @returns VBox status code.
5151	* @param pVM Pointer to the VM.
5152	*/
5153	GMMR0DECL(int) GMMR0CheckSharedModulesEnd(PVM pVM)
5154	{
5155	/*
5156	* Validate input and get the basics.
5157	*/
5158	PGMM pGMM;
5159	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5160
5161	gmmR0MutexRelease(pGMM);
5162	return VINF_SUCCESS;
5163	}
5164
5165	#endif /* DEBUG_sandervl */
5166
5167	/**
5168	* Check all shared modules for the specified VM.
5169	*
5170	* @returns VBox status code.
5171	* @param pVM Pointer to the VM.
5172	* @param pVCpu Pointer to the VMCPU.
5173	*/
5174	GMMR0DECL(int) GMMR0CheckSharedModules(PVM pVM, PVMCPU pVCpu)
5175	{
5176	#ifdef VBOX_WITH_PAGE_SHARING
5177	/*
5178	* Validate input and get the basics.
5179	*/
5180	PGMM pGMM;
5181	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5182	PGVM pGVM;
5183	int rc = GVMMR0ByVMAndEMT(pVM, pVCpu->idCpu, &pGVM);
5184	if (RT_FAILURE(rc))
5185	return rc;
5186
5187	# ifndef DEBUG_sandervl
5188	/*
5189	* Take the semaphore and do some more validations.
5190	*/
5191	gmmR0MutexAcquire(pGMM);
5192	# endif
5193	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5194	{
5195	/*
5196	* Walk the tree, checking each module.
5197	*/
5198	Log(("GMMR0CheckSharedModules\n"));
5199
5200	GMMCHECKSHAREDMODULEINFO Args;
5201	Args.pGVM = pGVM;
5202	Args.idCpu = pVCpu->idCpu;
5203	rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5204
5205	Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5206	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5207	}
5208	else
5209	rc = VERR_GMM_IS_NOT_SANE;
5210
5211	# ifndef DEBUG_sandervl
5212	gmmR0MutexRelease(pGMM);
5213	# endif
5214	return rc;
5215	#else
5216	NOREF(pVM); NOREF(pVCpu);
5217	return VERR_NOT_IMPLEMENTED;
5218	#endif
5219	}
5220
5221	#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5222
5223	/**
5224	* RTAvlU32DoWithAll callback.
5225	*
5226	* @returns 0
5227	* @param pNode The node to search.
5228	* @param pvUser Pointer to the input argument packet.
5229	*/
5230	static DECLCALLBACK(int) gmmR0FindDupPageInChunk(PAVLU32NODECORE pNode, void *pvUser)
5231	{
5232	PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
5233	GMMFINDDUPPAGEINFO *pArgs = (GMMFINDDUPPAGEINFO *)pvUser;
5234	PGVM pGVM = pArgs->pGVM;
5235	PGMM pGMM = pArgs->pGMM;
5236	uint8_t *pbChunk;
5237
5238	/* Only take chunks not mapped into this VM process; not entirely correct. */
5239	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5240	{
5241	int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5242	if (RT_SUCCESS(rc))
5243	{
5244	/*
5245	* Look for duplicate pages
5246	*/
5247	unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5248	while (iPage-- > 0)
5249	{
5250	if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5251	{
5252	uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5253
5254	if (!memcmp(pArgs->pSourcePage, pbDestPage, PAGE_SIZE))
5255	{
5256	pArgs->fFoundDuplicate = true;
5257	break;
5258	}
5259	}
5260	}
5261	gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5262	}
5263	}
5264	return pArgs->fFoundDuplicate; /* (stops search if true) */
5265	}
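The callback above returns `fFoundDuplicate` so that the enumeration stops as soon as a match is found. A minimal sketch of that early-stop contract follows. `exampleDoWithAll` is a hypothetical enumerator over a plain vector, standing in for the tree walk, and `ExampleFindArgs` is a simplified argument packet; neither is the real IPRT or GMM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Simplified, hypothetical argument packet for the search.
struct ExampleFindArgs
{
    const unsigned char *pbSource;  // page to look for
    size_t               cbPage;    // number of bytes to compare
    bool                 fFound;    // set when a duplicate is seen
    int                  cVisited;  // candidates inspected so far
};

// Callback: a non-zero return value asks the enumerator to stop.
inline int exampleFindDupCallback(const unsigned char *pbCandidate, void *pvUser)
{
    ExampleFindArgs *pArgs = static_cast<ExampleFindArgs *>(pvUser);
    pArgs->cVisited++;
    if (!std::memcmp(pArgs->pbSource, pbCandidate, pArgs->cbPage))
        pArgs->fFound = true;
    return pArgs->fFound;
}

// Hypothetical enumerator: visits candidates in order until a callback returns non-zero.
inline int exampleDoWithAll(const std::vector<const unsigned char *> &candidates,
                            int (*pfnCallback)(const unsigned char *, void *), void *pvUser)
{
    for (const unsigned char *pb : candidates)
    {
        int rc = pfnCallback(pb, pvUser);
        if (rc)
            return rc;
    }
    return 0;
}
```

Carrying the result in the argument packet, rather than only in the return value, lets the caller read `fFound` afterwards regardless of how the enumerator reports an early stop.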
5266
5267
5268	/**
5269	* Find a duplicate of the specified page in other active VMs
5270	*
5271	* @returns VBox status code.
5272	* @param pVM Pointer to the VM.
5273	* @param pReq Pointer to the request packet.
5274	*/
5275	GMMR0DECL(int) GMMR0FindDuplicatePageReq(PVM pVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5276	{
5277	/*
5278	* Validate input and pass it on.
5279	*/
5280	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
5281	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5282	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5283
5284	PGMM pGMM;
5285	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5286
5287	PGVM pGVM;
5288	int rc = GVMMR0ByVM(pVM, &pGVM);
5289	if (RT_FAILURE(rc))
5290	return rc;
5291
5292	/*
5293	* Take the semaphore and do some more validations.
5294	*/
5295	rc = gmmR0MutexAcquire(pGMM);
5296	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5297	{
5298	uint8_t *pbChunk;
5299	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5300	if (pChunk)
5301	{
5302	if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5303	{
5304	uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5305	PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5306	if (pPage)
5307	{
5308	GMMFINDDUPPAGEINFO Args;
5309	Args.pGVM = pGVM;
5310	Args.pGMM = pGMM;
5311	Args.pSourcePage = pbSourcePage;
5312	Args.fFoundDuplicate = false;
5313	RTAvlU32DoWithAll(&pGMM->pChunks, true /* fFromLeft */, gmmR0FindDupPageInChunk, &Args);
5314
5315	pReq->fDuplicate = Args.fFoundDuplicate;
5316	}
5317	else
5318	{
5319	AssertFailed();
5320	rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5321	}
5322	}
5323	else
5324	AssertFailed();
5325	}
5326	else
5327	AssertFailed();
5328	}
5329	else
5330	rc = VERR_GMM_IS_NOT_SANE;
5331
5332	gmmR0MutexRelease(pGMM);
5333	return rc;
5334	}
5335
5336	#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5337
5338
5339	/**
5340	* Retrieves the GMM statistics visible to the caller.
5341	*
5342	* @returns VBox status code.
5343	*
5344	* @param pStats Where to put the statistics.
5345	* @param pSession The current session.
5346	* @param pVM Pointer to the VM to obtain statistics for. Optional.
5347	*/
5348	GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PVM pVM)
5349	{
5350	LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pVM=%p\n", pStats, pSession, pVM));
5351
5352	/*
5353	* Validate input.
5354	*/
5355	AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5356	AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5357	pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5358
5359	PGMM pGMM;
5360	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5361
5362	/*
5363	* Resolve the VM handle, if not NULL, and lock the GMM.
5364	*/
5365	int rc;
5366	PGVM pGVM;
5367	if (pVM)
5368	{
5369	rc = GVMMR0ByVM(pVM, &pGVM);
5370	if (RT_FAILURE(rc))
5371	return rc;
5372	}
5373	else
5374	pGVM = NULL;
5375
5376	rc = gmmR0MutexAcquire(pGMM);
5377	if (RT_FAILURE(rc))
5378	return rc;
5379
5380	/*
5381	* Copy out the GMM statistics.
5382	*/
5383	pStats->cMaxPages = pGMM->cMaxPages;
5384	pStats->cReservedPages = pGMM->cReservedPages;
5385	pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5386	pStats->cAllocatedPages = pGMM->cAllocatedPages;
5387	pStats->cSharedPages = pGMM->cSharedPages;
5388	pStats->cDuplicatePages = pGMM->cDuplicatePages;
5389	pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5390	pStats->cBalloonedPages = pGMM->cBalloonedPages;
5391	pStats->cChunks = pGMM->cChunks;
5392	pStats->cFreedChunks = pGMM->cFreedChunks;
5393	pStats->cShareableModules = pGMM->cShareableModules;
5394	RT_ZERO(pStats->au64Reserved);
5395
5396	/*
5397	* Copy out the VM statistics.
5398	*/
5399	if (pGVM)
5400	pStats->VMStats = pGVM->gmm.s.Stats;
5401	else
5402	RT_ZERO(pStats->VMStats);
5403
5404	gmmR0MutexRelease(pGMM);
5405	return rc;
5406	}
5407
5408
5409	/**
5410	* VMMR0 request wrapper for GMMR0QueryStatistics.
5411	*
5412	* @returns see GMMR0QueryStatistics.
5413	* @param pVM Pointer to the VM. Optional.
5414	* @param pReq Pointer to the request packet.
5415	*/
5416	GMMR0DECL(int) GMMR0QueryStatisticsReq(PVM pVM, PGMMQUERYSTATISTICSSREQ pReq)
5417	{
5418	/*
5419	* Validate input and pass it on.
5420	*/
5421	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5422	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5423
5424	return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pVM);
5425	}
5426
5427
5428	/**
5429	* Resets the specified GMM statistics.
5430	*
5431	* @returns VBox status code.
5432	*
5433	* @param pStats Which statistics to reset; non-zero fields
5434	* indicate which ones to reset.
5435	* @param pSession The current session.
5436	* @param pVM The VM to reset statistics for. Optional.
5437	*/
5438	GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PVM pVM)
5439	{
5440	NOREF(pStats); NOREF(pSession); NOREF(pVM);
5441	/* There is currently nothing we can reset. */
5442	return VINF_SUCCESS;
5443	}
5444
5445
5446	/**
5447	* VMMR0 request wrapper for GMMR0ResetStatistics.
5448	*
5449	* @returns see GMMR0ResetStatistics.
5450	* @param pVM Pointer to the VM. Optional.
5451	* @param pReq Pointer to the request packet.
5452	*/
5453	GMMR0DECL(int) GMMR0ResetStatisticsReq(PVM pVM, PGMMRESETSTATISTICSSREQ pReq)
5454	{
5455	/*
5456	* Validate input and pass it on.
5457	*/
5458	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5459	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5460
5461	return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pVM);
5462	}
5463

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@ 55889