VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@95248

Last change on this file since 95248 was 93554, checked in by vboxsync, 3 years ago

VMM: Changed PAGE_SIZE -> GUEST_PAGE_SIZE / HOST_PAGE_SIZE, PAGE_SHIFT -> GUEST_PAGE_SHIFT / HOST_PAGE_SHIFT, and PAGE_OFFSET_MASK -> GUEST_PAGE_OFFSET_MASK / HOST_PAGE_OFFSET_MASK. Also removed most usage of ASMMemIsZeroPage and ASMMemZeroPage since the host and guest page size doesn't need to be the same any more. Some work left to do in the page pool code. bugref:9898

1/* $Id: GMMR0.cpp 93554 2022-02-02 22:57:02Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2022 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint, all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / GUEST_PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * to be set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
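 *
 * The inverse mapping follows directly from the formula above (a small sketch
 * using the same symbols; the mask is implied by GMM_CHUNK_SHIFT):
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & ((1 << GMM_CHUNK_SHIFT) - 1);
 * @endcode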
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-15 free pages, the second covers 16-31, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
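 *
 * As a rough sketch of the bucketing (illustrative only; the helper name is
 * hypothetical and the real list selection code lives further down this file):
 * @code
 * unsigned iList = pChunk->cFree >> 4;              // buckets of 16 free pages
 * gmmR0SketchLinkChunkToList(pSet, iList, pChunk);  // hypothetical helper
 * @endcode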
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^GMM_CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
99 * and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
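 *
 * A back of the envelope example (rough, illustrative numbers assuming 2 MB
 * chunks and 4 KiB guest pages, i.e. 512 pages per chunk):
 * @code
 * per page GMM tracking   = sizeof(GMMPAGE)                          = 8 bytes
 * per page RTR0MEMOBJ     ~ 8 bytes (64-bit PFN_NUMBER / struct page pointer)
 * chunk overhead per page ~ (sizeof(RTR0MEMOBJ) + chunk header) / 512 ~ 1 byte
 * @endcode
 * So the total is in the neighbourhood of 16-17 bytes per page, dominated by
 * the per-page entries rather than the per-chunk structures.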
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC; this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here; the main ones are configurability and
134 * security. Should we, for instance, permit anyone to request 100% memory
135 * commitment? Who should be allowed to make runtime adjustments of the
136 * configuration? And how do we prevent these settings from being lost when the
137 * last VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads), which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and the allocator interface will have to supply some NUMA
150 * info, and we'll need a way to calculate access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/spinlock.h>
180#include <iprt/string.h>
181#include <iprt/time.h>
182
183/* This is 64-bit only code now. */
184#if HC_ARCH_BITS != 64 || ARCH_BITS != 64
185# error "This is 64-bit only code"
186#endif
187
188
189/*********************************************************************************************************************************
190* Defined Constants And Macros *
191*********************************************************************************************************************************/
192/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
193 * Use a critical section instead of a fast mutex for the giant GMM lock.
194 *
195 * @remarks This is primarily a way of avoiding the deadlock checks in the
196 * Windows driver verifier. */
197#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
198# define VBOX_USE_CRIT_SECT_FOR_GIANT
199#endif
200
201
202/*********************************************************************************************************************************
203* Structures and Typedefs *
204*********************************************************************************************************************************/
205/** Pointer to set of free chunks. */
206typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
207
208/**
209 * The per-page tracking structure employed by the GMM.
210 *
211 * Because of the different layout on 32-bit and 64-bit hosts in earlier
212 * versions of the code, macros are used to get and set some of the data.
213 */
214typedef union GMMPAGE
215{
216 /** Unsigned integer view. */
217 uint64_t u;
218
219 /** The common view. */
220 struct GMMPAGECOMMON
221 {
222 uint32_t uStuff1 : 32;
223 uint32_t uStuff2 : 30;
224 /** The page state. */
225 uint32_t u2State : 2;
226 } Common;
227
228 /** The view of a private page. */
229 struct GMMPAGEPRIVATE
230 {
231 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
232 uint32_t pfn;
233 /** The GVM handle. (64K VMs) */
234 uint32_t hGVM : 16;
235 /** Reserved. */
236 uint32_t u16Reserved : 14;
237 /** The page state. */
238 uint32_t u2State : 2;
239 } Private;
240
241 /** The view of a shared page. */
242 struct GMMPAGESHARED
243 {
244 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
245 uint32_t pfn;
246 /** The reference count (64K VMs). */
247 uint32_t cRefs : 16;
248 /** Used for debug checksumming. */
249 uint32_t u14Checksum : 14;
250 /** The page state. */
251 uint32_t u2State : 2;
252 } Shared;
253
254 /** The view of a free page. */
255 struct GMMPAGEFREE
256 {
257 /** The index of the next page in the free list. UINT16_MAX is NIL. */
258 uint16_t iNext;
259 /** Reserved. Checksum or something? */
260 uint16_t u16Reserved0;
261 /** Reserved. Checksum or something? */
262 uint32_t u30Reserved1 : 29;
263 /** Set if the page was zeroed. */
264 uint32_t fZeroed : 1;
265 /** The page state. */
266 uint32_t u2State : 2;
267 } Free;
268} GMMPAGE;
269AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
270/** Pointer to a GMMPAGE. */
271typedef GMMPAGE *PGMMPAGE;
272
273
274/** @name The Page States.
275 * @{ */
276/** A private page. */
277#define GMM_PAGE_STATE_PRIVATE 0
278/** A shared page. */
279#define GMM_PAGE_STATE_SHARED 2
280/** A free page. */
281#define GMM_PAGE_STATE_FREE 3
282/** @} */
283
284
285/** @def GMM_PAGE_IS_PRIVATE
286 *
287 * @returns true if private, false if not.
288 * @param pPage The GMM page.
289 */
290#define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
291
292/** @def GMM_PAGE_IS_SHARED
293 *
294 * @returns true if shared, false if not.
295 * @param pPage The GMM page.
296 */
297#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
298
299/** @def GMM_PAGE_IS_FREE
300 *
301 * @returns true if free, false if not.
302 * @param pPage The GMM page.
303 */
304#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
305
306/** @def GMM_PAGE_PFN_LAST
307 * The last valid guest pfn.
308 * @remark Some of the values outside the valid range have special meanings;
309 * see GMM_PAGE_PFN_UNSHAREABLE.
310 */
311#define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
312AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> GUEST_PAGE_SHIFT));
313
314/** @def GMM_PAGE_PFN_UNSHAREABLE
315 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
316 */
317#define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
318AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> GUEST_PAGE_SHIFT));
319
320
321/**
322 * A GMM allocation chunk ring-3 mapping record.
323 *
324 * This should really be associated with a session and not a VM, but
325 * it's simpler to associate it with a VM and clean up when the VM object
326 * is destroyed.
327 */
328typedef struct GMMCHUNKMAP
329{
330 /** The mapping object. */
331 RTR0MEMOBJ hMapObj;
332 /** The VM owning the mapping. */
333 PGVM pGVM;
334} GMMCHUNKMAP;
335/** Pointer to a GMM allocation chunk mapping. */
336typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
337
338
339/**
340 * A GMM allocation chunk.
341 */
342typedef struct GMMCHUNK
343{
344 /** The AVL node core.
345 * The Key is the chunk ID. (Giant mtx.) */
346 AVLU32NODECORE Core;
347 /** The memory object.
348 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
349 * what the host can dish up with. (Chunk mtx protects mapping accesses
350 * and related frees.) */
351 RTR0MEMOBJ hMemObj;
352#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
353 /** Pointer to the kernel mapping. */
354 uint8_t *pbMapping;
355#endif
356 /** Pointer to the next chunk in the free list. (Giant mtx.) */
357 PGMMCHUNK pFreeNext;
358 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
359 PGMMCHUNK pFreePrev;
360 /** Pointer to the free set this chunk belongs to. NULL for
361 * chunks with no free pages. (Giant mtx.) */
362 PGMMCHUNKFREESET pSet;
363 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
364 RTLISTNODE ListNode;
365 /** Pointer to an array of mappings. (Chunk mtx.) */
366 PGMMCHUNKMAP paMappingsX;
367 /** The number of mappings. (Chunk mtx.) */
368 uint16_t cMappingsX;
369 * The mapping lock this chunk is using. UINT8_MAX if nobody is mapping
370 * or freeing anything. (Giant mtx.) */
371 uint8_t volatile iChunkMtx;
372 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
373 uint8_t fFlags;
374 /** The head of the list of free pages. UINT16_MAX is the NIL value.
375 * (Giant mtx.) */
376 uint16_t iFreeHead;
377 /** The number of free pages. (Giant mtx.) */
378 uint16_t cFree;
379 /** The GVM handle of the VM that first allocated pages from this chunk, this
380 * is used as a preference when there are several chunks to choose from.
381 * When in bound memory mode this isn't a preference any longer. (Giant
382 * mtx.) */
383 uint16_t hGVM;
384 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
385 * future use.) (Giant mtx.) */
386 uint16_t idNumaNode;
387 /** The number of private pages. (Giant mtx.) */
388 uint16_t cPrivate;
389 /** The number of shared pages. (Giant mtx.) */
390 uint16_t cShared;
391 /** The UID this chunk is associated with. */
392 RTUID uidOwner;
393 uint32_t u32Padding;
394 /** The pages. (Giant mtx.) */
395 GMMPAGE aPages[GMM_CHUNK_NUM_PAGES];
396} GMMCHUNK;
397
398/** Indicates that the NUMA properties of the memory are unknown. */
399#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
400
401/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
402 * @{ */
403/** Indicates that the chunk is a large page (2MB). */
404#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
405/** @} */
406
407
408/**
409 * An allocation chunk TLB entry.
410 */
411typedef struct GMMCHUNKTLBE
412{
413 /** The chunk id. */
414 uint32_t idChunk;
415 /** Pointer to the chunk. */
416 PGMMCHUNK pChunk;
417} GMMCHUNKTLBE;
418/** Pointer to an allocation chunk TLB entry. */
419typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
420
421
422/** The number of entries in the allocation chunk TLB. */
423#define GMM_CHUNKTLB_ENTRIES 32
424/** Gets the TLB entry index for the given Chunk ID. */
425#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
426
427/**
428 * An allocation chunk TLB.
429 */
430typedef struct GMMCHUNKTLB
431{
432 /** The TLB entries. */
433 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
434} GMMCHUNKTLB;
435/** Pointer to an allocation chunk TLB. */
436typedef GMMCHUNKTLB *PGMMCHUNKTLB;
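
/* A sketch of how the chunk TLB is meant to be used (illustrative only; the
   real lookup code further down also takes the tree spinlock, and the per-VM
   TLBs additionally validate against the chunk freeing generation):

        PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
        PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk
                             ? pTlbe->pChunk                                     // TLB hit
                             : (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);  // miss: AVL tree lookup
        if (pChunk)
        {
            pTlbe->idChunk = idChunk;   // refill the entry for the next lookup
            pTlbe->pChunk  = pChunk;
        }
 */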
437
438
439/**
440 * The GMM instance data.
441 */
442typedef struct GMM
443{
444 /** Magic / eye catcher. GMM_MAGIC */
445 uint32_t u32Magic;
446 /** The number of threads waiting on the mutex. */
447 uint32_t cMtxContenders;
448#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
449 /** The critical section protecting the GMM.
450 * More fine grained locking can be implemented later if necessary. */
451 RTCRITSECT GiantCritSect;
452#else
453 /** The fast mutex protecting the GMM.
454 * More fine grained locking can be implemented later if necessary. */
455 RTSEMFASTMUTEX hMtx;
456#endif
457#ifdef VBOX_STRICT
458 /** The current mutex owner. */
459 RTNATIVETHREAD hMtxOwner;
460#endif
461 /** Spinlock protecting the AVL tree.
462 * @todo Make this a read-write spinlock as we should allow concurrent
463 * lookups. */
464 RTSPINLOCK hSpinLockTree;
465 /** The chunk tree.
466 * Protected by hSpinLockTree. */
467 PAVLU32NODECORE pChunks;
468 /** Chunk freeing generation - incremented whenever a chunk is freed. Used
469 * for validating the per-VM chunk TLB entries. Valid range is 1 to 2^62
470 * (exclusive), though higher numbers may temporarily occur while
471 * invalidating the individual TLBs during wrap-around processing. */
472 uint64_t volatile idFreeGeneration;
473 /** The chunk TLB.
474 * Protected by hSpinLockTree. */
475 GMMCHUNKTLB ChunkTLB;
476 /** The private free set. */
477 GMMCHUNKFREESET PrivateX;
478 /** The shared free set. */
479 GMMCHUNKFREESET Shared;
480
481 /** Shared module tree (global).
482 * @todo separate trees for distinctly different guest OSes. */
483 PAVLLU32NODECORE pGlobalSharedModuleTree;
484 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
485 uint32_t cShareableModules;
486
487 /** The chunk list. For simplifying the cleanup process and avoiding tree
488 * traversal. */
489 RTLISTANCHOR ChunkList;
490
491 /** The maximum number of pages we're allowed to allocate.
492 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
493 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
494 uint64_t cMaxPages;
495 /** The number of pages that have been reserved.
496 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
497 uint64_t cReservedPages;
498 /** The number of pages that we have over-committed in reservations. */
499 uint64_t cOverCommittedPages;
500 /** The number of actually allocated (committed if you like) pages. */
501 uint64_t cAllocatedPages;
502 /** The number of pages that are shared. A subset of cAllocatedPages. */
503 uint64_t cSharedPages;
504 /** The number of pages that are actually shared between VMs. */
505 uint64_t cDuplicatePages;
506 /** The number of pages that are shared that have been left behind by
507 * VMs not doing proper cleanups. */
508 uint64_t cLeftBehindSharedPages;
509 /** The number of allocation chunks.
510 * (The number of pages we've allocated from the host can be derived from this.) */
511 uint32_t cChunks;
512 /** The number of current ballooned pages. */
513 uint64_t cBalloonedPages;
514
515#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
516 /** Whether #RTR0MemObjAllocPhysNC works. */
517 bool fHasWorkingAllocPhysNC;
518#else
519 bool fPadding;
520#endif
521 /** The bound memory mode indicator.
522 * When set, the memory will be bound to a specific VM and never
523 * shared. This is always set if fLegacyAllocationMode is set.
524 * (Also determined at initialization time.) */
525 bool fBoundMemoryMode;
526 /** The number of registered VMs. */
527 uint16_t cRegisteredVMs;
528
529 /** The index of the next mutex to use. */
530 uint32_t iNextChunkMtx;
531 /** Chunk locks for reducing lock contention without having to allocate
532 * one lock per chunk. */
533 struct
534 {
535 /** The mutex */
536 RTSEMFASTMUTEX hMtx;
537 /** The number of threads currently using this mutex. */
538 uint32_t volatile cUsers;
539 } aChunkMtx[64];
540
541 /** The number of freed chunks ever. This is used as list generation to
542 * avoid restarting the cleanup scanning when the list wasn't modified. */
543 uint32_t volatile cFreedChunks;
544 /** The previously allocated Chunk ID.
545 * Used as a hint to avoid scanning the whole bitmap. */
546 uint32_t idChunkPrev;
547 /** Spinlock protecting idChunkPrev & bmChunkId. */
548 RTSPINLOCK hSpinLockChunkId;
549 /** Chunk ID allocation bitmap.
550 * Bits of allocated IDs are set, free ones are clear.
551 * The NIL id (0) is marked allocated. */
552 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
553} GMM;
554/** Pointer to the GMM instance. */
555typedef GMM *PGMM;
556
557/** The value of GMM::u32Magic (Katsuhiro Otomo). */
558#define GMM_MAGIC UINT32_C(0x19540414)
559
560
561/**
562 * GMM chunk mutex state.
563 *
564 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
565 * gmmR0ChunkMutex* methods.
566 */
567typedef struct GMMR0CHUNKMTXSTATE
568{
569 PGMM pGMM;
570 /** The index of the chunk mutex. */
571 uint8_t iChunkMtx;
572 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
573 uint8_t fFlags;
574} GMMR0CHUNKMTXSTATE;
575/** Pointer to a chunk mutex state. */
576typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
577
578/** @name GMMR0CHUNK_MTX_XXX
579 * @{ */
580#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
581#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
582#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
583#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
584#define GMMR0CHUNK_MTX_END UINT32_C(4)
585/** @} */
586
587
588/** The maximum number of shared modules per-vm. */
589#define GMM_MAX_SHARED_PER_VM_MODULES 2048
590/** The maximum number of shared modules GMM is allowed to track. */
591#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
592
593
594/**
595 * Argument packet for gmmR0SharedModuleCleanup.
596 */
597typedef struct GMMR0SHMODPERVMDTORARGS
598{
599 PGVM pGVM;
600 PGMM pGMM;
601} GMMR0SHMODPERVMDTORARGS;
602
603/**
604 * Argument packet for gmmR0CheckSharedModule.
605 */
606typedef struct GMMCHECKSHAREDMODULEINFO
607{
608 PGVM pGVM;
609 VMCPUID idCpu;
610} GMMCHECKSHAREDMODULEINFO;
611
612
613/*********************************************************************************************************************************
614* Global Variables *
615*********************************************************************************************************************************/
616/** Pointer to the GMM instance data. */
617static PGMM g_pGMM = NULL;
618
619/** Macro for obtaining and validating the g_pGMM pointer.
620 *
621 * On failure it will return from the invoking function with the specified
622 * return value.
623 *
624 * @param pGMM The name of the pGMM variable.
625 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
626 * status codes.
627 */
628#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
629 do { \
630 (pGMM) = g_pGMM; \
631 AssertPtrReturn((pGMM), (rc)); \
632 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
633 } while (0)
634
635/** Macro for obtaining and validating the g_pGMM pointer, void function
636 * variant.
637 *
638 * On failure it will return from the invoking function.
639 *
640 * @param pGMM The name of the pGMM variable.
641 */
642#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
643 do { \
644 (pGMM) = g_pGMM; \
645 AssertPtrReturnVoid((pGMM)); \
646 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
647 } while (0)
648
649
650/** @def GMM_CHECK_SANITY_UPON_ENTERING
651 * Checks the sanity of the GMM instance data before making changes.
652 *
653 * This macro is a stub by default and must be enabled manually in the code.
654 *
655 * @returns true if sane, false if not.
656 * @param pGMM The name of the pGMM variable.
657 */
658#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
659# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (RT_LIKELY(gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0))
660#else
661# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
662#endif
663
664/** @def GMM_CHECK_SANITY_UPON_LEAVING
665 * Checks the sanity of the GMM instance data after making changes.
666 *
667 * This macro is a stub by default and must be enabled manually in the code.
668 *
669 * @returns true if sane, false if not.
670 * @param pGMM The name of the pGMM variable.
671 */
672#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
673# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
674#else
675# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
676#endif
677
678/** @def GMM_CHECK_SANITY_IN_LOOPS
679 * Checks the sanity of the GMM instance in the allocation loops.
680 *
681 * This macro is a stub by default and must be enabled manually in the code.
682 *
683 * @returns true if sane, false if not.
684 * @param pGMM The name of the pGMM variable.
685 */
686#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
687# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
688#else
689# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
690#endif
691
692
693/*********************************************************************************************************************************
694* Internal Functions *
695*********************************************************************************************************************************/
696static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
697static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
698DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
699DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
700DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
701#ifdef GMMR0_WITH_SANITY_CHECK
702static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
703#endif
704static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
705DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
706DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
707static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
708#ifdef VBOX_WITH_PAGE_SHARING
709static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
710# ifdef VBOX_STRICT
711static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
712# endif
713#endif
714
715
716
717/**
718 * Initializes the GMM component.
719 *
720 * This is called when the VMMR0.r0 module is loaded and protected by the
721 * loader semaphore.
722 *
723 * @returns VBox status code.
724 */
725GMMR0DECL(int) GMMR0Init(void)
726{
727 LogFlow(("GMMInit:\n"));
728
729 /* Currently assuming the same host and guest page size here. Can change it to
730 dish out guest pages with a different size from the host page later if
731 needed, though a restriction would be that the host page size must not be
732 smaller than the guest page size. */
733 AssertCompile(GUEST_PAGE_SIZE == HOST_PAGE_SIZE);
734 AssertCompile(GUEST_PAGE_SIZE <= HOST_PAGE_SIZE);
735
736 /*
737 * Allocate the instance data and the locks.
738 */
739 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
740 if (!pGMM)
741 return VERR_NO_MEMORY;
742
743 pGMM->u32Magic = GMM_MAGIC;
744 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
745 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
746 RTListInit(&pGMM->ChunkList);
747 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
748
749#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
750 int rc = RTCritSectInit(&pGMM->GiantCritSect);
751#else
752 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
753#endif
754 if (RT_SUCCESS(rc))
755 {
756 unsigned iMtx;
757 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
758 {
759 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
760 if (RT_FAILURE(rc))
761 break;
762 }
763 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
764 if (RT_SUCCESS(rc))
765 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
766 pGMM->hSpinLockChunkId = NIL_RTSPINLOCK;
767 if (RT_SUCCESS(rc))
768 rc = RTSpinlockCreate(&pGMM->hSpinLockChunkId, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-id");
769 if (RT_SUCCESS(rc))
770 {
771 /*
772 * Figure out how we're going to allocate stuff (only applicable to
773 * host with linear physical memory mappings).
774 */
775 pGMM->fBoundMemoryMode = false;
776#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
777 pGMM->fHasWorkingAllocPhysNC = false;
778
779 RTR0MEMOBJ hMemObj;
780 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
781 if (RT_SUCCESS(rc))
782 {
783 rc = RTR0MemObjFree(hMemObj, true);
784 AssertRC(rc);
785 pGMM->fHasWorkingAllocPhysNC = true;
786 }
787 else if (rc != VERR_NOT_SUPPORTED)
788 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
789# endif
790
791 /*
792 * Query system page count and guess a reasonable cMaxPages value.
793 */
794 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
795
796 /*
797 * The idFreeGeneration value should be set so we actually trigger the
798 * wrap-around invalidation handling during a typical test run.
799 */
800 pGMM->idFreeGeneration = UINT64_MAX / 4 - 128;
801
802 g_pGMM = pGMM;
803#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
804 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
805#else
806 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
807#endif
808 return VINF_SUCCESS;
809 }
810
811 /*
812 * Bail out.
813 */
814 RTSpinlockDestroy(pGMM->hSpinLockChunkId);
815 RTSpinlockDestroy(pGMM->hSpinLockTree);
816 while (iMtx-- > 0)
817 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
818#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
819 RTCritSectDelete(&pGMM->GiantCritSect);
820#else
821 RTSemFastMutexDestroy(pGMM->hMtx);
822#endif
823 }
824
825 pGMM->u32Magic = 0;
826 RTMemFree(pGMM);
827 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
828 return rc;
829}
830
831
832/**
833 * Terminates the GMM component.
834 */
835GMMR0DECL(void) GMMR0Term(void)
836{
837 LogFlow(("GMMTerm:\n"));
838
839 /*
840 * Take care / be paranoid...
841 */
842 PGMM pGMM = g_pGMM;
843 if (!RT_VALID_PTR(pGMM))
844 return;
845 if (pGMM->u32Magic != GMM_MAGIC)
846 {
847 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
848 return;
849 }
850
851 /*
852 * Undo what init did and free all the resources we've acquired.
853 */
854 /* Destroy the fundamentals. */
855 g_pGMM = NULL;
856 pGMM->u32Magic = ~GMM_MAGIC;
857#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
858 RTCritSectDelete(&pGMM->GiantCritSect);
859#else
860 RTSemFastMutexDestroy(pGMM->hMtx);
861 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
862#endif
863 RTSpinlockDestroy(pGMM->hSpinLockTree);
864 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
865 RTSpinlockDestroy(pGMM->hSpinLockChunkId);
866 pGMM->hSpinLockChunkId = NIL_RTSPINLOCK;
867
868 /* Free any chunks still hanging around. */
869 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
870
871 /* Destroy the chunk locks. */
872 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
873 {
874 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
875 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
876 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
877 }
878
879 /* Finally the instance data itself. */
880 RTMemFree(pGMM);
881 LogFlow(("GMMTerm: done\n"));
882}
883
884
885/**
886 * RTAvlU32Destroy callback.
887 *
888 * @returns 0
889 * @param pNode The node to destroy.
890 * @param pvGMM The GMM handle.
891 */
892static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
893{
894 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
895
896 if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
897 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
898 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
899
900 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
901 if (RT_FAILURE(rc))
902 {
903 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
904 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
905 AssertRC(rc);
906 }
907 pChunk->hMemObj = NIL_RTR0MEMOBJ;
908
909 RTMemFree(pChunk->paMappingsX);
910 pChunk->paMappingsX = NULL;
911
912 RTMemFree(pChunk);
913 NOREF(pvGMM);
914 return 0;
915}
916
917
918/**
919 * Initializes the per-VM data for the GMM.
920 *
921 * This is called from within the GVMM lock (from GVMMR0CreateVM)
922 * and should only initialize the data members so GMMR0CleanupVM
923 * can deal with them. We reserve no memory or anything here,
924 * that's done later in GMMR0InitVM.
925 *
926 * @param pGVM Pointer to the Global VM structure.
927 */
928GMMR0DECL(int) GMMR0InitPerVMData(PGVM pGVM)
929{
930 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
931
932 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
933 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
934 pGVM->gmm.s.Stats.fMayAllocate = false;
935
936 pGVM->gmm.s.hChunkTlbSpinLock = NIL_RTSPINLOCK;
937 int rc = RTSpinlockCreate(&pGVM->gmm.s.hChunkTlbSpinLock, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "per-vm-chunk-tlb");
938 AssertRCReturn(rc, rc);
939
940 return VINF_SUCCESS;
941}
942
943
944/**
945 * Acquires the GMM giant lock.
946 *
947 * @returns Assert status code from RTSemFastMutexRequest.
948 * @param pGMM Pointer to the GMM instance.
949 */
950static int gmmR0MutexAcquire(PGMM pGMM)
951{
952 ASMAtomicIncU32(&pGMM->cMtxContenders);
953#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
954 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
955#else
956 int rc = RTSemFastMutexRequest(pGMM->hMtx);
957#endif
958 ASMAtomicDecU32(&pGMM->cMtxContenders);
959 AssertRC(rc);
960#ifdef VBOX_STRICT
961 pGMM->hMtxOwner = RTThreadNativeSelf();
962#endif
963 return rc;
964}
965
966
967/**
968 * Releases the GMM giant lock.
969 *
970 * @returns Assert status code from RTSemFastMutexRelease.
971 * @param pGMM Pointer to the GMM instance.
972 */
973static int gmmR0MutexRelease(PGMM pGMM)
974{
975#ifdef VBOX_STRICT
976 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
977#endif
978#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
979 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
980#else
981 int rc = RTSemFastMutexRelease(pGMM->hMtx);
982 AssertRC(rc);
983#endif
984 return rc;
985}
986
987
988/**
989 * Yields the GMM giant lock if there is contention and a certain minimum time
990 * has elapsed since we took it.
991 *
992 * @returns @c true if the mutex was yielded, @c false if not.
993 * @param pGMM Pointer to the GMM instance.
994 * @param puLockNanoTS Where the lock acquisition time stamp is kept
995 * (in/out).
996 */
997static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
998{
999 /*
1000 * If nobody is contending the mutex, don't bother checking the time.
1001 */
1002 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1003 return false;
1004
1005 /*
1006 * Don't yield if we haven't executed for at least 2 milliseconds.
1007 */
1008 uint64_t uNanoNow = RTTimeSystemNanoTS();
1009 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1010 return false;
1011
1012 /*
1013 * Yield the mutex.
1014 */
1015#ifdef VBOX_STRICT
1016 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1017#endif
1018 ASMAtomicIncU32(&pGMM->cMtxContenders);
1019#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1020 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1021#else
1022 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1023#endif
1024
1025 RTThreadYield();
1026
1027#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1028 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1029#else
1030 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1031#endif
1032 *puLockNanoTS = RTTimeSystemNanoTS();
1033 ASMAtomicDecU32(&pGMM->cMtxContenders);
1034#ifdef VBOX_STRICT
1035 pGMM->hMtxOwner = RTThreadNativeSelf();
1036#endif
1037
1038 return true;
1039}
1040
1041
1042/**
1043 * Acquires a chunk lock.
1044 *
1045 * The caller must own the giant lock.
1046 *
1047 * @returns Assert status code from RTSemFastMutexRequest.
1048 * @param pMtxState The chunk mutex state info. (Avoids
1049 * passing the same flags and stuff around
1050 * for subsequent release and drop-giant
1051 * calls.)
1052 * @param pGMM Pointer to the GMM instance.
1053 * @param pChunk Pointer to the chunk.
1054 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1055 */
1056static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1057{
1058 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1059 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1060
1061 pMtxState->pGMM = pGMM;
1062 pMtxState->fFlags = (uint8_t)fFlags;
1063
1064 /*
1065 * Get the lock index and reference the lock.
1066 */
1067 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1068 uint32_t iChunkMtx = pChunk->iChunkMtx;
1069 if (iChunkMtx == UINT8_MAX)
1070 {
1071 iChunkMtx = pGMM->iNextChunkMtx++;
1072 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1073
1074 /* Try get an unused one... */
1075 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1076 {
1077 iChunkMtx = pGMM->iNextChunkMtx++;
1078 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1079 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1080 {
1081 iChunkMtx = pGMM->iNextChunkMtx++;
1082 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1083 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1084 {
1085 iChunkMtx = pGMM->iNextChunkMtx++;
1086 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1087 }
1088 }
1089 }
1090
1091 pChunk->iChunkMtx = iChunkMtx;
1092 }
1093 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1094 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1095 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1096
1097 /*
1098 * Drop the giant?
1099 */
1100 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1101 {
1102 /** @todo GMM life cycle cleanup (we may race someone
1103 * destroying and cleaning up GMM)? */
1104 gmmR0MutexRelease(pGMM);
1105 }
1106
1107 /*
1108 * Take the chunk mutex.
1109 */
1110 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1111 AssertRC(rc);
1112 return rc;
1113}
1114
1115
1116/**
1117 * Releases the chunk mutex acquired by gmmR0ChunkMutexAcquire.
1118 *
1119 * @returns Assert status code from RTSemFastMutexRelease.
1120 * @param pMtxState Pointer to the chunk mutex state.
1121 * @param pChunk Pointer to the chunk if it's still
1122 * alive, NULL if it isn't. This is used to deassociate
1123 * the chunk from the mutex on the way out so a new one
1124 * can be selected next time, thus avoiding contented
1125 * mutexes.
1126 */
1127static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1128{
1129 PGMM pGMM = pMtxState->pGMM;
1130
1131 /*
1132 * Release the chunk mutex and reacquire the giant if requested.
1133 */
1134 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1135 AssertRC(rc);
1136 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1137 rc = gmmR0MutexAcquire(pGMM);
1138 else
1139 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1140
1141 /*
1142 * Drop the chunk mutex user reference and deassociate it from the chunk
1143 * when possible.
1144 */
1145 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1146 && pChunk
1147 && RT_SUCCESS(rc) )
1148 {
1149 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1150 pChunk->iChunkMtx = UINT8_MAX;
1151 else
1152 {
1153 rc = gmmR0MutexAcquire(pGMM);
1154 if (RT_SUCCESS(rc))
1155 {
1156 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1157 pChunk->iChunkMtx = UINT8_MAX;
1158 rc = gmmR0MutexRelease(pGMM);
1159 }
1160 }
1161 }
1162
1163 pMtxState->pGMM = NULL;
1164 return rc;
1165}
1166
1167
1168/**
1169 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1170 * chunk locked.
1171 *
1172 * This only works if gmmR0ChunkMutexAcquire was called with
1173 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1174 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1175 *
1176 * @returns VBox status code (assuming success is ok).
1177 * @param pMtxState Pointer to the chunk mutex state.
1178 */
1179static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1180{
1181 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1182 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1183 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1184 /** @todo GMM life cycle cleanup (we may race someone
1185 * destroying and cleaning up GMM)? */
1186 return gmmR0MutexRelease(pMtxState->pGMM);
1187}
1188
1189
1190/**
1191 * For experimenting with NUMA affinity and such.
1192 *
1193 * @returns The current NUMA Node ID.
1194 */
1195static uint16_t gmmR0GetCurrentNumaNodeId(void)
1196{
1197#if 1
1198 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1199#else
1200 return RTMpCpuId() / 16;
1201#endif
1202}
1203
1204
1205
1206/**
1207 * Cleans up when a VM is terminating.
1208 *
1209 * @param pGVM Pointer to the Global VM structure.
1210 */
1211GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1212{
1213 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1214
1215 PGMM pGMM;
1216 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1217
1218#ifdef VBOX_WITH_PAGE_SHARING
1219 /*
1220 * Clean up all registered shared modules first.
1221 */
1222 gmmR0SharedModuleCleanup(pGMM, pGVM);
1223#endif
1224
1225 gmmR0MutexAcquire(pGMM);
1226 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1227 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1228
1229 /*
1230 * The policy is 'INVALID' until the initial reservation
1231 * request has been serviced.
1232 */
1233 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1234 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1235 {
1236 /*
1237 * If it's the last VM around, we can skip walking all the chunks looking
1238 * for the pages owned by this VM and instead flush the whole shebang.
1239 *
1240 * This takes care of the eventuality that a VM has left shared page
1241 * references behind (shouldn't happen of course, but you never know).
1242 */
1243 Assert(pGMM->cRegisteredVMs);
1244 pGMM->cRegisteredVMs--;
1245
1246 /*
1247 * Walk the entire pool looking for pages that belong to this VM
1248 * and leftover mappings. (This'll only catch private pages,
1249 * shared pages will be 'left behind'.)
1250 */
1251 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1252 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1253
1254 unsigned iCountDown = 64;
1255 bool fRedoFromStart;
1256 PGMMCHUNK pChunk;
1257 do
1258 {
1259 fRedoFromStart = false;
1260 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1261 {
1262 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1263 if ( ( !pGMM->fBoundMemoryMode
1264 || pChunk->hGVM == pGVM->hSelf)
1265 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1266 {
1267 /* We left the giant mutex, so reset the yield counters. */
1268 uLockNanoTS = RTTimeSystemNanoTS();
1269 iCountDown = 64;
1270 }
1271 else
1272 {
1273 /* Didn't leave it, so do normal yielding. */
1274 if (!iCountDown)
1275 gmmR0MutexYield(pGMM, &uLockNanoTS);
1276 else
1277 iCountDown--;
1278 }
1279 if (pGMM->cFreedChunks != cFreeChunksOld)
1280 {
1281 fRedoFromStart = true;
1282 break;
1283 }
1284 }
1285 } while (fRedoFromStart);
1286
1287 if (pGVM->gmm.s.Stats.cPrivatePages)
1288 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1289
1290 pGMM->cAllocatedPages -= cPrivatePages;
1291
1292 /*
1293 * Free empty chunks.
1294 */
1295 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1296 do
1297 {
1298 fRedoFromStart = false;
1299 iCountDown = 10240;
1300 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1301 while (pChunk)
1302 {
1303 PGMMCHUNK pNext = pChunk->pFreeNext;
1304 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1305 if ( !pGMM->fBoundMemoryMode
1306 || pChunk->hGVM == pGVM->hSelf)
1307 {
1308 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1309 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1310 {
1311 /* We've left the giant mutex, restart? (+1 for our unlink) */
1312 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1313 if (fRedoFromStart)
1314 break;
1315 uLockNanoTS = RTTimeSystemNanoTS();
1316 iCountDown = 10240;
1317 }
1318 }
1319
1320 /* Advance and maybe yield the lock. */
1321 pChunk = pNext;
1322 if (--iCountDown == 0)
1323 {
1324 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1325 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1326 && pPrivateSet->idGeneration != idGenerationOld;
1327 if (fRedoFromStart)
1328 break;
1329 iCountDown = 10240;
1330 }
1331 }
1332 } while (fRedoFromStart);
1333
1334 /*
1335 * Account for shared pages that weren't freed.
1336 */
1337 if (pGVM->gmm.s.Stats.cSharedPages)
1338 {
1339 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1340 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1341 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1342 }
1343
1344 /*
1345 * Clean up balloon statistics in case the VM process crashed.
1346 */
1347 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1348 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1349
1350 /*
1351 * Update the over-commitment management statistics.
1352 */
1353 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1354 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1355 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1356 switch (pGVM->gmm.s.Stats.enmPolicy)
1357 {
1358 case GMMOCPOLICY_NO_OC:
1359 break;
1360 default:
1361 /** @todo Update GMM->cOverCommittedPages */
1362 break;
1363 }
1364 }
1365
1366 /* zap the GVM data. */
1367 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1368 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1369 pGVM->gmm.s.Stats.fMayAllocate = false;
1370
1371 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1372 gmmR0MutexRelease(pGMM);
1373
1374 /*
1375 * Destroy the spinlock.
1376 */
1377 RTSPINLOCK hSpinlock = NIL_RTSPINLOCK;
1378 ASMAtomicXchgHandle(&pGVM->gmm.s.hChunkTlbSpinLock, NIL_RTSPINLOCK, &hSpinlock);
1379 RTSpinlockDestroy(hSpinlock);
1380
1381 LogFlow(("GMMR0CleanupVM: returns\n"));
1382}
1383
1384
1385/**
1386 * Scan one chunk for private pages belonging to the specified VM.
1387 *
1388 * @note This function may drop the giant mutex!
1389 *
1390 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1391 * we didn't.
1392 * @param pGMM Pointer to the GMM instance.
1393 * @param pGVM The global VM handle.
1394 * @param pChunk The chunk to scan.
1395 */
1396static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1397{
1398 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1399
1400 /*
1401 * Look for pages belonging to the VM.
1402 * (Perform some internal checks while we're scanning.)
1403 */
1404#ifndef VBOX_STRICT
1405 if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1406#endif
1407 {
1408 unsigned cPrivate = 0;
1409 unsigned cShared = 0;
1410 unsigned cFree = 0;
1411
1412 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1413
1414 uint16_t hGVM = pGVM->hSelf;
1415 unsigned iPage = (GMM_CHUNK_SIZE >> GUEST_PAGE_SHIFT);
1416 while (iPage-- > 0)
1417 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1418 {
1419 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1420 {
1421 /*
1422 * Free the page.
1423 *
1424 * The reason for not using gmmR0FreePrivatePage here is that we
1425 * must *not* cause the chunk to be freed from under us - we're in
1426 * an AVL tree walk here.
1427 */
1428 pChunk->aPages[iPage].u = 0;
1429 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1430 pChunk->aPages[iPage].Free.fZeroed = false;
1431 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1432 pChunk->iFreeHead = iPage;
1433 pChunk->cPrivate--;
1434 pChunk->cFree++;
1435 pGVM->gmm.s.Stats.cPrivatePages--;
1436 cFree++;
1437 }
1438 else
1439 cPrivate++;
1440 }
1441 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1442 cFree++;
1443 else
1444 cShared++;
1445
1446 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1447
1448 /*
1449 * Did it add up?
1450 */
1451 if (RT_UNLIKELY( pChunk->cFree != cFree
1452 || pChunk->cPrivate != cPrivate
1453 || pChunk->cShared != cShared))
1454 {
1455 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1456 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1457 pChunk->cFree = cFree;
1458 pChunk->cPrivate = cPrivate;
1459 pChunk->cShared = cShared;
1460 }
1461 }
1462
1463 /*
1464 * If not in bound memory mode, we should reset the hGVM field
1465 * if it has our handle in it.
1466 */
1467 if (pChunk->hGVM == pGVM->hSelf)
1468 {
1469 if (!g_pGMM->fBoundMemoryMode)
1470 pChunk->hGVM = NIL_GVM_HANDLE;
1471 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1472 {
1473 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1474 pChunk, pChunk->Core.Key, pChunk->cFree);
1475 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1476
1477 gmmR0UnlinkChunk(pChunk);
1478 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1479 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1480 }
1481 }
1482
1483 /*
1484 * Look for a mapping belonging to the terminating VM.
1485 */
1486 GMMR0CHUNKMTXSTATE MtxState;
1487 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1488 unsigned cMappings = pChunk->cMappingsX;
1489 for (unsigned i = 0; i < cMappings; i++)
1490 if (pChunk->paMappingsX[i].pGVM == pGVM)
1491 {
1492 gmmR0ChunkMutexDropGiant(&MtxState);
1493
1494 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1495
1496 cMappings--;
1497 if (i < cMappings)
1498 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1499 pChunk->paMappingsX[cMappings].pGVM = NULL;
1500 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1501 Assert(pChunk->cMappingsX - 1U == cMappings);
1502 pChunk->cMappingsX = cMappings;
1503
1504 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1505 if (RT_FAILURE(rc))
1506 {
1507 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1508 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1509 AssertRC(rc);
1510 }
1511
1512 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1513 return true;
1514 }
1515
1516 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1517 return false;
1518}
1519
1520
1521/**
1522 * The initial resource reservations.
1523 *
1524 * This will make memory reservations according to policy and priority. If there aren't
1525 * sufficient resources available to sustain the VM, this function will fail and all
1526 * future allocation requests will fail as well.
1527 *
1528 * These are just the initial reservations made very early during the VM creation
1529 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1530 * ring-3 init has completed.
1531 *
1532 * @returns VBox status code.
1533 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1534 * @retval VERR_GMM_
1535 *
1536 * @param pGVM The global (ring-0) VM structure.
1537 * @param idCpu The VCPU id - must be zero.
1538 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1539 * This does not include MMIO2 and similar.
1540 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1541 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1542 * hyper heap, MMIO2 and similar.
1543 * @param enmPolicy The OC policy to use on this VM.
1544 * @param enmPriority The priority in an out-of-memory situation.
1545 *
1546 * @thread The creator thread / EMT(0).
1547 */
1548GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1549 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1550{
1551 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1552 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1553
1554 /*
1555 * Validate, get basics and take the semaphore.
1556 */
1557 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1558 PGMM pGMM;
1559 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1560 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1561 if (RT_FAILURE(rc))
1562 return rc;
1563
1564 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1565 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1566 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1567 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1568 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1569
1570 gmmR0MutexAcquire(pGMM);
1571 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1572 {
1573 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1574 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1575 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1576 {
1577 /*
1578 * Check if we can accommodate this.
1579 */
1580 /* ... later ... */
1581 if (RT_SUCCESS(rc))
1582 {
1583 /*
1584 * Update the records.
1585 */
1586 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1587 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1588 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1589 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1590 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1591 pGVM->gmm.s.Stats.fMayAllocate = true;
1592
1593 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1594 pGMM->cRegisteredVMs++;
1595 }
1596 }
1597 else
1598 rc = VERR_WRONG_ORDER;
1599 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1600 }
1601 else
1602 rc = VERR_GMM_IS_NOT_SANE;
1603 gmmR0MutexRelease(pGMM);
1604 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1605 return rc;
1606}
1607
1608
1609/**
1610 * VMMR0 request wrapper for GMMR0InitialReservation.
1611 *
1612 * @returns see GMMR0InitialReservation.
1613 * @param pGVM The global (ring-0) VM structure.
1614 * @param idCpu The VCPU id.
1615 * @param pReq Pointer to the request packet.
1616 */
1617GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1618{
1619 /*
1620 * Validate input and pass it on.
1621 */
1622 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1623 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1624 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1625
1626 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1627 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1628}
1629
1630
1631/**
1632 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1633 *
1634 * @returns VBox status code.
1635 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1636 *
1637 * @param pGVM The global (ring-0) VM structure.
1638 * @param idCpu The VCPU id.
1639 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1640 * This does not include MMIO2 and similar.
1641 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1642 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1643 * hyper heap, MMIO2 and similar.
1644 *
1645 * @thread EMT(idCpu)
1646 */
1647GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1648 uint32_t cShadowPages, uint32_t cFixedPages)
1649{
1650 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1651 pGVM, cBasePages, cShadowPages, cFixedPages));
1652
1653 /*
1654 * Validate, get basics and take the semaphore.
1655 */
1656 PGMM pGMM;
1657 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1658 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1659 if (RT_FAILURE(rc))
1660 return rc;
1661
1662 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1663 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1664 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1665
1666 gmmR0MutexAcquire(pGMM);
1667 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1668 {
1669 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1670 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1671 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1672 {
1673 /*
1674 * Check if we can accommodate this.
1675 */
1676 /* ... later ... */
1677 if (RT_SUCCESS(rc))
1678 {
1679 /*
1680 * Update the records.
1681 */
1682 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1683 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1684 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1685 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1686
1687 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1688 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1689 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1690 }
1691 }
1692 else
1693 rc = VERR_WRONG_ORDER;
1694 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1695 }
1696 else
1697 rc = VERR_GMM_IS_NOT_SANE;
1698 gmmR0MutexRelease(pGMM);
1699 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1700 return rc;
1701}
1702
1703
1704/**
1705 * VMMR0 request wrapper for GMMR0UpdateReservation.
1706 *
1707 * @returns see GMMR0UpdateReservation.
1708 * @param pGVM The global (ring-0) VM structure.
1709 * @param idCpu The VCPU id.
1710 * @param pReq Pointer to the request packet.
1711 */
1712GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1713{
1714 /*
1715 * Validate input and pass it on.
1716 */
1717 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1718 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1719
1720 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1721}
1722
1723#ifdef GMMR0_WITH_SANITY_CHECK
1724
1725/**
1726 * Performs sanity checks on a free set.
1727 *
1728 * @returns Error count.
1729 *
1730 * @param pGMM Pointer to the GMM instance.
1731 * @param pSet Pointer to the set.
1732 * @param pszSetName The set name.
1733 * @param pszFunction The function from which it is called.
1734 * @param uLineNo The line number.
1735 */
1736static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1737 const char *pszFunction, unsigned uLineNo)
1738{
1739 uint32_t cErrors = 0;
1740
1741 /*
1742 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1743 */
1744 uint32_t cPages = 0;
1745 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1746 {
1747 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1748 {
1749            /** @todo check that the chunk is hashed into the right set. */
1750 cPages += pCur->cFree;
1751 }
1752 }
1753 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1754 {
1755 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1756 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1757 cErrors++;
1758 }
1759
1760 return cErrors;
1761}
1762
1763
1764/**
1765 * Performs some sanity checks on the GMM while owning lock.
1766 *
1767 * @returns Error count.
1768 *
1769 * @param pGMM Pointer to the GMM instance.
1770 * @param pszFunction The function from which it is called.
1771 * @param uLineNo The line number.
1772 */
1773static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1774{
1775 uint32_t cErrors = 0;
1776
1777 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1778 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1779 /** @todo add more sanity checks. */
1780
1781 return cErrors;
1782}
1783
1784#endif /* GMMR0_WITH_SANITY_CHECK */
1785
1786/**
1787 * Looks up a chunk in the tree and fills in the TLB entry for it.
1788 *
1789 * This is not expected to fail and will bitch if it does.
1790 *
1791 * @returns Pointer to the allocation chunk, NULL if not found.
1792 * @param pGMM Pointer to the GMM instance.
1793 * @param idChunk The ID of the chunk to find.
1794 * @param pTlbe Pointer to the TLB entry.
1795 *
1796 * @note Caller owns spinlock.
1797 */
1798static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1799{
1800 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1801 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1802 pTlbe->idChunk = idChunk;
1803 pTlbe->pChunk = pChunk;
1804 return pChunk;
1805}
1806
1807
1808/**
1809 * Finds an allocation chunk, spin-locked.
1810 *
1811 * This is not expected to fail and will bitch if it does.
1812 *
1813 * @returns Pointer to the allocation chunk, NULL if not found.
1814 * @param pGMM Pointer to the GMM instance.
1815 * @param idChunk The ID of the chunk to find.
1816 */
1817DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1818{
1819 /*
1820 * Do a TLB lookup, branch if not in the TLB.
1821 */
1822 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1823 PGMMCHUNK pChunk = pTlbe->pChunk;
1824 if ( pChunk == NULL
1825 || pTlbe->idChunk != idChunk)
1826 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1827 return pChunk;
1828}
1829
1830
1831/**
1832 * Finds an allocation chunk.
1833 *
1834 * This is not expected to fail and will bitch if it does.
1835 *
1836 * @returns Pointer to the allocation chunk, NULL if not found.
1837 * @param pGMM Pointer to the GMM instance.
1838 * @param idChunk The ID of the chunk to find.
1839 */
1840DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1841{
1842 RTSpinlockAcquire(pGMM->hSpinLockTree);
1843 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1844 RTSpinlockRelease(pGMM->hSpinLockTree);
1845 return pChunk;
1846}
1847
1848
1849/**
1850 * Finds a page.
1851 *
1852 * This is not expected to fail and will bitch if it does.
1853 *
1854 * @returns Pointer to the page, NULL if not found.
1855 * @param pGMM Pointer to the GMM instance.
1856 * @param idPage The ID of the page to find.
1857 */
1858DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1859{
1860 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1861 if (RT_LIKELY(pChunk))
1862 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1863 return NULL;
1864}
1865
1866
1867#if 0 /* unused */
1868/**
1869 * Gets the host physical address for a page given by its ID.
1870 *
1871 * @returns The host physical address or NIL_RTHCPHYS.
1872 * @param pGMM Pointer to the GMM instance.
1873 * @param idPage The ID of the page to find.
1874 */
1875DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1876{
1877 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1878 if (RT_LIKELY(pChunk))
1879 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1880 return NIL_RTHCPHYS;
1881}
1882#endif /* unused */
1883
1884
1885/**
1886 * Selects the appropriate free list given the number of free pages.
1887 *
1888 * @returns Free list index.
1889 * @param cFree The number of free pages in the chunk.
1890 */
1891DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1892{
1893 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1894 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1895 ("%d (%u)\n", iList, cFree));
1896 return iList;
1897}
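
/*
 * Worked example (illustrative only): assuming, purely for illustration, that
 * GMM_CHUNK_FREE_SET_SHIFT is 4, a chunk with 100 free pages would be kept on
 * list 100 >> 4 = 6 and a chunk with fewer than 16 free pages on list 0.
 * gmmR0AllocatePagesFromChunk further down unlinks a chunk, takes pages from
 * it and relinks it, so the chunk ends up on the list matching its new free
 * count.
 */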
1898
1899
1900/**
1901 * Unlinks the chunk from the free list it's currently on (if any).
1902 *
1903 * @param pChunk The allocation chunk.
1904 */
1905DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1906{
1907 PGMMCHUNKFREESET pSet = pChunk->pSet;
1908 if (RT_LIKELY(pSet))
1909 {
1910 pSet->cFreePages -= pChunk->cFree;
1911 pSet->idGeneration++;
1912
1913 PGMMCHUNK pPrev = pChunk->pFreePrev;
1914 PGMMCHUNK pNext = pChunk->pFreeNext;
1915 if (pPrev)
1916 pPrev->pFreeNext = pNext;
1917 else
1918 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1919 if (pNext)
1920 pNext->pFreePrev = pPrev;
1921
1922 pChunk->pSet = NULL;
1923 pChunk->pFreeNext = NULL;
1924 pChunk->pFreePrev = NULL;
1925 }
1926 else
1927 {
1928 Assert(!pChunk->pFreeNext);
1929 Assert(!pChunk->pFreePrev);
1930 Assert(!pChunk->cFree);
1931 }
1932}
1933
1934
1935/**
1936 * Links the chunk onto the appropriate free list in the specified free set.
1937 *
1938 * If the chunk has no free entries, it is not linked into any list.
1939 *
1940 * @param pChunk The allocation chunk.
1941 * @param pSet The free set.
1942 */
1943DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1944{
1945 Assert(!pChunk->pSet);
1946 Assert(!pChunk->pFreeNext);
1947 Assert(!pChunk->pFreePrev);
1948
1949 if (pChunk->cFree > 0)
1950 {
1951 pChunk->pSet = pSet;
1952 pChunk->pFreePrev = NULL;
1953 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1954 pChunk->pFreeNext = pSet->apLists[iList];
1955 if (pChunk->pFreeNext)
1956 pChunk->pFreeNext->pFreePrev = pChunk;
1957 pSet->apLists[iList] = pChunk;
1958
1959 pSet->cFreePages += pChunk->cFree;
1960 pSet->idGeneration++;
1961 }
1962}
1963
1964
1965/**
1966 * Selects the appropriate free set for the chunk and links the chunk onto it.
1967 *
1968 * If the chunk has no free entries, it is not linked into any list.
1969 *
1970 * @param pGMM Pointer to the GMM instance.
1971 * @param pGVM Pointer to the kernel-only VM instance data.
1972 * @param pChunk The allocation chunk.
1973 */
1974DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1975{
1976 PGMMCHUNKFREESET pSet;
1977 if (pGMM->fBoundMemoryMode)
1978 pSet = &pGVM->gmm.s.Private;
1979 else if (pChunk->cShared)
1980 pSet = &pGMM->Shared;
1981 else
1982 pSet = &pGMM->PrivateX;
1983 gmmR0LinkChunk(pChunk, pSet);
1984}
1985
1986
1987/**
1988 * Frees a Chunk ID.
1989 *
1990 * @param pGMM Pointer to the GMM instance.
1991 * @param idChunk The Chunk ID to free.
1992 */
1993static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1994{
1995 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1996 RTSpinlockAcquire(pGMM->hSpinLockChunkId); /* We could probably skip the locking here, I think. */
1997
1998 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1999 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2000
2001 RTSpinlockRelease(pGMM->hSpinLockChunkId);
2002}
2003
2004
2005/**
2006 * Allocates a new Chunk ID.
2007 *
2008 * @returns The Chunk ID.
2009 * @param pGMM Pointer to the GMM instance.
2010 */
2011static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2012{
2013 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2014 AssertCompile(NIL_GMM_CHUNKID == 0);
2015
2016 RTSpinlockAcquire(pGMM->hSpinLockChunkId);
2017
2018 /*
2019 * Try the next sequential one.
2020 */
2021 int32_t idChunk = ++pGMM->idChunkPrev;
2022 if ( (uint32_t)idChunk <= GMM_CHUNKID_LAST
2023 && idChunk > NIL_GMM_CHUNKID)
2024 {
2025 if (!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2026 {
2027 RTSpinlockRelease(pGMM->hSpinLockChunkId);
2028 return idChunk;
2029 }
2030
2031 /*
2032 * Scan sequentially from the last one.
2033 */
2034 if ((uint32_t)idChunk < GMM_CHUNKID_LAST)
2035 {
2036 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk);
2037 if ( idChunk > NIL_GMM_CHUNKID
2038 && (uint32_t)idChunk <= GMM_CHUNKID_LAST)
2039 {
2040 AssertMsgReturnStmt(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk),
2041 RTSpinlockRelease(pGMM->hSpinLockChunkId), NIL_GMM_CHUNKID);
2042
2043 pGMM->idChunkPrev = idChunk;
2044 RTSpinlockRelease(pGMM->hSpinLockChunkId);
2045 return idChunk;
2046 }
2047 }
2048 }
2049
2050 /*
2051 * Ok, scan from the start.
2052 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2053 */
2054 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2055 AssertMsgReturnStmt(idChunk > NIL_GMM_CHUNKID && (uint32_t)idChunk <= GMM_CHUNKID_LAST, ("%#x\n", idChunk),
2056                        RTSpinlockRelease(pGMM->hSpinLockChunkId), NIL_GMM_CHUNKID);
2057 AssertMsgReturnStmt(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk),
2058 RTSpinlockRelease(pGMM->hSpinLockChunkId), NIL_GMM_CHUNKID);
2059
2060 pGMM->idChunkPrev = idChunk;
2061 RTSpinlockRelease(pGMM->hSpinLockChunkId);
2062 return idChunk;
2063}
2064
2065
2066/**
2067 * Allocates one private page.
2068 *
2069 * Worker for gmmR0AllocatePagesFromChunk.
2070 *
2071 * @param pChunk The chunk to allocate it from.
2072 * @param hGVM The GVM handle of the VM requesting memory.
2073 * @param pPageDesc The page descriptor.
2074 */
2075static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2076{
2077 /* update the chunk stats. */
2078 if (pChunk->hGVM == NIL_GVM_HANDLE)
2079 pChunk->hGVM = hGVM;
2080 Assert(pChunk->cFree);
2081 pChunk->cFree--;
2082 pChunk->cPrivate++;
2083
2084 /* unlink the first free page. */
2085 const uint32_t iPage = pChunk->iFreeHead;
2086 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2087 PGMMPAGE pPage = &pChunk->aPages[iPage];
2088 Assert(GMM_PAGE_IS_FREE(pPage));
2089 pChunk->iFreeHead = pPage->Free.iNext;
2090 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2091 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2092 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2093
2094 bool const fZeroed = pPage->Free.fZeroed;
2095
2096 /* make the page private. */
2097 pPage->u = 0;
2098 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2099 pPage->Private.hGVM = hGVM;
2100 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2101 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2102 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2103 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> GUEST_PAGE_SHIFT;
2104 else
2105 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2106
2107 /* update the page descriptor. */
2108 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2109 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2110 RTHCPHYS const HCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2111 Assert(HCPhys != NIL_RTHCPHYS); Assert(HCPhys < NIL_GMMPAGEDESC_PHYS);
2112 pPageDesc->HCPhysGCPhys = HCPhys;
2113 pPageDesc->fZeroed = fZeroed;
2114}
2115
2116
2117/**
2118 * Picks the free pages from a chunk.
2119 *
2120 * @returns The new page descriptor table index.
2121 * @param pChunk The chunk.
2122 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2123 * affinity.
2124 * @param iPage The current page descriptor table index.
2125 * @param cPages The total number of pages to allocate.
2126 * @param paPages The page descriptor table (input + output).
2127 */
2128static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2129 PGMMPAGEDESC paPages)
2130{
2131 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2132 gmmR0UnlinkChunk(pChunk);
2133
2134 for (; pChunk->cFree && iPage < cPages; iPage++)
2135 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2136
2137 gmmR0LinkChunk(pChunk, pSet);
2138 return iPage;
2139}
2140
2141
2142/**
2143 * Registers a new chunk of memory.
2144 *
2145 * This is called by gmmR0AllocateChunkNew and GMMR0AllocateLargePage.
2146 *
2147 * In the GMMR0AllocateLargePage case the GMM_CHUNK_FLAGS_LARGE_PAGE flag is
2148 * set and the chunk will be registered as fully allocated to save time.
2149 *
2150 * @returns VBox status code. On success, the giant GMM lock will be held and
2151 * the caller must release it (ugly).
2152 * @param pGMM Pointer to the GMM instance.
2153 * @param pSet Pointer to the set.
2154 * @param hMemObj The memory object for the chunk.
2155 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2156 * affinity.
2157 * @param pSession Same as @a hGVM.
2158 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2159 * @param cPages The number of pages requested. Zero for large pages.
2160 * @param paPages The page descriptor table (input + output). NULL for
2161 * large pages.
2162 * @param piPage The pointer to the page descriptor table index variable.
2163 * This will be updated. NULL for large pages.
2164 * @param ppChunk Chunk address (out).
2165 *
2166 * @remarks The caller must not own the giant GMM mutex.
2167 * The giant GMM mutex will be acquired and returned acquired in
2168 * the success path. On failure, no locks will be held.
2169 */
2170static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, PSUPDRVSESSION pSession,
2171 uint16_t fChunkFlags, uint32_t cPages, PGMMPAGEDESC paPages, uint32_t *piPage, PGMMCHUNK *ppChunk)
2172{
2173 /*
2174 * Validate input & state.
2175 */
2176 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2177 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2178 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2179 if (!(fChunkFlags &= GMM_CHUNK_FLAGS_LARGE_PAGE))
2180 {
2181 AssertPtr(paPages);
2182 AssertPtr(piPage);
2183 Assert(cPages > 0);
2184 Assert(cPages > *piPage);
2185 }
2186 else
2187 {
2188 Assert(cPages == 0);
2189 Assert(!paPages);
2190 Assert(!piPage);
2191 }
2192
2193#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2194 /*
2195 * Get a ring-0 mapping of the object.
2196 */
2197 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2198 if (!pbMapping)
2199 {
2200 RTR0MEMOBJ hMapObj;
2201 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2202 if (RT_SUCCESS(rc))
2203 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2204 else
2205 return rc;
2206 AssertPtr(pbMapping);
2207 }
2208#endif
2209
2210 /*
2211 * Allocate a chunk and an ID for it.
2212 */
2213 int rc;
2214 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2215 if (pChunk)
2216 {
2217 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2218 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2219 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2220 {
2221 /*
2222 * Initialize it.
2223 */
2224 pChunk->hMemObj = hMemObj;
2225#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2226 pChunk->pbMapping = pbMapping;
2227#endif
2228 pChunk->hGVM = hGVM;
2229 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2230 pChunk->iChunkMtx = UINT8_MAX;
2231 pChunk->fFlags = fChunkFlags;
2232 pChunk->uidOwner = pSession ? SUPR0GetSessionUid(pSession) : NIL_RTUID;
2233 /*pChunk->cShared = 0; */
2234
2235 uint32_t const iDstPageFirst = piPage ? *piPage : cPages;
2236 if (!(fChunkFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
2237 {
2238 /*
2239 * Allocate the requested number of pages from the start of the chunk,
2240 * queue the rest (if any) on the free list.
2241 */
2242 uint32_t const cPagesAlloc = RT_MIN(cPages - iDstPageFirst, GMM_CHUNK_NUM_PAGES);
2243 pChunk->cPrivate = cPagesAlloc;
2244 pChunk->cFree = GMM_CHUNK_NUM_PAGES - cPagesAlloc;
2245 pChunk->iFreeHead = GMM_CHUNK_NUM_PAGES > cPagesAlloc ? cPagesAlloc : UINT16_MAX;
2246
2247 /* Alloc pages: */
2248 uint32_t const idPageChunk = pChunk->Core.Key << GMM_CHUNKID_SHIFT;
2249 uint32_t iDstPage = iDstPageFirst;
2250 uint32_t iPage;
2251 for (iPage = 0; iPage < cPagesAlloc; iPage++, iDstPage++)
2252 {
2253 if (paPages[iDstPage].HCPhysGCPhys <= GMM_GCPHYS_LAST)
2254 pChunk->aPages[iPage].Private.pfn = paPages[iDstPage].HCPhysGCPhys >> GUEST_PAGE_SHIFT;
2255 else
2256 pChunk->aPages[iPage].Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2257 pChunk->aPages[iPage].Private.hGVM = hGVM;
2258 pChunk->aPages[iPage].Private.u2State = GMM_PAGE_STATE_PRIVATE;
2259
2260 paPages[iDstPage].HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(hMemObj, iPage);
2261 paPages[iDstPage].fZeroed = true;
2262 paPages[iDstPage].idPage = idPageChunk | iPage;
2263 paPages[iDstPage].idSharedPage = NIL_GMM_PAGEID;
2264 }
2265 *piPage = iDstPage;
2266
2267 /* Build free list: */
2268 if (iPage < RT_ELEMENTS(pChunk->aPages))
2269 {
2270 Assert(pChunk->iFreeHead == iPage);
2271 for (; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2272 {
2273 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2274 pChunk->aPages[iPage].Free.fZeroed = true;
2275 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2276 }
2277 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2278 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.fZeroed = true;
2279 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2280 }
2281 else
2282 Assert(pChunk->iFreeHead == UINT16_MAX);
2283 }
2284 else
2285 {
2286 /*
2287 * Large page: Mark all pages as privately allocated (watered down gmmR0AllocatePage).
2288 */
2289 pChunk->cFree = 0;
2290 pChunk->cPrivate = GMM_CHUNK_NUM_PAGES;
2291 pChunk->iFreeHead = UINT16_MAX;
2292
2293 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages); iPage++)
2294 {
2295 pChunk->aPages[iPage].Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2296 pChunk->aPages[iPage].Private.hGVM = hGVM;
2297 pChunk->aPages[iPage].Private.u2State = GMM_PAGE_STATE_PRIVATE;
2298 }
2299 }
2300
2301 /*
2302 * Zero the memory if it wasn't zeroed by the host already.
2303 * This simplifies keeping secret kernel bits from userland and brings
2304 * everyone to the same level wrt allocation zeroing.
2305 */
2306 rc = VINF_SUCCESS;
2307 if (!RTR0MemObjWasZeroInitialized(hMemObj))
2308 {
2309#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2310 if (!(fChunkFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
2311 {
2312 for (uint32_t iPage = 0; iPage < GMM_CHUNK_SIZE / HOST_PAGE_SIZE; iPage++)
2313 {
2314 void *pvPage = NULL;
2315 rc = SUPR0HCPhysToVirt(RTR0MemObjGetPagePhysAddr(hMemObj, iPage), &pvPage);
2316 AssertRCBreak(rc);
2317 RT_BZERO(pvPage, HOST_PAGE_SIZE);
2318 }
2319 }
2320 else
2321 {
2322 /* Can do the whole large page in one go. */
2323 void *pvPage = NULL;
2324 rc = SUPR0HCPhysToVirt(RTR0MemObjGetPagePhysAddr(hMemObj, 0), &pvPage);
2325 AssertRC(rc);
2326 if (RT_SUCCESS(rc))
2327 RT_BZERO(pvPage, GMM_CHUNK_SIZE);
2328 }
2329#else
2330 RT_BZERO(pbMapping, GMM_CHUNK_SIZE);
2331#endif
2332 }
2333 if (RT_SUCCESS(rc))
2334 {
2335 *ppChunk = pChunk;
2336
2337 /*
2338 * Allocate a Chunk ID and insert it into the tree.
2339 * This has to be done behind the mutex of course.
2340 */
2341 rc = gmmR0MutexAcquire(pGMM);
2342 if (RT_SUCCESS(rc))
2343 {
2344 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2345 {
2346 RTSpinlockAcquire(pGMM->hSpinLockTree);
2347 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2348 {
2349 pGMM->cChunks++;
2350 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2351 RTSpinlockRelease(pGMM->hSpinLockTree);
2352
2353 gmmR0LinkChunk(pChunk, pSet);
2354
2355 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2356 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2357 return VINF_SUCCESS;
2358 }
2359
2360 /*
2361 * Bail out.
2362 */
2363 RTSpinlockRelease(pGMM->hSpinLockTree);
2364 rc = VERR_GMM_CHUNK_INSERT;
2365 }
2366 else
2367 rc = VERR_GMM_IS_NOT_SANE;
2368 gmmR0MutexRelease(pGMM);
2369 }
2370 *ppChunk = NULL;
2371 }
2372
2373 /* Undo any page allocations. */
2374 if (!(fChunkFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
2375 {
2376 uint32_t const cToFree = pChunk->cPrivate;
2377 Assert(*piPage - iDstPageFirst == cToFree);
2378 for (uint32_t iDstPage = iDstPageFirst, iPage = 0; iPage < cToFree; iPage++, iDstPage++)
2379 {
2380                    paPages[iDstPage].fZeroed      = false;
2381                    if (pChunk->aPages[iPage].Private.pfn == GMM_PAGE_PFN_UNSHAREABLE)
2382                        paPages[iDstPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
2383                    else
2384                        paPages[iDstPage].HCPhysGCPhys = (RTHCPHYS)pChunk->aPages[iPage].Private.pfn << GUEST_PAGE_SHIFT;
2385                    paPages[iDstPage].idPage       = NIL_GMM_PAGEID;
2386                    paPages[iDstPage].idSharedPage = NIL_GMM_PAGEID;
2387 }
2388 *piPage = iDstPageFirst;
2389 }
2390
2391 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
2392 }
2393 else
2394 rc = VERR_GMM_CHUNK_INSERT;
2395 RTMemFree(pChunk);
2396 }
2397 else
2398 rc = VERR_NO_MEMORY;
2399 return rc;
2400}
2401
2402
2403/**
2404 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2405 * what's remaining to the specified free set.
2406 *
2407 * @note This will leave the giant mutex while allocating the new chunk!
2408 *
2409 * @returns VBox status code.
2410 * @param pGMM Pointer to the GMM instance data.
2411 * @param pGVM Pointer to the kernel-only VM instance data.
2412 * @param pSet Pointer to the free set.
2413 * @param cPages The number of pages requested.
2414 * @param paPages The page descriptor table (input + output).
2415 * @param piPage The pointer to the page descriptor table index variable.
2416 * This will be updated.
2417 */
2418static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2419 PGMMPAGEDESC paPages, uint32_t *piPage)
2420{
2421 gmmR0MutexRelease(pGMM);
2422
2423 RTR0MEMOBJ hMemObj;
2424 int rc;
2425#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2426 if (pGMM->fHasWorkingAllocPhysNC)
2427 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2428 else
2429#endif
2430 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2431 if (RT_SUCCESS(rc))
2432 {
2433 PGMMCHUNK pIgnored;
2434 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, 0 /*fChunkFlags*/,
2435 cPages, paPages, piPage, &pIgnored);
2436 if (RT_SUCCESS(rc))
2437 return VINF_SUCCESS;
2438
2439 /* bail out */
2440 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2441 }
2442
2443 int rc2 = gmmR0MutexAcquire(pGMM);
2444 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2445 return rc;
2446
2447}
2448
2449
2450/**
2451 * As a last resort we'll pick any page we can get.
2452 *
2453 * @returns The new page descriptor table index.
2454 * @param pSet The set to pick from.
2455 * @param pGVM Pointer to the global VM structure.
2456 * @param uidSelf The UID of the caller.
2457 * @param iPage The current page descriptor table index.
2458 * @param cPages The total number of pages to allocate.
2459 * @param paPages The page descriptor table (input + output).
2460 */
2461static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2462 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2463{
2464 unsigned iList = RT_ELEMENTS(pSet->apLists);
2465 while (iList-- > 0)
2466 {
2467 PGMMCHUNK pChunk = pSet->apLists[iList];
2468 while (pChunk)
2469 {
2470 PGMMCHUNK pNext = pChunk->pFreeNext;
2471 if ( pChunk->uidOwner == uidSelf
2472 || ( pChunk->cMappingsX == 0
2473 && pChunk->cFree == (GMM_CHUNK_SIZE >> GUEST_PAGE_SHIFT)))
2474 {
2475 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2476 if (iPage >= cPages)
2477 return iPage;
2478 }
2479
2480 pChunk = pNext;
2481 }
2482 }
2483 return iPage;
2484}
2485
2486
2487/**
2488 * Pick pages from empty chunks on the same NUMA node.
2489 *
2490 * @returns The new page descriptor table index.
2491 * @param pSet The set to pick from.
2492 * @param pGVM Pointer to the global VM structure.
2493 * @param uidSelf The UID of the caller.
2494 * @param iPage The current page descriptor table index.
2495 * @param cPages The total number of pages to allocate.
2496 * @param paPages The page descriptor table (input + output).
2497 */
2498static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2499 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2500{
2501 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2502 if (pChunk)
2503 {
2504 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2505 while (pChunk)
2506 {
2507 PGMMCHUNK pNext = pChunk->pFreeNext;
2508
2509 if ( pChunk->idNumaNode == idNumaNode
2510 && ( pChunk->uidOwner == uidSelf
2511 || pChunk->cMappingsX == 0))
2512 {
2513 pChunk->hGVM = pGVM->hSelf;
2514 pChunk->uidOwner = uidSelf;
2515 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2516 if (iPage >= cPages)
2517 {
2518 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2519 return iPage;
2520 }
2521 }
2522
2523 pChunk = pNext;
2524 }
2525 }
2526 return iPage;
2527}
2528
2529
2530/**
2531 * Pick pages from non-empty chunks on the same NUMA node.
2532 *
2533 * @returns The new page descriptor table index.
2534 * @param pSet The set to pick from.
2535 * @param pGVM Pointer to the global VM structure.
2536 * @param uidSelf The UID of the caller.
2537 * @param iPage The current page descriptor table index.
2538 * @param cPages The total number of pages to allocate.
2539 * @param paPages The page descriptor table (input + output).
2540 */
2541static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID const uidSelf,
2542 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2543{
2544 /** @todo start by picking from chunks with about the right size first? */
2545 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2546 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2547 while (iList-- > 0)
2548 {
2549 PGMMCHUNK pChunk = pSet->apLists[iList];
2550 while (pChunk)
2551 {
2552 PGMMCHUNK pNext = pChunk->pFreeNext;
2553
2554 if ( pChunk->idNumaNode == idNumaNode
2555 && pChunk->uidOwner == uidSelf)
2556 {
2557 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2558 if (iPage >= cPages)
2559 {
2560 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2561 return iPage;
2562 }
2563 }
2564
2565 pChunk = pNext;
2566 }
2567 }
2568 return iPage;
2569}
2570
2571
2572/**
2573 * Pick pages that are in chunks already associated with the VM.
2574 *
2575 * @returns The new page descriptor table index.
2576 * @param pGMM Pointer to the GMM instance data.
2577 * @param pGVM Pointer to the global VM structure.
2578 * @param pSet The set to pick from.
2579 * @param iPage The current page descriptor table index.
2580 * @param cPages The total number of pages to allocate.
2581 * @param paPages The page descriptor table (input + output).
2582 */
2583static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2584 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2585{
2586 uint16_t const hGVM = pGVM->hSelf;
2587
2588 /* Hint. */
2589 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2590 {
2591 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2592 if (pChunk && pChunk->cFree)
2593 {
2594 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2595 if (iPage >= cPages)
2596 return iPage;
2597 }
2598 }
2599
2600 /* Scan. */
2601 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2602 {
2603 PGMMCHUNK pChunk = pSet->apLists[iList];
2604 while (pChunk)
2605 {
2606 PGMMCHUNK pNext = pChunk->pFreeNext;
2607
2608 if (pChunk->hGVM == hGVM)
2609 {
2610 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2611 if (iPage >= cPages)
2612 {
2613 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2614 return iPage;
2615 }
2616 }
2617
2618 pChunk = pNext;
2619 }
2620 }
2621 return iPage;
2622}
2623
2624
2625
2626/**
2627 * Pick pages in bound memory mode.
2628 *
2629 * @returns The new page descriptor table index.
2630 * @param pGVM Pointer to the global VM structure.
2631 * @param iPage The current page descriptor table index.
2632 * @param cPages The total number of pages to allocate.
2633 * @param paPages The page descriptor table (input + output).
2634 */
2635static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2636{
2637 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2638 {
2639 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2640 while (pChunk)
2641 {
2642 Assert(pChunk->hGVM == pGVM->hSelf);
2643 PGMMCHUNK pNext = pChunk->pFreeNext;
2644 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2645 if (iPage >= cPages)
2646 return iPage;
2647 pChunk = pNext;
2648 }
2649 }
2650 return iPage;
2651}
2652
2653
2654/**
2655 * Checks if we should start picking pages from chunks of other VMs because
2656 * we're getting close to the system memory or reserved limit.
2657 *
2658 * @returns @c true if we should, @c false if we should first try allocate more
2659 * chunks.
2660 */
2661static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2662{
2663 /*
2664     * Don't allocate a new chunk if we're getting too close to the reservation limit.
2665 */
2666 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2667 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2668 - pGVM->gmm.s.Stats.cBalloonedPages
2669 /** @todo what about shared pages? */;
2670 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2671 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2672 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2673 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2674 return true;
2675 /** @todo make the threshold configurable, also test the code to see if
2676 * this ever kicks in (we might be reserving too much or smth). */
2677
2678 /*
2679     * Check how close we are to the max memory limit and how many fragments
2680     * there are...
2681 */
2682 /** @todo */
2683
2684 return false;
2685}
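
/*
 * Back-of-the-envelope for the threshold above (illustrative, assuming 4 KiB
 * guest pages so that GMM_CHUNK_NUM_PAGES is 2 MB / 4 KiB = 512): the VM is
 * treated as being close to its reservation once fewer than 4 * 512 = 2048
 * pages, i.e. roughly 8 MB, remain between the reserved total (with the
 * ballooned pages subtracted) and what has already been allocated.
 */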
2686
2687
2688/**
2689 * Checks if we should start picking pages from chunks of other VMs because
2690 * there are a lot of free pages around.
2691 *
2692 * @returns @c true if we should, @c false if we should first try allocate more
2693 * chunks.
2694 */
2695static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2696{
2697 /*
2698 * Setting the limit at 16 chunks (32 MB) at the moment.
2699 */
2700 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2701 return true;
2702 return false;
2703}
2704
2705
2706/**
2707 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2708 *
2709 * @returns VBox status code:
2710 * @retval VINF_SUCCESS on success.
2711 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2712 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2713 * that is we're trying to allocate more than we've reserved.
2714 *
2715 * @param pGMM Pointer to the GMM instance data.
2716 * @param pGVM Pointer to the VM.
2717 * @param cPages The number of pages to allocate.
2718 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2719 * details on what is expected on input.
2720 * @param enmAccount The account to charge.
2721 *
2722 * @remarks Caller owns the giant GMM lock.
2723 */
2724static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2725{
2726 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2727
2728 /*
2729 * Check allocation limits.
2730 */
2731 if (RT_LIKELY(pGMM->cAllocatedPages + cPages <= pGMM->cMaxPages))
2732 { /* likely */ }
2733 else
2734 return VERR_GMM_HIT_GLOBAL_LIMIT;
2735
2736 switch (enmAccount)
2737 {
2738 case GMMACCOUNT_BASE:
2739 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2740 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
2741 { /* likely */ }
2742 else
2743 {
2744 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2745 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2746 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2747 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2748 }
2749 break;
2750 case GMMACCOUNT_SHADOW:
2751 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages <= pGVM->gmm.s.Stats.Reserved.cShadowPages))
2752 { /* likely */ }
2753 else
2754 {
2755 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2756 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2757 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2758 }
2759 break;
2760 case GMMACCOUNT_FIXED:
2761 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages <= pGVM->gmm.s.Stats.Reserved.cFixedPages))
2762 { /* likely */ }
2763 else
2764 {
2765 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2766 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2767 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2768 }
2769 break;
2770 default:
2771 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2772 }
2773
2774 /*
2775 * Update the accounts before we proceed because we might be leaving the
2776 * protection of the global mutex and thus run the risk of permitting
2777 * too much memory to be allocated.
2778 */
2779 switch (enmAccount)
2780 {
2781 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2782 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2783 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2784 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2785 }
2786 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2787 pGMM->cAllocatedPages += cPages;
2788
2789 /*
2790 * Bound mode is also relatively straightforward.
2791 */
2792 uint32_t iPage = 0;
2793 int rc = VINF_SUCCESS;
2794 if (pGMM->fBoundMemoryMode)
2795 {
2796 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2797 if (iPage < cPages)
2798 do
2799 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2800 while (iPage < cPages && RT_SUCCESS(rc));
2801 }
2802 /*
2803     * Shared mode is trickier as we should try to achieve the same locality as
2804 * in bound mode, but smartly make use of non-full chunks allocated by
2805 * other VMs if we're low on memory.
2806 */
2807 else
2808 {
2809 RTUID const uidSelf = SUPR0GetSessionUid(pGVM->pSession);
2810
2811 /* Pick the most optimal pages first. */
2812 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2813 if (iPage < cPages)
2814 {
2815 /* Maybe we should try getting pages from chunks "belonging" to
2816 other VMs before allocating more chunks? */
2817 bool fTriedOnSameAlready = false;
2818 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2819 {
2820 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2821 fTriedOnSameAlready = true;
2822 }
2823
2824 /* Allocate memory from empty chunks. */
2825 if (iPage < cPages)
2826 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2827
2828 /* Grab empty shared chunks. */
2829 if (iPage < cPages)
2830 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, uidSelf, iPage, cPages, paPages);
2831
2832            /* If there are a lot of free pages spread around, try not to waste
2833 system memory on more chunks. (Should trigger defragmentation.) */
2834 if ( !fTriedOnSameAlready
2835 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2836 {
2837 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2838 if (iPage < cPages)
2839 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2840 }
2841
2842 /*
2843 * Ok, try allocate new chunks.
2844 */
2845 if (iPage < cPages)
2846 {
2847 do
2848 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2849 while (iPage < cPages && RT_SUCCESS(rc));
2850
2851#if 0 /* We cannot mix chunks with different UIDs. */
2852 /* If the host is out of memory, take whatever we can get. */
2853 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2854 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2855 {
2856 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2857 if (iPage < cPages)
2858 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2859 AssertRelease(iPage == cPages);
2860 rc = VINF_SUCCESS;
2861 }
2862#endif
2863 }
2864 }
2865 }
2866
2867 /*
2868 * Clean up on failure. Since this is bound to be a low-memory condition
2869 * we will give back any empty chunks that might be hanging around.
2870 */
2871 if (RT_SUCCESS(rc))
2872 { /* likely */ }
2873 else
2874 {
2875 /* Update the statistics. */
2876 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2877 pGMM->cAllocatedPages -= cPages - iPage;
2878 switch (enmAccount)
2879 {
2880 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2881 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2882 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2883 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2884 }
2885
2886 /* Release the pages. */
2887 while (iPage-- > 0)
2888 {
2889 uint32_t idPage = paPages[iPage].idPage;
2890 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2891 if (RT_LIKELY(pPage))
2892 {
2893 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2894 Assert(pPage->Private.hGVM == pGVM->hSelf);
2895 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2896 }
2897 else
2898 AssertMsgFailed(("idPage=%#x\n", idPage));
2899
2900 paPages[iPage].idPage = NIL_GMM_PAGEID;
2901 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2902 paPages[iPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
2903 paPages[iPage].fZeroed = false;
2904 }
2905
2906 /* Free empty chunks. */
2907 /** @todo */
2908
2909 /* return the fail status on failure */
2910 return rc;
2911 }
2912 return VINF_SUCCESS;
2913}
2914
2915
2916/**
2917 * Updates the previous allocations and allocates more pages.
2918 *
2919 * The handy pages are always taken from the 'base' memory account.
2920 * The allocated pages are not cleared and will contain random garbage.
2921 *
2922 * @returns VBox status code:
2923 * @retval VINF_SUCCESS on success.
2924 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2925 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2926 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2927 * private page.
2928 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2929 * shared page.
2930 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2931 * owned by the VM.
2932 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2933 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2934 * that is we're trying to allocate more than we've reserved.
2935 *
2936 * @param pGVM The global (ring-0) VM structure.
2937 * @param idCpu The VCPU id.
2938 * @param cPagesToUpdate The number of pages to update (starting from the head).
2939 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2940 * @param paPages The array of page descriptors.
2941 * See GMMPAGEDESC for details on what is expected on input.
2942 * @thread EMT(idCpu)
2943 */
2944GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2945 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2946{
2947 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2948 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2949
2950 /*
2951 * Validate & get basics.
2952 * (This is a relatively busy path, so make predictions where possible.)
2953 */
2954 PGMM pGMM;
2955 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2956 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2957 if (RT_FAILURE(rc))
2958 return rc;
2959
2960 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2961 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2962 || (cPagesToAlloc && cPagesToAlloc < 1024),
2963 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2964 VERR_INVALID_PARAMETER);
2965
2966 unsigned iPage = 0;
2967 for (; iPage < cPagesToUpdate; iPage++)
2968 {
2969 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2970 && !(paPages[iPage].HCPhysGCPhys & GUEST_PAGE_OFFSET_MASK))
2971 || paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS
2972 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2973 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2974 VERR_INVALID_PARAMETER);
2975 /* ignore fZeroed here */
2976 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2977 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2978 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2979 AssertMsgReturn( paPages[iPage].idSharedPage == NIL_GMM_PAGEID
2980 || paPages[iPage].idSharedPage <= GMM_PAGEID_LAST,
2981 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2982 }
2983
2984 for (; iPage < cPagesToAlloc; iPage++)
2985 {
2986 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2987 AssertMsgReturn(paPages[iPage].fZeroed == false, ("#%#x: %#x\n", iPage, paPages[iPage].fZeroed), VERR_INVALID_PARAMETER);
2988 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2989 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2990 }
2991
2992 /*
2993 * Take the semaphore
2994 */
2995 VMMR0EMTBLOCKCTX Ctx;
2996 PGVMCPU pGVCpu = &pGVM->aCpus[idCpu];
2997 rc = VMMR0EmtPrepareToBlock(pGVCpu, VINF_SUCCESS, "GMMR0AllocateHandyPages", pGMM, &Ctx);
2998 AssertRCReturn(rc, rc);
2999
3000 rc = gmmR0MutexAcquire(pGMM);
3001 if ( RT_SUCCESS(rc)
3002 && GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3003 {
3004 /* No allocations before the initial reservation has been made! */
3005 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3006 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3007 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3008 {
3009 /*
3010 * Perform the updates.
3011 * Stop on the first error.
3012 */
3013 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
3014 {
3015 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
3016 {
3017 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
3018 if (RT_LIKELY(pPage))
3019 {
3020 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3021 {
3022 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3023 {
3024 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
3025 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
3026 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> GUEST_PAGE_SHIFT;
3027 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
3028 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
3029 /* else: NIL_RTHCPHYS nothing */
3030
3031 paPages[iPage].idPage = NIL_GMM_PAGEID;
3032 paPages[iPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
3033 paPages[iPage].fZeroed = false;
3034 }
3035 else
3036 {
3037 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
3038 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
3039 rc = VERR_GMM_NOT_PAGE_OWNER;
3040 break;
3041 }
3042 }
3043 else
3044 {
3045 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
3046 rc = VERR_GMM_PAGE_NOT_PRIVATE;
3047 break;
3048 }
3049 }
3050 else
3051 {
3052 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
3053 rc = VERR_GMM_PAGE_NOT_FOUND;
3054 break;
3055 }
3056 }
3057
3058 if (paPages[iPage].idSharedPage == NIL_GMM_PAGEID)
3059 { /* likely */ }
3060 else
3061 {
3062 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
3063 if (RT_LIKELY(pPage))
3064 {
3065 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3066 {
3067 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
3068 Assert(pPage->Shared.cRefs);
3069 Assert(pGVM->gmm.s.Stats.cSharedPages);
3070 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
3071
3072 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
3073 pGVM->gmm.s.Stats.cSharedPages--;
3074 pGVM->gmm.s.Stats.Allocated.cBasePages--;
3075 if (!--pPage->Shared.cRefs)
3076 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
3077 else
3078 {
3079 Assert(pGMM->cDuplicatePages);
3080 pGMM->cDuplicatePages--;
3081 }
3082
3083 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
3084 }
3085 else
3086 {
3087 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
3088 rc = VERR_GMM_PAGE_NOT_SHARED;
3089 break;
3090 }
3091 }
3092 else
3093 {
3094 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
3095 rc = VERR_GMM_PAGE_NOT_FOUND;
3096 break;
3097 }
3098 }
3099 } /* for each page to update */
3100
3101 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
3102 {
3103#ifdef VBOX_STRICT
3104 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
3105 {
3106 Assert(paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS);
3107 Assert(paPages[iPage].fZeroed == false);
3108 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
3109 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
3110 }
3111#endif
3112
3113 /*
3114 * Join paths with GMMR0AllocatePages for the allocation.
3115                 * Note! gmmR0AllocatePagesNew (via gmmR0AllocateChunkNew) may leave the protection of the mutex!
3116 */
3117 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
3118 }
3119 }
3120 else
3121 rc = VERR_WRONG_ORDER;
3122 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3123 gmmR0MutexRelease(pGMM);
3124 }
3125 else if (RT_SUCCESS(rc))
3126 {
3127 gmmR0MutexRelease(pGMM);
3128 rc = VERR_GMM_IS_NOT_SANE;
3129 }
3130 VMMR0EmtResumeAfterBlocking(pGVCpu, &Ctx);
3131
3132 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3133 return rc;
3134}
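
/*
 * Illustrative sketch (not part of the build): the shape of the descriptor
 * array this function expects.  The first cPagesToUpdate entries describe
 * existing private pages whose guest addresses changed; entries from there up
 * to cPagesToAlloc must come in fully reset so the allocator can fill them in.
 * The field values mirror the validation at the top of the function; the array
 * size, page ID and address variables are made up for the example.
 *
 * @code
 *      GMMPAGEDESC aDescs[2];                          // hypothetical batch
 *      // Entry to update: idPage names the private page, HCPhysGCPhys holds the
 *      // new page-aligned guest address (or GMM_GCPHYS_UNSHAREABLE or NIL_GMMPAGEDESC_PHYS).
 *      aDescs[0].idPage       = idMyPrivatePage;       // hypothetical page ID
 *      aDescs[0].idSharedPage = NIL_GMM_PAGEID;
 *      aDescs[0].HCPhysGCPhys = GCPhysNew;             // hypothetical guest address
 *      aDescs[0].fZeroed      = false;
 *      // Entry to allocate: everything reset on input.
 *      aDescs[1].idPage       = NIL_GMM_PAGEID;
 *      aDescs[1].idSharedPage = NIL_GMM_PAGEID;
 *      aDescs[1].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
 *      aDescs[1].fZeroed      = false;
 *      int rc = GMMR0AllocateHandyPages(pGVM, idCpu, 1, 2, &aDescs[0]); // update 1, allocate 2
 * @endcode
 */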
3135
3136
3137/**
3138 * Allocate one or more pages.
3139 *
3140 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3141 * The allocated pages are not cleared and will contain random garbage.
3142 *
3143 * @returns VBox status code:
3144 * @retval VINF_SUCCESS on success.
3145 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3146 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3147 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3148 * that is we're trying to allocate more than we've reserved.
3149 *
3150 * @param pGVM The global (ring-0) VM structure.
3151 * @param idCpu The VCPU id.
3152 * @param cPages The number of pages to allocate.
3153 * @param paPages Pointer to the page descriptors.
3154 * See GMMPAGEDESC for details on what is expected on
3155 * input.
3156 * @param enmAccount The account to charge.
3157 *
3158 * @thread EMT.
3159 */
3160GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3161{
3162 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3163
3164 /*
3165 * Validate, get basics and take the semaphore.
3166 */
3167 PGMM pGMM;
3168 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3169 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3170 if (RT_FAILURE(rc))
3171 return rc;
3172
3173 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3174 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3175 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - GUEST_PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3176
3177 for (unsigned iPage = 0; iPage < cPages; iPage++)
3178 {
3179 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS
3180 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3181 || ( enmAccount == GMMACCOUNT_BASE
3182 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3183 && !(paPages[iPage].HCPhysGCPhys & GUEST_PAGE_OFFSET_MASK)),
3184 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3185 VERR_INVALID_PARAMETER);
3186 AssertMsgReturn(paPages[iPage].fZeroed == false, ("#%#x: %#x\n", iPage, paPages[iPage].fZeroed), VERR_INVALID_PARAMETER);
3187 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3188 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3189 }
3190
3191 /*
3192 * Grab the giant mutex and get working.
3193 */
3194 gmmR0MutexAcquire(pGMM);
3195 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3196 {
3197
3198 /* No allocations before the initial reservation has been made! */
3199 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3200 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3201 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3202 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3203 else
3204 rc = VERR_WRONG_ORDER;
3205 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3206 }
3207 else
3208 rc = VERR_GMM_IS_NOT_SANE;
3209 gmmR0MutexRelease(pGMM);
3210
3211 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3212 return rc;
3213}
3214
3215
3216/**
3217 * VMMR0 request wrapper for GMMR0AllocatePages.
3218 *
3219 * @returns see GMMR0AllocatePages.
3220 * @param pGVM The global (ring-0) VM structure.
3221 * @param idCpu The VCPU id.
3222 * @param pReq Pointer to the request packet.
3223 */
3224GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3225{
3226 /*
3227 * Validate input and pass it on.
3228 */
3229 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3230 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3231 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3232 VERR_INVALID_PARAMETER);
3233 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3234 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3235 VERR_INVALID_PARAMETER);
3236
3237 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3238}
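
/*
 * Illustrative sketch (not part of the build): sizing and filling the
 * variable-length request packet checked above.  Only the cbReq formula and
 * the field names come from this file; the batch size is made up, and a real
 * caller dispatches the request through the VMMR0 request path, which may
 * validate more of the header.
 *
 * @code
 *      uint32_t const cPages = 8;                      // hypothetical batch size
 *      uint32_t const cbReq  = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *      PGMMALLOCATEPAGESREQ pReq = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *      pReq->Hdr.cbReq  = cbReq;
 *      pReq->cPages     = cPages;
 *      pReq->enmAccount = GMMACCOUNT_BASE;
 *      for (uint32_t i = 0; i < cPages; i++)
 *      {
 *          pReq->aPages[i].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
 *          pReq->aPages[i].fZeroed      = false;
 *          pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
 *          pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      }
 *      int rc = GMMR0AllocatePagesReq(pGVM, idCpu, pReq);
 *      RTMemFree(pReq);
 * @endcode
 */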
3239
3240
3241/**
3242 * Allocate a large page to represent guest RAM.
3243 *
3244 * The allocated pages are zeroed upon return.
3245 *
3246 * @returns VBox status code:
3247 * @retval VINF_SUCCESS on success.
3248 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3249 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3250 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3251 * that is we're trying to allocate more than we've reserved.
3252 * @retval VERR_TRY_AGAIN if the host is temporarily out of large pages.
3253 * @returns see GMMR0AllocatePages.
3254 *
3255 * @param pGVM The global (ring-0) VM structure.
3256 * @param idCpu The VCPU id.
3257 * @param cbPage Large page size.
3258 * @param pIdPage Where to return the GMM page ID of the page.
3259 * @param pHCPhys Where to return the host physical address of the page.
3260 */
3261GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3262{
3263 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3264
3265 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3266 *pIdPage = NIL_GMM_PAGEID;
3267 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3268 *pHCPhys = NIL_RTHCPHYS;
3269 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3270
3271 /*
3272 * Validate GVM + idCpu, get basics and take the semaphore.
3273 */
3274 PGMM pGMM;
3275 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3276 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3277 AssertRCReturn(rc, rc);
3278
3279 VMMR0EMTBLOCKCTX Ctx;
3280 PGVMCPU pGVCpu = &pGVM->aCpus[idCpu];
3281 rc = VMMR0EmtPrepareToBlock(pGVCpu, VINF_SUCCESS, "GMMR0AllocateLargePage", pGMM, &Ctx);
3282 AssertRCReturn(rc, rc);
3283
3284 rc = gmmR0MutexAcquire(pGMM);
3285 if (RT_SUCCESS(rc))
3286 {
3287 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3288 {
3289 /*
3290 * Check the quota.
3291 */
3292 /** @todo r=bird: Quota checking could be done w/o the giant mutex but using
3293 * a VM specific mutex... */
3294 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + GMM_CHUNK_NUM_PAGES
3295 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3296 {
3297 /*
3298 * Allocate a new large page chunk.
3299 *
3300 * Note! We leave the giant GMM lock temporarily as the allocation might
3301 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3302 */
3303 AssertCompile(GMM_CHUNK_SIZE == _2M);
3304 gmmR0MutexRelease(pGMM);
3305
3306 RTR0MEMOBJ hMemObj;
3307 rc = RTR0MemObjAllocLarge(&hMemObj, GMM_CHUNK_SIZE, GMM_CHUNK_SIZE, RTMEMOBJ_ALLOC_LARGE_F_FAST);
3308 if (RT_SUCCESS(rc))
3309 {
3310 *pHCPhys = RTR0MemObjGetPagePhysAddr(hMemObj, 0);
3311
3312 /*
3313 * Register the chunk as fully allocated.
3314 * Note! As mentioned above, this will return owning the mutex on success.
3315 */
3316 PGMMCHUNK pChunk = NULL;
3317 PGMMCHUNKFREESET const pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3318 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, GMM_CHUNK_FLAGS_LARGE_PAGE,
3319 0 /*cPages*/, NULL /*paPages*/, NULL /*piPage*/, &pChunk);
3320 if (RT_SUCCESS(rc))
3321 {
3322 /*
3323 * The gmmR0RegisterChunk call already marked all pages allocated,
3324 * so we just have to fill in the return values and update stats now.
3325 */
3326 *pIdPage = pChunk->Core.Key << GMM_CHUNKID_SHIFT;
3327
3328 /* Update accounting. */
3329 pGVM->gmm.s.Stats.Allocated.cBasePages += GMM_CHUNK_NUM_PAGES;
3330 pGVM->gmm.s.Stats.cPrivatePages += GMM_CHUNK_NUM_PAGES;
3331 pGMM->cAllocatedPages += GMM_CHUNK_NUM_PAGES;
3332
3333 gmmR0LinkChunk(pChunk, pSet);
3334 gmmR0MutexRelease(pGMM);
3335
3336 VMMR0EmtResumeAfterBlocking(pGVCpu, &Ctx);
3337 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3338 return VINF_SUCCESS;
3339 }
3340
3341 /*
3342 * Bail out.
3343 */
3344 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3345 *pHCPhys = NIL_RTHCPHYS;
3346 }
3347 /** @todo r=bird: Turn VERR_NO_MEMORY etc into VERR_TRY_AGAIN? Docs say we
3348 * return it, but I am sure IPRT doesn't... */
3349 }
3350 else
3351 {
3352 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3353 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, GMM_CHUNK_NUM_PAGES));
3354 gmmR0MutexRelease(pGMM);
3355 rc = VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3356 }
3357 }
3358 else
3359 {
3360 gmmR0MutexRelease(pGMM);
3361 rc = VERR_GMM_IS_NOT_SANE;
3362 }
3363 }
3364
3365 VMMR0EmtResumeAfterBlocking(pGVCpu, &Ctx);
3366 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3367 return rc;
3368}
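/*
 * Usage sketch (illustrative only): how an EMT-context caller might allocate and
 * later release a 2 MB large page using the two functions in this file.
 * idLargePage and HCPhysLarge are hypothetical locals.
 *
 *      uint32_t idLargePage = NIL_GMM_PAGEID;
 *      RTHCPHYS HCPhysLarge = NIL_RTHCPHYS;
 *      int rc2 = GMMR0AllocateLargePage(pGVM, idCpu, GMM_CHUNK_SIZE, &idLargePage, &HCPhysLarge);
 *      if (rc2 == VERR_TRY_AGAIN)
 *      {
 *          // Host is temporarily out of large pages; fall back to 4K allocations or retry later.
 *      }
 *      else if (RT_SUCCESS(rc2))
 *      {
 *          // ... hand the page to the guest, and eventually: ...
 *          rc2 = GMMR0FreeLargePage(pGVM, idCpu, idLargePage);
 *      }
 */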
3369
3370
3371/**
3372 * Free a large page.
3373 *
3374 * @returns VBox status code:
3375 * @param pGVM The global (ring-0) VM structure.
3376 * @param idCpu The VCPU id.
3377 * @param idPage The large page id.
3378 */
3379GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3380{
3381 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3382
3383 /*
3384 * Validate, get basics and take the semaphore.
3385 */
3386 PGMM pGMM;
3387 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3388 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3389 if (RT_FAILURE(rc))
3390 return rc;
3391
3392 gmmR0MutexAcquire(pGMM);
3393 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3394 {
3395 const unsigned cPages = GMM_CHUNK_NUM_PAGES;
3396
3397 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3398 {
3399 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3400 gmmR0MutexRelease(pGMM);
3401 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3402 }
3403
3404 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3405 if (RT_LIKELY( pPage
3406 && GMM_PAGE_IS_PRIVATE(pPage)))
3407 {
3408 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3409 Assert(pChunk);
3410 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3411 Assert(pChunk->cPrivate > 0);
3412
3413 /* Release the memory immediately. */
3414 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3415
3416 /* Update accounting. */
3417 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3418 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3419 pGMM->cAllocatedPages -= cPages;
3420 }
3421 else
3422 rc = VERR_GMM_PAGE_NOT_FOUND;
3423 }
3424 else
3425 rc = VERR_GMM_IS_NOT_SANE;
3426
3427 gmmR0MutexRelease(pGMM);
3428 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3429 return rc;
3430}
3431
3432
3433/**
3434 * VMMR0 request wrapper for GMMR0FreeLargePage.
3435 *
3436 * @returns see GMMR0FreeLargePage.
3437 * @param pGVM The global (ring-0) VM structure.
3438 * @param idCpu The VCPU id.
3439 * @param pReq Pointer to the request packet.
3440 */
3441GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3442{
3443 /*
3444 * Validate input and pass it on.
3445 */
3446 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3447 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3448 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3449 VERR_INVALID_PARAMETER);
3450
3451 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3452}
3453
3454
3455/**
3456 * @callback_method_impl{FNGVMMR0ENUMCALLBACK,
3457 * Used by gmmR0FreeChunkFlushPerVmTlbs().}
3458 */
3459static DECLCALLBACK(int) gmmR0InvalidatePerVmChunkTlbCallback(PGVM pGVM, void *pvUser)
3460{
3461 RT_NOREF(pvUser);
3462 if (pGVM->gmm.s.hChunkTlbSpinLock != NIL_RTSPINLOCK)
3463 {
3464 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
3465 uintptr_t i = RT_ELEMENTS(pGVM->gmm.s.aChunkTlbEntries);
3466 while (i-- > 0)
3467 {
3468 pGVM->gmm.s.aChunkTlbEntries[i].idGeneration = UINT64_MAX;
3469 pGVM->gmm.s.aChunkTlbEntries[i].pChunk = NULL;
3470 }
3471 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
3472 }
3473 return VINF_SUCCESS;
3474}
3475
3476
3477/**
3478 * Called by gmmR0FreeChunk when we reach the threshold for wrapping around the
3479 * free generation ID value.
3480 *
3481 * This is done at 2^62 - 1 (UINT64_MAX / 4), which allows us to drop all
3482 * locks, since roughly 1.4 * 10^19 (about 14 exa) further calls to
3483 * gmmR0FreeChunk would be needed before a real wrap-around could occur. We do
3484 * two invalidation passes and reset the generation ID between them. This will
3485 * make sure there are no false positives.
3486 *
3487 * @param pGMM Pointer to the GMM instance.
3488 */
3489static void gmmR0FreeChunkFlushPerVmTlbs(PGMM pGMM)
3490{
3491 /*
3492 * First invalidation pass.
3493 */
3494 int rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3495 AssertRCSuccess(rc);
3496
3497 /*
3498 * Reset the generation number.
3499 */
3500 RTSpinlockAcquire(pGMM->hSpinLockTree);
3501 ASMAtomicWriteU64(&pGMM->idFreeGeneration, 1);
3502 RTSpinlockRelease(pGMM->hSpinLockTree);
3503
3504 /*
3505 * Second invalidation pass.
3506 */
3507 rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3508 AssertRCSuccess(rc);
3509}
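/*
 * Illustrative arithmetic for the threshold discussed above (a sketch, not part
 * of the original logic): the flush in gmmR0FreeChunk triggers at UINT64_MAX / 4,
 * leaving 3 * 2^62 (about 1.4e19) further increments before the 64-bit counter
 * could really wrap.
 *
 *      AssertCompile(UINT64_MAX / 4              == (UINT64_C(1) << 62) - 1);
 *      AssertCompile(UINT64_MAX - UINT64_MAX / 4 == 3 * (UINT64_C(1) << 62));
 */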
3510
3511
3512/**
3513 * Frees a chunk, giving it back to the host OS.
3514 *
 * @returns @c true if the chunk was freed and the giant GMM lock was temporarily
 *          released in the process (only possible when @a fRelaxedSem is
 *          @c true); @c false if the chunk is still mapped and couldn't be
 *          freed, or when @a fRelaxedSem is @c false.
 *
3515 * @param pGMM Pointer to the GMM instance.
3516 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3517 * unmap and free the chunk in one go.
3518 * @param pChunk The chunk to free.
3519 * @param fRelaxedSem Whether we can release the semaphore while doing the
3520 * freeing (@c true) or not.
3521 */
3522static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3523{
3524 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3525
3526 GMMR0CHUNKMTXSTATE MtxState;
3527 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3528
3529 /*
3530 * Cleanup hack! Unmap the chunk from the caller's address space.
3531 * This shouldn't happen, so screw lock contention...
3532 */
3533 if (pChunk->cMappingsX && pGVM)
3534 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3535
3536 /*
3537 * If there are current mappings of the chunk, then request the
3538 * VMs to unmap them. Reposition the chunk in the free list so
3539 * it won't be a likely candidate for allocations.
3540 */
3541 if (pChunk->cMappingsX)
3542 {
3543 /** @todo R0 -> VM request */
3544 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3545 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3546 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3547 return false;
3548 }
3549
3550
3551 /*
3552 * Save and trash the handle.
3553 */
3554 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3555 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3556
3557 /*
3558 * Unlink it from everywhere.
3559 */
3560 gmmR0UnlinkChunk(pChunk);
3561
3562 RTSpinlockAcquire(pGMM->hSpinLockTree);
3563
3564 RTListNodeRemove(&pChunk->ListNode);
3565
3566 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3567 Assert(pCore == &pChunk->Core); NOREF(pCore);
3568
3569 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3570 if (pTlbe->pChunk == pChunk)
3571 {
3572 pTlbe->idChunk = NIL_GMM_CHUNKID;
3573 pTlbe->pChunk = NULL;
3574 }
3575
3576 Assert(pGMM->cChunks > 0);
3577 pGMM->cChunks--;
3578
3579 uint64_t const idFreeGeneration = ASMAtomicIncU64(&pGMM->idFreeGeneration);
3580
3581 RTSpinlockRelease(pGMM->hSpinLockTree);
3582
3583 pGMM->cFreedChunks++;
3584
3585 /* Drop the lock. */
3586 gmmR0ChunkMutexRelease(&MtxState, NULL);
3587 if (fRelaxedSem)
3588 gmmR0MutexRelease(pGMM);
3589
3590 /*
3591 * Flush per VM chunk TLBs if we're getting remotely close to a generation wraparound.
3592 */
3593 if (idFreeGeneration == UINT64_MAX / 4)
3594 gmmR0FreeChunkFlushPerVmTlbs(pGMM);
3595
3596 /*
3597 * Free the Chunk ID and all memory associated with the chunk.
3598 */
3599 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3600 pChunk->Core.Key = NIL_GMM_CHUNKID;
3601
3602 RTMemFree(pChunk->paMappingsX);
3603 pChunk->paMappingsX = NULL;
3604
3605 RTMemFree(pChunk);
3606
3607#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
3608 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3609#else
3610 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3611#endif
3612 AssertLogRelRC(rc);
3613
3614 if (fRelaxedSem)
3615 gmmR0MutexAcquire(pGMM);
3616 return fRelaxedSem;
3617}
3618
3619
3620/**
3621 * Free page worker.
3622 *
3623 * The caller does all the statistic decrementing, we do all the incrementing.
3624 *
3625 * @param pGMM Pointer to the GMM instance data.
3626 * @param pGVM Pointer to the GVM instance.
3627 * @param pChunk Pointer to the chunk this page belongs to.
3628 * @param idPage The Page ID.
3629 * @param pPage Pointer to the page.
3630 */
3631static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3632{
3633 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3634 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3635
3636 /*
3637 * Put the page on the free list.
3638 */
3639 pPage->u = 0;
3640 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3641 pPage->Free.fZeroed = false;
3642 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3643 pPage->Free.iNext = pChunk->iFreeHead;
3644 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3645
3646 /*
3647 * Update statistics (the cShared/cPrivate stats are up to date already),
3648 * and relink the chunk if necessary.
3649 */
3650 unsigned const cFree = pChunk->cFree;
3651 if ( !cFree
3652 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3653 {
3654 gmmR0UnlinkChunk(pChunk);
3655 pChunk->cFree++;
3656 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3657 }
3658 else
3659 {
3660 pChunk->cFree = cFree + 1;
3661 pChunk->pSet->cFreePages++;
3662 }
3663
3664 /*
3665 * If the chunk becomes empty, consider giving memory back to the host OS.
3666 *
3667 * The current strategy is to try to give it back if there are other chunks
3668 * in this free list, meaning if there are at least 240 free pages in this
3669 * category. Note that since there are probably mappings of the chunk,
3670 * it won't be freed up instantly, which probably screws up this logic
3671 * a bit...
3672 */
3673 /** @todo Do this on the way out. */
3674 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3675 || pChunk->pFreeNext == NULL
3676 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3677 { /* likely */ }
3678 else
3679 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3680}
3681
3682
3683/**
3684 * Frees a shared page, the page is known to exist and be valid and such.
3685 *
3686 * @param pGMM Pointer to the GMM instance.
3687 * @param pGVM Pointer to the GVM instance.
3688 * @param idPage The page id.
3689 * @param pPage The page structure.
3690 */
3691DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3692{
3693 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3694 Assert(pChunk);
3695 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3696 Assert(pChunk->cShared > 0);
3697 Assert(pGMM->cSharedPages > 0);
3698 Assert(pGMM->cAllocatedPages > 0);
3699 Assert(!pPage->Shared.cRefs);
3700
3701 pChunk->cShared--;
3702 pGMM->cAllocatedPages--;
3703 pGMM->cSharedPages--;
3704 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3705}
3706
3707
3708/**
3709 * Frees a private page, the page is known to exist and be valid and such.
3710 *
3711 * @param pGMM Pointer to the GMM instance.
3712 * @param pGVM Pointer to the GVM instance.
3713 * @param idPage The page id.
3714 * @param pPage The page structure.
3715 */
3716DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3717{
3718 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3719 Assert(pChunk);
3720 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3721 Assert(pChunk->cPrivate > 0);
3722 Assert(pGMM->cAllocatedPages > 0);
3723
3724 pChunk->cPrivate--;
3725 pGMM->cAllocatedPages--;
3726 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3727}
3728
3729
3730/**
3731 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3732 *
3733 * @returns VBox status code:
3734 * @retval xxx
3735 *
3736 * @param pGMM Pointer to the GMM instance data.
3737 * @param pGVM Pointer to the VM.
3738 * @param cPages The number of pages to free.
3739 * @param paPages Pointer to the page descriptors.
3740 * @param enmAccount The account this relates to.
3741 */
3742static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3743{
3744 /*
3745 * Check that the request isn't impossible wrt to the account status.
3746 */
3747 switch (enmAccount)
3748 {
3749 case GMMACCOUNT_BASE:
3750 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3751 {
3752 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3753 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3754 }
3755 break;
3756 case GMMACCOUNT_SHADOW:
3757 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3758 {
3759 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3760 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3761 }
3762 break;
3763 case GMMACCOUNT_FIXED:
3764 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3765 {
3766 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3767 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3768 }
3769 break;
3770 default:
3771 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3772 }
3773
3774 /*
3775 * Walk the descriptors and free the pages.
3776 *
3777 * Statistics (except the account) are being updated as we go along,
3778 * unlike the alloc code. Also, stop on the first error.
3779 */
3780 int rc = VINF_SUCCESS;
3781 uint32_t iPage;
3782 for (iPage = 0; iPage < cPages; iPage++)
3783 {
3784 uint32_t idPage = paPages[iPage].idPage;
3785 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3786 if (RT_LIKELY(pPage))
3787 {
3788 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3789 {
3790 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3791 {
3792 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3793 pGVM->gmm.s.Stats.cPrivatePages--;
3794 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3795 }
3796 else
3797 {
3798 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3799 pPage->Private.hGVM, pGVM->hSelf));
3800 rc = VERR_GMM_NOT_PAGE_OWNER;
3801 break;
3802 }
3803 }
3804 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3805 {
3806 Assert(pGVM->gmm.s.Stats.cSharedPages);
3807 Assert(pPage->Shared.cRefs);
3808#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT)
3809 if (pPage->Shared.u14Checksum)
3810 {
3811 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3812 uChecksum &= UINT32_C(0x00003fff);
3813 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3814 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3815 }
3816#endif
3817 pGVM->gmm.s.Stats.cSharedPages--;
3818 if (!--pPage->Shared.cRefs)
3819 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3820 else
3821 {
3822 Assert(pGMM->cDuplicatePages);
3823 pGMM->cDuplicatePages--;
3824 }
3825 }
3826 else
3827 {
3828 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3829 rc = VERR_GMM_PAGE_ALREADY_FREE;
3830 break;
3831 }
3832 }
3833 else
3834 {
3835 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3836 rc = VERR_GMM_PAGE_NOT_FOUND;
3837 break;
3838 }
3839 paPages[iPage].idPage = NIL_GMM_PAGEID;
3840 }
3841
3842 /*
3843 * Update the account.
3844 */
3845 switch (enmAccount)
3846 {
3847 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3848 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3849 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3850 default:
3851 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3852 }
3853
3854 /*
3855 * Any threshold stuff to be done here?
3856 */
3857
3858 return rc;
3859}
3860
3861
3862/**
3863 * Free one or more pages.
3864 *
3865 * This is typically used at reset time or power off.
3866 *
3867 * @returns VBox status code:
3868 * @retval xxx
3869 *
3870 * @param pGVM The global (ring-0) VM structure.
3871 * @param idCpu The VCPU id.
3872 * @param cPages The number of pages to free.
3873 * @param paPages Pointer to the page descriptors containing the page IDs
3874 * for each page.
3875 * @param enmAccount The account this relates to.
3876 * @thread EMT.
3877 */
3878GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3879{
3880 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3881
3882 /*
3883 * Validate input and get the basics.
3884 */
3885 PGMM pGMM;
3886 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3887 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3888 if (RT_FAILURE(rc))
3889 return rc;
3890
3891 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3892 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3893 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - GUEST_PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3894
3895 for (unsigned iPage = 0; iPage < cPages; iPage++)
3896 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3897 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3898 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3899
3900 /*
3901 * Take the semaphore and call the worker function.
3902 */
3903 gmmR0MutexAcquire(pGMM);
3904 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3905 {
3906 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3907 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3908 }
3909 else
3910 rc = VERR_GMM_IS_NOT_SANE;
3911 gmmR0MutexRelease(pGMM);
3912 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3913 return rc;
3914}
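/*
 * Usage sketch (illustrative only): freeing two previously allocated pages on the
 * base account. idFirstPage and idSecondPage are hypothetical page IDs returned
 * earlier by GMMR0AllocatePages.
 *
 *      GMMFREEPAGEDESC aDescs[2];
 *      aDescs[0].idPage = idFirstPage;
 *      aDescs[1].idPage = idSecondPage;
 *      int rc2 = GMMR0FreePages(pGVM, idCpu, RT_ELEMENTS(aDescs), &aDescs[0], GMMACCOUNT_BASE);
 */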
3915
3916
3917/**
3918 * VMMR0 request wrapper for GMMR0FreePages.
3919 *
3920 * @returns see GMMR0FreePages.
3921 * @param pGVM The global (ring-0) VM structure.
3922 * @param idCpu The VCPU id.
3923 * @param pReq Pointer to the request packet.
3924 */
3925GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3926{
3927 /*
3928 * Validate input and pass it on.
3929 */
3930 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3931 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3932 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3933 VERR_INVALID_PARAMETER);
3934 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3935 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3936 VERR_INVALID_PARAMETER);
3937
3938 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3939}
3940
3941
3942/**
3943 * Report back on a memory ballooning request.
3944 *
3945 * The request may or may not have been initiated by the GMM. If it was initiated
3946 * by the GMM it is important that this function is called even if no pages were
3947 * ballooned.
3948 *
3949 * @returns VBox status code:
3950 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3951 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3952 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3953 * indicating that we won't necessarily have sufficient RAM to boot
3954 * the VM again and that it should pause until this changes (we'll try
3955 * balloon some other VM). (For standard deflate we have little choice
3956 * but to hope the VM won't use the memory that was returned to it.)
3957 *
3958 * @param pGVM The global (ring-0) VM structure.
3959 * @param idCpu The VCPU id.
3960 * @param enmAction Inflate/deflate/reset.
3961 * @param cBalloonedPages The number of pages that was ballooned.
3962 *
3963 * @thread EMT(idCpu)
3964 */
3965GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3966{
3967 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3968 pGVM, enmAction, cBalloonedPages));
3969
3970 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - GUEST_PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3971
3972 /*
3973 * Validate input and get the basics.
3974 */
3975 PGMM pGMM;
3976 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3977 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3978 if (RT_FAILURE(rc))
3979 return rc;
3980
3981 /*
3982 * Take the semaphore and do some more validations.
3983 */
3984 gmmR0MutexAcquire(pGMM);
3985 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3986 {
3987 switch (enmAction)
3988 {
3989 case GMMBALLOONACTION_INFLATE:
3990 {
3991 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3992 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3993 {
3994 /*
3995 * Record the ballooned memory.
3996 */
3997 pGMM->cBalloonedPages += cBalloonedPages;
3998 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3999 {
4000 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions. */
4001 AssertFailed();
4002
4003 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
4004 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
4005 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
4006 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
4007 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
4008 }
4009 else
4010 {
4011 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
4012 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
4013 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
4014 }
4015 }
4016 else
4017 {
4018 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
4019 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
4020 pGVM->gmm.s.Stats.Reserved.cBasePages));
4021 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
4022 }
4023 break;
4024 }
4025
4026 case GMMBALLOONACTION_DEFLATE:
4027 {
4028 /* Deflate. */
4029 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
4030 {
4031 /*
4032 * Record the ballooned memory.
4033 */
4034 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
4035 pGMM->cBalloonedPages -= cBalloonedPages;
4036 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
4037 if (pGVM->gmm.s.Stats.cReqDeflatePages)
4038 {
4039 AssertFailed(); /* This path is for later. */
4040 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
4041 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
4042
4043 /*
4044 * Anything we need to do here now when the request has been completed?
4045 */
4046 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
4047 }
4048 else
4049 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
4050 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
4051 }
4052 else
4053 {
4054 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
4055 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
4056 }
4057 break;
4058 }
4059
4060 case GMMBALLOONACTION_RESET:
4061 {
4062 /* Reset to an empty balloon. */
4063 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
4064
4065 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
4066 pGVM->gmm.s.Stats.cBalloonedPages = 0;
4067 break;
4068 }
4069
4070 default:
4071 rc = VERR_INVALID_PARAMETER;
4072 break;
4073 }
4074 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4075 }
4076 else
4077 rc = VERR_GMM_IS_NOT_SANE;
4078
4079 gmmR0MutexRelease(pGMM);
4080 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
4081 return rc;
4082}
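/*
 * Usage sketch (illustrative only): reporting balloon movements to the GMM.
 * cPagesJustBallooned and cPagesGivenBack are hypothetical counters kept by the
 * caller.
 *
 *      int rc2 = GMMR0BalloonedPages(pGVM, idCpu, GMMBALLOONACTION_INFLATE, cPagesJustBallooned);
 *      ...
 *      rc2     = GMMR0BalloonedPages(pGVM, idCpu, GMMBALLOONACTION_DEFLATE, cPagesGivenBack);
 *      ...
 *      rc2     = GMMR0BalloonedPages(pGVM, idCpu, GMMBALLOONACTION_RESET,   0);   // e.g. at VM reset
 */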
4083
4084
4085/**
4086 * VMMR0 request wrapper for GMMR0BalloonedPages.
4087 *
4088 * @returns see GMMR0BalloonedPages.
4089 * @param pGVM The global (ring-0) VM structure.
4090 * @param idCpu The VCPU id.
4091 * @param pReq Pointer to the request packet.
4092 */
4093GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
4094{
4095 /*
4096 * Validate input and pass it on.
4097 */
4098 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4099 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
4100 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
4101 VERR_INVALID_PARAMETER);
4102
4103 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
4104}
4105
4106
4107/**
4108 * Return memory statistics for the hypervisor
4109 *
4110 * @returns VBox status code.
4111 * @param pReq Pointer to the request packet.
4112 */
4113GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
4114{
4115 /*
4116 * Validate input and pass it on.
4117 */
4118 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4119 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4120 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4121 VERR_INVALID_PARAMETER);
4122
4123 /*
4124 * Validate input and get the basics.
4125 */
4126 PGMM pGMM;
4127 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4128 pReq->cAllocPages = pGMM->cAllocatedPages;
4129 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - GUEST_PAGE_SHIFT)) - pGMM->cAllocatedPages;
4130 pReq->cBalloonedPages = pGMM->cBalloonedPages;
4131 pReq->cMaxPages = pGMM->cMaxPages;
4132 pReq->cSharedPages = pGMM->cDuplicatePages;
4133 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4134
4135 return VINF_SUCCESS;
4136}
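/*
 * Worked example for the cFreePages calculation above (illustrative, assuming
 * 2 MB chunks and the usual 4 KiB guest page size, i.e.
 * GMM_CHUNK_SHIFT - GUEST_PAGE_SHIFT == 9, 512 pages per chunk):
 *
 *      cChunks         = 100   ->  100 << 9 = 51200 pages backed by chunks
 *      cAllocatedPages = 48000 ->  cFreePages = 51200 - 48000 = 3200
 */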
4137
4138
4139/**
4140 * Return memory statistics for the VM
4141 *
4142 * @returns VBox status code.
4143 * @param pGVM The global (ring-0) VM structure.
4144 * @param idCpu Cpu id.
4145 * @param pReq Pointer to the request packet.
4146 *
4147 * @thread EMT(idCpu)
4148 */
4149GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
4150{
4151 /*
4152 * Validate input and pass it on.
4153 */
4154 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4155 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4156 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4157 VERR_INVALID_PARAMETER);
4158
4159 /*
4160 * Validate input and get the basics.
4161 */
4162 PGMM pGMM;
4163 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4164 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4165 if (RT_FAILURE(rc))
4166 return rc;
4167
4168 /*
4169 * Take the semaphore and do some more validations.
4170 */
4171 gmmR0MutexAcquire(pGMM);
4172 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4173 {
4174 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4175 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4176 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4177 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4178 }
4179 else
4180 rc = VERR_GMM_IS_NOT_SANE;
4181
4182 gmmR0MutexRelease(pGMM);
4183 LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4184 return rc;
4185}
4186
4187
4188/**
4189 * Worker for gmmR0UnmapChunk and gmmr0FreeChunk.
4190 *
4191 * Don't call this in legacy allocation mode!
4192 *
4193 * @returns VBox status code.
4194 * @param pGMM Pointer to the GMM instance data.
4195 * @param pGVM Pointer to the Global VM structure.
4196 * @param pChunk Pointer to the chunk to be unmapped.
4197 */
4198static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4199{
4200 RT_NOREF_PV(pGMM);
4201
4202 /*
4203 * Find the mapping and try unmapping it.
4204 */
4205 uint32_t cMappings = pChunk->cMappingsX;
4206 for (uint32_t i = 0; i < cMappings; i++)
4207 {
4208 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4209 if (pChunk->paMappingsX[i].pGVM == pGVM)
4210 {
4211 /* unmap */
4212 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4213 if (RT_SUCCESS(rc))
4214 {
4215 /* update the record. */
4216 cMappings--;
4217 if (i < cMappings)
4218 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4219 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4220 pChunk->paMappingsX[cMappings].pGVM = NULL;
4221 Assert(pChunk->cMappingsX - 1U == cMappings);
4222 pChunk->cMappingsX = cMappings;
4223 }
4224
4225 return rc;
4226 }
4227 }
4228
4229 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4230 return VERR_GMM_CHUNK_NOT_MAPPED;
4231}
4232
4233
4234/**
4235 * Unmaps a chunk previously mapped into the address space of the current process.
4236 *
4237 * @returns VBox status code.
4238 * @param pGMM Pointer to the GMM instance data.
4239 * @param pGVM Pointer to the Global VM structure.
4240 * @param pChunk Pointer to the chunk to be unmapped.
4241 * @param fRelaxedSem Whether we can release the semaphore while doing the
4242 * mapping (@c true) or not.
4243 */
4244static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4245{
4246 /*
4247 * Lock the chunk and if possible leave the giant GMM lock.
4248 */
4249 GMMR0CHUNKMTXSTATE MtxState;
4250 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4251 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4252 if (RT_SUCCESS(rc))
4253 {
4254 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4255 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4256 }
4257 return rc;
4258}
4259
4260
4261/**
4262 * Worker for gmmR0MapChunk.
4263 *
4264 * @returns VBox status code.
4265 * @param pGMM Pointer to the GMM instance data.
4266 * @param pGVM Pointer to the Global VM structure.
4267 * @param pChunk Pointer to the chunk to be mapped.
4268 * @param ppvR3 Where to store the ring-3 address of the mapping.
4269 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4270 * contain the address of the existing mapping.
4271 */
4272static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4273{
4274 RT_NOREF(pGMM);
4275
4276 /*
4277 * Check to see if the chunk is already mapped.
4278 */
4279 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4280 {
4281 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4282 if (pChunk->paMappingsX[i].pGVM == pGVM)
4283 {
4284 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4285 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4286#ifdef VBOX_WITH_PAGE_SHARING
4287 /* The ring-3 chunk cache can be out of sync; don't fail. */
4288 return VINF_SUCCESS;
4289#else
4290 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4291#endif
4292 }
4293 }
4294
4295 /*
4296 * Do the mapping.
4297 */
4298 RTR0MEMOBJ hMapObj;
4299 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4300 if (RT_SUCCESS(rc))
4301 {
4302 /* reallocate the array? assumes few users per chunk (usually one). */
4303 unsigned iMapping = pChunk->cMappingsX;
4304 if ( iMapping <= 3
4305 || (iMapping & 3) == 0)
4306 {
4307 unsigned cNewSize = iMapping <= 3
4308 ? iMapping + 1
4309 : iMapping + 4;
4310 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4311 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4312 {
4313 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4314 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4315 }
4316
4317 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4318 if (RT_UNLIKELY(!pvMappings))
4319 {
4320 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4321 return VERR_NO_MEMORY;
4322 }
4323 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4324 }
4325
4326 /* insert new entry */
4327 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4328 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4329 Assert(pChunk->cMappingsX == iMapping);
4330 pChunk->cMappingsX = iMapping + 1;
4331
4332 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4333 }
4334
4335 return rc;
4336}
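/*
 * Note on the reallocation strategy above (illustrative): the mapping array grows
 * by one entry while small and by four entries thereafter, so its capacity
 * develops like this as mappings are added:
 *
 *      cMappingsX before insert:  0  1  2  3  4  5  6  7   8 ...
 *      capacity after insert:     1  2  3  4  8  8  8  8  12 ...
 *
 * This keeps the common case (a single VM mapping the chunk) cheap.
 */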
4337
4338
4339/**
4340 * Maps a chunk into the user address space of the current process.
4341 *
4342 * @returns VBox status code.
4343 * @param pGMM Pointer to the GMM instance data.
4344 * @param pGVM Pointer to the Global VM structure.
4345 * @param pChunk Pointer to the chunk to be mapped.
4346 * @param fRelaxedSem Whether we can release the semaphore while doing the
4347 * mapping (@c true) or not.
4348 * @param ppvR3 Where to store the ring-3 address of the mapping.
4349 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4350 * contain the address of the existing mapping.
4351 */
4352static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4353{
4354 /*
4355 * Take the chunk lock and leave the giant GMM lock when possible, then
4356 * call the worker function.
4357 */
4358 GMMR0CHUNKMTXSTATE MtxState;
4359 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4360 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4361 if (RT_SUCCESS(rc))
4362 {
4363 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4364 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4365 }
4366
4367 return rc;
4368}
4369
4370
4371
4372#if defined(VBOX_WITH_PAGE_SHARING) || defined(VBOX_STRICT)
4373/**
4374 * Check if a chunk is mapped into the specified VM
4375 *
4376 * @returns mapped yes/no
4377 * @param pGMM Pointer to the GMM instance.
4378 * @param pGVM Pointer to the Global VM structure.
4379 * @param pChunk Pointer to the chunk to be mapped.
4380 * @param ppvR3 Where to store the ring-3 address of the mapping.
4381 */
4382static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4383{
4384 GMMR0CHUNKMTXSTATE MtxState;
4385 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4386 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4387 {
4388 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4389 if (pChunk->paMappingsX[i].pGVM == pGVM)
4390 {
4391 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4392 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4393 return true;
4394 }
4395 }
4396 *ppvR3 = NULL;
4397 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4398 return false;
4399}
4400#endif /* VBOX_WITH_PAGE_SHARING || VBOX_STRICT */
4401
4402
4403/**
4404 * Map a chunk and/or unmap another chunk.
4405 *
4406 * The mapping and unmapping applies to the current process.
4407 *
4408 * This API does two things because it saves a kernel call per mapping when
4409 * the ring-3 mapping cache is full.
4410 *
4411 * @returns VBox status code.
4412 * @param pGVM The global (ring-0) VM structure.
4413 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4414 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4415 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4416 * @thread EMT ???
4417 */
4418GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4419{
4420 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4421 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4422
4423 /*
4424 * Validate input and get the basics.
4425 */
4426 PGMM pGMM;
4427 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4428 int rc = GVMMR0ValidateGVM(pGVM);
4429 if (RT_FAILURE(rc))
4430 return rc;
4431
4432 AssertCompile(NIL_GMM_CHUNKID == 0);
4433 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4434 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4435
4436 if ( idChunkMap == NIL_GMM_CHUNKID
4437 && idChunkUnmap == NIL_GMM_CHUNKID)
4438 return VERR_INVALID_PARAMETER;
4439
4440 if (idChunkMap != NIL_GMM_CHUNKID)
4441 {
4442 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4443 *ppvR3 = NIL_RTR3PTR;
4444 }
4445
4446 /*
4447 * Take the semaphore and do the work.
4448 *
4449 * The unmapping is done last since it's easier to undo a mapping than
4450 * to undo an unmapping. The ring-3 mapping cache cannot be so big
4451 * that it pushes the user virtual address space to within a chunk of
4452 * its limits, so no problem here.
4453 */
4454 gmmR0MutexAcquire(pGMM);
4455 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4456 {
4457 PGMMCHUNK pMap = NULL;
4458 if (idChunkMap != NIL_GMM_CHUNKID)
4459 {
4460 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4461 if (RT_LIKELY(pMap))
4462 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4463 else
4464 {
4465 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4466 rc = VERR_GMM_CHUNK_NOT_FOUND;
4467 }
4468 }
4469/** @todo split this operation, the bail out might (theoretically) not be
4470 * entirely safe. */
4471
4472 if ( idChunkUnmap != NIL_GMM_CHUNKID
4473 && RT_SUCCESS(rc))
4474 {
4475 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4476 if (RT_LIKELY(pUnmap))
4477 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4478 else
4479 {
4480 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4481 rc = VERR_GMM_CHUNK_NOT_FOUND;
4482 }
4483
4484 if (RT_FAILURE(rc) && pMap)
4485 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4486 }
4487
4488 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4489 }
4490 else
4491 rc = VERR_GMM_IS_NOT_SANE;
4492 gmmR0MutexRelease(pGMM);
4493
4494 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4495 return rc;
4496}
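/*
 * Usage sketch (illustrative only): mapping a new chunk into ring-3 while evicting
 * another one from the ring-3 mapping cache in the same call. idNewChunk and
 * idEvictedChunk are hypothetical chunk IDs.
 *
 *      RTR3PTR pvR3 = NIL_RTR3PTR;
 *      int rc2 = GMMR0MapUnmapChunk(pGVM, idNewChunk, idEvictedChunk, &pvR3);
 *      // pass NIL_GMM_CHUNKID for either ID when only mapping or only unmapping
 */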
4497
4498
4499/**
4500 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4501 *
4502 * @returns see GMMR0MapUnmapChunk.
4503 * @param pGVM The global (ring-0) VM structure.
4504 * @param pReq Pointer to the request packet.
4505 */
4506GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4507{
4508 /*
4509 * Validate input and pass it on.
4510 */
4511 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4512 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4513
4514 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4515}
4516
4517
4518#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
4519/**
4520 * Gets the ring-0 virtual address for the given page.
4521 *
4522 * This is used by PGM when IEM and such wants to access guest RAM from ring-0.
4523 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4524 * corresponding chunk will remain valid beyond the call (at least till the EMT
4525 * returns to ring-3).
4526 *
4527 * @returns VBox status code.
4528 * @param pGVM Pointer to the kernel-only VM instance data.
4529 * @param idPage The page ID.
4530 * @param ppv Where to store the address.
4531 * @thread EMT
4532 */
4533GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4534{
4535 *ppv = NULL;
4536 PGMM pGMM;
4537 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4538
4539 uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
4540
4541 /*
4542 * Start with the per-VM TLB.
4543 */
4544 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
4545
4546 PGMMPERVMCHUNKTLBE pTlbe = &pGVM->gmm.s.aChunkTlbEntries[GMMPERVM_CHUNKTLB_IDX(idChunk)];
4547 PGMMCHUNK pChunk = pTlbe->pChunk;
4548 if ( pChunk != NULL
4549 && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
4550 && pChunk->Core.Key == idChunk)
4551 pGVM->R0Stats.gmm.cChunkTlbHits++; /* hopefully this is a likely outcome */
4552 else
4553 {
4554 pGVM->R0Stats.gmm.cChunkTlbMisses++;
4555
4556 /*
4557 * Look it up in the chunk tree.
4558 */
4559 RTSpinlockAcquire(pGMM->hSpinLockTree);
4560 pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
4561 if (RT_LIKELY(pChunk))
4562 {
4563 pTlbe->idGeneration = pGMM->idFreeGeneration;
4564 RTSpinlockRelease(pGMM->hSpinLockTree);
4565 pTlbe->pChunk = pChunk;
4566 }
4567 else
4568 {
4569 RTSpinlockRelease(pGMM->hSpinLockTree);
4570 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4571 AssertMsgFailed(("idPage=%#x\n", idPage));
4572 return VERR_GMM_PAGE_NOT_FOUND;
4573 }
4574 }
4575
4576 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4577
4578 /*
4579 * Got a chunk, now validate the page ownership and calculate its address.
4580 */
4581 const GMMPAGE * const pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4582 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4583 && pPage->Private.hGVM == pGVM->hSelf)
4584 || GMM_PAGE_IS_SHARED(pPage)))
4585 {
4586 AssertPtr(pChunk->pbMapping);
4587 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT];
4588 return VINF_SUCCESS;
4589 }
4590 AssertMsgFailed(("idPage=%#x is-private=%RTbool Private.hGVM=%u pGVM->hGVM=%u\n",
4591 idPage, GMM_PAGE_IS_PRIVATE(pPage), pPage->Private.hGVM, pGVM->hSelf));
4592 return VERR_GMM_NOT_PAGE_OWNER;
4593}
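/*
 * For reference (illustrative sketch of the lookup above): a page ID breaks down
 * into a chunk ID and a page index, and the index selects the ring-0 mapping
 * offset:
 *
 *      uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;      // which allocation chunk
 *      uint32_t const iPage   = idPage &  GMM_PAGEID_IDX_MASK;    // page index within that chunk
 *      uint8_t       *pbPage  = &pChunk->pbMapping[iPage << GUEST_PAGE_SHIFT];
 */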
4594#endif /* !VBOX_WITH_LINEAR_HOST_PHYS_MEM */
4595
4596#ifdef VBOX_WITH_PAGE_SHARING
4597
4598# ifdef VBOX_STRICT
4599/**
4600 * For checksumming shared pages in strict builds.
4601 *
4602 * The purpose is making sure that a page doesn't change.
4603 *
4604 * @returns Checksum, 0 on failure.
4605 * @param pGMM The GMM instance data.
4606 * @param pGVM Pointer to the kernel-only VM instance data.
4607 * @param idPage The page ID.
4608 */
4609static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4610{
4611 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4612 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4613
4614 uint8_t *pbChunk;
4615 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4616 return 0;
4617 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT);
4618
4619 return RTCrc32(pbPage, GUEST_PAGE_SIZE);
4620}
4621# endif /* VBOX_STRICT */
4622
4623
4624/**
4625 * Calculates the module hash value.
4626 *
4627 * @returns Hash value.
4628 * @param pszModuleName The module name.
4629 * @param pszVersion The module version string.
4630 */
4631static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4632{
4633 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4634}
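/*
 * Illustrative example (module name and version are made up): the hash is
 * effectively taken over the concatenation "<name>::<version>".
 *
 *      uint32_t uHash = gmmR0ShModCalcHash("ntdll.dll", "6.1.7601.17514");
 *      // equivalent to hashing the string "ntdll.dll::6.1.7601.17514"
 */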
4635
4636
4637/**
4638 * Finds a global module.
4639 *
4640 * @returns Pointer to the global module on success, NULL if not found.
4641 * @param pGMM The GMM instance data.
4642 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4643 * @param cbModule The module size.
4644 * @param enmGuestOS The guest OS type.
4645 * @param cRegions The number of regions.
4646 * @param pszModuleName The module name.
4647 * @param pszVersion The module version.
4648 * @param paRegions The region descriptions.
4649 */
4650static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4651 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4652 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4653{
4654 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4655 pGblMod;
4656 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4657 {
4658 if (pGblMod->cbModule != cbModule)
4659 continue;
4660 if (pGblMod->enmGuestOS != enmGuestOS)
4661 continue;
4662 if (pGblMod->cRegions != cRegions)
4663 continue;
4664 if (strcmp(pGblMod->szName, pszModuleName))
4665 continue;
4666 if (strcmp(pGblMod->szVersion, pszVersion))
4667 continue;
4668
4669 uint32_t i;
4670 for (i = 0; i < cRegions; i++)
4671 {
4672 uint32_t off = paRegions[i].GCRegionAddr & GUEST_PAGE_OFFSET_MASK;
4673 if (pGblMod->aRegions[i].off != off)
4674 break;
4675
4676 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, GUEST_PAGE_SIZE);
4677 if (pGblMod->aRegions[i].cb != cb)
4678 break;
4679 }
4680
4681 if (i == cRegions)
4682 return pGblMod;
4683 }
4684
4685 return NULL;
4686}
4687
4688
4689/**
4690 * Creates a new global module.
4691 *
4692 * @returns VBox status code.
4693 * @param pGMM The GMM instance data.
4694 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4695 * @param cbModule The module size.
4696 * @param enmGuestOS The guest OS type.
4697 * @param cRegions The number of regions.
4698 * @param pszModuleName The module name.
4699 * @param pszVersion The module version.
4700 * @param paRegions The region descriptions.
4701 * @param ppGblMod Where to return the new module on success.
4702 */
4703static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4704 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4705 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4706{
4707 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4708 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4709 {
4710 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4711 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4712 }
4713
4714 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4715 if (!pGblMod)
4716 {
4717 Log(("gmmR0ShModNewGlobal: No memory\n"));
4718 return VERR_NO_MEMORY;
4719 }
4720
4721 pGblMod->Core.Key = uHash;
4722 pGblMod->cbModule = cbModule;
4723 pGblMod->cRegions = cRegions;
4724 pGblMod->cUsers = 1;
4725 pGblMod->enmGuestOS = enmGuestOS;
4726 strcpy(pGblMod->szName, pszModuleName);
4727 strcpy(pGblMod->szVersion, pszVersion);
4728
4729 for (uint32_t i = 0; i < cRegions; i++)
4730 {
4731 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4732 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & GUEST_PAGE_OFFSET_MASK;
4733 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4734 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, GUEST_PAGE_SIZE);
4735 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4736 }
4737
4738 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4739 Assert(fInsert); NOREF(fInsert);
4740 pGMM->cShareableModules++;
4741
4742 *ppGblMod = pGblMod;
4743 return VINF_SUCCESS;
4744}
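/*
 * Worked example of the region normalization used above (illustrative, assuming
 * 4 KiB guest pages):
 *
 *      GCRegionAddr = 0x00401123, cbRegion = 0x1000
 *      off = 0x00401123 & GUEST_PAGE_OFFSET_MASK          = 0x123
 *      cb  = RT_ALIGN_32(0x1000 + 0x123, GUEST_PAGE_SIZE) = 0x2000
 */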
4745
4746
4747/**
4748 * Deletes a global module which is no longer referenced by anyone.
4749 *
4750 * @param pGMM The GMM instance data.
4751 * @param pGblMod The module to delete.
4752 */
4753static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4754{
4755 Assert(pGblMod->cUsers == 0);
4756 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4757
4758 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4759 Assert(pvTest == pGblMod); NOREF(pvTest);
4760 pGMM->cShareableModules--;
4761
4762 uint32_t i = pGblMod->cRegions;
4763 while (i-- > 0)
4764 {
4765 if (pGblMod->aRegions[i].paidPages)
4766 {
4767 /* We don't do anything to the pages as they are handled by the
4768 copy-on-write mechanism in PGM. */
4769 RTMemFree(pGblMod->aRegions[i].paidPages);
4770 pGblMod->aRegions[i].paidPages = NULL;
4771 }
4772 }
4773 RTMemFree(pGblMod);
4774}
4775
4776
4777static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4778 PGMMSHAREDMODULEPERVM *ppRecVM)
4779{
4780 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4781 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4782
4783 PGMMSHAREDMODULEPERVM pRecVM;
4784 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4785 if (!pRecVM)
4786 return VERR_NO_MEMORY;
4787
4788 pRecVM->Core.Key = GCBaseAddr;
4789 for (uint32_t i = 0; i < cRegions; i++)
4790 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4791
4792 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4793 Assert(fInsert); NOREF(fInsert);
4794 pGVM->gmm.s.Stats.cShareableModules++;
4795
4796 *ppRecVM = pRecVM;
4797 return VINF_SUCCESS;
4798}
4799
4800
4801static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4802{
4803 /*
4804 * Free the per-VM module.
4805 */
4806 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4807 pRecVM->pGlobalModule = NULL;
4808
4809 if (fRemove)
4810 {
4811 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4812 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4813 }
4814
4815 RTMemFree(pRecVM);
4816
4817 /*
4818 * Release the global module.
4819 * (In the registration bailout case, it might not be.)
4820 */
4821 if (pGblMod)
4822 {
4823 Assert(pGblMod->cUsers > 0);
4824 pGblMod->cUsers--;
4825 if (pGblMod->cUsers == 0)
4826 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4827 }
4828}
4829
4830#endif /* VBOX_WITH_PAGE_SHARING */
4831
4832/**
4833 * Registers a new shared module for the VM.
4834 *
4835 * @returns VBox status code.
4836 * @param pGVM The global (ring-0) VM structure.
4837 * @param idCpu The VCPU id.
4838 * @param enmGuestOS The guest OS type.
4839 * @param pszModuleName The module name.
4840 * @param pszVersion The module version.
4841 * @param GCPtrModBase The module base address.
4842 * @param cbModule The module size.
4843 * @param cRegions The number of shared region descriptors.
4844 * @param paRegions Pointer to an array of shared region(s).
4845 * @thread EMT(idCpu)
4846 */
4847GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4848 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4849 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4850{
4851#ifdef VBOX_WITH_PAGE_SHARING
4852 /*
4853 * Validate input and get the basics.
4854 *
4855 * Note! Turns out the module size does not necessarily match the size of the
4856 * regions. (iTunes on XP)
4857 */
4858 PGMM pGMM;
4859 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4860 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4861 if (RT_FAILURE(rc))
4862 return rc;
4863
4864 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4865 return VERR_GMM_TOO_MANY_REGIONS;
4866
4867 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4868 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4869
4870 uint32_t cbTotal = 0;
4871 for (uint32_t i = 0; i < cRegions; i++)
4872 {
4873 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4874 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4875
4876 cbTotal += paRegions[i].cbRegion;
4877 if (RT_UNLIKELY(cbTotal > _1G))
4878 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4879 }
4880
4881 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4882 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4883 return VERR_GMM_MODULE_NAME_TOO_LONG;
4884
4885 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4886 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4887 return VERR_GMM_MODULE_NAME_TOO_LONG;
4888
4889 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4890 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4891
4892 /*
4893 * Take the semaphore and do some more validations.
4894 */
4895 gmmR0MutexAcquire(pGMM);
4896 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4897 {
4898 /*
4899 * Check if this module is already locally registered and register
4900 * it if it isn't. The base address is a unique module identifier
4901 * locally.
4902 */
4903 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4904 bool fNewModule = pRecVM == NULL;
4905 if (fNewModule)
4906 {
4907 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4908 if (RT_SUCCESS(rc))
4909 {
4910 /*
4911 * Find a matching global module, register a new one if needed.
4912 */
4913 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4914 pszModuleName, pszVersion, paRegions);
4915 if (!pGblMod)
4916 {
4917 Assert(fNewModule);
4918 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4919 pszModuleName, pszVersion, paRegions, &pGblMod);
4920 if (RT_SUCCESS(rc))
4921 {
4922 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4923 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4924 }
4925 else
4926 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4927 }
4928 else
4929 {
4930 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4931 pGblMod->cUsers++;
4932 pRecVM->pGlobalModule = pGblMod;
4933
4934 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4935 }
4936 }
4937 }
4938 else
4939 {
4940 /*
4941 * Attempt to re-register an existing module.
4942 */
4943 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4944 pszModuleName, pszVersion, paRegions);
4945 if (pRecVM->pGlobalModule == pGblMod)
4946 {
4947 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4948 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4949 }
4950 else
4951 {
4952 /** @todo may have to unregister+register when this happens in case it's caused
4953 * by VBoxService crashing and being restarted... */
4954 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4955 " incoming at %RGvLB%#x %s %s rgns %u\n"
4956 " existing at %RGvLB%#x %s %s rgns %u\n",
4957 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4958 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4959 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4960 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4961 }
4962 }
4963 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4964 }
4965 else
4966 rc = VERR_GMM_IS_NOT_SANE;
4967
4968 gmmR0MutexRelease(pGMM);
4969 return rc;
4970#else
4971
4972 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4973 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4974 return VERR_NOT_IMPLEMENTED;
4975#endif
4976}
4977
4978
4979/**
4980 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4981 *
4982 * @returns see GMMR0RegisterSharedModule.
4983 * @param pGVM The global (ring-0) VM structure.
4984 * @param idCpu The VCPU id.
4985 * @param pReq Pointer to the request packet.
4986 */
4987GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4988{
4989 /*
4990 * Validate input and pass it on.
4991 */
4992 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4993 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4994 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4995 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4996
4997 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4998 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4999 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
5000 return VINF_SUCCESS;
5001}
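
/*
 * Illustrative sketch (not part of the build): how a ring-3 caller might set
 * up the GMMREGISTERSHAREDMODULEREQ packet consumed by the wrapper above.
 * Only fields this file dereferences are filled in; the local variables
 * mirror the GMMR0RegisterSharedModule parameters, and the SUPVMMR0REQHDR
 * magic setup, string buffer sizes and region fill-in are assumptions.
 *
 * @code
 *  uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cRegions]);
 *  PGMMREGISTERSHAREDMODULEREQ pReq = (PGMMREGISTERSHAREDMODULEREQ)RTMemAllocZ(cbReq);
 *  if (!pReq)
 *      return VERR_NO_MEMORY;
 *  pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  pReq->Hdr.cbReq    = cbReq;                 // must match the RT_UOFFSETOF_DYN check above
 *  pReq->enmGuestOS   = enmGuestOS;
 *  pReq->GCBaseAddr   = GCPtrModBase;
 *  pReq->cbModule     = cbModule;
 *  pReq->cRegions     = cRegions;
 *  RTStrCopy(pReq->szName,    GMM_SHARED_MODULE_MAX_NAME_STRING,    pszModuleName);
 *  RTStrCopy(pReq->szVersion, GMM_SHARED_MODULE_MAX_VERSION_STRING, pszVersion);
 *  // ... fill pReq->aRegions[0..cRegions-1] from the guest module sections ...
 *  // The packet then travels down the VMMR3CallR0 path and ends up in
 *  // GMMR0RegisterSharedModuleReq; the real status comes back in pReq->rc.
 * @endcode
 */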
5002
5003
5004/**
5005 * Unregisters a shared module for the VM
5006 *
5007 * @returns VBox status code.
5008 * @param pGVM The global (ring-0) VM structure.
5009 * @param idCpu The VCPU id.
5010 * @param pszModuleName The module name.
5011 * @param pszVersion The module version.
5012 * @param GCPtrModBase The module base address.
5013 * @param cbModule The module size.
5014 */
5015GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
5016 RTGCPTR GCPtrModBase, uint32_t cbModule)
5017{
5018#ifdef VBOX_WITH_PAGE_SHARING
5019 /*
5020 * Validate input and get the basics.
5021 */
5022 PGMM pGMM;
5023 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5024 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5025 if (RT_FAILURE(rc))
5026 return rc;
5027
5028 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
5029 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
5030 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
5031 return VERR_GMM_MODULE_NAME_TOO_LONG;
5032 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
5033 return VERR_GMM_MODULE_NAME_TOO_LONG;
5034
5035 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
5036
5037 /*
5038 * Take the semaphore and do some more validations.
5039 */
5040 gmmR0MutexAcquire(pGMM);
5041 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5042 {
5043 /*
5044 * Locate and remove the specified module.
5045 */
5046 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
5047 if (pRecVM)
5048 {
5049 /** @todo Do we need to do more validations here, like that the
5050 * name + version + cbModule matches? */
5051 NOREF(cbModule);
5052 Assert(pRecVM->pGlobalModule);
5053 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
5054 }
5055 else
5056 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
5057
5058 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5059 }
5060 else
5061 rc = VERR_GMM_IS_NOT_SANE;
5062
5063 gmmR0MutexRelease(pGMM);
5064 return rc;
5065#else
5066
5067 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
5068 return VERR_NOT_IMPLEMENTED;
5069#endif
5070}
5071
5072
5073/**
5074 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
5075 *
5076 * @returns see GMMR0UnregisterSharedModule.
5077 * @param pGVM The global (ring-0) VM structure.
5078 * @param idCpu The VCPU id.
5079 * @param pReq Pointer to the request packet.
5080 */
5081GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
5082{
5083 /*
5084 * Validate input and pass it on.
5085 */
5086 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5087 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5088
5089 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
5090}
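
/*
 * Illustrative sketch (not part of the build): the matching fixed-size
 * unregistration packet for the wrapper above.  The field names are the ones
 * the wrapper dereferences; the SUPVMMR0REQHDR magic setup and the string
 * buffer sizes are assumptions.
 *
 * @code
 *  GMMUNREGISTERSHAREDMODULEREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);             // fixed size, unlike the register request
 *  Req.GCBaseAddr   = GCPtrModBase;
 *  Req.cbModule     = cbModule;
 *  RTStrCopy(Req.szName,    GMM_SHARED_MODULE_MAX_NAME_STRING,    pszModuleName);
 *  RTStrCopy(Req.szVersion, GMM_SHARED_MODULE_MAX_VERSION_STRING, pszVersion);
 *  int rc = GMMR0UnregisterSharedModuleReq(pGVM, idCpu, &Req);
 * @endcode
 */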
5091
5092#ifdef VBOX_WITH_PAGE_SHARING
5093
5094/**
5095 * Increases the use count of a shared page; the page is known to exist and be valid.
5096 *
5097 * @param pGMM Pointer to the GMM instance.
5098 * @param pGVM Pointer to the GVM instance.
5099 * @param pPage The page structure.
5100 */
5101DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
5102{
5103 Assert(pGMM->cSharedPages > 0);
5104 Assert(pGMM->cAllocatedPages > 0);
5105
5106 pGMM->cDuplicatePages++;
5107
5108 pPage->Shared.cRefs++;
5109 pGVM->gmm.s.Stats.cSharedPages++;
5110 pGVM->gmm.s.Stats.Allocated.cBasePages++;
5111}
5112
5113
5114/**
5115 * Converts a private page to a shared page; the page is known to exist and be valid.
5116 *
5117 * @param pGMM Pointer to the GMM instance.
5118 * @param pGVM Pointer to the GVM instance.
5119 * @param HCPhys Host physical address
5120 * @param idPage The Page ID
5121 * @param pPage The page structure.
5122 * @param pPageDesc Shared page descriptor
5123 */
5124DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
5125 PGMMSHAREDPAGEDESC pPageDesc)
5126{
5127 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
5128 Assert(pChunk);
5129 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
5130 Assert(GMM_PAGE_IS_PRIVATE(pPage));
5131
5132 pChunk->cPrivate--;
5133 pChunk->cShared++;
5134
5135 pGMM->cSharedPages++;
5136
5137 pGVM->gmm.s.Stats.cSharedPages++;
5138 pGVM->gmm.s.Stats.cPrivatePages--;
5139
5140 /* Modify the page structure. */
5141 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> GUEST_PAGE_SHIFT);
5142 pPage->Shared.cRefs = 1;
5143#ifdef VBOX_STRICT
5144 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
5145 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5146#else
5147 NOREF(pPageDesc);
5148 pPage->Shared.u14Checksum = 0;
5149#endif
5150 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5151}
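
/*
 * Illustrative sketch (not part of the build): the page ID arithmetic relied
 * upon by gmmR0ConvertToSharedPage above and by the lookup code further down.
 * A page ID encodes both the owning chunk and the page index within it, so
 * the chunk and the page entry can be found with a shift and a mask.
 *
 * @code
 *  uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;    // chunk ID part
 *  uint32_t const iPage   = idPage & GMM_PAGEID_IDX_MASK;   // page index within the chunk
 *  PGMMCHUNK      pChunk  = gmmR0GetChunk(pGMM, idChunk);
 *  PGMMPAGE       pPage   = &pChunk->aPages[iPage];         // assumes pChunk != NULL
 *  RTHCPHYS       HCPhys  = (RTHCPHYS)pPage->Shared.pfn << GUEST_PAGE_SHIFT; // shared pages only
 * @endcode
 */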
5152
5153
5154static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5155 unsigned idxRegion, unsigned idxPage,
5156 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5157{
5158 NOREF(pModule);
5159
5160 /* Easy case: just change the internal page type. */
5161 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5162 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5163 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5164 VERR_PGM_PHYS_INVALID_PAGE_ID);
5165 NOREF(idxRegion);
5166
5167 AssertMsg(pPageDesc->GCPhys == ((RTGCPHYS)pPage->Private.pfn << GUEST_PAGE_SHIFT), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (RTGCPHYS)pPage->Private.pfn << GUEST_PAGE_SHIFT));
5168
5169 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5170
5171 /* Keep track of these references. */
5172 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5173
5174 return VINF_SUCCESS;
5175}
5176
5177/**
5178 * Checks the specified shared module range for changes.
5179 *
5180 * Performs the following tasks:
5181 * - If a shared page is new, then it changes the GMM page type to shared and
5182 * returns it in the pPageDesc descriptor.
5183 * - If a shared page already exists, then it checks if the VM page is
5184 * identical and if so frees the VM page and returns the shared page in
5185 * the pPageDesc descriptor.
5186 *
5187 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5188 *
5189 * @returns VBox status code.
5190 * @param pGVM Pointer to the GVM instance data.
5191 * @param pModule Module description
5192 * @param idxRegion Region index
5193 * @param idxPage Page index
5194 * @param pPageDesc Page descriptor
5195 */
5196GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5197 PGMMSHAREDPAGEDESC pPageDesc)
5198{
5199 int rc;
5200 PGMM pGMM;
5201 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5202 pPageDesc->u32StrictChecksum = 0;
5203
5204 AssertMsgReturn(idxRegion < pModule->cRegions,
5205 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5206 VERR_INVALID_PARAMETER);
5207
5208 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> GUEST_PAGE_SHIFT;
5209 AssertMsgReturn(idxPage < cPages,
5210 ("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
5211 VERR_INVALID_PARAMETER);
5212
5213 LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5214
5215 /*
5216 * First time; create a page descriptor array.
5217 */
5218 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5219 if (!pGlobalRegion->paidPages)
5220 {
5221 Log(("Allocate page descriptor array for %d pages\n", cPages));
5222 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5223 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5224
5225 /* Invalidate all descriptors. */
5226 uint32_t i = cPages;
5227 while (i-- > 0)
5228 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5229 }
5230
5231 /*
5232 * We've seen this shared page for the first time?
5233 */
5234 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5235 {
5236 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5237 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5238 }
5239
5240 /*
5241 * We've seen it before...
5242 */
5243 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5244 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5245 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5246
5247 /*
5248 * Get the shared page source.
5249 */
5250 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5251 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
5252 VERR_PGM_PHYS_INVALID_PAGE_ID);
5253
5254 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5255 {
5256 /*
5257 * Page was freed at some point; invalidate this entry.
5258 */
5259 /** @todo this isn't really bullet proof. */
5260 Log(("Old shared page was freed -> create a new one\n"));
5261 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5262 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5263 }
5264
5265 Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << GUEST_PAGE_SHIFT));
5266
5267 /*
5268 * Calculate the virtual address of the local page.
5269 */
5270 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5271 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5272 VERR_PGM_PHYS_INVALID_PAGE_ID);
5273
5274 uint8_t *pbChunk;
5275 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5276 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5277 VERR_PGM_PHYS_INVALID_PAGE_ID);
5278 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT);
5279
5280 /*
5281 * Calculate the virtual address of the shared page.
5282 */
5283 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5284 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5285
5286 /*
5287 * Get the virtual address of the physical page; map the chunk into the VM
5288 * process if not already done.
5289 */
5290 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5291 {
5292 Log(("Map chunk into process!\n"));
5293 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5294 AssertRCReturn(rc, rc);
5295 }
5296 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT);
5297
5298#ifdef VBOX_STRICT
5299 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, GUEST_PAGE_SIZE);
5300 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5301 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5302 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5303 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5304#endif
5305
5306 if (memcmp(pbSharedPage, pbLocalPage, GUEST_PAGE_SIZE))
5307 {
5308 Log(("Unexpected differences found between local and shared page; skip\n"));
5309 /* Signal to the caller that this one hasn't changed. */
5310 pPageDesc->idPage = NIL_GMM_PAGEID;
5311 return VINF_SUCCESS;
5312 }
5313
5314 /*
5315 * Free the old local page.
5316 */
5317 GMMFREEPAGEDESC PageDesc;
5318 PageDesc.idPage = pPageDesc->idPage;
5319 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5320 AssertRCReturn(rc, rc);
5321
5322 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5323
5324 /*
5325 * Pass along the new physical address & page id.
5326 */
5327 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << GUEST_PAGE_SHIFT;
5328 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5329
5330 return VINF_SUCCESS;
5331}
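
/*
 * Illustrative sketch (not part of the build): the in/out contract of the
 * GMMSHAREDPAGEDESC parameter used by GMMR0SharedModuleCheckPage above.  The
 * real caller is PGMR0SharedModuleCheck (see gmmR0CheckSharedModule below);
 * the pgmExampleQueryPage / pgmExampleRemapToShared helpers are hypothetical
 * stand-ins for the PGM side of that exchange.
 *
 * @code
 *  GMMSHAREDPAGEDESC PageDesc;
 *  pgmExampleQueryPage(pGVM, GCPtrPage, &PageDesc.GCPhys, &PageDesc.HCPhys, &PageDesc.idPage);
 *  int rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
 *  if (RT_SUCCESS(rc))
 *  {
 *      if (PageDesc.idPage != NIL_GMM_PAGEID)
 *      {
 *          // The page is now backed by the shared copy: repoint the guest
 *          // physical page at PageDesc.HCPhys / PageDesc.idPage.
 *          pgmExampleRemapToShared(pGVM, PageDesc.GCPhys, PageDesc.HCPhys, PageDesc.idPage);
 *      }
 *      // else: nothing changed for this page (e.g. the contents differed).
 *  }
 * @endcode
 */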
5332
5333
5334/**
5335 * RTAvlGCPtrDestroy callback.
5336 *
5337 * @returns VINF_SUCCESS.
5338 * @param pNode The node to destroy.
5339 * @param pvArgs Pointer to an argument packet.
5340 */
5341static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5342{
5343 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5344 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5345 (PGMMSHAREDMODULEPERVM)pNode,
5346 false /*fRemove*/);
5347 return VINF_SUCCESS;
5348}
5349
5350
5351/**
5352 * Used by GMMR0CleanupVM to clean up shared modules.
5353 *
5354 * This is called without taking the GMM lock so that it can be yielded as
5355 * needed here.
5356 *
5357 * @param pGMM The GMM handle.
5358 * @param pGVM The global VM handle.
5359 */
5360static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5361{
5362 gmmR0MutexAcquire(pGMM);
5363 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5364
5365 GMMR0SHMODPERVMDTORARGS Args;
5366 Args.pGVM = pGVM;
5367 Args.pGMM = pGMM;
5368 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5369
5370 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5371 pGVM->gmm.s.Stats.cShareableModules = 0;
5372
5373 gmmR0MutexRelease(pGMM);
5374}
5375
5376#endif /* VBOX_WITH_PAGE_SHARING */
5377
5378/**
5379 * Removes all shared modules for the specified VM
5380 *
5381 * @returns VBox status code.
5382 * @param pGVM The global (ring-0) VM structure.
5383 * @param idCpu The VCPU id.
5384 */
5385GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5386{
5387#ifdef VBOX_WITH_PAGE_SHARING
5388 /*
5389 * Validate input and get the basics.
5390 */
5391 PGMM pGMM;
5392 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5393 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5394 if (RT_FAILURE(rc))
5395 return rc;
5396
5397 /*
5398 * Take the semaphore and do some more validations.
5399 */
5400 gmmR0MutexAcquire(pGMM);
5401 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5402 {
5403 Log(("GMMR0ResetSharedModules\n"));
5404 GMMR0SHMODPERVMDTORARGS Args;
5405 Args.pGVM = pGVM;
5406 Args.pGMM = pGMM;
5407 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5408 pGVM->gmm.s.Stats.cShareableModules = 0;
5409
5410 rc = VINF_SUCCESS;
5411 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5412 }
5413 else
5414 rc = VERR_GMM_IS_NOT_SANE;
5415
5416 gmmR0MutexRelease(pGMM);
5417 return rc;
5418#else
5419 RT_NOREF(pGVM, idCpu);
5420 return VERR_NOT_IMPLEMENTED;
5421#endif
5422}
5423
5424#ifdef VBOX_WITH_PAGE_SHARING
5425
5426/**
5427 * Tree enumeration callback for checking a shared module.
5428 */
5429static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5430{
5431 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5432 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5433 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5434
5435 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5436 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5437
5438 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5439 if (RT_FAILURE(rc))
5440 return rc;
5441 return VINF_SUCCESS;
5442}
5443
5444#endif /* VBOX_WITH_PAGE_SHARING */
5445
5446/**
5447 * Checks all shared modules for the specified VM.
5448 *
5449 * @returns VBox status code.
5450 * @param pGVM The global (ring-0) VM structure.
5451 * @param idCpu The calling EMT number.
5452 * @thread EMT(idCpu)
5453 */
5454GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5455{
5456#ifdef VBOX_WITH_PAGE_SHARING
5457 /*
5458 * Validate input and get the basics.
5459 */
5460 PGMM pGMM;
5461 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5462 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5463 if (RT_FAILURE(rc))
5464 return rc;
5465
5466# ifndef DEBUG_sandervl
5467 /*
5468 * Take the semaphore and do some more validations.
5469 */
5470 gmmR0MutexAcquire(pGMM);
5471# endif
5472 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5473 {
5474 /*
5475 * Walk the tree, checking each module.
5476 */
5477 Log(("GMMR0CheckSharedModules\n"));
5478
5479 GMMCHECKSHAREDMODULEINFO Args;
5480 Args.pGVM = pGVM;
5481 Args.idCpu = idCpu;
5482 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5483
5484 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5485 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5486 }
5487 else
5488 rc = VERR_GMM_IS_NOT_SANE;
5489
5490# ifndef DEBUG_sandervl
5491 gmmR0MutexRelease(pGMM);
5492# endif
5493 return rc;
5494#else
5495 RT_NOREF(pGVM, idCpu);
5496 return VERR_NOT_IMPLEMENTED;
5497#endif
5498}
5499
5500#ifdef VBOX_STRICT
5501
5502/**
5503 * Worker for GMMR0FindDuplicatePageReq.
5504 *
5505 * @returns true if duplicate, false if not.
5506 */
5507static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5508{
5509 bool fFoundDuplicate = false;
5510 /* Only take chunks not mapped into this VM process; not entirely correct. */
5511 uint8_t *pbChunk;
5512 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5513 {
5514 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5515 if (RT_SUCCESS(rc))
5516 {
5517 /*
5518 * Look for duplicate pages
5519 */
5520 uintptr_t iPage = GMM_CHUNK_NUM_PAGES;
5521 while (iPage-- > 0)
5522 {
5523 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5524 {
5525 uint8_t *pbDestPage = pbChunk + (iPage << GUEST_PAGE_SHIFT);
5526 if (!memcmp(pbSourcePage, pbDestPage, GUEST_PAGE_SIZE))
5527 {
5528 fFoundDuplicate = true;
5529 break;
5530 }
5531 }
5532 }
5533 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5534 }
5535 }
5536 return fFoundDuplicate;
5537}
5538
5539
5540/**
5541 * Finds a duplicate of the specified page in other active VMs.
5542 *
5543 * @returns VBox status code.
5544 * @param pGVM The global (ring-0) VM structure.
5545 * @param pReq Pointer to the request packet.
5546 */
5547GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5548{
5549 /*
5550 * Validate input and pass it on.
5551 */
5552 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5553 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5554
5555 PGMM pGMM;
5556 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5557
5558 int rc = GVMMR0ValidateGVM(pGVM);
5559 if (RT_FAILURE(rc))
5560 return rc;
5561
5562 /*
5563 * Take the semaphore and do some more validations.
5564 */
5565 rc = gmmR0MutexAcquire(pGMM);
5566 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5567 {
5568 uint8_t *pbChunk;
5569 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5570 if (pChunk)
5571 {
5572 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5573 {
5574 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT);
5575 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5576 if (pPage)
5577 {
5578 /*
5579 * Walk the chunks
5580 */
5581 pReq->fDuplicate = false;
5582 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5583 {
5584 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5585 {
5586 pReq->fDuplicate = true;
5587 break;
5588 }
5589 }
5590 }
5591 else
5592 {
5593 AssertFailed();
5594 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5595 }
5596 }
5597 else
5598 AssertFailed();
5599 }
5600 else
5601 AssertFailed();
5602 }
5603 else
5604 rc = VERR_GMM_IS_NOT_SANE;
5605
5606 gmmR0MutexRelease(pGMM);
5607 return rc;
5608}
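
/*
 * Illustrative sketch (not part of the build, strict builds only): preparing
 * the GMMFINDDUPLICATEPAGEREQ packet for the request wrapper above.  Only the
 * fields the wrapper dereferences are shown; the SUPVMMR0REQHDR magic setup
 * is an assumption.
 *
 * @code
 *  GMMFINDDUPLICATEPAGEREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  Req.idPage       = idPage;                  // page ID to probe for duplicates
 *  Req.fDuplicate   = false;
 *  int rc = GMMR0FindDuplicatePageReq(pGVM, &Req);
 *  if (RT_SUCCESS(rc) && Req.fDuplicate)
 *      Log(("page %#x has an identical private copy in another chunk\n", idPage));
 * @endcode
 */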
5609
5610#endif /* VBOX_STRICT */
5611
5612
5613/**
5614 * Retrieves the GMM statistics visible to the caller.
5615 *
5616 * @returns VBox status code.
5617 *
5618 * @param pStats Where to put the statistics.
5619 * @param pSession The current session.
5620 * @param pGVM The GVM to obtain statistics for. Optional.
5621 */
5622GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5623{
5624 LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5625
5626 /*
5627 * Validate input.
5628 */
5629 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5630 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5631 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5632
5633 PGMM pGMM;
5634 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5635
5636 /*
5637 * Validate the VM handle, if not NULL, and lock the GMM.
5638 */
5639 int rc;
5640 if (pGVM)
5641 {
5642 rc = GVMMR0ValidateGVM(pGVM);
5643 if (RT_FAILURE(rc))
5644 return rc;
5645 }
5646
5647 rc = gmmR0MutexAcquire(pGMM);
5648 if (RT_FAILURE(rc))
5649 return rc;
5650
5651 /*
5652 * Copy out the GMM statistics.
5653 */
5654 pStats->cMaxPages = pGMM->cMaxPages;
5655 pStats->cReservedPages = pGMM->cReservedPages;
5656 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5657 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5658 pStats->cSharedPages = pGMM->cSharedPages;
5659 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5660 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5661 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5662 pStats->cChunks = pGMM->cChunks;
5663 pStats->cFreedChunks = pGMM->cFreedChunks;
5664 pStats->cShareableModules = pGMM->cShareableModules;
5665 pStats->idFreeGeneration = pGMM->idFreeGeneration;
5666 RT_ZERO(pStats->au64Reserved);
5667
5668 /*
5669 * Copy out the VM statistics.
5670 */
5671 if (pGVM)
5672 pStats->VMStats = pGVM->gmm.s.Stats;
5673 else
5674 RT_ZERO(pStats->VMStats);
5675
5676 gmmR0MutexRelease(pGMM);
5677 return rc;
5678}
5679
5680
5681/**
5682 * VMMR0 request wrapper for GMMR0QueryStatistics.
5683 *
5684 * @returns see GMMR0QueryStatistics.
5685 * @param pGVM The global (ring-0) VM structure. Optional.
5686 * @param pReq Pointer to the request packet.
5687 */
5688GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5689{
5690 /*
5691 * Validate input and pass it on.
5692 */
5693 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5694 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5695
5696 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5697}
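
/*
 * Illustrative sketch (not part of the build): querying the GMM statistics
 * through the request wrapper above.  Only fields this file dereferences are
 * touched; the SUPVMMR0REQHDR magic setup is an assumption.
 *
 * @code
 *  GMMQUERYSTATISTICSSREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  Req.pSession     = pSession;                    // current support driver session
 *  int rc = GMMR0QueryStatisticsReq(pGVM, &Req);   // pGVM is optional; NULL gives global stats only
 *  if (RT_SUCCESS(rc))
 *  {
 *      // Req.Stats now holds the global counters (cAllocatedPages, cSharedPages,
 *      // cBalloonedPages, ...) and Req.Stats.VMStats the per-VM ones when pGVM was given.
 *  }
 * @endcode
 */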
5698
5699
5700/**
5701 * Resets the specified GMM statistics.
5702 *
5703 * @returns VBox status code.
5704 *
5705 * @param pStats Which statistics to reset, that is, non-zero fields
5706 * indicates which to reset.
5707 * @param pSession The current session.
5708 * @param pGVM The GVM to reset statistics for. Optional.
5709 */
5710GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5711{
5712 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5713 /* Nothing that can be reset at the moment. */
5714 return VINF_SUCCESS;
5715}
5716
5717
5718/**
5719 * VMMR0 request wrapper for GMMR0ResetStatistics.
5720 *
5721 * @returns see GMMR0ResetStatistics.
5722 * @param pGVM The global (ring-0) VM structure. Optional.
5723 * @param pReq Pointer to the request packet.
5724 */
5725GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5726{
5727 /*
5728 * Validate input and pass it on.
5729 */
5730 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5731 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5732
5733 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5734}
5735