GMMR0.cpp@ 46543

Last change on this file since 46543 was 44716, checked in by vboxsync, 12 years ago
GMMR0: Adjusting the allocation strategy to go look for foreign memory if there are enough free pages around (>= 32 MB).

1	/* $Id: GMMR0.cpp 44716 2013-02-15 14:38:53Z vboxsync $ */
2	/** @file
3	* GMM - Global Memory Manager.
4	*/
5
6	/*
7	* Copyright (C) 2007-2013 Oracle Corporation
8	*
9	* This file is part of VirtualBox Open Source Edition (OSE), as
10	* available from http://www.virtualbox.org. This file is free software;
11	* you can redistribute it and/or modify it under the terms of the GNU
12	* General Public License (GPL) as published by the Free Software
13	* Foundation, in version 2 as it comes in the "COPYING" file of the
14	* VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15	* hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16	*/
17
18
19	/** @page pg_gmm GMM - The Global Memory Manager
20	*
21	* As the name indicates, this component is responsible for global memory
22	* management. Currently only guest RAM is allocated from the GMM, but this
23	* may change to include shadow page tables and other bits later.
24	*
25	* Guest RAM is managed as individual pages, but allocated from the host OS
26	* in chunks for reasons of portability / efficiency. To minimize the memory
27	* footprint all tracking structures must be as small as possible without
28	* unnecessary performance penalties.
29	*
30	* The allocation chunks have a fixed size, defined at compile time
31	* by the #GMM_CHUNK_SIZE \#define.
32	*
33	* Each chunk is given a unique ID. Each page also has a unique ID. The
34	* relationship between the two IDs is:
35	* @code
36	* GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37	* idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38	* @endcode
39	* Where iPage is the index of the page within the chunk. This ID scheme
40	* permits efficient chunk and page lookup, but it relies on the chunk size
41	* being set at compile time. The chunks are organized in an AVL tree with their
42	* IDs being the keys.
43	*
44	* The physical address of each page in an allocation chunk is maintained by
45	* the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46	* need to duplicate this information (it would cost 8 bytes per page if we did).
47	*
48	* So what do we need to track per page? Most importantly we need to know
49	* which state the page is in:
50	* - Private - Allocated for (eventually) backing one particular VM page.
51	* - Shared - Readonly page that is used by one or more VMs and treated
52	* as COW by PGM.
53	* - Free - Not used by anyone.
54	*
55	* For the page replacement operations (sharing, defragmenting and freeing)
56	* to be somewhat efficient, private pages need to be associated with a
57	* particular page in a particular VM.
58	*
59	* Tracking the usage of shared pages is impractical and expensive, so we'll
60	* settle for a reference counting system instead.
61	*
62	* Free pages will be chained on LIFOs.
63	*
64	* On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65	* systems a 32-bit bitfield will have to suffice because of address space
66	* limitations. The #GMMPAGE structure shows the details.
67	*
68	*
69	* @section sec_gmm_alloc_strat Page Allocation Strategy
70	*
71	* The strategy for allocating pages has to take fragmentation and shared
72	* pages into account, or we may end up with 2000 chunks with only
73	* a few pages in each. Shared pages cannot easily be reallocated because
74	* of the inaccurate usage accounting (see above). Private pages can be
75	* reallocated by a defragmentation thread in the same manner that sharing
76	* is done.
77	*
78	* The first approach is to manage the free pages in two sets depending on
79	* whether they are mainly for the allocation of shared or private pages.
80	* In the initial implementation there will be almost no possibility for
81	* mixing shared and private pages in the same chunk (only if we're really
82	* stressed on memory), but when we implement forking of VMs and have to
83	* deal with lots of COW pages it'll start getting kind of interesting.
84	*
85	* The sets are lists of chunks with approximately the same number of
86	* free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87	* consists of 16 lists. So, the first list will contain the chunks with
88	* 1-16 free pages, the second covers 17-32, and so on. The chunks will be
89	* moved between the lists as pages are freed up or allocated.
90	*
91	*
92	* @section sec_gmm_costs Costs
93	*
94	* The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95	* entails. In addition there is the chunk cost of approximately
96	* (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97	*
98	* On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
99	* and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100	* The cost on Linux is identical, but here it's because of sizeof(struct page *).
101	*
102	*
103	* @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104	*
105	* In legacy mode the page source is locked user pages and not
106	* #RTR0MemObjAllocPhysNC, this means that a page can only be allocated
107	* by the VM that locked it. We will make no attempt at implementing
108	* page sharing on these systems, just do enough to make it all work.
109	*
110	*
111	* @subsection sub_gmm_locking Serializing
112	*
113	* One simple fast mutex will be employed in the initial implementation, not
114	* two as mentioned in @ref subsec_pgmPhys_Serializing.
115	*
116	* @see @ref subsec_pgmPhys_Serializing
117	*
118	*
119	* @section sec_gmm_overcommit Memory Over-Commitment Management
120	*
121	* The GVM will have to do the system wide memory over-commitment
122	* management. My current ideas are:
123	* - Per VM oc policy that indicates how much to initially commit
124	* to it and what to do in an out-of-memory situation.
125	* - Prevent overtaxing the host.
126	*
127	* There are some challenges here, the main ones are configurability and
128	* security. Should we for instance permit anyone to request 100% memory
129	* commitment? Who should be allowed to do runtime adjustments of the
130	* config? And how to prevent these settings from being lost when the last
131	* VM process exits? The solution is probably to have an optional root
132	* daemon that will keep VMMR0.r0 in memory and enable the security measures.
133	*
134	*
135	*
136	* @section sec_gmm_numa NUMA
137	*
138	* NUMA considerations will be designed and implemented a bit later.
139	*
140	* The preliminary guess is that we will have to try to allocate memory as
141	* close as possible to the CPUs the VM is executed on (EMT and additional CPU
142	* threads). Which means it's mostly about allocation and sharing policies.
143	* Both the scheduler and allocator interface will have to supply some NUMA info
144	* and we'll need to have a way to calculate access costs.
145	*
146	*/
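
The page ID relation described in the overview comment can be sketched as below. This is only an illustration under assumed values (4 KB pages, 2 MB chunks); the EX_ constants and the ex* helpers are hypothetical stand-ins for PAGE_SHIFT, GMM_CHUNK_SIZE and GMM_CHUNK_SHIFT, whose real definitions are not part of this listing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example values, not the real GMM configuration. */
#define EX_PAGE_SHIFT   12u                     /* 4 KB pages */
#define EX_CHUNK_SIZE   (2u * 1024u * 1024u)    /* 2 MB chunks */
#define EX_CHUNK_SHIFT  9u                      /* log2(EX_CHUNK_SIZE / page size) */
#define EX_PAGE_MASK    ((1u << EX_CHUNK_SHIFT) - 1u)

/* Compose a page ID from a chunk ID and the page index within that chunk. */
static uint32_t exMakePageId(uint32_t idChunk, uint32_t iPage)
{
    assert(iPage <= EX_PAGE_MASK);
    return (idChunk << EX_CHUNK_SHIFT) | iPage;
}

/* Recover the chunk ID (the AVL tree key) from a page ID. */
static uint32_t exPageIdToChunkId(uint32_t idPage)
{
    return idPage >> EX_CHUNK_SHIFT;
}

/* Recover the index of the page within its chunk. */
static uint32_t exPageIdToPageIdx(uint32_t idPage)
{
    return idPage & EX_PAGE_MASK;
}
```

Because the shift is a compile-time constant, both directions are a single shift or mask, which is why the scheme depends on a fixed chunk size.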
147
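
The free set described under "Page Allocation Strategy" groups chunks into lists by their free page count. A minimal sketch of picking the list, assuming the 256 pages per chunk and 16 lists from that example, so that each list spans 16 free-page counts. The formula and all EX_ names are assumptions for illustration; the actual selection code is not shown in this part of the file.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGES_PER_CHUNK 256u
#define EX_FREE_LISTS      16u
#define EX_PAGES_PER_LIST  (EX_PAGES_PER_CHUNK / EX_FREE_LISTS)

/* Map a chunk's free page count (1..256) to a list index (0..15).
 * Chunks without free pages are not in any list, hence the lower bound. */
static unsigned exFreeListIndex(uint16_t cFree)
{
    assert(cFree >= 1 && cFree <= EX_PAGES_PER_CHUNK);
    return (cFree - 1u) / EX_PAGES_PER_LIST;
}
```

A chunk is relinked only when an allocation or free moves its count across a list boundary, so most page operations leave the lists untouched.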
148
149	/*******************************************************************************
150	* Header Files *
151	*******************************************************************************/
152	#define LOG_GROUP LOG_GROUP_GMM
153	#include <VBox/rawpci.h>
154	#include <VBox/vmm/vm.h>
155	#include <VBox/vmm/gmm.h>
156	#include "GMMR0Internal.h"
157	#include <VBox/vmm/gvm.h>
158	#include <VBox/vmm/pgm.h>
159	#include <VBox/log.h>
160	#include <VBox/param.h>
161	#include <VBox/err.h>
162	#include <iprt/asm.h>
163	#include <iprt/avl.h>
164	#ifdef VBOX_STRICT
165	# include <iprt/crc.h>
166	#endif
167	#include <iprt/list.h>
168	#include <iprt/mem.h>
169	#include <iprt/memobj.h>
170	#include <iprt/mp.h>
171	#include <iprt/semaphore.h>
172	#include <iprt/string.h>
173	#include <iprt/time.h>
174
175
176	/*******************************************************************************
177	* Structures and Typedefs *
178	*******************************************************************************/
179	/** Pointer to set of free chunks. */
180	typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
181
182	/**
183	* The per-page tracking structure employed by the GMM.
184	*
185	* On 32-bit hosts some trickery is necessary to compress all
186	* the information into 32-bits. When the fSharedFree member is set,
187	* the 30th bit decides whether it's a free page or not.
188	*
189	* Because of the different layout on 32-bit and 64-bit hosts, macros
190	* are used to get and set some of the data.
191	*/
192	typedef union GMMPAGE
193	{
194	#if HC_ARCH_BITS == 64
195	/** Unsigned integer view. */
196	uint64_t u;
197
198	/** The common view. */
199	struct GMMPAGECOMMON
200	{
201	uint32_t uStuff1 : 32;
202	uint32_t uStuff2 : 30;
203	/** The page state. */
204	uint32_t u2State : 2;
205	} Common;
206
207	/** The view of a private page. */
208	struct GMMPAGEPRIVATE
209	{
210	/** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
211	uint32_t pfn;
212	/** The GVM handle. (64K VMs) */
213	uint32_t hGVM : 16;
214	/** Reserved. */
215	uint32_t u16Reserved : 14;
216	/** The page state. */
217	uint32_t u2State : 2;
218	} Private;
219
220	/** The view of a shared page. */
221	struct GMMPAGESHARED
222	{
223	/** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
224	uint32_t pfn;
225	/** The reference count (64K VMs). */
226	uint32_t cRefs : 16;
227	/** Used for debug checksumming. */
228	uint32_t u14Checksum : 14;
229	/** The page state. */
230	uint32_t u2State : 2;
231	} Shared;
232
233	/** The view of a free page. */
234	struct GMMPAGEFREE
235	{
236	/** The index of the next page in the free list. UINT16_MAX is NIL. */
237	uint16_t iNext;
238	/** Reserved. Checksum or something? */
239	uint16_t u16Reserved0;
240	/** Reserved. Checksum or something? */
241	uint32_t u30Reserved1 : 30;
242	/** The page state. */
243	uint32_t u2State : 2;
244	} Free;
245
246	#else /* 32-bit */
247	/** Unsigned integer view. */
248	uint32_t u;
249
250	/** The common view. */
251	struct GMMPAGECOMMON
252	{
253	uint32_t uStuff : 30;
254	/** The page state. */
255	uint32_t u2State : 2;
256	} Common;
257
258	/** The view of a private page. */
259	struct GMMPAGEPRIVATE
260	{
261	/** The guest page frame number. (Max addressable: 2 ^ 36) */
262	uint32_t pfn : 24;
263	/** The GVM handle. (127 VMs) */
264	uint32_t hGVM : 7;
265	/** The top page state bit, MBZ. */
266	uint32_t fZero : 1;
267	} Private;
268
269	/** The view of a shared page. */
270	struct GMMPAGESHARED
271	{
272	/** The reference count. */
273	uint32_t cRefs : 30;
274	/** The page state. */
275	uint32_t u2State : 2;
276	} Shared;
277
278	/** The view of a free page. */
279	struct GMMPAGEFREE
280	{
281	/** The index of the next page in the free list. UINT16_MAX is NIL. */
282	uint32_t iNext : 16;
283	/** Reserved. Checksum or something? */
284	uint32_t u14Reserved : 14;
285	/** The page state. */
286	uint32_t u2State : 2;
287	} Free;
288	#endif
289	} GMMPAGE;
290	AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
291	/** Pointer to a GMMPAGE. */
292	typedef GMMPAGE *PGMMPAGE;
293
294
295	/** @name The Page States.
296	* @{ */
297	/** A private page. */
298	#define GMM_PAGE_STATE_PRIVATE 0
299	/** A private page - alternative value used on the 32-bit implementation.
300	* This will never be used on 64-bit hosts. */
301	#define GMM_PAGE_STATE_PRIVATE_32 1
302	/** A shared page. */
303	#define GMM_PAGE_STATE_SHARED 2
304	/** A free page. */
305	#define GMM_PAGE_STATE_FREE 3
306	/** @} */
307
308
309	/** @def GMM_PAGE_IS_PRIVATE
310	*
311	* @returns true if private, false if not.
312	* @param pPage The GMM page.
313	*/
314	#if HC_ARCH_BITS == 64
315	# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
316	#else
317	# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
318	#endif
319
320	/** @def GMM_PAGE_IS_SHARED
321	*
322	* @returns true if shared, false if not.
323	* @param pPage The GMM page.
324	*/
325	#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
326
327	/** @def GMM_PAGE_IS_FREE
328	*
329	* @returns true if free, false if not.
330	* @param pPage The GMM page.
331	*/
332	#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
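
The state checks above read a 2-bit field through a bitfield view. Since bitfield layout is implementation-defined in C, the sketch below uses explicit shifts and masks instead, assuming the state occupies the top two bits of the 64-bit word (which is where the common compilers on little-endian hosts place the last declared bitfield). All EX_ and ex* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_STATE_PRIVATE 0u
#define EX_STATE_SHARED  2u
#define EX_STATE_FREE    3u
#define EX_STATE_SHIFT   62     /* assumed position of the 2-bit state field */

/* Return the page word with its state field replaced and the payload kept. */
static uint64_t exSetState(uint64_t u, unsigned uState)
{
    assert(uState <= 3u);
    return (u & ~(UINT64_C(3) << EX_STATE_SHIFT)) | ((uint64_t)uState << EX_STATE_SHIFT);
}

/* Extract the 2-bit state from the page word. */
static unsigned exGetState(uint64_t u)
{
    return (unsigned)(u >> EX_STATE_SHIFT);
}

/* Equivalent of a GMM_PAGE_IS_FREE style check on the raw word. */
static bool exIsFree(uint64_t u)
{
    return exGetState(u) == EX_STATE_FREE;
}
```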
333
334	/** @def GMM_PAGE_PFN_LAST
335	* The last valid guest pfn.
336	* @remark Some of the values outside the range have special meaning,
337	* see GMM_PAGE_PFN_UNSHAREABLE.
338	*/
339	#if HC_ARCH_BITS == 64
340	# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
341	#else
342	# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
343	#endif
344	AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
345
346	/** @def GMM_PAGE_PFN_UNSHAREABLE
347	* Indicates that this page isn't used for normal guest memory and thus isn't shareable.
348	*/
349	#if HC_ARCH_BITS == 64
350	# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
351	#else
352	# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
353	#endif
354	AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
355
356
357	/**
358	* A GMM allocation chunk ring-3 mapping record.
359	*
360	* This should really be associated with a session and not a VM, but
361	* it's simpler to associate it with a VM and clean up when the VM object
362	* is destroyed.
363	*/
364	typedef struct GMMCHUNKMAP
365	{
366	/** The mapping object. */
367	RTR0MEMOBJ hMapObj;
368	/** The VM owning the mapping. */
369	PGVM pGVM;
370	} GMMCHUNKMAP;
371	/** Pointer to a GMM allocation chunk mapping. */
372	typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
373
374
375	/**
376	* A GMM allocation chunk.
377	*/
378	typedef struct GMMCHUNK
379	{
380	/** The AVL node core.
381	* The Key is the chunk ID. (Giant mtx.) */
382	AVLU32NODECORE Core;
383	/** The memory object.
384	* Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
385	* what the host can dish up. (Chunk mtx protects mapping accesses
386	* and related frees.) */
387	RTR0MEMOBJ hMemObj;
388	/** Pointer to the next chunk in the free list. (Giant mtx.) */
389	PGMMCHUNK pFreeNext;
390	/** Pointer to the previous chunk in the free list. (Giant mtx.) */
391	PGMMCHUNK pFreePrev;
392	/** Pointer to the free set this chunk belongs to. NULL for
393	* chunks with no free pages. (Giant mtx.) */
394	PGMMCHUNKFREESET pSet;
395	/** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
396	RTLISTNODE ListNode;
397	/** Pointer to an array of mappings. (Chunk mtx.) */
398	PGMMCHUNKMAP paMappingsX;
399	/** The number of mappings. (Chunk mtx.) */
400	uint16_t cMappingsX;
401	/** The mapping lock this chunk is using. UINT8_MAX if nobody is
402	* mapping or freeing anything. (Giant mtx.) */
403	uint8_t volatile iChunkMtx;
404	/** Flags field reserved for future use (like eliminating enmType).
405	* (Giant mtx.) */
406	uint8_t fFlags;
407	/** The head of the list of free pages. UINT16_MAX is the NIL value.
408	* (Giant mtx.) */
409	uint16_t iFreeHead;
410	/** The number of free pages. (Giant mtx.) */
411	uint16_t cFree;
412	/** The GVM handle of the VM that first allocated pages from this chunk, this
413	* is used as a preference when there are several chunks to choose from.
414	* When in bound memory mode this isn't a preference any longer. (Giant
415	* mtx.) */
416	uint16_t hGVM;
417	/** The ID of the NUMA node the memory mostly resides on. (Reserved for
418	* future use.) (Giant mtx.) */
419	uint16_t idNumaNode;
420	/** The number of private pages. (Giant mtx.) */
421	uint16_t cPrivate;
422	/** The number of shared pages. (Giant mtx.) */
423	uint16_t cShared;
424	/** The pages. (Giant mtx.) */
425	GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
426	} GMMCHUNK;
427
428	/** Indicates that the NUMA properties of the memory are unknown. */
429	#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
430
431	/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
432	* @{ */
433	/** Indicates that the chunk is a large page (2MB). */
434	#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
435	/** @} */
436
437
438	/**
439	* An allocation chunk TLB entry.
440	*/
441	typedef struct GMMCHUNKTLBE
442	{
443	/** The chunk id. */
444	uint32_t idChunk;
445	/** Pointer to the chunk. */
446	PGMMCHUNK pChunk;
447	} GMMCHUNKTLBE;
448	/** Pointer to an allocation chunk TLB entry. */
449	typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
450
451
452	/** The number of entries in the allocation chunk TLB. */
453	#define GMM_CHUNKTLB_ENTRIES 32
454	/** Gets the TLB entry index for the given Chunk ID. */
455	#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
456
457	/**
458	* An allocation chunk TLB.
459	*/
460	typedef struct GMMCHUNKTLB
461	{
462	/** The TLB entries. */
463	GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
464	} GMMCHUNKTLB;
465	/** Pointer to an allocation chunk TLB. */
466	typedef GMMCHUNKTLB *PGMMCHUNKTLB;
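
The chunk TLB above is a small direct-mapped cache in front of the AVL tree: the low bits of the chunk ID select the slot. A minimal sketch of how such a cache could be consulted and refilled, using hypothetical EX types rather than the real GMMCHUNK structures; the actual lookup routine is not part of this listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_TLB_ENTRIES 32u      /* must be a power of two for the mask to work */
#define EX_TLB_IDX(id) ((id) & (EX_TLB_ENTRIES - 1u))

typedef struct EXCHUNK { uint32_t idChunk; } EXCHUNK;
typedef struct EXTLBE  { uint32_t idChunk; EXCHUNK *pChunk; } EXTLBE;
typedef struct EXTLB   { EXTLBE aEntries[EX_TLB_ENTRIES]; } EXTLB;

/* Return the cached chunk, or NULL on a miss (the caller would then
 * fall back to the tree lookup and call exTlbInsert). */
static EXCHUNK *exTlbLookup(const EXTLB *pTlb, uint32_t idChunk)
{
    const EXTLBE *pEntry = &pTlb->aEntries[EX_TLB_IDX(idChunk)];
    return pEntry->idChunk == idChunk ? pEntry->pChunk : NULL;
}

/* Cache a chunk, evicting whatever occupied its slot before. */
static void exTlbInsert(EXTLB *pTlb, EXCHUNK *pChunk)
{
    EXTLBE *pEntry = &pTlb->aEntries[EX_TLB_IDX(pChunk->idChunk)];
    pEntry->idChunk = pChunk->idChunk;
    pEntry->pChunk  = pChunk;
}
```

Two chunk IDs that differ by a multiple of 32 share a slot, so inserting one evicts the other; the tree remains the authoritative index.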
467
468
469	/**
470	* The GMM instance data.
471	*/
472	typedef struct GMM
473	{
474	/** Magic / eye catcher. GMM_MAGIC */
475	uint32_t u32Magic;
476	/** The number of threads waiting on the mutex. */
477	uint32_t cMtxContenders;
478	/** The fast mutex protecting the GMM.
479	* More fine grained locking can be implemented later if necessary. */
480	RTSEMFASTMUTEX hMtx;
481	#ifdef VBOX_STRICT
482	/** The current mutex owner. */
483	RTNATIVETHREAD hMtxOwner;
484	#endif
485	/** The chunk tree. */
486	PAVLU32NODECORE pChunks;
487	/** The chunk TLB. */
488	GMMCHUNKTLB ChunkTLB;
489	/** The private free set. */
490	GMMCHUNKFREESET PrivateX;
491	/** The shared free set. */
492	GMMCHUNKFREESET Shared;
493
494	/** Shared module tree (global).
495	* @todo separate trees for distinctly different guest OSes. */
496	PAVLLU32NODECORE pGlobalSharedModuleTree;
497	/** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
498	uint32_t cShareableModules;
499
500	/** The chunk list. For simplifying the cleanup process. */
501	RTLISTANCHOR ChunkList;
502
503	/** The maximum number of pages we're allowed to allocate.
504	* @gcfgm 64-bit GMM/MaxPages Direct.
505	* @gcfgm 32-bit GMM/PctPages Relative to the number of host pages. */
506	uint64_t cMaxPages;
507	/** The number of pages that have been reserved.
508	* The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
509	uint64_t cReservedPages;
510	/** The number of pages that we have over-committed in reservations. */
511	uint64_t cOverCommittedPages;
512	/** The number of actually allocated (committed if you like) pages. */
513	uint64_t cAllocatedPages;
514	/** The number of pages that are shared. A subset of cAllocatedPages. */
515	uint64_t cSharedPages;
516	/** The number of pages that are actually shared between VMs. */
517	uint64_t cDuplicatePages;
518	/** The number of pages that are shared that have been left behind by
519	* VMs not doing proper cleanups. */
520	uint64_t cLeftBehindSharedPages;
521	/** The number of allocation chunks.
522	* (The number of pages we've allocated from the host can be derived from this.) */
523	uint32_t cChunks;
524	/** The number of current ballooned pages. */
525	uint64_t cBalloonedPages;
526
527	/** The legacy allocation mode indicator.
528	* This is determined at initialization time. */
529	bool fLegacyAllocationMode;
530	/** The bound memory mode indicator.
531	* When set, the memory will be bound to a specific VM and never
532	* shared. This is always set if fLegacyAllocationMode is set.
533	* (Also determined at initialization time.) */
534	bool fBoundMemoryMode;
535	/** The number of registered VMs. */
536	uint16_t cRegisteredVMs;
537
538	/** The number of freed chunks ever. This is used as a list generation to
539	* avoid restarting the cleanup scanning when the list wasn't modified. */
540	uint32_t cFreedChunks;
541	/** The previously allocated Chunk ID.
542	* Used as a hint to avoid scanning the whole bitmap. */
543	uint32_t idChunkPrev;
544	/** Chunk ID allocation bitmap.
545	* Bits of allocated IDs are set, free ones are clear.
546	* The NIL id (0) is marked allocated. */
547	uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
548
549	/** The index of the next mutex to use. */
550	uint32_t iNextChunkMtx;
551	/** Chunk locks for reducing lock contention without having to allocate
552	* one lock per chunk. */
553	struct
554	{
555	/** The mutex */
556	RTSEMFASTMUTEX hMtx;
557	/** The number of threads currently using this mutex. */
558	uint32_t volatile cUsers;
559	} aChunkMtx[64];
560	} GMM;
561	/** Pointer to the GMM instance. */
562	typedef GMM *PGMM;
563
564	/** The value of GMM::u32Magic (Katsuhiro Otomo). */
565	#define GMM_MAGIC UINT32_C(0x19540414)
566
567
568	/**
569	* GMM chunk mutex state.
570	*
571	* This is returned by gmmR0ChunkMutexAcquire and is used by the other
572	* gmmR0ChunkMutex* methods.
573	*/
574	typedef struct GMMR0CHUNKMTXSTATE
575	{
576	PGMM pGMM;
577	/** The index of the chunk mutex. */
578	uint8_t iChunkMtx;
579	/** The relevant flags (GMMR0CHUNK_MTX_XXX). */
580	uint8_t fFlags;
581	} GMMR0CHUNKMTXSTATE;
582	/** Pointer to a chunk mutex state. */
583	typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
584
585	/** @name GMMR0CHUNK_MTX_XXX
586	* @{ */
587	#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
588	#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
589	#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
590	#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
591	#define GMMR0CHUNK_MTX_END UINT32_C(4)
592	/** @} */
593
594
595	/** The maximum number of shared modules per-vm. */
596	#define GMM_MAX_SHARED_PER_VM_MODULES 2048
597	/** The maximum number of shared modules GMM is allowed to track. */
598	#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
599
600
601	/**
602	* Argument packet for gmmR0SharedModuleCleanup.
603	*/
604	typedef struct GMMR0SHMODPERVMDTORARGS
605	{
606	PGVM pGVM;
607	PGMM pGMM;
608	} GMMR0SHMODPERVMDTORARGS;
609
610	/**
611	* Argument packet for gmmR0CheckSharedModule.
612	*/
613	typedef struct GMMCHECKSHAREDMODULEINFO
614	{
615	PGVM pGVM;
616	VMCPUID idCpu;
617	} GMMCHECKSHAREDMODULEINFO;
618
619	/**
620	* Argument packet for gmmR0FindDupPageInChunk by GMMR0FindDuplicatePage.
621	*/
622	typedef struct GMMFINDDUPPAGEINFO
623	{
624	PGVM pGVM;
625	PGMM pGMM;
626	uint8_t *pSourcePage;
627	bool fFoundDuplicate;
628	} GMMFINDDUPPAGEINFO;
629
630
631	/*******************************************************************************
632	* Global Variables *
633	*******************************************************************************/
634	/** Pointer to the GMM instance data. */
635	static PGMM g_pGMM = NULL;
636
637	/** Macro for obtaining and validating the g_pGMM pointer.
638	*
639	* On failure it will return from the invoking function with the specified
640	* return value.
641	*
642	* @param pGMM The name of the pGMM variable.
643	* @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
644	* status codes.
645	*/
646	#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
647	do { \
648	(pGMM) = g_pGMM; \
649	AssertPtrReturn((pGMM), (rc)); \
650	AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
651	} while (0)
652
653	/** Macro for obtaining and validating the g_pGMM pointer, void function
654	* variant.
655	*
656	* On failure it will return from the invoking function.
657	*
658	* @param pGMM The name of the pGMM variable.
659	*/
660	#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
661	do { \
662	(pGMM) = g_pGMM; \
663	AssertPtrReturnVoid((pGMM)); \
664	AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
665	} while (0)
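
The two macros above fetch a global pointer, validate it, and return from the *calling* function on failure. The do { ... } while (0) wrapper makes the multi-statement macro behave like a single statement. A stripped-down sketch of the same pattern with plain checks in place of the IPRT assertion macros; every EX name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAGIC UINT32_C(0x19540414)

typedef struct EXINSTANCE { uint32_t u32Magic; } EXINSTANCE;

/* Hypothetical global instance pointer, NULL until initialized. */
static EXINSTANCE *g_pExInstance = NULL;

/* Loads the global into pInst; returns rc from the invoking function
 * if the pointer is NULL or the magic does not match. */
#define EX_GET_VALID_INSTANCE(pInst, rc) \
    do { \
        (pInst) = g_pExInstance; \
        if (!(pInst) || (pInst)->u32Magic != EX_MAGIC) \
            return (rc); \
    } while (0)

/* Example entry point: fails with -1 unless a valid instance exists. */
static int exQuery(void)
{
    EXINSTANCE *pInst;
    EX_GET_VALID_INSTANCE(pInst, -1);
    return 0;
}
```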
666
667
668	/** @def GMM_CHECK_SANITY_UPON_ENTERING
669	* Checks the sanity of the GMM instance data before making changes.
670	*
671	* This macro is a stub by default and must be enabled manually in the code.
672	*
673	* @returns true if sane, false if not.
674	* @param pGMM The name of the pGMM variable.
675	*/
676	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
677	# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
678	#else
679	# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
680	#endif
681
682	/** @def GMM_CHECK_SANITY_UPON_LEAVING
683	* Checks the sanity of the GMM instance data after making changes.
684	*
685	* This macro is a stub by default and must be enabled manually in the code.
686	*
687	* @returns true if sane, false if not.
688	* @param pGMM The name of the pGMM variable.
689	*/
690	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
691	# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
692	#else
693	# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
694	#endif
695
696	/** @def GMM_CHECK_SANITY_IN_LOOPS
697	* Checks the sanity of the GMM instance in the allocation loops.
698	*
699	* This macro is a stub by default and must be enabled manually in the code.
700	*
701	* @returns true if sane, false if not.
702	* @param pGMM The name of the pGMM variable.
703	*/
704	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
705	# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
706	#else
707	# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
708	#endif
709
710
711	/*******************************************************************************
712	* Internal Functions *
713	*******************************************************************************/
714	static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
715	static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
716	DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
717	DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
718	DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
719	#ifdef GMMR0_WITH_SANITY_CHECK
720	static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
721	#endif
722	static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
723	DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
724	DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
725	static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
726	#ifdef VBOX_WITH_PAGE_SHARING
727	static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
728	# ifdef VBOX_STRICT
729	static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
730	# endif
731	#endif
732
733
734
735	/**
736	* Initializes the GMM component.
737	*
738	* This is called when the VMMR0.r0 module is loaded and protected by the
739	* loader semaphore.
740	*
741	* @returns VBox status code.
742	*/
743	GMMR0DECL(int) GMMR0Init(void)
744	{
745	LogFlow(("GMMInit:\n"));
746
747	/*
748	* Allocate the instance data and the locks.
749	*/
750	PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
751	if (!pGMM)
752	return VERR_NO_MEMORY;
753
754	pGMM->u32Magic = GMM_MAGIC;
755	for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
756	pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
757	RTListInit(&pGMM->ChunkList);
758	ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
759
760	int rc = RTSemFastMutexCreate(&pGMM->hMtx);
761	if (RT_SUCCESS(rc))
762	{
763	unsigned iMtx;
764	for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
765	{
766	rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
767	if (RT_FAILURE(rc))
768	break;
769	}
770	if (RT_SUCCESS(rc))
771	{
772	/*
773	* Check and see if RTR0MemObjAllocPhysNC works.
774	*/
775	#if 0 /* later, see @bugref{3170}. */
776	RTR0MEMOBJ MemObj;
777	rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
778	if (RT_SUCCESS(rc))
779	{
780	rc = RTR0MemObjFree(MemObj, true);
781	AssertRC(rc);
782	}
783	else if (rc == VERR_NOT_SUPPORTED)
784	pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
785	else
786	SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
787	#else
788	# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
789	pGMM->fLegacyAllocationMode = false;
790	# if ARCH_BITS == 32
791	/* Don't reuse possibly partial chunks because of the virtual
792	address space limitation. */
793	pGMM->fBoundMemoryMode = true;
794	# else
795	pGMM->fBoundMemoryMode = false;
796	# endif
797	# else
798	pGMM->fLegacyAllocationMode = true;
799	pGMM->fBoundMemoryMode = true;
800	# endif
801	#endif
802
803	/*
804	* Query system page count and guess a reasonable cMaxPages value.
805	*/
806	pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
807
808	g_pGMM = pGMM;
809	LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
810	return VINF_SUCCESS;
811	}
812
813	/*
814	* Bail out.
815	*/
816	while (iMtx-- > 0)
817	RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
818	RTSemFastMutexDestroy(pGMM->hMtx);
819	}
820
821	pGMM->u32Magic = 0;
822	RTMemFree(pGMM);
823	SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
824	return rc;
825	}
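
GMMR0Init creates an array of mutexes in a loop and, if one creation fails, destroys only the ones that were created by counting the loop index back down. The same bail-out pattern in isolation, with a hypothetical lock API and a failure-injection variable standing in for RTSemFastMutexCreate and RTSemFastMutexDestroy:

```c
#include <assert.h>

#define EX_LOCK_COUNT 4u

static unsigned g_cExLive   = 0;              /* number of currently live locks */
static unsigned g_iExFailAt = EX_LOCK_COUNT;  /* index whose creation fails; EX_LOCK_COUNT = never */

/* Hypothetical creator: fails at the injected index, otherwise succeeds. */
static int exLockCreate(unsigned i)
{
    if (i == g_iExFailAt)
        return -1;
    g_cExLive++;
    return 0;
}

/* Hypothetical destructor. */
static void exLockDestroy(unsigned i)
{
    (void)i;
    g_cExLive--;
}

/* Create all locks, or none: on failure undo the partial work. */
static int exInitLocks(void)
{
    int      rc = 0;
    unsigned i;
    for (i = 0; i < EX_LOCK_COUNT; i++)
    {
        rc = exLockCreate(i);
        if (rc != 0)
            break;
    }
    if (rc == 0)
        return 0;

    /* Bail out: i is the index that failed, so destroy [0, i) in reverse. */
    while (i-- > 0)
        exLockDestroy(i);
    return rc;
}
```

The post-decrement in the while condition is what keeps the failed index itself from being destroyed.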
826
827
828	/**
829	* Terminates the GMM component.
830	*/
831	GMMR0DECL(void) GMMR0Term(void)
832	{
833	LogFlow(("GMMTerm:\n"));
834
835	/*
836	* Take care / be paranoid...
837	*/
838	PGMM pGMM = g_pGMM;
839	if (!VALID_PTR(pGMM))
840	return;
841	if (pGMM->u32Magic != GMM_MAGIC)
842	{
843	SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
844	return;
845	}
846
847	/*
848	* Undo what init did and free all the resources we've acquired.
849	*/
850	/* Destroy the fundamentals. */
851	g_pGMM = NULL;
852	pGMM->u32Magic = ~GMM_MAGIC;
853	RTSemFastMutexDestroy(pGMM->hMtx);
854	pGMM->hMtx = NIL_RTSEMFASTMUTEX;
855
856	/* Free any chunks still hanging around. */
857	RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
858
859	/* Destroy the chunk locks. */
860	for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
861	{
862	Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
863	RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
864	pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
865	}
866
867	/* Finally the instance data itself. */
868	RTMemFree(pGMM);
869	LogFlow(("GMMTerm: done\n"));
870	}
871
872
873	/**
874	* RTAvlU32Destroy callback.
875	*
876	* @returns 0
877	* @param pNode The node to destroy.
878	* @param pvGMM The GMM handle.
879	*/
880	static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
881	{
882	PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
883
884	if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
885	SUPR0Printf("GMMR0Term: %p/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
886	pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
887
888	int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
889	if (RT_FAILURE(rc))
890	{
891	SUPR0Printf("GMMR0Term: %p/%#x: RTR0MemObjFree(%p,true) -> %d (cMappings=%d)\n", pChunk,
892	pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
893	AssertRC(rc);
894	}
895	pChunk->hMemObj = NIL_RTR0MEMOBJ;
896
897	RTMemFree(pChunk->paMappingsX);
898	pChunk->paMappingsX = NULL;
899
900	RTMemFree(pChunk);
901	NOREF(pvGMM);
902	return 0;
903	}
904
905
906	/**
907	* Initializes the per-VM data for the GMM.
908	*
909	* This is called from within the GVMM lock (from GVMMR0CreateVM)
910	* and should only initialize the data members so GMMR0CleanupVM
911	* can deal with them. We reserve no memory or anything here,
912	* that's done later in GMMR0InitVM.
913	*
914	* @param pGVM Pointer to the Global VM structure.
915	*/
916	GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
917	{
918	AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
919
920	pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
921	pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
922	pGVM->gmm.s.Stats.fMayAllocate = false;
923	}
924
925
926	/**
927	* Acquires the GMM giant lock.
928	*
929	* @returns Assert status code from RTSemFastMutexRequest.
930	* @param pGMM Pointer to the GMM instance.
931	*/
932	static int gmmR0MutexAcquire(PGMM pGMM)
933	{
934	ASMAtomicIncU32(&pGMM->cMtxContenders);
935	int rc = RTSemFastMutexRequest(pGMM->hMtx);
936	ASMAtomicDecU32(&pGMM->cMtxContenders);
937	AssertRC(rc);
938	#ifdef VBOX_STRICT
939	pGMM->hMtxOwner = RTThreadNativeSelf();
940	#endif
941	return rc;
942	}
943
944
945	/**
946	* Releases the GMM giant lock.
947	*
948	* @returns Assert status code from RTSemFastMutexRelease.
949	* @param pGMM Pointer to the GMM instance.
950	*/
951	static int gmmR0MutexRelease(PGMM pGMM)
952	{
953	#ifdef VBOX_STRICT
954	pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
955	#endif
956	int rc = RTSemFastMutexRelease(pGMM->hMtx);
957	AssertRC(rc);
958	return rc;
959	}
960
961
962	/**
963	* Yields the GMM giant lock if there is contention and a certain minimum time
964	* has elapsed since we took it.
965	*
966	* @returns @c true if the mutex was yielded, @c false if not.
967	* @param pGMM Pointer to the GMM instance.
968	* @param puLockNanoTS Where the lock acquisition time stamp is kept
969	* (in/out).
970	*/
971	static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
972	{
973	/*
974	* If nobody is contending the mutex, don't bother checking the time.
975	*/
976	if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
977	return false;
978
979	/*
980	* Don't yield if we haven't executed for at least 2 milliseconds.
981	*/
982	uint64_t uNanoNow = RTTimeSystemNanoTS();
983	if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
984	return false;
985
986	/*
987	* Yield the mutex.
988	*/
989	#ifdef VBOX_STRICT
990	pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
991	#endif
992	ASMAtomicIncU32(&pGMM->cMtxContenders);
993	int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
994
995	RTThreadYield();
996
997	int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
998	*puLockNanoTS = RTTimeSystemNanoTS();
999	ASMAtomicDecU32(&pGMM->cMtxContenders);
1000	#ifdef VBOX_STRICT
1001	pGMM->hMtxOwner = RTThreadNativeSelf();
1002	#endif
1003
1004	return true;
1005	}
1006
1007
1008	/**
1009	* Acquires a chunk lock.
1010	*
1011	* The caller must own the giant lock.
1012	*
1013	* @returns Assert status code from RTSemFastMutexRequest.
1014	* @param pMtxState The chunk mutex state info. (Avoids
1015	* passing the same flags and stuff around
1016	* for subsequent release and drop-giant
1017	* calls.)
1018	* @param pGMM Pointer to the GMM instance.
1019	* @param pChunk Pointer to the chunk.
1020	* @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1021	*/
1022	static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1023	{
1024	Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1025	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1026
1027	pMtxState->pGMM = pGMM;
1028	pMtxState->fFlags = (uint8_t)fFlags;
1029
1030	/*
1031	* Get the lock index and reference the lock.
1032	*/
1033	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1034	uint32_t iChunkMtx = pChunk->iChunkMtx;
1035	if (iChunkMtx == UINT8_MAX)
1036	{
1037	iChunkMtx = pGMM->iNextChunkMtx++;
1038	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1039
1040	/* Try to get an unused one... */
1041	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1042	{
1043	iChunkMtx = pGMM->iNextChunkMtx++;
1044	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1045	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1046	{
1047	iChunkMtx = pGMM->iNextChunkMtx++;
1048	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1049	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1050	{
1051	iChunkMtx = pGMM->iNextChunkMtx++;
1052	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1053	}
1054	}
1055	}
1056
1057	pChunk->iChunkMtx = iChunkMtx;
1058	}
1059	AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1060	pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1061	ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1062
1063	/*
1064	* Drop the giant?
1065	*/
1066	if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1067	{
1068	/** @todo GMM life cycle cleanup (we may race someone
1069	* destroying and cleaning up GMM)? */
1070	gmmR0MutexRelease(pGMM);
1071	}
1072
1073	/*
1074	* Take the chunk mutex.
1075	*/
1076	int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1077	AssertRC(rc);
1078	return rc;
1079	}
1080
1081
1082	/**
1083	* Releases the chunk mutex acquired by gmmR0ChunkMutexAcquire.
1084	*
1085	* @returns VBox status code (assert status from the mutex operations).
1086	* @param pMtxState Pointer to the chunk mutex state.
1087	* @param pChunk Pointer to the chunk if it's still
1088	* alive, NULL if it isn't. This is used to deassociate
1089	* the chunk from the mutex on the way out so a new one
1090	* can be selected next time, thus avoiding contended
1091	* mutexes.
1092	*/
1093	static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1094	{
1095	PGMM pGMM = pMtxState->pGMM;
1096
1097	/*
1098	* Release the chunk mutex and reacquire the giant if requested.
1099	*/
1100	int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1101	AssertRC(rc);
1102	if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1103	rc = gmmR0MutexAcquire(pGMM);
1104	else
1105	Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1106
1107	/*
1108	* Drop the chunk mutex user reference and deassociate it from the chunk
1109	* when possible.
1110	*/
1111	if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1112	&& pChunk
1113	&& RT_SUCCESS(rc) )
1114	{
1115	if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1116	pChunk->iChunkMtx = UINT8_MAX;
1117	else
1118	{
1119	rc = gmmR0MutexAcquire(pGMM);
1120	if (RT_SUCCESS(rc))
1121	{
1122	if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1123	pChunk->iChunkMtx = UINT8_MAX;
1124	rc = gmmR0MutexRelease(pGMM);
1125	}
1126	}
1127	}
1128
1129	pMtxState->pGMM = NULL;
1130	return rc;
1131	}
1132
1133
1134	/**
1135	* Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1136	* chunk locked.
1137	*
1138	* This only works if gmmR0ChunkMutexAcquire was called with
1139	* GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1140	* mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1141	*
1142	* @returns VBox status code (assuming success is ok).
1143	* @param pMtxState Pointer to the chunk mutex state.
1144	*/
1145	static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1146	{
1147	AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1148	Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1149	pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1150	/** @todo GMM life cycle cleanup (we may race someone
1151	* destroying and cleaning up GMM)? */
1152	return gmmR0MutexRelease(pMtxState->pGMM);
1153	}
1154
1155
1156	/**
1157	* For experimenting with NUMA affinity and such.
1158	*
1159	* @returns The current NUMA Node ID.
1160	*/
1161	static uint16_t gmmR0GetCurrentNumaNodeId(void)
1162	{
1163	#if 1
1164	return GMM_CHUNK_NUMA_ID_UNKNOWN;
1165	#else
1166	return RTMpCpuId() / 16;
1167	#endif
1168	}
1169
1170
1171
1172	/**
1173	* Cleans up when a VM is terminating.
1174	*
1175	* @param pGVM Pointer to the Global VM structure.
1176	*/
1177	GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1178	{
1179	LogFlow(("GMMR0CleanupVM: pGVM=%p:{.pVM=%p, .hSelf=%#x}\n", pGVM, pGVM->pVM, pGVM->hSelf));
1180
1181	PGMM pGMM;
1182	GMM_GET_VALID_INSTANCE_VOID(pGMM);
1183
1184	#ifdef VBOX_WITH_PAGE_SHARING
1185	/*
1186	* Clean up all registered shared modules first.
1187	*/
1188	gmmR0SharedModuleCleanup(pGMM, pGVM);
1189	#endif
1190
1191	gmmR0MutexAcquire(pGMM);
1192	uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1193	GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1194
1195	/*
1196	* The policy is 'INVALID' until the initial reservation
1197	* request has been serviced.
1198	*/
1199	if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1200	&& pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1201	{
1202	/*
1203	* If it's the last VM around, we can skip walking all the chunks looking
1204	* for the pages owned by this VM and instead flush the whole shebang.
1205	*
1206	* This takes care of the eventuality that a VM has left shared page
1207	* references behind (shouldn't happen of course, but you never know).
1208	*/
1209	Assert(pGMM->cRegisteredVMs);
1210	pGMM->cRegisteredVMs--;
1211
1212	/*
1213	* Walk the entire pool looking for pages that belong to this VM
1214	* and leftover mappings. (This'll only catch private pages,
1215	* shared pages will be 'left behind'.)
1216	*/
1217	/** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1218	uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1219
1220	unsigned iCountDown = 64;
1221	bool fRedoFromStart;
1222	PGMMCHUNK pChunk;
1223	do
1224	{
1225	fRedoFromStart = false;
1226	RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1227	{
1228	uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1229	if ( ( !pGMM->fBoundMemoryMode
1230	|| pChunk->hGVM == pGVM->hSelf)
1231	&& gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1232	{
1233	/* We left the giant mutex, so reset the yield counters. */
1234	uLockNanoTS = RTTimeSystemNanoTS();
1235	iCountDown = 64;
1236	}
1237	else
1238	{
1239	/* Didn't leave it, so do normal yielding. */
1240	if (!iCountDown)
1241	gmmR0MutexYield(pGMM, &uLockNanoTS);
1242	else
1243	iCountDown--;
1244	}
1245	if (pGMM->cFreedChunks != cFreeChunksOld)
1246	{
1247	fRedoFromStart = true;
1248	break;
1249	}
1250	}
1251	} while (fRedoFromStart);
1252
1253	if (pGVM->gmm.s.Stats.cPrivatePages)
1254	SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1255
1256	pGMM->cAllocatedPages -= cPrivatePages;
1257
1258	/*
1259	* Free empty chunks.
1260	*/
1261	PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1262	do
1263	{
1264	fRedoFromStart = false;
1265	iCountDown = 10240;
1266	pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1267	while (pChunk)
1268	{
1269	PGMMCHUNK pNext = pChunk->pFreeNext;
1270	Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1271	if ( !pGMM->fBoundMemoryMode
1272	|| pChunk->hGVM == pGVM->hSelf)
1273	{
1274	uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1275	if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1276	{
1277	/* We've left the giant mutex, restart? (+1 for our unlink) */
1278	fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1279	if (fRedoFromStart)
1280	break;
1281	uLockNanoTS = RTTimeSystemNanoTS();
1282	iCountDown = 10240;
1283	}
1284	}
1285
1286	/* Advance and maybe yield the lock. */
1287	pChunk = pNext;
1288	if (--iCountDown == 0)
1289	{
1290	uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1291	fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1292	&& pPrivateSet->idGeneration != idGenerationOld;
1293	if (fRedoFromStart)
1294	break;
1295	iCountDown = 10240;
1296	}
1297	}
1298	} while (fRedoFromStart);
1299
1300	/*
1301	* Account for shared pages that weren't freed.
1302	*/
1303	if (pGVM->gmm.s.Stats.cSharedPages)
1304	{
1305	Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1306	SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1307	pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1308	}
1309
1310	/*
1311	* Clean up balloon statistics in case the VM process crashed.
1312	*/
1313	Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1314	pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1315
1316	/*
1317	* Update the over-commitment management statistics.
1318	*/
1319	pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1320	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
1321	+ pGVM->gmm.s.Stats.Reserved.cShadowPages;
1322	switch (pGVM->gmm.s.Stats.enmPolicy)
1323	{
1324	case GMMOCPOLICY_NO_OC:
1325	break;
1326	default:
1327	/** @todo Update GMM->cOverCommittedPages */
1328	break;
1329	}
1330	}
1331
1332	/* zap the GVM data. */
1333	pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1334	pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1335	pGVM->gmm.s.Stats.fMayAllocate = false;
1336
1337	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1338	gmmR0MutexRelease(pGMM);
1339
1340	LogFlow(("GMMR0CleanupVM: returns\n"));
1341	}
1342
1343
1344	/**
1345	* Scan one chunk for private pages belonging to the specified VM.
1346	*
1347	* @note This function may drop the giant mutex!
1348	*
1349	* @returns @c true if we've temporarily dropped the giant mutex, @c false if
1350	* we didn't.
1351	* @param pGMM Pointer to the GMM instance.
1352	* @param pGVM The global VM handle.
1353	* @param pChunk The chunk to scan.
1354	*/
1355	static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1356	{
1357	Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1358
1359	/*
1360	* Look for pages belonging to the VM.
1361	* (Perform some internal checks while we're scanning.)
1362	*/
1363	#ifndef VBOX_STRICT
1364	if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1365	#endif
1366	{
1367	unsigned cPrivate = 0;
1368	unsigned cShared = 0;
1369	unsigned cFree = 0;
1370
1371	gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1372
1373	uint16_t hGVM = pGVM->hSelf;
1374	unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1375	while (iPage-- > 0)
1376	if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1377	{
1378	if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1379	{
1380	/*
1381	* Free the page.
1382	*
1383	* The reason for not using gmmR0FreePrivatePage here is that we
1384	* must not cause the chunk to be freed from under us - the caller
1385	* is in the middle of walking the chunk list.
1386	*/
1387	pChunk->aPages[iPage].u = 0;
1388	pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1389	pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1390	pChunk->iFreeHead = iPage;
1391	pChunk->cPrivate--;
1392	pChunk->cFree++;
1393	pGVM->gmm.s.Stats.cPrivatePages--;
1394	cFree++;
1395	}
1396	else
1397	cPrivate++;
1398	}
1399	else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1400	cFree++;
1401	else
1402	cShared++;
1403
1404	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1405
1406	/*
1407	* Did it add up?
1408	*/
1409	if (RT_UNLIKELY( pChunk->cFree != cFree
1410	|| pChunk->cPrivate != cPrivate
1411	|| pChunk->cShared != cShared))
1412	{
1413	SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %p/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1414	pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1415	pChunk->cFree = cFree;
1416	pChunk->cPrivate = cPrivate;
1417	pChunk->cShared = cShared;
1418	}
1419	}
1420
1421	/*
1422	* If not in bound memory mode, we should reset the hGVM field
1423	* if it has our handle in it.
1424	*/
1425	if (pChunk->hGVM == pGVM->hSelf)
1426	{
1427	if (!g_pGMM->fBoundMemoryMode)
1428	pChunk->hGVM = NIL_GVM_HANDLE;
1429	else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1430	{
1431	SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1432	pChunk, pChunk->Core.Key, pChunk->cFree);
1433	AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1434
1435	gmmR0UnlinkChunk(pChunk);
1436	pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1437	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1438	}
1439	}
1440
1441	/*
1442	* Look for a mapping belonging to the terminating VM.
1443	*/
1444	GMMR0CHUNKMTXSTATE MtxState;
1445	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1446	unsigned cMappings = pChunk->cMappingsX;
1447	for (unsigned i = 0; i < cMappings; i++)
1448	if (pChunk->paMappingsX[i].pGVM == pGVM)
1449	{
1450	gmmR0ChunkMutexDropGiant(&MtxState);
1451
1452	RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1453
1454	cMappings--;
1455	if (i < cMappings)
1456	pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1457	pChunk->paMappingsX[cMappings].pGVM = NULL;
1458	pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1459	Assert(pChunk->cMappingsX - 1U == cMappings);
1460	pChunk->cMappingsX = cMappings;
1461
1462	int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1463	if (RT_FAILURE(rc))
1464	{
1465	SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: mapping #%x: RTR0MemObjFree(%p,false) -> %d\n",
1466	pChunk, pChunk->Core.Key, i, hMemObj, rc);
1467	AssertRC(rc);
1468	}
1469
1470	gmmR0ChunkMutexRelease(&MtxState, pChunk);
1471	return true;
1472	}
1473
1474	gmmR0ChunkMutexRelease(&MtxState, pChunk);
1475	return false;
1476	}
1477
1478
1479	/**
1480	* The initial resource reservations.
1481	*
1482	* This will make memory reservations according to policy and priority. If there aren't
1483	* sufficient resources available to sustain the VM this function will fail and all
1484	* future allocation requests will fail as well.
1485	*
1486	* These are just the initial reservations made very early during the VM creation
1487	* process and will be adjusted later in the GMMR0UpdateReservation call after the
1488	* ring-3 init has completed.
1489	*
1490	* @returns VBox status code.
1491	* @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1492	* @retval VERR_GMM_
1493	*
1494	* @param pVM Pointer to the VM.
1495	* @param idCpu The VCPU id.
1496	* @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1497	* This does not include MMIO2 and similar.
1498	* @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1499	* @param cFixedPages The number of pages that may be allocated for fixed objects like the
1500	* hyper heap, MMIO2 and similar.
1501	* @param enmPolicy The OC policy to use on this VM.
1502	* @param enmPriority The priority in an out-of-memory situation.
1503	*
1504	* @thread The creator thread / EMT.
1505	*/
1506	GMMR0DECL(int) GMMR0InitialReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages,
1507	GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1508	{
1509	LogFlow(("GMMR0InitialReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1510	pVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1511
1512	/*
1513	* Validate, get basics and take the semaphore.
1514	*/
1515	PGMM pGMM;
1516	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1517	PGVM pGVM;
1518	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1519	if (RT_FAILURE(rc))
1520	return rc;
1521
1522	AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1523	AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1524	AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1525	AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1526	AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1527
1528	gmmR0MutexAcquire(pGMM);
1529	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1530	{
1531	if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1532	&& !pGVM->gmm.s.Stats.Reserved.cFixedPages
1533	&& !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1534	{
1535	/*
1536	* Check if we can accommodate this.
1537	*/
1538	/* ... later ... */
1539	if (RT_SUCCESS(rc))
1540	{
1541	/*
1542	* Update the records.
1543	*/
1544	pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1545	pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1546	pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1547	pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1548	pGVM->gmm.s.Stats.enmPriority = enmPriority;
1549	pGVM->gmm.s.Stats.fMayAllocate = true;
1550
1551	pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1552	pGMM->cRegisteredVMs++;
1553	}
1554	}
1555	else
1556	rc = VERR_WRONG_ORDER;
1557	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1558	}
1559	else
1560	rc = VERR_GMM_IS_NOT_SANE;
1561	gmmR0MutexRelease(pGMM);
1562	LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1563	return rc;
1564	}
1565
1566
1567	/**
1568	* VMMR0 request wrapper for GMMR0InitialReservation.
1569	*
1570	* @returns see GMMR0InitialReservation.
1571	* @param pVM Pointer to the VM.
1572	* @param idCpu The VCPU id.
1573	* @param pReq Pointer to the request packet.
1574	*/
1575	GMMR0DECL(int) GMMR0InitialReservationReq(PVM pVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1576	{
1577	/*
1578	* Validate input and pass it on.
1579	*/
1580	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1581	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1582	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1583
1584	return GMMR0InitialReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1585	}
1586
1587
1588	/**
1589	* This updates the memory reservation with the additional MMIO2 and ROM pages.
1590	*
1591	* @returns VBox status code.
1592	* @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1593	*
1594	* @param pVM Pointer to the VM.
1595	* @param idCpu The VCPU id.
1596	* @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1597	* This does not include MMIO2 and similar.
1598	* @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1599	* @param cFixedPages The number of pages that may be allocated for fixed objects like the
1600	* hyper heap, MMIO2 and similar.
1601	*
1602	* @thread EMT.
1603	*/
1604	GMMR0DECL(int) GMMR0UpdateReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages)
1605	{
1606	LogFlow(("GMMR0UpdateReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1607	pVM, cBasePages, cShadowPages, cFixedPages));
1608
1609	/*
1610	* Validate, get basics and take the semaphore.
1611	*/
1612	PGMM pGMM;
1613	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1614	PGVM pGVM;
1615	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1616	if (RT_FAILURE(rc))
1617	return rc;
1618
1619	AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1620	AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1621	AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1622
1623	gmmR0MutexAcquire(pGMM);
1624	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1625	{
1626	if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1627	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
1628	&& pGVM->gmm.s.Stats.Reserved.cShadowPages)
1629	{
1630	/*
1631	* Check if we can accommodate this.
1632	*/
1633	/* ... later ... */
1634	if (RT_SUCCESS(rc))
1635	{
1636	/*
1637	* Update the records.
1638	*/
1639	pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1640	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
1641	+ pGVM->gmm.s.Stats.Reserved.cShadowPages;
1642	pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1643
1644	pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1645	pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1646	pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1647	}
1648	}
1649	else
1650	rc = VERR_WRONG_ORDER;
1651	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1652	}
1653	else
1654	rc = VERR_GMM_IS_NOT_SANE;
1655	gmmR0MutexRelease(pGMM);
1656	LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1657	return rc;
1658	}
1659
1660
1661	/**
1662	* VMMR0 request wrapper for GMMR0UpdateReservation.
1663	*
1664	* @returns see GMMR0UpdateReservation.
1665	* @param pVM Pointer to the VM.
1666	* @param idCpu The VCPU id.
1667	* @param pReq Pointer to the request packet.
1668	*/
1669	GMMR0DECL(int) GMMR0UpdateReservationReq(PVM pVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1670	{
1671	/*
1672	* Validate input and pass it on.
1673	*/
1674	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1675	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1676	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1677
1678	return GMMR0UpdateReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1679	}
1680
1681	#ifdef GMMR0_WITH_SANITY_CHECK
1682
1683	/**
1684	* Performs sanity checks on a free set.
1685	*
1686	* @returns Error count.
1687	*
1688	* @param pGMM Pointer to the GMM instance.
1689	* @param pSet Pointer to the set.
1690	* @param pszSetName The set name.
1691	* @param pszFunction The function from which it was called.
1692	* @param uLineNo The line number.
1693	*/
1694	static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1695	const char *pszFunction, unsigned uLineNo)
1696	{
1697	uint32_t cErrors = 0;
1698
1699	/*
1700	* Count the free pages in all the chunks and match it against pSet->cFreePages.
1701	*/
1702	uint32_t cPages = 0;
1703	for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1704	{
1705	for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1706	{
1707	/** @todo check that the chunk is hashed into the right set. */
1708	cPages += pCur->cFree;
1709	}
1710	}
1711	if (RT_UNLIKELY(cPages != pSet->cFreePages))
1712	{
1713	SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1714	cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1715	cErrors++;
1716	}
1717
1718	return cErrors;
1719	}
1720
1721
1722	/**
1723	* Performs some sanity checks on the GMM while owning the lock.
1724	*
1725	* @returns Error count.
1726	*
1727	* @param pGMM Pointer to the GMM instance.
1728	* @param pszFunction The function from which it is called.
1729	* @param uLineNo The line number.
1730	*/
1731	static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1732	{
1733	uint32_t cErrors = 0;
1734
1735	cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1736	cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1737	/** @todo add more sanity checks. */
1738
1739	return cErrors;
1740	}
1741
1742	#endif /* GMMR0_WITH_SANITY_CHECK */
1743
1744	/**
1745	* Looks up a chunk in the tree and fills in the TLB entry for it.
1746	*
1747	* This is not expected to fail and will bitch if it does.
1748	*
1749	* @returns Pointer to the allocation chunk, NULL if not found.
1750	* @param pGMM Pointer to the GMM instance.
1751	* @param idChunk The ID of the chunk to find.
1752	* @param pTlbe Pointer to the TLB entry.
1753	*/
1754	static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1755	{
1756	PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1757	AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1758	pTlbe->idChunk = idChunk;
1759	pTlbe->pChunk = pChunk;
1760	return pChunk;
1761	}
1762
1763
1764	/**
1765	* Finds an allocation chunk.
1766	*
1767	* This is not expected to fail and will bitch if it does.
1768	*
1769	* @returns Pointer to the allocation chunk, NULL if not found.
1770	* @param pGMM Pointer to the GMM instance.
1771	* @param idChunk The ID of the chunk to find.
1772	*/
1773	DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1774	{
1775	/*
1776	* Do a TLB lookup, branch if not in the TLB.
1777	*/
1778	PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1779	if ( pTlbe->idChunk != idChunk
1780	|| !pTlbe->pChunk)
1781	return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1782	return pTlbe->pChunk;
1783	}
1784
1785
1786	/**
1787	* Finds a page.
1788	*
1789	* This is not expected to fail and will bitch if it does.
1790	*
1791	* @returns Pointer to the page, NULL if not found.
1792	* @param pGMM Pointer to the GMM instance.
1793	* @param idPage The ID of the page to find.
1794	*/
1795	DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1796	{
1797	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1798	if (RT_LIKELY(pChunk))
1799	return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1800	return NULL;
1801	}
1802
1803
1804	/**
1805	* Gets the host physical address for a page given by its ID.
1806	*
1807	* @returns The host physical address or NIL_RTHCPHYS.
1808	* @param pGMM Pointer to the GMM instance.
1809	* @param idPage The ID of the page to find.
1810	*/
1811	DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1812	{
1813	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1814	if (RT_LIKELY(pChunk))
1815	return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1816	return NIL_RTHCPHYS;
1817	}
1818
1819
1820	/**
1821	* Selects the appropriate free list given the number of free pages.
1822	*
1823	* @returns Free list index.
1824	* @param cFree The number of free pages in the chunk.
1825	*/
1826	DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1827	{
1828	unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1829	AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1830	("%d (%u)\n", iList, cFree));
1831	return iList;
1832	}
1833
1834
1835	/**
1836	* Unlinks the chunk from the free list it's currently on (if any).
1837	*
1838	* @param pChunk The allocation chunk.
1839	*/
1840	DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1841	{
1842	PGMMCHUNKFREESET pSet = pChunk->pSet;
1843	if (RT_LIKELY(pSet))
1844	{
1845	pSet->cFreePages -= pChunk->cFree;
1846	pSet->idGeneration++;
1847
1848	PGMMCHUNK pPrev = pChunk->pFreePrev;
1849	PGMMCHUNK pNext = pChunk->pFreeNext;
1850	if (pPrev)
1851	pPrev->pFreeNext = pNext;
1852	else
1853	pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1854	if (pNext)
1855	pNext->pFreePrev = pPrev;
1856
1857	pChunk->pSet = NULL;
1858	pChunk->pFreeNext = NULL;
1859	pChunk->pFreePrev = NULL;
1860	}
1861	else
1862	{
1863	Assert(!pChunk->pFreeNext);
1864	Assert(!pChunk->pFreePrev);
1865	Assert(!pChunk->cFree);
1866	}
1867	}
1868
1869
1870	/**
1871	* Links the chunk onto the appropriate free list in the specified free set.
1872	*
1873	* If no free entries, it's not linked into any list.
1874	*
1875	* @param pChunk The allocation chunk.
1876	* @param pSet The free set.
1877	*/
1878	DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1879	{
1880	Assert(!pChunk->pSet);
1881	Assert(!pChunk->pFreeNext);
1882	Assert(!pChunk->pFreePrev);
1883
1884	if (pChunk->cFree > 0)
1885	{
1886	pChunk->pSet = pSet;
1887	pChunk->pFreePrev = NULL;
1888	unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1889	pChunk->pFreeNext = pSet->apLists[iList];
1890	if (pChunk->pFreeNext)
1891	pChunk->pFreeNext->pFreePrev = pChunk;
1892	pSet->apLists[iList] = pChunk;
1893
1894	pSet->cFreePages += pChunk->cFree;
1895	pSet->idGeneration++;
1896	}
1897	}
1898
1899
1900	/**
1901	* Selects the appropriate free set for the chunk and links it onto the appropriate free list.
1902	*
1903	* If no free entries, it's not linked into any list.
1904	*
1905	* @param pChunk The allocation chunk.
1906	*/
1907	DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1908	{
1909	PGMMCHUNKFREESET pSet;
1910	if (pGMM->fBoundMemoryMode)
1911	pSet = &pGVM->gmm.s.Private;
1912	else if (pChunk->cShared)
1913	pSet = &pGMM->Shared;
1914	else
1915	pSet = &pGMM->PrivateX;
1916	gmmR0LinkChunk(pChunk, pSet);
1917	}
1918
1919
1920	/**
1921	* Frees a Chunk ID.
1922	*
1923	* @param pGMM Pointer to the GMM instance.
1924	* @param idChunk The Chunk ID to free.
1925	*/
1926	static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1927	{
1928	AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1929	AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1930	ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1931	}
1932
1933
1934	/**
1935	* Allocates a new Chunk ID.
1936	*
1937	* @returns The Chunk ID.
1938	* @param pGMM Pointer to the GMM instance.
1939	*/
1940	static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1941	{
1942	AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
1943	AssertCompile(NIL_GMM_CHUNKID == 0);
1944
1945	/*
1946	* Try the next sequential one.
1947	*/
1948	int32_t idChunk = ++pGMM->idChunkPrev;
1949	#if 0 /** @todo enable this code */
1950	if ( idChunk <= GMM_CHUNKID_LAST
1951	&& idChunk > NIL_GMM_CHUNKID
1952	&& !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
1953	return idChunk;
1954	#endif
1955
1956	/*
1957	* Scan sequentially from the last one.
1958	*/
1959	if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
1960	&& idChunk > NIL_GMM_CHUNKID)
1961	{
1962	idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
1963	if (idChunk > NIL_GMM_CHUNKID)
1964	{
1965	AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1966	return pGMM->idChunkPrev = idChunk;
1967	}
1968	}
1969
1970	/*
1971	* Ok, scan from the start.
1972	* We're not racing anyone, so there is no need to expect failures or have restart loops.
1973	*/
1974	idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
1975	AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1976	AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1977
1978	return pGMM->idChunkPrev = idChunk;
1979	}
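/*
 * Illustration (not part of GMMR0.cpp): a minimal sketch of the two-pass
 * bitmap ID search used above, i.e. scan upwards from the previously issued
 * ID and, failing that, rescan from the start. IdAllocator, kIdCount and
 * kNilId are hypothetical stand-ins for the GMM bitmap, GMM_CHUNKID_LAST + 1
 * and NIL_GMM_CHUNKID; std::bitset replaces the IPRT ASMBit* helpers.
 */
```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t   kIdCount = 64;  // hypothetical: IDs 0..63
constexpr std::uint32_t kNilId   = 0;   // hypothetical nil ID, never handed out

struct IdAllocator
{
    std::bitset<kIdCount> bm;           // a set bit means the ID is in use
    std::uint32_t         idPrev;       // the ID handed out most recently

    IdAllocator() : idPrev(kNilId) { bm.set(kNilId); }

    // Returns the first clear bit in [from, kIdCount), or kIdCount if none.
    std::size_t firstClearFrom(std::size_t from) const
    {
        for (std::size_t i = from; i < kIdCount; i++)
            if (!bm.test(i))
                return i;
        return kIdCount;
    }

    std::uint32_t allocate()
    {
        // Pass 1: continue sequentially after the previous ID.
        std::size_t id = firstClearFrom(idPrev + 1);
        // Pass 2: wrap around and scan from the first valid ID.
        if (id == kIdCount)
            id = firstClearFrom(kNilId + 1);
        if (id == kIdCount)
            return kNilId;              // every ID is taken
        bm.set(id);
        idPrev = static_cast<std::uint32_t>(id);
        return idPrev;
    }

    void release(std::uint32_t id)
    {
        assert(id != kNilId && bm.test(id));
        bm.reset(id);
    }
};
```
/*
 * Continuing after the previous ID means a freshly released ID is not reused
 * immediately; it only comes back once the scan wraps around.
 */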
1980
1981
1982	/**
1983	* Allocates one private page.
1984	*
1985	* Worker for gmmR0AllocatePages.
1986	*
1987	* @param pChunk The chunk to allocate it from.
1988	* @param hGVM The GVM handle of the VM requesting memory.
1989	* @param pPageDesc The page descriptor.
1990	*/
1991	static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
1992	{
1993	/* update the chunk stats. */
1994	if (pChunk->hGVM == NIL_GVM_HANDLE)
1995	pChunk->hGVM = hGVM;
1996	Assert(pChunk->cFree);
1997	pChunk->cFree--;
1998	pChunk->cPrivate++;
1999
2000	/* unlink the first free page. */
2001	const uint32_t iPage = pChunk->iFreeHead;
2002	AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2003	PGMMPAGE pPage = &pChunk->aPages[iPage];
2004	Assert(GMM_PAGE_IS_FREE(pPage));
2005	pChunk->iFreeHead = pPage->Free.iNext;
2006	Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2007	pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2008	pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2009
2010	/* make the page private. */
2011	pPage->u = 0;
2012	AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2013	pPage->Private.hGVM = hGVM;
2014	AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2015	AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2016	if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2017	pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2018	else
2019	pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2020
2021	/* update the page descriptor. */
2022	pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2023	Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2024	pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2025	pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2026	}
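/*
 * Illustration (not part of GMMR0.cpp): a minimal sketch of the index-linked
 * free list that gmmR0AllocatePage pops from, together with the
 * (idChunk << shift) | iPage page ID composition. MiniChunk and the constants
 * are hypothetical; 512 pages per chunk is an assumption (2 MB chunks with
 * 4 KB pages), and the real page entries carry more state than a next index.
 */
```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned      kChunkShift    = 9;                 // assumed log2(pages per chunk)
constexpr std::uint16_t kPagesPerChunk = 1u << kChunkShift; // 512
constexpr std::uint16_t kEndOfList     = UINT16_MAX;        // free list terminator

struct MiniChunk
{
    std::uint32_t idChunk;
    std::uint16_t iFreeHead;                                // index of the first free page
    std::uint16_t cFree;
    std::array<std::uint16_t, kPagesPerChunk> aiNext;       // per-page "next free" index

    explicit MiniChunk(std::uint32_t id)
        : idChunk(id), iFreeHead(0), cFree(kPagesPerChunk), aiNext()
    {
        // Chain every page to its successor; the last page ends the list.
        for (std::uint16_t i = 0; i + 1 < kPagesPerChunk; i++)
            aiNext[i] = static_cast<std::uint16_t>(i + 1);
        aiNext[kPagesPerChunk - 1] = kEndOfList;
    }

    // Pops the head of the free list and returns the global page ID.
    std::uint32_t allocPage()
    {
        assert(cFree > 0 && iFreeHead != kEndOfList);
        std::uint16_t const iPage = iFreeHead;
        iFreeHead = aiNext[iPage];
        cFree--;
        return (idChunk << kChunkShift) | iPage;
    }
};
```
/*
 * Because the list lives in the page array itself, popping a page is O(1)
 * and needs no memory beyond the per-page tracking entry.
 */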
2027
2028
2029	/**
2030	* Picks the free pages from a chunk.
2031	*
2032	* @returns The new page descriptor table index.
2033	* @param pChunk The chunk.
2034	* @param hGVM The global VM handle.
2035	* @param iPage The current page descriptor table index.
2036	* @param cPages The total number of pages to allocate.
2037	* @param paPages The page descriptor table (input + output). Entries from
2038	* iPage up to the returned index are filled in.
2039	*/
2040	static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2041	PGMMPAGEDESC paPages)
2042	{
2043	PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2044	gmmR0UnlinkChunk(pChunk);
2045
2046	for (; pChunk->cFree && iPage < cPages; iPage++)
2047	gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2048
2049	gmmR0LinkChunk(pChunk, pSet);
2050	return iPage;
2051	}
2052
2053
2054	/**
2055	* Registers a new chunk of memory.
2056	*
2057	* This is called by both gmmR0AllocateOneChunk and GMMR0SeedChunk.
2058	*
2059	* @returns VBox status code. On success, the giant GMM lock will be held, the
2060	* caller must release it (ugly).
2061	* @param pGMM Pointer to the GMM instance.
2062	* @param pSet Pointer to the set.
2063	* @param MemObj The memory object for the chunk.
2064	* @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2065	* affinity.
2066	* @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2067	* @param ppChunk Chunk address (out). Optional.
2068	*
2069	* @remarks The caller must not own the giant GMM mutex.
2070	* The giant GMM mutex will be acquired and returned acquired in
2071	* the success path. On failure, no locks will be held.
2072	*/
2073	static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ MemObj, uint16_t hGVM, uint16_t fChunkFlags,
2074	PGMMCHUNK *ppChunk)
2075	{
2076	Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2077	Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2078	Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2079
2080	int rc;
2081	PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2082	if (pChunk)
2083	{
2084	/*
2085	* Initialize it.
2086	*/
2087	pChunk->hMemObj = MemObj;
2088	pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2089	pChunk->hGVM = hGVM;
2090	/*pChunk->iFreeHead = 0;*/
2091	pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2092	pChunk->iChunkMtx = UINT8_MAX;
2093	pChunk->fFlags = fChunkFlags;
2094	for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2095	{
2096	pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2097	pChunk->aPages[iPage].Free.iNext = iPage + 1;
2098	}
2099	pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2100	pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2101
2102	/*
2103	* Allocate a Chunk ID and insert it into the tree.
2104	* This has to be done behind the mutex of course.
2105	*/
2106	rc = gmmR0MutexAcquire(pGMM);
2107	if (RT_SUCCESS(rc))
2108	{
2109	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2110	{
2111	pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2112	if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2113	&& pChunk->Core.Key <= GMM_CHUNKID_LAST
2114	&& RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2115	{
2116	pGMM->cChunks++;
2117	RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2118	gmmR0LinkChunk(pChunk, pSet);
2119	LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2120
2121	if (ppChunk)
2122	*ppChunk = pChunk;
2123	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2124	return VINF_SUCCESS;
2125	}
2126
2127	/* bail out */
2128	rc = VERR_GMM_CHUNK_INSERT;
2129	}
2130	else
2131	rc = VERR_GMM_IS_NOT_SANE;
2132	gmmR0MutexRelease(pGMM);
2133	}
2134
2135	RTMemFree(pChunk);
2136	}
2137	else
2138	rc = VERR_NO_MEMORY;
2139	return rc;
2140	}
2141
2142
2143	/**
2144	* Allocates a new chunk, immediately picks the requested pages from it, and adds
2145	* what's remaining to the specified free set.
2146	*
2147	* @note This will leave the giant mutex while allocating the new chunk!
2148	*
2149	* @returns VBox status code.
2150	* @param pGMM Pointer to the GMM instance data.
2151	* @param pGVM Pointer to the kernel-only VM instance data.
2152	* @param pSet Pointer to the free set.
2153	* @param cPages The number of pages requested.
2154	* @param paPages The page descriptor table (input + output).
2155	* @param piPage The pointer to the page descriptor table index
2156	* variable. This will be updated.
2157	*/
2158	static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2159	PGMMPAGEDESC paPages, uint32_t *piPage)
2160	{
2161	gmmR0MutexRelease(pGMM);
2162
2163	RTR0MEMOBJ hMemObj;
2164	int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2165	if (RT_SUCCESS(rc))
2166	{
2167	/** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2168	* free pages first and then unchaining them right afterwards. Instead
2169	* do as much work as possible without holding the giant lock. */
2170	PGMMCHUNK pChunk;
2171	rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2172	if (RT_SUCCESS(rc))
2173	{
2174	*piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2175	return VINF_SUCCESS;
2176	}
2177
2178	/* bail out */
2179	RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2180	}
2181
2182	int rc2 = gmmR0MutexAcquire(pGMM);
2183	AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2184	return rc;
2185
2186	}
2187
2188
2189	/**
2190	* As a last resort we'll pick any page we can get.
2191	*
2192	* @returns The new page descriptor table index.
2193	* @param pSet The set to pick from.
2194	* @param pGVM Pointer to the global VM structure.
2195	* @param iPage The current page descriptor table index.
2196	* @param cPages The total number of pages to allocate.
2197	* @param paPages The page descriptor table (input + output).
2198	*/
2199	static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2200	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2201	{
2202	unsigned iList = RT_ELEMENTS(pSet->apLists);
2203	while (iList-- > 0)
2204	{
2205	PGMMCHUNK pChunk = pSet->apLists[iList];
2206	while (pChunk)
2207	{
2208	PGMMCHUNK pNext = pChunk->pFreeNext;
2209
2210	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2211	if (iPage >= cPages)
2212	return iPage;
2213
2214	pChunk = pNext;
2215	}
2216	}
2217	return iPage;
2218	}
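/*
 * Illustration (not part of GMMR0.cpp): a minimal sketch of why the list
 * walkers in this file read the successor (pNext) before taking pages from a
 * chunk. Taking pages unlinks the chunk and relinks it according to its new
 * free count, which clears or changes its next pointer. Node, Buckets and the
 * two-bucket split are hypothetical simplifications of GMMCHUNK and
 * GMMCHUNKFREESET.
 */
```cpp
#include <algorithm>
#include <cassert>

struct Node
{
    int   cFree;
    Node *pNext;
    explicit Node(int c) : cFree(c), pNext(nullptr) {}
};

struct Buckets
{
    Node *apLists[2] = { nullptr, nullptr };   // [0]: few free pages, [1]: many

    static unsigned select(int cFree) { return cFree >= 4 ? 1u : 0u; }

    void link(Node *p)                         // nodes without free pages stay unlinked
    {
        if (p->cFree > 0)
        {
            unsigned const i = select(p->cFree);
            p->pNext = apLists[i];
            apLists[i] = p;
        }
    }

    void unlink(Node *p)                       // p must currently be linked
    {
        assert(p->cFree > 0);
        Node **pp = &apLists[select(p->cFree)];
        while (*pp != p)
            pp = &(*pp)->pNext;
        *pp = p->pNext;
        p->pNext = nullptr;
    }
};

// Takes up to cWanted pages from p and relinks it by its new free count.
int takeFrom(Buckets &set, Node *p, int cWanted)
{
    set.unlink(p);
    int const cTaken = std::min(p->cFree, cWanted);
    p->cFree -= cTaken;
    set.link(p);
    return cTaken;
}

// Walks the fullest bucket first and grabs pages until the request is met.
int takeAny(Buckets &set, int cWanted)
{
    int cGot = 0;
    for (unsigned iList = 2; iList-- > 0; )
        for (Node *p = set.apLists[iList]; p; )
        {
            Node *pNext = p->pNext;            // read before takeFrom() relinks p
            cGot += takeFrom(set, p, cWanted - cGot);
            if (cGot >= cWanted)
                return cGot;
            p = pNext;
        }
    return cGot;
}
```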
2219
2220
2221	/**
2222	* Pick pages from empty chunks on the same NUMA node.
2223	*
2224	* @returns The new page descriptor table index.
2225	* @param pSet The set to pick from.
2226	* @param pGVM Pointer to the global VM structure.
2227	* @param iPage The current page descriptor table index.
2228	* @param cPages The total number of pages to allocate.
2229	* @param paPages The page descriptor table (input + output).
2230	*/
2231	static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2232	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2233	{
2234	PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2235	if (pChunk)
2236	{
2237	uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2238	while (pChunk)
2239	{
2240	PGMMCHUNK pNext = pChunk->pFreeNext;
2241
2242	if (pChunk->idNumaNode == idNumaNode)
2243	{
2244	pChunk->hGVM = pGVM->hSelf;
2245	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2246	if (iPage >= cPages)
2247	{
2248	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2249	return iPage;
2250	}
2251	}
2252
2253	pChunk = pNext;
2254	}
2255	}
2256	return iPage;
2257	}
2258
2259
2260	/**
2261	* Pick pages from non-empty chunks on the same NUMA node.
2262	*
2263	* @returns The new page descriptor table index.
2264	* @param pSet The set to pick from.
2265	* @param pGVM Pointer to the global VM structure.
2266	* @param iPage The current page descriptor table index.
2267	* @param cPages The total number of pages to allocate.
2268	* @param paPages The page descriptor table (input + output).
2269	*/
2270	static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2271	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2272	{
2273	/** @todo start by picking from chunks with about the right size first? */
2274	uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2275	unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2276	while (iList-- > 0)
2277	{
2278	PGMMCHUNK pChunk = pSet->apLists[iList];
2279	while (pChunk)
2280	{
2281	PGMMCHUNK pNext = pChunk->pFreeNext;
2282
2283	if (pChunk->idNumaNode == idNumaNode)
2284	{
2285	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2286	if (iPage >= cPages)
2287	{
2288	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2289	return iPage;
2290	}
2291	}
2292
2293	pChunk = pNext;
2294	}
2295	}
2296	return iPage;
2297	}
2298
2299
2300	/**
2301	* Pick pages that are in chunks already associated with the VM.
2302	*
2303	* @returns The new page descriptor table index.
2304	* @param pGMM Pointer to the GMM instance data.
2305	* @param pGVM Pointer to the global VM structure.
2306	* @param pSet The set to pick from.
2307	* @param iPage The current page descriptor table index.
2308	* @param cPages The total number of pages to allocate.
2309	* @param paPages The page descriptor table (input + output).
2310	*/
2311	static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2312	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2313	{
2314	uint16_t const hGVM = pGVM->hSelf;
2315
2316	/* Hint. */
2317	if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2318	{
2319	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2320	if (pChunk && pChunk->cFree)
2321	{
2322	iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2323	if (iPage >= cPages)
2324	return iPage;
2325	}
2326	}
2327
2328	/* Scan. */
2329	for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2330	{
2331	PGMMCHUNK pChunk = pSet->apLists[iList];
2332	while (pChunk)
2333	{
2334	PGMMCHUNK pNext = pChunk->pFreeNext;
2335
2336	if (pChunk->hGVM == hGVM)
2337	{
2338	iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2339	if (iPage >= cPages)
2340	{
2341	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2342	return iPage;
2343	}
2344	}
2345
2346	pChunk = pNext;
2347	}
2348	}
2349	return iPage;
2350	}
2351
2352
2353
2354	/**
2355	* Pick pages in bound memory mode.
2356	*
2357	* @returns The new page descriptor table index.
2358	* @param pGVM Pointer to the global VM structure.
2359	* @param iPage The current page descriptor table index.
2360	* @param cPages The total number of pages to allocate.
2361	* @param paPages The page descriptor table (input + output).
2362	*/
2363	static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2364	{
2365	for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2366	{
2367	PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2368	while (pChunk)
2369	{
2370	Assert(pChunk->hGVM == pGVM->hSelf);
2371	PGMMCHUNK pNext = pChunk->pFreeNext;
2372	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2373	if (iPage >= cPages)
2374	return iPage;
2375	pChunk = pNext;
2376	}
2377	}
2378	return iPage;
2379	}
2380
2381
2382	/**
2383	* Checks if we should start picking pages from chunks of other VMs because
2384	* we're getting close to the system memory or reserved limit.
2385	*
2386	* @returns @c true if we should, @c false if we should first try allocate more
2387	* chunks.
2388	*/
2389	static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2390	{
2391	/*
2392	* Don't allocate a new chunk if we're close to the reserved page limit.
2393	*/
2394	uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2395	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
2396	- pGVM->gmm.s.Stats.cBalloonedPages
2397	/** @todo what about shared pages? */;
2398	uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2399	+ pGVM->gmm.s.Stats.Allocated.cFixedPages;
2400	uint64_t cPgDelta = cPgReserved - cPgAllocated;
2401	if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2402	return true;
2403	/** @todo make the threshold configurable, also test the code to see if
2404	* this ever kicks in (we might be reserving too much or smth). */
2405
2406	/*
2407	* Check how close we're to the max memory limit and how many fragments
2408	* there are?...
2409	*/
2410	/** @todo. */
2411
2412	return false;
2413	}
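/*
 * Illustration (not part of GMMR0.cpp): a minimal sketch of the headroom test
 * above, i.e. "fewer than four chunks' worth of pages left before the
 * reservation is used up". Account, nearReservationLimit and 512 pages per
 * chunk are assumptions. Unlike the unsigned subtraction above, which would
 * wrap to a huge delta should allocated ever exceed reserved, this variant
 * saturates and reports such an account as being at the limit.
 */
```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPagesPerChunk   = 512;  // assumed chunk size in pages
constexpr std::uint64_t kThresholdChunks = 4;    // same factor as the check above

struct Account
{
    std::uint64_t cReserved;    // reserved base + fixed pages
    std::uint64_t cBallooned;   // pages handed back by the balloon
    std::uint64_t cAllocated;   // allocated base + fixed pages
};

bool nearReservationLimit(const Account &a)
{
    // Pages the VM may still claim, clamped at zero instead of wrapping.
    std::uint64_t const cUsable = a.cReserved > a.cBallooned
                                ? a.cReserved - a.cBallooned : 0;
    if (a.cAllocated >= cUsable)
        return true;
    return cUsable - a.cAllocated < kPagesPerChunk * kThresholdChunks;
}
```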
2414
2415
2416	/**
2417	* Checks if we should start picking pages from chunks of other VMs because
2418	* there are a lot of free pages around.
2419	*
2420	* @returns @c true if we should, @c false if we should first try allocate more
2421	* chunks.
2422	*/
2423	static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2424	{
2425	/*
2426	* Setting the limit at 16 chunks (32 MB) at the moment.
2427	*/
2428	if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2429	return true;
2430	return false;
2431	}
2432
2433
2434	/**
2435	* Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2436	*
2437	* @returns VBox status code:
2438	* @retval VINF_SUCCESS on success.
2439	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2440	* gmmR0AllocateMoreChunks is necessary.
2441	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2442	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2443	* that is we're trying to allocate more than we've reserved.
2444	*
2445	* @param pGMM Pointer to the GMM instance data.
2446	* @param pGVM Pointer to the VM.
2447	* @param cPages The number of pages to allocate.
2448	* @param paPages Pointer to the page descriptors.
2449	* See GMMPAGEDESC for details on what is expected on input.
2450	* @param enmAccount The account to charge.
2451	*
2452	* @remarks The caller must own the giant GMM lock.
2453	*/
2454	static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2455	{
2456	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2457
2458	/*
2459	* Check allocation limits.
2460	*/
2461	if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2462	return VERR_GMM_HIT_GLOBAL_LIMIT;
2463
2464	switch (enmAccount)
2465	{
2466	case GMMACCOUNT_BASE:
2467	if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2468	> pGVM->gmm.s.Stats.Reserved.cBasePages))
2469	{
2470	Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2471	pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2472	pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2473	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2474	}
2475	break;
2476	case GMMACCOUNT_SHADOW:
2477	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2478	{
2479	Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2480	pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2481	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2482	}
2483	break;
2484	case GMMACCOUNT_FIXED:
2485	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2486	{
2487	Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2488	pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2489	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2490	}
2491	break;
2492	default:
2493	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2494	}
2495
2496	/*
2497	* If we're in legacy memory mode, it's easy to figure out up-front
2498	* whether we have a sufficient number of pages.
2499	*/
2500	if ( pGMM->fLegacyAllocationMode
2501	&& pGVM->gmm.s.Private.cFreePages < cPages)
2502	{
2503	Assert(pGMM->fBoundMemoryMode);
2504	return VERR_GMM_SEED_ME;
2505	}
2506
2507	/*
2508	* Update the accounts before we proceed because we might be leaving the
2509	* protection of the global mutex and thus run the risk of permitting
2510	* too much memory to be allocated.
2511	*/
2512	switch (enmAccount)
2513	{
2514	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2515	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2516	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2517	default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2518	}
2519	pGVM->gmm.s.Stats.cPrivatePages += cPages;
2520	pGMM->cAllocatedPages += cPages;
2521
2522	/*
2523	* Part two of it's-easy-in-legacy-memory-mode.
2524	*/
2525	uint32_t iPage = 0;
2526	if (pGMM->fLegacyAllocationMode)
2527	{
2528	iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2529	AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2530	return VINF_SUCCESS;
2531	}
2532
2533	/*
2534	* Bound mode is also relatively straightforward.
2535	*/
2536	int rc = VINF_SUCCESS;
2537	if (pGMM->fBoundMemoryMode)
2538	{
2539	iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2540	if (iPage < cPages)
2541	do
2542	rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2543	while (iPage < cPages && RT_SUCCESS(rc));
2544	}
2545	/*
2546	* Shared mode is trickier as we should try to achieve the same locality as
2547	* in bound mode, but smartly make use of non-full chunks allocated by
2548	* other VMs if we're low on memory.
2549	*/
2550	else
2551	{
2552	/* Pick the most optimal pages first. */
2553	iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2554	if (iPage < cPages)
2555	{
2556	/* Maybe we should try getting pages from chunks "belonging" to
2557	other VMs before allocating more chunks? */
2558	bool fTriedOnSameAlready = false;
2559	if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2560	{
2561	iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2562	fTriedOnSameAlready = true;
2563	}
2564
2565	/* Allocate memory from empty chunks. */
2566	if (iPage < cPages)
2567	iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2568
2569	/* Grab empty shared chunks. */
2570	if (iPage < cPages)
2571	iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2572
2573	/* If there are a lot of free pages spread around, try not to waste
2574	system memory on more chunks. (Should trigger defragmentation.) */
2575	if ( !fTriedOnSameAlready
2576	&& gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2577	{
2578	iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2579	if (iPage < cPages)
2580	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2581	}
2582
2583	/*
2584	* Ok, try to allocate new chunks.
2585	*/
2586	if (iPage < cPages)
2587	{
2588	do
2589	rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2590	while (iPage < cPages && RT_SUCCESS(rc));
2591
2592	/* If the host is out of memory, take whatever we can get. */
2593	if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2594	&& pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2595	{
2596	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2597	if (iPage < cPages)
2598	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2599	AssertRelease(iPage == cPages);
2600	rc = VINF_SUCCESS;
2601	}
2602	}
2603	}
2604	}
2605
2606	/*
2607	* Clean up on failure. Since this is bound to be a low-memory condition
2608	* we will give back any empty chunks that might be hanging around.
2609	*/
2610	if (RT_FAILURE(rc))
2611	{
2612	/* Update the statistics. */
2613	pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2614	pGMM->cAllocatedPages -= cPages - iPage;
2615	switch (enmAccount)
2616	{
2617	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2618	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2619	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2620	default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2621	}
2622
2623	/* Release the pages. */
2624	while (iPage-- > 0)
2625	{
2626	uint32_t idPage = paPages[iPage].idPage;
2627	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2628	if (RT_LIKELY(pPage))
2629	{
2630	Assert(GMM_PAGE_IS_PRIVATE(pPage));
2631	Assert(pPage->Private.hGVM == pGVM->hSelf);
2632	gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2633	}
2634	else
2635	AssertMsgFailed(("idPage=%#x\n", idPage));
2636
2637	paPages[iPage].idPage = NIL_GMM_PAGEID;
2638	paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2639	paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2640	}
2641
2642	/* Free empty chunks. */
2643	/** @todo */
2644
2645	/* return the fail status on failure */
2646	return rc;
2647	}
2648	return VINF_SUCCESS;
2649	}
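/*
 * Illustration (not part of GMMR0.cpp): a minimal sketch of the
 * charge-first, roll-back-on-failure accounting used by gmmR0AllocatePagesNew.
 * The counter is bumped before the slow allocation step because that step may
 * temporarily drop the lock; charging early stops concurrent callers from
 * overcommitting. Ledger and allocateWithCharge are hypothetical names, and
 * the callback stands in for the whole page-picking cascade.
 */
```cpp
#include <cassert>
#include <cstdint>
#include <functional>

struct Ledger
{
    std::uint64_t cAllocated;   // pages currently charged
    std::uint64_t cMax;         // global limit
};

// Returns true if all cPages were obtained. fnAllocate reports how many pages
// it actually managed to get.
bool allocateWithCharge(Ledger &ledger, std::uint32_t cPages,
                        const std::function<std::uint32_t(std::uint32_t)> &fnAllocate)
{
    if (ledger.cAllocated + cPages > ledger.cMax)
        return false;                           // limit check comes first
    ledger.cAllocated += cPages;                // charge before the slow part
    std::uint32_t const cGot = fnAllocate(cPages);
    if (cGot == cPages)
        return true;
    // Failure: undo the whole charge. A real implementation would also hand
    // back the cGot pages it did obtain.
    ledger.cAllocated -= cPages;
    return false;
}
```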
2650
2651
2652	/**
2653	* Updates the previous allocations and allocates more pages.
2654	*
2655	* The handy pages are always taken from the 'base' memory account.
2656	* The allocated pages are not cleared and will contain random garbage.
2657	*
2658	* @returns VBox status code:
2659	* @retval VINF_SUCCESS on success.
2660	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2661	* @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2662	* @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2663	* private page.
2664	* @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2665	* shared page.
2666	* @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2667	* owned by the VM.
2668	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2669	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2670	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2671	* that is we're trying to allocate more than we've reserved.
2672	*
2673	* @param pVM Pointer to the VM.
2674	* @param idCpu The VCPU id.
2675	* @param cPagesToUpdate The number of pages to update (starting from the head).
2676	* @param cPagesToAlloc The number of pages to allocate (starting from the head).
2677	* @param paPages The array of page descriptors.
2678	* See GMMPAGEDESC for details on what is expected on input.
2679	* @thread EMT.
2680	*/
2681	GMMR0DECL(int) GMMR0AllocateHandyPages(PVM pVM, VMCPUID idCpu, uint32_t cPagesToUpdate, uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2682	{
2683	LogFlow(("GMMR0AllocateHandyPages: pVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2684	pVM, cPagesToUpdate, cPagesToAlloc, paPages));
2685
2686	/*
2687	* Validate, get basics and take the semaphore.
2688	* (This is a relatively busy path, so make predictions where possible.)
2689	*/
2690	PGMM pGMM;
2691	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2692	PGVM pGVM;
2693	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2694	if (RT_FAILURE(rc))
2695	return rc;
2696
2697	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2698	AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2699	|| (cPagesToAlloc && cPagesToAlloc < 1024),
2700	("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2701	VERR_INVALID_PARAMETER);
2702
2703	unsigned iPage = 0;
2704	for (; iPage < cPagesToUpdate; iPage++)
2705	{
2706	AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2707	&& !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2708	|| paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2709	|| paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2710	("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2711	VERR_INVALID_PARAMETER);
2712	AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2713	/*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2714	("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2715	AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2716	/*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2717	("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2718	}
2719
2720	for (; iPage < cPagesToAlloc; iPage++)
2721	{
2722	AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2723	AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2724	AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2725	}
2726
2727	gmmR0MutexAcquire(pGMM);
2728	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2729	{
2730	/* No allocations before the initial reservation has been made! */
2731	if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2732	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
2733	&& pGVM->gmm.s.Stats.Reserved.cShadowPages))
2734	{
2735	/*
2736	* Perform the updates.
2737	* Stop on the first error.
2738	*/
2739	for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2740	{
2741	if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2742	{
2743	PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2744	if (RT_LIKELY(pPage))
2745	{
2746	if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2747	{
2748	if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2749	{
2750	AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2751	if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2752	pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2753	else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2754	pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2755	/* else: NIL_RTHCPHYS nothing */
2756
2757	paPages[iPage].idPage = NIL_GMM_PAGEID;
2758	paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2759	}
2760	else
2761	{
2762	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2763	iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2764	rc = VERR_GMM_NOT_PAGE_OWNER;
2765	break;
2766	}
2767	}
2768	else
2769	{
2770	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2771	rc = VERR_GMM_PAGE_NOT_PRIVATE;
2772	break;
2773	}
2774	}
2775	else
2776	{
2777	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2778	rc = VERR_GMM_PAGE_NOT_FOUND;
2779	break;
2780	}
2781	}
2782
2783	if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2784	{
2785	PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2786	if (RT_LIKELY(pPage))
2787	{
2788	if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2789	{
2790	AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2791	Assert(pPage->Shared.cRefs);
2792	Assert(pGVM->gmm.s.Stats.cSharedPages);
2793	Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2794
2795	Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2796	pGVM->gmm.s.Stats.cSharedPages--;
2797	pGVM->gmm.s.Stats.Allocated.cBasePages--;
2798	if (!--pPage->Shared.cRefs)
2799	gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2800	else
2801	{
2802	Assert(pGMM->cDuplicatePages);
2803	pGMM->cDuplicatePages--;
2804	}
2805
2806	paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2807	}
2808	else
2809	{
2810	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2811	rc = VERR_GMM_PAGE_NOT_SHARED;
2812	break;
2813	}
2814	}
2815	else
2816	{
2817	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2818	rc = VERR_GMM_PAGE_NOT_FOUND;
2819	break;
2820	}
2821	}
2822	} /* for each page to update */
2823
2824	if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2825	{
2826	#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2827	for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2828	{
2829	Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2830	Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2831	Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2832	}
2833	#endif
2834
2835	/*
2836	* Join paths with GMMR0AllocatePages for the allocation.
2837	* Note! gmmR0AllocateMoreChunks may leave the protection of the mutex!
2838	*/
2839	rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2840	}
2841	}
2842	else
2843	rc = VERR_WRONG_ORDER;
2844	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2845	}
2846	else
2847	rc = VERR_GMM_IS_NOT_SANE;
2848	gmmR0MutexRelease(pGMM);
2849	LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
2850	return rc;
2851	}
2852
2853
2854	/**
2855	* Allocate one or more pages.
2856	*
2857	* This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
2858	* The allocated pages are not cleared and will contain random garbage.
2859	*
2860	* @returns VBox status code:
2861	* @retval VINF_SUCCESS on success.
2862	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2863	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2864	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2865	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2866	* that is we're trying to allocate more than we've reserved.
2867	*
2868	* @param pVM Pointer to the VM.
2869	* @param idCpu The VCPU id.
2870	* @param cPages The number of pages to allocate.
2871	* @param paPages Pointer to the page descriptors.
2872	* See GMMPAGEDESC for details on what is expected on input.
2873	* @param enmAccount The account to charge.
2874	*
2875	* @thread EMT.
2876	*/
2877	GMMR0DECL(int) GMMR0AllocatePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2878	{
2879	LogFlow(("GMMR0AllocatePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
2880
2881	/*
2882	* Validate, get basics and take the semaphore.
2883	*/
2884	PGMM pGMM;
2885	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2886	PGVM pGVM;
2887	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2888	if (RT_FAILURE(rc))
2889	return rc;
2890
2891	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2892	AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
2893	AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
2894
2895	for (unsigned iPage = 0; iPage < cPages; iPage++)
2896	{
2897	AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2898	|| paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
2899	|| ( enmAccount == GMMACCOUNT_BASE
2900	&& paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2901	&& !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
2902	("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
2903	VERR_INVALID_PARAMETER);
2904	AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2905	AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2906	}
2907
2908	gmmR0MutexAcquire(pGMM);
2909	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2910	{
2911
2912	/* No allocations before the initial reservation has been made! */
2913	if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2914	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
2915	&& pGVM->gmm.s.Stats.Reserved.cShadowPages))
2916	rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
2917	else
2918	rc = VERR_WRONG_ORDER;
2919	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2920	}
2921	else
2922	rc = VERR_GMM_IS_NOT_SANE;
2923	gmmR0MutexRelease(pGMM);
2924	LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
2925	return rc;
2926	}
2927
2928
2929	/**
2930	* VMMR0 request wrapper for GMMR0AllocatePages.
2931	*
2932	* @returns see GMMR0AllocatePages.
2933	* @param pVM Pointer to the VM.
2934	* @param idCpu The VCPU id.
2935	* @param pReq Pointer to the request packet.
2936	*/
2937	GMMR0DECL(int) GMMR0AllocatePagesReq(PVM pVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
2938	{
2939	/*
2940	* Validate input and pass it on.
2941	*/
2942	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
2943	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
2944	AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
2945	("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
2946	VERR_INVALID_PARAMETER);
2947	AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
2948	("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
2949	VERR_INVALID_PARAMETER);
2950
2951	return GMMR0AllocatePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
2952	}
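
The two size checks in the wrapper above guard a variable-length request: the header must cover the fixed part, and the total must match exactly the number of descriptors announced in cPages. A minimal standalone sketch of that idea follows. It assumes a hypothetical DemoReq layout (not the real GMMALLOCATEPAGESREQ) and uses plain offsetof arithmetic in place of RT_UOFFSETOF; every Demo/demo name is invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the request header and page descriptor.
struct DemoHdr  { uint32_t cbReq; };
struct DemoPage { uint64_t HCPhysGCPhys; uint32_t idPage; uint32_t idSharedPage; };
struct DemoReq
{
    DemoHdr  Hdr;
    uint32_t cPages;
    DemoPage aPages[1];   // variable-length tail; logically cPages entries
};

// Size in bytes of a request carrying cPages descriptors.
static size_t demoReqSize(uint32_t cPages)
{
    return offsetof(DemoReq, aPages) + sizeof(DemoPage) * static_cast<size_t>(cPages);
}

// Mirrors the two checks: the fixed part must be present, then the
// declared size must match the descriptor count exactly.
static bool demoReqIsValid(const DemoReq &req)
{
    if (req.Hdr.cbReq < offsetof(DemoReq, aPages))
        return false;
    return req.Hdr.cbReq == demoReqSize(req.cPages);
}
```

Checking the lower bound first means cPages is only trusted once the header is known to be large enough to contain it.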
2953
2954
2955	/**
2956	* Allocate a large page to represent guest RAM
2957	*
2958	* The allocated pages are not cleared and will contain random garbage.
2959	*
2960	* @returns VBox status code:
2961	* @retval VINF_SUCCESS on success.
2962	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2963	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2964	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2965	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2966	* that is we're trying to allocate more than we've reserved.
2967	* @returns see GMMR0AllocatePages.
2968	* @param pVM Pointer to the VM.
2969	* @param idCpu The VCPU id.
2970	* @param cbPage Large page size.
2971	*/
2972	GMMR0DECL(int) GMMR0AllocateLargePage(PVM pVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
2973	{
2974	LogFlow(("GMMR0AllocateLargePage: pVM=%p cbPage=%x\n", pVM, cbPage));
2975
2976	AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
2977	AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
2978	AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
2979
2980	/*
2981	* Validate, get basics and take the semaphore.
2982	*/
2983	PGMM pGMM;
2984	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2985	PGVM pGVM;
2986	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2987	if (RT_FAILURE(rc))
2988	return rc;
2989
2990	/* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
2991	if (pGMM->fLegacyAllocationMode)
2992	return VERR_NOT_SUPPORTED;
2993
2994	*pHCPhys = NIL_RTHCPHYS;
2995	*pIdPage = NIL_GMM_PAGEID;
2996
2997	gmmR0MutexAcquire(pGMM);
2998	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2999	{
3000	const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3001	if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3002	> pGVM->gmm.s.Stats.Reserved.cBasePages))
3003	{
3004	Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3005	pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3006	gmmR0MutexRelease(pGMM);
3007	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3008	}
3009
3010	/*
3011	* Allocate a new large page chunk.
3012	*
3013	* Note! We leave the giant GMM lock temporarily as the allocation might
3014	* take a long time. gmmR0RegisterChunk will retake it (ugly).
3015	*/
3016	AssertCompile(GMM_CHUNK_SIZE == _2M);
3017	gmmR0MutexRelease(pGMM);
3018
3019	RTR0MEMOBJ hMemObj;
3020	rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3021	if (RT_SUCCESS(rc))
3022	{
3023	PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3024	PGMMCHUNK pChunk;
3025	rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3026	if (RT_SUCCESS(rc))
3027	{
3028	/*
3029	* Allocate all the pages in the chunk.
3030	*/
3031	/* Unlink the new chunk from the free list. */
3032	gmmR0UnlinkChunk(pChunk);
3033
3034	/** @todo rewrite this to skip the looping. */
3035	/* Allocate all pages. */
3036	GMMPAGEDESC PageDesc;
3037	gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3038
3039	/* Return the first page as we'll use the whole chunk as one big page. */
3040	*pIdPage = PageDesc.idPage;
3041	*pHCPhys = PageDesc.HCPhysGCPhys;
3042
3043	for (unsigned i = 1; i < cPages; i++)
3044	gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3045
3046	/* Update accounting. */
3047	pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3048	pGVM->gmm.s.Stats.cPrivatePages += cPages;
3049	pGMM->cAllocatedPages += cPages;
3050
3051	gmmR0LinkChunk(pChunk, pSet);
3052	gmmR0MutexRelease(pGMM);
3053	}
3054	else
3055	RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3056	}
3057	}
3058	else
3059	{
3060	gmmR0MutexRelease(pGMM);
3061	rc = VERR_GMM_IS_NOT_SANE;
3062	}
3063
3064	LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3065	return rc;
3066	}
3067
3068
3069	/**
3070	* Free a large page.
3071	*
3072	* @returns VBox status code:
3073	* @param pVM Pointer to the VM.
3074	* @param idCpu The VCPU id.
3075	* @param idPage The large page id.
3076	*/
3077	GMMR0DECL(int) GMMR0FreeLargePage(PVM pVM, VMCPUID idCpu, uint32_t idPage)
3078	{
3079	LogFlow(("GMMR0FreeLargePage: pVM=%p idPage=%x\n", pVM, idPage));
3080
3081	/*
3082	* Validate, get basics and take the semaphore.
3083	*/
3084	PGMM pGMM;
3085	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3086	PGVM pGVM;
3087	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3088	if (RT_FAILURE(rc))
3089	return rc;
3090
3091	/* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3092	if (pGMM->fLegacyAllocationMode)
3093	return VERR_NOT_SUPPORTED;
3094
3095	gmmR0MutexAcquire(pGMM);
3096	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3097	{
3098	const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3099
3100	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3101	{
3102	Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3103	gmmR0MutexRelease(pGMM);
3104	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3105	}
3106
3107	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3108	if (RT_LIKELY( pPage
3109	&& GMM_PAGE_IS_PRIVATE(pPage)))
3110	{
3111	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3112	Assert(pChunk);
3113	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3114	Assert(pChunk->cPrivate > 0);
3115
3116	/* Release the memory immediately. */
3117	gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3118
3119	/* Update accounting. */
3120	pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3121	pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3122	pGMM->cAllocatedPages -= cPages;
3123	}
3124	else
3125	rc = VERR_GMM_PAGE_NOT_FOUND;
3126	}
3127	else
3128	rc = VERR_GMM_IS_NOT_SANE;
3129
3130	gmmR0MutexRelease(pGMM);
3131	LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3132	return rc;
3133	}
3134
3135
3136	/**
3137	* VMMR0 request wrapper for GMMR0FreeLargePage.
3138	*
3139	* @returns see GMMR0FreeLargePage.
3140	* @param pVM Pointer to the VM.
3141	* @param idCpu The VCPU id.
3142	* @param pReq Pointer to the request packet.
3143	*/
3144	GMMR0DECL(int) GMMR0FreeLargePageReq(PVM pVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3145	{
3146	/*
3147	* Validate input and pass it on.
3148	*/
3149	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3150	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3151	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3152	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3153	VERR_INVALID_PARAMETER);
3154
3155	return GMMR0FreeLargePage(pVM, idCpu, pReq->idPage);
3156	}
3157
3158
3159	/**
3160	* Frees a chunk, giving it back to the host OS.
3161	*
3162	* @param pGMM Pointer to the GMM instance.
3163	* @param pGVM This is set when called from GMMR0CleanupVM so we can
3164	* unmap and free the chunk in one go.
3165	* @param pChunk The chunk to free.
3166	* @param fRelaxedSem Whether we can release the semaphore while doing the
3167	* freeing (@c true) or not.
3168	*/
3169	static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3170	{
3171	Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3172
3173	GMMR0CHUNKMTXSTATE MtxState;
3174	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3175
3176	/*
3177	* Cleanup hack! Unmap the chunk from the caller's address space.
3178	* This shouldn't happen, so screw lock contention...
3179	*/
3180	if ( pChunk->cMappingsX
3181	&& !pGMM->fLegacyAllocationMode
3182	&& pGVM)
3183	gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3184
3185	/*
3186	* If there are current mappings of the chunk, then request the
3187	* VMs to unmap them. Reposition the chunk in the free list so
3188	* it won't be a likely candidate for allocations.
3189	*/
3190	if (pChunk->cMappingsX)
3191	{
3192	/** @todo R0 -> VM request */
3193	/* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3194	Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3195	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3196	return false;
3197	}
3198
3199
3200	/*
3201	* Save and trash the handle.
3202	*/
3203	RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3204	pChunk->hMemObj = NIL_RTR0MEMOBJ;
3205
3206	/*
3207	* Unlink it from everywhere.
3208	*/
3209	gmmR0UnlinkChunk(pChunk);
3210
3211	RTListNodeRemove(&pChunk->ListNode);
3212
3213	PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3214	Assert(pCore == &pChunk->Core); NOREF(pCore);
3215
3216	PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3217	if (pTlbe->pChunk == pChunk)
3218	{
3219	pTlbe->idChunk = NIL_GMM_CHUNKID;
3220	pTlbe->pChunk = NULL;
3221	}
3222
3223	Assert(pGMM->cChunks > 0);
3224	pGMM->cChunks--;
3225
3226	/*
3227	* Free the Chunk ID before dropping the locks and freeing the rest.
3228	*/
3229	gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3230	pChunk->Core.Key = NIL_GMM_CHUNKID;
3231
3232	pGMM->cFreedChunks++;
3233
3234	gmmR0ChunkMutexRelease(&MtxState, NULL);
3235	if (fRelaxedSem)
3236	gmmR0MutexRelease(pGMM);
3237
3238	RTMemFree(pChunk->paMappingsX);
3239	pChunk->paMappingsX = NULL;
3240
3241	RTMemFree(pChunk);
3242
3243	int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3244	AssertLogRelRC(rc);
3245
3246	if (fRelaxedSem)
3247	gmmR0MutexAcquire(pGMM);
3248	return fRelaxedSem;
3249	}
3250
3251
3252	/**
3253	* Free page worker.
3254	*
3255	* The caller does all the statistic decrementing, we do all the incrementing.
3256	*
3257	* @param pGMM Pointer to the GMM instance data.
3258	* @param pGVM Pointer to the GVM instance.
3259	* @param pChunk Pointer to the chunk this page belongs to.
3260	* @param idPage The Page ID.
3261	* @param pPage Pointer to the page.
3262	*/
3263	static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3264	{
3265	Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3266	pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3267
3268	/*
3269	* Put the page on the free list.
3270	*/
3271	pPage->u = 0;
3272	pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3273	Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3274	pPage->Free.iNext = pChunk->iFreeHead;
3275	pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3276
3277	/*
3278	* Update statistics (the cShared/cPrivate stats are up to date already),
3279	* and relink the chunk if necessary.
3280	*/
3281	unsigned const cFree = pChunk->cFree;
3282	if ( !cFree
3283	|| gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3284	{
3285	gmmR0UnlinkChunk(pChunk);
3286	pChunk->cFree++;
3287	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3288	}
3289	else
3290	{
3291	pChunk->cFree = cFree + 1;
3292	pChunk->pSet->cFreePages++;
3293	}
3294
3295	/*
3296	* If the chunk becomes empty, consider giving memory back to the host OS.
3297	*
3298	* The current strategy is to try to give it back if there are other chunks
3299	* in this free list, meaning if there are at least 240 free pages in this
3300	* category. Note that since there are probably mappings of the chunk,
3301	* it won't be freed up instantly, which probably screws up this logic
3302	* a bit...
3303	*/
3304	/** @todo Do this on the way out. */
3305	if (RT_UNLIKELY( pChunk->cFree == GMM_CHUNK_NUM_PAGES
3306	&& pChunk->pFreeNext
3307	&& pChunk->pFreePrev /** @todo this is probably misfiring, see reset... */
3308	&& !pGMM->fLegacyAllocationMode))
3309	gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3310
3311	}
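
The worker above threads freed pages onto a per-chunk list using array indexes rather than pointers, with UINT16_MAX marking an empty list. The sketch below shows that LIFO index-linked free list in isolation. It is an illustration only: DemoChunk, DemoPage and the tiny kNumPages are hypothetical, and the real code additionally relinks the chunk between free sets and updates set-level counters.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kNoFree   = UINT16_MAX;  // empty-list marker
constexpr uint16_t kNumPages = 8;           // hypothetical; a real chunk holds far more

struct DemoPage
{
    bool     fFree = false;
    uint16_t iNext = kNoFree;   // index of the next free page in the chunk
};

struct DemoChunk
{
    DemoPage aPages[kNumPages];
    uint16_t iFreeHead = kNoFree;
    unsigned cFree     = 0;
};

// Push page iPage onto the head of the chunk's free list.
static void demoFreePage(DemoChunk &chunk, uint16_t iPage)
{
    assert(iPage < kNumPages && !chunk.aPages[iPage].fFree);
    chunk.aPages[iPage].fFree = true;
    chunk.aPages[iPage].iNext = chunk.iFreeHead;
    chunk.iFreeHead = iPage;
    chunk.cFree++;
}

// Pop the most recently freed page; returns kNoFree if the list is empty.
static uint16_t demoAllocPage(DemoChunk &chunk)
{
    uint16_t const iPage = chunk.iFreeHead;
    if (iPage == kNoFree)
        return kNoFree;
    chunk.iFreeHead = chunk.aPages[iPage].iNext;
    chunk.aPages[iPage].fFree = false;
    chunk.cFree--;
    return iPage;
}
```

Storing a 16-bit index instead of a pointer keeps the per-page tracking structure small, which matches the stated goal of minimizing the memory footprint.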
3312
3313
3314	/**
3315	* Frees a shared page, the page is known to exist and be valid and such.
3316	*
3317	* @param pGMM Pointer to the GMM instance.
3318	* @param pGVM Pointer to the GVM instance.
3319	* @param idPage The page id.
3320	* @param pPage The page structure.
3321	*/
3322	DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3323	{
3324	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3325	Assert(pChunk);
3326	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3327	Assert(pChunk->cShared > 0);
3328	Assert(pGMM->cSharedPages > 0);
3329	Assert(pGMM->cAllocatedPages > 0);
3330	Assert(!pPage->Shared.cRefs);
3331
3332	pChunk->cShared--;
3333	pGMM->cAllocatedPages--;
3334	pGMM->cSharedPages--;
3335	gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3336	}
3337
3338
3339	/**
3340	* Frees a private page, the page is known to exist and be valid and such.
3341	*
3342	* @param pGMM Pointer to the GMM instance.
3343	* @param pGVM Pointer to the GVM instance.
3344	* @param idPage The page id.
3345	* @param pPage The page structure.
3346	*/
3347	DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3348	{
3349	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3350	Assert(pChunk);
3351	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3352	Assert(pChunk->cPrivate > 0);
3353	Assert(pGMM->cAllocatedPages > 0);
3354
3355	pChunk->cPrivate--;
3356	pGMM->cAllocatedPages--;
3357	gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3358	}
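
Both free helpers above recover the owning chunk with idPage >> GMM_CHUNKID_SHIFT, the inverse of the ID scheme described in the file header (chunk ID in the high bits, page index in the low bits). A small sketch of the round trip follows. The constants are assumptions chosen for illustration (2 MB chunks, 4 KB pages, hence a shift of 9), and the demo helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed sizes for illustration: 2 MB chunks of 4 KB pages.
constexpr uint32_t kChunkSize     = 2u * 1024u * 1024u;
constexpr uint32_t kPageSize      = 4096u;
constexpr uint32_t kPagesPerChunk = kChunkSize / kPageSize;   // 512
constexpr uint32_t kChunkIdShift  = 9;                        // log2(512)
static_assert((1u << kChunkIdShift) == kPagesPerChunk, "shift must match pages per chunk");

// Compose a page ID from a chunk ID and a page index within that chunk.
static uint32_t demoMakePageId(uint32_t idChunk, uint32_t iPage)
{
    return (idChunk << kChunkIdShift) | iPage;
}

// Recover the chunk ID (high bits) from a page ID.
static uint32_t demoChunkIdOf(uint32_t idPage)
{
    return idPage >> kChunkIdShift;
}

// Recover the page index within the chunk (low bits) from a page ID.
static uint32_t demoPageIndexOf(uint32_t idPage)
{
    return idPage & (kPagesPerChunk - 1u);
}
```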
3359
3360
3361	/**
3362	* Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3363	*
3364	* @returns VBox status code:
3365	* @retval xxx
3366	*
3367	* @param pGMM Pointer to the GMM instance data.
3368	* @param pGVM Pointer to the VM.
3369	* @param cPages The number of pages to free.
3370	* @param paPages Pointer to the page descriptors.
3371	* @param enmAccount The account this relates to.
3372	*/
3373	static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3374	{
3375	/*
3376	* Check that the request isn't impossible with respect to the account status.
3377	*/
3378	switch (enmAccount)
3379	{
3380	case GMMACCOUNT_BASE:
3381	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3382	{
3383	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3384	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3385	}
3386	break;
3387	case GMMACCOUNT_SHADOW:
3388	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3389	{
3390	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3391	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3392	}
3393	break;
3394	case GMMACCOUNT_FIXED:
3395	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3396	{
3397	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3398	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3399	}
3400	break;
3401	default:
3402	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3403	}
3404
3405	/*
3406	* Walk the descriptors and free the pages.
3407	*
3408	* Statistics (except the account) are being updated as we go along,
3409	* unlike the alloc code. Also, stop on the first error.
3410	*/
3411	int rc = VINF_SUCCESS;
3412	uint32_t iPage;
3413	for (iPage = 0; iPage < cPages; iPage++)
3414	{
3415	uint32_t idPage = paPages[iPage].idPage;
3416	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3417	if (RT_LIKELY(pPage))
3418	{
3419	if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3420	{
3421	if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3422	{
3423	Assert(pGVM->gmm.s.Stats.cPrivatePages);
3424	pGVM->gmm.s.Stats.cPrivatePages--;
3425	gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3426	}
3427	else
3428	{
3429	Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3430	pPage->Private.hGVM, pGVM->hSelf));
3431	rc = VERR_GMM_NOT_PAGE_OWNER;
3432	break;
3433	}
3434	}
3435	else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3436	{
3437	Assert(pGVM->gmm.s.Stats.cSharedPages);
3438	Assert(pPage->Shared.cRefs);
3439	#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3440	if (pPage->Shared.u14Checksum)
3441	{
3442	uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3443	uChecksum &= UINT32_C(0x00003fff);
3444	AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3445	("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3446	}
3447	#endif
3448	pGVM->gmm.s.Stats.cSharedPages--;
3449	if (!--pPage->Shared.cRefs)
3450	gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3451	else
3452	{
3453	Assert(pGMM->cDuplicatePages);
3454	pGMM->cDuplicatePages--;
3455	}
3456	}
3457	else
3458	{
3459	Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3460	rc = VERR_GMM_PAGE_ALREADY_FREE;
3461	break;
3462	}
3463	}
3464	else
3465	{
3466	Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3467	rc = VERR_GMM_PAGE_NOT_FOUND;
3468	break;
3469	}
3470	paPages[iPage].idPage = NIL_GMM_PAGEID;
3471	}
3472
3473	/*
3474	* Update the account.
3475	*/
3476	switch (enmAccount)
3477	{
3478	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3479	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3480	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3481	default:
3482	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3483	}
3484
3485	/*
3486	* Any threshold stuff to be done here?
3487	*/
3488
3489	return rc;
3490	}
3491
3492
3493	/**
3494	* Free one or more pages.
3495	*
3496	* This is typically used at reset time or power off.
3497	*
3498	* @returns VBox status code:
3499	* @retval xxx
3500	*
3501	* @param pVM Pointer to the VM.
3502	* @param idCpu The VCPU id.
3503	* @param cPages The number of pages to free.
3504	* @param paPages Pointer to the page descriptors containing the Page IDs for each page.
3505	* @param enmAccount The account this relates to.
3506	* @thread EMT.
3507	*/
3508	GMMR0DECL(int) GMMR0FreePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3509	{
3510	LogFlow(("GMMR0FreePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
3511
3512	/*
3513	* Validate input and get the basics.
3514	*/
3515	PGMM pGMM;
3516	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3517	PGVM pGVM;
3518	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3519	if (RT_FAILURE(rc))
3520	return rc;
3521
3522	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3523	AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3524	AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3525
3526	for (unsigned iPage = 0; iPage < cPages; iPage++)
3527	AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3528	/*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3529	("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3530
3531	/*
3532	* Take the semaphore and call the worker function.
3533	*/
3534	gmmR0MutexAcquire(pGMM);
3535	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3536	{
3537	rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3538	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3539	}
3540	else
3541	rc = VERR_GMM_IS_NOT_SANE;
3542	gmmR0MutexRelease(pGMM);
3543	LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3544	return rc;
3545	}
3546
3547
3548	/**
3549	* VMMR0 request wrapper for GMMR0FreePages.
3550	*
3551	* @returns see GMMR0FreePages.
3552	* @param pVM Pointer to the VM.
3553	* @param idCpu The VCPU id.
3554	* @param pReq Pointer to the request packet.
3555	*/
3556	GMMR0DECL(int) GMMR0FreePagesReq(PVM pVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3557	{
3558	/*
3559	* Validate input and pass it on.
3560	*/
3561	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3562	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3563	AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3564	("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3565	VERR_INVALID_PARAMETER);
3566	AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3567	("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3568	VERR_INVALID_PARAMETER);
3569
3570	return GMMR0FreePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3571	}
3572
3573
3574	/**
3575	* Report back on a memory ballooning request.
3576	*
3577	* The request may or may not have been initiated by the GMM. If it was initiated
3578	* by the GMM it is important that this function is called even if no pages were
3579	* ballooned.
3580	*
3581	* @returns VBox status code:
3582	* @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3583	* @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3584	* @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3585	* indicating that we won't necessarily have sufficient RAM to boot
3586	* the VM again and that it should pause until this changes (we'll try to
3587	* balloon some other VM). (For standard deflate we have little choice
3588	* but to hope the VM won't use the memory that was returned to it.)
3589	*
3590	* @param pVM Pointer to the VM.
3591	* @param idCpu The VCPU id.
3592	* @param enmAction Inflate/deflate/reset.
3593	* @param cBalloonedPages The number of pages that were ballooned.
3594	*
3595	* @thread EMT.
3596	*/
3597	GMMR0DECL(int) GMMR0BalloonedPages(PVM pVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3598	{
3599	LogFlow(("GMMR0BalloonedPages: pVM=%p enmAction=%d cBalloonedPages=%#x\n",
3600	pVM, enmAction, cBalloonedPages));
3601
3602	AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3603
3604	/*
3605	* Validate input and get the basics.
3606	*/
3607	PGMM pGMM;
3608	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3609	PGVM pGVM;
3610	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3611	if (RT_FAILURE(rc))
3612	return rc;
3613
3614	/*
3615	* Take the semaphore and do some more validations.
3616	*/
3617	gmmR0MutexAcquire(pGMM);
3618	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3619	{
3620	switch (enmAction)
3621	{
3622	case GMMBALLOONACTION_INFLATE:
3623	{
3624	if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3625	<= pGVM->gmm.s.Stats.Reserved.cBasePages))
3626	{
3627	/*
3628	* Record the ballooned memory.
3629	*/
3630	pGMM->cBalloonedPages += cBalloonedPages;
3631	if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3632	{
3633	/* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions. */
3634	AssertFailed();
3635
3636	pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3637	pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3638	Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3639	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3640	pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3641	}
3642	else
3643	{
3644	pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3645	Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3646	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3647	}
3648	}
3649	else
3650	{
3651	Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3652	pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3653	pGVM->gmm.s.Stats.Reserved.cBasePages));
3654	rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3655	}
3656	break;
3657	}
3658
3659	case GMMBALLOONACTION_DEFLATE:
3660	{
3661	/* Deflate. */
3662	if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3663	{
3664	/*
3665	* Record the ballooned memory.
3666	*/
3667	Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3668	pGMM->cBalloonedPages -= cBalloonedPages;
3669	pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3670	if (pGVM->gmm.s.Stats.cReqDeflatePages)
3671	{
3672	AssertFailed(); /* This path is for later. */
3673	Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3674	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3675
3676	/*
3677	* Anything we need to do here now when the request has been completed?
3678	*/
3679	pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3680	}
3681	else
3682	Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3683	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3684	}
3685	else
3686	{
3687	Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3688	rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3689	}
3690	break;
3691	}
3692
3693	case GMMBALLOONACTION_RESET:
3694	{
3695	/* Reset to an empty balloon. */
3696	Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3697
3698	pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3699	pGVM->gmm.s.Stats.cBalloonedPages = 0;
3700	break;
3701	}
3702
3703	default:
3704	rc = VERR_INVALID_PARAMETER;
3705	break;
3706	}
3707	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3708	}
3709	else
3710	rc = VERR_GMM_IS_NOT_SANE;
3711
3712	gmmR0MutexRelease(pGMM);
3713	LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3714	return rc;
3715	}
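
The inflate, deflate and reset branches above reduce to a small piece of bookkeeping over a global counter and a per-VM counter. The sketch below isolates that arithmetic. It is a simplification under stated assumptions: the types and names (DemoAction, DemoStatus, DemoVmStats, demoBalloon) are hypothetical, and the request-tracking fields, locking and sanity checks of the real function are left out.

```cpp
#include <cstdint>

enum class DemoAction { Inflate, Deflate, Reset };
enum class DemoStatus { Ok, FreeTooMuch, DeflateTooMuch };

// Per-VM page counters, loosely modelled on the base-page statistics.
struct DemoVmStats
{
    uint64_t cAllocated;
    uint64_t cBallooned;
    uint64_t cReserved;
};

static DemoStatus demoBalloon(uint64_t &cGlobalBallooned, DemoVmStats &vm,
                              DemoAction action, uint32_t cPages)
{
    switch (action)
    {
        case DemoAction::Inflate:
            // Allocated plus ballooned pages may never exceed the reservation.
            if (vm.cAllocated + vm.cBallooned + cPages > vm.cReserved)
                return DemoStatus::FreeTooMuch;
            cGlobalBallooned += cPages;
            vm.cBallooned    += cPages;
            return DemoStatus::Ok;

        case DemoAction::Deflate:
            // Cannot hand back more pages than are currently in the balloon.
            if (vm.cBallooned < cPages)
                return DemoStatus::DeflateTooMuch;
            cGlobalBallooned -= cPages;
            vm.cBallooned    -= cPages;
            return DemoStatus::Ok;

        case DemoAction::Reset:
            // Empty this VM's balloon and remove its share from the global count.
            cGlobalBallooned -= vm.cBallooned;
            vm.cBallooned     = 0;
            return DemoStatus::Ok;
    }
    return DemoStatus::Ok;
}
```

On failure neither counter is modified, so a rejected request leaves the accounting exactly as it was.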
3716
3717
3718	/**
3719	* VMMR0 request wrapper for GMMR0BalloonedPages.
3720	*
3721	* @returns see GMMR0BalloonedPages.
3722	* @param pVM Pointer to the VM.
3723	* @param idCpu The VCPU id.
3724	* @param pReq Pointer to the request packet.
3725	*/
3726	GMMR0DECL(int) GMMR0BalloonedPagesReq(PVM pVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3727	{
3728	/*
3729	* Validate input and pass it on.
3730	*/
3731	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3732	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3733	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3734	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3735	VERR_INVALID_PARAMETER);
3736
3737	return GMMR0BalloonedPages(pVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3738	}
3739
3740	/**
3741	* Return memory statistics for the hypervisor
3742	*
3743	* @returns VBox status code:
3744	* @param pVM Pointer to the VM.
3745	* @param pReq Pointer to the request packet.
3746	*/
3747	GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PVM pVM, PGMMMEMSTATSREQ pReq)
3748	{
3749	/*
3750	* Validate input and pass it on.
3751	*/
3752	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3753	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3754	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3755	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3756	VERR_INVALID_PARAMETER);
3757
3758	/*
3759	* Validate input and get the basics.
3760	*/
3761	PGMM pGMM;
3762	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3763	pReq->cAllocPages = pGMM->cAllocatedPages;
3764	pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3765	pReq->cBalloonedPages = pGMM->cBalloonedPages;
3766	pReq->cMaxPages = pGMM->cMaxPages;
3767	pReq->cSharedPages = pGMM->cDuplicatePages;
3768	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3769
3770	return VINF_SUCCESS;
3771	}
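The free-page figure above is derived rather than tracked: the number of chunks is scaled to pages and the allocated count is subtracted. A minimal sketch of that arithmetic follows, assuming 2 MB chunks and 4 KB pages; `kChunkShift`, `kPageShift` and `freePages` are hypothetical names for this illustration and are not part of GMM.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only: 2 MB chunks and 4 KB pages.
constexpr unsigned kChunkShift = 21;  // hypothetical stand-in for GMM_CHUNK_SHIFT
constexpr unsigned kPageShift  = 12;  // hypothetical stand-in for PAGE_SHIFT

// Pages backed by all chunks, minus the pages currently handed out.
constexpr uint64_t freePages(uint64_t cChunks, uint64_t cAllocatedPages)
{
    return (cChunks << (kChunkShift - kPageShift)) - cAllocatedPages;
}
```

With these assumed sizes each chunk holds 512 pages, so three chunks with 1000 pages allocated leave 536 free.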
3772
3773	/**
3774	* Return memory statistics for the VM
3775	*
3776	* @returns VBox status code:
3777	* @param pVM Pointer to the VM.
3778	* @param idCpu Cpu id.
3779	* @param pReq Pointer to the request packet.
3780	*/
3781	GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PVM pVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3782	{
3783	/*
3784	* Validate input and pass it on.
3785	*/
3786	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3787	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3788	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3789	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3790	VERR_INVALID_PARAMETER);
3791
3792	/*
3793	* Validate input and get the basics.
3794	*/
3795	PGMM pGMM;
3796	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3797	PGVM pGVM;
3798	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3799	if (RT_FAILURE(rc))
3800	return rc;
3801
3802	/*
3803	* Take the semaphore and do some more validations.
3804	*/
3805	gmmR0MutexAcquire(pGMM);
3806	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3807	{
3808	pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
3809	pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
3810	pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
3811	pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3812	}
3813	else
3814	rc = VERR_GMM_IS_NOT_SANE;
3815
3816	gmmR0MutexRelease(pGMM);
3817	LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
3818	return rc;
3819	}
3820
3821
3822	/**
3823	* Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3824	*
3825	* Don't call this in legacy allocation mode!
3826	*
3827	* @returns VBox status code.
3828	* @param pGMM Pointer to the GMM instance data.
3829	* @param pGVM Pointer to the Global VM structure.
3830	* @param pChunk Pointer to the chunk to be unmapped.
3831	*/
3832	static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3833	{
3834	Assert(!pGMM->fLegacyAllocationMode);
3835
3836	/*
3837	* Find the mapping and try unmapping it.
3838	*/
3839	uint32_t cMappings = pChunk->cMappingsX;
3840	for (uint32_t i = 0; i < cMappings; i++)
3841	{
3842	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3843	if (pChunk->paMappingsX[i].pGVM == pGVM)
3844	{
3845	/* unmap */
3846	int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
3847	if (RT_SUCCESS(rc))
3848	{
3849	/* update the record. */
3850	cMappings--;
3851	if (i < cMappings)
3852	pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
3853	pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
3854	pChunk->paMappingsX[cMappings].pGVM = NULL;
3855	Assert(pChunk->cMappingsX - 1U == cMappings);
3856	pChunk->cMappingsX = cMappings;
3857	}
3858
3859	return rc;
3860	}
3861	}
3862
3863	Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3864	return VERR_GMM_CHUNK_NOT_MAPPED;
3865	}
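The record update in gmmR0UnmapChunkLocked keeps the mapping array dense by moving the last entry into the freed slot instead of shifting the tail. A minimal sketch of that idea follows, using `std::vector` rather than the raw array in the source; `Mapping` and `removeMapping` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified mapping record (owner stands in for pGVM).
struct Mapping
{
    int owner;
    int handle;
};

// Removes the entry belonging to `owner` by overwriting it with the last
// entry and shrinking the array. Order is not preserved. Returns false if
// no entry matches.
bool removeMapping(std::vector<Mapping> &mappings, int owner)
{
    for (std::size_t i = 0; i < mappings.size(); i++)
    {
        if (mappings[i].owner == owner)
        {
            mappings[i] = mappings.back(); // harmless self-assignment if i is last
            mappings.pop_back();
            return true;
        }
    }
    return false;
}
```

The removal costs O(1) once the entry is found, which suits an array that is expected to hold very few entries and whose order carries no meaning.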
3866
3867
3868	/**
3869	* Unmaps a chunk previously mapped into the address space of the current process.
3870	*
3871	* @returns VBox status code.
3872	* @param pGMM Pointer to the GMM instance data.
3873	* @param pGVM Pointer to the Global VM structure.
3874	* @param pChunk Pointer to the chunk to be unmapped.
3875	*/
3876	static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3877	{
3878	if (!pGMM->fLegacyAllocationMode)
3879	{
3880	/*
3881	* Lock the chunk and if possible leave the giant GMM lock.
3882	*/
3883	GMMR0CHUNKMTXSTATE MtxState;
3884	int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3885	fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3886	if (RT_SUCCESS(rc))
3887	{
3888	rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3889	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3890	}
3891	return rc;
3892	}
3893
3894	if (pChunk->hGVM == pGVM->hSelf)
3895	return VINF_SUCCESS;
3896
3897	Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3898	return VERR_GMM_CHUNK_NOT_MAPPED;
3899	}
3900
3901
3902	/**
3903	* Worker for gmmR0MapChunk.
3904	*
3905	* @returns VBox status code.
3906	* @param pGMM Pointer to the GMM instance data.
3907	* @param pGVM Pointer to the Global VM structure.
3908	* @param pChunk Pointer to the chunk to be mapped.
3909	* @param ppvR3 Where to store the ring-3 address of the mapping.
3910	* In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3911	* contain the address of the existing mapping.
3912	*/
3913	static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3914	{
3915	/*
3916	* If we're in legacy mode this is simple.
3917	*/
3918	if (pGMM->fLegacyAllocationMode)
3919	{
3920	if (pChunk->hGVM != pGVM->hSelf)
3921	{
3922	Log(("gmmR0MapChunk: chunk %#x is not owned by pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3923	return VERR_GMM_CHUNK_NOT_FOUND;
3924	}
3925
3926	*ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
3927	return VINF_SUCCESS;
3928	}
3929
3930	/*
3931	* Check to see if the chunk is already mapped.
3932	*/
3933	for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
3934	{
3935	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3936	if (pChunk->paMappingsX[i].pGVM == pGVM)
3937	{
3938	*ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
3939	Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3940	#ifdef VBOX_WITH_PAGE_SHARING
3941	/* The ring-3 chunk cache can be out of sync; don't fail. */
3942	return VINF_SUCCESS;
3943	#else
3944	return VERR_GMM_CHUNK_ALREADY_MAPPED;
3945	#endif
3946	}
3947	}
3948
3949	/*
3950	* Do the mapping.
3951	*/
3952	RTR0MEMOBJ hMapObj;
3953	int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
3954	if (RT_SUCCESS(rc))
3955	{
3956	/* reallocate the array? assumes few users per chunk (usually one). */
3957	unsigned iMapping = pChunk->cMappingsX;
3958	if ( iMapping <= 3
3959	|| (iMapping & 3) == 0)
3960	{
3961	unsigned cNewSize = iMapping <= 3
3962	? iMapping + 1
3963	: iMapping + 4;
3964	Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
3965	if (RT_UNLIKELY(cNewSize > UINT16_MAX))
3966	{
3967	rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3968	return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
3969	}
3970
3971	void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
3972	if (RT_UNLIKELY(!pvMappings))
3973	{
3974	rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3975	return VERR_NO_MEMORY;
3976	}
3977	pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
3978	}
3979
3980	/* insert new entry */
3981	pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
3982	pChunk->paMappingsX[iMapping].pGVM = pGVM;
3983	Assert(pChunk->cMappingsX == iMapping);
3984	pChunk->cMappingsX = iMapping + 1;
3985
3986	*ppvR3 = RTR0MemObjAddressR3(hMapObj);
3987	}
3988
3989	return rc;
3990	}
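The reallocation test in gmmR0MapChunkLocked grows the mapping array one entry at a time for the first four mappings and in steps of four after that. A minimal sketch of that policy follows; `newCapacityFor` is a hypothetical helper written for this illustration and does not exist in GMM.

```cpp
#include <cassert>

// Returns the capacity the array must be grown to before storing the entry
// with zero-based index `count`, or 0 if the current capacity already has
// room for it.
constexpr unsigned newCapacityFor(unsigned count)
{
    if (count <= 3)
        return count + 1;   // exact fit for the first four entries
    if ((count & 3) == 0)
        return count + 4;   // afterwards, grow in blocks of four
    return 0;               // still inside the current block
}
```

Growing by exactly one entry at first keeps the common single-user case as small as possible, while the block-of-four steps bound the number of reallocations for heavily shared chunks.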
3991
3992
3993	/**
3994	* Maps a chunk into the user address space of the current process.
3995	*
3996	* @returns VBox status code.
3997	* @param pGMM Pointer to the GMM instance data.
3998	* @param pGVM Pointer to the Global VM structure.
3999	* @param pChunk Pointer to the chunk to be mapped.
4000	* @param fRelaxedSem Whether we can release the semaphore while doing the
4001	* mapping (@c true) or not.
4002	* @param ppvR3 Where to store the ring-3 address of the mapping.
4003	* In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4004	* contain the address of the existing mapping.
4005	*/
4006	static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4007	{
4008	/*
4009	* Take the chunk lock and leave the giant GMM lock when possible, then
4010	* call the worker function.
4011	*/
4012	GMMR0CHUNKMTXSTATE MtxState;
4013	int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4014	fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4015	if (RT_SUCCESS(rc))
4016	{
4017	rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4018	gmmR0ChunkMutexRelease(&MtxState, pChunk);
4019	}
4020
4021	return rc;
4022	}
4023
4024
4025
4026	#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4027	/**
4028	* Check if a chunk is mapped into the specified VM
4029	*
4030	* @returns mapped yes/no
4031	* @param pGMM Pointer to the GMM instance.
4032	* @param pGVM Pointer to the Global VM structure.
4033	* @param pChunk Pointer to the chunk to be mapped.
4034	* @param ppvR3 Where to store the ring-3 address of the mapping.
4035	*/
4036	static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4037	{
4038	GMMR0CHUNKMTXSTATE MtxState;
4039	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4040	for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4041	{
4042	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4043	if (pChunk->paMappingsX[i].pGVM == pGVM)
4044	{
4045	*ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4046	gmmR0ChunkMutexRelease(&MtxState, pChunk);
4047	return true;
4048	}
4049	}
4050	*ppvR3 = NULL;
4051	gmmR0ChunkMutexRelease(&MtxState, pChunk);
4052	return false;
4053	}
4054	#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4055
4056
4057	/**
4058	* Map a chunk and/or unmap another chunk.
4059	*
4060	* The mapping and unmapping applies to the current process.
4061	*
4062	* This API does two things because it saves a kernel call per mapping
4063	* when the ring-3 mapping cache is full.
4064	*
4065	* @returns VBox status code.
4066	* @param pVM The VM.
4067	* @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4068	* @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4069	* @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4070	* @thread EMT
4071	*/
4072	GMMR0DECL(int) GMMR0MapUnmapChunk(PVM pVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4073	{
4074	LogFlow(("GMMR0MapUnmapChunk: pVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4075	pVM, idChunkMap, idChunkUnmap, ppvR3));
4076
4077	/*
4078	* Validate input and get the basics.
4079	*/
4080	PGMM pGMM;
4081	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4082	PGVM pGVM;
4083	int rc = GVMMR0ByVM(pVM, &pGVM);
4084	if (RT_FAILURE(rc))
4085	return rc;
4086
4087	AssertCompile(NIL_GMM_CHUNKID == 0);
4088	AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4089	AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4090
4091	if ( idChunkMap == NIL_GMM_CHUNKID
4092	&& idChunkUnmap == NIL_GMM_CHUNKID)
4093	return VERR_INVALID_PARAMETER;
4094
4095	if (idChunkMap != NIL_GMM_CHUNKID)
4096	{
4097	AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4098	*ppvR3 = NIL_RTR3PTR;
4099	}
4100
4101	/*
4102	* Take the semaphore and do the work.
4103	*
4104	* The unmapping is done last since it's easier to undo a mapping than
4105	* to undo an unmapping. The ring-3 mapping cache cannot be so big
4106	* that it pushes the user virtual address space to within a chunk of
4107	* its limits, so, no problem here.
4108	*/
4109	gmmR0MutexAcquire(pGMM);
4110	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4111	{
4112	PGMMCHUNK pMap = NULL;
4113	if (idChunkMap != NIL_GMM_CHUNKID)
4114	{
4115	pMap = gmmR0GetChunk(pGMM, idChunkMap);
4116	if (RT_LIKELY(pMap))
4117	rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4118	else
4119	{
4120	Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4121	rc = VERR_GMM_CHUNK_NOT_FOUND;
4122	}
4123	}
4124	/** @todo split this operation, the bail out might (theoretically) not be
4125	* entirely safe. */
4126
4127	if ( idChunkUnmap != NIL_GMM_CHUNKID
4128	&& RT_SUCCESS(rc))
4129	{
4130	PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4131	if (RT_LIKELY(pUnmap))
4132	rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4133	else
4134	{
4135	Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4136	rc = VERR_GMM_CHUNK_NOT_FOUND;
4137	}
4138
4139	if (RT_FAILURE(rc) && pMap)
4140	gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4141	}
4142
4143	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4144	}
4145	else
4146	rc = VERR_GMM_IS_NOT_SANE;
4147	gmmR0MutexRelease(pGMM);
4148
4149	LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4150	return rc;
4151	}
4152
4153
4154	/**
4155	* VMMR0 request wrapper for GMMR0MapUnmapChunk.
4156	*
4157	* @returns see GMMR0MapUnmapChunk.
4158	* @param pVM Pointer to the VM.
4159	* @param pReq Pointer to the request packet.
4160	*/
4161	GMMR0DECL(int) GMMR0MapUnmapChunkReq(PVM pVM, PGMMMAPUNMAPCHUNKREQ pReq)
4162	{
4163	/*
4164	* Validate input and pass it on.
4165	*/
4166	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4167	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4168	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4169
4170	return GMMR0MapUnmapChunk(pVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4171	}
4172
4173
4174	/**
4175	* Legacy mode API for supplying pages.
4176	*
4177	* The specified user address points to an allocation chunk sized block that
4178	* will be locked down and used by the GMM when the VM asks for pages.
4179	*
4180	* @returns VBox status code.
4181	* @param pVM Pointer to the VM.
4182	* @param idCpu The VCPU id.
4183	* @param pvR3 Pointer to the chunk size memory block to lock down.
4184	*/
4185	GMMR0DECL(int) GMMR0SeedChunk(PVM pVM, VMCPUID idCpu, RTR3PTR pvR3)
4186	{
4187	/*
4188	* Validate input and get the basics.
4189	*/
4190	PGMM pGMM;
4191	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4192	PGVM pGVM;
4193	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4194	if (RT_FAILURE(rc))
4195	return rc;
4196
4197	AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4198	AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4199
4200	if (!pGMM->fLegacyAllocationMode)
4201	{
4202	Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4203	return VERR_NOT_SUPPORTED;
4204	}
4205
4206	/*
4207	* Lock the memory and add it as new chunk with our hGVM.
4208	* (The GMM locking is done inside gmmR0RegisterChunk.)
4209	*/
4210	RTR0MEMOBJ MemObj;
4211	rc = RTR0MemObjLockUser(&MemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4212	if (RT_SUCCESS(rc))
4213	{
4214	rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, MemObj, pGVM->hSelf, 0 /*fChunkFlags*/, NULL);
4215	if (RT_SUCCESS(rc))
4216	gmmR0MutexRelease(pGMM);
4217	else
4218	RTR0MemObjFree(MemObj, false /* fFreeMappings */);
4219	}
4220
4221	LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4222	return rc;
4223	}
4224
4225	#ifdef VBOX_WITH_PAGE_SHARING
4226
4227	# ifdef VBOX_STRICT
4228	/**
4229	* For checksumming shared pages in strict builds.
4230	*
4231	* The purpose is making sure that a page doesn't change.
4232	*
4233	* @returns Checksum, 0 on failure.
4234	* @param pGMM The GMM instance data.
4235	* @param idPage The page ID.
4236	*/
4237	static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4238	{
4239	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4240	AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4241
4242	uint8_t *pbChunk;
4243	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4244	return 0;
4245	uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4246
4247	return RTCrc32(pbPage, PAGE_SIZE);
4248	}
4249	# endif /* VBOX_STRICT */
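The checksum helper above splits a page ID into a chunk ID (high bits) and a page index within the chunk (low bits), then turns the index into a byte offset. A minimal sketch of that decomposition follows, assuming 512 pages per chunk and 4 KB pages; the constants, `PageLocation` and `locatePage` are hypothetical stand-ins written for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout for illustration only: 512 pages per chunk, 4 KB pages.
constexpr unsigned kChunkIdShift = 9;                           // stand-in for GMM_CHUNKID_SHIFT
constexpr uint32_t kPageIdxMask  = (1u << kChunkIdShift) - 1;   // stand-in for GMM_PAGEID_IDX_MASK
constexpr unsigned kPageShift    = 12;                          // stand-in for PAGE_SHIFT

struct PageLocation
{
    uint32_t idChunk;     // which chunk the page lives in
    uint32_t iPage;       // index of the page within that chunk
    uint32_t offInChunk;  // byte offset of the page from the chunk start
};

inline PageLocation locatePage(uint32_t idPage)
{
    PageLocation loc;
    loc.idChunk    = idPage >> kChunkIdShift;
    loc.iPage      = idPage & kPageIdxMask;
    loc.offInChunk = loc.iPage << kPageShift;
    return loc;
}
```

Under these assumptions, page ID `(5 << 9) | 3` resolves to chunk 5, page 3, at byte offset 12288 from the start of the chunk mapping.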
4250
4251
4252	/**
4253	* Calculates the module hash value.
4254	*
4255	* @returns Hash value.
4256	* @param pszModuleName The module name.
4257	* @param pszVersion The module version string.
4258	*/
4259	static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4260	{
4261	return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4262	}
4263
4264
4265	/**
4266	* Finds a global module.
4267	*
4268	* @returns Pointer to the global module on success, NULL if not found.
4269	* @param pGMM The GMM instance data.
4270	* @param uHash The hash as calculated by gmmR0ShModCalcHash.
4271	* @param cbModule The module size.
4272	* @param enmGuestOS The guest OS type.
4273	* @param pszModuleName The module name.
4274	* @param pszVersion The module version.
4275	*/
4276	static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4277	uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4278	struct VMMDEVSHAREDREGIONDESC const *paRegions)
4279	{
4280	for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4281	pGblMod;
4282	pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4283	{
4284	if (pGblMod->cbModule != cbModule)
4285	continue;
4286	if (pGblMod->enmGuestOS != enmGuestOS)
4287	continue;
4288	if (pGblMod->cRegions != cRegions)
4289	continue;
4290	if (strcmp(pGblMod->szName, pszModuleName))
4291	continue;
4292	if (strcmp(pGblMod->szVersion, pszVersion))
4293	continue;
4294
4295	uint32_t i;
4296	for (i = 0; i < cRegions; i++)
4297	{
4298	uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4299	if (pGblMod->aRegions[i].off != off)
4300	break;
4301
4302	uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4303	if (pGblMod->aRegions[i].cb != cb)
4304	break;
4305	}
4306
4307	if (i == cRegions)
4308	return pGblMod;
4309	}
4310
4311	return NULL;
4312	}
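The region comparison in gmmR0ShModFindGlobal does not look at guest addresses directly. It reduces each region to its offset within the first page and a size rounded up to whole pages, so the same module loaded at different page-aligned bases still matches. A minimal sketch of that normalisation follows, assuming 4 KB pages; `RegionKey` and `normalizeRegion` are hypothetical names written for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page geometry for illustration only.
constexpr uint32_t kPageSize       = 0x1000;
constexpr uint32_t kPageOffsetMask = kPageSize - 1;

struct RegionKey
{
    uint32_t off;  // offset of the region start within its first page
    uint32_t cb;   // size from the start of that page, rounded up to whole pages
};

inline RegionKey normalizeRegion(uint64_t GCRegionAddr, uint32_t cbRegion)
{
    RegionKey key;
    key.off = (uint32_t)(GCRegionAddr & kPageOffsetMask);
    key.cb  = (cbRegion + key.off + kPageSize - 1) & ~(kPageSize - 1);
    return key;
}
```

A region of 0x1000 bytes starting at offset 0x234 into a page spans two pages, so its normalised size is 0x2000 regardless of the base address.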
4313
4314
4315	/**
4316	* Creates a new global module.
4317	*
4318	* @returns VBox status code.
4319	* @param pGMM The GMM instance data.
4320	* @param uHash The hash as calculated by gmmR0ShModCalcHash.
4321	* @param cbModule The module size.
4322	* @param enmGuestOS The guest OS type.
4323	* @param cRegions The number of regions.
4324	* @param pszModuleName The module name.
4325	* @param pszVersion The module version.
4326	* @param paRegions The region descriptions.
4327	* @param ppGblMod Where to return the new module on success.
4328	*/
4329	static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4330	uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4331	struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4332	{
4333	Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4334	if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4335	{
4336	Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4337	return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4338	}
4339
4340	PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULE, aRegions[cRegions]));
4341	if (!pGblMod)
4342	{
4343	Log(("gmmR0ShModNewGlobal: No memory\n"));
4344	return VERR_NO_MEMORY;
4345	}
4346
4347	pGblMod->Core.Key = uHash;
4348	pGblMod->cbModule = cbModule;
4349	pGblMod->cRegions = cRegions;
4350	pGblMod->cUsers = 1;
4351	pGblMod->enmGuestOS = enmGuestOS;
4352	strcpy(pGblMod->szName, pszModuleName);
4353	strcpy(pGblMod->szVersion, pszVersion);
4354
4355	for (uint32_t i = 0; i < cRegions; i++)
4356	{
4357	Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4358	pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4359	pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4360	pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4361	pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4362	}
4363
4364	bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4365	Assert(fInsert); NOREF(fInsert);
4366	pGMM->cShareableModules++;
4367
4368	*ppGblMod = pGblMod;
4369	return VINF_SUCCESS;
4370	}
4371
4372
4373	/**
4374	* Deletes a global module which is no longer referenced by anyone.
4375	*
4376	* @param pGMM The GMM instance data.
4377	* @param pGblMod The module to delete.
4378	*/
4379	static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4380	{
4381	Assert(pGblMod->cUsers == 0);
4382	Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4383
4384	void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4385	Assert(pvTest == pGblMod); NOREF(pvTest);
4386	pGMM->cShareableModules--;
4387
4388	uint32_t i = pGblMod->cRegions;
4389	while (i-- > 0)
4390	{
4391	if (pGblMod->aRegions[i].paidPages)
4392	{
4393	/* We don't do anything to the pages as they are handled by the
4394	copy-on-write mechanism in PGM. */
4395	RTMemFree(pGblMod->aRegions[i].paidPages);
4396	pGblMod->aRegions[i].paidPages = NULL;
4397	}
4398	}
4399	RTMemFree(pGblMod);
4400	}
4401
4402
4403	static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4404	PGMMSHAREDMODULEPERVM *ppRecVM)
4405	{
4406	if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4407	return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4408
4409	PGMMSHAREDMODULEPERVM pRecVM;
4410	pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4411	if (!pRecVM)
4412	return VERR_NO_MEMORY;
4413
4414	pRecVM->Core.Key = GCBaseAddr;
4415	for (uint32_t i = 0; i < cRegions; i++)
4416	pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4417
4418	bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4419	Assert(fInsert); NOREF(fInsert);
4420	pGVM->gmm.s.Stats.cShareableModules++;
4421
4422	*ppRecVM = pRecVM;
4423	return VINF_SUCCESS;
4424	}
4425
4426
4427	static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4428	{
4429	/*
4430	* Free the per-VM module.
4431	*/
4432	PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4433	pRecVM->pGlobalModule = NULL;
4434
4435	if (fRemove)
4436	{
4437	void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4438	Assert(pvTest == &pRecVM->Core);
4439	}
4440
4441	RTMemFree(pRecVM);
4442
4443	/*
4444	* Release the global module.
4445	* (In the registration bailout case, it might not be.)
4446	*/
4447	if (pGblMod)
4448	{
4449	Assert(pGblMod->cUsers > 0);
4450	pGblMod->cUsers--;
4451	if (pGblMod->cUsers == 0)
4452	gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4453	}
4454	}
4455
4456	#endif /* VBOX_WITH_PAGE_SHARING */
4457
4458	/**
4459	* Registers a new shared module for the VM.
4460	*
4461	* @returns VBox status code.
4462	* @param pVM Pointer to the VM.
4463	* @param idCpu The VCPU id.
4464	* @param enmGuestOS The guest OS type.
4465	* @param pszModuleName The module name.
4466	* @param pszVersion The module version.
4467	* @param GCPtrModBase The module base address.
4468	* @param cbModule The module size.
4469	* @param cRegions The number of shared region descriptors.
4470	* @param paRegions Pointer to an array of shared region(s).
4471	*/
4472	GMMR0DECL(int) GMMR0RegisterSharedModule(PVM pVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4473	char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4474	uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4475	{
4476	#ifdef VBOX_WITH_PAGE_SHARING
4477	/*
4478	* Validate input and get the basics.
4479	*
4480	* Note! Turns out the module size does not necessarily match the size of
4481	* the regions. (iTunes on XP)
4482	*/
4483	PGMM pGMM;
4484	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4485	PGVM pGVM;
4486	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4487	if (RT_FAILURE(rc))
4488	return rc;
4489
4490	if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4491	return VERR_GMM_TOO_MANY_REGIONS;
4492
4493	if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4494	return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4495
4496	uint32_t cbTotal = 0;
4497	for (uint32_t i = 0; i < cRegions; i++)
4498	{
4499	if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4500	return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4501
4502	cbTotal += paRegions[i].cbRegion;
4503	if (RT_UNLIKELY(cbTotal > _1G))
4504	return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4505	}
4506
4507	AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4508	if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4509	return VERR_GMM_MODULE_NAME_TOO_LONG;
4510
4511	AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4512	if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4513	return VERR_GMM_MODULE_NAME_TOO_LONG;
4514
4515	uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4516	Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4517
4518	/*
4519	* Take the semaphore and do some more validations.
4520	*/
4521	gmmR0MutexAcquire(pGMM);
4522	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4523	{
4524	/*
4525	* Check if this module is already locally registered and register
4526	* it if it isn't. The base address is a unique module identifier
4527	* locally.
4528	*/
4529	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4530	bool fNewModule = pRecVM == NULL;
4531	if (fNewModule)
4532	{
4533	rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4534	if (RT_SUCCESS(rc))
4535	{
4536	/*
4537	* Find a matching global module, register a new one if needed.
4538	*/
4539	PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4540	pszModuleName, pszVersion, paRegions);
4541	if (!pGblMod)
4542	{
4543	Assert(fNewModule);
4544	rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4545	pszModuleName, pszVersion, paRegions, &pGblMod);
4546	if (RT_SUCCESS(rc))
4547	{
4548	pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4549	Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4550	}
4551	else
4552	gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4553	}
4554	else
4555	{
4556	Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4557	pGblMod->cUsers++;
4558	pRecVM->pGlobalModule = pGblMod;
4559
4560	Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4561	}
4562	}
4563	}
4564	else
4565	{
4566	/*
4567	* Attempt to re-register an existing module.
4568	*/
4569	PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4570	pszModuleName, pszVersion, paRegions);
4571	if (pRecVM->pGlobalModule == pGblMod)
4572	{
4573	Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4574	rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4575	}
4576	else
4577	{
4578	/** @todo may have to unregister+register when this happens in case it's caused
4579	* by VBoxService crashing and being restarted... */
4580	Log(("GMMR0RegisterSharedModule: Address clash!\n"
4581	" incoming at %RGvLB%#x %s %s rgns %u\n"
4582	" existing at %RGvLB%#x %s %s rgns %u\n",
4583	GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4584	pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4585	pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4586	rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4587	}
4588	}
4589	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4590	}
4591	else
4592	rc = VERR_GMM_IS_NOT_SANE;
4593
4594	gmmR0MutexRelease(pGMM);
4595	return rc;
4596	#else
4597
4598	NOREF(pVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4599	NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4600	return VERR_NOT_IMPLEMENTED;
4601	#endif
4602	}
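The name and version checks in GMMR0RegisterSharedModule use `memchr` to confirm that a NUL terminator exists within the fixed buffer size before the strings are hashed or copied. A minimal sketch of that bounded check follows; `isTerminatedWithin` is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if a NUL byte occurs within the first cbMax bytes of psz,
// i.e. the string (including its terminator) fits in a cbMax-byte buffer.
// Never reads beyond psz[cbMax - 1].
inline bool isTerminatedWithin(const char *psz, std::size_t cbMax)
{
    return std::memchr(psz, '\0', cbMax) != nullptr;
}
```

Unlike `strlen`, this never scans past the buffer, which matters when the string comes from a request packet that the caller controls.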
4603
4604
4605	/**
4606	* VMMR0 request wrapper for GMMR0RegisterSharedModule.
4607	*
4608	* @returns see GMMR0RegisterSharedModule.
4609	* @param pVM Pointer to the VM.
4610	* @param idCpu The VCPU id.
4611	* @param pReq Pointer to the request packet.
4612	*/
4613	GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4614	{
4615	/*
4616	* Validate input and pass it on.
4617	*/
4618	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4619	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4620	AssertMsgReturn(pReq->Hdr.cbReq >= sizeof(*pReq) && pReq->Hdr.cbReq == RT_UOFFSETOF(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4621
4622	/* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4623	pReq->rc = GMMR0RegisterSharedModule(pVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4624	pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4625	return VINF_SUCCESS;
4626	}
4627
4628
4629	/**
4630	* Unregisters a shared module for the VM
4631	*
4632	* @returns VBox status code.
4633	* @param pVM Pointer to the VM.
4634	* @param idCpu The VCPU id.
4635	* @param pszModuleName The module name.
4636	* @param pszVersion The module version.
4637	* @param GCPtrModBase The module base address.
4638	* @param cbModule The module size.
4639	*/
4640	GMMR0DECL(int) GMMR0UnregisterSharedModule(PVM pVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4641	RTGCPTR GCPtrModBase, uint32_t cbModule)
4642	{
4643	#ifdef VBOX_WITH_PAGE_SHARING
4644	/*
4645	* Validate input and get the basics.
4646	*/
4647	PGMM pGMM;
4648	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4649	PGVM pGVM;
4650	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4651	if (RT_FAILURE(rc))
4652	return rc;
4653
4654	AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4655	AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4656	if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4657	return VERR_GMM_MODULE_NAME_TOO_LONG;
4658	if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4659	return VERR_GMM_MODULE_NAME_TOO_LONG;
4660
4661	Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4662
4663	/*
4664	* Take the semaphore and do some more validations.
4665	*/
4666	gmmR0MutexAcquire(pGMM);
4667	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4668	{
4669	/*
4670	* Locate and remove the specified module.
4671	*/
4672	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4673	if (pRecVM)
4674	{
4675	/** @todo Do we need to do more validations here, like that the
4676	* name + version + cbModule matches? */
4677	Assert(pRecVM->pGlobalModule);
4678	gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4679	}
4680	else
4681	rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4682
4683	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4684	}
4685	else
4686	rc = VERR_GMM_IS_NOT_SANE;
4687
4688	gmmR0MutexRelease(pGMM);
4689	return rc;
4690	#else
4691
4692	NOREF(pVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4693	return VERR_NOT_IMPLEMENTED;
4694	#endif
4695	}
4696
4697
4698	/**
4699	* VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4700	*
4701	* @returns see GMMR0UnregisterSharedModule.
4702	* @param pVM Pointer to the VM.
4703	* @param idCpu The VCPU id.
4704	* @param pReq Pointer to the request packet.
4705	*/
4706	GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4707	{
4708	/*
4709	* Validate input and pass it on.
4710	*/
4711	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4712	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4713	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4714
4715	return GMMR0UnregisterSharedModule(pVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4716	}
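The request wrappers in this file validate the packet by comparing the size recorded in its header with the size of the request structure. A minimal sketch of that check, using hypothetical stand-in types (ReqHdr, ToyUnregisterReq and validateReq are illustrative names, not the real VirtualBox definitions), shows why the comparison has to use the size of the pointed-to structure and not the size of the pointer:

```cpp
#include <cstdint>

// Hypothetical stand-ins for a request header and packet; the field
// layout is assumed for illustration only.
struct ReqHdr { uint32_t cbReq; };
struct ToyUnregisterReq
{
    ReqHdr   Hdr;
    uint64_t GCBaseAddr;
    uint32_t cbModule;
    char     szName[128];
    char     szVersion[16];
};

// Accepts the packet only if the caller-declared size matches the structure.
// sizeof(*pReq) is the structure size; sizeof(pReq) would be the pointer size.
inline bool validateReq(const ToyUnregisterReq *pReq)
{
    return pReq != nullptr && pReq->Hdr.cbReq == sizeof(*pReq);
}
```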
4717
4718	#ifdef VBOX_WITH_PAGE_SHARING
4719
4720	/**
4721	* Increases the use count of a shared page. The page is known to exist and be valid.
4722	*
4723	* @param pGMM Pointer to the GMM instance.
4724	* @param pGVM Pointer to the GVM instance.
4725	* @param pPage The page structure.
4726	*/
4727	DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4728	{
4729	Assert(pGMM->cSharedPages > 0);
4730	Assert(pGMM->cAllocatedPages > 0);
4731
4732	pGMM->cDuplicatePages++;
4733
4734	pPage->Shared.cRefs++;
4735	pGVM->gmm.s.Stats.cSharedPages++;
4736	pGVM->gmm.s.Stats.Allocated.cBasePages++;
4737	}
4738
4739
4740	/**
4741	* Converts a private page to a shared page. The page is known to exist and be valid.
4742	*
4743	* @param pGMM Pointer to the GMM instance.
4744	* @param pGVM Pointer to the GVM instance.
4745	* @param HCPhys Host physical address
4746	* @param idPage The Page ID
4747	* @param pPage The page structure.
4748	*/
4749	DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4750	PGMMSHAREDPAGEDESC pPageDesc)
4751	{
4752	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4753	Assert(pChunk);
4754	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4755	Assert(GMM_PAGE_IS_PRIVATE(pPage));
4756
4757	pChunk->cPrivate--;
4758	pChunk->cShared++;
4759
4760	pGMM->cSharedPages++;
4761
4762	pGVM->gmm.s.Stats.cSharedPages++;
4763	pGVM->gmm.s.Stats.cPrivatePages--;
4764
4765	/* Modify the page structure. */
4766	pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4767	pPage->Shared.cRefs = 1;
4768	#ifdef VBOX_STRICT
4769	pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
4770	pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
4771	#else
4772	pPage->Shared.u14Checksum = 0;
4773	#endif
4774	pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
4775	}
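The conversion above leans on three pieces of bit arithmetic: splitting a page ID into a chunk ID and a page index, turning a host physical address into a page frame number, and truncating a checksum to 14 bits. The sketch below illustrates them under assumed values (4 KiB pages and 2 MiB chunks); the constants and function names are hypothetical and only mirror the roles of PAGE_SHIFT, GMM_CHUNKID_SHIFT and GMM_PAGEID_IDX_MASK.

```cpp
#include <cstdint>

// Assumed values for illustration only: 4 KiB pages, 2 MiB chunks.
constexpr unsigned kPageShift    = 12;
constexpr unsigned kChunkShift   = 21;
constexpr unsigned kChunkIdShift = kChunkShift - kPageShift;            // 9 bits of page index
constexpr uint32_t kPageIdxMask  = (UINT32_C(1) << kChunkIdShift) - 1;  // 0x1ff

// idPage = (idChunk << shift) | iPage, as described in the file header.
constexpr uint32_t makePageId(uint32_t idChunk, uint32_t iPage)
{
    return (idChunk << kChunkIdShift) | (iPage & kPageIdxMask);
}
constexpr uint32_t chunkIdOf(uint32_t idPage)   { return idPage >> kChunkIdShift; }
constexpr uint32_t pageIndexOf(uint32_t idPage) { return idPage & kPageIdxMask; }

// Byte offset of the page inside its chunk mapping.
constexpr uint64_t pageOffsetInChunk(uint32_t idPage)
{
    return uint64_t(pageIndexOf(idPage)) << kPageShift;
}

// Page frame number of a host physical address.
constexpr uint32_t pfnOf(uint64_t HCPhys) { return uint32_t(HCPhys >> kPageShift); }

// Keep only the low 14 bits of a 32-bit checksum.
constexpr uint32_t checksum14(uint32_t uCrc) { return uCrc & UINT32_C(0x3fff); }
```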
4776
4777
4778	static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
4779	unsigned idxRegion, unsigned idxPage,
4780	PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
4781	{
4782	/* Easy case: just change the internal page type. */
4783	PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
4784	AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
4785	pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
4786	VERR_PGM_PHYS_INVALID_PAGE_ID);
4787
4788	AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
4789
4790	gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
4791
4792	/* Keep track of these references. */
4793	pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
4794
4795	return VINF_SUCCESS;
4796	}
4797
4798	/**
4799	* Checks the specified shared module page for changes.
4800	*
4801	* Performs the following tasks:
4802	* - If a shared page is new, then it changes the GMM page type to shared and
4803	* returns it in the pPageDesc descriptor.
4804	* - If a shared page already exists, then it checks if the VM page is
4805	* identical and if so frees the VM page and returns the shared page in
4806	* pPageDesc descriptor.
4807	*
4808	* @remarks ASSUMES the caller has acquired the GMM semaphore!!
4809	*
4810	* @returns VBox status code.
4811	* @param pGVM Pointer to the GVM instance data.
4812	* @param pModule Module description.
4813	* @param idxRegion Region index.
4814	* @param idxPage Page index.
4815	* @param pPageDesc Page descriptor. On return it holds the shared page's ID and
4816	* host physical address, or NIL_GMM_PAGEID if the pages differ.
4817	*/
4818	GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
4819	PGMMSHAREDPAGEDESC pPageDesc)
4820	{
4821	int rc;
4822	PGMM pGMM;
4823	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4824	pPageDesc->u32StrictChecksum = 0;
4825
4826	AssertMsgReturn(idxRegion < pModule->cRegions,
4827	("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
4828	VERR_INVALID_PARAMETER);
4829
4830	uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
4831	AssertMsgReturn(idxPage < cPages,
4832	("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
4833	VERR_INVALID_PARAMETER);
4834
4835	LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
4836
4837	/*
4838	* First time; create a page descriptor array.
4839	*/
4840	PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
4841	if (!pGlobalRegion->paidPages)
4842	{
4843	Log(("Allocate page descriptor array for %d pages\n", cPages));
4844	pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
4845	AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
4846
4847	/* Invalidate all descriptors. */
4848	uint32_t i = cPages;
4849	while (i-- > 0)
4850	pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
4851	}
4852
4853	/*
4854	* Is this the first time we've seen this shared page?
4855	*/
4856	if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
4857	{
4858	Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
4859	return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4860	}
4861
4862	/*
4863	* We've seen it before...
4864	*/
4865	Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
4866	pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
4867	Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
4868
4869	/*
4870	* Get the shared page source.
4871	*/
4872	PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
4873	AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
4874	VERR_PGM_PHYS_INVALID_PAGE_ID);
4875
4876	if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
4877	{
4878	/*
4879	* Page was freed at some point; invalidate this entry.
4880	*/
4881	/** @todo this isn't really bulletproof. */
4882	Log(("Old shared page was freed -> create a new one\n"));
4883	pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
4884	return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4885	}
4886
4887	Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
4888
4889	/*
4890	* Calculate the virtual address of the local page.
4891	*/
4892	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
4893	AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
4894	VERR_PGM_PHYS_INVALID_PAGE_ID);
4895
4896	uint8_t *pbChunk;
4897	AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
4898	("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
4899	VERR_PGM_PHYS_INVALID_PAGE_ID);
4900	uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4901
4902	/*
4903	* Calculate the virtual address of the shared page.
4904	*/
4905	pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
4906	Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
4907
4908	/*
4909	* Get the virtual address of the physical page; map the chunk into the VM
4910	* process if not already done.
4911	*/
4912	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4913	{
4914	Log(("Map chunk into process!\n"));
4915	rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
4916	AssertRCReturn(rc, rc);
4917	}
4918	uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4919
4920	#ifdef VBOX_STRICT
4921	pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
4922	uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
4923	AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
4924	("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
4925	pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
4926	#endif
4927
4928	/** @todo write ASMMemComparePage. */
4929	if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
4930	{
4931	Log(("Unexpected differences found between local and shared page; skip\n"));
4932	/* Signal to the caller that this one hasn't changed. */
4933	pPageDesc->idPage = NIL_GMM_PAGEID;
4934	return VINF_SUCCESS;
4935	}
4936
4937	/*
4938	* Free the old local page.
4939	*/
4940	GMMFREEPAGEDESC PageDesc;
4941	PageDesc.idPage = pPageDesc->idPage;
4942	rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
4943	AssertRCReturn(rc, rc);
4944
4945	gmmR0UseSharedPage(pGMM, pGVM, pPage);
4946
4947	/*
4948	* Pass along the new physical address & page id.
4949	*/
4950	pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
4951	pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
4952
4953	return VINF_SUCCESS;
4954	}
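The decision flow of GMMR0SharedModuleCheckPage can be summarised with a toy model. The sketch below is not the GMM implementation: the types, the page store and the kNilPageId marker are hypothetical, and the real code additionally frees the local page, remaps chunks and re-registers a slot whose shared page was freed. It only shows the three outcomes: the first sighting becomes the shared page, an identical page is replaced by the shared one, and a differing page is reported back with a nil ID.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t    kNilPageId = 0;     // hypothetical "no page" marker
constexpr std::size_t kPageSize  = 4096;  // assumed page size
using Page = std::array<uint8_t, kPageSize>;

// One entry of a region's page ID array, plus a reference count.
struct SharedSlot { uint32_t idPage = kNilPageId; uint32_t cRefs = 0; };

enum class CheckResult { BecameShared, Deduplicated, Differs };

// pages is a toy page store indexed by page ID (index 0 is unused).
// idLocal is in/out: on return it names the page the caller should use,
// or kNilPageId if the local page differs from the shared one.
CheckResult checkPage(const std::vector<Page> &pages, SharedSlot &slot, uint32_t &idLocal)
{
    if (slot.idPage == kNilPageId)
    {
        // First sighting: the local page itself becomes the shared page.
        slot.idPage = idLocal;
        slot.cRefs  = 1;
        return CheckResult::BecameShared;
    }
    if (std::memcmp(pages[slot.idPage].data(), pages[idLocal].data(), kPageSize) != 0)
    {
        // Contents differ: signal "nothing changed" with a nil ID.
        idLocal = kNilPageId;
        return CheckResult::Differs;
    }
    // Identical: take another reference and hand back the shared page ID.
    slot.cRefs++;
    idLocal = slot.idPage;
    return CheckResult::Deduplicated;
}
```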
4955
4956
4957	/**
4958	* RTAvlGCPtrDestroy callback.
4959	*
4960	* @returns VINF_SUCCESS (always).
4961	* @param pNode The node to destroy.
4962	* @param pvArgs Pointer to an argument packet.
4963	*/
4964	static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
4965	{
4966	gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
4967	((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
4968	(PGMMSHAREDMODULEPERVM)pNode,
4969	false /*fRemove*/);
4970	return VINF_SUCCESS;
4971	}
4972
4973
4974	/**
4975	* Used by GMMR0CleanupVM to clean up shared modules.
4976	*
4977	* This is called without taking the GMM lock so that it can be yielded as
4978	* needed here.
4979	*
4980	* @param pGMM The GMM handle.
4981	* @param pGVM The global VM handle.
4982	*/
4983	static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
4984	{
4985	gmmR0MutexAcquire(pGMM);
4986	GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
4987
4988	GMMR0SHMODPERVMDTORARGS Args;
4989	Args.pGVM = pGVM;
4990	Args.pGMM = pGMM;
4991	RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
4992
4993	AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
4994	pGVM->gmm.s.Stats.cShareableModules = 0;
4995
4996	gmmR0MutexRelease(pGMM);
4997	}
4998
4999	#endif /* VBOX_WITH_PAGE_SHARING */
5000
5001	/**
5002	* Removes all shared modules for the specified VM
5003	*
5004	* @returns VBox status code.
5005	* @param pVM Pointer to the VM.
5006	* @param idCpu The VCPU id.
5007	*/
5008	GMMR0DECL(int) GMMR0ResetSharedModules(PVM pVM, VMCPUID idCpu)
5009	{
5010	#ifdef VBOX_WITH_PAGE_SHARING
5011	/*
5012	* Validate input and get the basics.
5013	*/
5014	PGMM pGMM;
5015	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5016	PGVM pGVM;
5017	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
5018	if (RT_FAILURE(rc))
5019	return rc;
5020
5021	/*
5022	* Take the semaphore and do some more validations.
5023	*/
5024	gmmR0MutexAcquire(pGMM);
5025	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5026	{
5027	Log(("GMMR0ResetSharedModules\n"));
5028	GMMR0SHMODPERVMDTORARGS Args;
5029	Args.pGVM = pGVM;
5030	Args.pGMM = pGMM;
5031	RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5032	pGVM->gmm.s.Stats.cShareableModules = 0;
5033
5034	rc = VINF_SUCCESS;
5035	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5036	}
5037	else
5038	rc = VERR_GMM_IS_NOT_SANE;
5039
5040	gmmR0MutexRelease(pGMM);
5041	return rc;
5042	#else
5043	NOREF(pVM); NOREF(idCpu);
5044	return VERR_NOT_IMPLEMENTED;
5045	#endif
5046	}
5047
5048	#ifdef VBOX_WITH_PAGE_SHARING
5049
5050	/**
5051	* Tree enumeration callback for checking a shared module.
5052	*/
5053	static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5054	{
5055	GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO *)pvUser;
5056	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5057	PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5058
5059	Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5060	pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5061
5062	int rc = PGMR0SharedModuleCheck(pArgs->pGVM->pVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5063	if (RT_FAILURE(rc))
5064	return rc;
5065	return VINF_SUCCESS;
5066	}
5067
5068	#endif /* VBOX_WITH_PAGE_SHARING */
5069	#ifdef DEBUG_sandervl
5070
5071	/**
5072	* Setup for a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
5073	*
5074	* @returns VBox status code.
5075	* @param pVM Pointer to the VM.
5076	*/
5077	GMMR0DECL(int) GMMR0CheckSharedModulesStart(PVM pVM)
5078	{
5079	/*
5080	* Validate input and get the basics.
5081	*/
5082	PGMM pGMM;
5083	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5084
5085	/*
5086	* Take the semaphore and do some more validations.
5087	*/
5088	gmmR0MutexAcquire(pGMM);
5089	int rc;
5090	if (!GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5091	rc = VERR_GMM_IS_NOT_SANE;
5092	else
5093	rc = VINF_SUCCESS;
5094	return rc;
5095	}
5096
5097	/**
5098	* Clean up after a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
5099	*
5100	* @returns VBox status code.
5101	* @param pVM Pointer to the VM.
5102	*/
5103	GMMR0DECL(int) GMMR0CheckSharedModulesEnd(PVM pVM)
5104	{
5105	/*
5106	* Validate input and get the basics.
5107	*/
5108	PGMM pGMM;
5109	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5110
5111	gmmR0MutexRelease(pGMM);
5112	return VINF_SUCCESS;
5113	}
5114
5115	#endif /* DEBUG_sandervl */
5116
5117	/**
5118	* Check all shared modules for the specified VM.
5119	*
5120	* @returns VBox status code.
5121	* @param pVM Pointer to the VM.
5122	* @param pVCpu Pointer to the VMCPU.
5123	*/
5124	GMMR0DECL(int) GMMR0CheckSharedModules(PVM pVM, PVMCPU pVCpu)
5125	{
5126	#ifdef VBOX_WITH_PAGE_SHARING
5127	/*
5128	* Validate input and get the basics.
5129	*/
5130	PGMM pGMM;
5131	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5132	PGVM pGVM;
5133	int rc = GVMMR0ByVMAndEMT(pVM, pVCpu->idCpu, &pGVM);
5134	if (RT_FAILURE(rc))
5135	return rc;
5136
5137	# ifndef DEBUG_sandervl
5138	/*
5139	* Take the semaphore and do some more validations.
5140	*/
5141	gmmR0MutexAcquire(pGMM);
5142	# endif
5143	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5144	{
5145	/*
5146	* Walk the tree, checking each module.
5147	*/
5148	Log(("GMMR0CheckSharedModules\n"));
5149
5150	GMMCHECKSHAREDMODULEINFO Args;
5151	Args.pGVM = pGVM;
5152	Args.idCpu = pVCpu->idCpu;
5153	rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5154
5155	Log(("GMMR0CheckSharedModules done!\n"));
5156	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5157	}
5158	else
5159	rc = VERR_GMM_IS_NOT_SANE;
5160
5161	# ifndef DEBUG_sandervl
5162	gmmR0MutexRelease(pGMM);
5163	# endif
5164	return rc;
5165	#else
5166	NOREF(pVM); NOREF(pVCpu);
5167	return VERR_NOT_IMPLEMENTED;
5168	#endif
5169	}
5170
5171	#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5172
5173	/**
5174	* RTAvlU32DoWithAll callback.
5175	*
5176	* @returns 0 to continue the enumeration, non-zero once a duplicate has been found (stops the search).
5177	* @param pNode The node to search.
5178	* @param pvUser Pointer to the input argument packet.
5179	*/
5180	static DECLCALLBACK(int) gmmR0FindDupPageInChunk(PAVLU32NODECORE pNode, void *pvUser)
5181	{
5182	PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
5183	GMMFINDDUPPAGEINFO *pArgs = (GMMFINDDUPPAGEINFO *)pvUser;
5184	PGVM pGVM = pArgs->pGVM;
5185	PGMM pGMM = pArgs->pGMM;
5186	uint8_t *pbChunk;
5187
5188	/* Only take chunks not mapped into this VM process; not entirely correct. */
5189	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5190	{
5191	int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5192	if (RT_SUCCESS(rc))
5193	{
5194	/*
5195	* Look for duplicate pages
5196	*/
5197	unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5198	while (iPage-- > 0)
5199	{
5200	if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5201	{
5202	uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5203
5204	if (!memcmp(pArgs->pSourcePage, pbDestPage, PAGE_SIZE))
5205	{
5206	pArgs->fFoundDuplicate = true;
5207	break;
5208	}
5209	}
5210	}
5211	gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5212	}
5213	}
5214	return pArgs->fFoundDuplicate; /* (stops search if true) */
5215	}
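The callback above follows a common enumeration pattern: scan one chunk for a private page whose contents match the source page, and return non-zero to stop the tree walk as soon as a match is found. A minimal sketch of that pattern, with hypothetical types (ToyChunk, FindDupArgs) and a plain vector in place of the AVL tree, might look like this; the mapping and unmapping of chunks is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumed page size

// Hypothetical chunk: raw page bytes plus a per-page "is private" flag.
struct ToyChunk
{
    std::vector<uint8_t> bytes;      // isPrivate.size() * kPageSize bytes
    std::vector<bool>    isPrivate;
};

struct FindDupArgs
{
    const uint8_t *pSourcePage;
    bool           fFoundDuplicate;
};

// Per-chunk callback: returns non-zero to stop the enumeration.
int findDupInChunk(const ToyChunk &chunk, FindDupArgs &args)
{
    std::size_t iPage = chunk.isPrivate.size();
    while (iPage-- > 0)
    {
        if (   chunk.isPrivate[iPage]
            && std::memcmp(args.pSourcePage, &chunk.bytes[iPage * kPageSize], kPageSize) == 0)
        {
            args.fFoundDuplicate = true;
            break;
        }
    }
    return args.fFoundDuplicate ? 1 : 0;
}

// Visits chunks in order until the callback asks to stop.
// Returns the number of chunks visited.
std::size_t forEachChunk(const std::vector<ToyChunk> &chunks, FindDupArgs &args)
{
    std::size_t cVisited = 0;
    for (const ToyChunk &chunk : chunks)
    {
        ++cVisited;
        if (findDupInChunk(chunk, args) != 0)
            break;
    }
    return cVisited;
}
```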
5216
5217
5218	/**
5219	* Find a duplicate of the specified page in other active VMs
5220	*
5221	* @returns VBox status code.
5222	* @param pVM Pointer to the VM.
5223	* @param pReq Pointer to the request packet.
5224	*/
5225	GMMR0DECL(int) GMMR0FindDuplicatePageReq(PVM pVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5226	{
5227	/*
5228	* Validate input and pass it on.
5229	*/
5230	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
5231	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5232	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5233
5234	PGMM pGMM;
5235	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5236
5237	PGVM pGVM;
5238	int rc = GVMMR0ByVM(pVM, &pGVM);
5239	if (RT_FAILURE(rc))
5240	return rc;
5241
5242	/*
5243	* Take the semaphore and do some more validations.
5244	*/
5245	rc = gmmR0MutexAcquire(pGMM);
5246	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5247	{
5248	uint8_t *pbChunk;
5249	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5250	if (pChunk)
5251	{
5252	if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5253	{
5254	uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5255	PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5256	if (pPage)
5257	{
5258	GMMFINDDUPPAGEINFO Args;
5259	Args.pGVM = pGVM;
5260	Args.pGMM = pGMM;
5261	Args.pSourcePage = pbSourcePage;
5262	Args.fFoundDuplicate = false;
5263	RTAvlU32DoWithAll(&pGMM->pChunks, true /* fFromLeft */, gmmR0FindDupPageInChunk, &Args);
5264
5265	pReq->fDuplicate = Args.fFoundDuplicate;
5266	}
5267	else
5268	{
5269	AssertFailed();
5270	rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5271	}
5272	}
5273	else
5274	AssertFailed();
5275	}
5276	else
5277	AssertFailed();
5278	}
5279	else
5280	rc = VERR_GMM_IS_NOT_SANE;
5281
5282	gmmR0MutexRelease(pGMM);
5283	return rc;
5284	}
5285
5286	#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5287
5288
5289	/**
5290	* Retrieves the GMM statistics visible to the caller.
5291	*
5292	* @returns VBox status code.
5293	*
5294	* @param pStats Where to put the statistics.
5295	* @param pSession The current session.
5296	* @param pVM Pointer to the VM to obtain statistics for. Optional.
5297	*/
5298	GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PVM pVM)
5299	{
5300	LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pVM=%p\n", pStats, pSession, pVM));
5301
5302	/*
5303	* Validate input.
5304	*/
5305	AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5306	AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5307	pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5308
5309	PGMM pGMM;
5310	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5311
5312	/*
5313	* Resolve the VM handle, if not NULL, and lock the GMM.
5314	*/
5315	int rc;
5316	PGVM pGVM;
5317	if (pVM)
5318	{
5319	rc = GVMMR0ByVM(pVM, &pGVM);
5320	if (RT_FAILURE(rc))
5321	return rc;
5322	}
5323	else
5324	pGVM = NULL;
5325
5326	rc = gmmR0MutexAcquire(pGMM);
5327	if (RT_FAILURE(rc))
5328	return rc;
5329
5330	/*
5331	* Copy out the GMM statistics.
5332	*/
5333	pStats->cMaxPages = pGMM->cMaxPages;
5334	pStats->cReservedPages = pGMM->cReservedPages;
5335	pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5336	pStats->cAllocatedPages = pGMM->cAllocatedPages;
5337	pStats->cSharedPages = pGMM->cSharedPages;
5338	pStats->cDuplicatePages = pGMM->cDuplicatePages;
5339	pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5340	pStats->cBalloonedPages = pGMM->cBalloonedPages;
5341	pStats->cChunks = pGMM->cChunks;
5342	pStats->cFreedChunks = pGMM->cFreedChunks;
5343	pStats->cShareableModules = pGMM->cShareableModules;
5344	RT_ZERO(pStats->au64Reserved);
5345
5346	/*
5347	* Copy out the VM statistics.
5348	*/
5349	if (pGVM)
5350	pStats->VMStats = pGVM->gmm.s.Stats;
5351	else
5352	RT_ZERO(pStats->VMStats);
5353
5354	gmmR0MutexRelease(pGMM);
5355	return rc;
5356	}
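GMMR0QueryStatistics takes the lock, copies the counters into the caller's buffer, and zeroes the per-VM part when no VM was given, so the caller sees one consistent snapshot. A minimal sketch of that copy-out-under-lock pattern is shown below. All types are hypothetical and heavily reduced, and std::mutex stands in for the GMM mutex.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical, reduced counters; not the real GMMSTATS layout.
struct ToyGlobalStats { uint64_t cMaxPages = 0; uint64_t cAllocatedPages = 0; };
struct ToyVmStats     { uint64_t cPrivatePages = 0; uint64_t cSharedPages = 0; };
struct ToySnapshot    { ToyGlobalStats Global; ToyVmStats VM; };

class ToyManager
{
public:
    // Writers update the counters while holding the lock.
    void allocate(uint64_t cPages)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_stats.cAllocatedPages += cPages;
    }

    // Readers copy everything out while holding the same lock.
    // pVmStats is optional; without it the VM part stays zeroed.
    ToySnapshot query(const ToyVmStats *pVmStats)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        ToySnapshot snap;
        snap.Global = m_stats;
        if (pVmStats)
            snap.VM = *pVmStats;
        return snap;
    }

private:
    std::mutex     m_mutex;
    ToyGlobalStats m_stats;
};
```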
5357
5358
5359	/**
5360	* VMMR0 request wrapper for GMMR0QueryStatistics.
5361	*
5362	* @returns see GMMR0QueryStatistics.
5363	* @param pVM Pointer to the VM. Optional.
5364	* @param pReq Pointer to the request packet.
5365	*/
5366	GMMR0DECL(int) GMMR0QueryStatisticsReq(PVM pVM, PGMMQUERYSTATISTICSSREQ pReq)
5367	{
5368	/*
5369	* Validate input and pass it on.
5370	*/
5371	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5372	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5373
5374	return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pVM);
5375	}
5376
5377
5378	/**
5379	* Resets the specified GMM statistics.
5380	*
5381	* @returns VBox status code.
5382	*
5383	* @param pStats Which statistics to reset, that is, non-zero fields
5384	* indicates which to reset.
5385	* @param pSession The current session.
5386	* @param pVM The VM to reset statistics for. Optional.
5387	*/
5388	GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PVM pVM)
5389	{
5390	/* There is currently nothing we can reset. */
5391	return VINF_SUCCESS;
5392	}
5393
5394
5395	/**
5396	* VMMR0 request wrapper for GMMR0ResetStatistics.
5397	*
5398	* @returns see GMMR0ResetStatistics.
5399	* @param pVM Pointer to the VM. Optional.
5400	* @param pReq Pointer to the request packet.
5401	*/
5402	GMMR0DECL(int) GMMR0ResetStatisticsReq(PVM pVM, PGMMRESETSTATISTICSSREQ pReq)
5403	{
5404	/*
5405	* Validate input and pass it on.
5406	*/
5407	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5408	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5409
5410	return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pVM);
5411	}
5412


source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@ 46543
