VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@ 92246

Last change on this file since 92246 was 92202, checked in by vboxsync, 3 years ago

VMM/GMMR0: Tie down chunks to UID and don't cross such. bugref:10093

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 201.9 KB
/* $Id: GMMR0.cpp 92202 2021-11-03 22:46:49Z vboxsync $ */
/** @file
 * GMM - Global Memory Manager.
 */

/*
 * Copyright (C) 2007-2020 Oracle Corporation
 *
 * This file is part of VirtualBox Open Source Edition (OSE), as
 * available from http://www.virtualbox.org. This file is free software;
 * you can redistribute it and/or modify it under the terms of the GNU
 * General Public License (GPL) as published by the Free Software
 * Foundation, in version 2 as it comes in the "COPYING" file of the
 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
 */


/** @page pg_gmm GMM - The Global Memory Manager
 *
 * As the name indicates, this component is responsible for global memory
 * management. Currently only guest RAM is allocated from the GMM, but this
 * may change to include shadow page tables and other bits later.
 *
 * Guest RAM is managed as individual pages, but allocated from the host OS
 * in chunks for reasons of portability / efficiency. To minimize the memory
 * footprint all tracking structures must be as small as possible without
 * unnecessary performance penalties.
 *
 * The allocation chunks have a fixed size, defined at compile time
 * by the #GMM_CHUNK_SIZE \#define.
 *
 * Each chunk is given a unique ID. Each page also has a unique ID. The
 * relationship between the two IDs is:
 * @code
 *      GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
 *      idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
 * @endcode
 * Where iPage is the index of the page within the chunk. This ID scheme
 * permits efficient chunk and page lookup, but it relies on the chunk size
 * being set at compile time. The chunks are organized in an AVL tree with
 * their IDs being the keys.
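 *
 * For illustration only, a minimal sketch of the ID arithmetic above. The
 * ex* names are hypothetical, and the shift of 8 assumes a 1 MB chunk of
 * 4 KB pages (256 pages per chunk); the real value follows from
 * #GMM_CHUNK_SIZE and PAGE_SIZE.
 *
```c
#include <assert.h>
#include <stdint.h>

// Hypothetical: 1 MB chunks of 4 KB pages -> 256 pages -> shift of 8.
#define EX_CHUNK_SHIFT  8u
#define EX_PAGE_MASK    ((UINT32_C(1) << EX_CHUNK_SHIFT) - 1u)

// Combines a chunk ID and a page index into a page ID.
static uint32_t exMakePageId(uint32_t idChunk, uint32_t iPage)
{
    assert(iPage <= EX_PAGE_MASK);
    return (idChunk << EX_CHUNK_SHIFT) | iPage;
}

// Extracts the chunk ID (the AVL tree key) from a page ID.
static uint32_t exPageIdToChunkId(uint32_t idPage)
{
    return idPage >> EX_CHUNK_SHIFT;
}

// Extracts the index of the page within its chunk.
static uint32_t exPageIdToPageIndex(uint32_t idPage)
{
    return idPage & EX_PAGE_MASK;
}
```
 *
 * Because the page index occupies the low bits, the chunk ID (the AVL key)
 * is recovered with a single shift: with this shift, page ID 0x503 is page
 * 3 of chunk 5.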
 *
 * The physical address of each page in an allocation chunk is maintained by
 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
 * need to duplicate this information (it would cost 8 bytes per page if we
 * did).
 *
 * So what do we need to track per page? Most importantly we need to know
 * which state the page is in:
 *   - Private - Allocated for (eventually) backing one particular VM page.
 *   - Shared  - Read-only page that is used by one or more VMs and treated
 *               as COW by PGM.
 *   - Free    - Not used by anyone.
 *
 * For the page replacement operations (sharing, defragmenting and freeing)
 * to be somewhat efficient, private pages need to be associated with a
 * particular page in a particular VM.
 *
 * Tracking the usage of shared pages is impractical and expensive, so we'll
 * settle for a reference counting system instead.
 *
 * Free pages will be chained on LIFOs (one per chunk, headed by
 * GMMCHUNK::iFreeHead).
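 *
 * A minimal sketch of such a per-chunk LIFO, for illustration only. It
 * mirrors the GMMCHUNK::iFreeHead / GMMCHUNK::cFree fields and the
 * GMMPAGE::Free.iNext link (UINT16_MAX as NIL), but the EXCHUNK type and the
 * ex* functions are hypothetical simplifications, not the GMM code.
 *
```c
#include <assert.h>
#include <stdint.h>

#define EX_NIL_INDEX    UINT16_MAX  // same NIL convention as GMMPAGE::Free.iNext
#define EX_CHUNK_PAGES  256u        // hypothetical chunk size in pages

// Hypothetical, heavily simplified chunk: only the free-list bookkeeping.
typedef struct EXCHUNK
{
    uint16_t aiNext[EX_CHUNK_PAGES];    // per-page "next free page" link
    uint16_t iFreeHead;                 // most recently freed page, or NIL
    uint16_t cFree;                     // number of pages on the list
} EXCHUNK;

static void exChunkInit(EXCHUNK *pChunk)
{
    pChunk->iFreeHead = EX_NIL_INDEX;
    pChunk->cFree     = 0;
}

// Push: the freed page becomes the new head (last in).
static void exFreePage(EXCHUNK *pChunk, uint16_t iPage)
{
    assert(iPage < EX_CHUNK_PAGES);
    pChunk->aiNext[iPage] = pChunk->iFreeHead;
    pChunk->iFreeHead     = iPage;
    pChunk->cFree++;
}

// Pop: hands out the most recently freed page (first out), or NIL if empty.
static uint16_t exAllocPage(EXCHUNK *pChunk)
{
    uint16_t iPage = pChunk->iFreeHead;
    if (iPage != EX_NIL_INDEX)
    {
        pChunk->iFreeHead = pChunk->aiNext[iPage];
        pChunk->cFree--;
    }
    return iPage;
}
```
 *
 * Pushing and popping at the head are O(1), and the most recently freed
 * page is the first one handed out again.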
 *
 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
 * systems a 32-bit bitfield will have to suffice because of address space
 * limitations. The #GMMPAGE structure shows the details.
 *
 *
 * @section sec_gmm_alloc_strat Page Allocation Strategy
 *
 * The strategy for allocating pages has to take fragmentation and shared
 * pages into account, or we may end up with 2000 chunks with only
 * a few pages in each. Shared pages cannot easily be reallocated because
 * of the inaccurate usage accounting (see above). Private pages can be
 * reallocated by a defragmentation thread in the same manner that sharing
 * is done.
 *
 * The first approach is to manage the free pages in two sets depending on
 * whether they are mainly for the allocation of shared or private pages.
 * In the initial implementation there will be almost no possibility for
 * mixing shared and private pages in the same chunk (only if we're really
 * stressed on memory), but when we implement forking of VMs and have to
 * deal with lots of COW pages it'll start getting kind of interesting.
 *
 * The sets are lists of chunks with approximately the same number of
 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
 * consists of 16 lists. Each list then covers 16 free-page counts: the
 * first list will contain the chunks with 1-15 free pages, the second
 * covers 16-31, and so on. The chunks will be moved between the lists as
 * pages are freed up or allocated.
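 *
 * A minimal sketch of the list selection for the 1MB / 16-list example, for
 * illustration only: with 256 pages and 16 lists each list spans 16
 * free-page counts (256 / 16). The EX_* constants and exSelectFreeList() are
 * hypothetical. Chunks without free pages belong to no set (see
 * GMMCHUNK::pSet); letting a completely free chunk share the last list is an
 * assumption made here.
 *
```c
#include <assert.h>
#include <stdint.h>

// Example parameters: 256 pages per chunk, 16 lists per set.
#define EX_CHUNK_PAGES      256u
#define EX_SET_LISTS        16u
#define EX_PAGES_PER_LIST   (EX_CHUNK_PAGES / EX_SET_LISTS)

// Returns the list index for a chunk with cFree free pages, or -1 when the
// chunk has no free pages and therefore belongs to no list.
static int exSelectFreeList(uint32_t cFree)
{
    assert(cFree <= EX_CHUNK_PAGES);
    if (cFree == 0)
        return -1;
    uint32_t iList = cFree / EX_PAGES_PER_LIST;
    if (iList >= EX_SET_LISTS)      // only a completely free chunk gets here
        iList = EX_SET_LISTS - 1;   // assumption: it shares the last list
    return (int)iList;
}
```
 *
 * Since the divisor is a power of two, the division reduces to a right
 * shift (cFree >> 4 here), which keeps moving chunks between lists cheap.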
 *
 *
 * @section sec_gmm_costs Costs
 *
 * The per page cost in kernel space is 32 bits (64 bits on 64-bit hosts,
 * see #GMMPAGE) plus whatever RTR0MEMOBJ entails. In addition there is the
 * chunk cost of approximately
 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
 *
 * On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
 * and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
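 *
 * A worked example of the per-page tracking cost, for illustration only. It
 * assumes a 64-bit host with 4 KB pages, an 8-byte #GMMPAGE (its size on
 * 64-bit hosts) and an 8-byte per-page #RTR0MEMOBJ cost as described above,
 * and it ignores the per-chunk cost; the ex* names are hypothetical.
 *
```c
#include <assert.h>
#include <stdint.h>

// Assumed per-page costs on a 64-bit host.
#define EX_PAGE_SIZE        UINT64_C(4096)
#define EX_GMMPAGE_SIZE     UINT64_C(8)
#define EX_MEMOBJ_PER_PAGE  UINT64_C(8)

// Kernel memory needed to track cbGuestRam bytes of guest RAM,
// excluding the per-chunk overhead.
static uint64_t exTrackingBytes(uint64_t cbGuestRam)
{
    uint64_t cPages = cbGuestRam / EX_PAGE_SIZE;
    return cPages * (EX_GMMPAGE_SIZE + EX_MEMOBJ_PER_PAGE);
}
```
 *
 * For 4 GB of guest RAM that is 2^20 pages times 16 bytes = 16 MB, i.e. 16
 * bytes per 4096-byte page, or roughly 0.4% of the guest memory.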
 *
 *
 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
 *
 * In legacy mode the page source is locked user pages and not
 * #RTR0MemObjAllocPhysNC, which means that a page can only be allocated
 * by the VM that locked it. We will make no attempt at implementing
 * page sharing on these systems, just do enough to make it all work.
 *
 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsolete,
 *       under the assumption that there is sufficient kernel virtual address
 *       space to map all of the guest memory allocations. So, we'll be using
 *       #RTR0MemObjAllocPage on some platforms as an alternative to
 *       #RTR0MemObjAllocPhysNC.
 *
 *
 * @subsection sub_gmm_locking Serializing
 *
 * One simple fast mutex will be employed in the initial implementation, not
 * two as mentioned in @ref sec_pgmPhys_Serializing.
 *
 * @see @ref sec_pgmPhys_Serializing
 *
 *
 * @section sec_gmm_overcommit Memory Over-Commitment Management
 *
 * The GVM will have to do the system-wide memory over-commitment
 * management. My current ideas are:
 *   - Per-VM over-commitment (oc) policy that indicates how much to commit
 *     to it initially and what to do in an out-of-memory situation.
 *   - Prevent overtaxing the host.
 *
 * There are some challenges here, the main ones are configurability and
 * security. Should we for instance permit anyone to request 100% memory
 * commitment? Who should be allowed to do runtime adjustments of the
 * config? And how do we prevent these settings from being lost when the last
 * VM process exits? The solution is probably to have an optional root
 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
 *
 *
 *
 * @section sec_gmm_numa NUMA
 *
 * NUMA considerations will be designed and implemented a bit later.
 *
 * The preliminary guess is that we will have to try to allocate memory as
 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
 * threads). This means it's mostly about allocation and sharing policies.
 * Both the scheduler and the allocator interface will have to supply some
 * NUMA info, and we'll need to have a way to calculate access costs.
 *
 */


/*********************************************************************************************************************************
*   Header Files                                                                                                                 *
*********************************************************************************************************************************/
#define LOG_GROUP LOG_GROUP_GMM
#include <VBox/rawpci.h>
#include <VBox/vmm/gmm.h>
#include "GMMR0Internal.h"
#include <VBox/vmm/vmcc.h>
#include <VBox/vmm/pgm.h>
#include <VBox/log.h>
#include <VBox/param.h>
#include <VBox/err.h>
#include <VBox/VMMDev.h>
#include <iprt/asm.h>
#include <iprt/avl.h>
#ifdef VBOX_STRICT
# include <iprt/crc.h>
#endif
#include <iprt/critsect.h>
#include <iprt/list.h>
#include <iprt/mem.h>
#include <iprt/memobj.h>
#include <iprt/mp.h>
#include <iprt/semaphore.h>
#include <iprt/spinlock.h>
#include <iprt/string.h>
#include <iprt/time.h>


/*********************************************************************************************************************************
*   Defined Constants And Macros                                                                                                 *
*********************************************************************************************************************************/
/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
 * Use a critical section instead of a fast mutex for the giant GMM lock.
 *
 * @remarks This is primarily a way of avoiding the deadlock checks in the
 *          Windows driver verifier. */
#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
# define VBOX_USE_CRIT_SECT_FOR_GIANT
#endif

#if defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM) && !defined(RT_OS_DARWIN) && 0
/** Enable the legacy mode code (will be dropped soon). */
# define GMM_WITH_LEGACY_MODE
#endif


/*********************************************************************************************************************************
*   Structures and Typedefs                                                                                                      *
*********************************************************************************************************************************/
/** Pointer to set of free chunks. */
typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;

/**
 * The per-page tracking structure employed by the GMM.
 *
 * On 32-bit hosts some trickery is necessary to compress all the
 * information into 32 bits. When the fSharedFree member is set, the
 * 30th bit decides whether it's a free page or not.
 *
 * Because of the different layout on 32-bit and 64-bit hosts, macros
 * are used to get and set some of the data.
 */
typedef union GMMPAGE
{
#if HC_ARCH_BITS == 64
    /** Unsigned integer view. */
    uint64_t u;

    /** The common view. */
    struct GMMPAGECOMMON
    {
        uint32_t    uStuff1 : 32;
        uint32_t    uStuff2 : 30;
        /** The page state. */
        uint32_t    u2State : 2;
    } Common;

    /** The view of a private page. */
    struct GMMPAGEPRIVATE
    {
        /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
        uint32_t    pfn;
        /** The GVM handle. (64K VMs) */
        uint32_t    hGVM : 16;
        /** Reserved. */
        uint32_t    u16Reserved : 14;
        /** The page state. */
        uint32_t    u2State : 2;
    } Private;

    /** The view of a shared page. */
    struct GMMPAGESHARED
    {
        /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
        uint32_t    pfn;
        /** The reference count (64K VMs). */
        uint32_t    cRefs : 16;
        /** Used for debug checksumming. */
        uint32_t    u14Checksum : 14;
        /** The page state. */
        uint32_t    u2State : 2;
    } Shared;

    /** The view of a free page. */
    struct GMMPAGEFREE
    {
        /** The index of the next page in the free list. UINT16_MAX is NIL. */
        uint16_t    iNext;
        /** Reserved. Checksum or something? */
        uint16_t    u16Reserved0;
        /** Reserved. Checksum or something? */
        uint32_t    u30Reserved1 : 30;
        /** The page state. */
        uint32_t    u2State : 2;
    } Free;

#else /* 32-bit */
    /** Unsigned integer view. */
    uint32_t u;

    /** The common view. */
    struct GMMPAGECOMMON
    {
        uint32_t    uStuff : 30;
        /** The page state. */
        uint32_t    u2State : 2;
    } Common;

    /** The view of a private page. */
    struct GMMPAGEPRIVATE
    {
        /** The guest page frame number. (Max addressable: 2 ^ 36) */
        uint32_t    pfn : 24;
        /** The GVM handle. (127 VMs) */
        uint32_t    hGVM : 7;
        /** The top page state bit, MBZ. */
        uint32_t    fZero : 1;
    } Private;

    /** The view of a shared page. */
    struct GMMPAGESHARED
    {
        /** The reference count. */
        uint32_t    cRefs : 30;
        /** The page state. */
        uint32_t    u2State : 2;
    } Shared;

    /** The view of a free page. */
    struct GMMPAGEFREE
    {
        /** The index of the next page in the free list. UINT16_MAX is NIL. */
        uint32_t    iNext : 16;
        /** Reserved. Checksum or something? */
        uint32_t    u14Reserved : 14;
        /** The page state. */
        uint32_t    u2State : 2;
    } Free;
#endif
} GMMPAGE;
AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
/** Pointer to a GMMPAGE. */
typedef GMMPAGE *PGMMPAGE;
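
/*
 * Illustration only, not part of the GMM: a minimal sketch of how the 64-bit
 * views of GMMPAGE overlap. EXPAGE is a hypothetical, trimmed copy of the
 * 64-bit layout above (Shared view omitted). It relies on every view placing
 * u2State in the same two bits of the second 32-bit word, which holds for the
 * usual low-to-high bitfield allocation (an assumption; the C standard leaves
 * bitfield layout implementation-defined).
 *
```c
#include <assert.h>
#include <stdint.h>

typedef union EXPAGE
{
    uint64_t u;
    struct { uint32_t uStuff1 : 32; uint32_t uStuff2 : 30; uint32_t u2State : 2; } Common;
    struct { uint32_t pfn; uint32_t hGVM : 16; uint32_t u16Reserved : 14; uint32_t u2State : 2; } Private;
    struct { uint16_t iNext; uint16_t u16Reserved0; uint32_t u30Reserved1 : 30; uint32_t u2State : 2; } Free;
} EXPAGE;

// Turns the entry into a private page owned by hGVM backing guest frame pfn.
static void exMakePrivate(EXPAGE *pPage, uint32_t pfn, uint16_t hGVM)
{
    pPage->u               = 0;
    pPage->Private.pfn     = pfn;
    pPage->Private.hGVM    = hGVM;
    pPage->Private.u2State = 0;     // GMM_PAGE_STATE_PRIVATE
}

// Turns the entry into a free page linked to iNext (UINT16_MAX = NIL).
static void exMakeFree(EXPAGE *pPage, uint16_t iNext)
{
    pPage->u            = 0;
    pPage->Free.iNext   = iNext;
    pPage->Free.u2State = 3;        // GMM_PAGE_STATE_FREE
}
```
 *
 * Whichever view wrote the entry, Common.u2State reads the state back, which
 * is what the GMM_PAGE_IS_XXX macros rely on.
 */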


/** @name The Page States.
 * @{ */
/** A private page. */
#define GMM_PAGE_STATE_PRIVATE          0
/** A private page - alternative value used on the 32-bit implementation.
 * This will never be used on 64-bit hosts. */
#define GMM_PAGE_STATE_PRIVATE_32       1
/** A shared page. */
#define GMM_PAGE_STATE_SHARED           2
/** A free page. */
#define GMM_PAGE_STATE_FREE             3
/** @} */


/** @def GMM_PAGE_IS_PRIVATE
 *
 * @returns true if private, false if not.
 * @param   pPage       The GMM page.
 */
#if HC_ARCH_BITS == 64
# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
#else
# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
#endif
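
/*
 * Illustration only, not part of the GMM: why the 32-bit implementation needs
 * GMM_PAGE_STATE_PRIVATE_32. The sketch models the 32-bit GMMPAGE with
 * explicit shifts, assuming bitfields are allocated from bit 0 upwards (pfn in
 * bits 0-23, hGVM in bits 24-30, fZero in bit 31, so u2State is bits 30-31).
 * The ex* helpers are hypothetical.
 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Packs a 32-bit private page entry; fZero (bit 31) is left clear.
static uint32_t exMakePrivate32(uint32_t pfn, uint32_t hGVM)
{
    assert(pfn < (UINT32_C(1) << 24) && hGVM < (UINT32_C(1) << 7));
    return pfn | (hGVM << 24);
}

// What the Common.u2State view would read.
static uint32_t exState32(uint32_t u)
{
    return u >> 30;
}

// The GMM_PAGE_IS_PRIVATE test: only the top bit (fZero) matters.
static bool exIsPrivate32(uint32_t u)
{
    return (u >> 31) == 0;
}
```
 *
 * The top hGVM bit shares bit 30 with the low u2State bit, so a private page
 * owned by a VM with a handle of 64 or higher reads as state 1
 * (GMM_PAGE_STATE_PRIVATE_32) rather than 0; testing fZero alone covers both.
 */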

/** @def GMM_PAGE_IS_SHARED
 *
 * @returns true if shared, false if not.
 * @param   pPage       The GMM page.
 */
#define GMM_PAGE_IS_SHARED(pPage)   ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )

/** @def GMM_PAGE_IS_FREE
 *
 * @returns true if free, false if not.
 * @param   pPage       The GMM page.
 */
#define GMM_PAGE_IS_FREE(pPage)     ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )

/** @def GMM_PAGE_PFN_LAST
 * The last valid guest pfn range.
 * @remark Some of the values outside the range have special meaning,
 *         see GMM_PAGE_PFN_UNSHAREABLE.
 */
#if HC_ARCH_BITS == 64
# define GMM_PAGE_PFN_LAST          UINT32_C(0xfffffff0)
#else
# define GMM_PAGE_PFN_LAST          UINT32_C(0x00fffff0)
#endif
AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));

/** @def GMM_PAGE_PFN_UNSHAREABLE
 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
 */
#if HC_ARCH_BITS == 64
# define GMM_PAGE_PFN_UNSHAREABLE   UINT32_C(0xfffffff1)
#else
# define GMM_PAGE_PFN_UNSHAREABLE   UINT32_C(0x00fffff1)
#endif
AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));


/**
 * A GMM allocation chunk ring-3 mapping record.
 *
 * This should really be associated with a session and not a VM, but
 * it's simpler to associate it with a VM and clean up when the VM object
 * is destroyed.
 */
typedef struct GMMCHUNKMAP
{
    /** The mapping object. */
    RTR0MEMOBJ          hMapObj;
    /** The VM owning the mapping. */
    PGVM                pGVM;
} GMMCHUNKMAP;
/** Pointer to a GMM allocation chunk mapping. */
typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;


/**
 * A GMM allocation chunk.
 */
typedef struct GMMCHUNK
{
    /** The AVL node core.
     * The Key is the chunk ID. (Giant mtx.) */
    AVLU32NODECORE      Core;
    /** The memory object.
     * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
     * what the host can dish up with. (Chunk mtx protects mapping accesses
     * and related frees.) */
    RTR0MEMOBJ          hMemObj;
#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
    /** Pointer to the kernel mapping. */
    uint8_t            *pbMapping;
#endif
    /** Pointer to the next chunk in the free list. (Giant mtx.) */
    PGMMCHUNK           pFreeNext;
    /** Pointer to the previous chunk in the free list. (Giant mtx.) */
    PGMMCHUNK           pFreePrev;
    /** Pointer to the free set this chunk belongs to. NULL for
     * chunks with no free pages. (Giant mtx.) */
    PGMMCHUNKFREESET    pSet;
    /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
    RTLISTNODE          ListNode;
    /** Pointer to an array of mappings. (Chunk mtx.) */
    PGMMCHUNKMAP        paMappingsX;
    /** The number of mappings. (Chunk mtx.) */
    uint16_t            cMappingsX;
    /** The mapping lock this chunk is using. UINT8_MAX if nobody is mapping
     * or freeing anything. (Giant mtx.) */
    uint8_t volatile    iChunkMtx;
    /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
    uint8_t             fFlags;
    /** The head of the list of free pages. UINT16_MAX is the NIL value.
     * (Giant mtx.) */
    uint16_t            iFreeHead;
    /** The number of free pages. (Giant mtx.) */
    uint16_t            cFree;
    /** The GVM handle of the VM that first allocated pages from this chunk, this
     * is used as a preference when there are several chunks to choose from.
     * When in bound memory mode this isn't a preference any longer. (Giant
     * mtx.) */
    uint16_t            hGVM;
    /** The ID of the NUMA node the memory mostly resides on. (Reserved for
     * future use.) (Giant mtx.) */
    uint16_t            idNumaNode;
    /** The number of private pages. (Giant mtx.) */
    uint16_t            cPrivate;
    /** The number of shared pages. (Giant mtx.) */
    uint16_t            cShared;
    /** The UID this chunk is associated with. */
    RTUID               uidOwner;
    uint32_t            u32Padding;
    /** The pages. (Giant mtx.) */
    GMMPAGE             aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
} GMMCHUNK;

/** Indicates that the NUMA properties of the memory are unknown. */
#define GMM_CHUNK_NUMA_ID_UNKNOWN   UINT16_C(0xfffe)

/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
 * @{ */
/** Indicates that the chunk is a large page (2MB). */
#define GMM_CHUNK_FLAGS_LARGE_PAGE  UINT16_C(0x0001)
#ifdef GMM_WITH_LEGACY_MODE
/** Indicates that the chunk was locked rather than allocated directly. */
# define GMM_CHUNK_FLAGS_SEEDED     UINT16_C(0x0002)
#endif
/** @} */


/**
 * An allocation chunk TLB entry.
 */
typedef struct GMMCHUNKTLBE
{
    /** The chunk id. */
    uint32_t            idChunk;
    /** Pointer to the chunk. */
    PGMMCHUNK           pChunk;
} GMMCHUNKTLBE;
/** Pointer to an allocation chunk TLB entry. */
typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;


/** The number of entries in the allocation chunk TLB. */
#define GMM_CHUNKTLB_ENTRIES        32
/** Gets the TLB entry index for the given Chunk ID. */
#define GMM_CHUNKTLB_IDX(idChunk)   ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )

/**
 * An allocation chunk TLB.
 */
typedef struct GMMCHUNKTLB
{
    /** The TLB entries. */
    GMMCHUNKTLBE        aEntries[GMM_CHUNKTLB_ENTRIES];
} GMMCHUNKTLB;
/** Pointer to an allocation chunk TLB. */
typedef GMMCHUNKTLB *PGMMCHUNKTLB;
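
/*
 * Illustration only, not part of the GMM: a minimal sketch of a direct-mapped
 * chunk TLB in the spirit of GMMCHUNKTLB. Because GMM_CHUNKTLB_ENTRIES is a
 * power of two, GMM_CHUNKTLB_IDX() is a simple mask and each chunk ID maps to
 * exactly one slot. The EX* types and ex* functions are hypothetical; in the
 * GMM a miss falls back to the AVL tree (GMM::pChunks) under
 * GMM::hSpinLockTree.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_TLB_ENTRIES      32u     // must be a power of two
#define EX_TLB_IDX(id)      ((id) & (EX_TLB_ENTRIES - 1u))
#define EX_NIL_CHUNKID      0u

typedef struct EXCHUNK  { uint32_t idChunk; } EXCHUNK;     // stand-in for GMMCHUNK
typedef struct EXTLBE   { uint32_t idChunk; EXCHUNK *pChunk; } EXTLBE;
typedef struct EXTLB    { EXTLBE aEntries[EX_TLB_ENTRIES]; } EXTLB;

// Marks every entry as empty, like GMMR0Init does with NIL_GMM_CHUNKID.
static void exTlbInit(EXTLB *pTlb)
{
    for (unsigned i = 0; i < EX_TLB_ENTRIES; i++)
    {
        pTlb->aEntries[i].idChunk = EX_NIL_CHUNKID;
        pTlb->aEntries[i].pChunk  = NULL;
    }
}

// Returns the cached chunk, or NULL on a miss.
static EXCHUNK *exTlbLookup(EXTLB *pTlb, uint32_t idChunk)
{
    EXTLBE *pEntry = &pTlb->aEntries[EX_TLB_IDX(idChunk)];
    return pEntry->idChunk == idChunk ? pEntry->pChunk : NULL;
}

// Caches a chunk, replacing whatever occupied its slot.
static void exTlbInsert(EXTLB *pTlb, EXCHUNK *pChunk)
{
    EXTLBE *pEntry = &pTlb->aEntries[EX_TLB_IDX(pChunk->idChunk)];
    pEntry->idChunk = pChunk->idChunk;
    pEntry->pChunk  = pChunk;
}
```
 *
 * Chunk IDs 5 and 37 share slot 5 (37 & 31 == 5), so caching one evicts the
 * other; a miss only costs a tree lookup.
 */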


/**
 * The GMM instance data.
 */
typedef struct GMM
{
    /** Magic / eye catcher. GMM_MAGIC */
    uint32_t            u32Magic;
    /** The number of threads waiting on the mutex. */
    uint32_t            cMtxContenders;
#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
    /** The critical section protecting the GMM.
     * More fine grained locking can be implemented later if necessary. */
    RTCRITSECT          GiantCritSect;
#else
    /** The fast mutex protecting the GMM.
     * More fine grained locking can be implemented later if necessary. */
    RTSEMFASTMUTEX      hMtx;
#endif
#ifdef VBOX_STRICT
    /** The current mutex owner. */
    RTNATIVETHREAD      hMtxOwner;
#endif
    /** Spinlock protecting the AVL tree.
     * @todo Make this a read-write spinlock as we should allow concurrent
     *       lookups. */
    RTSPINLOCK          hSpinLockTree;
    /** The chunk tree.
     * Protected by hSpinLockTree. */
    PAVLU32NODECORE     pChunks;
    /** Chunk freeing generation - incremented whenever a chunk is freed. Used
     * for validating the per-VM chunk TLB entries. Valid range is 1 to 2^62
     * (exclusive), though higher numbers may temporarily occur while
     * invalidating the individual TLBs during wrap-around processing. */
    uint64_t volatile   idFreeGeneration;
    /** The chunk TLB.
     * Protected by hSpinLockTree. */
    GMMCHUNKTLB         ChunkTLB;
    /** The private free set. */
    GMMCHUNKFREESET     PrivateX;
    /** The shared free set. */
    GMMCHUNKFREESET     Shared;

    /** Shared module tree (global).
     * @todo separate trees for distinctly different guest OSes. */
    PAVLLU32NODECORE    pGlobalSharedModuleTree;
    /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
    uint32_t            cShareableModules;

    /** The chunk list. Simplifies the cleanup process and avoids tree
     * traversal. */
    RTLISTANCHOR        ChunkList;

    /** The maximum number of pages we're allowed to allocate.
     * @gcfgm{GMM/MaxPages,64-bit, Direct.}
     * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
    uint64_t            cMaxPages;
    /** The number of pages that have been reserved.
     * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
    uint64_t            cReservedPages;
    /** The number of pages that we have over-committed in reservations. */
    uint64_t            cOverCommittedPages;
    /** The number of actually allocated (committed if you like) pages. */
    uint64_t            cAllocatedPages;
    /** The number of pages that are shared. A subset of cAllocatedPages. */
    uint64_t            cSharedPages;
    /** The number of pages that are actually shared between VMs. */
    uint64_t            cDuplicatePages;
    /** The number of shared pages that have been left behind by
     * VMs not doing proper cleanups. */
    uint64_t            cLeftBehindSharedPages;
    /** The number of allocation chunks.
     * (The number of pages we've allocated from the host can be derived from this.) */
    uint32_t            cChunks;
    /** The current number of ballooned pages. */
    uint64_t            cBalloonedPages;

#ifndef GMM_WITH_LEGACY_MODE
# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
    /** Whether #RTR0MemObjAllocPhysNC works. */
    bool                fHasWorkingAllocPhysNC;
# else
    bool                fPadding;
# endif
#else
    /** The legacy allocation mode indicator.
     * This is determined at initialization time. */
    bool                fLegacyAllocationMode;
#endif
    /** The bound memory mode indicator.
     * When set, the memory will be bound to a specific VM and never
     * shared. This is always set if fLegacyAllocationMode is set.
     * (Also determined at initialization time.) */
    bool                fBoundMemoryMode;
    /** The number of registered VMs. */
    uint16_t            cRegisteredVMs;

    /** The number of freed chunks ever. This is used as a list generation to
     * avoid restarting the cleanup scanning when the list wasn't modified. */
    uint32_t            cFreedChunks;
    /** The previous allocated Chunk ID.
     * Used as a hint to avoid scanning the whole bitmap. */
    uint32_t            idChunkPrev;
    /** Chunk ID allocation bitmap.
     * Bits of allocated IDs are set, free ones are clear.
     * The NIL id (0) is marked allocated. */
    uint32_t            bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];

    /** The index of the next mutex to use. */
    uint32_t            iNextChunkMtx;
    /** Chunk locks for reducing lock contention without having to allocate
     * one lock per chunk. */
    struct
    {
        /** The mutex */
        RTSEMFASTMUTEX      hMtx;
        /** The number of threads currently using this mutex. */
        uint32_t volatile   cUsers;
    } aChunkMtx[64];
} GMM;
/** Pointer to the GMM instance. */
typedef GMM *PGMM;
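
/*
 * Illustration only, not part of the GMM: a minimal sketch of handing out IDs
 * from an allocation bitmap like GMM::bmChunkId, using a previous-ID hint like
 * GMM::idChunkPrev so the scan normally starts where the last one stopped.
 * EX_CHUNKID_LAST is a hypothetical small limit (the real one is
 * GMM_CHUNKID_LAST), the scan is a plain linear search with wrap-around, and
 * the EX* / ex* names are hypothetical.
 *
```c
#include <assert.h>
#include <stdint.h>

#define EX_CHUNKID_LAST     255u    // hypothetical; IDs 0..255
#define EX_NIL_CHUNKID      0u      // never handed out

typedef struct EXIDALLOC
{
    uint32_t idPrev;                                // like GMM::idChunkPrev
    uint32_t bm[(EX_CHUNKID_LAST + 1 + 31) / 32];   // like GMM::bmChunkId
} EXIDALLOC;

static int exTestBit(const uint32_t *pbm, uint32_t i)  { return (pbm[i / 32] >> (i % 32)) & 1; }
static void exSetBit(uint32_t *pbm, uint32_t i)        { pbm[i / 32] |=  UINT32_C(1) << (i % 32); }
static void exClearBit(uint32_t *pbm, uint32_t i)      { pbm[i / 32] &= ~(UINT32_C(1) << (i % 32)); }

static void exIdInit(EXIDALLOC *pAlloc)
{
    for (unsigned i = 0; i < sizeof(pAlloc->bm) / sizeof(pAlloc->bm[0]); i++)
        pAlloc->bm[i] = 0;
    pAlloc->idPrev = EX_NIL_CHUNKID;
    exSetBit(pAlloc->bm, EX_NIL_CHUNKID);   // mark NIL as allocated
}

// Returns a free ID, or EX_NIL_CHUNKID if all are taken.
static uint32_t exIdAlloc(EXIDALLOC *pAlloc)
{
    for (uint32_t n = 0; n <= EX_CHUNKID_LAST; n++)
    {
        uint32_t id = (pAlloc->idPrev + 1 + n) % (EX_CHUNKID_LAST + 1);
        if (!exTestBit(pAlloc->bm, id))
        {
            exSetBit(pAlloc->bm, id);
            pAlloc->idPrev = id;
            return id;
        }
    }
    return EX_NIL_CHUNKID;
}

static void exIdFree(EXIDALLOC *pAlloc, uint32_t id)
{
    assert(id != EX_NIL_CHUNKID && id <= EX_CHUNKID_LAST);
    exClearBit(pAlloc->bm, id);
}
```
 *
 * Starting after the hint means a freed ID is only reused once the scan
 * wraps around, so recently freed IDs are not recycled immediately.
 */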

/** The value of GMM::u32Magic (Katsuhiro Otomo). */
#define GMM_MAGIC       UINT32_C(0x19540414)


/**
 * GMM chunk mutex state.
 *
 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
 * gmmR0ChunkMutex* methods.
 */
typedef struct GMMR0CHUNKMTXSTATE
{
    PGMM                pGMM;
    /** The index of the chunk mutex. */
    uint8_t             iChunkMtx;
    /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
    uint8_t             fFlags;
} GMMR0CHUNKMTXSTATE;
/** Pointer to a chunk mutex state. */
typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;

/** @name GMMR0CHUNK_MTX_XXX
 * @{ */
#define GMMR0CHUNK_MTX_INVALID          UINT32_C(0)
#define GMMR0CHUNK_MTX_KEEP_GIANT       UINT32_C(1)
#define GMMR0CHUNK_MTX_RETAKE_GIANT     UINT32_C(2)
#define GMMR0CHUNK_MTX_DROP_GIANT       UINT32_C(3)
#define GMMR0CHUNK_MTX_END              UINT32_C(4)
/** @} */


/** The maximum number of shared modules per-vm. */
#define GMM_MAX_SHARED_PER_VM_MODULES   2048
/** The maximum number of shared modules GMM is allowed to track. */
#define GMM_MAX_SHARED_GLOBAL_MODULES   16834


/**
 * Argument packet for gmmR0SharedModuleCleanup.
 */
typedef struct GMMR0SHMODPERVMDTORARGS
{
    PGVM                pGVM;
    PGMM                pGMM;
} GMMR0SHMODPERVMDTORARGS;

/**
 * Argument packet for gmmR0CheckSharedModule.
 */
typedef struct GMMCHECKSHAREDMODULEINFO
{
    PGVM                pGVM;
    VMCPUID             idCpu;
} GMMCHECKSHAREDMODULEINFO;


/*********************************************************************************************************************************
*   Global Variables                                                                                                             *
*********************************************************************************************************************************/
/** Pointer to the GMM instance data. */
static PGMM g_pGMM = NULL;

/** Macro for obtaining and validating the g_pGMM pointer.
 *
 * On failure it will return from the invoking function with the specified
 * return value.
 *
 * @param   pGMM    The name of the pGMM variable.
 * @param   rc      The return value on failure. Use VERR_GMM_INSTANCE for VBox
 *                  status codes.
 */
#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
    do { \
        (pGMM) = g_pGMM; \
        AssertPtrReturn((pGMM), (rc)); \
        AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
    } while (0)

/** Macro for obtaining and validating the g_pGMM pointer, void function
 * variant.
 *
 * On failure it will return from the invoking function.
 *
 * @param   pGMM    The name of the pGMM variable.
 */
#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
    do { \
        (pGMM) = g_pGMM; \
        AssertPtrReturnVoid((pGMM)); \
        AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
    } while (0)


/** @def GMM_CHECK_SANITY_UPON_ENTERING
 * Checks the sanity of the GMM instance data before making changes.
 *
 * This macro is a stub by default and must be enabled manually in the code.
 *
 * @returns true if sane, false if not.
 * @param   pGMM    The name of the pGMM variable.
 */
#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
#else
# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
#endif

/** @def GMM_CHECK_SANITY_UPON_LEAVING
 * Checks the sanity of the GMM instance data after making changes.
 *
 * This macro is a stub by default and must be enabled manually in the code.
 *
 * @returns true if sane, false if not.
 * @param   pGMM    The name of the pGMM variable.
 */
#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
#else
# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
#endif

/** @def GMM_CHECK_SANITY_IN_LOOPS
 * Checks the sanity of the GMM instance in the allocation loops.
 *
 * This macro is a stub by default and must be enabled manually in the code.
 *
 * @returns true if sane, false if not.
 * @param   pGMM    The name of the pGMM variable.
 */
#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
#else
# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
#endif


/*********************************************************************************************************************************
*   Internal Functions                                                                                                           *
*********************************************************************************************************************************/
static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
#ifdef GMMR0_WITH_SANITY_CHECK
static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
#endif
static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
#ifdef VBOX_WITH_PAGE_SHARING
static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
# ifdef VBOX_STRICT
static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
# endif
#endif



787/**
788 * Initializes the GMM component.
789 *
790 * This is called when the VMMR0.r0 module is loaded and protected by the
791 * loader semaphore.
792 *
793 * @returns VBox status code.
794 */
795GMMR0DECL(int) GMMR0Init(void)
796{
797 LogFlow(("GMMInit:\n"));
798
799 /*
800 * Allocate the instance data and the locks.
801 */
802 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
803 if (!pGMM)
804 return VERR_NO_MEMORY;
805
806 pGMM->u32Magic = GMM_MAGIC;
807 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
808 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
809 RTListInit(&pGMM->ChunkList);
810 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
811
812#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
813 int rc = RTCritSectInit(&pGMM->GiantCritSect);
814#else
815 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
816#endif
817 if (RT_SUCCESS(rc))
818 {
819 unsigned iMtx;
820 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
821 {
822 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
823 if (RT_FAILURE(rc))
824 break;
825 }
826 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
827 if (RT_SUCCESS(rc))
828 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
829 if (RT_SUCCESS(rc))
830 {
831#ifndef GMM_WITH_LEGACY_MODE
832 /*
833 * Figure out how we're going to allocate stuff (only applicable to
834 * host with linear physical memory mappings).
835 */
836 pGMM->fBoundMemoryMode = false;
837# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
838 pGMM->fHasWorkingAllocPhysNC = false;
839
840 RTR0MEMOBJ hMemObj;
841 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
842 if (RT_SUCCESS(rc))
843 {
844 rc = RTR0MemObjFree(hMemObj, true);
845 AssertRC(rc);
846 pGMM->fHasWorkingAllocPhysNC = true;
847 }
848 else if (rc != VERR_NOT_SUPPORTED)
849 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
850# endif
851#else /* GMM_WITH_LEGACY_MODE */
852 /*
853 * Check and see if RTR0MemObjAllocPhysNC works.
854 */
855# if 0 /* later, see @bufref{3170}. */
856 RTR0MEMOBJ MemObj;
857 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
858 if (RT_SUCCESS(rc))
859 {
860 rc = RTR0MemObjFree(MemObj, true);
861 AssertRC(rc);
862 }
863 else if (rc == VERR_NOT_SUPPORTED)
864 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
865 else
866 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
867# else
868# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
869 pGMM->fLegacyAllocationMode = false;
870# if ARCH_BITS == 32
871 /* Don't reuse possibly partial chunks because of the virtual
872 address space limitation. */
873 pGMM->fBoundMemoryMode = true;
874# else
875 pGMM->fBoundMemoryMode = false;
876# endif
877# else
878 pGMM->fLegacyAllocationMode = true;
879 pGMM->fBoundMemoryMode = true;
880# endif
881# endif
882#endif /* GMM_WITH_LEGACY_MODE */
883
884 /*
885 * Query system page count and guess a reasonable cMaxPages value.
886 */
887 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for querying the RAM size and such. */
888
889 /*
890 * The idFreeGeneration value should be set so we actually trigger the
891 * wrap-around invalidation handling during a typical test run.
892 */
893 pGMM->idFreeGeneration = UINT64_MAX / 4 - 128;
894
895 g_pGMM = pGMM;
896#ifdef GMM_WITH_LEGACY_MODE
897 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
898#elif defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
899 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
900#else
901 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
902#endif
903 return VINF_SUCCESS;
904 }
905
906 /*
907 * Bail out.
908 */
909 RTSpinlockDestroy(pGMM->hSpinLockTree);
910 while (iMtx-- > 0)
911 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
912#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
913 RTCritSectDelete(&pGMM->GiantCritSect);
914#else
915 RTSemFastMutexDestroy(pGMM->hMtx);
916#endif
917 }
918
919 pGMM->u32Magic = 0;
920 RTMemFree(pGMM);
921 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
922 return rc;
923}
924
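GMMR0Init creates its resources in order (giant lock, chunk mutexes, tree spinlock). On failure it destroys only what it managed to create, and `while (iMtx-- > 0)` walks the chunk mutexes backwards from the one that failed. The following is a minimal C++ sketch of that unwinding pattern, not the IPRT code. `Resource`, `init_all` and the injected `fail_at` index are hypothetical stand-ins for RTSemFastMutexCreate and its failure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel object such as a fast mutex.
struct Resource { bool alive; };

// "Creates" resources[0..n-1] in order. Creation of index fail_at (if it is
// < n) is made to fail to simulate an out-of-resources error. On failure,
// every resource created so far is destroyed in reverse order, recording the
// order in destroy_order. Returns true on success.
bool init_all(std::vector<Resource>& resources, std::size_t fail_at,
              std::vector<std::size_t>& destroy_order)
{
    std::size_t i;
    for (i = 0; i < resources.size(); i++)
    {
        if (i == fail_at)
            break;                    // creation failed; i is the failing index
        resources[i].alive = true;    // creation succeeded
    }
    if (i == resources.size())
        return true;

    while (i-- > 0)                   // unwind only what was created
    {
        resources[i].alive = false;
        destroy_order.push_back(i);
    }
    return false;
}
```

Because the loop condition is `i-- > 0`, the unwinding is also correct when the very first creation fails: the body simply never runs.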
925
926/**
927 * Terminates the GMM component.
928 */
929GMMR0DECL(void) GMMR0Term(void)
930{
931 LogFlow(("GMMTerm:\n"));
932
933 /*
934 * Take care / be paranoid...
935 */
936 PGMM pGMM = g_pGMM;
937 if (!RT_VALID_PTR(pGMM))
938 return;
939 if (pGMM->u32Magic != GMM_MAGIC)
940 {
941 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
942 return;
943 }
944
945 /*
946 * Undo what init did and free all the resources we've acquired.
947 */
948 /* Destroy the fundamentals. */
949 g_pGMM = NULL;
950 pGMM->u32Magic = ~GMM_MAGIC;
951#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
952 RTCritSectDelete(&pGMM->GiantCritSect);
953#else
954 RTSemFastMutexDestroy(pGMM->hMtx);
955 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
956#endif
957 RTSpinlockDestroy(pGMM->hSpinLockTree);
958 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
959
960 /* Free any chunks still hanging around. */
961 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
962
963 /* Destroy the chunk locks. */
964 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
965 {
966 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
967 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
968 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
969 }
970
971 /* Finally the instance data itself. */
972 RTMemFree(pGMM);
973 LogFlow(("GMMTerm: done\n"));
974}
975
976
977/**
978 * RTAvlU32Destroy callback.
979 *
980 * @returns 0
981 * @param pNode The node to destroy.
982 * @param pvGMM The GMM handle.
983 */
984static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
985{
986 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
987
988 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
989 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
990 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
991
992 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
993 if (RT_FAILURE(rc))
994 {
995 SUPR0Printf("GMMR0Term: %RKv/%#x: RTR0MemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
996 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
997 AssertRC(rc);
998 }
999 pChunk->hMemObj = NIL_RTR0MEMOBJ;
1000
1001 RTMemFree(pChunk->paMappingsX);
1002 pChunk->paMappingsX = NULL;
1003
1004 RTMemFree(pChunk);
1005 NOREF(pvGMM);
1006 return 0;
1007}
1008
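gmmR0TermDestroyChunk treats a chunk as leaked when `cFree` is not equal to the number of pages in a chunk, `GMM_CHUNK_SIZE >> PAGE_SHIFT`, reports it, and then frees it regardless. Below is a small C++ sketch of that check. The 2 MiB chunk size and 4 KiB page size are assumptions for illustration only, and `ChunkStats` and `report_leaked_chunk` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Assumed sizes for illustration: 2 MiB chunks and 4 KiB pages.
constexpr uint32_t kChunkShift    = 21;
constexpr uint32_t kPageShift     = 12;
constexpr uint32_t kChunkSize     = UINT32_C(1) << kChunkShift;
constexpr uint32_t kPagesPerChunk = kChunkSize >> kPageShift;   // 512 with these sizes

struct ChunkStats { uint32_t id, c_free, c_private, c_shared; };

// Returns true (and prints a report) if a chunk destroyed at teardown still
// has pages in use, i.e. not every page is free.
bool report_leaked_chunk(const ChunkStats& c)
{
    if (c.c_free == kPagesPerChunk)
        return false;
    std::printf("chunk %#x: cFree=%u cPrivate=%u cShared=%u\n",
                (unsigned)c.id, (unsigned)c.c_free,
                (unsigned)c.c_private, (unsigned)c.c_shared);
    return true;
}
```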
1009
1010/**
1011 * Initializes the per-VM data for the GMM.
1012 *
1013 * This is called from within the GVMM lock (from GVMMR0CreateVM)
1014 * and should only initialize the data members so GMMR0CleanupVM
1015 * can deal with them. We reserve no memory or anything here;
1016 * that's done later in GMMR0InitVM.
1017 *
1018 * @param pGVM Pointer to the Global VM structure.
1019 */
1020GMMR0DECL(int) GMMR0InitPerVMData(PGVM pGVM)
1021{
1022 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
1023
1024 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1025 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1026 pGVM->gmm.s.Stats.fMayAllocate = false;
1027
1028 pGVM->gmm.s.hChunkTlbSpinLock = NIL_RTSPINLOCK;
1029 int rc = RTSpinlockCreate(&pGVM->gmm.s.hChunkTlbSpinLock, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "per-vm-chunk-tlb");
1030 AssertRCReturn(rc, rc);
1031
1032 return VINF_SUCCESS;
1033}
1034
1035
1036/**
1037 * Acquires the GMM giant lock.
1038 *
1039 * @returns Assert status code from RTSemFastMutexRequest or RTCritSectEnter.
1040 * @param pGMM Pointer to the GMM instance.
1041 */
1042static int gmmR0MutexAcquire(PGMM pGMM)
1043{
1044 ASMAtomicIncU32(&pGMM->cMtxContenders);
1045#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1046 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
1047#else
1048 int rc = RTSemFastMutexRequest(pGMM->hMtx);
1049#endif
1050 ASMAtomicDecU32(&pGMM->cMtxContenders);
1051 AssertRC(rc);
1052#ifdef VBOX_STRICT
1053 pGMM->hMtxOwner = RTThreadNativeSelf();
1054#endif
1055 return rc;
1056}
1057
1058
1059/**
1060 * Releases the GMM giant lock.
1061 *
1062 * @returns Assert status code from RTSemFastMutexRelease or RTCritSectLeave.
1063 * @param pGMM Pointer to the GMM instance.
1064 */
1065static int gmmR0MutexRelease(PGMM pGMM)
1066{
1067#ifdef VBOX_STRICT
1068 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1069#endif
1070#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1071 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
1072#else
1073 int rc = RTSemFastMutexRelease(pGMM->hMtx);
1074 AssertRC(rc);
1075#endif
1076 return rc;
1077}
1078
1079
1080/**
1081 * Yields the GMM giant lock if there is contention and a certain minimum time
1082 * has elapsed since we took it.
1083 *
1084 * @returns @c true if the mutex was yielded, @c false if not.
1085 * @param pGMM Pointer to the GMM instance.
1086 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1087 * (in/out).
1088 */
1089static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1090{
1091 /*
1092 * If nobody is contending the mutex, don't bother checking the time.
1093 */
1094 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1095 return false;
1096
1097 /*
1098 * Don't yield if we haven't executed for at least 2 milliseconds.
1099 */
1100 uint64_t uNanoNow = RTTimeSystemNanoTS();
1101 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1102 return false;
1103
1104 /*
1105 * Yield the mutex.
1106 */
1107#ifdef VBOX_STRICT
1108 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1109#endif
1110 ASMAtomicIncU32(&pGMM->cMtxContenders);
1111#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1112 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1113#else
1114 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1115#endif
1116
1117 RTThreadYield();
1118
1119#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1120 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1121#else
1122 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1123#endif
1124 *puLockNanoTS = RTTimeSystemNanoTS();
1125 ASMAtomicDecU32(&pGMM->cMtxContenders);
1126#ifdef VBOX_STRICT
1127 pGMM->hMtxOwner = RTThreadNativeSelf();
1128#endif
1129
1130 return true;
1131}
1132
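gmmR0MutexYield gives up the giant lock only if someone is waiting for it (`cMtxContenders != 0`) and the caller has held it for at least 2 ms. Before releasing, it counts itself as a contender so other yielders see the lock as wanted. The following minimal C++ sketch models the same policy with `std::mutex` and `std::atomic`. The original reads RTTimeSystemNanoTS(); here the time is passed in through a hypothetical `now_ns` parameter so the policy can be exercised without a real clock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Waiters bump `contenders` before blocking on the mutex, as
// gmmR0MutexAcquire does with cMtxContenders.
struct GiantLock
{
    std::mutex mtx;
    std::atomic<uint32_t> contenders{0};
};

constexpr uint64_t kMinHoldNs = 2000000;  // 2 ms, the threshold used by the original

// Caller must hold g.mtx. *lock_ts_ns records when the lock was (re)acquired.
// Returns true if the lock was released and re-acquired.
bool maybe_yield(GiantLock& g, uint64_t now_ns, uint64_t* lock_ts_ns)
{
    if (g.contenders.load() == 0)
        return false;                     // nobody waiting: keep the lock
    if (now_ns - *lock_ts_ns < kMinHoldNs)
        return false;                     // held too briefly to bother

    g.contenders.fetch_add(1);            // we become a contender ourselves
    g.mtx.unlock();
    std::this_thread::yield();            // give a waiter the chance to get in
    g.mtx.lock();
    g.contenders.fetch_sub(1);
    *lock_ts_ns = now_ns;                 // a real implementation re-reads the clock here
    return true;
}
```

Checking the cheap atomic counter first means the uncontended path never touches the clock, which matches the ordering in the original.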
1133
1134/**
1135 * Acquires a chunk lock.
1136 *
1137 * The caller must own the giant lock.
1138 *
1139 * @returns Assert status code from RTSemFastMutexRequest.
1140 * @param pMtxState The chunk mutex state info. (Avoids
1141 * passing the same flags and stuff around
1142 * for subsequent release and drop-giant
1143 * calls.)
1144 * @param pGMM Pointer to the GMM instance.
1145 * @param pChunk Pointer to the chunk.
1146 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1147 */
1148static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1149{
1150 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1151 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1152
1153 pMtxState->pGMM = pGMM;
1154 pMtxState->fFlags = (uint8_t)fFlags;
1155
1156 /*
1157 * Get the lock index and reference the lock.
1158 */
1159 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1160 uint32_t iChunkMtx = pChunk->iChunkMtx;
1161 if (iChunkMtx == UINT8_MAX)
1162 {
1163 iChunkMtx = pGMM->iNextChunkMtx++;
1164 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1165
1166 /* Try to get an unused one... */
1167 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1168 {
1169 iChunkMtx = pGMM->iNextChunkMtx++;
1170 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1171 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1172 {
1173 iChunkMtx = pGMM->iNextChunkMtx++;
1174 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1175 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1176 {
1177 iChunkMtx = pGMM->iNextChunkMtx++;
1178 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1179 }
1180 }
1181 }
1182
1183 pChunk->iChunkMtx = iChunkMtx;
1184 }
1185 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1186 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1187 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1188
1189 /*
1190 * Drop the giant?
1191 */
1192 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1193 {
1194 /** @todo GMM life cycle cleanup (we may race someone
1195 * destroying and cleaning up GMM)? */
1196 gmmR0MutexRelease(pGMM);
1197 }
1198
1199 /*
1200 * Take the chunk mutex.
1201 */
1202 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1203 AssertRC(rc);
1204 return rc;
1205}
1206
1207
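When a chunk has no mutex associated (`iChunkMtx == UINT8_MAX`), gmmR0ChunkMutexAcquire takes the next index round-robin. If that mutex is in use, it probes up to three more candidates and settles on the last one even if it is busy. It then takes a user reference. Below is a C++ sketch of that selection. The pool size `kNumMtx` is a hypothetical value (the real one is `RT_ELEMENTS(pGMM->aChunkMtx)`), and the counters are plain integers here, whereas the original updates them atomically.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint8_t  kNoMtx  = UINT8_MAX;   // "no mutex associated", like UINT8_MAX
constexpr unsigned kNumMtx = 8;           // hypothetical pool size
static_assert(kNumMtx < kNoMtx, "index must fit below the sentinel");

struct MtxPool
{
    std::array<uint32_t, kNumMtx> users{};  // cUsers per mutex
    unsigned next = 0;                      // iNextChunkMtx
};

// Returns the mutex index for a chunk, assigning one if needed, and takes a
// user reference on it.
unsigned pick_chunk_mtx(MtxPool& pool, uint8_t& chunk_mtx)
{
    unsigned idx = chunk_mtx;
    if (idx == kNoMtx)
    {
        idx = pool.next++ % kNumMtx;
        // Probe up to three more candidates looking for an unused mutex.
        for (int tries = 0; tries < 3 && pool.users[idx] != 0; tries++)
            idx = pool.next++ % kNumMtx;
        chunk_mtx = static_cast<uint8_t>(idx);
    }
    pool.users[idx]++;
    return idx;
}
```

Bounding the probe keeps selection O(1) under the giant lock. Sharing a busy mutex is only a performance cost, not a correctness problem.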
1208/**
1209 * Releases a chunk lock and, if requested, retakes the GMM giant lock.
1210 *
1211 * @returns Assert status code from RTSemFastMutexRelease or gmmR0MutexAcquire.
1212 * @param pMtxState Pointer to the chunk mutex state.
1213 * @param pChunk Pointer to the chunk if it's still
1214 * alive, NULL if it isn't. This is used to disassociate
1215 * the chunk from the mutex on the way out so a new one
1216 * can be selected next time, thus avoiding contended
1217 * mutexes.
1218 */
1219static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1220{
1221 PGMM pGMM = pMtxState->pGMM;
1222
1223 /*
1224 * Release the chunk mutex and reacquire the giant if requested.
1225 */
1226 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1227 AssertRC(rc);
1228 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1229 rc = gmmR0MutexAcquire(pGMM);
1230 else
1231 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1232
1233 /*
1234 * Drop the chunk mutex user reference and disassociate it from the chunk
1235 * when possible.
1236 */
1237 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1238 && pChunk
1239 && RT_SUCCESS(rc) )
1240 {
1241 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1242 pChunk->iChunkMtx = UINT8_MAX;
1243 else
1244 {
1245 rc = gmmR0MutexAcquire(pGMM);
1246 if (RT_SUCCESS(rc))
1247 {
1248 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1249 pChunk->iChunkMtx = UINT8_MAX;
1250 rc = gmmR0MutexRelease(pGMM);
1251 }
1252 }
1253 }
1254
1255 pMtxState->pGMM = NULL;
1256 return rc;
1257}
1258
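On release, gmmR0ChunkMutexRelease drops the user reference. When the count reaches zero and the chunk still exists, it resets the chunk's mutex index so the next acquisition can pick a fresh, probably uncontended, mutex. If the giant lock was dropped, it must first retake it and re-check the count, because another thread may have picked up the same mutex in the meantime. The following C++ sketch shows that logic. `Chunk`, `drop_chunk_mtx_ref` and `holding_giant` are hypothetical names, and the plain `--users` stands in for the original's ASMAtomicDecU32.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

constexpr uint8_t kNoMtx = UINT8_MAX;   // "no mutex associated"

struct Chunk { uint8_t mtx_idx; };

// `users` is the per-mutex user count and `giant` the global lock that guards
// chunk->mtx_idx. `holding_giant` says whether the caller still owns it.
// `chunk` is nullptr when the chunk has been freed meanwhile.
void drop_chunk_mtx_ref(uint32_t& users, std::mutex& giant, Chunk* chunk, bool holding_giant)
{
    if (--users != 0 || chunk == nullptr)
        return;                         // still shared, or nothing to reset
    if (holding_giant)
        chunk->mtx_idx = kNoMtx;        // safe: we own the giant lock
    else
    {
        // Retake the giant lock and re-check: another thread may have
        // started using this mutex after our decrement.
        std::lock_guard<std::mutex> guard(giant);
        if (users == 0)
            chunk->mtx_idx = kNoMtx;
    }
}
```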
1259
1260/**
1261 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1262 * chunk locked.
1263 *
1264 * This only works if gmmR0ChunkMutexAcquire was called with
1265 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1266 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1267 *
1268 * @returns VBox status code (assuming success is ok).
1269 * @param pMtxState Pointer to the chunk mutex state.
1270 */
1271static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1272{
1273 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1274 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1275 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1276 /** @todo GMM life cycle cleanup (we may race someone
1277 * destroying and cleaning up GMM)? */
1278 return gmmR0MutexRelease(pMtxState->pGMM);
1279}
1280
1281
1282/**
1283 * For experimenting with NUMA affinity and such.
1284 *
1285 * @returns The current NUMA Node ID.
1286 */
1287static uint16_t gmmR0GetCurrentNumaNodeId(void)
1288{
1289#if 1
1290 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1291#else
1292 return RTMpCpuId() / 16;
1293#endif
1294}
1295
1296
1297
1298/**
1299 * Cleans up when a VM is terminating.
1300 *
1301 * @param pGVM Pointer to the Global VM structure.
1302 */
1303GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1304{
1305 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1306
1307 PGMM pGMM;
1308 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1309
1310#ifdef VBOX_WITH_PAGE_SHARING
1311 /*
1312 * Clean up all registered shared modules first.
1313 */
1314 gmmR0SharedModuleCleanup(pGMM, pGVM);
1315#endif
1316
1317 gmmR0MutexAcquire(pGMM);
1318 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1319 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1320
1321 /*
1322 * The policy is 'INVALID' until the initial reservation
1323 * request has been serviced.
1324 */
1325 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1326 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1327 {
1328 /*
1329 * If it's the last VM around, we can skip walking all the chunks looking
1330 * for the pages owned by this VM and instead flush the whole shebang.
1331 *
1332 * This takes care of the eventuality that a VM has left shared page
1333 * references behind (shouldn't happen of course, but you never know).
1334 */
1335 Assert(pGMM->cRegisteredVMs);
1336 pGMM->cRegisteredVMs--;
1337
1338 /*
1339 * Walk the entire pool looking for pages that belong to this VM
1340 * and leftover mappings. (This'll only catch private pages,
1341 * shared pages will be 'left behind'.)
1342 */
1343 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1344 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1345
1346 unsigned iCountDown = 64;
1347 bool fRedoFromStart;
1348 PGMMCHUNK pChunk;
1349 do
1350 {
1351 fRedoFromStart = false;
1352 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1353 {
1354 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1355 if ( ( !pGMM->fBoundMemoryMode
1356 || pChunk->hGVM == pGVM->hSelf)
1357 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1358 {
1359 /* We left the giant mutex, so reset the yield counters. */
1360 uLockNanoTS = RTTimeSystemNanoTS();
1361 iCountDown = 64;
1362 }
1363 else
1364 {
1365 /* Didn't leave it, so do normal yielding. */
1366 if (!iCountDown)
1367 gmmR0MutexYield(pGMM, &uLockNanoTS);
1368 else
1369 iCountDown--;
1370 }
1371 if (pGMM->cFreedChunks != cFreeChunksOld)
1372 {
1373 fRedoFromStart = true;
1374 break;
1375 }
1376 }
1377 } while (fRedoFromStart);
1378
1379 if (pGVM->gmm.s.Stats.cPrivatePages)
1380 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1381
1382 pGMM->cAllocatedPages -= cPrivatePages;
1383
1384 /*
1385 * Free empty chunks.
1386 */
1387 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1388 do
1389 {
1390 fRedoFromStart = false;
1391 iCountDown = 10240;
1392 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1393 while (pChunk)
1394 {
1395 PGMMCHUNK pNext = pChunk->pFreeNext;
1396 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1397 if ( !pGMM->fBoundMemoryMode
1398 || pChunk->hGVM == pGVM->hSelf)
1399 {
1400 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1401 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1402 {
1403 /* We've left the giant mutex, restart? (+1 for our unlink) */
1404 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1405 if (fRedoFromStart)
1406 break;
1407 uLockNanoTS = RTTimeSystemNanoTS();
1408 iCountDown = 10240;
1409 }
1410 }
1411
1412 /* Advance and maybe yield the lock. */
1413 pChunk = pNext;
1414 if (--iCountDown == 0)
1415 {
1416 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1417 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1418 && pPrivateSet->idGeneration != idGenerationOld;
1419 if (fRedoFromStart)
1420 break;
1421 iCountDown = 10240;
1422 }
1423 }
1424 } while (fRedoFromStart);
1425
1426 /*
1427 * Account for shared pages that weren't freed.
1428 */
1429 if (pGVM->gmm.s.Stats.cSharedPages)
1430 {
1431 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1432 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1433 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1434 }
1435
1436 /*
1437 * Clean up balloon statistics in case the VM process crashed.
1438 */
1439 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1440 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1441
1442 /*
1443 * Update the over-commitment management statistics.
1444 */
1445 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1446 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1447 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1448 switch (pGVM->gmm.s.Stats.enmPolicy)
1449 {
1450 case GMMOCPOLICY_NO_OC:
1451 break;
1452 default:
1453 /** @todo Update GMM->cOverCommittedPages */
1454 break;
1455 }
1456 }
1457
1458 /* zap the GVM data. */
1459 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1460 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1461 pGVM->gmm.s.Stats.fMayAllocate = false;
1462
1463 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1464 gmmR0MutexRelease(pGMM);
1465
1466 /*
1467 * Destroy the spinlock.
1468 */
1469 RTSPINLOCK hSpinlock = NIL_RTSPINLOCK;
1470 ASMAtomicXchgHandle(&pGVM->gmm.s.hChunkTlbSpinLock, NIL_RTSPINLOCK, &hSpinlock);
1471 RTSpinlockDestroy(hSpinlock);
1472
1473 LogFlow(("GMMR0CleanupVM: returns\n"));
1474}
1475
1476
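When GMMR0CleanupVM frees empty chunks, each gmmR0FreeChunk call may drop the giant lock. The loop therefore snapshots `idGeneration` first. If afterwards the generation moved by more than the +1 from its own unlink, another thread changed the set, the saved `pNext` may be stale, and the walk restarts from the head. Below is a C++ sketch of that restart rule using `std::list`. The `others` callback is a hypothetical hook that simulates what other threads might do while the lock is dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <list>

struct Chunk { int owner; };

struct FreeSet
{
    std::list<Chunk> chunks;
    uint64_t generation = 0;   // bumped on every link/unlink, like idGeneration
};

using Interleave = std::function<void(FreeSet&)>;

// Frees all chunks owned by `owner`. Returns the number of restarts.
unsigned free_owned_chunks(FreeSet& set, int owner, const Interleave& others)
{
    unsigned restarts = 0;
    bool redo;
    do
    {
        redo = false;
        for (auto it = set.chunks.begin(); it != set.chunks.end();)
        {
            if (it->owner != owner) { ++it; continue; }
            uint64_t const gen_old = set.generation;
            it = set.chunks.erase(it);   // our own unlink (+1)
            set.generation++;
            others(set);                 // "lock dropped": anything may happen
            if (set.generation != gen_old + 1)
            {
                redo = true;             // set changed behind our back; `it` may be stale
                restarts++;
                break;
            }
        }
    } while (redo);
    return restarts;
}
```

A restart costs a rescan but never dereferences a stale iterator. The original also resets its yield countdown and lock timestamp whenever it knowingly left the lock.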
1477/**
1478 * Scan one chunk for private pages belonging to the specified VM.
1479 *
1480 * @note This function may drop the giant mutex!
1481 *
1482 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1483 * we didn't.
1484 * @param pGMM Pointer to the GMM instance.
1485 * @param pGVM The global VM handle.
1486 * @param pChunk The chunk to scan.
1487 */
1488static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1489{
1490 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1491
1492 /*
1493 * Look for pages belonging to the VM.
1494 * (Perform some internal checks while we're scanning.)
1495 */
1496#ifndef VBOX_STRICT
1497 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1498#endif
1499 {
1500 unsigned cPrivate = 0;
1501 unsigned cShared = 0;
1502 unsigned cFree = 0;
1503
1504 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1505
1506 uint16_t hGVM = pGVM->hSelf;
1507 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1508 while (iPage-- > 0)
1509 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1510 {
1511 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1512 {
1513 /*
1514 * Free the page.
1515 *
1516 * The reason for not using gmmR0FreePrivatePage here is that we
1517 * must *not* cause the chunk to be freed from under us; the caller is
1518 * walking the chunk list (RTListForEachReverse).
1519 */
1520 pChunk->aPages[iPage].u = 0;
1521 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1522 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1523 pChunk->iFreeHead = iPage;
1524 pChunk->cPrivate--;
1525 pChunk->cFree++;
1526 pGVM->gmm.s.Stats.cPrivatePages--;
1527 cFree++;
1528 }
1529 else
1530 cPrivate++;
1531 }
1532 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1533 cFree++;
1534 else
1535 cShared++;
1536
1537 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1538
1539 /*
1540 * Did it add up?
1541 */
1542 if (RT_UNLIKELY( pChunk->cFree != cFree
1543 || pChunk->cPrivate != cPrivate
1544 || pChunk->cShared != cShared))
1545 {
1546 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1547 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1548 pChunk->cFree = cFree;
1549 pChunk->cPrivate = cPrivate;
1550 pChunk->cShared = cShared;
1551 }
1552 }
1553
1554 /*
1555 * If not in bound memory mode, we should reset the hGVM field
1556 * if it has our handle in it.
1557 */
1558 if (pChunk->hGVM == pGVM->hSelf)
1559 {
1560 if (!g_pGMM->fBoundMemoryMode)
1561 pChunk->hGVM = NIL_GVM_HANDLE;
1562 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1563 {
1564 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - all pages should be free in bound mode!\n",
1565 pChunk, pChunk->Core.Key, pChunk->cFree);
1566 AssertMsgFailed(("%p/%#x: cFree=%#x - all pages should be free in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1567
1568 gmmR0UnlinkChunk(pChunk);
1569 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1570 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1571 }
1572 }
1573
1574 /*
1575 * Look for a mapping belonging to the terminating VM.
1576 */
1577 GMMR0CHUNKMTXSTATE MtxState;
1578 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1579 unsigned cMappings = pChunk->cMappingsX;
1580 for (unsigned i = 0; i < cMappings; i++)
1581 if (pChunk->paMappingsX[i].pGVM == pGVM)
1582 {
1583 gmmR0ChunkMutexDropGiant(&MtxState);
1584
1585 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1586
1587 cMappings--;
1588 if (i < cMappings)
1589 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1590 pChunk->paMappingsX[cMappings].pGVM = NULL;
1591 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1592 Assert(pChunk->cMappingsX - 1U == cMappings);
1593 pChunk->cMappingsX = cMappings;
1594
1595 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1596 if (RT_FAILURE(rc))
1597 {
1598 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTR0MemObjFree(%RKv,false) -> %d\n",
1599 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1600 AssertRC(rc);
1601 }
1602
1603 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1604 return true;
1605 }
1606
1607 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1608 return false;
1609}
1610
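gmmR0CleanupVMScanChunk does two things. It releases each private page owned by the dying VM by wiping the page entry and pushing its index onto the head of the chunk's intrusive free list (`iFreeHead`). It also removes that VM's mapping from `paMappingsX` by moving the last element into the vacated slot. The following C++ sketch shows both techniques. The page layout, the `kNilIndex` end-of-list marker and all names are simplified, hypothetical stand-ins for GMMPAGE and its bitfields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PageState : uint8_t { Free, Private, Shared };

struct Page
{
    PageState state;
    uint16_t  owner;      // hGVM for private pages
    uint16_t  next_free;  // free-list link, valid when state == Free
};

constexpr uint16_t kNilIndex = UINT16_MAX;   // hypothetical end-of-list marker

struct Chunk
{
    std::vector<Page> pages;
    uint16_t free_head = kNilIndex;
    unsigned c_free = 0, c_private = 0;
};

// Pushes every private page owned by `vm` onto the chunk's free list
// without freeing the chunk itself. Returns the number of pages released.
unsigned release_vm_pages(Chunk& chunk, uint16_t vm)
{
    unsigned released = 0;
    for (std::size_t i = chunk.pages.size(); i-- > 0;)
    {
        Page& p = chunk.pages[i];
        if (p.state != PageState::Private || p.owner != vm)
            continue;
        p = Page{};                              // wipe, like aPages[i].u = 0
        p.next_free = chunk.free_head;           // link in at the head
        chunk.free_head = static_cast<uint16_t>(i);
        chunk.c_private--;
        chunk.c_free++;
        released++;
    }
    return released;
}

// O(1) removal from an unordered array: move the last element into slot i.
template <typename T>
void remove_unordered(std::vector<T>& v, std::size_t i)
{
    if (i + 1 < v.size())
        v[i] = v.back();
    v.pop_back();
}
```

Swap-with-last keeps removal constant time, which is fine because the order of mappings does not matter.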
1611
1612/**
1613 * The initial resource reservations.
1614 *
1615 * This will make memory reservations according to policy and priority. If there aren't
1616 * sufficient resources available to sustain the VM, this function will fail and all
1617 * future allocation requests will fail as well.
1618 *
1619 * These are just the initial reservations made very early during the VM creation
1620 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1621 * ring-3 init has completed.
1622 *
1623 * @returns VBox status code.
1624 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1625 * @retval VERR_GMM_
1626 *
1627 * @param pGVM The global (ring-0) VM structure.
1628 * @param idCpu The VCPU id - must be zero.
1629 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1630 * This does not include MMIO2 and similar.
1631 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1632 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1633 * hyper heap, MMIO2 and similar.
1634 * @param enmPolicy The OC policy to use on this VM.
1635 * @param enmPriority The priority in an out-of-memory situation.
1636 *
1637 * @thread The creator thread / EMT(0).
1638 */
1639GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1640 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1641{
1642 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1643 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1644
1645 /*
1646 * Validate, get basics and take the semaphore.
1647 */
1648 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1649 PGMM pGMM;
1650 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1651 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1652 if (RT_FAILURE(rc))
1653 return rc;
1654
1655 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1656 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1657 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1658 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1659 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1660
1661 gmmR0MutexAcquire(pGMM);
1662 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1663 {
1664 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1665 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1666 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1667 {
1668 /*
1669 * Check if we can accommodate this.
1670 */
1671 /* ... later ... */
1672 if (RT_SUCCESS(rc))
1673 {
1674 /*
1675 * Update the records.
1676 */
1677 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1678 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1679 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1680 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1681 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1682 pGVM->gmm.s.Stats.fMayAllocate = true;
1683
1684 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1685 pGMM->cRegisteredVMs++;
1686 }
1687 }
1688 else
1689 rc = VERR_WRONG_ORDER;
1690 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1691 }
1692 else
1693 rc = VERR_GMM_IS_NOT_SANE;
1694 gmmR0MutexRelease(pGMM);
1695 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1696 return rc;
1697}
1698
1699
1700/**
1701 * VMMR0 request wrapper for GMMR0InitialReservation.
1702 *
1703 * @returns see GMMR0InitialReservation.
1704 * @param pGVM The global (ring-0) VM structure.
1705 * @param idCpu The VCPU id.
1706 * @param pReq Pointer to the request packet.
1707 */
1708GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1709{
1710 /*
1711 * Validate input and pass it on.
1712 */
1713 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1714 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1715 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1716
1717 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1718 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1719}
1720
1721
1722/**
1723 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1724 *
1725 * @returns VBox status code.
1726 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1727 *
1728 * @param pGVM The global (ring-0) VM structure.
1729 * @param idCpu The VCPU id.
1730 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1731 * This does not include MMIO2 and similar.
1732 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1733 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1734 * hyper heap, MMIO2 and similar.
1735 *
1736 * @thread EMT(idCpu)
1737 */
1738GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1739 uint32_t cShadowPages, uint32_t cFixedPages)
1740{
1741 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1742 pGVM, cBasePages, cShadowPages, cFixedPages));
1743
1744 /*
1745 * Validate, get basics and take the semaphore.
1746 */
1747 PGMM pGMM;
1748 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1749 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1750 if (RT_FAILURE(rc))
1751 return rc;
1752
1753 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1754 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1755 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1756
1757 gmmR0MutexAcquire(pGMM);
1758 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1759 {
1760 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1761 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1762 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1763 {
1764 /*
1765 * Check if we can accommodate this.
1766 */
1767 /* ... later ... */
1768 if (RT_SUCCESS(rc))
1769 {
1770 /*
1771 * Update the records.
1772 */
1773 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1774 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1775 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1776 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1777
1778 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1779 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1780 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1781 }
1782 }
1783 else
1784 rc = VERR_WRONG_ORDER;
1785 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1786 }
1787 else
1788 rc = VERR_GMM_IS_NOT_SANE;
1789 gmmR0MutexRelease(pGMM);
1790 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1791 return rc;
1792}
1793
1794
1795/**
1796 * VMMR0 request wrapper for GMMR0UpdateReservation.
1797 *
1798 * @returns see GMMR0UpdateReservation.
1799 * @param pGVM The global (ring-0) VM structure.
1800 * @param idCpu The VCPU id.
1801 * @param pReq Pointer to the request packet.
1802 */
1803GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1804{
1805 /*
1806 * Validate input and pass it on.
1807 */
1808 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1809 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1810
1811 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1812}
1813
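The reservation bookkeeping in GMMR0InitialReservation and GMMR0UpdateReservation keeps `cReservedPages` equal to the sum of every VM's base + fixed + shadow reservation. The initial call is only accepted while all three are zero. An update backs out the old figures before adding the new ones and is only accepted once all three are non-zero. The original leaves the capacity check as "... later ...", so this C++ sketch (hypothetical names throughout) does not check capacity either.

```cpp
#include <cassert>
#include <cstdint>

struct Reservation { uint64_t base; uint32_t fixed, shadow; };

inline uint64_t total(const Reservation& r) { return r.base + r.fixed + r.shadow; }

// Initial reservation: rejected (VERR_WRONG_ORDER in the original) if any
// reservation already exists.
bool initial_reserve(uint64_t& global, Reservation& vm, const Reservation& req)
{
    if (vm.base || vm.fixed || vm.shadow)
        return false;
    vm = req;
    global += total(req);
    return true;
}

// Update: rejected until the initial reservation has been made; otherwise
// remove the VM's old contribution from the global total and add the new one.
bool update_reserve(uint64_t& global, Reservation& vm, const Reservation& req)
{
    if (!vm.base || !vm.fixed || !vm.shadow)
        return false;
    global -= total(vm);
    global += total(req);
    vm = req;
    return true;
}
```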
1814#ifdef GMMR0_WITH_SANITY_CHECK
1815
1816/**
1817 * Performs sanity checks on a free set.
1818 *
1819 * @returns Error count.
1820 *
1821 * @param pGMM Pointer to the GMM instance.
1822 * @param pSet Pointer to the set.
1823 * @param pszSetName The set name.
1824 * @param pszFunction The function from which it was called.
1825 * @param uLineNo The line number.
1826 */
1827static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1828 const char *pszFunction, unsigned uLineNo)
1829{
1830 uint32_t cErrors = 0;
1831
1832 /*
1833 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1834 */
1835 uint32_t cPages = 0;
1836 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1837 {
1838 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1839 {
1840 /** @todo check that the chunk is hashed into the right set. */
1841 cPages += pCur->cFree;
1842 }
1843 }
1844 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1845 {
1846 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1847 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1848 cErrors++;
1849 }
1850
1851 return cErrors;
1852}
1853
1854
1855/**
1856 * Performs some sanity checks on the GMM while owning the lock.
1857 *
1858 * @returns Error count.
1859 *
1860 * @param pGMM Pointer to the GMM instance.
1861 * @param pszFunction The function from which it is called.
1862 * @param uLineNo The line number.
1863 */
1864static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1865{
1866 uint32_t cErrors = 0;
1867
1868 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1869 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1870 /** @todo add more sanity checks. */
1871
1872 return cErrors;
1873}
1874
1875#endif /* GMMR0_WITH_SANITY_CHECK */
1876
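gmmR0SanityCheckSet walks every free list of a set, sums the per-chunk `cFree` counts and compares the result with the cached `cFreePages` total. Below is a C++ sketch of the same cross-check. The number of lists and all names are hypothetical; the original iterates `RT_ELEMENTS(pSet->apLists)`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

struct Chunk { uint32_t c_free; Chunk* free_next; };

constexpr unsigned kNumLists = 4;   // hypothetical list count

struct FreeSet
{
    Chunk*   lists[kNumLists];
    uint32_t c_free_pages;          // cached total, like cFreePages
};

// Returns the number of inconsistencies found (0 or 1 here).
uint32_t sanity_check_set(const FreeSet& set, const char* name)
{
    uint32_t pages = 0;
    for (unsigned i = 0; i < kNumLists; i++)
        for (const Chunk* cur = set.lists[i]; cur; cur = cur->free_next)
            pages += cur->c_free;
    if (pages != set.c_free_pages)
    {
        std::printf("insanity: found %#x pages in the %s set, expected %#x\n",
                    (unsigned)pages, name, (unsigned)set.c_free_pages);
        return 1;
    }
    return 0;
}
```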
1877/**
1878 * Looks up a chunk in the tree and fills in the TLB entry for it.
1879 *
1880 * This is not expected to fail and will bitch if it does.
1881 *
1882 * @returns Pointer to the allocation chunk, NULL if not found.
1883 * @param pGMM Pointer to the GMM instance.
1884 * @param idChunk The ID of the chunk to find.
1885 * @param pTlbe Pointer to the TLB entry.
1886 *
1887 * @note Caller owns spinlock.
1888 */
1889static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1890{
1891 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1892 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1893 pTlbe->idChunk = idChunk;
1894 pTlbe->pChunk = pChunk;
1895 return pChunk;
1896}
1897
1898
1899/**
1900 * Finds an allocation chunk; the caller must own the tree spinlock.
1901 *
1902 * This is not expected to fail and will bitch if it does.
1903 *
1904 * @returns Pointer to the allocation chunk, NULL if not found.
1905 * @param pGMM Pointer to the GMM instance.
1906 * @param idChunk The ID of the chunk to find.
1907 */
1908DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1909{
1910 /*
1911 * Do a TLB lookup, branch if not in the TLB.
1912 */
1913 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1914 PGMMCHUNK pChunk = pTlbe->pChunk;
1915 if ( pChunk == NULL
1916 || pTlbe->idChunk != idChunk)
1917 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1918 return pChunk;
1919}
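gmmR0GetChunkLocked puts a small direct-mapped cache (the chunk TLB) in front of the AVL-tree lookup: an entry is a hit only if it holds a pointer and its stored ID matches, and a miss refills the entry from the slow path. The sketch below models that, assuming the TLB index is the chunk ID masked to a power-of-two table size (the real GMM_CHUNKTLB_IDX is defined elsewhere in the file) and using a plain array as a stand-in for the tree. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TLB_ENTRIES 32u                              /* hypothetical, power of two */
#define DEMO_TLB_IDX(id) ((id) & (DEMO_TLB_ENTRIES - 1)) /* assumed shape of GMM_CHUNKTLB_IDX */
#define DEMO_MAX_CHUNKS  256u

typedef struct { uint32_t idChunk; int payload; } demo_chunk;
typedef struct { uint32_t idChunk; demo_chunk *pChunk; } demo_tlbe;

typedef struct {
    demo_tlbe   aEntries[DEMO_TLB_ENTRIES];
    demo_chunk *apById[DEMO_MAX_CHUNKS]; /* stand-in for the AVL tree */
    unsigned    cSlowLookups;            /* instrumentation for the example only */
} demo_gmm;

/* Slow path: look the chunk up in the "tree" and refill the TLB entry. */
static demo_chunk *demo_get_chunk_slow(demo_gmm *pGmm, uint32_t idChunk, demo_tlbe *pTlbe)
{
    pGmm->cSlowLookups++;
    demo_chunk *pChunk = idChunk < DEMO_MAX_CHUNKS ? pGmm->apById[idChunk] : NULL;
    if (!pChunk)
        return NULL; /* the GMM asserts here; the entry is left untouched */
    pTlbe->idChunk = idChunk;
    pTlbe->pChunk  = pChunk;
    return pChunk;
}

/* Fast path: a hit needs a non-NULL pointer and a matching chunk ID. */
static demo_chunk *demo_get_chunk(demo_gmm *pGmm, uint32_t idChunk)
{
    demo_tlbe  *pTlbe  = &pGmm->aEntries[DEMO_TLB_IDX(idChunk)];
    demo_chunk *pChunk = pTlbe->pChunk;
    if (pChunk == NULL || pTlbe->idChunk != idChunk)
        pChunk = demo_get_chunk_slow(pGmm, idChunk, pTlbe);
    return pChunk;
}
```

IDs 5 and 37 share a slot under a 32-entry table, so alternating between them evicts each other; that is the usual trade-off of a direct-mapped cache.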
1920
1921
1922/**
1923 * Finds an allocation chunk.
1924 *
1925 * This is not expected to fail and will bitch if it does.
1926 *
1927 * @returns Pointer to the allocation chunk, NULL if not found.
1928 * @param pGMM Pointer to the GMM instance.
1929 * @param idChunk The ID of the chunk to find.
1930 */
1931DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1932{
1933 RTSpinlockAcquire(pGMM->hSpinLockTree);
1934 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1935 RTSpinlockRelease(pGMM->hSpinLockTree);
1936 return pChunk;
1937}
1938
1939
1940/**
1941 * Finds a page.
1942 *
1943 * This is not expected to fail and will bitch if it does.
1944 *
1945 * @returns Pointer to the page, NULL if not found.
1946 * @param pGMM Pointer to the GMM instance.
1947 * @param idPage The ID of the page to find.
1948 */
1949DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1950{
1951 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1952 if (RT_LIKELY(pChunk))
1953 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1954 return NULL;
1955}
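The page-ID split used by gmmR0GetPage follows the scheme from the overview comment at the top of the file: the chunk ID sits in the high bits and the page index in the low GMM_CHUNKID_SHIFT bits. A sketch assuming 2 MiB chunks of 4 KiB pages (512 pages per chunk, hence a shift of 9 and an index mask of 0x1ff); the `demo_*` names and constants are illustrative, not the GMM definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 2 MiB chunks of 4 KiB pages, i.e. 512 pages per chunk. */
#define DEMO_CHUNK_SIZE      (UINT32_C(2) * 1024 * 1024)
#define DEMO_PAGE_SIZE       UINT32_C(4096)
#define DEMO_CHUNKID_SHIFT   9u /* log2(DEMO_CHUNK_SIZE / DEMO_PAGE_SIZE) */
#define DEMO_PAGEID_IDX_MASK ((UINT32_C(1) << DEMO_CHUNKID_SHIFT) - 1)

_Static_assert(DEMO_CHUNK_SIZE / DEMO_PAGE_SIZE == (UINT32_C(1) << DEMO_CHUNKID_SHIFT),
               "shift must match the chunk geometry");

static uint32_t demo_make_page_id(uint32_t idChunk, uint32_t iPage)
{
    assert(iPage <= DEMO_PAGEID_IDX_MASK);
    return (idChunk << DEMO_CHUNKID_SHIFT) | iPage;
}

/* The two halves gmmR0GetPage extracts. */
static uint32_t demo_page_id_to_chunk(uint32_t idPage) { return idPage >> DEMO_CHUNKID_SHIFT; }
static uint32_t demo_page_id_to_index(uint32_t idPage) { return idPage & DEMO_PAGEID_IDX_MASK; }
```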
1956
1957
1958#if 0 /* unused */
1959/**
1960 * Gets the host physical address for a page given by its ID.
1961 *
1962 * @returns The host physical address or NIL_RTHCPHYS.
1963 * @param pGMM Pointer to the GMM instance.
1964 * @param idPage The ID of the page to find.
1965 */
1966DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1967{
1968 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1969 if (RT_LIKELY(pChunk))
1970 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1971 return NIL_RTHCPHYS;
1972}
1973#endif /* unused */
1974
1975
1976/**
1977 * Selects the appropriate free list given the number of free pages.
1978 *
1979 * @returns Free list index.
1980 * @param cFree The number of free pages in the chunk.
1981 */
1982DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1983{
1984 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1985 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1986 ("%d (%u)\n", iList, cFree));
1987 return iList;
1988}
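gmmR0SelectFreeSetList buckets chunks by free-page count with a single shift, so each list covers a fixed-width band of cFree values. The sketch assumes 512 pages per chunk, a shift of 4, and one extra list so that a completely free chunk (cFree == 512) lands alone in the last list, which is how it models GMM_CHUNK_FREE_SET_UNUSED_LIST. The real constants are defined elsewhere in the file and may differ; the `demo_*` names are hypothetical.

```c
#include <assert.h>

#define DEMO_NUM_PAGES   512u /* assumed GMM_CHUNK_NUM_PAGES */
#define DEMO_SET_SHIFT   4u   /* hypothetical GMM_CHUNK_FREE_SET_SHIFT */
#define DEMO_SET_LISTS   ((DEMO_NUM_PAGES >> DEMO_SET_SHIFT) + 1) /* +1: own list for fully free chunks */
#define DEMO_UNUSED_LIST (DEMO_SET_LISTS - 1)

/* Each list i holds chunks with cFree in [i * 16, i * 16 + 15]; the last holds cFree == 512. */
static unsigned demo_select_list(unsigned cFree)
{
    unsigned iList = cFree >> DEMO_SET_SHIFT;
    assert(iList < DEMO_SET_LISTS);
    return iList;
}
```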
1989
1990
1991/**
1992 * Unlinks the chunk from the free list it's currently on (if any).
1993 *
1994 * @param pChunk The allocation chunk.
1995 */
1996DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1997{
1998 PGMMCHUNKFREESET pSet = pChunk->pSet;
1999 if (RT_LIKELY(pSet))
2000 {
2001 pSet->cFreePages -= pChunk->cFree;
2002 pSet->idGeneration++;
2003
2004 PGMMCHUNK pPrev = pChunk->pFreePrev;
2005 PGMMCHUNK pNext = pChunk->pFreeNext;
2006 if (pPrev)
2007 pPrev->pFreeNext = pNext;
2008 else
2009 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
2010 if (pNext)
2011 pNext->pFreePrev = pPrev;
2012
2013 pChunk->pSet = NULL;
2014 pChunk->pFreeNext = NULL;
2015 pChunk->pFreePrev = NULL;
2016 }
2017 else
2018 {
2019 Assert(!pChunk->pFreeNext);
2020 Assert(!pChunk->pFreePrev);
2021 Assert(!pChunk->cFree);
2022 }
2023}
2024
2025
2026/**
2027 * Links the chunk onto the appropriate free list in the specified free set.
2028 *
2029 * If the chunk has no free pages, it is not linked into any list.
2030 *
2031 * @param pChunk The allocation chunk.
2032 * @param pSet The free set.
2033 */
2034DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
2035{
2036 Assert(!pChunk->pSet);
2037 Assert(!pChunk->pFreeNext);
2038 Assert(!pChunk->pFreePrev);
2039
2040 if (pChunk->cFree > 0)
2041 {
2042 pChunk->pSet = pSet;
2043 pChunk->pFreePrev = NULL;
2044 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
2045 pChunk->pFreeNext = pSet->apLists[iList];
2046 if (pChunk->pFreeNext)
2047 pChunk->pFreeNext->pFreePrev = pChunk;
2048 pSet->apLists[iList] = pChunk;
2049
2050 pSet->cFreePages += pChunk->cFree;
2051 pSet->idGeneration++;
2052 }
2053}
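gmmR0UnlinkChunk finds the list head slot from the chunk's current cFree, so cFree must not change while the chunk is linked. That is why gmmR0AllocatePagesFromChunk unlinks the chunk, takes pages, and relinks it into the bucket matching its new count (or leaves it off all lists when it is full). A minimal sketch of that protocol with simplified types; the `demo_*` names and the list geometry (shift 4, 33 lists) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SET_SHIFT 4u
#define DEMO_SET_LISTS 33u /* (512 >> 4) + 1, assumed */

typedef struct demo_free_set demo_free_set;

typedef struct demo_chunk {
    unsigned cFree;
    demo_free_set *pSet;                    /* set the chunk is linked into, or NULL */
    struct demo_chunk *pFreeNext, *pFreePrev;
} demo_chunk;

struct demo_free_set {
    demo_chunk *apLists[DEMO_SET_LISTS];
    unsigned cFreePages;
    unsigned idGeneration;                  /* bumped on every change */
};

static unsigned demo_list(unsigned cFree)
{
    assert((cFree >> DEMO_SET_SHIFT) < DEMO_SET_LISTS);
    return cFree >> DEMO_SET_SHIFT;
}

static void demo_unlink(demo_chunk *pChunk)
{
    demo_free_set *pSet = pChunk->pSet;
    if (!pSet)
        return;
    pSet->cFreePages -= pChunk->cFree;
    pSet->idGeneration++;
    if (pChunk->pFreePrev)
        pChunk->pFreePrev->pFreeNext = pChunk->pFreeNext;
    else /* list head: the slot is derived from cFree, which must be unchanged */
        pSet->apLists[demo_list(pChunk->cFree)] = pChunk->pFreeNext;
    if (pChunk->pFreeNext)
        pChunk->pFreeNext->pFreePrev = pChunk->pFreePrev;
    pChunk->pSet = NULL;
    pChunk->pFreeNext = pChunk->pFreePrev = NULL;
}

static void demo_link(demo_chunk *pChunk, demo_free_set *pSet)
{
    assert(!pChunk->pSet && !pChunk->pFreeNext && !pChunk->pFreePrev);
    if (pChunk->cFree == 0)
        return; /* full chunks stay off every list */
    unsigned iList = demo_list(pChunk->cFree);
    pChunk->pSet = pSet;
    pChunk->pFreeNext = pSet->apLists[iList];
    if (pChunk->pFreeNext)
        pChunk->pFreeNext->pFreePrev = pChunk;
    pSet->apLists[iList] = pChunk;
    pSet->cFreePages += pChunk->cFree;
    pSet->idGeneration++;
}

/* Unlink, change cFree, relink: the shape of gmmR0AllocatePagesFromChunk. */
static void demo_take_pages(demo_chunk *pChunk, unsigned cPages)
{
    demo_free_set *pSet = pChunk->pSet;
    assert(pSet && cPages <= pChunk->cFree);
    demo_unlink(pChunk);
    pChunk->cFree -= cPages;
    demo_link(pChunk, pSet);
}
```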
2054
2055
2056/**
2057 * Selects the free set for the chunk and links it onto the appropriate list there.
2058 *
2059 * If the chunk has no free pages, it is not linked into any list.
2060 *
2061 * @param pGMM Pointer to the GMM instance.
2062 * @param pGVM Pointer to the kernel-only VM instance data.
2063 * @param pChunk The allocation chunk.
2064 */
2065DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
2066{
2067 PGMMCHUNKFREESET pSet;
2068 if (pGMM->fBoundMemoryMode)
2069 pSet = &pGVM->gmm.s.Private;
2070 else if (pChunk->cShared)
2071 pSet = &pGMM->Shared;
2072 else
2073 pSet = &pGMM->PrivateX;
2074 gmmR0LinkChunk(pChunk, pSet);
2075}
2076
2077
2078/**
2079 * Frees a Chunk ID.
2080 *
2081 * @param pGMM Pointer to the GMM instance.
2082 * @param idChunk The Chunk ID to free.
2083 */
2084static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
2085{
2086 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
2087 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
2088 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2089}
2090
2091
2092/**
2093 * Allocates a new Chunk ID.
2094 *
2095 * @returns The Chunk ID.
2096 * @param pGMM Pointer to the GMM instance.
2097 */
2098static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2099{
2100 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2101 AssertCompile(NIL_GMM_CHUNKID == 0);
2102
2103 /*
2104 * Try the next sequential one.
2105 */
2106 int32_t idChunk = ++pGMM->idChunkPrev;
2107#if 0 /** @todo enable this code */
2108 if ( idChunk <= GMM_CHUNKID_LAST
2109 && idChunk > NIL_GMM_CHUNKID
2110 && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2111 return idChunk;
2112#endif
2113
2114 /*
2115 * Scan sequentially from the last one.
2116 */
2117 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2118 && idChunk > NIL_GMM_CHUNKID)
2119 {
2120 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2121 if (idChunk > NIL_GMM_CHUNKID)
2122 {
2123 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2124 return pGMM->idChunkPrev = idChunk;
2125 }
2126 }
2127
2128 /*
2129 * Ok, scan from the start.
2130 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2131 */
2132 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2133 AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2134 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2135
2136 return pGMM->idChunkPrev = idChunk;
2137}
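gmmR0AllocateChunkId is a next-fit bitmap allocator: it first looks for a clear bit after the previously returned ID and only rescans from the start when that fails, with ID 0 (NIL_GMM_CHUNKID) never handed out. Next-fit delays reuse of recently freed IDs. A simplified sketch with a 128-ID bitmap and plain loops in place of ASMBitNextClear/ASMBitFirstClear; the size, the `demo_*` names and pre-setting bit 0 at init are assumptions. It also lets the next-fit pass reach the very last ID, which the original leaves to the rescan.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ID_LAST 127u /* hypothetical; (last + 1) must be a multiple of 32 */
#define DEMO_NIL_ID  0u
#define DEMO_WORDS   ((DEMO_ID_LAST + 1) / 32)

typedef struct {
    uint32_t bm[DEMO_WORDS]; /* bit set = ID in use */
    uint32_t idPrev;         /* last ID handed out */
} demo_id_alloc;

static int  demo_bit_test(const uint32_t *bm, uint32_t i) { return (bm[i / 32] >> (i % 32)) & 1u; }
static void demo_bit_set(uint32_t *bm, uint32_t i)        { bm[i / 32] |= UINT32_C(1) << (i % 32); }
static void demo_bit_clear(uint32_t *bm, uint32_t i)      { bm[i / 32] &= ~(UINT32_C(1) << (i % 32)); }

static void demo_id_init(demo_id_alloc *pA)
{
    memset(pA, 0, sizeof(*pA));
    demo_bit_set(pA->bm, DEMO_NIL_ID); /* the NIL ID is never handed out */
}

/* First clear bit in [iFirst, DEMO_ID_LAST], or DEMO_NIL_ID if there is none. */
static uint32_t demo_first_clear_from(const uint32_t *bm, uint32_t iFirst)
{
    for (uint32_t i = iFirst; i <= DEMO_ID_LAST; i++)
        if (!demo_bit_test(bm, i))
            return i;
    return DEMO_NIL_ID;
}

static uint32_t demo_id_alloc_next(demo_id_alloc *pA)
{
    /* Next-fit: continue after the previously allocated ID. */
    uint32_t id = pA->idPrev + 1;
    id = id <= DEMO_ID_LAST ? demo_first_clear_from(pA->bm, id) : DEMO_NIL_ID;
    /* Wrap around: first-fit from the start. */
    if (id == DEMO_NIL_ID)
        id = demo_first_clear_from(pA->bm, 1);
    if (id != DEMO_NIL_ID)
    {
        demo_bit_set(pA->bm, id);
        pA->idPrev = id;
    }
    return id;
}

static void demo_id_free(demo_id_alloc *pA, uint32_t id)
{
    assert(id != DEMO_NIL_ID && demo_bit_test(pA->bm, id));
    demo_bit_clear(pA->bm, id);
}
```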
2138
2139
2140/**
2141 * Allocates one private page.
2142 *
2143 * Worker for gmmR0AllocatePages.
2144 *
2145 * @param pChunk The chunk to allocate it from.
2146 * @param hGVM The GVM handle of the VM requesting memory.
2147 * @param pPageDesc The page descriptor.
2148 */
2149static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2150{
2151 /* update the chunk stats. */
2152 if (pChunk->hGVM == NIL_GVM_HANDLE)
2153 pChunk->hGVM = hGVM;
2154 Assert(pChunk->cFree);
2155 pChunk->cFree--;
2156 pChunk->cPrivate++;
2157
2158 /* unlink the first free page. */
2159 const uint32_t iPage = pChunk->iFreeHead;
2160 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2161 PGMMPAGE pPage = &pChunk->aPages[iPage];
2162 Assert(GMM_PAGE_IS_FREE(pPage));
2163 pChunk->iFreeHead = pPage->Free.iNext;
2164 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2165 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2166 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2167
2168 /* make the page private. */
2169 pPage->u = 0;
2170 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2171 pPage->Private.hGVM = hGVM;
2172 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2173 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2174 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2175 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2176 else
2177 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2178
2179 /* update the page descriptor. */
2180 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2181 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2182 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2183 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2184}
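Free pages inside a chunk form an intrusive singly linked list threaded through the page array: iFreeHead names the first free page and each free entry's iNext the following one, ending in UINT16_MAX as set up by gmmR0RegisterChunk. gmmR0AllocatePage simply pops the head. A sketch of that chain with a hypothetical 8-page chunk; `demo_free_page`, which pushes a page back on the head, is an assumption about the freeing side, which is not part of this section.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_PAGES    8u /* hypothetical; GMM chunks hold GMM_CHUNK_NUM_PAGES */
#define DEMO_END_OF_CHAIN UINT16_MAX

typedef struct { uint16_t iNext; uint8_t fFree; } demo_page;

typedef struct {
    demo_page aPages[DEMO_NUM_PAGES];
    uint16_t  iFreeHead; /* first free page, or DEMO_END_OF_CHAIN */
    uint16_t  cFree;
} demo_chunk;

/* Chain every page to its successor, as gmmR0RegisterChunk does. */
static void demo_chunk_init(demo_chunk *pChunk)
{
    for (uint16_t i = 0; i < DEMO_NUM_PAGES; i++)
    {
        pChunk->aPages[i].fFree = 1;
        pChunk->aPages[i].iNext = i + 1 < DEMO_NUM_PAGES ? (uint16_t)(i + 1) : DEMO_END_OF_CHAIN;
    }
    pChunk->iFreeHead = 0;
    pChunk->cFree = DEMO_NUM_PAGES;
}

/* Pop the head of the free chain; returns the page index, or -1 if the chunk is full. */
static int demo_alloc_page(demo_chunk *pChunk)
{
    if (pChunk->cFree == 0)
        return -1;
    uint16_t iPage = pChunk->iFreeHead;
    assert(iPage < DEMO_NUM_PAGES && pChunk->aPages[iPage].fFree);
    pChunk->iFreeHead = pChunk->aPages[iPage].iNext;
    pChunk->aPages[iPage].fFree = 0;
    pChunk->cFree--;
    return iPage;
}

/* Assumed counterpart: push a page back onto the head of the chain. */
static void demo_free_page(demo_chunk *pChunk, uint16_t iPage)
{
    assert(iPage < DEMO_NUM_PAGES && !pChunk->aPages[iPage].fFree);
    pChunk->aPages[iPage].fFree = 1;
    pChunk->aPages[iPage].iNext = pChunk->iFreeHead;
    pChunk->iFreeHead = iPage;
    pChunk->cFree++;
}
```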
2185
2186
2187/**
2188 * Picks the free pages from a chunk.
2189 *
2190 * @returns The new page descriptor table index.
2191 * @param pChunk The chunk.
2192 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2193 * affinity.
2194 * @param iPage The current page descriptor table index.
2195 * @param cPages The total number of pages to allocate.
2196 * @param paPages The page descriptor table (input + output).
2197 */
2198static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2199 PGMMPAGEDESC paPages)
2200{
2201 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2202 gmmR0UnlinkChunk(pChunk);
2203
2204 for (; pChunk->cFree && iPage < cPages; iPage++)
2205 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2206
2207 gmmR0LinkChunk(pChunk, pSet);
2208 return iPage;
2209}
2210
2211
2212/**
2213 * Registers a new chunk of memory.
2214 *
2215 * This is called by both gmmR0AllocateOneChunk and GMMR0SeedChunk.
2216 *
2217 * @returns VBox status code. On success, the giant GMM lock will be held and the
2218 * caller must release it (ugly).
2219 * @param pGMM Pointer to the GMM instance.
2220 * @param pSet Pointer to the set.
2221 * @param hMemObj The memory object for the chunk.
2222 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2223 * affinity.
2224 * @param pSession The session the chunk is allocated for, used to set the owner UID; NULL for no owner.
2225 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2226 * @param ppChunk Chunk address (out). Optional.
2227 *
2228 * @remarks The caller must not own the giant GMM mutex.
2229 * The giant GMM mutex will be acquired and returned acquired in
2230 * the success path. On failure, no locks will be held.
2231 */
2232static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, PSUPDRVSESSION pSession,
2233 uint16_t fChunkFlags, PGMMCHUNK *ppChunk)
2234{
2235 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2236 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2237#ifdef GMM_WITH_LEGACY_MODE
2238 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE || fChunkFlags == GMM_CHUNK_FLAGS_SEEDED);
2239#else
2240 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2241#endif
2242
2243#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2244 /*
2245 * Get a ring-0 mapping of the object.
2246 */
2247# ifdef GMM_WITH_LEGACY_MODE
2248 uint8_t *pbMapping = !(fChunkFlags & GMM_CHUNK_FLAGS_SEEDED) ? (uint8_t *)RTR0MemObjAddress(hMemObj) : NULL;
2249# else
2250 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2251# endif
2252 if (!pbMapping)
2253 {
2254 RTR0MEMOBJ hMapObj;
2255 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2256 if (RT_SUCCESS(rc))
2257 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2258 else
2259 return rc;
2260 AssertPtr(pbMapping);
2261 }
2262#endif
2263
2264 /*
2265 * Allocate a chunk.
2266 */
2267 int rc;
2268 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2269 if (pChunk)
2270 {
2271 /*
2272 * Initialize it.
2273 */
2274 pChunk->hMemObj = hMemObj;
2275#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2276 pChunk->pbMapping = pbMapping;
2277#endif
2278 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2279 pChunk->hGVM = hGVM;
2280 /*pChunk->iFreeHead = 0;*/
2281 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2282 pChunk->iChunkMtx = UINT8_MAX;
2283 pChunk->fFlags = fChunkFlags;
2284 pChunk->uidOwner = pSession ? SUPR0GetSessionUid(pSession) : NIL_RTUID;
2285 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2286 {
2287 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2288 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2289 }
2290 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2291 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2292
2293 /*
2294 * Allocate a Chunk ID and insert it into the tree.
2295 * This has to be done behind the mutex of course.
2296 */
2297 rc = gmmR0MutexAcquire(pGMM);
2298 if (RT_SUCCESS(rc))
2299 {
2300 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2301 {
2302 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2303 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2304 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2305 {
2306 RTSpinlockAcquire(pGMM->hSpinLockTree);
2307 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2308 {
2309 pGMM->cChunks++;
2310 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2311 RTSpinlockRelease(pGMM->hSpinLockTree);
2312
2313 gmmR0LinkChunk(pChunk, pSet);
2314
2315 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2316
2317 if (ppChunk)
2318 *ppChunk = pChunk;
2319 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2320 return VINF_SUCCESS;
2321 }
2322 RTSpinlockRelease(pGMM->hSpinLockTree);
2323 }
2324
2325 /* bail out */
2326 rc = VERR_GMM_CHUNK_INSERT;
2327 }
2328 else
2329 rc = VERR_GMM_IS_NOT_SANE;
2330 gmmR0MutexRelease(pGMM);
2331 }
2332
2333 RTMemFree(pChunk);
2334 }
2335 else
2336 rc = VERR_NO_MEMORY;
2337 return rc;
2338}
2339
2340
2341/**
2342 * Allocates a new chunk, immediately picks the requested pages from it, and
2343 * adds what remains to the specified free set.
2344 *
2345 * @note This will leave the giant mutex while allocating the new chunk!
2346 *
2347 * @returns VBox status code.
2348 * @param pGMM Pointer to the GMM instance data.
2349 * @param pGVM Pointer to the kernel-only VM instance data.
2350 * @param pSet Pointer to the free set.
2351 * @param cPages The number of pages requested.
2352 * @param paPages The page descriptor table (input + output).
2353 * @param piPage The pointer to the page descriptor table index variable.
2354 * This will be updated.
2355 */
2356static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2357 PGMMPAGEDESC paPages, uint32_t *piPage)
2358{
2359 gmmR0MutexRelease(pGMM);
2360
2361 RTR0MEMOBJ hMemObj;
2362#ifndef GMM_WITH_LEGACY_MODE
2363 int rc;
2364# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2365 if (pGMM->fHasWorkingAllocPhysNC)
2366 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2367 else
2368# endif
2369 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2370#else
2371 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2372#endif
2373 if (RT_SUCCESS(rc))
2374 {
2375 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2376 * free pages first and then unchaining them right afterwards. Instead
2377 * do as much work as possible without holding the giant lock. */
2378 PGMMCHUNK pChunk;
2379 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, 0 /*fChunkFlags*/, &pChunk);
2380 if (RT_SUCCESS(rc))
2381 {
2382 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2383 return VINF_SUCCESS;
2384 }
2385
2386 /* bail out */
2387 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2388 }
2389
2390 int rc2 = gmmR0MutexAcquire(pGMM);
2391 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2392 return rc;
2393
2394}
2395
2396
2397/**
2398 * As a last resort, picks pages from any chunk we may use: those owned by @a uidSelf, or completely free ones that nobody has mapped.
2399 *
2400 * @returns The new page descriptor table index.
2401 * @param pSet The set to pick from.
2402 * @param pGVM Pointer to the global VM structure.
2403 * @param uidSelf The UID of the caller.
2404 * @param iPage The current page descriptor table index.
2405 * @param cPages The total number of pages to allocate.
2406 * @param paPages The page descriptor table (input + output).
2407 */
2408static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2409 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2410{
2411 unsigned iList = RT_ELEMENTS(pSet->apLists);
2412 while (iList-- > 0)
2413 {
2414 PGMMCHUNK pChunk = pSet->apLists[iList];
2415 while (pChunk)
2416 {
2417 PGMMCHUNK pNext = pChunk->pFreeNext;
2418 if ( pChunk->uidOwner == uidSelf
2419 || ( pChunk->cMappingsX == 0
2420 && pChunk->cFree == (GMM_CHUNK_SIZE >> PAGE_SHIFT)))
2421 {
2422 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2423 if (iPage >= cPages)
2424 return iPage;
2425 }
2426
2427 pChunk = pNext;
2428 }
2429 }
2430 return iPage;
2431}
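gmmR0AllocatePagesIndiscriminately walks the lists from the last (most free) to the first and only takes pages from chunks that pass a UID filter, so a fallback never mixes pages of different users within a chunk that is already in use. The sketch isolates that filter, assuming 512 pages per chunk; `demo_chunk`, `demo_uid` and the field `cMappings` are hypothetical stand-ins for GMMCHUNK, RTUID and cMappingsX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NUM_PAGES 512u /* assumed GMM_CHUNK_SIZE >> PAGE_SHIFT */

typedef uint32_t demo_uid; /* stand-in for RTUID */

typedef struct {
    demo_uid uidOwner;  /* UID of the session that owns the chunk */
    uint32_t cMappings; /* ring-3 mappings, cf. cMappingsX */
    uint32_t cFree;     /* free pages */
} demo_chunk;

/* Mirrors the filter: our own UID's chunks, or chunks nobody has mapped that are completely free. */
static bool demo_can_take_from(const demo_chunk *pChunk, demo_uid uidSelf)
{
    assert(pChunk->cFree <= DEMO_NUM_PAGES);
    return pChunk->uidOwner == uidSelf
        || (pChunk->cMappings == 0 && pChunk->cFree == DEMO_NUM_PAGES);
}
```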
2432
2433
2434/**
2435 * Pick pages from empty chunks on the same NUMA node.
2436 *
2437 * @returns The new page descriptor table index.
2438 * @param pSet The set to pick from.
2439 * @param pGVM Pointer to the global VM structure.
2440 * @param uidSelf The UID of the caller.
2441 * @param iPage The current page descriptor table index.
2442 * @param cPages The total number of pages to allocate.
2443 * @param paPages The page descriptor table (input + output).
2444 */
2445static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2446 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2447{
2448 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2449 if (pChunk)
2450 {
2451 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2452 while (pChunk)
2453 {
2454 PGMMCHUNK pNext = pChunk->pFreeNext;
2455
2456 if ( pChunk->idNumaNode == idNumaNode
2457 && ( pChunk->uidOwner == uidSelf
2458 || pChunk->cMappingsX == 0))
2459 {
2460 pChunk->hGVM = pGVM->hSelf;
2461 pChunk->uidOwner = uidSelf;
2462 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2463 if (iPage >= cPages)
2464 {
2465 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2466 return iPage;
2467 }
2468 }
2469
2470 pChunk = pNext;
2471 }
2472 }
2473 return iPage;
2474}
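gmmR0AllocatePagesFromEmptyChunksOnSameNode only looks at the list of unused chunks, requires the chunk to be on the caller's NUMA node and to be either the caller's or unmapped, and then claims it by overwriting hGVM and uidOwner before taking pages. A simplified sketch that returns the first eligible chunk instead of allocating from it and moving on; the types and `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_uid;

typedef struct demo_chunk {
    uint16_t idNumaNode;
    demo_uid uidOwner;
    uint16_t hGVM;      /* owning VM handle */
    uint32_t cMappings; /* cf. cMappingsX */
    struct demo_chunk *pFreeNext;
} demo_chunk;

/* Walks the unused-chunk list and claims the first chunk on our node that we may use. */
static demo_chunk *demo_claim_empty_chunk(demo_chunk *pHead, uint16_t idNumaNode,
                                          demo_uid uidSelf, uint16_t hSelf)
{
    for (demo_chunk *pChunk = pHead; pChunk; pChunk = pChunk->pFreeNext)
        if (   pChunk->idNumaNode == idNumaNode
            && (pChunk->uidOwner == uidSelf || pChunk->cMappings == 0))
        {
            pChunk->hGVM     = hSelf;   /* the real code then allocates pages from it */
            pChunk->uidOwner = uidSelf;
            return pChunk;
        }
    return NULL;
}
```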
2475
2476
2477/**
2478 * Pick pages from non-empty chunks on the same NUMA node.
2479 *
2480 * @returns The new page descriptor table index.
2481 * @param pSet The set to pick from.
2482 * @param pGVM Pointer to the global VM structure.
2483 * @param uidSelf The UID of the caller.
2484 * @param iPage The current page descriptor table index.
2485 * @param cPages The total number of pages to allocate.
2486 * @param paPages The page descriptor table (input + output).
2487 */
2488static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID const uidSelf,
2489 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2490{
2491 /** @todo start by picking from chunks with about the right size first? */
2492 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2493 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2494 while (iList-- > 0)
2495 {
2496 PGMMCHUNK pChunk = pSet->apLists[iList];
2497 while (pChunk)
2498 {
2499 PGMMCHUNK pNext = pChunk->pFreeNext;
2500
2501 if ( pChunk->idNumaNode == idNumaNode
2502 && pChunk->uidOwner == uidSelf)
2503 {
2504 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2505 if (iPage >= cPages)
2506 {
2507 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2508 return iPage;
2509 }
2510 }
2511
2512 pChunk = pNext;
2513 }
2514 }
2515 return iPage;
2516}
2517
2518
2519/**
2520 * Pick pages that are in chunks already associated with the VM.
2521 *
2522 * @returns The new page descriptor table index.
2523 * @param pGMM Pointer to the GMM instance data.
2524 * @param pGVM Pointer to the global VM structure.
2525 * @param pSet The set to pick from.
2526 * @param iPage The current page descriptor table index.
2527 * @param cPages The total number of pages to allocate.
2528 * @param paPages The page descriptor table (input + output).
2529 */
2530static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2531 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2532{
2533 uint16_t const hGVM = pGVM->hSelf;
2534
2535 /* Hint. */
2536 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2537 {
2538 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2539 if (pChunk && pChunk->cFree)
2540 {
2541 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2542 if (iPage >= cPages)
2543 return iPage;
2544 }
2545 }
2546
2547 /* Scan. */
2548 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2549 {
2550 PGMMCHUNK pChunk = pSet->apLists[iList];
2551 while (pChunk)
2552 {
2553 PGMMCHUNK pNext = pChunk->pFreeNext;
2554
2555 if (pChunk->hGVM == hGVM)
2556 {
2557 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2558 if (iPage >= cPages)
2559 {
2560 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2561 return iPage;
2562 }
2563 }
2564
2565 pChunk = pNext;
2566 }
2567 }
2568 return iPage;
2569}
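The hint in gmmR0AllocatePagesAssociatedWithVM is only advisory: it is checked for free pages before use, and after a successful scan it is set to the chunk that completed the request, or to NIL if that chunk is now full. When the hinted chunk alone satisfies the request, the hint is left as is, even if the chunk is now full. A sketch under simplifying assumptions: a flat array replaces both the chunk tree and the free lists, and the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NIL_CHUNKID 0u

typedef struct {
    uint32_t idChunk; /* never DEMO_NIL_CHUNKID */
    uint16_t hGVM;    /* owning VM */
    uint32_t cFree;
} demo_chunk;

typedef struct { demo_chunk *paChunks; size_t cChunks; } demo_pool;

static demo_chunk *demo_lookup(demo_pool *pPool, uint32_t idChunk)
{
    for (size_t i = 0; i < pPool->cChunks; i++)
        if (pPool->paChunks[i].idChunk == idChunk)
            return &pPool->paChunks[i];
    return NULL;
}

static uint32_t demo_take(demo_chunk *pChunk, uint32_t cWanted)
{
    uint32_t cTake = cWanted < pChunk->cFree ? cWanted : pChunk->cFree;
    pChunk->cFree -= cTake;
    return cTake;
}

/* Returns the number of pages obtained and maintains *pidHint. */
static uint32_t demo_alloc_associated(demo_pool *pPool, uint16_t hSelf, uint32_t *pidHint, uint32_t cPages)
{
    uint32_t cGot = 0;
    if (*pidHint != DEMO_NIL_CHUNKID)
    {
        demo_chunk *pChunk = demo_lookup(pPool, *pidHint);
        if (pChunk && pChunk->cFree)
        {
            cGot += demo_take(pChunk, cPages - cGot);
            if (cGot >= cPages)
                return cGot; /* hint left unchanged on this path */
        }
    }
    for (size_t i = 0; i < pPool->cChunks; i++)
    {
        demo_chunk *pChunk = &pPool->paChunks[i];
        if (pChunk->hGVM != hSelf || !pChunk->cFree)
            continue;
        cGot += demo_take(pChunk, cPages - cGot);
        if (cGot >= cPages)
        {
            *pidHint = pChunk->cFree ? pChunk->idChunk : DEMO_NIL_CHUNKID;
            return cGot;
        }
    }
    return cGot;
}
```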
2570
2571
2572
2573/**
2574 * Pick pages in bound memory mode.
2575 *
2576 * @returns The new page descriptor table index.
2577 * @param pGVM Pointer to the global VM structure.
2578 * @param iPage The current page descriptor table index.
2579 * @param cPages The total number of pages to allocate.
2580 * @param paPages The page descriptor table (input + output).
2581 */
2582static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2583{
2584 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2585 {
2586 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2587 while (pChunk)
2588 {
2589 Assert(pChunk->hGVM == pGVM->hSelf);
2590 PGMMCHUNK pNext = pChunk->pFreeNext;
2591 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2592 if (iPage >= cPages)
2593 return iPage;
2594 pChunk = pNext;
2595 }
2596 }
2597 return iPage;
2598}
2599
2600
2601/**
2602 * Checks if we should start picking pages from chunks of other VMs because
2603 * we're getting close to the system memory or reserved limit.
2604 *
2605 * @returns @c true if we should, @c false if we should first try to allocate more chunks.
2606 * @param pGVM Pointer to the global VM structure.
2607 */
2608static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2609{
2610 /*
2611 * Don't allocate a new chunk if we're within four chunks' worth of pages of the reservation.
2612 */
2613 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2614 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2615 - pGVM->gmm.s.Stats.cBalloonedPages
2616 /** @todo what about shared pages? */;
2617 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2618 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2619 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2620 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2621 return true;
2622 /** @todo make the threshold configurable, also test the code to see if
2623 * this ever kicks in (we might be reserving too much or smth). */
2624
2625 /*
2626 * Check how close we are to the max memory limit and how many fragments
2627 * there are?...
2628 */
2629 /** @todo */
2630
2631 return false;
2632}
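Worked through, the test in gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits compares the VM's unallocated reservation (base plus fixed, minus ballooned, minus what is already allocated) against four chunks' worth of pages. Assuming 512 pages per chunk, that threshold is 2048 pages (8 MiB). A sketch with hypothetical field names; the shadow account is left out, as in the original, and the assert spells out an accounting invariant the sketch relies on to keep the unsigned subtraction from wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CHUNK_NUM_PAGES 512u /* assumed: 2 MiB chunks of 4 KiB pages */

typedef struct {
    uint64_t cReservedBase, cReservedFixed;
    uint64_t cAllocatedBase, cAllocatedFixed;
    uint64_t cBallooned;
} demo_vm_stats;

/* True when fewer than four chunks' worth of reserved pages remain unallocated. */
static bool demo_near_reservation_limit(const demo_vm_stats *pStats)
{
    uint64_t cPgReserved  = pStats->cReservedBase + pStats->cReservedFixed - pStats->cBallooned;
    uint64_t cPgAllocated = pStats->cAllocatedBase + pStats->cAllocatedFixed;
    assert(cPgAllocated <= cPgReserved); /* assumed invariant; otherwise the delta would wrap */
    return cPgReserved - cPgAllocated < DEMO_CHUNK_NUM_PAGES * 4;
}
```

For a 1 GiB base reservation (262144 pages), the VM starts borrowing from other chunks once fewer than 2048 reserved pages are left; ballooning shrinks the effective reservation and so brings that point forward.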
2633
2634
2635/**
2636 * Checks if we should start picking pages from chunks of other VMs because
2637 * there are a lot of free pages around.
2638 *
2639 * @returns @c true if we should, @c false if we should first try to allocate more chunks.
2640 * @param pGMM Pointer to the GMM instance data.
2641 */
2642static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2643{
2644 /*
2645 * Setting the limit at 16 chunks (32 MB) at the moment.
2646 */
2647 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2648 return true;
2649 return false;
2650}
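The 16-chunk threshold in gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree works out as follows, assuming 2 MiB chunks of 4 KiB pages: 16 * 512 = 8192 free pages, or 32 MiB, which matches the comment in the function. A compile-time check of that arithmetic plus a sketch of the predicate, with hypothetical `demo_*` names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE       UINT64_C(4096)
#define DEMO_CHUNK_SIZE      (UINT64_C(2) * 1024 * 1024)
#define DEMO_CHUNK_NUM_PAGES (DEMO_CHUNK_SIZE / DEMO_PAGE_SIZE) /* 512 */
#define DEMO_LOTS_FREE_PAGES (DEMO_CHUNK_NUM_PAGES * 16)        /* 8192 */

_Static_assert(DEMO_LOTS_FREE_PAGES * DEMO_PAGE_SIZE == UINT64_C(32) * 1024 * 1024,
               "16 chunks should be 32 MiB");

/* True when the global private free set holds at least 16 chunks' worth of free pages. */
static bool demo_lots_free(uint64_t cFreePrivatePages)
{
    return cFreePrivatePages >= DEMO_LOTS_FREE_PAGES;
}
```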
2651
2652
2653/**
2654 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2655 *
2656 * @returns VBox status code:
2657 * @retval VINF_SUCCESS on success.
2658 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2659 * gmmR0AllocateMoreChunks is necessary.
2660 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2661 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2662 * that is, we're trying to allocate more than we've reserved.
2663 *
2664 * @param pGMM Pointer to the GMM instance data.
2665 * @param pGVM Pointer to the VM.
2666 * @param cPages The number of pages to allocate.
2667 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2668 * details on what is expected on input.
2669 * @param enmAccount The account to charge.
2670 *
2671 * @remarks The caller must own the giant GMM lock, which may be released and reacquired while allocating new chunks.
2672 */
2673static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2674{
2675 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2676
2677 /*
2678 * Check allocation limits.
2679 */
2680 if (RT_LIKELY(pGMM->cAllocatedPages + cPages <= pGMM->cMaxPages))
2681 { /* likely */ }
2682 else
2683 return VERR_GMM_HIT_GLOBAL_LIMIT;
2684
2685 switch (enmAccount)
2686 {
2687 case GMMACCOUNT_BASE:
2688 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2689 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
2690 { /* likely */ }
2691 else
2692 {
2693 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2694 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2695 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2696 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2697 }
2698 break;
2699 case GMMACCOUNT_SHADOW:
2700 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages <= pGVM->gmm.s.Stats.Reserved.cShadowPages))
2701 { /* likely */ }
2702 else
2703 {
2704 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2705 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2706 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2707 }
2708 break;
2709 case GMMACCOUNT_FIXED:
2710 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages <= pGVM->gmm.s.Stats.Reserved.cFixedPages))
2711 { /* likely */ }
2712 else
2713 {
2714 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2715 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2716 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2717 }
2718 break;
2719 default:
2720 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2721 }
2722
2723#ifdef GMM_WITH_LEGACY_MODE
2724 /*
2725 * If we're in legacy memory mode, it's easy to figure out up front whether we
2726 * have a sufficient number of pages.
2727 */
2728 if ( pGMM->fLegacyAllocationMode
2729 && pGVM->gmm.s.Private.cFreePages < cPages)
2730 {
2731 Assert(pGMM->fBoundMemoryMode);
2732 return VERR_GMM_SEED_ME;
2733 }
2734#endif
2735
2736 /*
2737 * Update the accounts before we proceed because we might be leaving the
2738 * protection of the global mutex and thus run the risk of permitting
2739 * too much memory to be allocated.
2740 */
2741 switch (enmAccount)
2742 {
2743 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2744 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2745 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2746 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2747 }
2748 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2749 pGMM->cAllocatedPages += cPages;
2750
2751#ifdef GMM_WITH_LEGACY_MODE
2752 /*
2753 * Part two of it's-easy-in-legacy-memory-mode.
2754 */
2755 if (pGMM->fLegacyAllocationMode)
2756 {
2757 uint32_t iPage = gmmR0AllocatePagesInBoundMode(pGVM, 0, cPages, paPages);
2758 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2759 return VINF_SUCCESS;
2760 }
2761#endif
2762
2763 /*
2764 * Bound mode is also relatively straightforward.
2765 */
2766 uint32_t iPage = 0;
2767 int rc = VINF_SUCCESS;
2768 if (pGMM->fBoundMemoryMode)
2769 {
2770 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2771 if (iPage < cPages)
2772 do
2773 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2774 while (iPage < cPages && RT_SUCCESS(rc));
2775 }
2776 /*
2777 * Shared mode is trickier as we should try to achieve the same locality as
2778 * in bound mode, but smartly make use of non-full chunks allocated by
2779 * other VMs if we're low on memory.
2780 */
2781 else
2782 {
2783 RTUID const uidSelf = SUPR0GetSessionUid(pGVM->pSession);
2784
2785 /* Pick the most optimal pages first. */
2786 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2787 if (iPage < cPages)
2788 {
2789 /* Maybe we should try getting pages from chunks "belonging" to
2790 other VMs before allocating more chunks? */
2791 bool fTriedOnSameAlready = false;
2792 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2793 {
2794 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2795 fTriedOnSameAlready = true;
2796 }
2797
2798 /* Allocate memory from empty chunks. */
2799 if (iPage < cPages)
2800 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2801
2802 /* Grab empty shared chunks. */
2803 if (iPage < cPages)
2804 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, uidSelf, iPage, cPages, paPages);
2805
2806 /* If there are a lot of free pages spread around, try not to waste
2807 system memory on more chunks. (Should trigger defragmentation.) */
2808 if ( !fTriedOnSameAlready
2809 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2810 {
2811 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2812 if (iPage < cPages)
2813 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2814 }
2815
2816 /*
2817 * OK, try to allocate new chunks.
2818 */
2819 if (iPage < cPages)
2820 {
2821 do
2822 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2823 while (iPage < cPages && RT_SUCCESS(rc));
2824
2825#if 0 /* We cannot mix chunks with different UIDs. */
2826 /* If the host is out of memory, take whatever we can get. */
2827 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2828 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2829 {
2830 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2831 if (iPage < cPages)
2832 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, uidSelf, iPage, cPages, paPages);
2833 AssertRelease(iPage == cPages);
2834 rc = VINF_SUCCESS;
2835 }
2836#endif
2837 }
2838 }
2839 }
2840
2841 /*
2842 * Clean up on failure. Since this is bound to be a low-memory condition
2843 * we will give back any empty chunks that might be hanging around.
2844 */
2845 if (RT_SUCCESS(rc))
2846 { /* likely */ }
2847 else
2848 {
2849 /* Update the statistics. */
2850 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2851 pGMM->cAllocatedPages -= cPages - iPage;
2852 switch (enmAccount)
2853 {
2854 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2855 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2856 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2857 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2858 }
2859
2860 /* Release the pages. */
2861 while (iPage-- > 0)
2862 {
2863 uint32_t idPage = paPages[iPage].idPage;
2864 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2865 if (RT_LIKELY(pPage))
2866 {
2867 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2868 Assert(pPage->Private.hGVM == pGVM->hSelf);
2869 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2870 }
2871 else
2872 AssertMsgFailed(("idPage=%#x\n", idPage));
2873
2874 paPages[iPage].idPage = NIL_GMM_PAGEID;
2875 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2876 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2877 }
2878
2879 /* Free empty chunks. */
2880 /** @todo */
2881
2882 /* return the fail status on failure */
2883 return rc;
2884 }
2885 return VINF_SUCCESS;
2886}
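The failure path above only balances if every allocation stage advances one shared cursor, iPage. It also assumes, as the `-= cPages - iPage` adjustment suggests, that pGMM->cAllocatedPages was raised by the full cPages before the stages ran. Below is a minimal C sketch of that cascade-plus-rollback pattern. SKETCHSTAGE, sketchRunStage, sketchAllocCascade and sketchRollback are hypothetical names. Each stage just has a fixed number of pages to hand out instead of walking real chunk lists.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocation stage; the real stages are the
 * gmmR0AllocatePages*() helpers walking chunk free lists. */
typedef struct SKETCHSTAGE
{
    uint32_t cAvail;   /* pages this stage can still hand out */
} SKETCHSTAGE;

/* Fills as much of [iPage, cPages) as the stage can and returns the new cursor. */
static uint32_t sketchRunStage(SKETCHSTAGE *pStage, uint32_t iPage, uint32_t cPages)
{
    uint32_t cWanted = cPages - iPage;
    uint32_t cTaken  = cWanted < pStage->cAvail ? cWanted : pStage->cAvail;
    pStage->cAvail -= cTaken;
    return iPage + cTaken;
}

/* Runs the stages in order, skipping the rest once the request is satisfied.
 * Returns the number of pages obtained (== cPages on success). */
static uint32_t sketchAllocCascade(SKETCHSTAGE *paStages, unsigned cStages, uint32_t cPages)
{
    uint32_t iPage = 0;
    for (unsigned i = 0; i < cStages && iPage < cPages; i++)
        iPage = sketchRunStage(&paStages[i], iPage, cPages);
    return iPage;
}

/* Failure-path accounting: the counter was (assumed to be) raised by cPages
 * up front. Subtract the pages never handed out, then let each freed page
 * take one more off, which restores the original value. */
static uint64_t sketchRollback(uint64_t cAllocatedPages, uint32_t cPages, uint32_t iPage)
{
    cAllocatedPages -= cPages - iPage;
    while (iPage-- > 0)
        cAllocatedPages--;          /* stands in for gmmR0FreePrivatePage() */
    return cAllocatedPages;
}
```

With stages offering 2, 0 and 5 pages, a request for 4 pages is satisfied by the first and third stages, and the second is a no-op. The real code gets the same effect by guarding each further stage with `if (iPage < cPages)`.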
2887
2888
2889/**
2890 * Updates the previous allocations and allocates more pages.
2891 *
2892 * The handy pages are always taken from the 'base' memory account.
2893 * The allocated pages are not cleared and will contain random garbage.
2894 *
2895 * @returns VBox status code:
2896 * @retval VINF_SUCCESS on success.
2897 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2898 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2899 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2900 * private page.
2901 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2902 * shared page.
2903 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2904 * owned by the VM.
2905 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2906 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2907 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2908 * that is we're trying to allocate more than we've reserved.
2909 *
2910 * @param pGVM The global (ring-0) VM structure.
2911 * @param idCpu The VCPU id.
2912 * @param cPagesToUpdate The number of pages to update (starting from the head).
2913 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2914 * @param paPages The array of page descriptors.
2915 * See GMMPAGEDESC for details on what is expected on input.
2916 * @thread EMT(idCpu)
2917 */
2918GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2919 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2920{
2921 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2922 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2923
2924 /*
2925 * Validate, get basics and take the semaphore.
2926 * (This is a relatively busy path, so make predictions where possible.)
2927 */
2928 PGMM pGMM;
2929 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2930 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2931 if (RT_FAILURE(rc))
2932 return rc;
2933
2934 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2935 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2936 || (cPagesToAlloc && cPagesToAlloc < 1024),
2937 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2938 VERR_INVALID_PARAMETER);
2939
2940 unsigned iPage = 0;
2941 for (; iPage < cPagesToUpdate; iPage++)
2942 {
2943 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2944 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2945 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2946 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2947 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2948 VERR_INVALID_PARAMETER);
2949 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2950 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2951 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2952 AssertMsgReturn( paPages[iPage].idSharedPage == NIL_GMM_PAGEID
2953 || paPages[iPage].idSharedPage <= GMM_PAGEID_LAST,
2954 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2955 }
2956
2957 for (; iPage < cPagesToAlloc; iPage++)
2958 {
2959 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2960 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2961 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2962 }
2963
2964 gmmR0MutexAcquire(pGMM);
2965 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2966 {
2967 /* No allocations before the initial reservation has been made! */
2968 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2969 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2970 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2971 {
2972 /*
2973 * Perform the updates.
2974 * Stop on the first error.
2975 */
2976 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2977 {
2978 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2979 {
2980 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2981 if (RT_LIKELY(pPage))
2982 {
2983 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2984 {
2985 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2986 {
2987 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2988 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2989 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2990 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2991 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2992 /* else: NIL_RTHCPHYS nothing */
2993
2994 paPages[iPage].idPage = NIL_GMM_PAGEID;
2995 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2996 }
2997 else
2998 {
2999 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
3000 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
3001 rc = VERR_GMM_NOT_PAGE_OWNER;
3002 break;
3003 }
3004 }
3005 else
3006 {
3007 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
3008 rc = VERR_GMM_PAGE_NOT_PRIVATE;
3009 break;
3010 }
3011 }
3012 else
3013 {
3014 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
3015 rc = VERR_GMM_PAGE_NOT_FOUND;
3016 break;
3017 }
3018 }
3019
3020 if (paPages[iPage].idSharedPage == NIL_GMM_PAGEID)
3021 { /* likely */ }
3022 else
3023 {
3024 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
3025 if (RT_LIKELY(pPage))
3026 {
3027 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3028 {
3029 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
3030 Assert(pPage->Shared.cRefs);
3031 Assert(pGVM->gmm.s.Stats.cSharedPages);
3032 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
3033
3034 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
3035 pGVM->gmm.s.Stats.cSharedPages--;
3036 pGVM->gmm.s.Stats.Allocated.cBasePages--;
3037 if (!--pPage->Shared.cRefs)
3038 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
3039 else
3040 {
3041 Assert(pGMM->cDuplicatePages);
3042 pGMM->cDuplicatePages--;
3043 }
3044
3045 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
3046 }
3047 else
3048 {
3049 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
3050 rc = VERR_GMM_PAGE_NOT_SHARED;
3051 break;
3052 }
3053 }
3054 else
3055 {
3056 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
3057 rc = VERR_GMM_PAGE_NOT_FOUND;
3058 break;
3059 }
3060 }
3061 } /* for each page to update */
3062
3063 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
3064 {
3065#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
3066 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
3067 {
3068 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
3069 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
3070 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
3071 }
3072#endif
3073
3074 /*
3075 * Join paths with GMMR0AllocatePages for the allocation.
3076 * Note! gmmR0AllocatePagesNew may leave the protection of the mutex when it needs to allocate new chunks!
3077 */
3078 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
3079 }
3080 }
3081 else
3082 rc = VERR_WRONG_ORDER;
3083 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3084 }
3085 else
3086 rc = VERR_GMM_IS_NOT_SANE;
3087 gmmR0MutexRelease(pGMM);
3088 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3089 return rc;
3090}
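GMMR0AllocateHandyPages first validates each update descriptor and then folds its HCPhysGCPhys field into the page's pfn. A valid value is a page-aligned address up to GMM_GCPHYS_LAST or one of two sentinels. A real address becomes a pfn, GMM_GCPHYS_UNSHAREABLE becomes GMM_PAGE_PFN_UNSHAREABLE, and NIL_RTHCPHYS leaves the pfn alone. The C sketch below illustrates those two rules. The SKETCH_* constant values are made up. Only their ordering mirrors the real definitions: both sentinels lie above the last valid address, which the AssertCompile in the source requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VBox constants; only the ordering
 * (both sentinels above the last valid address) follows the source. */
#define SKETCH_PAGE_SHIFT            12
#define SKETCH_PAGE_OFFSET_MASK      ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)
#define SKETCH_GCPHYS_LAST           UINT64_C(0x00000fffffff0000)
#define SKETCH_GCPHYS_UNSHAREABLE    (SKETCH_GCPHYS_LAST + 0x1000)
#define SKETCH_NIL_HCPHYS            UINT64_MAX
#define SKETCH_PFN_UNSHAREABLE       UINT32_C(0xfffffff1)

/* Input check: a page-aligned address up to the last valid one, or a sentinel. */
static int sketchIsValidUpdate(uint64_t HCPhysGCPhys)
{
    return (HCPhysGCPhys <= SKETCH_GCPHYS_LAST && !(HCPhysGCPhys & SKETCH_PAGE_OFFSET_MASK))
        || HCPhysGCPhys == SKETCH_NIL_HCPHYS
        || HCPhysGCPhys == SKETCH_GCPHYS_UNSHAREABLE;
}

/* Update step: returns the new pfn, leaving the old one untouched for NIL. */
static uint32_t sketchUpdatePfn(uint32_t uPfnOld, uint64_t HCPhysGCPhys)
{
    if (HCPhysGCPhys <= SKETCH_GCPHYS_LAST)
        return (uint32_t)(HCPhysGCPhys >> SKETCH_PAGE_SHIFT);
    if (HCPhysGCPhys == SKETCH_GCPHYS_UNSHAREABLE)
        return SKETCH_PFN_UNSHAREABLE;
    return uPfnOld; /* NIL: nothing to record */
}
```

The update step relies on the ordering. A single `<=` comparison separates real addresses from both sentinels, so the common case costs one branch.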
3091
3092
3093/**
3094 * Allocate one or more pages.
3095 *
3096 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3097 * The allocated pages are not cleared and will contain random garbage.
3098 *
3099 * @returns VBox status code:
3100 * @retval VINF_SUCCESS on success.
3101 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3102 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3103 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3104 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3105 * that is we're trying to allocate more than we've reserved.
3106 *
3107 * @param pGVM The global (ring-0) VM structure.
3108 * @param idCpu The VCPU id.
3109 * @param cPages The number of pages to allocate.
3110 * @param paPages Pointer to the page descriptors.
3111 * See GMMPAGEDESC for details on what is expected on
3112 * input.
3113 * @param enmAccount The account to charge.
3114 *
3115 * @thread EMT.
3116 */
3117GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3118{
3119 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3120
3121 /*
3122 * Validate, get basics and take the semaphore.
3123 */
3124 PGMM pGMM;
3125 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3126 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3127 if (RT_FAILURE(rc))
3128 return rc;
3129
3130 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3131 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3132 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3133
3134 for (unsigned iPage = 0; iPage < cPages; iPage++)
3135 {
3136 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
3137 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3138 || ( enmAccount == GMMACCOUNT_BASE
3139 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3140 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3141 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3142 VERR_INVALID_PARAMETER);
3143 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3144 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3145 }
3146
3147 gmmR0MutexAcquire(pGMM);
3148 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3149 {
3150
3151 /* No allocations before the initial reservation has been made! */
3152 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3153 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3154 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3155 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3156 else
3157 rc = VERR_WRONG_ORDER;
3158 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3159 }
3160 else
3161 rc = VERR_GMM_IS_NOT_SANE;
3162 gmmR0MutexRelease(pGMM);
3163 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3164 return rc;
3165}
3166
3167
3168/**
3169 * VMMR0 request wrapper for GMMR0AllocatePages.
3170 *
3171 * @returns see GMMR0AllocatePages.
3172 * @param pGVM The global (ring-0) VM structure.
3173 * @param idCpu The VCPU id.
3174 * @param pReq Pointer to the request packet.
3175 */
3176GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3177{
3178 /*
3179 * Validate input and pass it on.
3180 */
3181 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3182 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3183 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3184 VERR_INVALID_PARAMETER);
3185 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3186 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3187 VERR_INVALID_PARAMETER);
3188
3189 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3190}
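GMMR0AllocatePagesReq accepts a request only if cbReq passes two checks. It must be at least the size of the fixed part, and it must then exactly match the size of a request carrying cPages descriptors. The sketch below shows the same two-step check in portable C. It uses offsetof plus an explicit multiplication in place of RT_UOFFSETOF_DYN. The SKETCH* structure layouts are hypothetical and are not the real GMMALLOCATEPAGESREQ definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request layout: fixed fields followed by a variable-length
 * page descriptor array, in the spirit of GMMALLOCATEPAGESREQ. */
typedef struct SKETCHPAGEDESC
{
    uint64_t HCPhysGCPhys;
    uint32_t idPage;
    uint32_t idSharedPage;
} SKETCHPAGEDESC;

typedef struct SKETCHALLOCREQ
{
    uint32_t       u32Magic;
    uint32_t       cbReq;      /* total size claimed by the caller */
    uint32_t       enmAccount;
    uint32_t       cPages;
    SKETCHPAGEDESC aPages[1];  /* really cPages entries */
} SKETCHALLOCREQ;

/* Portable equivalent of RT_UOFFSETOF_DYN(SKETCHALLOCREQ, aPages[cPages]). */
static size_t sketchReqSize(uint32_t cPages)
{
    return offsetof(SKETCHALLOCREQ, aPages) + (size_t)cPages * sizeof(SKETCHPAGEDESC);
}

/* Same two checks as the wrapper: covers the fixed part, then exact size. */
static int sketchIsValidReq(const SKETCHALLOCREQ *pReq)
{
    if (pReq->cbReq < offsetof(SKETCHALLOCREQ, aPages))
        return 0;
    return pReq->cbReq == sketchReqSize(pReq->cPages);
}
```

The first comparison makes sure the claimed size covers the fixed fields, including cPages, before cPages is used to compute the expected size. The exact-match test then rejects both truncated and padded requests.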
3191
3192
3193/**
3194 * Allocate a large page to represent guest RAM.
3195 *
3196 * The allocated pages are not cleared and will contain random garbage.
3197 *
3198 * @returns VBox status code:
3199 * @retval VINF_SUCCESS on success.
3200 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3201 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3202 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3203 * that is we're trying to allocate more than we've reserved.
3204 * @retval VERR_TRY_AGAIN if the host is temporarily out of large pages.
3205 * @retval VERR_GMM_IS_NOT_SANE if the GMM failed its sanity check.
3206 *
3207 * @param pGVM The global (ring-0) VM structure.
3208 * @param idCpu The VCPU id.
3209 * @param cbPage Large page size.
3210 * @param pIdPage Where to return the GMM page ID of the page.
3211 * @param pHCPhys Where to return the host physical address of the page.
3212 */
3213GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3214{
3215 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3216
3217 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3218 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3219 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3220
3221 /*
3222 * Validate, get basics and take the semaphore.
3223 */
3224 PGMM pGMM;
3225 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3226 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3227 if (RT_FAILURE(rc))
3228 return rc;
3229
3230#ifdef GMM_WITH_LEGACY_MODE
3231 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3232 // if (pGMM->fLegacyAllocationMode)
3233 // return VERR_NOT_SUPPORTED;
3234#endif
3235
3236 *pHCPhys = NIL_RTHCPHYS;
3237 *pIdPage = NIL_GMM_PAGEID;
3238
3239 gmmR0MutexAcquire(pGMM);
3240 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3241 {
3242 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3243 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3244 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3245 {
3246 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3247 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3248 gmmR0MutexRelease(pGMM);
3249 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3250 }
3251
3252 /*
3253 * Allocate a new large page chunk.
3254 *
3255 * Note! We leave the giant GMM lock temporarily as the allocation might
3256 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3257 */
3258 AssertCompile(GMM_CHUNK_SIZE == _2M);
3259 gmmR0MutexRelease(pGMM);
3260
3261 RTR0MEMOBJ hMemObj;
3262 rc = RTR0MemObjAllocLarge(&hMemObj, GMM_CHUNK_SIZE, GMM_CHUNK_SIZE, RTMEMOBJ_ALLOC_LARGE_F_FAST);
3263 if (RT_SUCCESS(rc))
3264 {
3265 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3266 PGMMCHUNK pChunk;
3267 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3268 if (RT_SUCCESS(rc))
3269 {
3270 /*
3271 * Allocate all the pages in the chunk.
3272 */
3273 /* Unlink the new chunk from the free list. */
3274 gmmR0UnlinkChunk(pChunk);
3275
3276 /** @todo rewrite this to skip the looping. */
3277 /* Allocate all pages. */
3278 GMMPAGEDESC PageDesc;
3279 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3280
3281 /* Return the first page as we'll use the whole chunk as one big page. */
3282 *pIdPage = PageDesc.idPage;
3283 *pHCPhys = PageDesc.HCPhysGCPhys;
3284
3285 for (unsigned i = 1; i < cPages; i++)
3286 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3287
3288 /* Update accounting. */
3289 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3290 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3291 pGMM->cAllocatedPages += cPages;
3292
3293 gmmR0LinkChunk(pChunk, pSet);
3294 gmmR0MutexRelease(pGMM);
3295 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3296 return VINF_SUCCESS;
3297 }
3298 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3299 }
3300 }
3301 else
3302 {
3303 gmmR0MutexRelease(pGMM);
3304 rc = VERR_GMM_IS_NOT_SANE;
3305 }
3306
3307 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3308 return rc;
3309}
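GMMR0AllocateLargePage charges a whole chunk, GMM_CHUNK_SIZE >> PAGE_SHIFT pages, against the base reservation, and it counts ballooned pages as already in use. Below is a small C sketch of that admission check. It assumes 2 MiB chunks, which the source asserts, and 4 KiB pages, which is an assumption. sketchLargePageFits is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 2 MiB chunks (asserted by the source) and 4 KiB pages. */
#define SKETCH_CHUNK_SIZE  (UINT32_C(2) * 1024 * 1024)
#define SKETCH_PAGE_SHIFT  12

/* Returns nonzero if a whole-chunk allocation fits in the base reservation.
 * Ballooned pages count against the reservation just like allocated ones. */
static int sketchLargePageFits(uint64_t cReservedBase, uint64_t cAllocatedBase, uint64_t cBallooned)
{
    const uint64_t cPages = SKETCH_CHUNK_SIZE >> SKETCH_PAGE_SHIFT; /* 512 */
    return cAllocatedBase + cBallooned + cPages <= cReservedBase;
}
```

For example, with 1024 pages reserved and 512 allocated, one more large page fits exactly. A single ballooned page is enough to push the request over the limit, which the real function reports as VERR_GMM_HIT_VM_ACCOUNT_LIMIT.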
3310
3311
3312/**
3313 * Free a large page.
3314 *
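3315 * @returns VBox status code: VINF_SUCCESS on success, VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH, VERR_GMM_PAGE_NOT_FOUND, VERR_GMM_IS_NOT_SANE, or a failure status from GVMMR0ValidateGVMandEMT.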
3316 * @param pGVM The global (ring-0) VM structure.
3317 * @param idCpu The VCPU id.
3318 * @param idPage The large page id.
3319 */
3320GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3321{
3322 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3323
3324 /*
3325 * Validate, get basics and take the semaphore.
3326 */
3327 PGMM pGMM;
3328 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3329 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3330 if (RT_FAILURE(rc))
3331 return rc;
3332
3333#ifdef GMM_WITH_LEGACY_MODE
3334 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3335 // if (pGMM->fLegacyAllocationMode)
3336 // return VERR_NOT_SUPPORTED;
3337#endif
3338
3339 gmmR0MutexAcquire(pGMM);
3340 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3341 {
3342 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3343
3344 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3345 {
3346 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3347 gmmR0MutexRelease(pGMM);
3348 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3349 }
3350
3351 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3352 if (RT_LIKELY( pPage
3353 && GMM_PAGE_IS_PRIVATE(pPage)))
3354 {
3355 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3356 Assert(pChunk);
3357 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3358 Assert(pChunk->cPrivate > 0);
3359
3360 /* Release the memory immediately. */
3361 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3362
3363 /* Update accounting. */
3364 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3365 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3366 pGMM->cAllocatedPages -= cPages;
3367 }
3368 else
3369 rc = VERR_GMM_PAGE_NOT_FOUND;
3370 }
3371 else
3372 rc = VERR_GMM_IS_NOT_SANE;
3373
3374 gmmR0MutexRelease(pGMM);
3375 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3376 return rc;
3377}
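GMMR0FreeLargePage locates the chunk from the page ID with `idPage >> GMM_CHUNKID_SHIFT`. This works because page IDs are built as `(idChunk << shift) | iPage`, as described in the file's introduction. Below is a sketch of that encoding in C. It assumes a shift of 9, that is, 512 pages of 4 KiB per 2 MiB chunk. The sketch* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes 2 MiB chunks of 4 KiB pages: 512 pages, so 9 bits of page index. */
#define SKETCH_CHUNKID_SHIFT  9
#define SKETCH_PAGE_IDX_MASK  ((UINT32_C(1) << SKETCH_CHUNKID_SHIFT) - 1)

/* Builds a page ID from a chunk ID and the page's index within the chunk. */
static uint32_t sketchMakePageId(uint32_t idChunk, uint32_t iPage)
{
    assert(iPage <= SKETCH_PAGE_IDX_MASK);
    return (idChunk << SKETCH_CHUNKID_SHIFT) | iPage;
}

/* The chunk ID is the high part of the page ID. */
static uint32_t sketchChunkIdFromPageId(uint32_t idPage)
{
    return idPage >> SKETCH_CHUNKID_SHIFT;
}

/* The page index within the chunk is the low part. */
static uint32_t sketchPageIdxFromPageId(uint32_t idPage)
{
    return idPage & SKETCH_PAGE_IDX_MASK;
}
```

Because GMMR0AllocateLargePage hands back the ID of the chunk's first page, the shift alone recovers the chunk that has to be released.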
3378
3379
3380/**
3381 * VMMR0 request wrapper for GMMR0FreeLargePage.
3382 *
3383 * @returns see GMMR0FreeLargePage.
3384 * @param pGVM The global (ring-0) VM structure.
3385 * @param idCpu The VCPU id.
3386 * @param pReq Pointer to the request packet.
3387 */
3388GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3389{
3390 /*
3391 * Validate input and pass it on.
3392 */
3393 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3394 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3395 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3396 VERR_INVALID_PARAMETER);
3397
3398 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3399}
3400
3401
3402/**
3403 * @callback_method_impl{FNGVMMR0ENUMCALLBACK,
3404 * Used by gmmR0FreeChunkFlushPerVmTlbs().}
3405 */
3406static DECLCALLBACK(int) gmmR0InvalidatePerVmChunkTlbCallback(PGVM pGVM, void *pvUser)
3407{
3408 RT_NOREF(pvUser);
3409 if (pGVM->gmm.s.hChunkTlbSpinLock != NIL_RTSPINLOCK)
3410 {
3411 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
3412 uintptr_t i = RT_ELEMENTS(pGVM->gmm.s.aChunkTlbEntries);
3413 while (i-- > 0)
3414 {
3415 pGVM->gmm.s.aChunkTlbEntries[i].idGeneration = UINT64_MAX;
3416 pGVM->gmm.s.aChunkTlbEntries[i].pChunk = NULL;
3417 }
3418 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
3419 }
3420 return VINF_SUCCESS;
3421}
3422
3423
3424/**
3425 * Called by gmmR0FreeChunk when we reach the threshold for wrapping around the
3426 * free generation ID value.
3427 *
3428 * This is done at 2^62 - 1 (UINT64_MAX / 4), which allows us to drop all locks,
3429 * as it will take roughly 3 * 2^62 (about 1.4 * 10^19) further calls to
3430 * gmmR0FreeChunk before the counter really wraps around. We do two
3431 * invalidation passes and reset the generation ID between them. This
3432 * makes sure there are no false positives.
3433 *
3434 * @param pGMM Pointer to the GMM instance.
3435 */
3436static void gmmR0FreeChunkFlushPerVmTlbs(PGMM pGMM)
3437{
3438 /*
3439 * First invalidation pass.
3440 */
3441 int rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3442 AssertRCSuccess(rc);
3443
3444 /*
3445 * Reset the generation number.
3446 */
3447 RTSpinlockAcquire(pGMM->hSpinLockTree);
3448 ASMAtomicWriteU64(&pGMM->idFreeGeneration, 1);
3449 RTSpinlockRelease(pGMM->hSpinLockTree);
3450
3451 /*
3452 * Second invalidation pass.
3453 */
3454 rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3455 AssertRCSuccess(rc);
3456}
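The per-VM chunk TLB invalidation relies on a free generation counter. Every freed chunk bumps idFreeGeneration, and the callback above parks entries at UINT64_MAX. The C sketch below illustrates the idea under one stated assumption: a lookup trusts an entry only when its idGeneration equals the current counter. SKETCHTLBE and the sketch* helpers are hypothetical. The tests also check the arithmetic behind the flush threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VM chunk TLB entry, modelled on aChunkTlbEntries. */
typedef struct SKETCHTLBE
{
    uint64_t idGeneration;  /* free generation when the entry was filled */
    void    *pChunk;
} SKETCHTLBE;

/* Assumed lookup rule: an entry is usable only if no chunk has been freed
 * since it was filled, i.e. its generation equals the current one. */
static void *sketchTlbLookup(const SKETCHTLBE *pTlbe, uint64_t idFreeGeneration)
{
    return pTlbe->idGeneration == idFreeGeneration ? pTlbe->pChunk : NULL;
}

/* What the invalidation callback does to each entry. UINT64_MAX never
 * matches, because the counter is reset at UINT64_MAX / 4. */
static void sketchTlbInvalidate(SKETCHTLBE *pTlbe)
{
    pTlbe->idGeneration = UINT64_MAX;
    pTlbe->pChunk       = NULL;
}

/* Flush threshold used by gmmR0FreeChunk: 2^62 - 1. */
static const uint64_t g_uSketchFlushAt = UINT64_MAX / 4;
```

Between the threshold and a real wrap-around lie 2^64 - 2^62 = 3 * 2^62 increments. This margin is why the flush can run with all locks dropped.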
3457
3458
3459/**
3460 * Frees a chunk, giving it back to the host OS.
3461 *
3462 * @param pGMM Pointer to the GMM instance.
3463 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3464 * unmap and free the chunk in one go.
3465 * @param pChunk The chunk to free.
3466 * @param fRelaxedSem Whether we can release the semaphore while doing the
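3467 * freeing (@c true) or not. @returns @c true if the semaphore was released and retaken (only when @a fRelaxedSem is set and the chunk was freed), @c false otherwise.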
3468 */
3469static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3470{
3471 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3472
3473 GMMR0CHUNKMTXSTATE MtxState;
3474 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3475
3476 /*
3477 * Cleanup hack! Unmap the chunk from the caller's address space.
3478 * This shouldn't happen, so screw lock contention...
3479 */
3480 if ( pChunk->cMappingsX
3481#ifdef GMM_WITH_LEGACY_MODE
3482 && (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
3483#endif
3484 && pGVM)
3485 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3486
3487 /*
3488 * If there are still mappings of the chunk, we cannot free it now. The
3489 * plan is to request the VMs to unmap them and reposition the chunk in the
3490 * free list so it won't be a likely candidate for allocations (see todo).
3491 */
3492 if (pChunk->cMappingsX)
3493 {
3494 /** @todo R0 -> VM request */
3495 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3496 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3497 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3498 return false;
3499 }
3500
3501
3502 /*
3503 * Save and trash the handle.
3504 */
3505 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3506 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3507
3508 /*
3509 * Unlink it from everywhere.
3510 */
3511 gmmR0UnlinkChunk(pChunk);
3512
3513 RTSpinlockAcquire(pGMM->hSpinLockTree);
3514
3515 RTListNodeRemove(&pChunk->ListNode);
3516
3517 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3518 Assert(pCore == &pChunk->Core); NOREF(pCore);
3519
3520 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3521 if (pTlbe->pChunk == pChunk)
3522 {
3523 pTlbe->idChunk = NIL_GMM_CHUNKID;
3524 pTlbe->pChunk = NULL;
3525 }
3526
3527 Assert(pGMM->cChunks > 0);
3528 pGMM->cChunks--;
3529
3530 uint64_t const idFreeGeneration = ASMAtomicIncU64(&pGMM->idFreeGeneration);
3531
3532 RTSpinlockRelease(pGMM->hSpinLockTree);
3533
3534 /*
3535 * Free the Chunk ID before dropping the locks and freeing the rest.
3536 */
3537 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3538 pChunk->Core.Key = NIL_GMM_CHUNKID;
3539
3540 pGMM->cFreedChunks++;
3541
3542 gmmR0ChunkMutexRelease(&MtxState, NULL);
3543 if (fRelaxedSem)
3544 gmmR0MutexRelease(pGMM);
3545
3546 if (idFreeGeneration == UINT64_MAX / 4)
3547 gmmR0FreeChunkFlushPerVmTlbs(pGMM);
3548
3549 RTMemFree(pChunk->paMappingsX);
3550 pChunk->paMappingsX = NULL;
3551
3552 RTMemFree(pChunk);
3553
3554#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
3555 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3556#else
3557 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3558#endif
3559 AssertLogRelRC(rc);
3560
3561 if (fRelaxedSem)
3562 gmmR0MutexAcquire(pGMM);
3563 return fRelaxedSem;
3564}
3565
3566
3567/**
3568 * Free page worker.
3569 *
3570 * The caller does all the statistic decrementing, we do all the incrementing.
3571 *
3572 * @param pGMM Pointer to the GMM instance data.
3573 * @param pGVM Pointer to the GVM instance.
3574 * @param pChunk Pointer to the chunk this page belongs to.
3575 * @param idPage The Page ID.
3576 * @param pPage Pointer to the page.
3577 */
3578static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3579{
3580 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3581 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3582
3583 /*
3584 * Put the page on the free list.
3585 */
3586 pPage->u = 0;
3587 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3588 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3589 pPage->Free.iNext = pChunk->iFreeHead;
3590 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3591
3592 /*
3593 * Update statistics (the cShared/cPrivate stats are up to date already),
3594 * and relink the chunk if necessary.
3595 */
3596 unsigned const cFree = pChunk->cFree;
3597 if ( !cFree
3598 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3599 {
3600 gmmR0UnlinkChunk(pChunk);
3601 pChunk->cFree++;
3602 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3603 }
3604 else
3605 {
3606 pChunk->cFree = cFree + 1;
3607 pChunk->pSet->cFreePages++;
3608 }
3609
3610 /*
3611 * If the chunk becomes empty, consider giving memory back to the host OS.
3612 *
3613 * The current strategy is to try to give it back if there are other chunks
3614 * in this free list, meaning if there are at least 240 free pages in this
3615 * category. Note that since there are probably mappings of the chunk,
3616 * it won't be freed up instantly, which probably screws up this logic
3617 * a bit...
3618 */
3619 /** @todo Do this on the way out. */
3620 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3621 || pChunk->pFreeNext == NULL
3622 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3623 { /* likely */ }
3624#ifdef GMM_WITH_LEGACY_MODE
3625 else if (RT_LIKELY(pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE)))
3626 { /* likely */ }
3627#endif
3628 else
3629 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3630
3631}
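gmmR0FreePageWorker pushes the freed page onto an intrusive, index-linked free list. The page stores the previous head in Free.iNext and becomes the new iFreeHead, with UINT16_MAX marking the end of the list. Below is a minimal C sketch of that list. It is simplified to a separate aiNext array instead of the GMMPAGE union, and it leaves out the free-set relinking. All SKETCH*/sketch* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CHUNK_NUM_PAGES 512
#define SKETCH_NIL_INDEX       UINT16_MAX   /* end-of-list marker, as in the source */

/* Hypothetical, trimmed-down chunk. In the real GMMPAGE the next-free index
 * shares storage with the allocated-page fields; here it is its own array. */
typedef struct SKETCHCHUNK
{
    uint16_t aiNext[SKETCH_CHUNK_NUM_PAGES];
    uint16_t iFreeHead;
    uint16_t cFree;
} SKETCHCHUNK;

static void sketchChunkInit(SKETCHCHUNK *pChunk)
{
    pChunk->iFreeHead = SKETCH_NIL_INDEX;
    pChunk->cFree     = 0;
}

/* Push: the freed page becomes the new head (LIFO), like gmmR0FreePageWorker. */
static void sketchFreePage(SKETCHCHUNK *pChunk, uint16_t iPage)
{
    assert(iPage < SKETCH_CHUNK_NUM_PAGES);
    pChunk->aiNext[iPage] = pChunk->iFreeHead;
    pChunk->iFreeHead     = iPage;
    pChunk->cFree++;
}

/* Pop: returns SKETCH_NIL_INDEX when the chunk has no free pages. */
static uint16_t sketchAllocPage(SKETCHCHUNK *pChunk)
{
    uint16_t iPage = pChunk->iFreeHead;
    if (iPage != SKETCH_NIL_INDEX)
    {
        pChunk->iFreeHead = pChunk->aiNext[iPage];
        pChunk->cFree--;
    }
    return iPage;
}
```

A 16-bit index is enough because a chunk holds far fewer than 65535 pages. Using UINT16_MAX as the terminator matches the assertion on iFreeHead in the worker above.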
3632
3633
3634/**
3635 * Frees a shared page, the page is known to exist and be valid and such.
3636 *
3637 * @param pGMM Pointer to the GMM instance.
3638 * @param pGVM Pointer to the GVM instance.
3639 * @param idPage The page id.
3640 * @param pPage The page structure.
3641 */
3642DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3643{
3644 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3645 Assert(pChunk);
3646 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3647 Assert(pChunk->cShared > 0);
3648 Assert(pGMM->cSharedPages > 0);
3649 Assert(pGMM->cAllocatedPages > 0);
3650 Assert(!pPage->Shared.cRefs);
3651
3652 pChunk->cShared--;
3653 pGMM->cAllocatedPages--;
3654 pGMM->cSharedPages--;
3655 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3656}
3657
3658
3659/**
3660 * Frees a private page, the page is known to exist and be valid and such.
3661 *
3662 * @param pGMM Pointer to the GMM instance.
3663 * @param pGVM Pointer to the GVM instance.
3664 * @param idPage The page id.
3665 * @param pPage The page structure.
3666 */
3667DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3668{
3669 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3670 Assert(pChunk);
3671 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3672 Assert(pChunk->cPrivate > 0);
3673 Assert(pGMM->cAllocatedPages > 0);
3674
3675 pChunk->cPrivate--;
3676 pGMM->cAllocatedPages--;
3677 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3678}
3679
3680
3681/**
3682 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3683 *
3684 * @returns VBox status code:
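3685 * @retval VINF_SUCCESS on success; otherwise VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH, VERR_GMM_NOT_PAGE_OWNER, VERR_GMM_PAGE_ALREADY_FREE or VERR_GMM_PAGE_NOT_FOUND.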
3686 *
3687 * @param pGMM Pointer to the GMM instance data.
3688 * @param pGVM Pointer to the VM.
3689 * @param cPages The number of pages to free.
3690 * @param paPages Pointer to the page descriptors.
3691 * @param enmAccount The account this relates to.
3692 */
3693static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3694{
3695 /*
3696 * Check that the request isn't impossible wrt the account status.
3697 */
3698 switch (enmAccount)
3699 {
3700 case GMMACCOUNT_BASE:
3701 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3702 {
3703 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3704 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3705 }
3706 break;
3707 case GMMACCOUNT_SHADOW:
3708 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3709 {
3710 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3711 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3712 }
3713 break;
3714 case GMMACCOUNT_FIXED:
3715 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3716 {
3717 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3718 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3719 }
3720 break;
3721 default:
3722 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3723 }
3724
3725 /*
3726 * Walk the descriptors and free the pages.
3727 *
3728 * Statistics (except the account) are being updated as we go along,
3729 * unlike the alloc code. Also, stop on the first error.
3730 */
3731 int rc = VINF_SUCCESS;
3732 uint32_t iPage;
3733 for (iPage = 0; iPage < cPages; iPage++)
3734 {
3735 uint32_t idPage = paPages[iPage].idPage;
3736 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3737 if (RT_LIKELY(pPage))
3738 {
3739 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3740 {
3741 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3742 {
3743 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3744 pGVM->gmm.s.Stats.cPrivatePages--;
3745 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3746 }
3747 else
3748 {
3749 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3750 pPage->Private.hGVM, pGVM->hSelf));
3751 rc = VERR_GMM_NOT_PAGE_OWNER;
3752 break;
3753 }
3754 }
3755 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3756 {
3757 Assert(pGVM->gmm.s.Stats.cSharedPages);
3758 Assert(pPage->Shared.cRefs);
3759#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3760 if (pPage->Shared.u14Checksum)
3761 {
3762 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3763 uChecksum &= UINT32_C(0x00003fff);
3764 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3765 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3766 }
3767#endif
3768 pGVM->gmm.s.Stats.cSharedPages--;
3769 if (!--pPage->Shared.cRefs)
3770 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3771 else
3772 {
3773 Assert(pGMM->cDuplicatePages);
3774 pGMM->cDuplicatePages--;
3775 }
3776 }
3777 else
3778 {
3779 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3780 rc = VERR_GMM_PAGE_ALREADY_FREE;
3781 break;
3782 }
3783 }
3784 else
3785 {
3786 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3787 rc = VERR_GMM_PAGE_NOT_FOUND;
3788 break;
3789 }
3790 paPages[iPage].idPage = NIL_GMM_PAGEID;
3791 }
3792
3793 /*
3794 * Update the account.
3795 */
3796 switch (enmAccount)
3797 {
3798 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3799 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3800 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3801 default:
3802 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3803 }
3804
3805 /*
3806 * Any threshold stuff to be done here?
3807 */
3808
3809 return rc;
3810}
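The walk in gmmR0FreePages stops at the first bad descriptor, but the pages already processed stay freed, and the account is then reduced by exactly the number of descriptors handled (`iPage`). A minimal sketch of that contract, using hypothetical stand-in types (`Page`, `PageState`, `FreeResult`) rather than the real GMM structures, and leaving out the statistics and the `NIL_GMM_PAGEID` write-back:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified page state; not the real GMMPAGE layout.
enum class PageState { Free, Private, Shared };

struct Page {
    PageState state;
    uint32_t  owner;   // meaningful only for Private pages
    uint32_t  cRefs;   // meaningful only for Shared pages
};

enum class FreeResult { Ok, NotOwner, AlreadyFree };

// Frees pages[idx[i]] in order, stopping at the first error. Returns the
// status and stores in *pcProcessed how many descriptors were handled, which
// is the amount the caller would subtract from the allocation account.
FreeResult freePages(std::vector<Page> &pages, const std::vector<uint32_t> &idx,
                     uint32_t self, uint32_t *pcProcessed)
{
    size_t i = 0;
    FreeResult rc = FreeResult::Ok;
    for (; i < idx.size(); i++) {
        Page &p = pages[idx[i]];
        if (p.state == PageState::Private) {
            if (p.owner != self) { rc = FreeResult::NotOwner; break; }
            p.state = PageState::Free;
        } else if (p.state == PageState::Shared) {
            // Drop one reference; the page is only freed with the last one.
            if (--p.cRefs == 0)
                p.state = PageState::Free;
        } else {
            rc = FreeResult::AlreadyFree;
            break;
        }
    }
    *pcProcessed = static_cast<uint32_t>(i);
    return rc;
}
```

Because partially completed requests are not rolled back, the caller sees both an error status and a reduced allocation count.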
3811
3812
3813/**
3814 * Free one or more pages.
3815 *
3816 * This is typically used at reset time or power off.
3817 *
3818 * @returns VBox status code:
3819 * @retval xxx
3820 *
3821 * @param pGVM The global (ring-0) VM structure.
3822 * @param idCpu The VCPU id.
3823 * @param cPages The number of pages to free.
3824 * @param paPages Pointer to the page descriptors containing the page IDs
3825 * for each page.
3826 * @param enmAccount The account this relates to.
3827 * @thread EMT.
3828 */
3829GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3830{
3831 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3832
3833 /*
3834 * Validate input and get the basics.
3835 */
3836 PGMM pGMM;
3837 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3838 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3839 if (RT_FAILURE(rc))
3840 return rc;
3841
3842 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3843 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3844 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3845
3846 for (unsigned iPage = 0; iPage < cPages; iPage++)
3847 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3848 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3849 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3850
3851 /*
3852 * Take the semaphore and call the worker function.
3853 */
3854 gmmR0MutexAcquire(pGMM);
3855 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3856 {
3857 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3858 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3859 }
3860 else
3861 rc = VERR_GMM_IS_NOT_SANE;
3862 gmmR0MutexRelease(pGMM);
3863 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3864 return rc;
3865}
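GMMR0FreePages rejects `cPages >= RT_BIT(32 - PAGE_SHIFT)`. Assuming 4 KiB pages (a `PAGE_SHIFT` of 12), that caps a request at 2^20 - 1 pages, so the byte count `cPages << PAGE_SHIFT` still fits in 32 bits. A small check of that arithmetic, with `kPageShift` as an assumed stand-in for `PAGE_SHIFT`:

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kPageShift    = 12;   // assumed: 4 KiB pages
constexpr uint32_t kMaxPagesExcl = UINT32_C(1) << (32 - kPageShift);

// Mirrors the range check: 0 < cPages < 2^(32 - PAGE_SHIFT).
constexpr bool isValidPageCount(uint32_t cPages)
{
    return cPages > 0 && cPages < kMaxPagesExcl;
}

// Byte size computed in 64 bits so callers can confirm it fits in 32.
constexpr uint64_t pagesToBytes(uint32_t cPages)
{
    return static_cast<uint64_t>(cPages) << kPageShift;
}
```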
3866
3867
3868/**
3869 * VMMR0 request wrapper for GMMR0FreePages.
3870 *
3871 * @returns see GMMR0FreePages.
3872 * @param pGVM The global (ring-0) VM structure.
3873 * @param idCpu The VCPU id.
3874 * @param pReq Pointer to the request packet.
3875 */
3876GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3877{
3878 /*
3879 * Validate input and pass it on.
3880 */
3881 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3882 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3883 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3884 VERR_INVALID_PARAMETER);
3885 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3886 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3887 VERR_INVALID_PARAMETER);
3888
3889 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3890}
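GMMR0FreePagesReq accepts the packet only if `Hdr.cbReq` equals the offset of `aPages[cPages]`, that is, the fixed fields plus exactly `cPages` descriptors. The preceding check against the offset of `aPages[0]` makes sure `cPages` itself lies inside the packet before it is used. A sketch of the same two-step check with a hypothetical request layout (`Hdr`, `FreeDesc` and `FreeReq` are illustrative, not the VBox definitions):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical request layout: header + fixed fields + trailing array.
struct Hdr      { uint32_t u32Magic; uint32_t cbReq; };
struct FreeDesc { uint32_t idPage; };
struct FreeReq {
    Hdr      hdr;
    uint32_t cPages;
    uint32_t enmAccount;
    FreeDesc aPages[1];   // really cPages entries
};

// Size a request with cPages descriptors must have: offset of aPages[cPages].
constexpr size_t expectedSize(uint32_t cPages)
{
    return offsetof(FreeReq, aPages) + size_t(cPages) * sizeof(FreeDesc);
}

bool isValidFreeReq(const FreeReq *pReq)
{
    if (!pReq)
        return false;
    // First make sure the fixed part (including cPages) is present...
    if (pReq->hdr.cbReq < offsetof(FreeReq, aPages))
        return false;
    // ...then require an exact match for the trailing array.
    return pReq->hdr.cbReq == expectedSize(pReq->cPages);
}
```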
3891
3892
3893/**
3894 * Report back on a memory ballooning request.
3895 *
3896 * The request may or may not have been initiated by the GMM. If it was initiated
3897 * by the GMM it is important that this function is called even if no pages were
3898 * ballooned.
3899 *
3900 * @returns VBox status code:
3901 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3902 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3903 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3904 * indicating that we won't necessarily have sufficient RAM to boot
3905 * the VM again and that it should pause until this changes (we'll try
3906 * to balloon some other VM). (For standard deflate we have little choice
3907 * but to hope the VM won't use the memory that was returned to it.)
3908 *
3909 * @param pGVM The global (ring-0) VM structure.
3910 * @param idCpu The VCPU id.
3911 * @param enmAction Inflate/deflate/reset.
3912 * @param cBalloonedPages The number of pages that were ballooned.
3913 *
3914 * @thread EMT(idCpu)
3915 */
3916GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3917{
3918 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3919 pGVM, enmAction, cBalloonedPages));
3920
3921 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3922
3923 /*
3924 * Validate input and get the basics.
3925 */
3926 PGMM pGMM;
3927 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3928 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3929 if (RT_FAILURE(rc))
3930 return rc;
3931
3932 /*
3933 * Take the semaphore and do some more validations.
3934 */
3935 gmmR0MutexAcquire(pGMM);
3936 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3937 {
3938 switch (enmAction)
3939 {
3940 case GMMBALLOONACTION_INFLATE:
3941 {
3942 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3943 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3944 {
3945 /*
3946 * Record the ballooned memory.
3947 */
3948 pGMM->cBalloonedPages += cBalloonedPages;
3949 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3950 {
3951 /* Code path never taken. It might be interesting in the future to request ballooned memory from guests in low-memory conditions. */
3952 AssertFailed();
3953
3954 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3955 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3956 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3957 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3958 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3959 }
3960 else
3961 {
3962 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3963 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3964 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3965 }
3966 }
3967 else
3968 {
3969 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3970 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3971 pGVM->gmm.s.Stats.Reserved.cBasePages));
3972 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3973 }
3974 break;
3975 }
3976
3977 case GMMBALLOONACTION_DEFLATE:
3978 {
3979 /* Deflate. */
3980 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3981 {
3982 /*
3983 * Record the ballooned memory.
3984 */
3985 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3986 pGMM->cBalloonedPages -= cBalloonedPages;
3987 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3988 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3989 {
3990 AssertFailed(); /* This path is for later. */
3991 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3992 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3993
3994 /*
3995 * Anything we need to do here now when the request has been completed?
3996 */
3997 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3998 }
3999 else
4000 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
4001 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
4002 }
4003 else
4004 {
4005 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
4006 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
4007 }
4008 break;
4009 }
4010
4011 case GMMBALLOONACTION_RESET:
4012 {
4013 /* Reset to an empty balloon. */
4014 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
4015
4016 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
4017 pGVM->gmm.s.Stats.cBalloonedPages = 0;
4018 break;
4019 }
4020
4021 default:
4022 rc = VERR_INVALID_PARAMETER;
4023 break;
4024 }
4025 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4026 }
4027 else
4028 rc = VERR_GMM_IS_NOT_SANE;
4029
4030 gmmR0MutexRelease(pGMM);
4031 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
4032 return rc;
4033}
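The three balloon actions reduce to bookkeeping on a per-VM counter mirrored in a global one. Inflating is allowed only while allocated plus ballooned pages stay within the reservation, deflating only up to what is currently ballooned, and reset returns the VM's whole balloon to the global count. A sketch of that bookkeeping with hypothetical `BalloonStats`/`BalloonAction` types, leaving out the `cReqBalloonedPages`/`cReqDeflatePages` paths that the code marks as not yet used:

```cpp
#include <cassert>
#include <cstdint>

enum class BalloonAction { Inflate, Deflate, Reset };
enum class BalloonRc { Ok, FreeTooMuch, DeflateTooMuch };

// Hypothetical per-VM counters.
struct BalloonStats {
    uint64_t cAllocatedBase;   // pages currently allocated to the VM
    uint64_t cReservedBase;    // pages reserved for the VM
    uint64_t cBallooned;       // pages the VM currently has ballooned
};

BalloonRc balloon(BalloonStats &vm, uint64_t &cGlobalBallooned,
                  BalloonAction enmAction, uint32_t cPages)
{
    switch (enmAction) {
    case BalloonAction::Inflate:
        if (vm.cAllocatedBase + vm.cBallooned + cPages > vm.cReservedBase)
            return BalloonRc::FreeTooMuch;
        vm.cBallooned += cPages;
        cGlobalBallooned += cPages;
        return BalloonRc::Ok;
    case BalloonAction::Deflate:
        if (vm.cBallooned < cPages)
            return BalloonRc::DeflateTooMuch;
        vm.cBallooned -= cPages;
        cGlobalBallooned -= cPages;
        return BalloonRc::Ok;
    case BalloonAction::Reset:
        // cPages is ignored: the whole balloon is released.
        cGlobalBallooned -= vm.cBallooned;
        vm.cBallooned = 0;
        return BalloonRc::Ok;
    }
    return BalloonRc::Ok;
}
```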
4034
4035
4036/**
4037 * VMMR0 request wrapper for GMMR0BalloonedPages.
4038 *
4039 * @returns see GMMR0BalloonedPages.
4040 * @param pGVM The global (ring-0) VM structure.
4041 * @param idCpu The VCPU id.
4042 * @param pReq Pointer to the request packet.
4043 */
4044GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
4045{
4046 /*
4047 * Validate input and pass it on.
4048 */
4049 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4050 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
4051 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
4052 VERR_INVALID_PARAMETER);
4053
4054 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
4055}
4056
4057
4058/**
4059 * Return memory statistics for the hypervisor
4060 *
4061 * @returns VBox status code.
4062 * @param pReq Pointer to the request packet.
4063 */
4064GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
4065{
4066 /*
4067 * Validate input and pass it on.
4068 */
4069 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4070 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4071 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4072 VERR_INVALID_PARAMETER);
4073
4074 /*
4075 * Validate input and get the basics.
4076 */
4077 PGMM pGMM;
4078 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4079 pReq->cAllocPages = pGMM->cAllocatedPages;
4080 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
4081 pReq->cBalloonedPages = pGMM->cBalloonedPages;
4082 pReq->cMaxPages = pGMM->cMaxPages;
4083 pReq->cSharedPages = pGMM->cDuplicatePages;
4084 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4085
4086 return VINF_SUCCESS;
4087}
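The hypervisor-wide free count is derived rather than tracked: each chunk holds `1 << (GMM_CHUNK_SHIFT - PAGE_SHIFT)` pages, so the pages backed by all chunks minus the allocated pages gives the free ones. Assuming 2 MiB chunks (a `GMM_CHUNK_SHIFT` of 21) and 4 KiB pages (a `PAGE_SHIFT` of 12), each chunk holds 512 pages; both constants below are assumptions, not values taken from this file:

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kChunkShift = 21;   // assumed: 2 MiB chunks
constexpr unsigned kPageShift  = 12;   // assumed: 4 KiB pages

// Free pages = pages backed by all chunks - pages handed out.
constexpr uint64_t freePages(uint32_t cChunks, uint64_t cAllocatedPages)
{
    return (uint64_t(cChunks) << (kChunkShift - kPageShift)) - cAllocatedPages;
}
```

The per-VM variant, GMMR0QueryMemoryStatsReq, instead reports `Reserved.cBasePages - Allocated.cBasePages`, i.e. the headroom left within the VM's reservation.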
4088
4089
4090/**
4091 * Return memory statistics for the VM
4092 *
4093 * @returns VBox status code.
4094 * @param pGVM The global (ring-0) VM structure.
4095 * @param idCpu Cpu id.
4096 * @param pReq Pointer to the request packet.
4097 *
4098 * @thread EMT(idCpu)
4099 */
4100GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
4101{
4102 /*
4103 * Validate input and pass it on.
4104 */
4105 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4106 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4107 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4108 VERR_INVALID_PARAMETER);
4109
4110 /*
4111 * Validate input and get the basics.
4112 */
4113 PGMM pGMM;
4114 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4115 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4116 if (RT_FAILURE(rc))
4117 return rc;
4118
4119 /*
4120 * Take the semaphore and do some more validations.
4121 */
4122 gmmR0MutexAcquire(pGMM);
4123 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4124 {
4125 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4126 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4127 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4128 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4129 }
4130 else
4131 rc = VERR_GMM_IS_NOT_SANE;
4132
4133 gmmR0MutexRelease(pGMM);
4134 LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4135 return rc;
4136}
4137
4138
4139/**
4140 * Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
4141 *
4142 * Don't call this in legacy allocation mode!
4143 *
4144 * @returns VBox status code.
4145 * @param pGMM Pointer to the GMM instance data.
4146 * @param pGVM Pointer to the Global VM structure.
4147 * @param pChunk Pointer to the chunk to be unmapped.
4148 */
4149static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4150{
4151 RT_NOREF_PV(pGMM);
4152#ifdef GMM_WITH_LEGACY_MODE
4153 Assert(!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE));
4154#endif
4155
4156 /*
4157 * Find the mapping and try unmapping it.
4158 */
4159 uint32_t cMappings = pChunk->cMappingsX;
4160 for (uint32_t i = 0; i < cMappings; i++)
4161 {
4162 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4163 if (pChunk->paMappingsX[i].pGVM == pGVM)
4164 {
4165 /* unmap */
4166 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4167 if (RT_SUCCESS(rc))
4168 {
4169 /* update the record. */
4170 cMappings--;
4171 if (i < cMappings)
4172 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4173 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4174 pChunk->paMappingsX[cMappings].pGVM = NULL;
4175 Assert(pChunk->cMappingsX - 1U == cMappings);
4176 pChunk->cMappingsX = cMappings;
4177 }
4178
4179 return rc;
4180 }
4181 }
4182
4183 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4184 return VERR_GMM_CHUNK_NOT_MAPPED;
4185}
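gmmR0UnmapChunkLocked keeps the mapping array unordered and removes an entry by moving the last element into the vacated slot and clearing the old tail, so a removal costs O(1) once the entry is found. A generic sketch of that swap-with-last idiom on a plain array, with a hypothetical `Mapping` struct in place of GMMCHUNKMAP (the real code only updates the array after RTR0MemObjFree succeeds):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a chunk mapping record.
struct Mapping {
    const void *pOwner;   // who the mapping belongs to (nullptr = unused)
    uintptr_t   hMapObj;  // opaque handle (0 = nil)
};

// Removes the entry owned by pOwner from a[0..*pc), returning false if absent.
// Order is not preserved: the last entry fills the gap.
bool removeMapping(Mapping *a, uint32_t *pc, const void *pOwner)
{
    for (uint32_t i = 0; i < *pc; i++) {
        if (a[i].pOwner != pOwner)
            continue;
        uint32_t cLeft = *pc - 1;
        if (i < cLeft)
            a[i] = a[cLeft];          // move the tail into the hole
        a[cLeft].pOwner  = nullptr;   // clear the now-unused tail slot
        a[cLeft].hMapObj = 0;
        *pc = cLeft;
        return true;
    }
    return false;
}
```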
4186
4187
4188/**
4189 * Unmaps a chunk previously mapped into the address space of the current process.
4190 *
4191 * @returns VBox status code.
4192 * @param pGMM Pointer to the GMM instance data.
4193 * @param pGVM Pointer to the Global VM structure.
4194 * @param pChunk Pointer to the chunk to be unmapped.
4195 * @param fRelaxedSem Whether we can release the semaphore while doing the
4196 * unmapping (@c true) or not.
4197 */
4198static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4199{
4200#ifdef GMM_WITH_LEGACY_MODE
4201 if (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4202 {
4203#endif
4204 /*
4205 * Lock the chunk and if possible leave the giant GMM lock.
4206 */
4207 GMMR0CHUNKMTXSTATE MtxState;
4208 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4209 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4210 if (RT_SUCCESS(rc))
4211 {
4212 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4213 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4214 }
4215 return rc;
4216#ifdef GMM_WITH_LEGACY_MODE
4217 }
4218
4219 if (pChunk->hGVM == pGVM->hSelf)
4220 return VINF_SUCCESS;
4221
4222 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4223 return VERR_GMM_CHUNK_NOT_MAPPED;
4224#endif
4225}
4226
4227
4228/**
4229 * Worker for gmmR0MapChunk.
4230 *
4231 * @returns VBox status code.
4232 * @param pGMM Pointer to the GMM instance data.
4233 * @param pGVM Pointer to the Global VM structure.
4234 * @param pChunk Pointer to the chunk to be mapped.
4235 * @param ppvR3 Where to store the ring-3 address of the mapping.
4236 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4237 * contain the address of the existing mapping.
4238 */
4239static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4240{
4241#ifdef GMM_WITH_LEGACY_MODE
4242 /*
4243 * If we're in legacy mode this is simple.
4244 */
4245 if (pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4246 {
4247 if (pChunk->hGVM != pGVM->hSelf)
4248 {
4249 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4250 return VERR_GMM_CHUNK_NOT_FOUND;
4251 }
4252
4253 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4254 return VINF_SUCCESS;
4255 }
4256#else
4257 RT_NOREF(pGMM);
4258#endif
4259
4260 /*
4261 * Check to see if the chunk is already mapped.
4262 */
4263 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4264 {
4265 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4266 if (pChunk->paMappingsX[i].pGVM == pGVM)
4267 {
4268 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4269 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4270#ifdef VBOX_WITH_PAGE_SHARING
4271 /* The ring-3 chunk cache can be out of sync; don't fail. */
4272 return VINF_SUCCESS;
4273#else
4274 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4275#endif
4276 }
4277 }
4278
4279 /*
4280 * Do the mapping.
4281 */
4282 RTR0MEMOBJ hMapObj;
4283 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4284 if (RT_SUCCESS(rc))
4285 {
4286 /* reallocate the array? assumes few users per chunk (usually one). */
4287 unsigned iMapping = pChunk->cMappingsX;
4288 if ( iMapping <= 3
4289 || (iMapping & 3) == 0)
4290 {
4291 unsigned cNewSize = iMapping <= 3
4292 ? iMapping + 1
4293 : iMapping + 4;
4294 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4295 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4296 {
4297 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4298 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4299 }
4300
4301 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4302 if (RT_UNLIKELY(!pvMappings))
4303 {
4304 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4305 return VERR_NO_MEMORY;
4306 }
4307 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4308 }
4309
4310 /* insert new entry */
4311 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4312 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4313 Assert(pChunk->cMappingsX == iMapping);
4314 pChunk->cMappingsX = iMapping + 1;
4315
4316 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4317 }
4318
4319 return rc;
4320}
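The reallocation test in gmmR0MapChunkLocked grows the mapping array one slot at a time for the first four mappings and then in steps of four, matching the assumption that most chunks are mapped by a single process. The capacity this produces can be computed directly; the helper names below are hypothetical:

```cpp
#include <cassert>

// True when inserting at index iMapping requires a reallocation
// (same condition as the code: iMapping <= 3 || (iMapping & 3) == 0).
constexpr bool needsRealloc(unsigned iMapping)
{
    return iMapping <= 3 || (iMapping & 3) == 0;
}

// New element count requested when needsRealloc() is true.
constexpr unsigned newCapacity(unsigned iMapping)
{
    return iMapping <= 3 ? iMapping + 1 : iMapping + 4;
}

// Capacity after cEntries insertions, starting from an empty array.
unsigned capacityAfter(unsigned cEntries)
{
    unsigned cap = 0;
    for (unsigned i = 0; i < cEntries; i++)
        if (needsRealloc(i))
            cap = newCapacity(i);
    return cap;
}
```

Capacities therefore run 1, 2, 3, 4, 8, 12, ...; every insert at index `i` finds at least `i + 1` slots, which is what the unconditional `paMappingsX[iMapping]` store relies on.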
4321
4322
4323/**
4324 * Maps a chunk into the user address space of the current process.
4325 *
4326 * @returns VBox status code.
4327 * @param pGMM Pointer to the GMM instance data.
4328 * @param pGVM Pointer to the Global VM structure.
4329 * @param pChunk Pointer to the chunk to be mapped.
4330 * @param fRelaxedSem Whether we can release the semaphore while doing the
4331 * mapping (@c true) or not.
4332 * @param ppvR3 Where to store the ring-3 address of the mapping.
4333 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4334 * contain the address of the existing mapping.
4335 */
4336static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4337{
4338 /*
4339 * Take the chunk lock and leave the giant GMM lock when possible, then
4340 * call the worker function.
4341 */
4342 GMMR0CHUNKMTXSTATE MtxState;
4343 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4344 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4345 if (RT_SUCCESS(rc))
4346 {
4347 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4348 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4349 }
4350
4351 return rc;
4352}
4353
4354
4355
4356#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4357/**
4358 * Check if a chunk is mapped into the specified VM
4359 *
4360 * @returns mapped yes/no
4361 * @param pGMM Pointer to the GMM instance.
4362 * @param pGVM Pointer to the Global VM structure.
4363 * @param pChunk Pointer to the chunk to be mapped.
4364 * @param ppvR3 Where to store the ring-3 address of the mapping.
4365 */
4366static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4367{
4368 GMMR0CHUNKMTXSTATE MtxState;
4369 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4370 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4371 {
4372 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4373 if (pChunk->paMappingsX[i].pGVM == pGVM)
4374 {
4375 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4376 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4377 return true;
4378 }
4379 }
4380 *ppvR3 = NULL;
4381 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4382 return false;
4383}
4384#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4385
4386
4387/**
4388 * Map a chunk and/or unmap another chunk.
4389 *
4390 * The mapping and unmapping applies to the current process.
4391 *
4392 * This API does two things because it saves a kernel call per mapping when
4393 * when the ring-3 mapping cache is full.
4394 *
4395 * @returns VBox status code.
4396 * @param pGVM The global (ring-0) VM structure.
4397 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4398 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4399 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4400 * @thread EMT ???
4401 */
4402GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4403{
4404 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4405 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4406
4407 /*
4408 * Validate input and get the basics.
4409 */
4410 PGMM pGMM;
4411 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4412 int rc = GVMMR0ValidateGVM(pGVM);
4413 if (RT_FAILURE(rc))
4414 return rc;
4415
4416 AssertCompile(NIL_GMM_CHUNKID == 0);
4417 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4418 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4419
4420 if ( idChunkMap == NIL_GMM_CHUNKID
4421 && idChunkUnmap == NIL_GMM_CHUNKID)
4422 return VERR_INVALID_PARAMETER;
4423
4424 if (idChunkMap != NIL_GMM_CHUNKID)
4425 {
4426 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4427 *ppvR3 = NIL_RTR3PTR;
4428 }
4429
4430 /*
4431 * Take the semaphore and do the work.
4432 *
4433 * The unmapping is done last since it's easier to undo a mapping than
4434 * undoing an unmapping. The ring-3 mapping cache cannot be so big
4435 * that it pushes the user virtual address space to within a chunk of
4436 * its limits, so there is no problem here.
4437 */
4438 gmmR0MutexAcquire(pGMM);
4439 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4440 {
4441 PGMMCHUNK pMap = NULL;
4442 if (idChunkMap != NIL_GMM_CHUNKID)
4443 {
4444 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4445 if (RT_LIKELY(pMap))
4446 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4447 else
4448 {
4449 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4450 rc = VERR_GMM_CHUNK_NOT_FOUND;
4451 }
4452 }
4453/** @todo split this operation, the bail out might (theoretically) not be
4454 * entirely safe. */
4455
4456 if ( idChunkUnmap != NIL_GMM_CHUNKID
4457 && RT_SUCCESS(rc))
4458 {
4459 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4460 if (RT_LIKELY(pUnmap))
4461 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4462 else
4463 {
4464 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4465 rc = VERR_GMM_CHUNK_NOT_FOUND;
4466 }
4467
4468 if (RT_FAILURE(rc) && pMap)
4469 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4470 }
4471
4472 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4473 }
4474 else
4475 rc = VERR_GMM_IS_NOT_SANE;
4476 gmmR0MutexRelease(pGMM);
4477
4478 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4479 return rc;
4480}
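GMMR0MapUnmapChunk maps first and unmaps second, and if the unmap fails it undoes the mapping it just made, because undoing a mapping is easier than undoing an unmapping. A sketch of that ordering against a toy per-process mapping set; `MapCache`, its `idFailUnmap` test hook and the chunk IDs are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

constexpr uint32_t kNilChunk = 0;   // mirrors NIL_GMM_CHUNKID == 0

struct MapCache {
    std::set<uint32_t> mapped;            // chunk IDs mapped into this process
    uint32_t idFailUnmap = kNilChunk;     // test hook: unmapping this ID fails

    bool map(uint32_t id) { mapped.insert(id); return true; }
    bool unmap(uint32_t id)
    {
        if (id == idFailUnmap || mapped.erase(id) == 0)
            return false;
        return true;
    }
};

// Map idMap (if not nil), then unmap idUnmap (if not nil); on unmap failure
// roll back the mapping made by this call.
bool mapUnmap(MapCache &c, uint32_t idMap, uint32_t idUnmap)
{
    if (idMap == kNilChunk && idUnmap == kNilChunk)
        return false;                     // nothing to do is an invalid request
    bool fMapped = false;
    if (idMap != kNilChunk) {
        if (!c.map(idMap))
            return false;
        fMapped = true;
    }
    if (idUnmap != kNilChunk && !c.unmap(idUnmap)) {
        if (fMapped)
            c.unmap(idMap);               // undo the new mapping
        return false;
    }
    return true;
}
```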
4481
4482
4483/**
4484 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4485 *
4486 * @returns see GMMR0MapUnmapChunk.
4487 * @param pGVM The global (ring-0) VM structure.
4488 * @param pReq Pointer to the request packet.
4489 */
4490GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4491{
4492 /*
4493 * Validate input and pass it on.
4494 */
4495 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4496 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4497
4498 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4499}
4500
4501
4502/**
4503 * Legacy mode API for supplying pages.
4504 *
4505 * The specified user address points to an allocation-chunk-sized block that
4506 * will be locked down and used by the GMM when the GM asks for pages.
4507 *
4508 * @returns VBox status code.
4509 * @param pGVM The global (ring-0) VM structure.
4510 * @param idCpu The VCPU id.
4511 * @param pvR3 Pointer to the chunk size memory block to lock down.
4512 */
4513GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4514{
4515#ifdef GMM_WITH_LEGACY_MODE
4516 /*
4517 * Validate input and get the basics.
4518 */
4519 PGMM pGMM;
4520 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4521 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4522 if (RT_FAILURE(rc))
4523 return rc;
4524
4525 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4526 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4527
4528 if (!pGMM->fLegacyAllocationMode)
4529 {
4530 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4531 return VERR_NOT_SUPPORTED;
4532 }
4533
4534 /*
4535 * Lock the memory and add it as new chunk with our hGVM.
4536 * (The GMM locking is done inside gmmR0RegisterChunk.)
4537 */
4538 RTR0MEMOBJ hMemObj;
4539 rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4540 if (RT_SUCCESS(rc))
4541 {
4542 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, pGVM->pSession, GMM_CHUNK_FLAGS_SEEDED, NULL);
4543 if (RT_SUCCESS(rc))
4544 gmmR0MutexRelease(pGMM);
4545 else
4546 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4547 }
4548
4549 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4550 return rc;
4551#else
4552 RT_NOREF(pGVM, idCpu, pvR3);
4553 return VERR_NOT_SUPPORTED;
4554#endif
4555}
4556
4557
4558#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
4559/**
4560 * Gets the ring-0 virtual address for the given page.
4561 *
4562 * This is used by PGM when IEM and such want to access guest RAM from ring-0.
4563 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4564 * corresponding chunk will remain valid beyond the call (at least till the EMT
4565 * returns to ring-3).
4566 *
4567 * @returns VBox status code.
4568 * @param pGVM Pointer to the kernel-only VM instance data.
4569 * @param idPage The page ID.
4570 * @param ppv Where to store the address.
4571 * @thread EMT
4572 */
4573GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4574{
4575 *ppv = NULL;
4576 PGMM pGMM;
4577 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4578
4579 uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
4580
4581 /*
4582 * Start with the per-VM TLB.
4583 */
4584 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
4585
4586 PGMMPERVMCHUNKTLBE pTlbe = &pGVM->gmm.s.aChunkTlbEntries[GMMPERVM_CHUNKTLB_IDX(idChunk)];
4587 PGMMCHUNK pChunk = pTlbe->pChunk;
4588 if ( pChunk != NULL
4589 && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
4590 && pChunk->Core.Key == idChunk)
4591 pGVM->R0Stats.gmm.cChunkTlbHits++; /* hopefully this is a likely outcome */
4592 else
4593 {
4594 pGVM->R0Stats.gmm.cChunkTlbMisses++;
4595
4596 /*
4597 * Look it up in the chunk tree.
4598 */
4599 RTSpinlockAcquire(pGMM->hSpinLockTree);
4600 pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
4601 if (RT_LIKELY(pChunk))
4602 {
4603 pTlbe->idGeneration = pGMM->idFreeGeneration;
4604 RTSpinlockRelease(pGMM->hSpinLockTree);
4605 pTlbe->pChunk = pChunk;
4606 }
4607 else
4608 {
4609 RTSpinlockRelease(pGMM->hSpinLockTree);
4610 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4611 AssertMsgFailed(("idPage=%#x\n", idPage));
4612 return VERR_GMM_PAGE_NOT_FOUND;
4613 }
4614 }
4615
4616 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4617
4618 /*
4619 * Got a chunk; now validate the page ownership and calculate its address.
4620 */
4621 const GMMPAGE * const pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4622 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4623 && pPage->Private.hGVM == pGVM->hSelf)
4624 || GMM_PAGE_IS_SHARED(pPage)))
4625 {
4626 AssertPtr(pChunk->pbMapping);
4627 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4628 return VINF_SUCCESS;
4629 }
4630 AssertMsgFailed(("idPage=%#x is-private=%RTbool Private.hGVM=%u pGVM->hSelf=%u\n",
4631 idPage, GMM_PAGE_IS_PRIVATE(pPage), pPage->Private.hGVM, pGVM->hSelf));
4632 return VERR_GMM_NOT_PAGE_OWNER;
4633}
4634#endif /* !VBOX_WITH_LINEAR_HOST_PHYS_MEM */
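GMMR0PageIdToVirt splits the page ID into a chunk ID (`idPage >> GMM_CHUNKID_SHIFT`) and a page index (`idPage & GMM_PAGEID_IDX_MASK`). It then consults a small per-VM TLB whose entries are trusted only while their recorded generation matches the global free generation and the chunk ID still matches, and finally offsets into the chunk's ring-0 mapping. A single-threaded sketch of that lookup (the real code holds two spinlocks), assuming 512 pages per chunk and 4 KiB pages; the TLB size, the modulo index function and the `lookupChunk` callback are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr unsigned kPageShift    = 12;                        // assumed 4 KiB pages
constexpr unsigned kChunkIdShift = 9;                         // assumed 512 pages/chunk
constexpr uint32_t kPageIdxMask  = (1u << kChunkIdShift) - 1;
constexpr size_t   kTlbEntries   = 8;                         // hypothetical TLB size

struct Chunk { uint32_t id; uint8_t *pbMapping; };

struct TlbEntry { const Chunk *pChunk = nullptr; uint64_t idGeneration = 0; };

struct ChunkTlb {
    TlbEntry aEntries[kTlbEntries];
    unsigned cHits = 0, cMisses = 0;

    // Returns the chunk for idChunk, refilling the entry on a miss. An entry is
    // stale if the generation moved on (a chunk was freed) or the ID differs.
    const Chunk *get(uint32_t idChunk, uint64_t idCurGen,
                     const std::function<const Chunk *(uint32_t)> &lookupChunk)
    {
        TlbEntry &e = aEntries[idChunk % kTlbEntries];
        if (e.pChunk && e.idGeneration == idCurGen && e.pChunk->id == idChunk) {
            cHits++;
            return e.pChunk;
        }
        cMisses++;
        const Chunk *p = lookupChunk(idChunk);
        if (p) {
            e.pChunk = p;
            e.idGeneration = idCurGen;
        }
        return p;
    }
};

// Address of a page inside its chunk's mapping.
inline uint8_t *pageAddress(const Chunk &c, uint32_t idPage)
{
    return c.pbMapping + (size_t(idPage & kPageIdxMask) << kPageShift);
}
```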
4635
4636#ifdef VBOX_WITH_PAGE_SHARING
4637
4638# ifdef VBOX_STRICT
4639/**
4640 * For checksumming shared pages in strict builds.
4641 *
4642 * The purpose is making sure that a page doesn't change.
4643 *
4644 * @returns Checksum, 0 on failure.
4645 * @param pGMM The GMM instance data.
4646 * @param pGVM Pointer to the kernel-only VM instance data.
4647 * @param idPage The page ID.
4648 */
4649static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4650{
4651 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4652 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4653
4654 uint8_t *pbChunk;
4655 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4656 return 0;
4657 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4658
4659 return RTCrc32(pbPage, PAGE_SIZE);
4660}
4661# endif /* VBOX_STRICT */
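In strict builds a shared page's checksum is the CRC-32 of its contents. Only the low 14 bits are kept (`u14Checksum`), and the free path compares `crc & 0x3fff` against them, not treating a zero result (which gmmR0StrictPageChecksum returns on failure) as a mismatch. A sketch of that scheme with a plain bitwise CRC-32 (reflected polynomial 0xEDB88320) standing in for `RTCrc32`, which is assumed, not verified here, to compute the same standard CRC-32:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise (table-less) CRC-32, reflected polynomial 0xEDB88320.
uint32_t crc32(const void *pv, size_t cb)
{
    const uint8_t *pb = static_cast<const uint8_t *>(pv);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < cb; i++) {
        crc ^= pb[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// The 14-bit value that would be stored with a shared page.
inline uint32_t checksum14(const void *pv, size_t cb)
{
    return crc32(pv, cb) & UINT32_C(0x00003fff);
}

// Same acceptance rule as the strict-build assertion in the free path.
inline bool checksumMatches(uint32_t uNow14, uint32_t uStored14)
{
    return uNow14 == 0 || uNow14 == uStored14;
}
```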
4662
4663
4664/**
4665 * Calculates the module hash value.
4666 *
4667 * @returns Hash value.
4668 * @param pszModuleName The module name.
4669 * @param pszVersion The module version string.
4670 */
4671static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4672{
4673 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4674}
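gmmR0ShModCalcHash hashes the three pieces `name`, `"::"` and `version` as if they were one string, so no temporary concatenated key has to be built. The exact IPRT hash behind `RTStrHash1ExN` is not shown in this file; the sketch below uses the classic sdbm string hash as an assumed stand-in to show why incremental hashing over pieces gives the same result as hashing the joined key:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <initializer_list>

// sdbm string hash continued from a previous state (stand-in, not IPRT's code).
uint32_t sdbmUpdate(uint32_t uHash, const char *psz, size_t cch)
{
    for (size_t i = 0; i < cch; i++)
        uHash = uint8_t(psz[i]) + (uHash << 6) + (uHash << 16) - uHash;
    return uHash;
}

// Hash several pieces as if they were one concatenated string.
uint32_t hashPieces(std::initializer_list<const char *> pieces)
{
    uint32_t uHash = 0;
    for (const char *psz : pieces)
        uHash = sdbmUpdate(uHash, psz, std::strlen(psz));
    return uHash;
}

// Module key in the same "name::version" shape as gmmR0ShModCalcHash.
uint32_t moduleHash(const char *pszName, const char *pszVersion)
{
    return hashPieces({pszName, "::", pszVersion});
}
```

The separator matters: without `"::"`, a name/version pair such as `"ab"`/`"c"` would hash the same as `"a"`/`"bc"`.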
4675
4676
4677/**
4678 * Finds a global module.
4679 *
4680 * @returns Pointer to the global module on success, NULL if not found.
4681 * @param pGMM The GMM instance data.
4682 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4683 * @param cbModule The module size.
4684 * @param enmGuestOS The guest OS type.
4685 * @param cRegions The number of regions.
4686 * @param pszModuleName The module name.
4687 * @param pszVersion The module version.
4688 * @param paRegions The region descriptions.
4689 */
4690static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4691 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4692 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4693{
4694 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4695 pGblMod;
4696 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4697 {
4698 if (pGblMod->cbModule != cbModule)
4699 continue;
4700 if (pGblMod->enmGuestOS != enmGuestOS)
4701 continue;
4702 if (pGblMod->cRegions != cRegions)
4703 continue;
4704 if (strcmp(pGblMod->szName, pszModuleName))
4705 continue;
4706 if (strcmp(pGblMod->szVersion, pszVersion))
4707 continue;
4708
4709 uint32_t i;
4710 for (i = 0; i < cRegions; i++)
4711 {
4712 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4713 if (pGblMod->aRegions[i].off != off)
4714 break;
4715
4716 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4717 if (pGblMod->aRegions[i].cb != cb)
4718 break;
4719 }
4720
4721 if (i == cRegions)
4722 return pGblMod;
4723 }
4724
4725 return NULL;
4726}
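gmmR0ShModFindGlobal and gmmR0ShModNewGlobal reduce each region to two values: its offset within the first page and its page-rounded length. As a result, the same module loaded at different guest addresses in different VMs still compares equal. A minimal sketch of that normalization follows, assuming 4 KiB pages. The SKETCHREGION type and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE         4096u
#define SKETCH_PAGE_OFFSET_MASK  (SKETCH_PAGE_SIZE - 1u)

/* Page-relative description of one module region (mirrors off/cb). */
typedef struct SKETCHREGION
{
    uint32_t off;   /* offset of the region start within its first page */
    uint32_t cb;    /* bytes covered from that page start, rounded up to whole pages */
} SKETCHREGION;

static SKETCHREGION sketchNormalizeRegion(uint64_t GCRegionAddr, uint32_t cbRegion)
{
    SKETCHREGION Rgn;
    assert(cbRegion <= UINT32_C(0x40000000)); /* callers already cap regions at 1 GiB */
    Rgn.off = (uint32_t)(GCRegionAddr & SKETCH_PAGE_OFFSET_MASK);
    /* Same rounding as RT_ALIGN_32(x, PAGE_SIZE): add PAGE_SIZE - 1, clear the low bits. */
    Rgn.cb  = (cbRegion + Rgn.off + SKETCH_PAGE_OFFSET_MASK) & ~SKETCH_PAGE_OFFSET_MASK;
    return Rgn;
}

/* Two region lists describe the same layout if every off/cb pair matches. */
static int sketchRegionsMatch(const SKETCHREGION *paA, const SKETCHREGION *paB, uint32_t cRegions)
{
    for (uint32_t i = 0; i < cRegions; i++)
        if (paA[i].off != paB[i].off || paA[i].cb != paB[i].cb)
            return 0;
    return 1;
}
```

For example, a 0x1000-byte region starting at 0x10000234 gives off = 0x234 and cb = 0x2000. The same region loaded at 0x7ff00234 gives the same pair.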
4727
4728
4729/**
4730 * Creates a new global module.
4731 *
4732 * @returns VBox status code.
4733 * @param pGMM The GMM instance data.
4734 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4735 * @param cbModule The module size.
4736 * @param enmGuestOS The guest OS type.
4737 * @param cRegions The number of regions.
4738 * @param pszModuleName The module name.
4739 * @param pszVersion The module version.
4740 * @param paRegions The region descriptions.
4741 * @param ppGblMod Where to return the new module on success.
4742 */
4743static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4744 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4745 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4746{
4747 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4748 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4749 {
4750 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4751 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4752 }
4753
4754 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4755 if (!pGblMod)
4756 {
4757 Log(("gmmR0ShModNewGlobal: No memory\n"));
4758 return VERR_NO_MEMORY;
4759 }
4760
4761 pGblMod->Core.Key = uHash;
4762 pGblMod->cbModule = cbModule;
4763 pGblMod->cRegions = cRegions;
4764 pGblMod->cUsers = 1;
4765 pGblMod->enmGuestOS = enmGuestOS;
4766 strcpy(pGblMod->szName, pszModuleName);
4767 strcpy(pGblMod->szVersion, pszVersion);
4768
4769 for (uint32_t i = 0; i < cRegions; i++)
4770 {
4771 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4772 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4773 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4774 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4775 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4776 }
4777
4778 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4779 Assert(fInsert); NOREF(fInsert);
4780 pGMM->cShareableModules++;
4781
4782 *ppGblMod = pGblMod;
4783 return VINF_SUCCESS;
4784}
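Both module structures end in a variable-length array sized with RT_UOFFSETOF_DYN, so a single allocation holds the header and all of the region entries. The sketch below is a rough C99 equivalent that uses a flexible array member and offsetof; the real structures declare a one-element trailing array instead. SKETCHMODULE, SKETCHRGN and the helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct SKETCHRGN
{
    uint32_t  off;
    uint32_t  cb;
    uint32_t *paidPages;    /* allocated lazily, like the real paidPages */
} SKETCHRGN;

typedef struct SKETCHMODULE
{
    uint32_t  cUsers;
    uint32_t  cRegions;
    SKETCHRGN aRegions[];   /* cRegions entries follow the header */
} SKETCHMODULE;

/* Bytes needed for cRegions trailing entries: the offset of aRegions[cRegions]. */
static size_t sketchModuleSize(uint32_t cRegions)
{
    return offsetof(SKETCHMODULE, aRegions) + (size_t)cRegions * sizeof(SKETCHRGN);
}

/* Zeroed allocation (like RTMemAllocZ); the creator holds the first reference. */
static SKETCHMODULE *sketchModuleAlloc(uint32_t cRegions)
{
    SKETCHMODULE *pMod = (SKETCHMODULE *)calloc(1, sketchModuleSize(cRegions));
    if (pMod)
    {
        pMod->cUsers   = 1;
        pMod->cRegions = cRegions;
    }
    return pMod;
}
```

Zeroing the allocation leaves every paidPages pointer NULL. The code relies on that to defer the per-region page ID arrays until a page is first checked.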
4785
4786
4787/**
4788 * Deletes a global module which is no longer referenced by anyone.
4789 *
4790 * @param pGMM The GMM instance data.
4791 * @param pGblMod The module to delete.
4792 */
4793static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4794{
4795 Assert(pGblMod->cUsers == 0);
4796 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4797
4798 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4799 Assert(pvTest == pGblMod); NOREF(pvTest);
4800 pGMM->cShareableModules--;
4801
4802 uint32_t i = pGblMod->cRegions;
4803 while (i-- > 0)
4804 {
4805 if (pGblMod->aRegions[i].paidPages)
4806 {
4807 /* We don't do anything to the pages as they are handled by the
4808 copy-on-write mechanism in PGM. */
4809 RTMemFree(pGblMod->aRegions[i].paidPages);
4810 pGblMod->aRegions[i].paidPages = NULL;
4811 }
4812 }
4813 RTMemFree(pGblMod);
4814}
4815
4816
4817static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4818 PGMMSHAREDMODULEPERVM *ppRecVM)
4819{
4820 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4821 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4822
4823 PGMMSHAREDMODULEPERVM pRecVM;
4824 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4825 if (!pRecVM)
4826 return VERR_NO_MEMORY;
4827
4828 pRecVM->Core.Key = GCBaseAddr;
4829 for (uint32_t i = 0; i < cRegions; i++)
4830 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4831
4832 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4833 Assert(fInsert); NOREF(fInsert);
4834 pGVM->gmm.s.Stats.cShareableModules++;
4835
4836 *ppRecVM = pRecVM;
4837 return VINF_SUCCESS;
4838}
4839
4840
4841static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4842{
4843 /*
4844 * Free the per-VM module.
4845 */
4846 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4847 pRecVM->pGlobalModule = NULL;
4848
4849 if (fRemove)
4850 {
4851 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4852 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4853 }
4854
4855 RTMemFree(pRecVM);
4856
4857 /*
4858 * Release the global module.
4859 * (In the registration bailout case, it might not be.)
4860 */
4861 if (pGblMod)
4862 {
4863 Assert(pGblMod->cUsers > 0);
4864 pGblMod->cUsers--;
4865 if (pGblMod->cUsers == 0)
4866 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4867 }
4868}
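Global modules are reference counted. Registration either creates a module with cUsers = 1 or increments cUsers on a match. gmmR0ShModDeletePerVM drops one reference and deletes the module when the count reaches zero. The simplified sketch below shows that lifecycle. It uses a singly linked list in place of the AVL tree and matches on name only, whereas the real lookup also compares version, size, OS family and region layout. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct SKETCHGBLMOD
{
    struct SKETCHGBLMOD *pNext;
    uint32_t             cUsers;
    char                 szName[32];
} SKETCHGBLMOD;

typedef struct SKETCHGMM
{
    SKETCHGBLMOD *pHead;     /* stands in for the tree keyed by hash */
    uint32_t      cModules;  /* like cShareableModules */
} SKETCHGMM;

/* Returns a referenced global module, creating it on first use. */
static SKETCHGBLMOD *sketchGblRetain(SKETCHGMM *pGmm, const char *pszName)
{
    for (SKETCHGBLMOD *pCur = pGmm->pHead; pCur; pCur = pCur->pNext)
        if (!strcmp(pCur->szName, pszName))
        {
            pCur->cUsers++;
            return pCur;
        }
    SKETCHGBLMOD *pNew = (SKETCHGBLMOD *)calloc(1, sizeof(*pNew));
    if (!pNew)
        return NULL;
    strncpy(pNew->szName, pszName, sizeof(pNew->szName) - 1); /* calloc keeps it terminated */
    pNew->cUsers = 1;                  /* the reference handed to the caller */
    pNew->pNext  = pGmm->pHead;
    pGmm->pHead  = pNew;
    pGmm->cModules++;
    return pNew;
}

/* Drops one reference; unlinks and frees the module when the last user goes. */
static void sketchGblRelease(SKETCHGMM *pGmm, SKETCHGBLMOD *pMod)
{
    assert(pMod->cUsers > 0);
    if (--pMod->cUsers != 0)
        return;
    for (SKETCHGBLMOD **ppCur = &pGmm->pHead; *ppCur; ppCur = &(*ppCur)->pNext)
        if (*ppCur == pMod)
        {
            *ppCur = pMod->pNext;
            break;
        }
    pGmm->cModules--;
    free(pMod);
}
```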
4869
4870#endif /* VBOX_WITH_PAGE_SHARING */
4871
4872/**
4873 * Registers a new shared module for the VM.
4874 *
4875 * @returns VBox status code.
4876 * @param pGVM The global (ring-0) VM structure.
4877 * @param idCpu The VCPU id.
4878 * @param enmGuestOS The guest OS type.
4879 * @param pszModuleName The module name.
4880 * @param pszVersion The module version.
4881 * @param GCPtrModBase The module base address.
4882 * @param cbModule The module size.
4883 * @param cRegions The number of shared region descriptors.
4884 * @param paRegions Pointer to an array of shared region(s).
4885 * @thread EMT(idCpu)
4886 */
4887GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4888 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4889 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4890{
4891#ifdef VBOX_WITH_PAGE_SHARING
4892 /*
4893 * Validate input and get the basics.
4894 *
4895 * Note! Turns out the module size does not necessarily match the size of the
4896 * regions. (iTunes on XP)
4897 */
4898 PGMM pGMM;
4899 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4900 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4901 if (RT_FAILURE(rc))
4902 return rc;
4903
4904 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4905 return VERR_GMM_TOO_MANY_REGIONS;
4906
4907 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4908 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4909
4910 uint32_t cbTotal = 0;
4911 for (uint32_t i = 0; i < cRegions; i++)
4912 {
4913 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4914 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4915
4916 cbTotal += paRegions[i].cbRegion;
4917 if (RT_UNLIKELY(cbTotal > _1G))
4918 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4919 }
4920
4921 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4922 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4923 return VERR_GMM_MODULE_NAME_TOO_LONG;
4924
4925 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4926 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4927 return VERR_GMM_MODULE_NAME_TOO_LONG;
4928
4929 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4930 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4931
4932 /*
4933 * Take the semaphore and do some more validations.
4934 */
4935 gmmR0MutexAcquire(pGMM);
4936 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4937 {
4938 /*
4939 * Check if this module is already locally registered and register
4940 * it if it isn't. The base address is a unique module identifier
4941 * locally.
4942 */
4943 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4944 bool fNewModule = pRecVM == NULL;
4945 if (fNewModule)
4946 {
4947 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4948 if (RT_SUCCESS(rc))
4949 {
4950 /*
4951 * Find a matching global module, register a new one if needed.
4952 */
4953 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4954 pszModuleName, pszVersion, paRegions);
4955 if (!pGblMod)
4956 {
4957 Assert(fNewModule);
4958 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4959 pszModuleName, pszVersion, paRegions, &pGblMod);
4960 if (RT_SUCCESS(rc))
4961 {
4962 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4963 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4964 }
4965 else
4966 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4967 }
4968 else
4969 {
4970 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4971 pGblMod->cUsers++;
4972 pRecVM->pGlobalModule = pGblMod;
4973
4974 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4975 }
4976 }
4977 }
4978 else
4979 {
4980 /*
4981 * Attempt to re-register an existing module.
4982 */
4983 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4984 pszModuleName, pszVersion, paRegions);
4985 if (pRecVM->pGlobalModule == pGblMod)
4986 {
4987 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4988 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4989 }
4990 else
4991 {
4992 /** @todo may have to unregister+register when this happens in case it's caused
4993 * by VBoxService crashing and being restarted... */
4994 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4995 " incoming at %RGvLB%#x %s %s rgns %u\n"
4996 " existing at %RGvLB%#x %s %s rgns %u\n",
4997 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4998 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4999 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
5000 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
5001 }
5002 }
5003 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5004 }
5005 else
5006 rc = VERR_GMM_IS_NOT_SANE;
5007
5008 gmmR0MutexRelease(pGMM);
5009 return rc;
5010#else
5011
5012 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
5013 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
5014 return VERR_NOT_IMPLEMENTED;
5015#endif
5016}
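Before taking the mutex, GMMR0RegisterSharedModule caps each region and the running total at 1 GiB. It also checks that the name and version buffers contain a terminator, using memchr with the buffer size rather than strlen. The helpers in this standalone sketch are hypothetical; the limit mirrors the _1G constant used in the function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_1G  UINT32_C(0x40000000)

/* Rejects empty or > 1 GiB regions and totals above 1 GiB.  The running total
   is at most 1 GiB before a region of at most 1 GiB is added, so the uint32_t
   sum cannot wrap. */
static int sketchRegionSizesValid(const uint32_t *pacbRegions, uint32_t cRegions)
{
    uint32_t cbTotal = 0;
    for (uint32_t i = 0; i < cRegions; i++)
    {
        if (pacbRegions[i] == 0 || pacbRegions[i] > SKETCH_1G)
            return 0;
        cbTotal += pacbRegions[i];
        if (cbTotal > SKETCH_1G)
            return 0;
    }
    return 1;
}

/* True if psz is terminated within its cbBuf-byte buffer.  memchr never reads
   past cbBuf bytes, unlike strlen on an untrusted buffer. */
static int sketchStringFits(const char *psz, size_t cbBuf)
{
    assert(psz != NULL);
    return memchr(psz, '\0', cbBuf) != NULL;
}
```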
5017
5018
5019/**
5020 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
5021 *
5022 * @returns see GMMR0RegisterSharedModule.
5023 * @param pGVM The global (ring-0) VM structure.
5024 * @param idCpu The VCPU id.
5025 * @param pReq Pointer to the request packet.
5026 */
5027GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
5028{
5029 /*
5030 * Validate input and pass it on.
5031 */
5032 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5033 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
5034 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
5035 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5036
5037 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
5038 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
5039 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
5040 return VINF_SUCCESS;
5041}
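GMMR0RegisterSharedModuleReq accepts a packet only if two conditions hold: the packet covers the fixed part, and its declared size equals that fixed part plus exactly cRegions trailing descriptors. The sketch below applies that check to a hypothetical request layout (SKETCHREQ, SKETCHRGNDESC). It computes the expected size in 64-bit arithmetic so that a huge cRegions cannot wrap it; this is a choice made for the sketch, not taken from the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SKETCHRGNDESC
{
    uint64_t GCRegionAddr;
    uint32_t cbRegion;
    uint32_t u32Alignment;
} SKETCHRGNDESC;

typedef struct SKETCHREQ
{
    uint32_t      cbReq;       /* size claimed by the caller */
    uint32_t      cRegions;    /* number of trailing descriptors */
    SKETCHRGNDESC aRegions[];  /* variable-length tail */
} SKETCHREQ;

/* cRegions is only read once cbReq shows the fixed header is present. */
static int sketchReqSizeValid(const SKETCHREQ *pReq)
{
    assert(pReq != NULL);
    if (pReq->cbReq < sizeof(*pReq))
        return 0;
    return (uint64_t)pReq->cbReq
        == (uint64_t)offsetof(SKETCHREQ, aRegions) + (uint64_t)pReq->cRegions * sizeof(SKETCHRGNDESC);
}
```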
5042
5043
5044/**
5045 * Unregisters a shared module for the VM.
5046 *
5047 * @returns VBox status code.
5048 * @param pGVM The global (ring-0) VM structure.
5049 * @param idCpu The VCPU id.
5050 * @param pszModuleName The module name.
5051 * @param pszVersion The module version.
5052 * @param GCPtrModBase The module base address.
5053 * @param cbModule The module size.
5054 */
5055GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
5056 RTGCPTR GCPtrModBase, uint32_t cbModule)
5057{
5058#ifdef VBOX_WITH_PAGE_SHARING
5059 /*
5060 * Validate input and get the basics.
5061 */
5062 PGMM pGMM;
5063 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5064 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5065 if (RT_FAILURE(rc))
5066 return rc;
5067
5068 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
5069 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
5070 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
5071 return VERR_GMM_MODULE_NAME_TOO_LONG;
5072 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
5073 return VERR_GMM_MODULE_NAME_TOO_LONG;
5074
5075 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
5076
5077 /*
5078 * Take the semaphore and do some more validations.
5079 */
5080 gmmR0MutexAcquire(pGMM);
5081 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5082 {
5083 /*
5084 * Locate and remove the specified module.
5085 */
5086 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
5087 if (pRecVM)
5088 {
5089 /** @todo Do we need to do more validations here, like that the
5090 * name + version + cbModule matches? */
5091 NOREF(cbModule);
5092 Assert(pRecVM->pGlobalModule);
5093 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
5094 }
5095 else
5096 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
5097
5098 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5099 }
5100 else
5101 rc = VERR_GMM_IS_NOT_SANE;
5102
5103 gmmR0MutexRelease(pGMM);
5104 return rc;
5105#else
5106
5107 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
5108 return VERR_NOT_IMPLEMENTED;
5109#endif
5110}
5111
5112
5113/**
5114 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
5115 *
5116 * @returns see GMMR0UnregisterSharedModule.
5117 * @param pGVM The global (ring-0) VM structure.
5118 * @param idCpu The VCPU id.
5119 * @param pReq Pointer to the request packet.
5120 */
5121GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
5122{
5123 /*
5124 * Validate input and pass it on.
5125 */
5126 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5127 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5128
5129 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
5130}
5131
5132#ifdef VBOX_WITH_PAGE_SHARING
5133
5134/**
5135 * Increases the use count of a shared page; the page is known to exist and be valid.
5136 *
5137 * @param pGMM Pointer to the GMM instance.
5138 * @param pGVM Pointer to the GVM instance.
5139 * @param pPage The page structure.
5140 */
5141DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
5142{
5143 Assert(pGMM->cSharedPages > 0);
5144 Assert(pGMM->cAllocatedPages > 0);
5145
5146 pGMM->cDuplicatePages++;
5147
5148 pPage->Shared.cRefs++;
5149 pGVM->gmm.s.Stats.cSharedPages++;
5150 pGVM->gmm.s.Stats.Allocated.cBasePages++;
5151}
5152
5153
5154/**
5155 * Converts a private page to a shared page; the page is known to exist and be valid.
5156 *
5157 * @param pGMM Pointer to the GMM instance.
5158 * @param pGVM Pointer to the GVM instance.
5159 * @param HCPhys Host physical address of the page.
5160 * @param idPage The page ID.
5161 * @param pPage The page structure.
5162 * @param pPageDesc Shared page descriptor
5163 */
5164DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
5165 PGMMSHAREDPAGEDESC pPageDesc)
5166{
5167 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
5168 Assert(pChunk);
5169 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
5170 Assert(GMM_PAGE_IS_PRIVATE(pPage));
5171
5172 pChunk->cPrivate--;
5173 pChunk->cShared++;
5174
5175 pGMM->cSharedPages++;
5176
5177 pGVM->gmm.s.Stats.cSharedPages++;
5178 pGVM->gmm.s.Stats.cPrivatePages--;
5179
5180 /* Modify the page structure. */
5181 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
5182 pPage->Shared.cRefs = 1;
5183#ifdef VBOX_STRICT
5184 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
5185 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5186#else
5187 NOREF(pPageDesc);
5188 pPage->Shared.u14Checksum = 0;
5189#endif
5190 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5191}
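gmmR0ConvertToSharedPage records the host page frame number and starts the reference count at 1. It keeps only 14 bits of the strict-build CRC, and GMMR0SharedModuleCheckPage later compares those bits against a masked CRC. The sketch below shows that bookkeeping on a hypothetical SKETCHSHAREDPAGE record. The field widths follow the names used in the code (u14Checksum, u2State), but the real GMMPAGE layout and state values are defined elsewhere, and the SKETCH_STATE_SHARED value is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT    12u
#define SKETCH_STATE_SHARED  2u    /* hypothetical state value */

typedef struct SKETCHSHAREDPAGE
{
    uint32_t pfn;                  /* host page frame number (32 bits cover 2^44 bytes) */
    uint32_t cRefs;                /* number of users of the shared copy */
    unsigned u14Checksum : 14;     /* low 14 bits of the page CRC, 0 = unknown */
    unsigned u2State     : 2;
} SKETCHSHAREDPAGE;

static void sketchMakeShared(SKETCHSHAREDPAGE *pPage, uint64_t HCPhys, uint32_t u32Crc)
{
    pPage->pfn         = (uint32_t)(HCPhys >> SKETCH_PAGE_SHIFT);
    pPage->cRefs       = 1;                       /* the converting VM is the first user */
    pPage->u14Checksum = u32Crc & 0x3fffu;        /* keep only the low 14 bits */
    pPage->u2State     = SKETCH_STATE_SHARED;
}

/* Mirrors the strict-build assertion: a zero fragment on either side means
   "no checksum recorded", otherwise the 14-bit fragments must agree. */
static int sketchChecksumConsistent(const SKETCHSHAREDPAGE *pPage, uint32_t u32Crc)
{
    uint32_t const uFrag = u32Crc & 0x3fffu;
    return uFrag == 0 || pPage->u14Checksum == 0 || uFrag == pPage->u14Checksum;
}
```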
5192
5193
5194static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5195 unsigned idxRegion, unsigned idxPage,
5196 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5197{
5198 NOREF(pModule);
5199
5200 /* Easy case: just change the internal page type. */
5201 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5202 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5203 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5204 VERR_PGM_PHYS_INVALID_PAGE_ID);
5205 NOREF(idxRegion);
5206
5207 AssertMsg(pPageDesc->GCPhys == ((RTGCPHYS)pPage->Private.pfn << PAGE_SHIFT), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, ((RTGCPHYS)pPage->Private.pfn << PAGE_SHIFT)));
5208
5209 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5210
5211 /* Keep track of these references. */
5212 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5213
5214 return VINF_SUCCESS;
5215}
5216
5217/**
5218 * Checks the specified shared module page for changes.
5219 *
5220 * Performs the following tasks:
5221 * - If a shared page is new, then it changes the GMM page type to shared and
5222 * returns it in the pPageDesc descriptor.
5223 * - If a shared page already exists, then it checks if the VM page is
5224 * identical and if so frees the VM page and returns the shared page in
5225 * pPageDesc descriptor.
5226 *
5227 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5228 *
5229 * @returns VBox status code.
5230 * @param pGVM Pointer to the GVM instance data.
5231 * @param pModule Module description
5232 * @param idxRegion Region index
5233 * @param idxPage Page index
5234 * @param pPageDesc Page descriptor
5235 */
5236GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5237 PGMMSHAREDPAGEDESC pPageDesc)
5238{
5239 int rc;
5240 PGMM pGMM;
5241 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5242 pPageDesc->u32StrictChecksum = 0;
5243
5244 AssertMsgReturn(idxRegion < pModule->cRegions,
5245 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5246 VERR_INVALID_PARAMETER);
5247
5248 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5249 AssertMsgReturn(idxPage < cPages,
5250 ("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
5251 VERR_INVALID_PARAMETER);
5252
5253 LogFlow(("GMMR0SharedModuleCheckPage %s hash %#x region %u idxPage %u\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5254
5255 /*
5256 * First time; create a page descriptor array.
5257 */
5258 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5259 if (!pGlobalRegion->paidPages)
5260 {
5261 Log(("Allocate page descriptor array for %d pages\n", cPages));
5262 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5263 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5264
5265 /* Invalidate all descriptors. */
5266 uint32_t i = cPages;
5267 while (i-- > 0)
5268 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5269 }
5270
5271 /*
5272 * Is this the first time we've seen this shared page?
5273 */
5274 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5275 {
5276 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5277 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5278 }
5279
5280 /*
5281 * We've seen it before...
5282 */
5283 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5284 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5285 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5286
5287 /*
5288 * Get the shared page source.
5289 */
5290 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5291 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pGlobalRegion->paidPages[idxPage], idxRegion, idxPage),
5292 VERR_PGM_PHYS_INVALID_PAGE_ID);
5293
5294 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5295 {
5296 /*
5297 * Page was freed at some point; invalidate this entry.
5298 */
5299 /** @todo this isn't really bullet proof. */
5300 Log(("Old shared page was freed -> create a new one\n"));
5301 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5302 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5303 }
5304
5305 Log(("Replace existing page, host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5306
5307 /*
5308 * Calculate the virtual address of the local page.
5309 */
5310 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5311 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5312 VERR_PGM_PHYS_INVALID_PAGE_ID);
5313
5314 uint8_t *pbChunk;
5315 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5316 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5317 VERR_PGM_PHYS_INVALID_PAGE_ID);
5318 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5319
5320 /*
5321 * Calculate the virtual address of the shared page.
5322 */
5323 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5324 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5325
5326 /*
5327 * Get the virtual address of the physical page; map the chunk into the VM
5328 * process if not already done.
5329 */
5330 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5331 {
5332 Log(("Map chunk into process!\n"));
5333 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5334 AssertRCReturn(rc, rc);
5335 }
5336 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5337
5338#ifdef VBOX_STRICT
5339 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5340 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5341 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5342 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5343 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5344#endif
5345
5346 /** @todo write ASMMemComparePage. */
5347 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5348 {
5349 Log(("Unexpected differences found between local and shared page; skip\n"));
5350 /* Signal to the caller that this one hasn't changed. */
5351 pPageDesc->idPage = NIL_GMM_PAGEID;
5352 return VINF_SUCCESS;
5353 }
5354
5355 /*
5356 * Free the old local page.
5357 */
5358 GMMFREEPAGEDESC PageDesc;
5359 PageDesc.idPage = pPageDesc->idPage;
5360 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5361 AssertRCReturn(rc, rc);
5362
5363 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5364
5365 /*
5366 * Pass along the new physical address & page id.
5367 */
5368 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5369 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5370
5371 return VINF_SUCCESS;
5372}
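GMMR0SharedModuleCheckPage makes one of three decisions per page:
- Adopt the VM's page as the shared copy when the slot is empty or its old page is no longer shared.
- Replace the VM's page with the shared copy when the bytes match.
- Leave the page alone when the bytes differ.

The self-contained sketch below shows that decision over a hypothetical in-memory page store. Freeing is reduced to clearing a share count, and chunk mapping is not modelled. SKETCHSTORE and sketchCheckPage are illustrative names, not part of the GMM API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_NIL_ID     UINT32_MAX

typedef struct SKETCHSTORE
{
    uint8_t  (*pabPages)[SKETCH_PAGE_SIZE]; /* page contents indexed by page ID */
    uint32_t  *pacShareRefs;                /* users of a shared page, 0 = not shared */
} SKETCHSTORE;

/* Returns the page ID the VM should use afterwards, or SKETCH_NIL_ID if its
   own page differs from the shared copy and was left unchanged. */
static uint32_t sketchCheckPage(SKETCHSTORE *pStore, uint32_t *pidShared, uint32_t idLocal)
{
    /* First time, or the old shared page has since been freed: adopt ours. */
    if (*pidShared == SKETCH_NIL_ID || pStore->pacShareRefs[*pidShared] == 0)
    {
        *pidShared = idLocal;
        pStore->pacShareRefs[idLocal] = 1;
        return idLocal;
    }

    uint32_t const idShared = *pidShared;
    assert(idShared != idLocal);
    if (memcmp(pStore->pabPages[idShared], pStore->pabPages[idLocal], SKETCH_PAGE_SIZE) != 0)
        return SKETCH_NIL_ID;                 /* contents differ: skip */

    pStore->pacShareRefs[idLocal] = 0;        /* "free" the duplicate local page */
    pStore->pacShareRefs[idShared]++;         /* one more user of the shared copy */
    return idShared;
}
```

The SKETCH_NIL_ID result plays the role of setting pPageDesc->idPage to NIL_GMM_PAGEID, which tells the caller that the page was not replaced.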
5373
5374
5375/**
5376 * RTAvlGCPtrDestroy callback.
5377 *
5378 * @returns VINF_SUCCESS.
5379 * @param pNode The node to destroy.
5380 * @param pvArgs Pointer to an argument packet.
5381 */
5382static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5383{
5384 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5385 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5386 (PGMMSHAREDMODULEPERVM)pNode,
5387 false /*fRemove*/);
5388 return VINF_SUCCESS;
5389}
5390
5391
5392/**
5393 * Used by GMMR0CleanupVM to clean up shared modules.
5394 *
5395 * The caller must not hold the GMM lock; this function acquires it itself
5396 * so that it can be yielded as needed here.
5397 *
5398 * @param pGMM The GMM handle.
5399 * @param pGVM The global VM handle.
5400 */
5401static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5402{
5403 gmmR0MutexAcquire(pGMM);
5404 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5405
5406 GMMR0SHMODPERVMDTORARGS Args;
5407 Args.pGVM = pGVM;
5408 Args.pGMM = pGMM;
5409 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5410
5411 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5412 pGVM->gmm.s.Stats.cShareableModules = 0;
5413
5414 gmmR0MutexRelease(pGMM);
5415}
5416
5417#endif /* VBOX_WITH_PAGE_SHARING */
5418
5419/**
5420 * Removes all shared modules for the specified VM.
5421 *
5422 * @returns VBox status code.
5423 * @param pGVM The global (ring-0) VM structure.
5424 * @param idCpu The VCPU id.
5425 */
5426GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5427{
5428#ifdef VBOX_WITH_PAGE_SHARING
5429 /*
5430 * Validate input and get the basics.
5431 */
5432 PGMM pGMM;
5433 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5434 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5435 if (RT_FAILURE(rc))
5436 return rc;
5437
5438 /*
5439 * Take the semaphore and do some more validations.
5440 */
5441 gmmR0MutexAcquire(pGMM);
5442 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5443 {
5444 Log(("GMMR0ResetSharedModules\n"));
5445 GMMR0SHMODPERVMDTORARGS Args;
5446 Args.pGVM = pGVM;
5447 Args.pGMM = pGMM;
5448 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5449 pGVM->gmm.s.Stats.cShareableModules = 0;
5450
5451 rc = VINF_SUCCESS;
5452 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5453 }
5454 else
5455 rc = VERR_GMM_IS_NOT_SANE;
5456
5457 gmmR0MutexRelease(pGMM);
5458 return rc;
5459#else
5460 RT_NOREF(pGVM, idCpu);
5461 return VERR_NOT_IMPLEMENTED;
5462#endif
5463}
5464
5465#ifdef VBOX_WITH_PAGE_SHARING
5466
5467/**
5468 * Tree enumeration callback for checking a shared module.
5469 */
5470static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5471{
5472 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5473 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5474 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5475
5476 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5477 pGblMod->szName, pGblMod->szVersion, pRecVM->Core.Key, pGblMod->cbModule));
5478
5479 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5480 if (RT_FAILURE(rc))
5481 return rc;
5482 return VINF_SUCCESS;
5483}
5484
5485#endif /* VBOX_WITH_PAGE_SHARING */
5486
5487/**
5488 * Check all shared modules for the specified VM.
5489 *
5490 * @returns VBox status code.
5491 * @param pGVM The global (ring-0) VM structure.
5492 * @param idCpu The calling EMT number.
5493 * @thread EMT(idCpu)
5494 */
5495GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5496{
5497#ifdef VBOX_WITH_PAGE_SHARING
5498 /*
5499 * Validate input and get the basics.
5500 */
5501 PGMM pGMM;
5502 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5503 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5504 if (RT_FAILURE(rc))
5505 return rc;
5506
5507# ifndef DEBUG_sandervl
5508 /*
5509 * Take the semaphore and do some more validations.
5510 */
5511 gmmR0MutexAcquire(pGMM);
5512# endif
5513 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5514 {
5515 /*
5516 * Walk the tree, checking each module.
5517 */
5518 Log(("GMMR0CheckSharedModules\n"));
5519
5520 GMMCHECKSHAREDMODULEINFO Args;
5521 Args.pGVM = pGVM;
5522 Args.idCpu = idCpu;
5523 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5524
5525 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5526 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5527 }
5528 else
5529 rc = VERR_GMM_IS_NOT_SANE;
5530
5531# ifndef DEBUG_sandervl
5532 gmmR0MutexRelease(pGMM);
5533# endif
5534 return rc;
5535#else
5536 RT_NOREF(pGVM, idCpu);
5537 return VERR_NOT_IMPLEMENTED;
5538#endif
5539}
5540
5541#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5542
5543/**
5544 * Worker for GMMR0FindDuplicatePageReq.
5545 *
5546 * @returns true if duplicate, false if not.
5547 */
5548static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5549{
5550 bool fFoundDuplicate = false;
5551 /* Only take chunks not mapped into this VM process; not entirely correct. */
5552 uint8_t *pbChunk;
5553 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5554 {
5555 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5556 if (RT_SUCCESS(rc))
5557 {
5558 /*
5559 * Look for duplicate pages
5560 */
5561 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5562 while (iPage-- > 0)
5563 {
5564 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5565 {
5566 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5567 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5568 {
5569 fFoundDuplicate = true;
5570 break;
5571 }
5572 }
5573 }
5574 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5575 }
5576 }
5577 return fFoundDuplicate;
5578}
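gmmR0FindDupPageInChunk maps a chunk and compares the source page with every private page in it. The sketch below covers only the scan. It assumes 512 pages of 4 KiB per chunk and represents each page's private/shared state as a byte in a hypothetical flag array instead of the real GMMPAGE state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE        4096u
#define SKETCH_PAGES_PER_CHUNK  512u    /* assumed 2 MiB chunk of 4 KiB pages */

/* Returns 1 if any page flagged private in pafPrivate[] has the same bytes as
   pbSource; walks from the last page down, like the loop above. */
static int sketchFindDupInChunk(const uint8_t *pbChunk, const uint8_t *pafPrivate, const uint8_t *pbSource)
{
    assert(pbChunk && pafPrivate && pbSource);
    size_t iPage = SKETCH_PAGES_PER_CHUNK;
    while (iPage-- > 0)
        if (   pafPrivate[iPage]
            && !memcmp(pbSource, pbChunk + iPage * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE))
            return 1;
    return 0;
}
```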
5579
5580
5581/**
5582 * Finds a duplicate of the specified page in other active VMs.
5583 *
5584 * @returns VBox status code.
5585 * @param pGVM The global (ring-0) VM structure.
5586 * @param pReq Pointer to the request packet.
5587 */
5588GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5589{
5590 /*
5591 * Validate input and pass it on.
5592 */
5593 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5594 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5595
5596 PGMM pGMM;
5597 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5598
5599 int rc = GVMMR0ValidateGVM(pGVM);
5600 if (RT_FAILURE(rc))
5601 return rc;
5602
5603 /*
5604 * Take the semaphore and do some more validations.
5605 */
5606 rc = gmmR0MutexAcquire(pGMM);
5607 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5608 {
5609 uint8_t *pbChunk;
5610 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5611 if (pChunk)
5612 {
5613 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5614 {
5615 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5616 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5617 if (pPage)
5618 {
5619 /*
5620 * Walk the chunks
5621 */
5622 pReq->fDuplicate = false;
5623 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5624 {
5625 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5626 {
5627 pReq->fDuplicate = true;
5628 break;
5629 }
5630 }
5631 }
5632 else
5633 {
5634 AssertFailed();
5635 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5636 }
5637 }
5638 else
5639 AssertFailed();
5640 }
5641 else
5642 AssertFailed();
5643 }
5644 else
5645 rc = VERR_GMM_IS_NOT_SANE;
5646
5647 gmmR0MutexRelease(pGMM);
5648 return rc;
5649}
5650
5651#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5652
5653
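GMMR0FindDuplicatePageReq locates the source page by splitting the page ID into a chunk ID (idPage >> GMM_CHUNKID_SHIFT) and a page index (idPage & GMM_PAGEID_IDX_MASK), which inverts idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage; the page index shifted by PAGE_SHIFT is then the byte offset into the mapped chunk. The sketch below assumes 4 KiB pages and 2 MiB chunks (512 pages per chunk); the constants and helper names are illustrative, not the GMM's own definitions:

```cpp
#include <cstdint>

// Assumed geometry: 4 KiB pages, 2 MiB chunks => 512 pages per chunk.
constexpr unsigned      kPageShift     = 12;
constexpr unsigned      kChunkIdShift  = 21 - kPageShift;   // log2(chunk size / page size) = 9
constexpr std::uint32_t kPageIdIdxMask = (UINT32_C(1) << kChunkIdShift) - 1;
static_assert(kPageIdIdxMask == 0x1ff, "512 pages per chunk");

// Hypothetical helpers mirroring the page-ID arithmetic used above.
constexpr std::uint32_t makePageId(std::uint32_t idChunk, std::uint32_t iPage)
{
    return (idChunk << kChunkIdShift) | iPage;
}
constexpr std::uint32_t chunkIdFromPageId(std::uint32_t idPage) { return idPage >> kChunkIdShift; }
constexpr std::uint32_t pageIdxFromPageId(std::uint32_t idPage) { return idPage & kPageIdIdxMask; }

// Byte offset of the page inside its mapped chunk.
constexpr std::uintptr_t pageOffsetInChunk(std::uint32_t idPage)
{
    return std::uintptr_t(pageIdxFromPageId(idPage)) << kPageShift;
}
```

For example, page 17 of chunk 5 gets ID 0xA11, and its contents start 17 * 4096 bytes into the chunk mapping.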
/**
 * Retrieves the GMM statistics visible to the caller.
 *
 * @returns VBox status code.
 *
 * @param   pStats      Where to put the statistics.
 * @param   pSession    The current session.
 * @param   pGVM        The GVM to obtain statistics for. Optional.
 */
GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
{
    LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));

    /*
     * Validate input.
     */
    AssertPtrReturn(pSession, VERR_INVALID_POINTER);
    AssertPtrReturn(pStats, VERR_INVALID_POINTER);
    pStats->cMaxPages = 0; /* Touch pStats now so a bad pointer faults before the mutex is taken. */

    PGMM pGMM;
    GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);

    /*
     * Validate the VM handle, if not NULL, and lock the GMM.
     */
    int rc;
    if (pGVM)
    {
        rc = GVMMR0ValidateGVM(pGVM);
        if (RT_FAILURE(rc))
            return rc;
    }

    rc = gmmR0MutexAcquire(pGMM);
    if (RT_FAILURE(rc))
        return rc;

    /*
     * Copy out the GMM statistics.
     */
    pStats->cMaxPages              = pGMM->cMaxPages;
    pStats->cReservedPages         = pGMM->cReservedPages;
    pStats->cOverCommittedPages    = pGMM->cOverCommittedPages;
    pStats->cAllocatedPages        = pGMM->cAllocatedPages;
    pStats->cSharedPages           = pGMM->cSharedPages;
    pStats->cDuplicatePages        = pGMM->cDuplicatePages;
    pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
    pStats->cBalloonedPages        = pGMM->cBalloonedPages;
    pStats->cChunks                = pGMM->cChunks;
    pStats->cFreedChunks           = pGMM->cFreedChunks;
    pStats->cShareableModules      = pGMM->cShareableModules;
    pStats->idFreeGeneration       = pGMM->idFreeGeneration;
    RT_ZERO(pStats->au64Reserved);

    /*
     * Copy out the VM statistics.
     */
    if (pGVM)
        pStats->VMStats = pGVM->gmm.s.Stats;
    else
        RT_ZERO(pStats->VMStats);

    gmmR0MutexRelease(pGMM);
    return rc;
}


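GMMR0QueryStatistics copies every counter into the caller's buffer while holding the GMM mutex, zeroes the reserved fields, and reports zeroed per-VM statistics when no VM is given. A minimal sketch of that snapshot-under-lock pattern, using std::mutex in place of the GMM's own mutex and hypothetical, heavily trimmed structures (Gmm, Stats, VmStats and queryStatistics are illustrative names):

```cpp
#include <cstdint>
#include <cstring>
#include <mutex>

// Hypothetical, heavily trimmed stand-ins for GMM, GMMSTATS and the per-VM stats.
struct VmStats { std::uint64_t cAllocatedPages; std::uint64_t cBalloonedPages; };
struct Stats
{
    std::uint64_t cMaxPages;
    std::uint64_t cAllocatedPages;
    std::uint64_t au64Reserved[2];
    VmStats       VMStats;
};
struct Gmm
{
    std::mutex    Mtx;
    std::uint64_t cMaxPages;
    std::uint64_t cAllocatedPages;
};

// Copies the global counters, and optionally one VM's counters, while holding
// the lock so the caller sees one consistent snapshot.
static void queryStatistics(Gmm &gmm, Stats *pStats, const VmStats *pVmStats)
{
    std::lock_guard<std::mutex> lock(gmm.Mtx);
    pStats->cMaxPages       = gmm.cMaxPages;
    pStats->cAllocatedPages = gmm.cAllocatedPages;
    std::memset(pStats->au64Reserved, 0, sizeof(pStats->au64Reserved));
    if (pVmStats)
        pStats->VMStats = *pVmStats;
    else
        pStats->VMStats = VmStats{};   // no VM given: report zeros, like RT_ZERO
}
```

Holding the lock across the whole copy means related counters such as allocated and ballooned pages are read at the same instant, rather than each being individually current but mutually inconsistent.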
/**
 * VMMR0 request wrapper for GMMR0QueryStatistics.
 *
 * @returns see GMMR0QueryStatistics.
 * @param   pGVM        The global (ring-0) VM structure. Optional.
 * @param   pReq        Pointer to the request packet.
 */
GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
{
    /*
     * Validate input and pass it on.
     */
    AssertPtrReturn(pReq, VERR_INVALID_POINTER);
    AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);

    return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
}


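The request wrappers reject a packet unless its header's cbReq equals the size of the request structure the ring-0 code was built with, so a caller compiled against a different layout should be turned away before any payload field is read. A sketch of that check, with a hypothetical request layout loosely modelled on a size-carrying request header and stand-in status codes (the real code returns VERR_INVALID_POINTER and VERR_INVALID_PARAMETER):

```cpp
#include <cstdint>

// Hypothetical request layout: a header carrying the caller-supplied size,
// followed by the payload.
struct ReqHdr   { std::uint32_t u32Magic; std::uint32_t cbReq; };
struct QueryReq { ReqHdr Hdr; std::uint64_t uPayload; };

// Stand-in status codes for this sketch only.
enum { kSuccess = 0, kInvalidPointer = -1, kInvalidParameter = -2 };

// Mirrors the wrapper checks: non-NULL pointer first, then an exact size match.
static int validateReq(const QueryReq *pReq)
{
    if (!pReq)
        return kInvalidPointer;
    if (pReq->Hdr.cbReq != sizeof(*pReq))
        return kInvalidParameter;
    return kSuccess;
}
```

An exact-equality test, rather than "at least this big", is the stricter choice: it also rejects callers whose structure grew, which could otherwise indicate a mismatched build.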
/**
 * Resets the specified GMM statistics.
 *
 * @returns VBox status code.
 *
 * @param   pStats      Which statistics to reset, that is, non-zero fields
 *                      indicate which to reset.
 * @param   pSession    The current session.
 * @param   pGVM        The GVM to reset statistics for. Optional.
 */
GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
{
    NOREF(pStats); NOREF(pSession); NOREF(pGVM);
    /* There is currently nothing that can be reset. */
    return VINF_SUCCESS;
}


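GMMR0ResetStatistics documents a mask convention, where non-zero fields in pStats select the counters to reset, although it currently resets nothing. A hypothetical sketch of how such a mask could be applied to a trimmed two-field record; Stats and resetSelected are illustrative names, and this is not VirtualBox code:

```cpp
#include <cstdint>

// Hypothetical, trimmed statistics record.
struct Stats { std::uint64_t cFreedChunks; std::uint64_t cDuplicatePages; };

// Resets each counter in *pCur whose corresponding field in *pMask is non-zero;
// counters whose mask field is zero are left untouched.
static void resetSelected(Stats *pCur, const Stats *pMask)
{
    if (pMask->cFreedChunks)
        pCur->cFreedChunks = 0;
    if (pMask->cDuplicatePages)
        pCur->cDuplicatePages = 0;
}
```

Reusing the statistics structure itself as the mask keeps the request format identical to the query request, at the cost of one explicit test per resettable field.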
/**
 * VMMR0 request wrapper for GMMR0ResetStatistics.
 *
 * @returns see GMMR0ResetStatistics.
 * @param   pGVM        The global (ring-0) VM structure. Optional.
 * @param   pReq        Pointer to the request packet.
 */
GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
{
    /*
     * Validate input and pass it on.
     */
    AssertPtrReturn(pReq, VERR_INVALID_POINTER);
    AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);

    return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
}
