VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@95248

Last change on this file since 95248 was 93554, checked in by vboxsync, 3 years ago

VMM: Changed PAGE_SIZE -> GUEST_PAGE_SIZE / HOST_PAGE_SIZE, PAGE_SHIFT -> GUEST_PAGE_SHIFT / HOST_PAGE_SHIFT, and PAGE_OFFSET_MASK -> GUEST_PAGE_OFFSET_MASK / HOST_PAGE_OFFSET_MASK. Also removed most usage of ASMMemIsZeroPage and ASMMemZeroPage since the host and guest page size doesn't need to be the same any more. Some work left to do in the page pool code. bugref:9898

1/* $Id: GMMR0.cpp 93554 2022-02-02 22:57:02Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2022 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint, all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / GUEST_PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * to be set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
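 *
 * The inverse mapping follows directly from the formula above (a small sketch
 * using the same symbols; the mask is implied by GMM_CHUNK_SHIFT):
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & ((1 << GMM_CHUNK_SHIFT) - 1);
 * @endcode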
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-15 free pages, the second covers 16-31, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
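 *
 * As a rough sketch of the bucketing (illustrative only; the helper name is
 * hypothetical and the real list selection code lives further down this file):
 * @code
 * unsigned iList = pChunk->cFree >> 4;              // buckets of 16 free pages
 * gmmR0SketchLinkChunkToList(pSet, iList, pChunk);  // hypothetical helper
 * @endcode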
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^GMM_CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
99 * and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
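 *
 * A back of the envelope example (rough, illustrative numbers assuming 2 MB
 * chunks and 4 KiB guest pages, i.e. 512 pages per chunk):
 * @code
 * per page GMM tracking   = sizeof(GMMPAGE)                          = 8 bytes
 * per page RTR0MEMOBJ     ~ 8 bytes (64-bit PFN_NUMBER / struct page pointer)
 * chunk overhead per page ~ (sizeof(RTR0MEMOBJ) + chunk header) / 512 ~ 1 byte
 * @endcode
 * So the total is in the neighbourhood of 16-17 bytes per page, dominated by
 * the per-page entries rather than the per-chunk structures.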
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC; this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here; the main ones are configurability and
134 * security. Should we, for instance, permit anyone to request 100% memory
135 * commitment? Who should be allowed to make runtime adjustments of the
136 * configuration? And how do we prevent these settings from being lost when the
137 * last VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads), which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and the allocator interface will have to supply some NUMA
150 * info, and we'll need a way to calculate access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/spinlock.h>
180#include <iprt/string.h>
181#include <iprt/time.h>
182
183/* This is 64-bit only code now. */
184#if HC_ARCH_BITS != 64 || ARCH_BITS != 64
185# error "This is 64-bit only code"
186#endif
187
188
189/*********************************************************************************************************************************
190* Defined Constants And Macros *
191*********************************************************************************************************************************/
192/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
193 * Use a critical section instead of a fast mutex for the giant GMM lock.
194 *
195 * @remarks This is primarily a way of avoiding the deadlock checks in the
196 * Windows driver verifier. */
197#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
198# define VBOX_USE_CRIT_SECT_FOR_GIANT
199#endif
200
201
202/*********************************************************************************************************************************
203* Structures and Typedefs *
204*********************************************************************************************************************************/
205/** Pointer to set of free chunks. */
206typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
207
208/**
209 * The per-page tracking structure employed by the GMM.
210 *
211 * Because of the different layout on 32-bit and 64-bit hosts in earlier
212 * versions of the code, macros are used to get and set some of the data.
213 */
214typedef union GMMPAGE
215{
216 /** Unsigned integer view. */
217 uint64_t u;
218
219 /** The common view. */
220 struct GMMPAGECOMMON
221 {
222 uint32_t uStuff1 : 32;
223 uint32_t uStuff2 : 30;
224 /** The page state. */
225 uint32_t u2State : 2;
226 } Common;
227
228 /** The view of a private page. */
229 struct GMMPAGEPRIVATE
230 {
231 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
232 uint32_t pfn;
233 /** The GVM handle. (64K VMs) */
234 uint32_t hGVM : 16;
235 /** Reserved. */
236 uint32_t u16Reserved : 14;
237 /** The page state. */
238 uint32_t u2State : 2;
239 } Private;
240
241 /** The view of a shared page. */
242 struct GMMPAGESHARED
243 {
244 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
245 uint32_t pfn;
246 /** The reference count (64K VMs). */
247 uint32_t cRefs : 16;
248 /** Used for debug checksumming. */
249 uint32_t u14Checksum : 14;
250 /** The page state. */
251 uint32_t u2State : 2;
252 } Shared;
253
254 /** The view of a free page. */
255 struct GMMPAGEFREE
256 {
257 /** The index of the next page in the free list. UINT16_MAX is NIL. */
258 uint16_t iNext;
259 /** Reserved. Checksum or something? */
260 uint16_t u16Reserved0;
261 /** Reserved. Checksum or something? */
262 uint32_t u30Reserved1 : 29;
263 /** Set if the page was zeroed. */
264 uint32_t fZeroed : 1;
265 /** The page state. */
266 uint32_t u2State : 2;
267 } Free;
268} GMMPAGE;
269AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
270/** Pointer to a GMMPAGE. */
271typedef GMMPAGE *PGMMPAGE;
272
273
274/** @name The Page States.
275 * @{ */
276/** A private page. */
277#define GMM_PAGE_STATE_PRIVATE 0
278/** A shared page. */
279#define GMM_PAGE_STATE_SHARED 2
280/** A free page. */
281#define GMM_PAGE_STATE_FREE 3
282/** @} */
283
284
285/** @def GMM_PAGE_IS_PRIVATE
286 *
287 * @returns true if private, false if not.
288 * @param pPage The GMM page.
289 */
290#define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
291
292/** @def GMM_PAGE_IS_SHARED
293 *
294 * @returns true if shared, false if not.
295 * @param pPage The GMM page.
296 */
297#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
298
299/** @def GMM_PAGE_IS_FREE
300 *
301 * @returns true if free, false if not.
302 * @param pPage The GMM page.
303 */
304#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
305
306/** @def GMM_PAGE_PFN_LAST
307 * The last valid guest pfn.
308 * @remark Some of the values outside the valid range have special meanings;
309 * see GMM_PAGE_PFN_UNSHAREABLE.
310 */
311#define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
312AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> GUEST_PAGE_SHIFT));
313
314/** @def GMM_PAGE_PFN_UNSHAREABLE
315 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
316 */
317#define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
318AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> GUEST_PAGE_SHIFT));
319
320
321/**
322 * A GMM allocation chunk ring-3 mapping record.
323 *
324 * This should really be associated with a session and not a VM, but
325 * it's simpler to associate it with a VM and clean up when the VM object
326 * is destroyed.
327 */
328typedef struct GMMCHUNKMAP
329{
330 /** The mapping object. */
331 RTR0MEMOBJ hMapObj;
332 /** The VM owning the mapping. */
333 PGVM pGVM;
334} GMMCHUNKMAP;
335/** Pointer to a GMM allocation chunk mapping. */
336typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
337
338
339/**
340 * A GMM allocation chunk.
341 */
342typedef struct GMMCHUNK
343{
344 /** The AVL node core.
345 * The Key is the chunk ID. (Giant mtx.) */
346 AVLU32NODECORE Core;
347 /** The memory object.
348 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
349 * what the host can dish up with. (Chunk mtx protects mapping accesses
350 * and related frees.) */
351 RTR0MEMOBJ hMemObj;
352#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
353 /** Pointer to the kernel mapping. */
354 uint8_t *pbMapping;
355#endif
356 /** Pointer to the next chunk in the free list. (Giant mtx.) */
357 PGMMCHUNK pFreeNext;
358 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
359 PGMMCHUNK pFreePrev;
360 /** Pointer to the free set this chunk belongs to. NULL for
361 * chunks with no free pages. (Giant mtx.) */
362 PGMMCHUNKFREESET pSet;
363 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
364 RTLISTNODE ListNode;
365 /** Pointer to an array of mappings. (Chunk mtx.) */
366 PGMMCHUNKMAP paMappingsX;
367 /** The number of mappings. (Chunk mtx.) */
368 uint16_t cMappingsX;
369 * The mapping lock this chunk is using. UINT8_MAX if nobody is mapping
370 * or freeing anything. (Giant mtx.) */
371 uint8_t volatile iChunkMtx;
372 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
373 uint8_t fFlags;
374 /** The head of the list of free pages. UINT16_MAX is the NIL value.
375 * (Giant mtx.) */
376 uint16_t iFreeHead;
377 /** The number of free pages. (Giant mtx.) */
378 uint16_t cFree;
379 /** The GVM handle of the VM that first allocated pages from this chunk, this
380 * is used as a preference when there are several chunks to choose from.
381 * When in bound memory mode this isn't a preference any longer. (Giant
382 * mtx.) */
383 uint16_t hGVM;
384 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
385 * future use.) (Giant mtx.) */
386 uint16_t idNumaNode;
387 /** The number of private pages. (Giant mtx.) */
388 uint16_t cPrivate;
389 /** The number of shared pages. (Giant mtx.) */
390 uint16_t cShared;
391 /** The UID this chunk is associated with. */
392 RTUID uidOwner;
393 uint32_t u32Padding;
394 /** The pages. (Giant mtx.) */
395 GMMPAGE aPages[GMM_CHUNK_NUM_PAGES];
396} GMMCHUNK;
397
398/** Indicates that the NUMA properties of the memory are unknown. */
399#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
400
401/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
402 * @{ */
403/** Indicates that the chunk is a large page (2MB). */
404#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
405/** @} */
406
407
408/**
409 * An allocation chunk TLB entry.
410 */
411typedef struct GMMCHUNKTLBE
412{
413 /** The chunk id. */
414 uint32_t idChunk;
415 /** Pointer to the chunk. */
416 PGMMCHUNK pChunk;
417} GMMCHUNKTLBE;
418/** Pointer to an allocation chunk TLB entry. */
419typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
420
421
422/** The number of entries in the allocation chunk TLB. */
423#define GMM_CHUNKTLB_ENTRIES 32
424/** Gets the TLB entry index for the given Chunk ID. */
425#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
426
427/**
428 * An allocation chunk TLB.
429 */
430typedef struct GMMCHUNKTLB
431{
432 /** The TLB entries. */
433 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
434} GMMCHUNKTLB;
435/** Pointer to an allocation chunk TLB. */
436typedef GMMCHUNKTLB *PGMMCHUNKTLB;
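
/* A sketch of how the chunk TLB is meant to be used (illustrative only; the
   real lookup code further down also takes the tree spinlock, and the per-VM
   TLBs additionally validate against the chunk freeing generation):

        PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
        PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk
                             ? pTlbe->pChunk                                     // TLB hit
                             : (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);  // miss: AVL tree lookup
        if (pChunk)
        {
            pTlbe->idChunk = idChunk;   // refill the entry for the next lookup
            pTlbe->pChunk  = pChunk;
        }
 */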
437
438
439/**
440 * The GMM instance data.
441 */
442typedef struct GMM
443{
444 /** Magic / eye catcher. GMM_MAGIC */
445 uint32_t u32Magic;
446 /** The number of threads waiting on the mutex. */
447 uint32_t cMtxContenders;
448#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
449 /** The critical section protecting the GMM.
450 * More fine grained locking can be implemented later if necessary. */
451 RTCRITSECT GiantCritSect;
452#else
453 /** The fast mutex protecting the GMM.
454 * More fine grained locking can be implemented later if necessary. */
455 RTSEMFASTMUTEX hMtx;
456#endif
457#ifdef VBOX_STRICT
458 /** The current mutex owner. */
459 RTNATIVETHREAD hMtxOwner;
460#endif
461 /** Spinlock protecting the AVL tree.
462 * @todo Make this a read-write spinlock as we should allow concurrent
463 * lookups. */
464 RTSPINLOCK hSpinLockTree;
465 /** The chunk tree.
466 * Protected by hSpinLockTree. */
467 PAVLU32NODECORE pChunks;
468 /** Chunk freeing generation - incremented whenever a chunk is freed. Used
469 * for validating the per-VM chunk TLB entries. Valid range is 1 to 2^62
470 * (exclusive), though higher numbers may temporarily occur while
471 * invalidating the individual TLBs during wrap-around processing. */
472 uint64_t volatile idFreeGeneration;
473 /** The chunk TLB.
474 * Protected by hSpinLockTree. */
475 GMMCHUNKTLB ChunkTLB;
476 /** The private free set. */
477 GMMCHUNKFREESET PrivateX;
478 /** The shared free set. */
479 GMMCHUNKFREESET Shared;
480
481 /** Shared module tree (global).
482 * @todo separate trees for distinctly different guest OSes. */
483 PAVLLU32NODECORE pGlobalSharedModuleTree;
484 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
485 uint32_t cShareableModules;
486
487 /** The chunk list. For simplifying the cleanup process and avoiding tree
488 * traversal. */
489 RTLISTANCHOR ChunkList;
490
491 /** The maximum number of pages we're allowed to allocate.
492 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
493 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
494 uint64_t cMaxPages;
495 /** The number of pages that have been reserved.
496 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
497 uint64_t cReservedPages;
498 /** The number of pages that we have over-committed in reservations. */
499 uint64_t cOverCommittedPages;
500 /** The number of actually allocated (committed if you like) pages. */
501 uint64_t cAllocatedPages;
502 /** The number of pages that are shared. A subset of cAllocatedPages. */
503 uint64_t cSharedPages;
504 /** The number of pages that are actually shared between VMs. */
505 uint64_t cDuplicatePages;
506 /** The number of pages that are shared that have been left behind by
507 * VMs not doing proper cleanups. */
508 uint64_t cLeftBehindSharedPages;
509 /** The number of allocation chunks.
510 * (The number of pages we've allocated from the host can be derived from this.) */
511 uint32_t cChunks;
512 /** The number of current ballooned pages. */
513 uint64_t cBalloonedPages;
514
515#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
516 /** Whether #RTR0MemObjAllocPhysNC works. */
517 bool fHasWorkingAllocPhysNC;
518#else
519 bool fPadding;
520#endif
521 /** The bound memory mode indicator.
522 * When set, the memory will be bound to a specific VM and never
523 * shared. This is always set if fLegacyAllocationMode is set.
524 * (Also determined at initialization time.) */
525 bool fBoundMemoryMode;
526 /** The number of registered VMs. */
527 uint16_t cRegisteredVMs;
528
529 /** The index of the next mutex to use. */
530 uint32_t iNextChunkMtx;
531 /** Chunk locks for reducing lock contention without having to allocate
532 * one lock per chunk. */
533 struct
534 {
535 /** The mutex */
536 RTSEMFASTMUTEX hMtx;
537 /** The number of threads currently using this mutex. */
538 uint32_t volatile cUsers;
539 } aChunkMtx[64];
540
541 /** The number of freed chunks ever. This is used as list generation to
542 * avoid restarting the cleanup scanning when the list wasn't modified. */
543 uint32_t volatile cFreedChunks;
544 /** The previously allocated Chunk ID.
545 * Used as a hint to avoid scanning the whole bitmap. */
546 uint32_t idChunkPrev;
547 /** Spinlock protecting idChunkPrev & bmChunkId. */
548 RTSPINLOCK hSpinLockChunkId;
549 /** Chunk ID allocation bitmap.
550 * Bits of allocated IDs are set, free ones are clear.
551 * The NIL id (0) is marked allocated. */
552 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
553} GMM;
554/** Pointer to the GMM instance. */
555typedef GMM *PGMM;
556
557/** The value of GMM::u32Magic (Katsuhiro Otomo). */
558#define GMM_MAGIC UINT32_C(0x19540414)
559
560
561/**
562 * GMM chunk mutex state.
563 *
564 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
565 * gmmR0ChunkMutex* methods.
566 */
567typedef struct GMMR0CHUNKMTXSTATE
568{
569 PGMM pGMM;
570 /** The index of the chunk mutex. */
571 uint8_t iChunkMtx;
572 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
573 uint8_t fFlags;
574} GMMR0CHUNKMTXSTATE;
575/** Pointer to a chunk mutex state. */
576typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
577
578/** @name GMMR0CHUNK_MTX_XXX
579 * @{ */
580#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
581#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
582#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
583#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
584#define GMMR0CHUNK_MTX_END UINT32_C(4)
585/** @} */
586
587
588/** The maximum number of shared modules per-vm. */
589#define GMM_MAX_SHARED_PER_VM_MODULES 2048
590/** The maximum number of shared modules GMM is allowed to track. */
591#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
592
593
594/**
595 * Argument packet for gmmR0SharedModuleCleanup.
596 */
597typedef struct GMMR0SHMODPERVMDTORARGS
598{
599 PGVM pGVM;
600 PGMM pGMM;
601} GMMR0SHMODPERVMDTORARGS;
602
603/**
604 * Argument packet for gmmR0CheckSharedModule.
605 */
606typedef struct GMMCHECKSHAREDMODULEINFO
607{
608 PGVM pGVM;
609 VMCPUID idCpu;
610} GMMCHECKSHAREDMODULEINFO;
611
612
613/*********************************************************************************************************************************
614* Global Variables *
615*********************************************************************************************************************************/
616/** Pointer to the GMM instance data. */
617static PGMM g_pGMM = NULL;
618
619/** Macro for obtaining and validating the g_pGMM pointer.
620 *
621 * On failure it will return from the invoking function with the specified
622 * return value.
623 *
624 * @param pGMM The name of the pGMM variable.
625 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
626 * status codes.
627 */
628#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
629 do { \
630 (pGMM) = g_pGMM; \
631 AssertPtrReturn((pGMM), (rc)); \
632 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
633 } while (0)
634
635/** Macro for obtaining and validating the g_pGMM pointer, void function
636 * variant.
637 *
638 * On failure it will return from the invoking function.
639 *
640 * @param pGMM The name of the pGMM variable.
641 */
642#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
643 do { \
644 (pGMM) = g_pGMM; \
645 AssertPtrReturnVoid((pGMM)); \
646 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
647 } while (0)
648
649
650/** @def GMM_CHECK_SANITY_UPON_ENTERING
651 * Checks the sanity of the GMM instance data before making changes.
652 *
653 * This macro is a stub by default and must be enabled manually in the code.
654 *
655 * @returns true if sane, false if not.
656 * @param pGMM The name of the pGMM variable.
657 */
658#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
659# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (RT_LIKELY(gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0))
660#else
661# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
662#endif
663
664/** @def GMM_CHECK_SANITY_UPON_LEAVING
665 * Checks the sanity of the GMM instance data after making changes.
666 *
667 * This macro is a stub by default and must be enabled manually in the code.
668 *
669 * @returns true if sane, false if not.
670 * @param pGMM The name of the pGMM variable.
671 */
672#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
673# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
674#else
675# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
676#endif
677
678/** @def GMM_CHECK_SANITY_IN_LOOPS
679 * Checks the sanity of the GMM instance in the allocation loops.
680 *
681 * This macro is a stub by default and must be enabled manually in the code.
682 *
683 * @returns true if sane, false if not.
684 * @param pGMM The name of the pGMM variable.
685 */
686#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
687# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
688#else
689# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
690#endif
691
692
693/*********************************************************************************************************************************
694* Internal Functions *
695*********************************************************************************************************************************/
696static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
697static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
698DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
699DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
700DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
701#ifdef GMMR0_WITH_SANITY_CHECK
702static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
703#endif
704static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
705DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
706DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
707static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
708#ifdef VBOX_WITH_PAGE_SHARING
709static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
710# ifdef VBOX_STRICT
711static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
712# endif
713#endif
714
715
716
717/**
718 * Initializes the GMM component.
719 *
720 * This is called when the VMMR0.r0 module is loaded and protected by the
721 * loader semaphore.
722 *
723 * @returns VBox status code.
724 */
725GMMR0DECL(int) GMMR0Init(void)
726{
727 LogFlow(("GMMInit:\n"));
728
729 /* Currently assuming the same host and guest page size here. Can change it to
730 dish out guest pages with a different size from the host page later if
731 needed, though a restriction would be that the host page size must not be
732 smaller than the guest page size. */
733 AssertCompile(GUEST_PAGE_SIZE == HOST_PAGE_SIZE);
734 AssertCompile(GUEST_PAGE_SIZE <= HOST_PAGE_SIZE);
735
736 /*
737 * Allocate the instance data and the locks.
738 */
739 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
740 if (!pGMM)
741 return VERR_NO_MEMORY;
742
743 pGMM->u32Magic = GMM_MAGIC;
744 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
745 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
746 RTListInit(&pGMM->ChunkList);
747 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
748
749#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
750 int rc = RTCritSectInit(&pGMM->GiantCritSect);
751#else
752 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
753#endif
754 if (RT_SUCCESS(rc))
755 {
756 unsigned iMtx;
757 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
758 {
759 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
760 if (RT_FAILURE(rc))
761 break;
762 }
763 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
764 if (RT_SUCCESS(rc))
765 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
766 pGMM->hSpinLockChunkId = NIL_RTSPINLOCK;
767 if (RT_SUCCESS(rc))
768 rc = RTSpinlockCreate(&pGMM->hSpinLockChunkId, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-id");
769 if (RT_SUCCESS(rc))
770 {
771 /*
772 * Figure out how we're going to allocate stuff (only applicable to
773 * host with linear physical memory mappings).
774 */
775 pGMM->fBoundMemoryMode = false;
776#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
777 pGMM->fHasWorkingAllocPhysNC = false;
778
779 RTR0MEMOBJ hMemObj;
780 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
781 if (RT_SUCCESS(rc))
782 {
783 rc = RTR0MemObjFree(hMemObj, true);
784 AssertRC(rc);
785 pGMM->fHasWorkingAllocPhysNC = true;
786 }
787 else if (rc != VERR_NOT_SUPPORTED)
788 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
789# endif
790
791 /*
792 * Query system page count and guess a reasonable cMaxPages value.
793 */
794 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
795
796 /*
797 * The idFreeGeneration value should be set so we actually trigger the
798 * wrap-around invalidation handling during a typical test run.
799 */
800 pGMM->idFreeGeneration = UINT64_MAX / 4 - 128;
801
802 g_pGMM = pGMM;
803#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
804 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
805#else
806 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
807#endif
808 return VINF_SUCCESS;
809 }
810
811 /*
812 * Bail out.
813 */
814 RTSpinlockDestroy(pGMM->hSpinLockChunkId);
815 RTSpinlockDestroy(pGMM->hSpinLockTree);
816 while (iMtx-- > 0)
817 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
818#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
819 RTCritSectDelete(&pGMM->GiantCritSect);
820#else
821 RTSemFastMutexDestroy(pGMM->hMtx);
822#endif
823 }
824
825 pGMM->u32Magic = 0;
826 RTMemFree(pGMM);
827 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
828 return rc;
829}
830
831
832/**
833 * Terminates the GMM component.
834 */
835GMMR0DECL(void) GMMR0Term(void)
836{
837 LogFlow(("GMMTerm:\n"));
838
839 /*
840 * Take care / be paranoid...
841 */
842 PGMM pGMM = g_pGMM;
843 if (!RT_VALID_PTR(pGMM))
844 return;
845 if (pGMM->u32Magic != GMM_MAGIC)
846 {
847 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
848 return;
849 }
850
851 /*
852 * Undo what init did and free all the resources we've acquired.
853 */
854 /* Destroy the fundamentals. */
855 g_pGMM = NULL;
856 pGMM->u32Magic = ~GMM_MAGIC;
857#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
858 RTCritSectDelete(&pGMM->GiantCritSect);
859#else
860 RTSemFastMutexDestroy(pGMM->hMtx);
861 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
862#endif
863 RTSpinlockDestroy(pGMM->hSpinLockTree);
864 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
865 RTSpinlockDestroy(pGMM->hSpinLockChunkId);
866 pGMM->hSpinLockChunkId = NIL_RTSPINLOCK;
867
868 /* Free any chunks still hanging around. */
869 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
870
871 /* Destroy the chunk locks. */
872 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
873 {
874 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
875 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
876 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
877 }
878
879 /* Finally the instance data itself. */
880 RTMemFree(pGMM);
881 LogFlow(("GMMTerm: done\n"));
882}
883
884
885/**
886 * RTAvlU32Destroy callback.
887 *
888 * @returns 0
889 * @param pNode The node to destroy.
890 * @param pvGMM The GMM handle.
891 */
892static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
893{
894 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
895
896 if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
897 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
898 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
899
900 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
901 if (RT_FAILURE(rc))
902 {
903 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
904 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
905 AssertRC(rc);
906 }
907 pChunk->hMemObj = NIL_RTR0MEMOBJ;
908
909 RTMemFree(pChunk->paMappingsX);
910 pChunk->paMappingsX = NULL;
911
912 RTMemFree(pChunk);
913 NOREF(pvGMM);
914 return 0;
915}
916
917
918/**
919 * Initializes the per-VM data for the GMM.
920 *
921 * This is called from within the GVMM lock (from GVMMR0CreateVM)
922 * and should only initialize the data members so GMMR0CleanupVM
923 * can deal with them. We reserve no memory or anything here,
924 * that's done later in GMMR0InitVM.
925 *
926 * @param pGVM Pointer to the Global VM structure.
927 */
928GMMR0DECL(int) GMMR0InitPerVMData(PGVM pGVM)
929{
930 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
931
932 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
933 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
934 pGVM->gmm.s.Stats.fMayAllocate = false;
935
936 pGVM->gmm.s.hChunkTlbSpinLock = NIL_RTSPINLOCK;
937 int rc = RTSpinlockCreate(&pGVM->gmm.s.hChunkTlbSpinLock, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "per-vm-chunk-tlb");
938 AssertRCReturn(rc, rc);
939
940 return VINF_SUCCESS;
941}
942
943
944/**
945 * Acquires the GMM giant lock.
946 *
947 * @returns Assert status code from RTSemFastMutexRequest.
948 * @param pGMM Pointer to the GMM instance.
949 */
950static int gmmR0MutexAcquire(PGMM pGMM)
951{
952 ASMAtomicIncU32(&pGMM->cMtxContenders);
953#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
954 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
955#else
956 int rc = RTSemFastMutexRequest(pGMM->hMtx);
957#endif
958 ASMAtomicDecU32(&pGMM->cMtxContenders);
959 AssertRC(rc);
960#ifdef VBOX_STRICT
961 pGMM->hMtxOwner = RTThreadNativeSelf();
962#endif
963 return rc;
964}
965
966
967/**
968 * Releases the GMM giant lock.
969 *
970 * @returns Assert status code from RTSemFastMutexRelease.
971 * @param pGMM Pointer to the GMM instance.
972 */
973static int gmmR0MutexRelease(PGMM pGMM)
974{
975#ifdef VBOX_STRICT
976 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
977#endif
978#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
979 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
980#else
981 int rc = RTSemFastMutexRelease(pGMM->hMtx);
982 AssertRC(rc);
983#endif
984 return rc;
985}
986
987
988/**
989 * Yields the GMM giant lock if there is contention and a certain minimum time
990 * has elapsed since we took it.
991 *
992 * @returns @c true if the mutex was yielded, @c false if not.
993 * @param pGMM Pointer to the GMM instance.
994 * @param puLockNanoTS Where the lock acquisition time stamp is kept
995 * (in/out).
996 */
997static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
998{
999 /*
1000 * If nobody is contending the mutex, don't bother checking the time.
1001 */
1002 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1003 return false;
1004
1005 /*
1006 * Don't yield if we haven't executed for at least 2 milliseconds.
1007 */
1008 uint64_t uNanoNow = RTTimeSystemNanoTS();
1009 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1010 return false;
1011
1012 /*
1013 * Yield the mutex.
1014 */
1015#ifdef VBOX_STRICT
1016 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1017#endif
1018 ASMAtomicIncU32(&pGMM->cMtxContenders);
1019#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1020 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1021#else
1022 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1023#endif
1024
1025 RTThreadYield();
1026
1027#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1028 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1029#else
1030 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1031#endif
1032 *puLockNanoTS = RTTimeSystemNanoTS();
1033 ASMAtomicDecU32(&pGMM->cMtxContenders);
1034#ifdef VBOX_STRICT
1035 pGMM->hMtxOwner = RTThreadNativeSelf();
1036#endif
1037
1038 return true;
1039}
1040
1041
1042/**
1043 * Acquires a chunk lock.
1044 *
1045 * The caller must own the giant lock.
1046 *
1047 * @returns Assert status code from RTSemFastMutexRequest.
1048 * @param pMtxState The chunk mutex state info. (Avoids
1049 * passing the same flags and stuff around
1050 * for subsequent release and drop-giant
1051 * calls.)
1052 * @param pGMM Pointer to the GMM instance.
1053 * @param pChunk Pointer to the chunk.
1054 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1055 */
1056static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1057{
1058 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1059 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1060
1061 pMtxState->pGMM = pGMM;
1062 pMtxState->fFlags = (uint8_t)fFlags;
1063
1064 /*
1065 * Get the lock index and reference the lock.
1066 */
1067 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1068 uint32_t iChunkMtx = pChunk->iChunkMtx;
1069 if (iChunkMtx == UINT8_MAX)
1070 {
1071 iChunkMtx = pGMM->iNextChunkMtx++;
1072 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1073
1074 /* Try get an unused one... */
1075 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1076 {
1077 iChunkMtx = pGMM->iNextChunkMtx++;
1078 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1079 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1080 {
1081 iChunkMtx = pGMM->iNextChunkMtx++;
1082 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1083 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1084 {
1085 iChunkMtx = pGMM->iNextChunkMtx++;
1086 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1087 }
1088 }
1089 }
1090
1091 pChunk->iChunkMtx = iChunkMtx;
1092 }
1093 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1094 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1095 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1096
1097 /*
1098 * Drop the giant?
1099 */
1100 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1101 {
1102 /** @todo GMM life cycle cleanup (we may race someone
1103 * destroying and cleaning up GMM)? */
1104 gmmR0MutexRelease(pGMM);
1105 }
1106
1107 /*
1108 * Take the chunk mutex.
1109 */
1110 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1111 AssertRC(rc);
1112 return rc;
1113}
1114
1115
1116/**
1117 * Releases the chunk mutex acquired by gmmR0ChunkMutexAcquire.
1118 *
1119 * @returns Assert status code from RTSemFastMutexRelease.
1120 * @param pMtxState Pointer to the chunk mutex state.
1121 * @param pChunk Pointer to the chunk if it's still
1122 * alive, NULL if it isn't. This is used to deassociate
1123 * the chunk from the mutex on the way out so a new one
1124 * can be selected next time, thus avoiding contented
1125 * mutexes.
1126 */
1127static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1128{
1129 PGMM pGMM = pMtxState->pGMM;
1130
1131 /*
1132 * Release the chunk mutex and reacquire the giant if requested.
1133 */
1134 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1135 AssertRC(rc);
1136 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1137 rc = gmmR0MutexAcquire(pGMM);
1138 else
1139 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1140
1141 /*
1142 * Drop the chunk mutex user reference and deassociate it from the chunk
1143 * when possible.
1144 */
1145 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1146 && pChunk
1147 && RT_SUCCESS(rc) )
1148 {
1149 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1150 pChunk->iChunkMtx = UINT8_MAX;
1151 else
1152 {
1153 rc = gmmR0MutexAcquire(pGMM);
1154 if (RT_SUCCESS(rc))
1155 {
1156 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1157 pChunk->iChunkMtx = UINT8_MAX;
1158 rc = gmmR0MutexRelease(pGMM);
1159 }
1160 }
1161 }
1162
1163 pMtxState->pGMM = NULL;
1164 return rc;
1165}
1166
1167
1168/**
1169 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1170 * chunk locked.
1171 *
1172 * This only works if gmmR0ChunkMutexAcquire was called with
1173 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1174 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1175 *
1176 * @returns VBox status code (assuming success is ok).
1177 * @param pMtxState Pointer to the chunk mutex state.
1178 */
1179static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1180{
1181 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1182 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1183 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1184 /** @todo GMM life cycle cleanup (we may race someone
1185 * destroying and cleaning up GMM)? */
1186 return gmmR0MutexRelease(pMtxState->pGMM);
1187}
1188
1189
1190/**
1191 * For experimenting with NUMA affinity and such.
1192 *
1193 * @returns The current NUMA Node ID.
1194 */
1195static uint16_t gmmR0GetCurrentNumaNodeId(void)
1196{
1197#if 1
1198 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1199#else
1200 return RTMpCpuId() / 16;
1201#endif
1202}
1203
1204
1205
1206/**
1207 * Cleans up when a VM is terminating.
1208 *
1209 * @param pGVM Pointer to the Global VM structure.
1210 */
1211GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1212{
1213 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1214
1215 PGMM pGMM;
1216 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1217
1218#ifdef VBOX_WITH_PAGE_SHARING
1219 /*
1220 * Clean up all registered shared modules first.
1221 */
1222 gmmR0SharedModuleCleanup(pGMM, pGVM);
1223#endif
1224
1225 gmmR0MutexAcquire(pGMM);
1226 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1227 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1228
1229 /*
1230 * The policy is 'INVALID' until the initial reservation
1231 * request has been serviced.
1232 */
1233 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1234 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1235 {
1236 /*
1237 * If it's the last VM around, we can skip walking all the chunks looking
1238 * for the pages owned by this VM and instead flush the whole shebang.
1239 *
1240 * This takes care of the eventuality that a VM has left shared page
1241 * references behind (shouldn't happen of course, but you never know).
1242 */
1243 Assert(pGMM->cRegisteredVMs);
1244 pGMM->cRegisteredVMs--;
1245
1246 /*
1247 * Walk the entire pool looking for pages that belong to this VM
1248 * and leftover mappings. (This'll only catch private pages,
1249 * shared pages will be 'left behind'.)
1250 */
1251 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1252 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1253
1254 unsigned iCountDown = 64;
1255 bool fRedoFromStart;
1256 PGMMCHUNK pChunk;
1257 do
1258 {
1259 fRedoFromStart = false;
1260 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1261 {
1262 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1263 if ( ( !pGMM->fBoundMemoryMode
1264 || pChunk->hGVM == pGVM->hSelf)
1265 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1266 {
1267 /* We left the giant mutex, so reset the yield counters. */
1268 uLockNanoTS = RTTimeSystemNanoTS();
1269 iCountDown = 64;
1270 }
1271 else
1272 {
1273 /* Didn't leave it, so do normal yielding. */
1274 if (!iCountDown)
1275 gmmR0MutexYield(pGMM, &uLockNanoTS);
1276 else
1277 iCountDown--;
1278 }
1279 if (pGMM->cFreedChunks != cFreeChunksOld)
1280 {
1281 fRedoFromStart = true;
1282 break;
1283 }
1284 }
1285 } while (fRedoFromStart);
1286
1287 if (pGVM->gmm.s.Stats.cPrivatePages)
1288 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1289
1290 pGMM->cAllocatedPages -= cPrivatePages;
1291
1292 /*
1293 * Free empty chunks.
1294 */
1295 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1296 do
1297 {
1298 fRedoFromStart = false;
1299 iCountDown = 10240;
1300 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1301 while (pChunk)
1302 {
1303 PGMMCHUNK pNext = pChunk->pFreeNext;
1304 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1305 if ( !pGMM->fBoundMemoryMode
1306 || pChunk->hGVM == pGVM->hSelf)
1307 {
1308 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1309 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1310 {
1311 /* We've left the giant mutex, restart? (+1 for our unlink) */
1312 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1313 if (fRedoFromStart)
1314 break;
1315 uLockNanoTS = RTTimeSystemNanoTS();
1316 iCountDown = 10240;
1317 }
1318 }
1319
1320 /* Advance and maybe yield the lock. */
1321 pChunk = pNext;
1322 if (--iCountDown == 0)
1323 {
1324 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1325 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1326 && pPrivateSet->idGeneration != idGenerationOld;
1327 if (fRedoFromStart)
1328 break;
1329 iCountDown = 10240;
1330 }
1331 }
1332 } while (fRedoFromStart);
1333
1334 /*
1335 * Account for shared pages that weren't freed.
1336 */
1337 if (pGVM->gmm.s.Stats.cSharedPages)
1338 {
1339 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1340 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1341 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1342 }
1343
1344 /*
1345 * Clean up balloon statistics in case the VM process crashed.
1346 */
1347 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1348 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1349
1350 /*
1351 * Update the over-commitment management statistics.
1352 */
1353 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1354 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1355 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1356 switch (pGVM->gmm.s.Stats.enmPolicy)
1357 {
1358 case GMMOCPOLICY_NO_OC:
1359 break;
1360 default:
1361 /** @todo Update GMM->cOverCommittedPages */
1362 break;
1363 }
1364 }
1365
1366 /* zap the GVM data. */
1367 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1368 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1369 pGVM->gmm.s.Stats.fMayAllocate = false;
1370
1371 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1372 gmmR0MutexRelease(pGMM);
1373
1374 /*
1375 * Destroy the spinlock.
1376 */
1377 RTSPINLOCK hSpinlock = NIL_RTSPINLOCK;
1378 ASMAtomicXchgHandle(&pGVM->gmm.s.hChunkTlbSpinLock, NIL_RTSPINLOCK, &hSpinlock);
1379 RTSpinlockDestroy(hSpinlock);
1380
1381 LogFlow(("GMMR0CleanupVM: returns\n"));
1382}
1383
1384
1385/**
1386 * Scan one chunk for private pages belonging to the specified VM.
1387 *
1388 * @note This function may drop the giant mutex!
1389 *
1390 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1391 * we didn't.
1392 * @param pGMM Pointer to the GMM instance.
1393 * @param pGVM The global VM handle.
1394 * @param pChunk The chunk to scan.
1395 */
1396static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1397{
1398 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1399
1400 /*
1401 * Look for pages belonging to the VM.
1402 * (Perform some internal checks while we're scanning.)
1403 */
1404#ifndef VBOX_STRICT
1405 if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1406#endif
1407 {
1408 unsigned cPrivate = 0;
1409 unsigned cShared = 0;
1410 unsigned cFree = 0;
1411
1412 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1413
1414 uint16_t hGVM = pGVM->hSelf;
1415 unsigned iPage = (GMM_CHUNK_SIZE >> GUEST_PAGE_SHIFT);
1416 while (iPage-- > 0)
1417 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1418 {
1419 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1420 {
1421 /*
1422 * Free the page.
1423 *
1424 * The reason for not using gmmR0FreePrivatePage here is that we
1425 * must *not* cause the chunk to be freed from under us - we're in
1426 * an AVL tree walk here.
1427 */
1428 pChunk->aPages[iPage].u = 0;
1429 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1430 pChunk->aPages[iPage].Free.fZeroed = false;
1431 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1432 pChunk->iFreeHead = iPage;
1433 pChunk->cPrivate--;
1434 pChunk->cFree++;
1435 pGVM->gmm.s.Stats.cPrivatePages--;
1436 cFree++;
1437 }
1438 else
1439 cPrivate++;
1440 }
1441 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1442 cFree++;
1443 else
1444 cShared++;
1445
1446 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1447
1448 /*
1449 * Did it add up?
1450 */
1451 if (RT_UNLIKELY( pChunk->cFree != cFree
1452 || pChunk->cPrivate != cPrivate
1453 || pChunk->cShared != cShared))
1454 {
1455 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1456 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1457 pChunk->cFree = cFree;
1458 pChunk->cPrivate = cPrivate;
1459 pChunk->cShared = cShared;
1460 }
1461 }
1462
1463 /*
1464 * If not in bound memory mode, we should reset the hGVM field
1465 * if it has our handle in it.
1466 */
1467 if (pChunk->hGVM == pGVM->hSelf)
1468 {
1469 if (!g_pGMM->fBoundMemoryMode)
1470 pChunk->hGVM = NIL_GVM_HANDLE;
1471 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1472 {
1473 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1474 pChunk, pChunk->Core.Key, pChunk->cFree);
1475 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1476
1477 gmmR0UnlinkChunk(pChunk);
1478 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1479 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1480 }
1481 }
1482
1483 /*
1484 * Look for a mapping belonging to the terminating VM.
1485 */
1486 GMMR0CHUNKMTXSTATE MtxState;
1487 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1488 unsigned cMappings = pChunk->cMappingsX;
1489 for (unsigned i = 0; i < cMappings; i++)
1490 if (pChunk->paMappingsX[i].pGVM == pGVM)
1491 {
1492 gmmR0ChunkMutexDropGiant(&MtxState);
1493
1494 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1495
1496 cMappings--;
1497 if (i < cMappings)
1498 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1499 pChunk->paMappingsX[cMappings].pGVM = NULL;
1500 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1501 Assert(pChunk->cMappingsX - 1U == cMappings);
1502 pChunk->cMappingsX = cMappings;
1503
1504 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1505 if (RT_FAILURE(rc))
1506 {
1507 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1508 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1509 AssertRC(rc);
1510 }
1511
1512 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1513 return true;
1514 }
1515
1516 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1517 return false;
1518}
1519
1520
1521/**
1522 * The initial resource reservations.
1523 *
1524 * This will make memory reservations according to policy and priority. If there aren't
1525 * sufficient resources available to sustain the VM, this function will fail and all
1526 * future allocation requests will fail as well.
1527 *
1528 * These are just the initial reservations made very early during the VM creation
1529 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1530 * ring-3 init has completed.
1531 *
1532 * @returns VBox status code.
1533 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1534 * @retval VERR_GMM_
1535 *
1536 * @param pGVM The global (ring-0) VM structure.
1537 * @param idCpu The VCPU id - must be zero.
1538 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1539 * This does not include MMIO2 and similar.
1540 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1541 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1542 * hyper heap, MMIO2 and similar.
1543 * @param enmPolicy The OC policy to use on this VM.
1544 * @param enmPriority The priority in an out-of-memory situation.
1545 *
1546 * @thread The creator thread / EMT(0).
1547 */
1548GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1549 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1550{
1551 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1552 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1553
1554 /*
1555 * Validate, get basics and take the semaphore.
1556 */
1557 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1558 PGMM pGMM;
1559 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1560 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1561 if (RT_FAILURE(rc))
1562 return rc;
1563
1564 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1565 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1566 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1567 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1568 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1569
1570 gmmR0MutexAcquire(pGMM);
1571 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1572 {
1573 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1574 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1575 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1576 {
1577 /*
1578 * Check if we can accommodate this.
1579 */
1580 /* ... later ... */
1581 if (RT_SUCCESS(rc))
1582 {
1583 /*
1584 * Update the records.
1585 */
1586 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1587 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1588 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1589 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1590 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1591 pGVM->gmm.s.Stats.fMayAllocate = true;
1592
1593 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1594 pGMM->cRegisteredVMs++;
1595 }
1596 }
1597 else
1598 rc = VERR_WRONG_ORDER;
1599 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1600 }
1601 else
1602 rc = VERR_GMM_IS_NOT_SANE;
1603 gmmR0MutexRelease(pGMM);
1604 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1605 return rc;
1606}
1607
1608
1609/**
1610 * VMMR0 request wrapper for GMMR0InitialReservation.
1611 *
1612 * @returns see GMMR0InitialReservation.
1613 * @param pGVM The global (ring-0) VM structure.
1614 * @param idCpu The VCPU id.
1615 * @param pReq Pointer to the request packet.
1616 */
1617GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1618{
1619 /*
1620 * Validate input and pass it on.
1621 */
1622 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1623 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1624 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1625
1626 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1627 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1628}
1629
1630
1631/**
1632 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1633 *
1634 * @returns VBox status code.
1635 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1636 *
1637 * @param pGVM The global (ring-0) VM structure.
1638 * @param idCpu The VCPU id.
1639 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1640 * This does not include MMIO2 and similar.
1641 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1642 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1643 * hyper heap, MMIO2 and similar.
1644 *
1645 * @thread EMT(idCpu)
1646 */
1647GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1648 uint32_t cShadowPages, uint32_t cFixedPages)
1649{
1650 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1651 pGVM, cBasePages, cShadowPages, cFixedPages));
1652
1653 /*
1654 * Validate, get basics and take the semaphore.
1655 */
1656 PGMM pGMM;
1657 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1658 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1659 if (RT_FAILURE(rc))
1660 return rc;
1661
1662 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1663 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1664 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1665
1666 gmmR0MutexAcquire(pGMM);
1667 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1668 {
1669 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1670 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1671 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1672 {
1673 /*
1674 * Check if we can accommodate this.
1675 */
1676 /* ... later ... */
1677 if (RT_SUCCESS(rc))
1678 {
1679 /*
1680 * Update the records.
1681 */
1682 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1683 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1684 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1685 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1686
1687 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1688 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1689 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1690 }
1691 }
1692 else
1693 rc = VERR_WRONG_ORDER;
1694 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1695 }
1696 else
1697 rc = VERR_GMM_IS_NOT_SANE;
1698 gmmR0MutexRelease(pGMM);
1699 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1700 return rc;
1701}
1702
1703
1704/**
1705 * VMMR0 request wrapper for GMMR0UpdateReservation.
1706 *
1707 * @returns see GMMR0UpdateReservation.
1708 * @param pGVM The global (ring-0) VM structure.
1709 * @param idCpu The VCPU id.
1710 * @param pReq Pointer to the request packet.
1711 */
1712GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1713{
1714 /*
1715 * Validate input and pass it on.
1716 */
1717 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1718 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1719
1720 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1721}
1722
1723#ifdef GMMR0_WITH_SANITY_CHECK
1724
1725/**
1726 * Performs sanity checks on a free set.
1727 *
1728 * @returns Error count.
1729 *
1730 * @param pGMM Pointer to the GMM instance.
1731 * @param pSet Pointer to the set.
1732 * @param pszSetName The set name.
1733 * @param pszFunction The function from which it is called.
1734 * @param uLineNo The line number.
1735 */
1736static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1737 const char *pszFunction, unsigned uLineNo)
1738{
1739 uint32_t cErrors = 0;
1740
1741 /*
1742 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1743 */
1744 uint32_t cPages = 0;
1745 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1746 {
1747 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1748 {
1749            /** @todo check that the chunk is hashed into the right set. */
1750 cPages += pCur->cFree;
1751 }
1752 }
1753 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1754 {
1755 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1756 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1757 cErrors++;
1758 }
1759
1760 return cErrors;
1761}
1762
1763
1764/**
1765 * Performs some sanity checks on the GMM while owning lock.
1766 *
1767 * @returns Error count.
1768 *
1769 * @param pGMM Pointer to the GMM instance.
1770 * @param pszFunction The function from which it is called.
1771 * @param uLineNo The line number.
1772 */
1773static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1774{
1775 uint32_t cErrors = 0;
1776
1777 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1778 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1779 /** @todo add more sanity checks. */
1780
1781 return cErrors;
1782}
1783
1784#endif /* GMMR0_WITH_SANITY_CHECK */
1785
1786/**
1787 * Looks up a chunk in the tree and fills in the TLB entry for it.
1788 *
1789 * This is not expected to fail and will bitch if it does.
1790 *
1791 * @returns Pointer to the allocation chunk, NULL if not found.
1792 * @param pGMM Pointer to the GMM instance.
1793 * @param idChunk The ID of the chunk to find.
1794 * @param pTlbe Pointer to the TLB entry.
1795 *
1796 * @note Caller owns spinlock.
1797 */
1798static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1799{
1800 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1801 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1802 pTlbe->idChunk = idChunk;
1803 pTlbe->pChunk = pChunk;
1804 return pChunk;
1805}
1806
1807
1808/**
1809 * Finds an allocation chunk, spin-locked.
1810 *
1811 * This is not expected to fail and will bitch if it does.
1812 *
1813 * @returns Pointer to the allocation chunk, NULL if not found.
1814 * @param pGMM Pointer to the GMM instance.
1815 * @param idChunk The ID of the chunk to find.
1816 */
1817DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1818{
1819 /*
1820 * Do a TLB lookup, branch if not in the TLB.
1821 */
1822 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1823 PGMMCHUNK pChunk = pTlbe->pChunk;
1824 if ( pChunk == NULL
1825 || pTlbe->idChunk != idChunk)
1826 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1827 return pChunk;
1828}
1829
1830
1831/**
1832 * Finds an allocation chunk.
1833 *
1834 * This is not expected to fail and will bitch if it does.
1835 *
1836 * @returns Pointer to the allocation chunk, NULL if not found.
1837 * @param pGMM Pointer to the GMM instance.
1838 * @param idChunk The ID of the chunk to find.
1839 */
1840DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1841{
1842 RTSpinlockAcquire(pGMM->hSpinLockTree);
1843 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1844 RTSpinlockRelease(pGMM->hSpinLockTree);
1845 return pChunk;
1846}
1847
1848
1849/**
1850 * Finds a page.
1851 *
1852 * This is not expected to fail and will bitch if it does.
1853 *
1854 * @returns Pointer to the page, NULL if not found.
1855 * @param pGMM Pointer to the GMM instance.
1856 * @param idPage The ID of the page to find.
1857 */
1858DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1859{
1860 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1861 if (RT_LIKELY(pChunk))
1862 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1863 return NULL;
1864}
1865
1866
1867#if 0 /* unused */
1868/**
1869 * Gets the host physical address for a page given by its ID.
1870 *
1871 * @returns The host physical address or NIL_RTHCPHYS.
1872 * @param pGMM Pointer to the GMM instance.
1873 * @param idPage The ID of the page to find.
1874 */
1875DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1876{
1877 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1878 if (RT_LIKELY(pChunk))
1879 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1880 return NIL_RTHCPHYS;
1881}
1882#endif /* unused */
1883
1884
1885/**
1886 * Selects the appropriate free list given the number of free pages.
1887 *
1888 * @returns Free list index.
1889 * @param cFree The number of free pages in the chunk.
1890 */
1891DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1892{
1893 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1894 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1895 ("%d (%u)\n", iList, cFree));
1896 return iList;
1897}
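
/*
 * Worked example (illustrative only): assuming, purely for illustration, that
 * GMM_CHUNK_FREE_SET_SHIFT is 4, a chunk with 100 free pages would be kept on
 * list 100 >> 4 = 6 and a chunk with fewer than 16 free pages on list 0.
 * gmmR0AllocatePagesFromChunk further down unlinks a chunk, takes pages from
 * it and relinks it, so the chunk ends up on the list matching its new free
 * count.
 */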
1898
1899
1900/**
1901 * Unlinks the chunk from the free list it's currently on (if any).
1902 *
1903 * @param pChunk The allocation chunk.
1904 */
1905DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1906{
1907 PGMMCHUNKFREESET pSet = pChunk->pSet;
1908 if (RT_LIKELY(pSet))
1909 {
1910 pSet->cFreePages -= pChunk->cFree;
1911 pSet->idGeneration++;
1912
1913 PGMMCHUNK pPrev = pChunk->pFreePrev;
1914 PGMMCHUNK pNext = pChunk->pFreeNext;
1915 if (pPrev)
1916 pPrev->pFreeNext = pNext;
1917 else
1918 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1919 if (pNext)
1920 pNext->pFreePrev = pPrev;
1921
1922 pChunk->pSet = NULL;
1923 pChunk->pFreeNext = NULL;
1924 pChunk->pFreePrev = NULL;
1925 }
1926 else
1927 {
1928 Assert(!pChunk->pFreeNext);
1929 Assert(!pChunk->pFreePrev);
1930 Assert(!pChunk->cFree);
1931 }
1932}
1933
1934
1935/**
1936 * Links the chunk onto the appropriate free list in the specified free set.
1937 *
1938 * If the chunk has no free entries, it is not linked into any list.
1939 *
1940 * @param pChunk The allocation chunk.
1941 * @param pSet The free set.
1942 */
1943DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1944{
1945 Assert(!pChunk->pSet);
1946 Assert(!pChunk->pFreeNext);
1947 Assert(!pChunk->pFreePrev);
1948
1949 if (pChunk->cFree > 0)
1950 {
1951 pChunk->pSet = pSet;
1952 pChunk->pFreePrev = NULL;
1953 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1954 pChunk->pFreeNext = pSet->apLists[iList];
1955 if (pChunk->pFreeNext)
1956 pChunk->pFreeNext->pFreePrev = pChunk;
1957 pSet->apLists[iList] = pChunk;
1958
1959 pSet->cFreePages += pChunk->cFree;
1960 pSet->idGeneration++;
1961 }
1962}
1963
1964
1965/**
1966 * Selects the appropriate free set for the chunk and links the chunk onto it.
1967 *
1968 * If the chunk has no free entries, it is not linked into any list.
1969 *
1970 * @param pGMM Pointer to the GMM instance.
1971 * @param pGVM Pointer to the kernel-only VM instance data.
1972 * @param pChunk The allocation chunk.
1973 */
1974DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1975{
1976 PGMMCHUNKFREESET pSet;
1977 if (pGMM->fBoundMemoryMode)
1978 pSet = &pGVM->gmm.s.Private;
1979 else if (pChunk->cShared)
1980 pSet = &pGMM->Shared;
1981 else
1982 pSet = &pGMM->PrivateX;
1983 gmmR0LinkChunk(pChunk, pSet);
1984}
1985
1986
1987/**
1988 * Frees a Chunk ID.
1989 *
1990 * @param pGMM Pointer to the GMM instance.
1991 * @param idChunk The Chunk ID to free.
1992 */
1993static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1994{
1995 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1996 RTSpinlockAcquire(pGMM->hSpinLockChunkId); /* We could probably skip the locking here, I think. */
1997
1998 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1999 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2000
2001 RTSpinlockRelease(pGMM->hSpinLockChunkId);
2002}
2003
2004
2005/**
2006 * Allocates a new Chunk ID.
2007 *
2008 * @returns The Chunk ID.
2009 * @param pGMM Pointer to the GMM instance.
2010 */
2011static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2012{
2013 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2014 AssertCompile(NIL_GMM_CHUNKID == 0);
2015
2016 RTSpinlockAcquire(pGMM->hSpinLockChunkId);
2017
2018 /*
2019 * Try the next sequential one.
2020 */
2021 int32_t idChunk = ++pGMM->idChunkPrev;
2022 if ( (uint32_t)idChunk <= GMM_CHUNKID_LAST
2023 && idChunk > NIL_GMM_CHUNKID)
2024 {
2025 if (!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2026 {
2027 RTSpinlockRelease(pGMM->hSpinLockChunkId);
2028 return idChunk;
2029 }
2030
2031 /*
2032 * Scan sequentially from the last one.
2033 */
2034 if ((uint32_t)idChunk < GMM_CHUNKID_LAST)
2035 {
2036 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk);
2037 if ( idChunk > NIL_GMM_CHUNKID
2038 && (uint32_t)idChunk <= GMM_CHUNKID_LAST)
2039 {
2040 AssertMsgReturnStmt(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk),
2041 RTSpinlockRelease(pGMM->hSpinLockChunkId), NIL_GMM_CHUNKID);
2042
2043 pGMM->idChunkPrev = idChunk;
2044 RTSpinlockRelease(pGMM->hSpinLockChunkId);
2045 return idChunk;
2046 }
2047 }
2048 }
2049
2050 /*
2051 * Ok, scan from the start.
2052 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2053 */
2054 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2055 AssertMsgReturnStmt(idChunk > NIL_GMM_CHUNKID && (uint32_t)idChunk <= GMM_CHUNKID_LAST, ("%#x\n", idChunk),
2056                        RTSpinlockRelease(pGMM->hSpinLockChunkId), NIL_GMM_CHUNKID);
2057 AssertMsgReturnStmt(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk),
2058 RTSpinlockRelease(pGMM->hSpinLockChunkId), NIL_GMM_CHUNKID);
2059
2060 pGMM->idChunkPrev = idChunk;
2061 RTSpinlockRelease(pGMM->hSpinLockChunkId);
2062 return idChunk;
2063}
2064
2065
2066/**
2067 * Allocates one private page.
2068 *
2069 * Worker for gmmR0AllocatePagesFromChunk.
2070 *
2071 * @param pChunk The chunk to allocate it from.
2072 * @param hGVM The GVM handle of the VM requesting memory.
2073 * @param pPageDesc The page descriptor.
2074 */
2075static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2076{
2077 /* update the chunk stats. */
2078 if (pChunk->hGVM == NIL_GVM_HANDLE)
2079 pChunk->hGVM = hGVM;
2080 Assert(pChunk->cFree);
2081 pChunk->cFree--;
2082 pChunk->cPrivate++;
2083
2084 /* unlink the first free page. */
2085 const uint32_t iPage = pChunk->iFreeHead;
2086 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2087 PGMMPAGE pPage = &pChunk->aPages[iPage];
2088 Assert(GMM_PAGE_IS_FREE(pPage));
2089 pChunk->iFreeHead = pPage->Free.iNext;
2090 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2091 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2092 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2093
2094 bool const fZeroed = pPage->Free.fZeroed;
2095
2096 /* make the page private. */
2097 pPage->u = 0;
2098 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2099 pPage->Private.hGVM = hGVM;
2100 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2101 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2102 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2103 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> GUEST_PAGE_SHIFT;
2104 else
2105 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2106
2107 /* update the page descriptor. */
2108 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2109 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2110 RTHCPHYS const HCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2111 Assert(HCPhys != NIL_RTHCPHYS); Assert(HCPhys < NIL_GMMPAGEDESC_PHYS);
2112 pPageDesc->HCPhysGCPhys = HCPhys;
2113 pPageDesc->fZeroed = fZeroed;
2114}
2115
2116
2117/**
2118 * Picks the free pages from a chunk.
2119 *
2120 * @returns The new page descriptor table index.
2121 * @param pChunk The chunk.
2122 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2123 * affinity.
2124 * @param iPage The current page descriptor table index.
2125 * @param cPages The total number of pages to allocate.
2126 * @param paPages The page descriptor table (input + output).
2127 */
2128static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2129 PGMMPAGEDESC paPages)
2130{
2131 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2132 gmmR0UnlinkChunk(pChunk);
2133
2134 for (; pChunk->cFree && iPage < cPages; iPage++)
2135 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2136
2137 gmmR0LinkChunk(pChunk, pSet);
2138 return iPage;
2139}
2140
2141
2142/**
2143 * Registers a new chunk of memory.
2144 *
2145 * This is called by gmmR0AllocateChunkNew and GMMR0AllocateLargePage.
2146 *
2147 * In the GMMR0AllocateLargePage case the GMM_CHUNK_FLAGS_LARGE_PAGE flag is
2148 * set and the chunk will be registered as fully allocated to save time.
2149 *
2150 * @returns VBox status code. On success, the giant GMM lock will be held and
2151 * the caller must release it (ugly).
2152 * @param pGMM Pointer to the GMM instance.
2153 * @param pSet Pointer to the set.
2154 * @param hMemObj The memory object for the chunk.
2155 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2156 * affinity.
2157 * @param pSession Same as @a hGVM.
2158 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2159 * @param cPages The number of pages requested. Zero for large pages.
2160 * @param paPages The page descriptor table (input + output). NULL for
2161 * large pages.
2162 * @param piPage The pointer to the page descriptor table index variable.
2163 * This will be updated. NULL for large pages.
2164 * @param ppChunk Chunk address (out).
2165 *
2166 * @remarks The caller must not own the giant GMM mutex.
2167 * The giant GMM mutex will be acquired and returned acquired in
2168 * the success path. On failure, no locks will be held.
2169 */
2170static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, PSUPDRVSESSION pSession,
2171 uint16_t fChunkFlags, uint32_t cPages, PGMMPAGEDESC paPages, uint32_t *piPage, PGMMCHUNK *ppChunk)
2172{
2173 /*
2174 * Validate input & state.
2175 */
2176 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2177 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2178 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2179 if (!(fChunkFlags &= GMM_CHUNK_FLAGS_LARGE_PAGE))
2180 {
2181 AssertPtr(paPages);
2182 AssertPtr(piPage);
2183 Assert(cPages > 0);
2184 Assert(cPages > *piPage);
2185 }
2186 else
2187 {
2188 Assert(cPages == 0);
2189 Assert(!paPages);
2190 Assert(!piPage);
2191 }
2192
2193#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2194 /*
2195 * Get a ring-0 mapping of the object.
2196 */
2197 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2198 if (!pbMapping)
2199 {
2200 RTR0MEMOBJ hMapObj;
2201 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2202 if (RT_SUCCESS(rc))
2203 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2204 else
2205 return rc;
2206 AssertPtr(pbMapping);
2207 }
2208#endif
2209
2210 /*
2211 * Allocate a chunk and an ID for it.
2212 */
2213 int rc;
2214 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2215 if (pChunk)
2216 {
2217 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2218 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2219 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2220 {
2221 /*
2222 * Initialize it.
2223 */
2224 pChunk->hMemObj = hMemObj;
2225#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2226 pChunk->pbMapping = pbMapping;
2227#endif
2228 pChunk->hGVM = hGVM;
2229 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2230 pChunk->iChunkMtx = UINT8_MAX;
2231 pChunk->fFlags = fChunkFlags;
2232 pChunk->uidOwner = pSession ? SUPR0GetSessionUid(pSession) : NIL_RTUID;
2233 /*pChunk->cShared = 0; */
2234
2235 uint32_t const iDstPageFirst = piPage ? *piPage : cPages;
2236 if (!(fChunkFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
2237 {
2238 /*
2239 * Allocate the requested number of pages from the start of the chunk,
2240 * queue the rest (if any) on the free list.
2241 */
2242 uint32_t const cPagesAlloc = RT_MIN(cPages - iDstPageFirst, GMM_CHUNK_NUM_PAGES);
2243 pChunk->cPrivate = cPagesAlloc;
2244 pChunk->cFree = GMM_CHUNK_NUM_PAGES - cPagesAlloc;
2245 pChunk->iFreeHead = GMM_CHUNK_NUM_PAGES > cPagesAlloc ? cPagesAlloc : UINT16_MAX;
2246
2247 /* Alloc pages: */
2248 uint32_t const idPageChunk = pChunk->Core.Key << GMM_CHUNKID_SHIFT;
2249 uint32_t iDstPage = iDstPageFirst;
2250 uint32_t iPage;
2251 for (iPage = 0; iPage < cPagesAlloc; iPage++, iDstPage++)
2252 {
2253 if (paPages[iDstPage].HCPhysGCPhys <= GMM_GCPHYS_LAST)
2254 pChunk->aPages[iPage].Private.pfn = paPages[iDstPage].HCPhysGCPhys >> GUEST_PAGE_SHIFT;
2255 else
2256 pChunk->aPages[iPage].Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2257 pChunk->aPages[iPage].Private.hGVM = hGVM;
2258 pChunk->aPages[iPage].Private.u2State = GMM_PAGE_STATE_PRIVATE;
2259
2260 paPages[iDstPage].HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(hMemObj, iPage);
2261 paPages[iDstPage].fZeroed = true;
2262 paPages[iDstPage].idPage = idPageChunk | iPage;
2263 paPages[iDstPage].idSharedPage = NIL_GMM_PAGEID;
2264 }
2265 *piPage = iDstPage;
2266
2267 /* Build free list: */
2268 if (iPage < RT_ELEMENTS(pChunk->aPages))
2269 {
2270 Assert(pChunk->iFreeHead == iPage);
2271 for (; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2272 {
2273 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2274 pChunk->aPages[iPage].Free.fZeroed = true;
2275 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2276 }
2277 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2278 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.fZeroed = true;
2279 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2280 }
2281 else
2282 Assert(pChunk->iFreeHead == UINT16_MAX);
2283 }
2284 else
2285 {
2286 /*
2287 * Large page: Mark all pages as privately allocated (watered down gmmR0AllocatePage).
2288 */
2289 pChunk->cFree = 0;
2290 pChunk->cPrivate = GMM_CHUNK_NUM_PAGES;
2291 pChunk->iFreeHead = UINT16_MAX;
2292
2293 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages); iPage++)
2294 {
2295 pChunk->aPages[iPage].Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2296 pChunk->aPages[iPage].Private.hGVM = hGVM;
2297 pChunk->aPages[iPage].Private.u2State = GMM_PAGE_STATE_PRIVATE;
2298 }
2299 }
2300
2301 /*
2302 * Zero the memory if it wasn't zeroed by the host already.
2303 * This simplifies keeping secret kernel bits from userland and brings
2304 * everyone to the same level wrt allocation zeroing.
2305 */
2306 rc = VINF_SUCCESS;
2307 if (!RTR0MemObjWasZeroInitialized(hMemObj))
2308 {
2309#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2310 if (!(fChunkFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
2311 {
2312 for (uint32_t iPage = 0; iPage < GMM_CHUNK_SIZE / HOST_PAGE_SIZE; iPage++)
2313 {
2314 void *pvPage = NULL;
2315 rc = SUPR0HCPhysToVirt(RTR0MemObjGetPagePhysAddr(hMemObj, iPage), &pvPage);
2316 AssertRCBreak(rc);
2317 RT_BZERO(pvPage, HOST_PAGE_SIZE);
2318 }
2319 }
2320 else
2321 {
2322 /* Can do the whole large page in one go. */
2323 void *pvPage = NULL;
2324 rc = SUPR0HCPhysToVirt(RTR0MemObjGetPagePhysAddr(hMemObj, 0), &pvPage);
2325 AssertRC(rc);
2326 if (RT_SUCCESS(rc))
2327 RT_BZERO(pvPage, GMM_CHUNK_SIZE);
2328 }
2329#else
2330 RT_BZERO(pbMapping, GMM_CHUNK_SIZE);
2331#endif
2332 }
2333 if (RT_SUCCESS(rc))
2334 {
2335 *ppChunk = pChunk;
2336
2337 /*
2338 * Allocate a Chunk ID and insert it into the tree.
2339 * This has to be done behind the mutex of course.
2340 */
2341 rc = gmmR0MutexAcquire(pGMM);
2342 if (RT_SUCCESS(rc))
2343 {
2344 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2345 {
2346 RTSpinlockAcquire(pGMM->hSpinLockTree);
2347 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2348 {
2349 pGMM->cChunks++;
2350 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2351 RTSpinlockRelease(pGMM->hSpinLockTree);
2352
2353 gmmR0LinkChunk(pChunk, pSet);
2354
2355 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2356 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2357 return VINF_SUCCESS;
2358 }
2359
2360 /*
2361 * Bail out.
2362 */
2363 RTSpinlockRelease(pGMM->hSpinLockTree);
2364 rc = VERR_GMM_CHUNK_INSERT;
2365 }
2366 else
2367 rc = VERR_GMM_IS_NOT_SANE;
2368 gmmR0MutexRelease(pGMM);
2369 }
2370 *ppChunk = NULL;
2371 }
2372
2373 /* Undo any page allocations. */
2374 if (!(fChunkFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
2375 {
2376 uint32_t const cToFree = pChunk->cPrivate;
2377 Assert(*piPage - iDstPageFirst == cToFree);
2378 for (uint32_t iDstPage = iDstPageFirst, iPage = 0; iPage < cToFree; iPage++, iDstPage++)
2379 {
2380                    paPages[iDstPage].fZeroed      = false;
2381                    if (pChunk->aPages[iPage].Private.pfn == GMM_PAGE_PFN_UNSHAREABLE)
2382                        paPages[iDstPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
2383                    else
2384                        paPages[iDstPage].HCPhysGCPhys = (RTHCPHYS)pChunk->aPages[iPage].Private.pfn << GUEST_PAGE_SHIFT;
2385                    paPages[iDstPage].idPage       = NIL_GMM_PAGEID;
2386                    paPages[iDstPage].idSharedPage = NIL_GMM_PAGEID;
2387 }
2388 *piPage = iDstPageFirst;
2389 }
2390
2391 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
2392 }
2393 else
2394 rc = VERR_GMM_CHUNK_INSERT;
2395 RTMemFree(pChunk);
2396 }
2397 else
2398 rc = VERR_NO_MEMORY;
2399 return rc;
2400}
2401
2402
2403/**
2404 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2405 * what's remaining to the specified free set.
2406 *
2407 * @note This will leave the giant mutex while allocating the new chunk!
2408 *
2409 * @returns VBox status code.
2410 * @param pGMM Pointer to the GMM instance data.
2411 * @param pGVM Pointer to the kernel-only VM instance data.
2412 * @param pSet Pointer to the free set.
2413 * @param cPages The number of pages requested.
2414 * @param paPages The page descriptor table (input + output).
2415 * @param piPage The pointer to the page descriptor table index variable.
2416 * This will be updated.
2417 */
2418static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2419 PGMMPAGEDESC paPages, uint32_t *piPage)
2420{
2421 gmmR0MutexRelease(pGMM);
2422
2423 RTR0MEMOBJ hMemObj;
2424 int rc;
2425#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2426 if (pGMM->fHasWorkingAllocPhysNC)
2427 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2428 else
2429#endif
2430 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2431 if (RT_SUCCESS(rc))
2432 {
2433 PGMMCHUNK pIgnored;
2434 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, 0 /*fChunkFlags*/,
2435 cPages, paPages, piPage, &pIgnored);
2436 if (RT_SUCCESS(rc))
2437 return VINF_SUCCESS;
2438
2439 /* bail out */
2440 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2441 }
2442
2443 int rc2 = gmmR0MutexAcquire(pGMM);
2444 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2445 return rc;
2446
2447}
2448
2449
2450/**
2451 * As a last resort we'll pick any page we can get.
2452 *
2453 * @returns The new page descriptor table index.
2454 * @param pSet The set to pick from.
2455 * @param pGVM Pointer to the global VM structure.
2456 * @param uidSelf The UID of the caller.
2457 * @param iPage The current page descriptor table index.
2458 * @param cPages The total number of pages to allocate.
2459 * @param paPages The page descriptor table (input + output).
2460 */
2461static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2462 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2463{
2464 unsigned iList = RT_ELEMENTS(pSet->apLists);
2465 while (iList-- > 0)
2466 {
2467 PGMMCHUNK pChunk = pSet->apLists[iList];
2468 while (pChunk)
2469 {
2470 PGMMCHUNK pNext = pChunk->pFreeNext;
2471 if ( pChunk->uidOwner == uidSelf
2472 || ( pChunk->cMappingsX == 0
2473 && pChunk->cFree == (GMM_CHUNK_SIZE >> GUEST_PAGE_SHIFT)))
2474 {
2475 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2476 if (iPage >= cPages)
2477 return iPage;
2478 }
2479
2480 pChunk = pNext;
2481 }
2482 }
2483 return iPage;
2484}
2485
2486
2487/**
2488 * Pick pages from empty chunks on the same NUMA node.
2489 *
2490 * @returns The new page descriptor table index.
2491 * @param pSet The set to pick from.
2492 * @param pGVM Pointer to the global VM structure.
2493 * @param uidSelf The UID of the caller.
2494 * @param iPage The current page descriptor table index.
2495 * @param cPages The total number of pages to allocate.
2496 * @param paPages The page descriptor table (input + output).
2497 */
2498static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2499 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2500{
2501 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2502 if (pChunk)
2503 {
2504 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2505 while (pChunk)
2506 {
2507 PGMMCHUNK pNext = pChunk->pFreeNext;
2508
2509 if ( pChunk->idNumaNode == idNumaNode
2510 && ( pChunk->uidOwner == uidSelf
2511 || pChunk->cMappingsX == 0))
2512 {
2513 pChunk->hGVM = pGVM->hSelf;
2514 pChunk->uidOwner = uidSelf;
2515 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2516 if (iPage >= cPages)
2517 {
2518 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2519 return iPage;
2520 }
2521 }
2522
2523 pChunk = pNext;
2524 }
2525 }
2526 return iPage;
2527}
2528
2529
2530/**
2531 * Pick pages from non-empty chunks on the same NUMA node.
2532 *
2533 * @returns The new page descriptor table index.
2534 * @param pSet The set to pick from.
2535 * @param pGVM Pointer to the global VM structure.
2536 * @param uidSelf The UID of the caller.
2537 * @param iPage The current page descriptor table index.
2538 * @param cPages The total number of pages to allocate.
2539 * @param paPages The page descriptor table (input + output).
2540 */
2541static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID const uidSelf,
2542 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2543{
2544 /** @todo start by picking from chunks with about the right size first? */
2545 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2546 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2547 while (iList-- > 0)
2548 {
2549 PGMMCHUNK pChunk = pSet->apLists[iList];
2550 while (pChunk)
2551 {
2552 PGMMCHUNK pNext = pChunk->pFreeNext;
2553
2554 if ( pChunk->idNumaNode == idNumaNode
2555 && pChunk->uidOwner == uidSelf)
2556 {
2557 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2558 if (iPage >= cPages)
2559 {
2560 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2561 return iPage;
2562 }
2563 }
2564
2565 pChunk = pNext;
2566 }
2567 }
2568 return iPage;
2569}
2570
2571
2572/**
2573 * Pick pages that are in chunks already associated with the VM.
2574 *
2575 * @returns The new page descriptor table index.
2576 * @param pGMM Pointer to the GMM instance data.
2577 * @param pGVM Pointer to the global VM structure.
2578 * @param pSet The set to pick from.
2579 * @param iPage The current page descriptor table index.
2580 * @param cPages The total number of pages to allocate.
2581 * @param paPages The page descriptor table (input + output).
2582 */
2583static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2584 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2585{
2586 uint16_t const hGVM = pGVM->hSelf;
2587
2588 /* Hint. */
2589 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2590 {
2591 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2592 if (pChunk && pChunk->cFree)
2593 {
2594 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2595 if (iPage >= cPages)
2596 return iPage;
2597 }
2598 }
2599
2600 /* Scan. */
2601 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2602 {
2603 PGMMCHUNK pChunk = pSet->apLists[iList];
2604 while (pChunk)
2605 {
2606 PGMMCHUNK pNext = pChunk->pFreeNext;
2607
2608 if (pChunk->hGVM == hGVM)
2609 {
2610 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2611 if (iPage >= cPages)
2612 {
2613 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2614 return iPage;
2615 }
2616 }
2617
2618 pChunk = pNext;
2619 }
2620 }
2621 return iPage;
2622}
2623
2624
2625
2626/**
2627 * Pick pages in bound memory mode.
2628 *
2629 * @returns The new page descriptor table index.
2630 * @param pGVM Pointer to the global VM structure.
2631 * @param iPage The current page descriptor table index.
2632 * @param cPages The total number of pages to allocate.
2633 * @param paPages The page descriptor table (input + output).
2634 */
2635static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2636{
2637 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2638 {
2639 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2640 while (pChunk)
2641 {
2642 Assert(pChunk->hGVM == pGVM->hSelf);
2643 PGMMCHUNK pNext = pChunk->pFreeNext;
2644 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2645 if (iPage >= cPages)
2646 return iPage;
2647 pChunk = pNext;
2648 }
2649 }
2650 return iPage;
2651}
2652
2653
2654/**
2655 * Checks if we should start picking pages from chunks of other VMs because
2656 * we're getting close to the system memory or reserved limit.
2657 *
2658 * @returns @c true if we should, @c false if we should first try allocate more
2659 * chunks.
2660 */
2661static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2662{
2663 /*
2664     * Don't allocate a new chunk if we're getting too close to the reservation limit.
2665 */
2666 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2667 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2668 - pGVM->gmm.s.Stats.cBalloonedPages
2669 /** @todo what about shared pages? */;
2670 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2671 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2672 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2673 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2674 return true;
2675 /** @todo make the threshold configurable, also test the code to see if
2676 * this ever kicks in (we might be reserving too much or smth). */
2677
2678 /*
2679     * Check how close we are to the max memory limit and how many fragments
2680     * there are...
2681 */
2682 /** @todo */
2683
2684 return false;
2685}
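
/*
 * Back-of-the-envelope for the threshold above (illustrative, assuming 4 KiB
 * guest pages so that GMM_CHUNK_NUM_PAGES is 2 MB / 4 KiB = 512): the VM is
 * treated as being close to its reservation once fewer than 4 * 512 = 2048
 * pages, i.e. roughly 8 MB, remain between the reserved total (with the
 * ballooned pages subtracted) and what has already been allocated.
 */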
2686
2687
2688/**
2689 * Checks if we should start picking pages from chunks of other VMs because
2690 * there are a lot of free pages around.
2691 *
2692 * @returns @c true if we should, @c false if we should first try allocate more
2693 * chunks.
2694 */
2695static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2696{
2697 /*
2698 * Setting the limit at 16 chunks (32 MB) at the moment.
2699 */
2700 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2701 return true;
2702 return false;
2703}
2704
2705
2706/**
2707 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2708 *
2709 * @returns VBox status code:
2710 * @retval VINF_SUCCESS on success.
2711 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2712 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2713 * that is we're trying to allocate more than we've reserved.
2714 *
2715 * @param pGMM Pointer to the GMM instance data.
2716 * @param pGVM Pointer to the VM.
2717 * @param cPages The number of pages to allocate.
2718 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2719 * details on what is expected on input.
2720 * @param enmAccount The account to charge.
2721 *
2722 * @remarks Caller owns the giant GMM lock.
2723 */
2724static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2725{
2726 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2727
2728 /*
2729 * Check allocation limits.
2730 */
2731 if (RT_LIKELY(pGMM->cAllocatedPages + cPages <= pGMM->cMaxPages))
2732 { /* likely */ }
2733 else
2734 return VERR_GMM_HIT_GLOBAL_LIMIT;
2735
2736 switch (enmAccount)
2737 {
2738 case GMMACCOUNT_BASE:
2739 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2740 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
2741 { /* likely */ }
2742 else
2743 {
2744 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2745 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2746 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2747 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2748 }
2749 break;
2750 case GMMACCOUNT_SHADOW:
2751 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages <= pGVM->gmm.s.Stats.Reserved.cShadowPages))
2752 { /* likely */ }
2753 else
2754 {
2755 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2756 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2757 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2758 }
2759 break;
2760 case GMMACCOUNT_FIXED:
2761 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages <= pGVM->gmm.s.Stats.Reserved.cFixedPages))
2762 { /* likely */ }
2763 else
2764 {
2765 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2766 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2767 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2768 }
2769 break;
2770 default:
2771 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2772 }
2773
2774 /*
2775 * Update the accounts before we proceed because we might be leaving the
2776 * protection of the global mutex and thus run the risk of permitting
2777 * too much memory to be allocated.
2778 */
2779 switch (enmAccount)
2780 {
2781 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2782 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2783 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2784 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2785 }
2786 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2787 pGMM->cAllocatedPages += cPages;
2788
2789 /*
2790 * Bound mode is also relatively straightforward.
2791 */
2792 uint32_t iPage = 0;
2793 int rc = VINF_SUCCESS;
2794 if (pGMM->fBoundMemoryMode)
2795 {
2796 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2797 if (iPage < cPages)
2798 do
2799 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2800 while (iPage < cPages && RT_SUCCESS(rc));
2801 }
2802 /*
2803     * Shared mode is trickier as we should try to achieve the same locality as
2804 * in bound mode, but smartly make use of non-full chunks allocated by
2805 * other VMs if we're low on memory.
2806 */
2807 else
2808 {
2809 RTUID const uidSelf = SUPR0GetSessionUid(pGVM->pSession);
2810
2811 /* Pick the most optimal pages first. */
2812 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2813 if (iPage < cPages)
2814 {
2815 /* Maybe we should try getting pages from chunks "belonging" to
2816 other VMs before allocating more chunks? */
2817 bool fTriedOnSameAlready = false;
2818 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2819 {
2820 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2821 fTriedOnSameAlready = true;
2822 }
2823
2824 /* Allocate memory from empty chunks. */
2825 if (iPage < cPages)
2826 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2827
2828 /* Grab empty shared chunks. */
2829 if (iPage < cPages)
2830 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, uidSelf, iPage, cPages, paPages);
2831
2832            /* If there are a lot of free pages spread around, try not to waste
2833 system memory on more chunks. (Should trigger defragmentation.) */
2834 if ( !fTriedOnSameAlready
2835 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2836 {
2837 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2838 if (iPage < cPages)
2839 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2840 }
2841
2842 /*
2843 * Ok, try allocate new chunks.
2844 */
2845 if (iPage < cPages)
2846 {
2847 do
2848 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2849 while (iPage < cPages && RT_SUCCESS(rc));
2850
2851#if 0 /* We cannot mix chunks with different UIDs. */
2852 /* If the host is out of memory, take whatever we can get. */
2853 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2854 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2855 {
2856 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2857 if (iPage < cPages)
2858 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2859 AssertRelease(iPage == cPages);
2860 rc = VINF_SUCCESS;
2861 }
2862#endif
2863 }
2864 }
2865 }
2866
2867 /*
2868 * Clean up on failure. Since this is bound to be a low-memory condition
2869 * we will give back any empty chunks that might be hanging around.
2870 */
2871 if (RT_SUCCESS(rc))
2872 { /* likely */ }
2873 else
2874 {
2875 /* Update the statistics. */
2876 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2877 pGMM->cAllocatedPages -= cPages - iPage;
2878 switch (enmAccount)
2879 {
2880 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2881 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2882 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2883 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2884 }
2885
2886 /* Release the pages. */
2887 while (iPage-- > 0)
2888 {
2889 uint32_t idPage = paPages[iPage].idPage;
2890 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2891 if (RT_LIKELY(pPage))
2892 {
2893 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2894 Assert(pPage->Private.hGVM == pGVM->hSelf);
2895 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2896 }
2897 else
2898 AssertMsgFailed(("idPage=%#x\n", idPage));
2899
2900 paPages[iPage].idPage = NIL_GMM_PAGEID;
2901 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2902 paPages[iPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
2903 paPages[iPage].fZeroed = false;
2904 }
2905
2906 /* Free empty chunks. */
2907 /** @todo */
2908
2909 /* return the fail status on failure */
2910 return rc;
2911 }
2912 return VINF_SUCCESS;
2913}
2914
2915
2916/**
2917 * Updates the previous allocations and allocates more pages.
2918 *
2919 * The handy pages are always taken from the 'base' memory account.
2920 * The allocated pages are not cleared and will contain random garbage.
2921 *
2922 * @returns VBox status code:
2923 * @retval VINF_SUCCESS on success.
2924 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2925 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2926 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2927 * private page.
2928 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2929 * shared page.
2930 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2931 * owned by the VM.
2932 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2933 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2934 * that is we're trying to allocate more than we've reserved.
2935 *
2936 * @param pGVM The global (ring-0) VM structure.
2937 * @param idCpu The VCPU id.
2938 * @param cPagesToUpdate The number of pages to update (starting from the head).
2939 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2940 * @param paPages The array of page descriptors.
2941 * See GMMPAGEDESC for details on what is expected on input.
2942 * @thread EMT(idCpu)
2943 */
2944GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2945 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2946{
2947 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2948 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2949
2950 /*
2951 * Validate & get basics.
2952 * (This is a relatively busy path, so make predictions where possible.)
2953 */
2954 PGMM pGMM;
2955 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2956 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2957 if (RT_FAILURE(rc))
2958 return rc;
2959
2960 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2961 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2962 || (cPagesToAlloc && cPagesToAlloc < 1024),
2963 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2964 VERR_INVALID_PARAMETER);
2965
2966 unsigned iPage = 0;
2967 for (; iPage < cPagesToUpdate; iPage++)
2968 {
2969 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2970 && !(paPages[iPage].HCPhysGCPhys & GUEST_PAGE_OFFSET_MASK))
2971 || paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS
2972 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2973 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2974 VERR_INVALID_PARAMETER);
2975 /* ignore fZeroed here */
2976 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2977 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2978 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2979 AssertMsgReturn( paPages[iPage].idSharedPage == NIL_GMM_PAGEID
2980 || paPages[iPage].idSharedPage <= GMM_PAGEID_LAST,
2981 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2982 }
2983
2984 for (; iPage < cPagesToAlloc; iPage++)
2985 {
2986 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2987 AssertMsgReturn(paPages[iPage].fZeroed == false, ("#%#x: %#x\n", iPage, paPages[iPage].fZeroed), VERR_INVALID_PARAMETER);
2988 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2989 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2990 }
2991
2992 /*
2993 * Take the semaphore
2994 */
2995 VMMR0EMTBLOCKCTX Ctx;
2996 PGVMCPU pGVCpu = &pGVM->aCpus[idCpu];
2997 rc = VMMR0EmtPrepareToBlock(pGVCpu, VINF_SUCCESS, "GMMR0AllocateHandyPages", pGMM, &Ctx);
2998 AssertRCReturn(rc, rc);
2999
3000 rc = gmmR0MutexAcquire(pGMM);
3001 if ( RT_SUCCESS(rc)
3002 && GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3003 {
3004 /* No allocations before the initial reservation has been made! */
3005 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3006 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3007 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3008 {
3009 /*
3010 * Perform the updates.
3011 * Stop on the first error.
3012 */
3013 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
3014 {
3015 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
3016 {
3017 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
3018 if (RT_LIKELY(pPage))
3019 {
3020 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3021 {
3022 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3023 {
3024 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
3025 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
3026 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> GUEST_PAGE_SHIFT;
3027 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
3028 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
3029 /* else: NIL_RTHCPHYS nothing */
3030
3031 paPages[iPage].idPage = NIL_GMM_PAGEID;
3032 paPages[iPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
3033 paPages[iPage].fZeroed = false;
3034 }
3035 else
3036 {
3037 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
3038 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
3039 rc = VERR_GMM_NOT_PAGE_OWNER;
3040 break;
3041 }
3042 }
3043 else
3044 {
3045 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
3046 rc = VERR_GMM_PAGE_NOT_PRIVATE;
3047 break;
3048 }
3049 }
3050 else
3051 {
3052 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
3053 rc = VERR_GMM_PAGE_NOT_FOUND;
3054 break;
3055 }
3056 }
3057
3058 if (paPages[iPage].idSharedPage == NIL_GMM_PAGEID)
3059 { /* likely */ }
3060 else
3061 {
3062 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
3063 if (RT_LIKELY(pPage))
3064 {
3065 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3066 {
3067 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
3068 Assert(pPage->Shared.cRefs);
3069 Assert(pGVM->gmm.s.Stats.cSharedPages);
3070 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
3071
3072 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
3073 pGVM->gmm.s.Stats.cSharedPages--;
3074 pGVM->gmm.s.Stats.Allocated.cBasePages--;
3075 if (!--pPage->Shared.cRefs)
3076 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
3077 else
3078 {
3079 Assert(pGMM->cDuplicatePages);
3080 pGMM->cDuplicatePages--;
3081 }
3082
3083 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
3084 }
3085 else
3086 {
3087 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
3088 rc = VERR_GMM_PAGE_NOT_SHARED;
3089 break;
3090 }
3091 }
3092 else
3093 {
3094 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
3095 rc = VERR_GMM_PAGE_NOT_FOUND;
3096 break;
3097 }
3098 }
3099 } /* for each page to update */
3100
3101 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
3102 {
3103#ifdef VBOX_STRICT
3104 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
3105 {
3106 Assert(paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS);
3107 Assert(paPages[iPage].fZeroed == false);
3108 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
3109 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
3110 }
3111#endif
3112
3113 /*
3114 * Join paths with GMMR0AllocatePages for the allocation.
3115                 * Note! gmmR0AllocatePagesNew (via gmmR0AllocateChunkNew) may leave the protection of the mutex!
3116 */
3117 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
3118 }
3119 }
3120 else
3121 rc = VERR_WRONG_ORDER;
3122 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3123 gmmR0MutexRelease(pGMM);
3124 }
3125 else if (RT_SUCCESS(rc))
3126 {
3127 gmmR0MutexRelease(pGMM);
3128 rc = VERR_GMM_IS_NOT_SANE;
3129 }
3130 VMMR0EmtResumeAfterBlocking(pGVCpu, &Ctx);
3131
3132 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3133 return rc;
3134}
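
/*
 * Illustrative sketch (not part of the build): the shape of the descriptor
 * array this function expects.  The first cPagesToUpdate entries describe
 * existing private pages whose guest addresses changed; entries from there up
 * to cPagesToAlloc must come in fully reset so the allocator can fill them in.
 * The field values mirror the validation at the top of the function; the array
 * size, page ID and address variables are made up for the example.
 *
 * @code
 *      GMMPAGEDESC aDescs[2];                          // hypothetical batch
 *      // Entry to update: idPage names the private page, HCPhysGCPhys holds the
 *      // new page-aligned guest address (or GMM_GCPHYS_UNSHAREABLE or NIL_GMMPAGEDESC_PHYS).
 *      aDescs[0].idPage       = idMyPrivatePage;       // hypothetical page ID
 *      aDescs[0].idSharedPage = NIL_GMM_PAGEID;
 *      aDescs[0].HCPhysGCPhys = GCPhysNew;             // hypothetical guest address
 *      aDescs[0].fZeroed      = false;
 *      // Entry to allocate: everything reset on input.
 *      aDescs[1].idPage       = NIL_GMM_PAGEID;
 *      aDescs[1].idSharedPage = NIL_GMM_PAGEID;
 *      aDescs[1].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
 *      aDescs[1].fZeroed      = false;
 *      int rc = GMMR0AllocateHandyPages(pGVM, idCpu, 1, 2, &aDescs[0]); // update 1, allocate 2
 * @endcode
 */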
3135
3136
3137/**
3138 * Allocate one or more pages.
3139 *
3140 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3141 * The allocated pages are not cleared and will contain random garbage.
3142 *
3143 * @returns VBox status code:
3144 * @retval VINF_SUCCESS on success.
3145 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3146 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3147 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3148 * that is we're trying to allocate more than we've reserved.
3149 *
3150 * @param pGVM The global (ring-0) VM structure.
3151 * @param idCpu The VCPU id.
3152 * @param cPages The number of pages to allocate.
3153 * @param paPages Pointer to the page descriptors.
3154 * See GMMPAGEDESC for details on what is expected on
3155 * input.
3156 * @param enmAccount The account to charge.
3157 *
3158 * @thread EMT.
3159 */
3160GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3161{
3162 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3163
3164 /*
3165 * Validate, get basics and take the semaphore.
3166 */
3167 PGMM pGMM;
3168 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3169 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3170 if (RT_FAILURE(rc))
3171 return rc;
3172
3173 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3174 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3175 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - GUEST_PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3176
3177 for (unsigned iPage = 0; iPage < cPages; iPage++)
3178 {
3179 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS
3180 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3181 || ( enmAccount == GMMACCOUNT_BASE
3182 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3183 && !(paPages[iPage].HCPhysGCPhys & GUEST_PAGE_OFFSET_MASK)),
3184 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3185 VERR_INVALID_PARAMETER);
3186 AssertMsgReturn(paPages[iPage].fZeroed == false, ("#%#x: %#x\n", iPage, paPages[iPage].fZeroed), VERR_INVALID_PARAMETER);
3187 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3188 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3189 }
3190
3191 /*
3192 * Grab the giant mutex and get working.
3193 */
3194 gmmR0MutexAcquire(pGMM);
3195 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3196 {
3197
3198 /* No allocations before the initial reservation has been made! */
3199 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3200 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3201 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3202 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3203 else
3204 rc = VERR_WRONG_ORDER;
3205 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3206 }
3207 else
3208 rc = VERR_GMM_IS_NOT_SANE;
3209 gmmR0MutexRelease(pGMM);
3210
3211 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3212 return rc;
3213}
3214
3215
3216/**
3217 * VMMR0 request wrapper for GMMR0AllocatePages.
3218 *
3219 * @returns see GMMR0AllocatePages.
3220 * @param pGVM The global (ring-0) VM structure.
3221 * @param idCpu The VCPU id.
3222 * @param pReq Pointer to the request packet.
3223 */
3224GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3225{
3226 /*
3227 * Validate input and pass it on.
3228 */
3229 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3230 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3231 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3232 VERR_INVALID_PARAMETER);
3233 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3234 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3235 VERR_INVALID_PARAMETER);
3236
3237 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3238}
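
/*
 * Illustrative sketch (not part of the build): sizing and filling the
 * variable-length request packet checked above.  Only the cbReq formula and
 * the field names come from this file; the batch size is made up, and a real
 * caller dispatches the request through the VMMR0 request path, which may
 * validate more of the header.
 *
 * @code
 *      uint32_t const cPages = 8;                      // hypothetical batch size
 *      uint32_t const cbReq  = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *      PGMMALLOCATEPAGESREQ pReq = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *      pReq->Hdr.cbReq  = cbReq;
 *      pReq->cPages     = cPages;
 *      pReq->enmAccount = GMMACCOUNT_BASE;
 *      for (uint32_t i = 0; i < cPages; i++)
 *      {
 *          pReq->aPages[i].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
 *          pReq->aPages[i].fZeroed      = false;
 *          pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
 *          pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      }
 *      int rc = GMMR0AllocatePagesReq(pGVM, idCpu, pReq);
 *      RTMemFree(pReq);
 * @endcode
 */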
3239
3240
3241/**
3242 * Allocate a large page to represent guest RAM.
3243 *
3244 * The allocated pages are zeroed upon return.
3245 *
3246 * @returns VBox status code:
3247 * @retval VINF_SUCCESS on success.
3248 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3249 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3250 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3251 * that is we're trying to allocate more than we've reserved.
3252 * @retval VERR_TRY_AGAIN if the host is temporarily out of large pages.
3253 * @returns see GMMR0AllocatePages.
3254 *
3255 * @param pGVM The global (ring-0) VM structure.
3256 * @param idCpu The VCPU id.
3257 * @param cbPage Large page size.
3258 * @param pIdPage Where to return the GMM page ID of the page.
3259 * @param pHCPhys Where to return the host physical address of the page.
3260 */
3261GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3262{
3263 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3264
3265 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3266 *pIdPage = NIL_GMM_PAGEID;
3267 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3268 *pHCPhys = NIL_RTHCPHYS;
3269 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3270
3271 /*
3272 * Validate GVM + idCpu, get basics and take the semaphore.
3273 */
3274 PGMM pGMM;
3275 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3276 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3277 AssertRCReturn(rc, rc);
3278
3279 VMMR0EMTBLOCKCTX Ctx;
3280 PGVMCPU pGVCpu = &pGVM->aCpus[idCpu];
3281 rc = VMMR0EmtPrepareToBlock(pGVCpu, VINF_SUCCESS, "GMMR0AllocateLargePage", pGMM, &Ctx);
3282 AssertRCReturn(rc, rc);
3283
3284 rc = gmmR0MutexAcquire(pGMM);
3285 if (RT_SUCCESS(rc))
3286 {
3287 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3288 {
3289 /*
3290 * Check the quota.
3291 */
3292 /** @todo r=bird: Quota checking could be done w/o the giant mutex but using
3293 * a VM specific mutex... */
3294 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + GMM_CHUNK_NUM_PAGES
3295 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3296 {
3297 /*
3298 * Allocate a new large page chunk.
3299 *
3300 * Note! We leave the giant GMM lock temporarily as the allocation might
3301 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3302 */
3303 AssertCompile(GMM_CHUNK_SIZE == _2M);
3304 gmmR0MutexRelease(pGMM);
3305
3306 RTR0MEMOBJ hMemObj;
3307 rc = RTR0MemObjAllocLarge(&hMemObj, GMM_CHUNK_SIZE, GMM_CHUNK_SIZE, RTMEMOBJ_ALLOC_LARGE_F_FAST);
3308 if (RT_SUCCESS(rc))
3309 {
3310 *pHCPhys = RTR0MemObjGetPagePhysAddr(hMemObj, 0);
3311
3312 /*
3313 * Register the chunk as fully allocated.
3314 * Note! As mentioned above, this will return owning the mutex on success.
3315 */
3316 PGMMCHUNK pChunk = NULL;
3317 PGMMCHUNKFREESET const pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3318 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, GMM_CHUNK_FLAGS_LARGE_PAGE,
3319 0 /*cPages*/, NULL /*paPages*/, NULL /*piPage*/, &pChunk);
3320 if (RT_SUCCESS(rc))
3321 {
3322 /*
3323 * The gmmR0RegisterChunk call already marked all pages allocated,
3324 * so we just have to fill in the return values and update stats now.
3325 */
3326 *pIdPage = pChunk->Core.Key << GMM_CHUNKID_SHIFT;
3327
3328 /* Update accounting. */
3329 pGVM->gmm.s.Stats.Allocated.cBasePages += GMM_CHUNK_NUM_PAGES;
3330 pGVM->gmm.s.Stats.cPrivatePages += GMM_CHUNK_NUM_PAGES;
3331 pGMM->cAllocatedPages += GMM_CHUNK_NUM_PAGES;
3332
3333 gmmR0LinkChunk(pChunk, pSet);
3334 gmmR0MutexRelease(pGMM);
3335
3336 VMMR0EmtResumeAfterBlocking(pGVCpu, &Ctx);
3337 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3338 return VINF_SUCCESS;
3339 }
3340
3341 /*
3342 * Bail out.
3343 */
3344 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3345 *pHCPhys = NIL_RTHCPHYS;
3346 }
3347 /** @todo r=bird: Turn VERR_NO_MEMORY etc into VERR_TRY_AGAIN? Docs say we
3348 * return it, but I am sure IPRT doesn't... */
3349 }
3350 else
3351 {
3352 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3353 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, GMM_CHUNK_NUM_PAGES));
3354 gmmR0MutexRelease(pGMM);
3355 rc = VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3356 }
3357 }
3358 else
3359 {
3360 gmmR0MutexRelease(pGMM);
3361 rc = VERR_GMM_IS_NOT_SANE;
3362 }
3363 }
3364
3365 VMMR0EmtResumeAfterBlocking(pGVCpu, &Ctx);
3366 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3367 return rc;
3368}
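/*
 * Usage sketch (illustrative only): how an EMT-context caller might allocate and
 * later release a 2 MB large page using the two functions in this file.
 * idLargePage and HCPhysLarge are hypothetical locals.
 *
 *      uint32_t idLargePage = NIL_GMM_PAGEID;
 *      RTHCPHYS HCPhysLarge = NIL_RTHCPHYS;
 *      int rc2 = GMMR0AllocateLargePage(pGVM, idCpu, GMM_CHUNK_SIZE, &idLargePage, &HCPhysLarge);
 *      if (rc2 == VERR_TRY_AGAIN)
 *      {
 *          // Host is temporarily out of large pages; fall back to 4K allocations or retry later.
 *      }
 *      else if (RT_SUCCESS(rc2))
 *      {
 *          // ... hand the page to the guest, and eventually: ...
 *          rc2 = GMMR0FreeLargePage(pGVM, idCpu, idLargePage);
 *      }
 */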
3369
3370
3371/**
3372 * Free a large page.
3373 *
3374 * @returns VBox status code:
3375 * @param pGVM The global (ring-0) VM structure.
3376 * @param idCpu The VCPU id.
3377 * @param idPage The large page id.
3378 */
3379GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3380{
3381 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3382
3383 /*
3384 * Validate, get basics and take the semaphore.
3385 */
3386 PGMM pGMM;
3387 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3388 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3389 if (RT_FAILURE(rc))
3390 return rc;
3391
3392 gmmR0MutexAcquire(pGMM);
3393 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3394 {
3395 const unsigned cPages = GMM_CHUNK_NUM_PAGES;
3396
3397 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3398 {
3399 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3400 gmmR0MutexRelease(pGMM);
3401 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3402 }
3403
3404 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3405 if (RT_LIKELY( pPage
3406 && GMM_PAGE_IS_PRIVATE(pPage)))
3407 {
3408 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3409 Assert(pChunk);
3410 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3411 Assert(pChunk->cPrivate > 0);
3412
3413 /* Release the memory immediately. */
3414 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3415
3416 /* Update accounting. */
3417 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3418 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3419 pGMM->cAllocatedPages -= cPages;
3420 }
3421 else
3422 rc = VERR_GMM_PAGE_NOT_FOUND;
3423 }
3424 else
3425 rc = VERR_GMM_IS_NOT_SANE;
3426
3427 gmmR0MutexRelease(pGMM);
3428 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3429 return rc;
3430}
3431
3432
3433/**
3434 * VMMR0 request wrapper for GMMR0FreeLargePage.
3435 *
3436 * @returns see GMMR0FreeLargePage.
3437 * @param pGVM The global (ring-0) VM structure.
3438 * @param idCpu The VCPU id.
3439 * @param pReq Pointer to the request packet.
3440 */
3441GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3442{
3443 /*
3444 * Validate input and pass it on.
3445 */
3446 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3447 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3448 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3449 VERR_INVALID_PARAMETER);
3450
3451 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3452}
3453
3454
3455/**
3456 * @callback_method_impl{FNGVMMR0ENUMCALLBACK,
3457 * Used by gmmR0FreeChunkFlushPerVmTlbs().}
3458 */
3459static DECLCALLBACK(int) gmmR0InvalidatePerVmChunkTlbCallback(PGVM pGVM, void *pvUser)
3460{
3461 RT_NOREF(pvUser);
3462 if (pGVM->gmm.s.hChunkTlbSpinLock != NIL_RTSPINLOCK)
3463 {
3464 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
3465 uintptr_t i = RT_ELEMENTS(pGVM->gmm.s.aChunkTlbEntries);
3466 while (i-- > 0)
3467 {
3468 pGVM->gmm.s.aChunkTlbEntries[i].idGeneration = UINT64_MAX;
3469 pGVM->gmm.s.aChunkTlbEntries[i].pChunk = NULL;
3470 }
3471 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
3472 }
3473 return VINF_SUCCESS;
3474}
3475
3476
3477/**
3478 * Called by gmmR0FreeChunk when we reach the threshold for wrapping around the
3479 * free generation ID value.
3480 *
3481 * This is done at 2^62 - 1 (UINT64_MAX / 4), which allows us to drop all
3482 * locks, since roughly 1.4 * 10^19 (about 14 exa) further calls to
3483 * gmmR0FreeChunk would be needed before a real wrap-around could occur. We do
3484 * two invalidation passes and reset the generation ID between them. This will
3485 * make sure there are no false positives.
3486 *
3487 * @param pGMM Pointer to the GMM instance.
3488 */
3489static void gmmR0FreeChunkFlushPerVmTlbs(PGMM pGMM)
3490{
3491 /*
3492 * First invalidation pass.
3493 */
3494 int rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3495 AssertRCSuccess(rc);
3496
3497 /*
3498 * Reset the generation number.
3499 */
3500 RTSpinlockAcquire(pGMM->hSpinLockTree);
3501 ASMAtomicWriteU64(&pGMM->idFreeGeneration, 1);
3502 RTSpinlockRelease(pGMM->hSpinLockTree);
3503
3504 /*
3505 * Second invalidation pass.
3506 */
3507 rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3508 AssertRCSuccess(rc);
3509}
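/*
 * Illustrative arithmetic for the threshold discussed above (a sketch, not part
 * of the original logic): the flush in gmmR0FreeChunk triggers at UINT64_MAX / 4,
 * leaving 3 * 2^62 (about 1.4e19) further increments before the 64-bit counter
 * could really wrap.
 *
 *      AssertCompile(UINT64_MAX / 4              == (UINT64_C(1) << 62) - 1);
 *      AssertCompile(UINT64_MAX - UINT64_MAX / 4 == 3 * (UINT64_C(1) << 62));
 */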
3510
3511
3512/**
3513 * Frees a chunk, giving it back to the host OS.
3514 *
 * @returns @c true if the chunk was freed and the giant GMM lock was temporarily
 *          released in the process (only possible when @a fRelaxedSem is
 *          @c true); @c false if the chunk is still mapped and couldn't be
 *          freed, or when @a fRelaxedSem is @c false.
 *
3515 * @param pGMM Pointer to the GMM instance.
3516 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3517 * unmap and free the chunk in one go.
3518 * @param pChunk The chunk to free.
3519 * @param fRelaxedSem Whether we can release the semaphore while doing the
3520 * freeing (@c true) or not.
3521 */
3522static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3523{
3524 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3525
3526 GMMR0CHUNKMTXSTATE MtxState;
3527 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3528
3529 /*
3530 * Cleanup hack! Unmap the chunk from the caller's address space.
3531 * This shouldn't happen, so screw lock contention...
3532 */
3533 if (pChunk->cMappingsX && pGVM)
3534 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3535
3536 /*
3537 * If there are current mappings of the chunk, then request the
3538 * VMs to unmap them. Reposition the chunk in the free list so
3539 * it won't be a likely candidate for allocations.
3540 */
3541 if (pChunk->cMappingsX)
3542 {
3543 /** @todo R0 -> VM request */
3544 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3545 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3546 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3547 return false;
3548 }
3549
3550
3551 /*
3552 * Save and trash the handle.
3553 */
3554 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3555 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3556
3557 /*
3558 * Unlink it from everywhere.
3559 */
3560 gmmR0UnlinkChunk(pChunk);
3561
3562 RTSpinlockAcquire(pGMM->hSpinLockTree);
3563
3564 RTListNodeRemove(&pChunk->ListNode);
3565
3566 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3567 Assert(pCore == &pChunk->Core); NOREF(pCore);
3568
3569 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3570 if (pTlbe->pChunk == pChunk)
3571 {
3572 pTlbe->idChunk = NIL_GMM_CHUNKID;
3573 pTlbe->pChunk = NULL;
3574 }
3575
3576 Assert(pGMM->cChunks > 0);
3577 pGMM->cChunks--;
3578
3579 uint64_t const idFreeGeneration = ASMAtomicIncU64(&pGMM->idFreeGeneration);
3580
3581 RTSpinlockRelease(pGMM->hSpinLockTree);
3582
3583 pGMM->cFreedChunks++;
3584
3585 /* Drop the lock. */
3586 gmmR0ChunkMutexRelease(&MtxState, NULL);
3587 if (fRelaxedSem)
3588 gmmR0MutexRelease(pGMM);
3589
3590 /*
3591 * Flush per VM chunk TLBs if we're getting remotely close to a generation wraparound.
3592 */
3593 if (idFreeGeneration == UINT64_MAX / 4)
3594 gmmR0FreeChunkFlushPerVmTlbs(pGMM);
3595
3596 /*
3597 * Free the Chunk ID and all memory associated with the chunk.
3598 */
3599 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3600 pChunk->Core.Key = NIL_GMM_CHUNKID;
3601
3602 RTMemFree(pChunk->paMappingsX);
3603 pChunk->paMappingsX = NULL;
3604
3605 RTMemFree(pChunk);
3606
3607#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
3608 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3609#else
3610 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3611#endif
3612 AssertLogRelRC(rc);
3613
3614 if (fRelaxedSem)
3615 gmmR0MutexAcquire(pGMM);
3616 return fRelaxedSem;
3617}
3618
3619
3620/**
3621 * Free page worker.
3622 *
3623 * The caller does all the statistic decrementing, we do all the incrementing.
3624 *
3625 * @param pGMM Pointer to the GMM instance data.
3626 * @param pGVM Pointer to the GVM instance.
3627 * @param pChunk Pointer to the chunk this page belongs to.
3628 * @param idPage The Page ID.
3629 * @param pPage Pointer to the page.
3630 */
3631static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3632{
3633 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3634 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3635
3636 /*
3637 * Put the page on the free list.
3638 */
3639 pPage->u = 0;
3640 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3641 pPage->Free.fZeroed = false;
3642 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3643 pPage->Free.iNext = pChunk->iFreeHead;
3644 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3645
3646 /*
3647 * Update statistics (the cShared/cPrivate stats are up to date already),
3648 * and relink the chunk if necessary.
3649 */
3650 unsigned const cFree = pChunk->cFree;
3651 if ( !cFree
3652 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3653 {
3654 gmmR0UnlinkChunk(pChunk);
3655 pChunk->cFree++;
3656 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3657 }
3658 else
3659 {
3660 pChunk->cFree = cFree + 1;
3661 pChunk->pSet->cFreePages++;
3662 }
3663
3664 /*
3665 * If the chunk becomes empty, consider giving memory back to the host OS.
3666 *
3667 * The current strategy is to try to give it back if there are other chunks
3668 * in this free list, meaning if there are at least 240 free pages in this
3669 * category. Note that since there are probably mappings of the chunk,
3670 * it won't be freed up instantly, which probably screws up this logic
3671 * a bit...
3672 */
3673 /** @todo Do this on the way out. */
3674 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3675 || pChunk->pFreeNext == NULL
3676 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3677 { /* likely */ }
3678 else
3679 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3680}
3681
3682
3683/**
3684 * Frees a shared page, the page is known to exist and be valid and such.
3685 *
3686 * @param pGMM Pointer to the GMM instance.
3687 * @param pGVM Pointer to the GVM instance.
3688 * @param idPage The page id.
3689 * @param pPage The page structure.
3690 */
3691DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3692{
3693 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3694 Assert(pChunk);
3695 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3696 Assert(pChunk->cShared > 0);
3697 Assert(pGMM->cSharedPages > 0);
3698 Assert(pGMM->cAllocatedPages > 0);
3699 Assert(!pPage->Shared.cRefs);
3700
3701 pChunk->cShared--;
3702 pGMM->cAllocatedPages--;
3703 pGMM->cSharedPages--;
3704 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3705}
3706
3707
3708/**
3709 * Frees a private page, the page is known to exist and be valid and such.
3710 *
3711 * @param pGMM Pointer to the GMM instance.
3712 * @param pGVM Pointer to the GVM instance.
3713 * @param idPage The page id.
3714 * @param pPage The page structure.
3715 */
3716DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3717{
3718 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3719 Assert(pChunk);
3720 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3721 Assert(pChunk->cPrivate > 0);
3722 Assert(pGMM->cAllocatedPages > 0);
3723
3724 pChunk->cPrivate--;
3725 pGMM->cAllocatedPages--;
3726 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3727}
3728
3729
3730/**
3731 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3732 *
3733 * @returns VBox status code:
3734 * @retval xxx
3735 *
3736 * @param pGMM Pointer to the GMM instance data.
3737 * @param pGVM Pointer to the VM.
3738 * @param cPages The number of pages to free.
3739 * @param paPages Pointer to the page descriptors.
3740 * @param enmAccount The account this relates to.
3741 */
3742static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3743{
3744 /*
3745 * Check that the request isn't impossible wrt to the account status.
3746 */
3747 switch (enmAccount)
3748 {
3749 case GMMACCOUNT_BASE:
3750 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3751 {
3752 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3753 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3754 }
3755 break;
3756 case GMMACCOUNT_SHADOW:
3757 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3758 {
3759 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3760 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3761 }
3762 break;
3763 case GMMACCOUNT_FIXED:
3764 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3765 {
3766 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3767 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3768 }
3769 break;
3770 default:
3771 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3772 }
3773
3774 /*
3775 * Walk the descriptors and free the pages.
3776 *
3777 * Statistics (except the account) are being updated as we go along,
3778 * unlike the alloc code. Also, stop on the first error.
3779 */
3780 int rc = VINF_SUCCESS;
3781 uint32_t iPage;
3782 for (iPage = 0; iPage < cPages; iPage++)
3783 {
3784 uint32_t idPage = paPages[iPage].idPage;
3785 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3786 if (RT_LIKELY(pPage))
3787 {
3788 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3789 {
3790 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3791 {
3792 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3793 pGVM->gmm.s.Stats.cPrivatePages--;
3794 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3795 }
3796 else
3797 {
3798 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3799 pPage->Private.hGVM, pGVM->hSelf));
3800 rc = VERR_GMM_NOT_PAGE_OWNER;
3801 break;
3802 }
3803 }
3804 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3805 {
3806 Assert(pGVM->gmm.s.Stats.cSharedPages);
3807 Assert(pPage->Shared.cRefs);
3808#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT)
3809 if (pPage->Shared.u14Checksum)
3810 {
3811 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3812 uChecksum &= UINT32_C(0x00003fff);
3813 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3814 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3815 }
3816#endif
3817 pGVM->gmm.s.Stats.cSharedPages--;
3818 if (!--pPage->Shared.cRefs)
3819 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3820 else
3821 {
3822 Assert(pGMM->cDuplicatePages);
3823 pGMM->cDuplicatePages--;
3824 }
3825 }
3826 else
3827 {
3828 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3829 rc = VERR_GMM_PAGE_ALREADY_FREE;
3830 break;
3831 }
3832 }
3833 else
3834 {
3835 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3836 rc = VERR_GMM_PAGE_NOT_FOUND;
3837 break;
3838 }
3839 paPages[iPage].idPage = NIL_GMM_PAGEID;
3840 }
3841
3842 /*
3843 * Update the account.
3844 */
3845 switch (enmAccount)
3846 {
3847 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3848 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3849 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3850 default:
3851 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3852 }
3853
3854 /*
3855 * Any threshold stuff to be done here?
3856 */
3857
3858 return rc;
3859}
3860
3861
3862/**
3863 * Free one or more pages.
3864 *
3865 * This is typically used at reset time or power off.
3866 *
3867 * @returns VBox status code:
3868 * @retval xxx
3869 *
3870 * @param pGVM The global (ring-0) VM structure.
3871 * @param idCpu The VCPU id.
3872 * @param cPages The number of pages to free.
3873 * @param paPages Pointer to the page descriptors containing the page IDs
3874 * for each page.
3875 * @param enmAccount The account this relates to.
3876 * @thread EMT.
3877 */
3878GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3879{
3880 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3881
3882 /*
3883 * Validate input and get the basics.
3884 */
3885 PGMM pGMM;
3886 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3887 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3888 if (RT_FAILURE(rc))
3889 return rc;
3890
3891 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3892 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3893 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - GUEST_PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3894
3895 for (unsigned iPage = 0; iPage < cPages; iPage++)
3896 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3897 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3898 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3899
3900 /*
3901 * Take the semaphore and call the worker function.
3902 */
3903 gmmR0MutexAcquire(pGMM);
3904 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3905 {
3906 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3907 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3908 }
3909 else
3910 rc = VERR_GMM_IS_NOT_SANE;
3911 gmmR0MutexRelease(pGMM);
3912 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3913 return rc;
3914}
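/*
 * Usage sketch (illustrative only): freeing two previously allocated pages on the
 * base account. idFirstPage and idSecondPage are hypothetical page IDs returned
 * earlier by GMMR0AllocatePages.
 *
 *      GMMFREEPAGEDESC aDescs[2];
 *      aDescs[0].idPage = idFirstPage;
 *      aDescs[1].idPage = idSecondPage;
 *      int rc2 = GMMR0FreePages(pGVM, idCpu, RT_ELEMENTS(aDescs), &aDescs[0], GMMACCOUNT_BASE);
 */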
3915
3916
3917/**
3918 * VMMR0 request wrapper for GMMR0FreePages.
3919 *
3920 * @returns see GMMR0FreePages.
3921 * @param pGVM The global (ring-0) VM structure.
3922 * @param idCpu The VCPU id.
3923 * @param pReq Pointer to the request packet.
3924 */
3925GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3926{
3927 /*
3928 * Validate input and pass it on.
3929 */
3930 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3931 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3932 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3933 VERR_INVALID_PARAMETER);
3934 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3935 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3936 VERR_INVALID_PARAMETER);
3937
3938 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3939}
3940
3941
3942/**
3943 * Report back on a memory ballooning request.
3944 *
3945 * The request may or may not have been initiated by the GMM. If it was initiated
3946 * by the GMM it is important that this function is called even if no pages were
3947 * ballooned.
3948 *
3949 * @returns VBox status code:
3950 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3951 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3952 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3953 * indicating that we won't necessarily have sufficient RAM to boot
3954 * the VM again and that it should pause until this changes (we'll try
3955 * balloon some other VM). (For standard deflate we have little choice
3956 * but to hope the VM won't use the memory that was returned to it.)
3957 *
3958 * @param pGVM The global (ring-0) VM structure.
3959 * @param idCpu The VCPU id.
3960 * @param enmAction Inflate/deflate/reset.
3961 * @param cBalloonedPages The number of pages that was ballooned.
3962 *
3963 * @thread EMT(idCpu)
3964 */
3965GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3966{
3967 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3968 pGVM, enmAction, cBalloonedPages));
3969
3970 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - GUEST_PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3971
3972 /*
3973 * Validate input and get the basics.
3974 */
3975 PGMM pGMM;
3976 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3977 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3978 if (RT_FAILURE(rc))
3979 return rc;
3980
3981 /*
3982 * Take the semaphore and do some more validations.
3983 */
3984 gmmR0MutexAcquire(pGMM);
3985 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3986 {
3987 switch (enmAction)
3988 {
3989 case GMMBALLOONACTION_INFLATE:
3990 {
3991 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3992 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3993 {
3994 /*
3995 * Record the ballooned memory.
3996 */
3997 pGMM->cBalloonedPages += cBalloonedPages;
3998 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3999 {
4000 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions. */
4001 AssertFailed();
4002
4003 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
4004 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
4005 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
4006 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
4007 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
4008 }
4009 else
4010 {
4011 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
4012 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
4013 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
4014 }
4015 }
4016 else
4017 {
4018 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
4019 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
4020 pGVM->gmm.s.Stats.Reserved.cBasePages));
4021 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
4022 }
4023 break;
4024 }
4025
4026 case GMMBALLOONACTION_DEFLATE:
4027 {
4028 /* Deflate. */
4029 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
4030 {
4031 /*
4032 * Record the ballooned memory.
4033 */
4034 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
4035 pGMM->cBalloonedPages -= cBalloonedPages;
4036 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
4037 if (pGVM->gmm.s.Stats.cReqDeflatePages)
4038 {
4039 AssertFailed(); /* This path is for later. */
4040 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
4041 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
4042
4043 /*
4044 * Anything we need to do here now when the request has been completed?
4045 */
4046 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
4047 }
4048 else
4049 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
4050 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
4051 }
4052 else
4053 {
4054 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
4055 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
4056 }
4057 break;
4058 }
4059
4060 case GMMBALLOONACTION_RESET:
4061 {
4062 /* Reset to an empty balloon. */
4063 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
4064
4065 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
4066 pGVM->gmm.s.Stats.cBalloonedPages = 0;
4067 break;
4068 }
4069
4070 default:
4071 rc = VERR_INVALID_PARAMETER;
4072 break;
4073 }
4074 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4075 }
4076 else
4077 rc = VERR_GMM_IS_NOT_SANE;
4078
4079 gmmR0MutexRelease(pGMM);
4080 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
4081 return rc;
4082}
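/*
 * Usage sketch (illustrative only): reporting balloon movements to the GMM.
 * cPagesJustBallooned and cPagesGivenBack are hypothetical counters kept by the
 * caller.
 *
 *      int rc2 = GMMR0BalloonedPages(pGVM, idCpu, GMMBALLOONACTION_INFLATE, cPagesJustBallooned);
 *      ...
 *      rc2     = GMMR0BalloonedPages(pGVM, idCpu, GMMBALLOONACTION_DEFLATE, cPagesGivenBack);
 *      ...
 *      rc2     = GMMR0BalloonedPages(pGVM, idCpu, GMMBALLOONACTION_RESET,   0);   // e.g. at VM reset
 */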
4083
4084
4085/**
4086 * VMMR0 request wrapper for GMMR0BalloonedPages.
4087 *
4088 * @returns see GMMR0BalloonedPages.
4089 * @param pGVM The global (ring-0) VM structure.
4090 * @param idCpu The VCPU id.
4091 * @param pReq Pointer to the request packet.
4092 */
4093GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
4094{
4095 /*
4096 * Validate input and pass it on.
4097 */
4098 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4099 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
4100 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
4101 VERR_INVALID_PARAMETER);
4102
4103 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
4104}
4105
4106
4107/**
4108 * Return memory statistics for the hypervisor
4109 *
4110 * @returns VBox status code.
4111 * @param pReq Pointer to the request packet.
4112 */
4113GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
4114{
4115 /*
4116 * Validate input and pass it on.
4117 */
4118 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4119 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4120 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4121 VERR_INVALID_PARAMETER);
4122
4123 /*
4124 * Validate input and get the basics.
4125 */
4126 PGMM pGMM;
4127 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4128 pReq->cAllocPages = pGMM->cAllocatedPages;
4129 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - GUEST_PAGE_SHIFT)) - pGMM->cAllocatedPages;
4130 pReq->cBalloonedPages = pGMM->cBalloonedPages;
4131 pReq->cMaxPages = pGMM->cMaxPages;
4132 pReq->cSharedPages = pGMM->cDuplicatePages;
4133 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4134
4135 return VINF_SUCCESS;
4136}
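/*
 * Worked example for the cFreePages calculation above (illustrative, assuming
 * 2 MB chunks and the usual 4 KiB guest page size, i.e.
 * GMM_CHUNK_SHIFT - GUEST_PAGE_SHIFT == 9, 512 pages per chunk):
 *
 *      cChunks         = 100   ->  100 << 9 = 51200 pages backed by chunks
 *      cAllocatedPages = 48000 ->  cFreePages = 51200 - 48000 = 3200
 */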
4137
4138
4139/**
4140 * Return memory statistics for the VM
4141 *
4142 * @returns VBox status code.
4143 * @param pGVM The global (ring-0) VM structure.
4144 * @param idCpu Cpu id.
4145 * @param pReq Pointer to the request packet.
4146 *
4147 * @thread EMT(idCpu)
4148 */
4149GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
4150{
4151 /*
4152 * Validate input and pass it on.
4153 */
4154 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4155 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4156 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4157 VERR_INVALID_PARAMETER);
4158
4159 /*
4160 * Validate input and get the basics.
4161 */
4162 PGMM pGMM;
4163 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4164 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4165 if (RT_FAILURE(rc))
4166 return rc;
4167
4168 /*
4169 * Take the semaphore and do some more validations.
4170 */
4171 gmmR0MutexAcquire(pGMM);
4172 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4173 {
4174 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4175 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4176 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4177 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4178 }
4179 else
4180 rc = VERR_GMM_IS_NOT_SANE;
4181
4182 gmmR0MutexRelease(pGMM);
4183 LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4184 return rc;
4185}
4186
4187
4188/**
4189 * Worker for gmmR0UnmapChunk and gmmr0FreeChunk.
4190 *
4191 * Don't call this in legacy allocation mode!
4192 *
4193 * @returns VBox status code.
4194 * @param pGMM Pointer to the GMM instance data.
4195 * @param pGVM Pointer to the Global VM structure.
4196 * @param pChunk Pointer to the chunk to be unmapped.
4197 */
4198static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4199{
4200 RT_NOREF_PV(pGMM);
4201
4202 /*
4203 * Find the mapping and try unmapping it.
4204 */
4205 uint32_t cMappings = pChunk->cMappingsX;
4206 for (uint32_t i = 0; i < cMappings; i++)
4207 {
4208 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4209 if (pChunk->paMappingsX[i].pGVM == pGVM)
4210 {
4211 /* unmap */
4212 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4213 if (RT_SUCCESS(rc))
4214 {
4215 /* update the record. */
4216 cMappings--;
4217 if (i < cMappings)
4218 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4219 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4220 pChunk->paMappingsX[cMappings].pGVM = NULL;
4221 Assert(pChunk->cMappingsX - 1U == cMappings);
4222 pChunk->cMappingsX = cMappings;
4223 }
4224
4225 return rc;
4226 }
4227 }
4228
4229 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4230 return VERR_GMM_CHUNK_NOT_MAPPED;
4231}
4232
4233
4234/**
4235 * Unmaps a chunk previously mapped into the address space of the current process.
4236 *
4237 * @returns VBox status code.
4238 * @param pGMM Pointer to the GMM instance data.
4239 * @param pGVM Pointer to the Global VM structure.
4240 * @param pChunk Pointer to the chunk to be unmapped.
4241 * @param fRelaxedSem Whether we can release the semaphore while doing the
4242 * mapping (@c true) or not.
4243 */
4244static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4245{
4246 /*
4247 * Lock the chunk and if possible leave the giant GMM lock.
4248 */
4249 GMMR0CHUNKMTXSTATE MtxState;
4250 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4251 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4252 if (RT_SUCCESS(rc))
4253 {
4254 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4255 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4256 }
4257 return rc;
4258}
4259
4260
4261/**
4262 * Worker for gmmR0MapChunk.
4263 *
4264 * @returns VBox status code.
4265 * @param pGMM Pointer to the GMM instance data.
4266 * @param pGVM Pointer to the Global VM structure.
4267 * @param pChunk Pointer to the chunk to be mapped.
4268 * @param ppvR3 Where to store the ring-3 address of the mapping.
4269 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4270 * contain the address of the existing mapping.
4271 */
4272static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4273{
4274 RT_NOREF(pGMM);
4275
4276 /*
4277 * Check to see if the chunk is already mapped.
4278 */
4279 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4280 {
4281 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4282 if (pChunk->paMappingsX[i].pGVM == pGVM)
4283 {
4284 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4285 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4286#ifdef VBOX_WITH_PAGE_SHARING
4287 /* The ring-3 chunk cache can be out of sync; don't fail. */
4288 return VINF_SUCCESS;
4289#else
4290 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4291#endif
4292 }
4293 }
4294
4295 /*
4296 * Do the mapping.
4297 */
4298 RTR0MEMOBJ hMapObj;
4299 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4300 if (RT_SUCCESS(rc))
4301 {
4302 /* reallocate the array? assumes few users per chunk (usually one). */
4303 unsigned iMapping = pChunk->cMappingsX;
4304 if ( iMapping <= 3
4305 || (iMapping & 3) == 0)
4306 {
4307 unsigned cNewSize = iMapping <= 3
4308 ? iMapping + 1
4309 : iMapping + 4;
4310 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4311 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4312 {
4313 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4314 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4315 }
4316
4317 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4318 if (RT_UNLIKELY(!pvMappings))
4319 {
4320 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4321 return VERR_NO_MEMORY;
4322 }
4323 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4324 }
4325
4326 /* insert new entry */
4327 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4328 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4329 Assert(pChunk->cMappingsX == iMapping);
4330 pChunk->cMappingsX = iMapping + 1;
4331
4332 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4333 }
4334
4335 return rc;
4336}
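/*
 * Note on the reallocation strategy above (illustrative): the mapping array grows
 * by one entry while small and by four entries thereafter, so its capacity
 * develops like this as mappings are added:
 *
 *      cMappingsX before insert:  0  1  2  3  4  5  6  7   8 ...
 *      capacity after insert:     1  2  3  4  8  8  8  8  12 ...
 *
 * This keeps the common case (a single VM mapping the chunk) cheap.
 */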
4337
4338
4339/**
4340 * Maps a chunk into the user address space of the current process.
4341 *
4342 * @returns VBox status code.
4343 * @param pGMM Pointer to the GMM instance data.
4344 * @param pGVM Pointer to the Global VM structure.
4345 * @param pChunk Pointer to the chunk to be mapped.
4346 * @param fRelaxedSem Whether we can release the semaphore while doing the
4347 * mapping (@c true) or not.
4348 * @param ppvR3 Where to store the ring-3 address of the mapping.
4349 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4350 * contain the address of the existing mapping.
4351 */
4352static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4353{
4354 /*
4355 * Take the chunk lock and leave the giant GMM lock when possible, then
4356 * call the worker function.
4357 */
4358 GMMR0CHUNKMTXSTATE MtxState;
4359 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4360 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4361 if (RT_SUCCESS(rc))
4362 {
4363 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4364 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4365 }
4366
4367 return rc;
4368}
4369
4370
4371
4372#if defined(VBOX_WITH_PAGE_SHARING) || defined(VBOX_STRICT)
4373/**
4374 * Check if a chunk is mapped into the specified VM
4375 *
4376 * @returns mapped yes/no
4377 * @param pGMM Pointer to the GMM instance.
4378 * @param pGVM Pointer to the Global VM structure.
4379 * @param pChunk Pointer to the chunk to be mapped.
4380 * @param ppvR3 Where to store the ring-3 address of the mapping.
4381 */
4382static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4383{
4384 GMMR0CHUNKMTXSTATE MtxState;
4385 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4386 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4387 {
4388 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4389 if (pChunk->paMappingsX[i].pGVM == pGVM)
4390 {
4391 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4392 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4393 return true;
4394 }
4395 }
4396 *ppvR3 = NULL;
4397 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4398 return false;
4399}
4400#endif /* VBOX_WITH_PAGE_SHARING || VBOX_STRICT */
4401
4402
4403/**
4404 * Map a chunk and/or unmap another chunk.
4405 *
4406 * The mapping and unmapping applies to the current process.
4407 *
4408 * This API does two things because it saves a kernel call per mapping when
4409 * the ring-3 mapping cache is full.
4410 *
4411 * @returns VBox status code.
4412 * @param pGVM The global (ring-0) VM structure.
4413 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4414 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4415 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4416 * @thread EMT ???
4417 */
4418GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4419{
4420 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4421 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4422
4423 /*
4424 * Validate input and get the basics.
4425 */
4426 PGMM pGMM;
4427 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4428 int rc = GVMMR0ValidateGVM(pGVM);
4429 if (RT_FAILURE(rc))
4430 return rc;
4431
4432 AssertCompile(NIL_GMM_CHUNKID == 0);
4433 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4434 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4435
4436 if ( idChunkMap == NIL_GMM_CHUNKID
4437 && idChunkUnmap == NIL_GMM_CHUNKID)
4438 return VERR_INVALID_PARAMETER;
4439
4440 if (idChunkMap != NIL_GMM_CHUNKID)
4441 {
4442 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4443 *ppvR3 = NIL_RTR3PTR;
4444 }
4445
4446 /*
4447 * Take the semaphore and do the work.
4448 *
4449 * The unmapping is done last since it's easier to undo a mapping than
4450 * to undo an unmapping. The ring-3 mapping cache cannot be so big
4451 * that it pushes the user virtual address space to within a chunk of
4452 * its limits, so no problem here.
4453 */
4454 gmmR0MutexAcquire(pGMM);
4455 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4456 {
4457 PGMMCHUNK pMap = NULL;
4458 if (idChunkMap != NIL_GMM_CHUNKID)
4459 {
4460 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4461 if (RT_LIKELY(pMap))
4462 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4463 else
4464 {
4465 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4466 rc = VERR_GMM_CHUNK_NOT_FOUND;
4467 }
4468 }
4469/** @todo split this operation, the bail out might (theoretically) not be
4470 * entirely safe. */
4471
4472 if ( idChunkUnmap != NIL_GMM_CHUNKID
4473 && RT_SUCCESS(rc))
4474 {
4475 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4476 if (RT_LIKELY(pUnmap))
4477 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4478 else
4479 {
4480 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4481 rc = VERR_GMM_CHUNK_NOT_FOUND;
4482 }
4483
4484 if (RT_FAILURE(rc) && pMap)
4485 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4486 }
4487
4488 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4489 }
4490 else
4491 rc = VERR_GMM_IS_NOT_SANE;
4492 gmmR0MutexRelease(pGMM);
4493
4494 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4495 return rc;
4496}
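/*
 * Usage sketch (illustrative only): mapping a new chunk into ring-3 while evicting
 * another one from the ring-3 mapping cache in the same call. idNewChunk and
 * idEvictedChunk are hypothetical chunk IDs.
 *
 *      RTR3PTR pvR3 = NIL_RTR3PTR;
 *      int rc2 = GMMR0MapUnmapChunk(pGVM, idNewChunk, idEvictedChunk, &pvR3);
 *      // pass NIL_GMM_CHUNKID for either ID when only mapping or only unmapping
 */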
4497
4498
4499/**
4500 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4501 *
4502 * @returns see GMMR0MapUnmapChunk.
4503 * @param pGVM The global (ring-0) VM structure.
4504 * @param pReq Pointer to the request packet.
4505 */
4506GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4507{
4508 /*
4509 * Validate input and pass it on.
4510 */
4511 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4512 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4513
4514 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4515}
4516
4517
4518#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
4519/**
4520 * Gets the ring-0 virtual address for the given page.
4521 *
4522 * This is used by PGM when IEM and such wants to access guest RAM from ring-0.
4523 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4524 * corresponding chunk will remain valid beyond the call (at least till the EMT
4525 * returns to ring-3).
4526 *
4527 * @returns VBox status code.
4528 * @param pGVM Pointer to the kernel-only VM instance data.
4529 * @param idPage The page ID.
4530 * @param ppv Where to store the address.
4531 * @thread EMT
4532 */
4533GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4534{
4535 *ppv = NULL;
4536 PGMM pGMM;
4537 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4538
4539 uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
4540
4541 /*
4542 * Start with the per-VM TLB.
4543 */
4544 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
4545
4546 PGMMPERVMCHUNKTLBE pTlbe = &pGVM->gmm.s.aChunkTlbEntries[GMMPERVM_CHUNKTLB_IDX(idChunk)];
4547 PGMMCHUNK pChunk = pTlbe->pChunk;
4548 if ( pChunk != NULL
4549 && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
4550 && pChunk->Core.Key == idChunk)
4551 pGVM->R0Stats.gmm.cChunkTlbHits++; /* hopefully this is a likely outcome */
4552 else
4553 {
4554 pGVM->R0Stats.gmm.cChunkTlbMisses++;
4555
4556 /*
4557 * Look it up in the chunk tree.
4558 */
4559 RTSpinlockAcquire(pGMM->hSpinLockTree);
4560 pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
4561 if (RT_LIKELY(pChunk))
4562 {
4563 pTlbe->idGeneration = pGMM->idFreeGeneration;
4564 RTSpinlockRelease(pGMM->hSpinLockTree);
4565 pTlbe->pChunk = pChunk;
4566 }
4567 else
4568 {
4569 RTSpinlockRelease(pGMM->hSpinLockTree);
4570 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4571 AssertMsgFailed(("idPage=%#x\n", idPage));
4572 return VERR_GMM_PAGE_NOT_FOUND;
4573 }
4574 }
4575
4576 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4577
4578 /*
4579 * Got a chunk, now validate the page ownership and calculate its address.
4580 */
4581 const GMMPAGE * const pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4582 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4583 && pPage->Private.hGVM == pGVM->hSelf)
4584 || GMM_PAGE_IS_SHARED(pPage)))
4585 {
4586 AssertPtr(pChunk->pbMapping);
4587 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT];
4588 return VINF_SUCCESS;
4589 }
4590 AssertMsgFailed(("idPage=%#x is-private=%RTbool Private.hGVM=%u pGVM->hGVM=%u\n",
4591 idPage, GMM_PAGE_IS_PRIVATE(pPage), pPage->Private.hGVM, pGVM->hSelf));
4592 return VERR_GMM_NOT_PAGE_OWNER;
4593}
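/*
 * For reference (illustrative sketch of the lookup above): a page ID breaks down
 * into a chunk ID and a page index, and the index selects the ring-0 mapping
 * offset:
 *
 *      uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;      // which allocation chunk
 *      uint32_t const iPage   = idPage &  GMM_PAGEID_IDX_MASK;    // page index within that chunk
 *      uint8_t       *pbPage  = &pChunk->pbMapping[iPage << GUEST_PAGE_SHIFT];
 */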
4594#endif /* !VBOX_WITH_LINEAR_HOST_PHYS_MEM */
4595
4596#ifdef VBOX_WITH_PAGE_SHARING
4597
4598# ifdef VBOX_STRICT
4599/**
4600 * For checksumming shared pages in strict builds.
4601 *
4602 * The purpose is making sure that a page doesn't change.
4603 *
4604 * @returns Checksum, 0 on failure.
4605 * @param pGMM The GMM instance data.
4606 * @param pGVM Pointer to the kernel-only VM instance data.
4607 * @param idPage The page ID.
4608 */
4609static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4610{
4611 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4612 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4613
4614 uint8_t *pbChunk;
4615 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4616 return 0;
4617 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT);
4618
4619 return RTCrc32(pbPage, GUEST_PAGE_SIZE);
4620}
4621# endif /* VBOX_STRICT */
4622
4623
4624/**
4625 * Calculates the module hash value.
4626 *
4627 * @returns Hash value.
4628 * @param pszModuleName The module name.
4629 * @param pszVersion The module version string.
4630 */
4631static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4632{
4633 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4634}
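/*
 * Illustrative example (module name and version are made up): the hash is
 * effectively taken over the concatenation "<name>::<version>".
 *
 *      uint32_t uHash = gmmR0ShModCalcHash("ntdll.dll", "6.1.7601.17514");
 *      // equivalent to hashing the string "ntdll.dll::6.1.7601.17514"
 */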
4635
4636
4637/**
4638 * Finds a global module.
4639 *
4640 * @returns Pointer to the global module on success, NULL if not found.
4641 * @param pGMM The GMM instance data.
4642 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4643 * @param cbModule The module size.
4644 * @param enmGuestOS The guest OS type.
4645 * @param cRegions The number of regions.
4646 * @param pszModuleName The module name.
4647 * @param pszVersion The module version.
4648 * @param paRegions The region descriptions.
4649 */
4650static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4651 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4652 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4653{
4654 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4655 pGblMod;
4656 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4657 {
4658 if (pGblMod->cbModule != cbModule)
4659 continue;
4660 if (pGblMod->enmGuestOS != enmGuestOS)
4661 continue;
4662 if (pGblMod->cRegions != cRegions)
4663 continue;
4664 if (strcmp(pGblMod->szName, pszModuleName))
4665 continue;
4666 if (strcmp(pGblMod->szVersion, pszVersion))
4667 continue;
4668
4669 uint32_t i;
4670 for (i = 0; i < cRegions; i++)
4671 {
4672 uint32_t off = paRegions[i].GCRegionAddr & GUEST_PAGE_OFFSET_MASK;
4673 if (pGblMod->aRegions[i].off != off)
4674 break;
4675
4676 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, GUEST_PAGE_SIZE);
4677 if (pGblMod->aRegions[i].cb != cb)
4678 break;
4679 }
4680
4681 if (i == cRegions)
4682 return pGblMod;
4683 }
4684
4685 return NULL;
4686}
4687
4688
4689/**
4690 * Creates a new global module.
4691 *
4692 * @returns VBox status code.
4693 * @param pGMM The GMM instance data.
4694 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4695 * @param cbModule The module size.
4696 * @param enmGuestOS The guest OS type.
4697 * @param cRegions The number of regions.
4698 * @param pszModuleName The module name.
4699 * @param pszVersion The module version.
4700 * @param paRegions The region descriptions.
4701 * @param ppGblMod Where to return the new module on success.
4702 */
4703static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4704 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4705 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4706{
4707 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4708 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4709 {
4710 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4711 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4712 }
4713
4714 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4715 if (!pGblMod)
4716 {
4717 Log(("gmmR0ShModNewGlobal: No memory\n"));
4718 return VERR_NO_MEMORY;
4719 }
4720
4721 pGblMod->Core.Key = uHash;
4722 pGblMod->cbModule = cbModule;
4723 pGblMod->cRegions = cRegions;
4724 pGblMod->cUsers = 1;
4725 pGblMod->enmGuestOS = enmGuestOS;
4726 strcpy(pGblMod->szName, pszModuleName);
4727 strcpy(pGblMod->szVersion, pszVersion);
4728
4729 for (uint32_t i = 0; i < cRegions; i++)
4730 {
4731 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4732 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & GUEST_PAGE_OFFSET_MASK;
4733 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4734 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, GUEST_PAGE_SIZE);
4735 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4736 }
4737
4738 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4739 Assert(fInsert); NOREF(fInsert);
4740 pGMM->cShareableModules++;
4741
4742 *ppGblMod = pGblMod;
4743 return VINF_SUCCESS;
4744}
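/*
 * Worked example of the region normalization used above (illustrative, assuming
 * 4 KiB guest pages):
 *
 *      GCRegionAddr = 0x00401123, cbRegion = 0x1000
 *      off = 0x00401123 & GUEST_PAGE_OFFSET_MASK          = 0x123
 *      cb  = RT_ALIGN_32(0x1000 + 0x123, GUEST_PAGE_SIZE) = 0x2000
 */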
4745
4746
4747/**
4748 * Deletes a global module which is no longer referenced by anyone.
4749 *
4750 * @param pGMM The GMM instance data.
4751 * @param pGblMod The module to delete.
4752 */
4753static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4754{
4755 Assert(pGblMod->cUsers == 0);
4756 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4757
4758 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4759 Assert(pvTest == pGblMod); NOREF(pvTest);
4760 pGMM->cShareableModules--;
4761
4762 uint32_t i = pGblMod->cRegions;
4763 while (i-- > 0)
4764 {
4765 if (pGblMod->aRegions[i].paidPages)
4766 {
4767 /* We don't do anything to the pages as they are handled by the
4768 copy-on-write mechanism in PGM. */
4769 RTMemFree(pGblMod->aRegions[i].paidPages);
4770 pGblMod->aRegions[i].paidPages = NULL;
4771 }
4772 }
4773 RTMemFree(pGblMod);
4774}
4775
4776
4777static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4778 PGMMSHAREDMODULEPERVM *ppRecVM)
4779{
4780 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4781 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4782
4783 PGMMSHAREDMODULEPERVM pRecVM;
4784 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4785 if (!pRecVM)
4786 return VERR_NO_MEMORY;
4787
4788 pRecVM->Core.Key = GCBaseAddr;
4789 for (uint32_t i = 0; i < cRegions; i++)
4790 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4791
4792 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4793 Assert(fInsert); NOREF(fInsert);
4794 pGVM->gmm.s.Stats.cShareableModules++;
4795
4796 *ppRecVM = pRecVM;
4797 return VINF_SUCCESS;
4798}
4799
4800
4801static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4802{
4803 /*
4804 * Free the per-VM module.
4805 */
4806 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4807 pRecVM->pGlobalModule = NULL;
4808
4809 if (fRemove)
4810 {
4811 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4812 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4813 }
4814
4815 RTMemFree(pRecVM);
4816
4817 /*
4818 * Release the global module.
4819 * (In the registration bailout case, it might not be.)
4820 */
4821 if (pGblMod)
4822 {
4823 Assert(pGblMod->cUsers > 0);
4824 pGblMod->cUsers--;
4825 if (pGblMod->cUsers == 0)
4826 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4827 }
4828}
4829
4830#endif /* VBOX_WITH_PAGE_SHARING */
4831
4832/**
4833 * Registers a new shared module for the VM.
4834 *
4835 * @returns VBox status code.
4836 * @param pGVM The global (ring-0) VM structure.
4837 * @param idCpu The VCPU id.
4838 * @param enmGuestOS The guest OS type.
4839 * @param pszModuleName The module name.
4840 * @param pszVersion The module version.
4841 * @param GCPtrModBase The module base address.
4842 * @param cbModule The module size.
4843 * @param cRegions The number of shared region descriptors.
4844 * @param paRegions Pointer to an array of shared region(s).
4845 * @thread EMT(idCpu)
4846 */
4847GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4848 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4849 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4850{
4851#ifdef VBOX_WITH_PAGE_SHARING
4852 /*
4853 * Validate input and get the basics.
4854 *
4855 * Note! Turns out the module size does not necessarily match the size of the
4856 * regions. (iTunes on XP)
4857 */
4858 PGMM pGMM;
4859 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4860 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4861 if (RT_FAILURE(rc))
4862 return rc;
4863
4864 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4865 return VERR_GMM_TOO_MANY_REGIONS;
4866
4867 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4868 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4869
4870 uint32_t cbTotal = 0;
4871 for (uint32_t i = 0; i < cRegions; i++)
4872 {
4873 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4874 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4875
4876 cbTotal += paRegions[i].cbRegion;
4877 if (RT_UNLIKELY(cbTotal > _1G))
4878 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4879 }
4880
4881 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4882 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4883 return VERR_GMM_MODULE_NAME_TOO_LONG;
4884
4885 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4886 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4887 return VERR_GMM_MODULE_NAME_TOO_LONG;
4888
4889 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4890 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4891
4892 /*
4893 * Take the semaphore and do some more validations.
4894 */
4895 gmmR0MutexAcquire(pGMM);
4896 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4897 {
4898 /*
4899 * Check if this module is already locally registered and register
4900 * it if it isn't. The base address is a unique module identifier
4901 * locally.
4902 */
4903 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4904 bool fNewModule = pRecVM == NULL;
4905 if (fNewModule)
4906 {
4907 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4908 if (RT_SUCCESS(rc))
4909 {
4910 /*
4911 * Find a matching global module, register a new one if needed.
4912 */
4913 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4914 pszModuleName, pszVersion, paRegions);
4915 if (!pGblMod)
4916 {
4917 Assert(fNewModule);
4918 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4919 pszModuleName, pszVersion, paRegions, &pGblMod);
4920 if (RT_SUCCESS(rc))
4921 {
4922 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4923 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4924 }
4925 else
4926 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4927 }
4928 else
4929 {
4930 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4931 pGblMod->cUsers++;
4932 pRecVM->pGlobalModule = pGblMod;
4933
4934 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4935 }
4936 }
4937 }
4938 else
4939 {
4940 /*
4941 * Attempt to re-register an existing module.
4942 */
4943 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4944 pszModuleName, pszVersion, paRegions);
4945 if (pRecVM->pGlobalModule == pGblMod)
4946 {
4947 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4948 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4949 }
4950 else
4951 {
4952 /** @todo may have to unregister+register when this happens in case it's caused
4953 * by VBoxService crashing and being restarted... */
4954 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4955 " incoming at %RGvLB%#x %s %s rgns %u\n"
4956 " existing at %RGvLB%#x %s %s rgns %u\n",
4957 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4958 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4959 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4960 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4961 }
4962 }
4963 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4964 }
4965 else
4966 rc = VERR_GMM_IS_NOT_SANE;
4967
4968 gmmR0MutexRelease(pGMM);
4969 return rc;
4970#else
4971
4972 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4973 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4974 return VERR_NOT_IMPLEMENTED;
4975#endif
4976}
4977
4978
4979/**
4980 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4981 *
4982 * @returns see GMMR0RegisterSharedModule.
4983 * @param pGVM The global (ring-0) VM structure.
4984 * @param idCpu The VCPU id.
4985 * @param pReq Pointer to the request packet.
4986 */
4987GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4988{
4989 /*
4990 * Validate input and pass it on.
4991 */
4992 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4993 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4994 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4995 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4996
4997 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4998 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4999 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
5000 return VINF_SUCCESS;
5001}
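
/*
 * Illustrative sketch (not part of the build): how a ring-3 caller might set
 * up the GMMREGISTERSHAREDMODULEREQ packet consumed by the wrapper above.
 * Only fields this file dereferences are filled in; the local variables
 * mirror the GMMR0RegisterSharedModule parameters, and the SUPVMMR0REQHDR
 * magic setup, string buffer sizes and region fill-in are assumptions.
 *
 * @code
 *  uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cRegions]);
 *  PGMMREGISTERSHAREDMODULEREQ pReq = (PGMMREGISTERSHAREDMODULEREQ)RTMemAllocZ(cbReq);
 *  if (!pReq)
 *      return VERR_NO_MEMORY;
 *  pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  pReq->Hdr.cbReq    = cbReq;                 // must match the RT_UOFFSETOF_DYN check above
 *  pReq->enmGuestOS   = enmGuestOS;
 *  pReq->GCBaseAddr   = GCPtrModBase;
 *  pReq->cbModule     = cbModule;
 *  pReq->cRegions     = cRegions;
 *  RTStrCopy(pReq->szName,    GMM_SHARED_MODULE_MAX_NAME_STRING,    pszModuleName);
 *  RTStrCopy(pReq->szVersion, GMM_SHARED_MODULE_MAX_VERSION_STRING, pszVersion);
 *  // ... fill pReq->aRegions[0..cRegions-1] from the guest module sections ...
 *  // The packet then travels down the VMMR3CallR0 path and ends up in
 *  // GMMR0RegisterSharedModuleReq; the real status comes back in pReq->rc.
 * @endcode
 */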
5002
5003
5004/**
5005 * Unregisters a shared module for the VM
5006 *
5007 * @returns VBox status code.
5008 * @param pGVM The global (ring-0) VM structure.
5009 * @param idCpu The VCPU id.
5010 * @param pszModuleName The module name.
5011 * @param pszVersion The module version.
5012 * @param GCPtrModBase The module base address.
5013 * @param cbModule The module size.
5014 */
5015GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
5016 RTGCPTR GCPtrModBase, uint32_t cbModule)
5017{
5018#ifdef VBOX_WITH_PAGE_SHARING
5019 /*
5020 * Validate input and get the basics.
5021 */
5022 PGMM pGMM;
5023 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5024 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5025 if (RT_FAILURE(rc))
5026 return rc;
5027
5028 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
5029 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
5030 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
5031 return VERR_GMM_MODULE_NAME_TOO_LONG;
5032 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
5033 return VERR_GMM_MODULE_NAME_TOO_LONG;
5034
5035 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
5036
5037 /*
5038 * Take the semaphore and do some more validations.
5039 */
5040 gmmR0MutexAcquire(pGMM);
5041 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5042 {
5043 /*
5044 * Locate and remove the specified module.
5045 */
5046 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
5047 if (pRecVM)
5048 {
5049 /** @todo Do we need to do more validations here, like that the
5050 * name + version + cbModule matches? */
5051 NOREF(cbModule);
5052 Assert(pRecVM->pGlobalModule);
5053 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
5054 }
5055 else
5056 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
5057
5058 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5059 }
5060 else
5061 rc = VERR_GMM_IS_NOT_SANE;
5062
5063 gmmR0MutexRelease(pGMM);
5064 return rc;
5065#else
5066
5067 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
5068 return VERR_NOT_IMPLEMENTED;
5069#endif
5070}
5071
5072
5073/**
5074 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
5075 *
5076 * @returns see GMMR0UnregisterSharedModule.
5077 * @param pGVM The global (ring-0) VM structure.
5078 * @param idCpu The VCPU id.
5079 * @param pReq Pointer to the request packet.
5080 */
5081GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
5082{
5083 /*
5084 * Validate input and pass it on.
5085 */
5086 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5087 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5088
5089 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
5090}
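
/*
 * Illustrative sketch (not part of the build): the matching fixed-size
 * unregistration packet for the wrapper above.  The field names are the ones
 * the wrapper dereferences; the SUPVMMR0REQHDR magic setup and the string
 * buffer sizes are assumptions.
 *
 * @code
 *  GMMUNREGISTERSHAREDMODULEREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);             // fixed size, unlike the register request
 *  Req.GCBaseAddr   = GCPtrModBase;
 *  Req.cbModule     = cbModule;
 *  RTStrCopy(Req.szName,    GMM_SHARED_MODULE_MAX_NAME_STRING,    pszModuleName);
 *  RTStrCopy(Req.szVersion, GMM_SHARED_MODULE_MAX_VERSION_STRING, pszVersion);
 *  int rc = GMMR0UnregisterSharedModuleReq(pGVM, idCpu, &Req);
 * @endcode
 */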
5091
5092#ifdef VBOX_WITH_PAGE_SHARING
5093
5094/**
5095 * Increases the use count of a shared page; the page is known to exist and be valid.
5096 *
5097 * @param pGMM Pointer to the GMM instance.
5098 * @param pGVM Pointer to the GVM instance.
5099 * @param pPage The page structure.
5100 */
5101DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
5102{
5103 Assert(pGMM->cSharedPages > 0);
5104 Assert(pGMM->cAllocatedPages > 0);
5105
5106 pGMM->cDuplicatePages++;
5107
5108 pPage->Shared.cRefs++;
5109 pGVM->gmm.s.Stats.cSharedPages++;
5110 pGVM->gmm.s.Stats.Allocated.cBasePages++;
5111}
5112
5113
5114/**
5115 * Converts a private page to a shared page; the page is known to exist and be valid.
5116 *
5117 * @param pGMM Pointer to the GMM instance.
5118 * @param pGVM Pointer to the GVM instance.
5119 * @param HCPhys Host physical address
5120 * @param idPage The Page ID
5121 * @param pPage The page structure.
5122 * @param pPageDesc Shared page descriptor
5123 */
5124DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
5125 PGMMSHAREDPAGEDESC pPageDesc)
5126{
5127 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
5128 Assert(pChunk);
5129 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
5130 Assert(GMM_PAGE_IS_PRIVATE(pPage));
5131
5132 pChunk->cPrivate--;
5133 pChunk->cShared++;
5134
5135 pGMM->cSharedPages++;
5136
5137 pGVM->gmm.s.Stats.cSharedPages++;
5138 pGVM->gmm.s.Stats.cPrivatePages--;
5139
5140 /* Modify the page structure. */
5141 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> GUEST_PAGE_SHIFT);
5142 pPage->Shared.cRefs = 1;
5143#ifdef VBOX_STRICT
5144 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
5145 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5146#else
5147 NOREF(pPageDesc);
5148 pPage->Shared.u14Checksum = 0;
5149#endif
5150 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5151}
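
/*
 * Illustrative sketch (not part of the build): the page ID arithmetic relied
 * upon by gmmR0ConvertToSharedPage above and by the lookup code further down.
 * A page ID encodes both the owning chunk and the page index within it, so
 * the chunk and the page entry can be found with a shift and a mask.
 *
 * @code
 *  uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;    // chunk ID part
 *  uint32_t const iPage   = idPage & GMM_PAGEID_IDX_MASK;   // page index within the chunk
 *  PGMMCHUNK      pChunk  = gmmR0GetChunk(pGMM, idChunk);
 *  PGMMPAGE       pPage   = &pChunk->aPages[iPage];         // assumes pChunk != NULL
 *  RTHCPHYS       HCPhys  = (RTHCPHYS)pPage->Shared.pfn << GUEST_PAGE_SHIFT; // shared pages only
 * @endcode
 */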
5152
5153
5154static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5155 unsigned idxRegion, unsigned idxPage,
5156 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5157{
5158 NOREF(pModule);
5159
5160 /* Easy case: just change the internal page type. */
5161 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5162 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5163 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5164 VERR_PGM_PHYS_INVALID_PAGE_ID);
5165 NOREF(idxRegion);
5166
5167 AssertMsg(pPageDesc->GCPhys == ((RTGCPHYS)pPage->Private.pfn << GUEST_PAGE_SHIFT), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (RTGCPHYS)pPage->Private.pfn << GUEST_PAGE_SHIFT));
5168
5169 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5170
5171 /* Keep track of these references. */
5172 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5173
5174 return VINF_SUCCESS;
5175}
5176
5177/**
5178 * Checks the specified shared module range for changes.
5179 *
5180 * Performs the following tasks:
5181 * - If a shared page is new, then it changes the GMM page type to shared and
5182 * returns it in the pPageDesc descriptor.
5183 * - If a shared page already exists, then it checks if the VM page is
5184 * identical and if so frees the VM page and returns the shared page in
5185 * the pPageDesc descriptor.
5186 *
5187 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5188 *
5189 * @returns VBox status code.
5190 * @param pGVM Pointer to the GVM instance data.
5191 * @param pModule Module description
5192 * @param idxRegion Region index
5193 * @param idxPage Page index
5194 * @param pPageDesc Page descriptor
5195 */
5196GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5197 PGMMSHAREDPAGEDESC pPageDesc)
5198{
5199 int rc;
5200 PGMM pGMM;
5201 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5202 pPageDesc->u32StrictChecksum = 0;
5203
5204 AssertMsgReturn(idxRegion < pModule->cRegions,
5205 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5206 VERR_INVALID_PARAMETER);
5207
5208 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> GUEST_PAGE_SHIFT;
5209 AssertMsgReturn(idxPage < cPages,
5210 ("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
5211 VERR_INVALID_PARAMETER);
5212
5213 LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5214
5215 /*
5216 * First time; create a page descriptor array.
5217 */
5218 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5219 if (!pGlobalRegion->paidPages)
5220 {
5221 Log(("Allocate page descriptor array for %d pages\n", cPages));
5222 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5223 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5224
5225 /* Invalidate all descriptors. */
5226 uint32_t i = cPages;
5227 while (i-- > 0)
5228 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5229 }
5230
5231 /*
5232 * We've seen this shared page for the first time?
5233 */
5234 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5235 {
5236 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5237 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5238 }
5239
5240 /*
5241 * We've seen it before...
5242 */
5243 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5244 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5245 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5246
5247 /*
5248 * Get the shared page source.
5249 */
5250 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5251 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
5252 VERR_PGM_PHYS_INVALID_PAGE_ID);
5253
5254 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5255 {
5256 /*
5257 * Page was freed at some point; invalidate this entry.
5258 */
5259 /** @todo this isn't really bullet proof. */
5260 Log(("Old shared page was freed -> create a new one\n"));
5261 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5262 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5263 }
5264
5265 Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << GUEST_PAGE_SHIFT));
5266
5267 /*
5268 * Calculate the virtual address of the local page.
5269 */
5270 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5271 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5272 VERR_PGM_PHYS_INVALID_PAGE_ID);
5273
5274 uint8_t *pbChunk;
5275 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5276 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5277 VERR_PGM_PHYS_INVALID_PAGE_ID);
5278 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT);
5279
5280 /*
5281 * Calculate the virtual address of the shared page.
5282 */
5283 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5284 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5285
5286 /*
5287 * Get the virtual address of the physical page; map the chunk into the VM
5288 * process if not already done.
5289 */
5290 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5291 {
5292 Log(("Map chunk into process!\n"));
5293 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5294 AssertRCReturn(rc, rc);
5295 }
5296 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT);
5297
5298#ifdef VBOX_STRICT
5299 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, GUEST_PAGE_SIZE);
5300 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5301 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5302 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5303 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5304#endif
5305
5306 if (memcmp(pbSharedPage, pbLocalPage, GUEST_PAGE_SIZE))
5307 {
5308 Log(("Unexpected differences found between local and shared page; skip\n"));
5309 /* Signal to the caller that this one hasn't changed. */
5310 pPageDesc->idPage = NIL_GMM_PAGEID;
5311 return VINF_SUCCESS;
5312 }
5313
5314 /*
5315 * Free the old local page.
5316 */
5317 GMMFREEPAGEDESC PageDesc;
5318 PageDesc.idPage = pPageDesc->idPage;
5319 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5320 AssertRCReturn(rc, rc);
5321
5322 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5323
5324 /*
5325 * Pass along the new physical address & page id.
5326 */
5327 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << GUEST_PAGE_SHIFT;
5328 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5329
5330 return VINF_SUCCESS;
5331}
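
/*
 * Illustrative sketch (not part of the build): the in/out contract of the
 * GMMSHAREDPAGEDESC parameter used by GMMR0SharedModuleCheckPage above.  The
 * real caller is PGMR0SharedModuleCheck (see gmmR0CheckSharedModule below);
 * the pgmExampleQueryPage / pgmExampleRemapToShared helpers are hypothetical
 * stand-ins for the PGM side of that exchange.
 *
 * @code
 *  GMMSHAREDPAGEDESC PageDesc;
 *  pgmExampleQueryPage(pGVM, GCPtrPage, &PageDesc.GCPhys, &PageDesc.HCPhys, &PageDesc.idPage);
 *  int rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
 *  if (RT_SUCCESS(rc))
 *  {
 *      if (PageDesc.idPage != NIL_GMM_PAGEID)
 *      {
 *          // The page is now backed by the shared copy: repoint the guest
 *          // physical page at PageDesc.HCPhys / PageDesc.idPage.
 *          pgmExampleRemapToShared(pGVM, PageDesc.GCPhys, PageDesc.HCPhys, PageDesc.idPage);
 *      }
 *      // else: nothing changed for this page (e.g. the contents differed).
 *  }
 * @endcode
 */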
5332
5333
5334/**
5335 * RTAvlGCPtrDestroy callback.
5336 *
5337 * @returns VINF_SUCCESS.
5338 * @param pNode The node to destroy.
5339 * @param pvArgs Pointer to an argument packet.
5340 */
5341static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5342{
5343 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5344 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5345 (PGMMSHAREDMODULEPERVM)pNode,
5346 false /*fRemove*/);
5347 return VINF_SUCCESS;
5348}
5349
5350
5351/**
5352 * Used by GMMR0CleanupVM to clean up shared modules.
5353 *
5354 * This is called without taking the GMM lock so that it can be yielded as
5355 * needed here.
5356 *
5357 * @param pGMM The GMM handle.
5358 * @param pGVM The global VM handle.
5359 */
5360static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5361{
5362 gmmR0MutexAcquire(pGMM);
5363 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5364
5365 GMMR0SHMODPERVMDTORARGS Args;
5366 Args.pGVM = pGVM;
5367 Args.pGMM = pGMM;
5368 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5369
5370 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5371 pGVM->gmm.s.Stats.cShareableModules = 0;
5372
5373 gmmR0MutexRelease(pGMM);
5374}
5375
5376#endif /* VBOX_WITH_PAGE_SHARING */
5377
5378/**
5379 * Removes all shared modules for the specified VM
5380 *
5381 * @returns VBox status code.
5382 * @param pGVM The global (ring-0) VM structure.
5383 * @param idCpu The VCPU id.
5384 */
5385GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5386{
5387#ifdef VBOX_WITH_PAGE_SHARING
5388 /*
5389 * Validate input and get the basics.
5390 */
5391 PGMM pGMM;
5392 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5393 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5394 if (RT_FAILURE(rc))
5395 return rc;
5396
5397 /*
5398 * Take the semaphore and do some more validations.
5399 */
5400 gmmR0MutexAcquire(pGMM);
5401 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5402 {
5403 Log(("GMMR0ResetSharedModules\n"));
5404 GMMR0SHMODPERVMDTORARGS Args;
5405 Args.pGVM = pGVM;
5406 Args.pGMM = pGMM;
5407 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5408 pGVM->gmm.s.Stats.cShareableModules = 0;
5409
5410 rc = VINF_SUCCESS;
5411 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5412 }
5413 else
5414 rc = VERR_GMM_IS_NOT_SANE;
5415
5416 gmmR0MutexRelease(pGMM);
5417 return rc;
5418#else
5419 RT_NOREF(pGVM, idCpu);
5420 return VERR_NOT_IMPLEMENTED;
5421#endif
5422}
5423
5424#ifdef VBOX_WITH_PAGE_SHARING
5425
5426/**
5427 * Tree enumeration callback for checking a shared module.
5428 */
5429static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5430{
5431 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5432 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5433 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5434
5435 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5436 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5437
5438 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5439 if (RT_FAILURE(rc))
5440 return rc;
5441 return VINF_SUCCESS;
5442}
5443
5444#endif /* VBOX_WITH_PAGE_SHARING */
5445
5446/**
5447 * Checks all shared modules for the specified VM.
5448 *
5449 * @returns VBox status code.
5450 * @param pGVM The global (ring-0) VM structure.
5451 * @param idCpu The calling EMT number.
5452 * @thread EMT(idCpu)
5453 */
5454GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5455{
5456#ifdef VBOX_WITH_PAGE_SHARING
5457 /*
5458 * Validate input and get the basics.
5459 */
5460 PGMM pGMM;
5461 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5462 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5463 if (RT_FAILURE(rc))
5464 return rc;
5465
5466# ifndef DEBUG_sandervl
5467 /*
5468 * Take the semaphore and do some more validations.
5469 */
5470 gmmR0MutexAcquire(pGMM);
5471# endif
5472 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5473 {
5474 /*
5475 * Walk the tree, checking each module.
5476 */
5477 Log(("GMMR0CheckSharedModules\n"));
5478
5479 GMMCHECKSHAREDMODULEINFO Args;
5480 Args.pGVM = pGVM;
5481 Args.idCpu = idCpu;
5482 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5483
5484 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5485 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5486 }
5487 else
5488 rc = VERR_GMM_IS_NOT_SANE;
5489
5490# ifndef DEBUG_sandervl
5491 gmmR0MutexRelease(pGMM);
5492# endif
5493 return rc;
5494#else
5495 RT_NOREF(pGVM, idCpu);
5496 return VERR_NOT_IMPLEMENTED;
5497#endif
5498}
5499
5500#ifdef VBOX_STRICT
5501
5502/**
5503 * Worker for GMMR0FindDuplicatePageReq.
5504 *
5505 * @returns true if duplicate, false if not.
5506 */
5507static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5508{
5509 bool fFoundDuplicate = false;
5510 /* Only take chunks not mapped into this VM process; not entirely correct. */
5511 uint8_t *pbChunk;
5512 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5513 {
5514 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5515 if (RT_SUCCESS(rc))
5516 {
5517 /*
5518 * Look for duplicate pages
5519 */
5520 uintptr_t iPage = GMM_CHUNK_NUM_PAGES;
5521 while (iPage-- > 0)
5522 {
5523 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5524 {
5525 uint8_t *pbDestPage = pbChunk + (iPage << GUEST_PAGE_SHIFT);
5526 if (!memcmp(pbSourcePage, pbDestPage, GUEST_PAGE_SIZE))
5527 {
5528 fFoundDuplicate = true;
5529 break;
5530 }
5531 }
5532 }
5533 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5534 }
5535 }
5536 return fFoundDuplicate;
5537}
5538
5539
5540/**
5541 * Finds a duplicate of the specified page in other active VMs.
5542 *
5543 * @returns VBox status code.
5544 * @param pGVM The global (ring-0) VM structure.
5545 * @param pReq Pointer to the request packet.
5546 */
5547GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5548{
5549 /*
5550 * Validate input and pass it on.
5551 */
5552 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5553 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5554
5555 PGMM pGMM;
5556 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5557
5558 int rc = GVMMR0ValidateGVM(pGVM);
5559 if (RT_FAILURE(rc))
5560 return rc;
5561
5562 /*
5563 * Take the semaphore and do some more validations.
5564 */
5565 rc = gmmR0MutexAcquire(pGMM);
5566 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5567 {
5568 uint8_t *pbChunk;
5569 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5570 if (pChunk)
5571 {
5572 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5573 {
5574 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << GUEST_PAGE_SHIFT);
5575 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5576 if (pPage)
5577 {
5578 /*
5579 * Walk the chunks
5580 */
5581 pReq->fDuplicate = false;
5582 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5583 {
5584 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5585 {
5586 pReq->fDuplicate = true;
5587 break;
5588 }
5589 }
5590 }
5591 else
5592 {
5593 AssertFailed();
5594 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5595 }
5596 }
5597 else
5598 AssertFailed();
5599 }
5600 else
5601 AssertFailed();
5602 }
5603 else
5604 rc = VERR_GMM_IS_NOT_SANE;
5605
5606 gmmR0MutexRelease(pGMM);
5607 return rc;
5608}
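
/*
 * Illustrative sketch (not part of the build, strict builds only): preparing
 * the GMMFINDDUPLICATEPAGEREQ packet for the request wrapper above.  Only the
 * fields the wrapper dereferences are shown; the SUPVMMR0REQHDR magic setup
 * is an assumption.
 *
 * @code
 *  GMMFINDDUPLICATEPAGEREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  Req.idPage       = idPage;                  // page ID to probe for duplicates
 *  Req.fDuplicate   = false;
 *  int rc = GMMR0FindDuplicatePageReq(pGVM, &Req);
 *  if (RT_SUCCESS(rc) && Req.fDuplicate)
 *      Log(("page %#x has an identical private copy in another chunk\n", idPage));
 * @endcode
 */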
5609
5610#endif /* VBOX_STRICT */
5611
5612
5613/**
5614 * Retrieves the GMM statistics visible to the caller.
5615 *
5616 * @returns VBox status code.
5617 *
5618 * @param pStats Where to put the statistics.
5619 * @param pSession The current session.
5620 * @param pGVM The GVM to obtain statistics for. Optional.
5621 */
5622GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5623{
5624 LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5625
5626 /*
5627 * Validate input.
5628 */
5629 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5630 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5631 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5632
5633 PGMM pGMM;
5634 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5635
5636 /*
5637 * Validate the VM handle, if not NULL, and lock the GMM.
5638 */
5639 int rc;
5640 if (pGVM)
5641 {
5642 rc = GVMMR0ValidateGVM(pGVM);
5643 if (RT_FAILURE(rc))
5644 return rc;
5645 }
5646
5647 rc = gmmR0MutexAcquire(pGMM);
5648 if (RT_FAILURE(rc))
5649 return rc;
5650
5651 /*
5652 * Copy out the GMM statistics.
5653 */
5654 pStats->cMaxPages = pGMM->cMaxPages;
5655 pStats->cReservedPages = pGMM->cReservedPages;
5656 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5657 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5658 pStats->cSharedPages = pGMM->cSharedPages;
5659 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5660 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5661 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5662 pStats->cChunks = pGMM->cChunks;
5663 pStats->cFreedChunks = pGMM->cFreedChunks;
5664 pStats->cShareableModules = pGMM->cShareableModules;
5665 pStats->idFreeGeneration = pGMM->idFreeGeneration;
5666 RT_ZERO(pStats->au64Reserved);
5667
5668 /*
5669 * Copy out the VM statistics.
5670 */
5671 if (pGVM)
5672 pStats->VMStats = pGVM->gmm.s.Stats;
5673 else
5674 RT_ZERO(pStats->VMStats);
5675
5676 gmmR0MutexRelease(pGMM);
5677 return rc;
5678}
5679
5680
5681/**
5682 * VMMR0 request wrapper for GMMR0QueryStatistics.
5683 *
5684 * @returns see GMMR0QueryStatistics.
5685 * @param pGVM The global (ring-0) VM structure. Optional.
5686 * @param pReq Pointer to the request packet.
5687 */
5688GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5689{
5690 /*
5691 * Validate input and pass it on.
5692 */
5693 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5694 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5695
5696 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5697}
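
/*
 * Illustrative sketch (not part of the build): querying the GMM statistics
 * through the request wrapper above.  Only fields this file dereferences are
 * touched; the SUPVMMR0REQHDR magic setup is an assumption.
 *
 * @code
 *  GMMQUERYSTATISTICSSREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  Req.pSession     = pSession;                    // current support driver session
 *  int rc = GMMR0QueryStatisticsReq(pGVM, &Req);   // pGVM is optional; NULL gives global stats only
 *  if (RT_SUCCESS(rc))
 *  {
 *      // Req.Stats now holds the global counters (cAllocatedPages, cSharedPages,
 *      // cBalloonedPages, ...) and Req.Stats.VMStats the per-VM ones when pGVM was given.
 *  }
 * @endcode
 */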
5698
5699
5700/**
5701 * Resets the specified GMM statistics.
5702 *
5703 * @returns VBox status code.
5704 *
5705 * @param pStats Which statistics to reset, that is, non-zero fields
5706 * indicates which to reset.
5707 * @param pSession The current session.
5708 * @param pGVM The GVM to reset statistics for. Optional.
5709 */
5710GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5711{
5712 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5713 /* Nothing that can be reset at the moment. */
5714 return VINF_SUCCESS;
5715}
5716
5717
5718/**
5719 * VMMR0 request wrapper for GMMR0ResetStatistics.
5720 *
5721 * @returns see GMMR0ResetStatistics.
5722 * @param pGVM The global (ring-0) VM structure. Optional.
5723 * @param pReq Pointer to the request packet.
5724 */
5725GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5726{
5727 /*
5728 * Validate input and pass it on.
5729 */
5730 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5731 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5732
5733 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5734}
5735