1/* $Id: GMMR0.cpp 91540 2021-10-04 12:12:32Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * being set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
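 *
 * As a purely illustrative sketch (the code below uses its own helpers and
 * macros for this), the inverse mapping follows directly from the formula
 * above:
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & ((GMM_CHUNK_SIZE / PAGE_SIZE) - 1);
 * @endcode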
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-7 free pages, the second covers 8-15, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
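 *
 * A minimal sketch of one such binning step, where cPagesPerChunk and cLists
 * stand for the hypothetical 256 pages and 16 lists above (this is not the
 * actual free-set code, which has its own list layout and a dedicated list
 * for completely unused chunks):
 * @code
 * // Pick the list for a chunk based on its current free page count.
 * unsigned const cPagesPerList = cPagesPerChunk / cLists;
 * unsigned       idxList       = RT_MIN(pChunk->cFree / cPagesPerList, cLists - 1);
 * @endcode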
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
99 * and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100 * The cost on Linux is identical, but there it's because of sizeof(struct page *).
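 *
 * As a rough worked example of the chunk term above (illustrative numbers
 * only): with 512 pages per chunk, i.e. 2^CHUNK_SHIFT = 512, and roughly
 * 4 KB of combined RTR0MEMOBJ and chunk bookkeeping, the chunk term comes
 * to about 4096 / 512 = 8 bytes per page.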
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC, which means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
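 *
 * A hedged sketch of that probe-and-fallback choice (the chunk allocation
 * code later in this file is the authoritative version):
 * @code
 * RTR0MEMOBJ hMemObj;
 * int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
 * if (rc == VERR_NOT_SUPPORTED)
 *     rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false); // fExecutable = false
 * @endcode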
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here; the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to make runtime adjustments of the
136 * config? And how do we prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads), which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and the allocator interface will have to supply some NUMA
150 * info, and we'll need a way to calculate access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/spinlock.h>
180#include <iprt/string.h>
181#include <iprt/time.h>
182
183
184/*********************************************************************************************************************************
185* Defined Constants And Macros *
186*********************************************************************************************************************************/
187/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
188 * Use a critical section instead of a fast mutex for the giant GMM lock.
189 *
190 * @remarks This is primarily a way of avoiding the deadlock checks in the
191 * windows driver verifier. */
192#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
193# define VBOX_USE_CRIT_SECT_FOR_GIANT
194#endif
195
196#if defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM) && !defined(RT_OS_DARWIN) && 0
197/** Enable the legacy mode code (will be dropped soon). */
198# define GMM_WITH_LEGACY_MODE
199#endif
200
201
202/*********************************************************************************************************************************
203* Structures and Typedefs *
204*********************************************************************************************************************************/
205/** Pointer to set of free chunks. */
206typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
207
208/**
209 * The per-page tracking structure employed by the GMM.
210 *
211 * On 32-bit hosts some trickery is necessary to compress all
212 * the information into 32 bits. When the fSharedFree member is set,
213 * the 30th bit decides whether it's a free page or not.
214 *
215 * Because of the different layout on 32-bit and 64-bit hosts, macros
216 * are used to get and set some of the data.
217 */
218typedef union GMMPAGE
219{
220#if HC_ARCH_BITS == 64
221 /** Unsigned integer view. */
222 uint64_t u;
223
224 /** The common view. */
225 struct GMMPAGECOMMON
226 {
227 uint32_t uStuff1 : 32;
228 uint32_t uStuff2 : 30;
229 /** The page state. */
230 uint32_t u2State : 2;
231 } Common;
232
233 /** The view of a private page. */
234 struct GMMPAGEPRIVATE
235 {
236 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
237 uint32_t pfn;
238 /** The GVM handle. (64K VMs) */
239 uint32_t hGVM : 16;
240 /** Reserved. */
241 uint32_t u16Reserved : 14;
242 /** The page state. */
243 uint32_t u2State : 2;
244 } Private;
245
246 /** The view of a shared page. */
247 struct GMMPAGESHARED
248 {
249 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
250 uint32_t pfn;
251 /** The reference count (64K VMs). */
252 uint32_t cRefs : 16;
253 /** Used for debug checksumming. */
254 uint32_t u14Checksum : 14;
255 /** The page state. */
256 uint32_t u2State : 2;
257 } Shared;
258
259 /** The view of a free page. */
260 struct GMMPAGEFREE
261 {
262 /** The index of the next page in the free list. UINT16_MAX is NIL. */
263 uint16_t iNext;
264 /** Reserved. Checksum or something? */
265 uint16_t u16Reserved0;
266 /** Reserved. Checksum or something? */
267 uint32_t u30Reserved1 : 30;
268 /** The page state. */
269 uint32_t u2State : 2;
270 } Free;
271
272#else /* 32-bit */
273 /** Unsigned integer view. */
274 uint32_t u;
275
276 /** The common view. */
277 struct GMMPAGECOMMON
278 {
279 uint32_t uStuff : 30;
280 /** The page state. */
281 uint32_t u2State : 2;
282 } Common;
283
284 /** The view of a private page. */
285 struct GMMPAGEPRIVATE
286 {
287 /** The guest page frame number. (Max addressable: 2 ^ 36) */
288 uint32_t pfn : 24;
289 /** The GVM handle. (127 VMs) */
290 uint32_t hGVM : 7;
291 /** The top page state bit, MBZ. */
292 uint32_t fZero : 1;
293 } Private;
294
295 /** The view of a shared page. */
296 struct GMMPAGESHARED
297 {
298 /** The reference count. */
299 uint32_t cRefs : 30;
300 /** The page state. */
301 uint32_t u2State : 2;
302 } Shared;
303
304 /** The view of a free page. */
305 struct GMMPAGEFREE
306 {
307 /** The index of the next page in the free list. UINT16_MAX is NIL. */
308 uint32_t iNext : 16;
309 /** Reserved. Checksum or something? */
310 uint32_t u14Reserved : 14;
311 /** The page state. */
312 uint32_t u2State : 2;
313 } Free;
314#endif
315} GMMPAGE;
316AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
317/** Pointer to a GMMPAGE. */
318typedef GMMPAGE *PGMMPAGE;
319
320
321/** @name The Page States.
322 * @{ */
323/** A private page. */
324#define GMM_PAGE_STATE_PRIVATE 0
325/** A private page - alternative value used on the 32-bit implementation.
326 * This will never be used on 64-bit hosts. */
327#define GMM_PAGE_STATE_PRIVATE_32 1
328/** A shared page. */
329#define GMM_PAGE_STATE_SHARED 2
330/** A free page. */
331#define GMM_PAGE_STATE_FREE 3
332/** @} */
333
334
335/** @def GMM_PAGE_IS_PRIVATE
336 *
337 * @returns true if private, false if not.
338 * @param pPage The GMM page.
339 */
340#if HC_ARCH_BITS == 64
341# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
342#else
343# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
344#endif
345
346/** @def GMM_PAGE_IS_SHARED
347 *
348 * @returns true if shared, false if not.
349 * @param pPage The GMM page.
350 */
351#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
352
353/** @def GMM_PAGE_IS_FREE
354 *
355 * @returns true if free, false if not.
356 * @param pPage The GMM page.
357 */
358#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
359
360/** @def GMM_PAGE_PFN_LAST
361 * The last valid guest pfn range.
362 * @remark Some of the values outside the range have special meaning,
363 * see GMM_PAGE_PFN_UNSHAREABLE.
364 */
365#if HC_ARCH_BITS == 64
366# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
367#else
368# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
369#endif
370AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
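/* Worked example for the 64-bit case above: GMM_PAGE_PFN_LAST is 0xfffffff0,
   i.e. roughly 2^32 page frames, which with 4 KB (2^12 byte) pages covers
   about 2^44 bytes of guest physical address space - matching the
   "Max addressable: 2 ^ 44" remarks in the GMMPAGE field descriptions. */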
371
372/** @def GMM_PAGE_PFN_UNSHAREABLE
373 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
374 */
375#if HC_ARCH_BITS == 64
376# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
377#else
378# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
379#endif
380AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
381
382
383/**
384 * A GMM allocation chunk ring-3 mapping record.
385 *
386 * This should really be associated with a session and not a VM, but
387 * it's simpler to associate it with a VM and clean up when the VM object
388 * is destroyed.
389 */
390typedef struct GMMCHUNKMAP
391{
392 /** The mapping object. */
393 RTR0MEMOBJ hMapObj;
394 /** The VM owning the mapping. */
395 PGVM pGVM;
396} GMMCHUNKMAP;
397/** Pointer to a GMM allocation chunk mapping. */
398typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
399
400
401/**
402 * A GMM allocation chunk.
403 */
404typedef struct GMMCHUNK
405{
406 /** The AVL node core.
407 * The Key is the chunk ID. (Giant mtx.) */
408 AVLU32NODECORE Core;
409 /** The memory object.
410 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
411 * what the host can dish up. (Chunk mtx protects mapping accesses
412 * and related frees.) */
413 RTR0MEMOBJ hMemObj;
414#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
415 /** Pointer to the kernel mapping. */
416 uint8_t *pbMapping;
417#endif
418 /** Pointer to the next chunk in the free list. (Giant mtx.) */
419 PGMMCHUNK pFreeNext;
420 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
421 PGMMCHUNK pFreePrev;
422 /** Pointer to the free set this chunk belongs to. NULL for
423 * chunks with no free pages. (Giant mtx.) */
424 PGMMCHUNKFREESET pSet;
425 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
426 RTLISTNODE ListNode;
427 /** Pointer to an array of mappings. (Chunk mtx.) */
428 PGMMCHUNKMAP paMappingsX;
429 /** The number of mappings. (Chunk mtx.) */
430 uint16_t cMappingsX;
431 /** The mapping lock this chunk is using. UINT8_MAX if nobody is
432 * mapping or freeing anything. (Giant mtx.) */
433 uint8_t volatile iChunkMtx;
434 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
435 uint8_t fFlags;
436 /** The head of the list of free pages. UINT16_MAX is the NIL value.
437 * (Giant mtx.) */
438 uint16_t iFreeHead;
439 /** The number of free pages. (Giant mtx.) */
440 uint16_t cFree;
441 /** The GVM handle of the VM that first allocated pages from this chunk, this
442 * is used as a preference when there are several chunks to choose from.
443 * When in bound memory mode this isn't a preference any longer. (Giant
444 * mtx.) */
445 uint16_t hGVM;
446 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
447 * future use.) (Giant mtx.) */
448 uint16_t idNumaNode;
449 /** The number of private pages. (Giant mtx.) */
450 uint16_t cPrivate;
451 /** The number of shared pages. (Giant mtx.) */
452 uint16_t cShared;
453 /** The pages. (Giant mtx.) */
454 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
455} GMMCHUNK;
456
457/** Indicates that the NUMA properties of the memory are unknown. */
458#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
459
460/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
461 * @{ */
462/** Indicates that the chunk is a large page (2MB). */
463#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
464#ifdef GMM_WITH_LEGACY_MODE
465/** Indicates that the chunk was locked rather than allocated directly. */
466# define GMM_CHUNK_FLAGS_SEEDED UINT16_C(0x0002)
467#endif
468/** @} */
469
470
471/**
472 * An allocation chunk TLB entry.
473 */
474typedef struct GMMCHUNKTLBE
475{
476 /** The chunk id. */
477 uint32_t idChunk;
478 /** Pointer to the chunk. */
479 PGMMCHUNK pChunk;
480} GMMCHUNKTLBE;
481/** Pointer to an allocation chunk TLB entry. */
482typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
483
484
485/** The number of entries in the allocation chunk TLB. */
486#define GMM_CHUNKTLB_ENTRIES 32
487/** Gets the TLB entry index for the given Chunk ID. */
488#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
489
490/**
491 * An allocation chunk TLB.
492 */
493typedef struct GMMCHUNKTLB
494{
495 /** The TLB entries. */
496 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
497} GMMCHUNKTLB;
498/** Pointer to an allocation chunk TLB. */
499typedef GMMCHUNKTLB *PGMMCHUNKTLB;
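/* A hedged sketch of how a chunk TLB entry is meant to be consulted (the real
   lookup code later in this file also handles misses and re-validation against
   the chunk freeing generation):
       PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
       PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk ? pTlbe->pChunk : NULL;
 */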
500
501
502/**
503 * The GMM instance data.
504 */
505typedef struct GMM
506{
507 /** Magic / eye catcher. GMM_MAGIC */
508 uint32_t u32Magic;
509 /** The number of threads waiting on the mutex. */
510 uint32_t cMtxContenders;
511#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
512 /** The critical section protecting the GMM.
513 * More fine grained locking can be implemented later if necessary. */
514 RTCRITSECT GiantCritSect;
515#else
516 /** The fast mutex protecting the GMM.
517 * More fine grained locking can be implemented later if necessary. */
518 RTSEMFASTMUTEX hMtx;
519#endif
520#ifdef VBOX_STRICT
521 /** The current mutex owner. */
522 RTNATIVETHREAD hMtxOwner;
523#endif
524 /** Spinlock protecting the AVL tree.
525 * @todo Make this a read-write spinlock as we should allow concurrent
526 * lookups. */
527 RTSPINLOCK hSpinLockTree;
528 /** The chunk tree.
529 * Protected by hSpinLockTree. */
530 PAVLU32NODECORE pChunks;
531 /** Chunk freeing generation - incremented whenever a chunk is freed. Used
532 * for validating the per-VM chunk TLB entries. Valid range is 1 to 2^62
533 * (exclusive), though higher numbers may temporarily occur while
534 * invalidating the individual TLBs during wrap-around processing. */
535 uint64_t volatile idFreeGeneration;
536 /** The chunk TLB.
537 * Protected by hSpinLockTree. */
538 GMMCHUNKTLB ChunkTLB;
539 /** The private free set. */
540 GMMCHUNKFREESET PrivateX;
541 /** The shared free set. */
542 GMMCHUNKFREESET Shared;
543
544 /** Shared module tree (global).
545 * @todo separate trees for distinctly different guest OSes. */
546 PAVLLU32NODECORE pGlobalSharedModuleTree;
547 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
548 uint32_t cShareableModules;
549
550 /** The chunk list. For simplifying the cleanup process and avoiding tree
551 * traversal. */
552 RTLISTANCHOR ChunkList;
553
554 /** The maximum number of pages we're allowed to allocate.
555 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
556 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
557 uint64_t cMaxPages;
558 /** The number of pages that have been reserved.
559 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
560 uint64_t cReservedPages;
561 /** The number of pages that we have over-committed in reservations. */
562 uint64_t cOverCommittedPages;
563 /** The number of actually allocated (committed if you like) pages. */
564 uint64_t cAllocatedPages;
565 /** The number of pages that are shared. A subset of cAllocatedPages. */
566 uint64_t cSharedPages;
567 /** The number of pages that are actually shared between VMs. */
568 uint64_t cDuplicatePages;
569 /** The number of pages that are shared that have been left behind by
570 * VMs not doing proper cleanups. */
571 uint64_t cLeftBehindSharedPages;
572 /** The number of allocation chunks.
573 * (The number of pages we've allocated from the host can be derived from this.) */
574 uint32_t cChunks;
575 /** The number of current ballooned pages. */
576 uint64_t cBalloonedPages;
577
578#ifndef GMM_WITH_LEGACY_MODE
579# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
580 /** Whether #RTR0MemObjAllocPhysNC works. */
581 bool fHasWorkingAllocPhysNC;
582# else
583 bool fPadding;
584# endif
585#else
586 /** The legacy allocation mode indicator.
587 * This is determined at initialization time. */
588 bool fLegacyAllocationMode;
589#endif
590 /** The bound memory mode indicator.
591 * When set, the memory will be bound to a specific VM and never
592 * shared. This is always set if fLegacyAllocationMode is set.
593 * (Also determined at initialization time.) */
594 bool fBoundMemoryMode;
595 /** The number of registered VMs. */
596 uint16_t cRegisteredVMs;
597
598 /** The number of freed chunks ever. This is used as a list generation to
599 * avoid restarting the cleanup scanning when the list wasn't modified. */
600 uint32_t cFreedChunks;
601 /** The previously allocated Chunk ID.
602 * Used as a hint to avoid scanning the whole bitmap. */
603 uint32_t idChunkPrev;
604 /** Chunk ID allocation bitmap.
605 * Bits of allocated IDs are set, free ones are clear.
606 * The NIL id (0) is marked allocated. */
607 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
608
609 /** The index of the next mutex to use. */
610 uint32_t iNextChunkMtx;
611 /** Chunk locks for reducing lock contention without having to allocate
612 * one lock per chunk. */
613 struct
614 {
615 /** The mutex */
616 RTSEMFASTMUTEX hMtx;
617 /** The number of threads currently using this mutex. */
618 uint32_t volatile cUsers;
619 } aChunkMtx[64];
620} GMM;
621/** Pointer to the GMM instance. */
622typedef GMM *PGMM;
623
624/** The value of GMM::u32Magic (Katsuhiro Otomo). */
625#define GMM_MAGIC UINT32_C(0x19540414)
626
627
628/**
629 * GMM chunk mutex state.
630 *
631 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
632 * gmmR0ChunkMutex* methods.
633 */
634typedef struct GMMR0CHUNKMTXSTATE
635{
636 PGMM pGMM;
637 /** The index of the chunk mutex. */
638 uint8_t iChunkMtx;
639 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
640 uint8_t fFlags;
641} GMMR0CHUNKMTXSTATE;
642/** Pointer to a chunk mutex state. */
643typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
644
645/** @name GMMR0CHUNK_MTX_XXX
646 * @{ */
647#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
648#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
649#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
650#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
651#define GMMR0CHUNK_MTX_END UINT32_C(4)
652/** @} */
653
654
655/** The maximum number of shared modules per-vm. */
656#define GMM_MAX_SHARED_PER_VM_MODULES 2048
657/** The maximum number of shared modules GMM is allowed to track. */
658#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
659
660
661/**
662 * Argument packet for gmmR0SharedModuleCleanup.
663 */
664typedef struct GMMR0SHMODPERVMDTORARGS
665{
666 PGVM pGVM;
667 PGMM pGMM;
668} GMMR0SHMODPERVMDTORARGS;
669
670/**
671 * Argument packet for gmmR0CheckSharedModule.
672 */
673typedef struct GMMCHECKSHAREDMODULEINFO
674{
675 PGVM pGVM;
676 VMCPUID idCpu;
677} GMMCHECKSHAREDMODULEINFO;
678
679
680/*********************************************************************************************************************************
681* Global Variables *
682*********************************************************************************************************************************/
683/** Pointer to the GMM instance data. */
684static PGMM g_pGMM = NULL;
685
686/** Macro for obtaining and validating the g_pGMM pointer.
687 *
688 * On failure it will return from the invoking function with the specified
689 * return value.
690 *
691 * @param pGMM The name of the pGMM variable.
692 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
693 * status codes.
694 */
695#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
696 do { \
697 (pGMM) = g_pGMM; \
698 AssertPtrReturn((pGMM), (rc)); \
699 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
700 } while (0)
701
702/** Macro for obtaining and validating the g_pGMM pointer, void function
703 * variant.
704 *
705 * On failure it will return from the invoking function.
706 *
707 * @param pGMM The name of the pGMM variable.
708 */
709#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
710 do { \
711 (pGMM) = g_pGMM; \
712 AssertPtrReturnVoid((pGMM)); \
713 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
714 } while (0)
715
716
717/** @def GMM_CHECK_SANITY_UPON_ENTERING
718 * Checks the sanity of the GMM instance data before making changes.
719 *
720 * This macro is a stub by default and must be enabled manually in the code.
721 *
722 * @returns true if sane, false if not.
723 * @param pGMM The name of the pGMM variable.
724 */
725#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
726# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
727#else
728# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
729#endif
730
731/** @def GMM_CHECK_SANITY_UPON_LEAVING
732 * Checks the sanity of the GMM instance data after making changes.
733 *
734 * This macro is a stub by default and must be enabled manually in the code.
735 *
736 * @returns true if sane, false if not.
737 * @param pGMM The name of the pGMM variable.
738 */
739#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
740# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
741#else
742# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
743#endif
744
745/** @def GMM_CHECK_SANITY_IN_LOOPS
746 * Checks the sanity of the GMM instance in the allocation loops.
747 *
748 * This macro is a stub by default and must be enabled manually in the code.
749 *
750 * @returns true if sane, false if not.
751 * @param pGMM The name of the pGMM variable.
752 */
753#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
754# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
755#else
756# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
757#endif
758
759
760/*********************************************************************************************************************************
761* Internal Functions *
762*********************************************************************************************************************************/
763static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
764static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
765DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
766DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
767DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
768#ifdef GMMR0_WITH_SANITY_CHECK
769static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
770#endif
771static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
772DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
773DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
774static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
775#ifdef VBOX_WITH_PAGE_SHARING
776static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
777# ifdef VBOX_STRICT
778static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
779# endif
780#endif
781
782
783
784/**
785 * Initializes the GMM component.
786 *
787 * This is called when the VMMR0.r0 module is loaded and protected by the
788 * loader semaphore.
789 *
790 * @returns VBox status code.
791 */
792GMMR0DECL(int) GMMR0Init(void)
793{
794 LogFlow(("GMMInit:\n"));
795
796 /*
797 * Allocate the instance data and the locks.
798 */
799 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
800 if (!pGMM)
801 return VERR_NO_MEMORY;
802
803 pGMM->u32Magic = GMM_MAGIC;
804 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
805 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
806 RTListInit(&pGMM->ChunkList);
807 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
808
809#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
810 int rc = RTCritSectInit(&pGMM->GiantCritSect);
811#else
812 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
813#endif
814 if (RT_SUCCESS(rc))
815 {
816 unsigned iMtx;
817 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
818 {
819 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
820 if (RT_FAILURE(rc))
821 break;
822 }
823 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
824 if (RT_SUCCESS(rc))
825 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
826 if (RT_SUCCESS(rc))
827 {
828#ifndef GMM_WITH_LEGACY_MODE
829 /*
830 * Figure out how we're going to allocate stuff (only applicable to
831 * host with linear physical memory mappings).
832 */
833 pGMM->fBoundMemoryMode = false;
834# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
835 pGMM->fHasWorkingAllocPhysNC = false;
836
837 RTR0MEMOBJ hMemObj;
838 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
839 if (RT_SUCCESS(rc))
840 {
841 rc = RTR0MemObjFree(hMemObj, true);
842 AssertRC(rc);
843 pGMM->fHasWorkingAllocPhysNC = true;
844 }
845 else if (rc != VERR_NOT_SUPPORTED)
846 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
847# endif
848#else /* GMM_WITH_LEGACY_MODE */
849 /*
850 * Check and see if RTR0MemObjAllocPhysNC works.
851 */
852# if 0 /* later, see @bugref{3170}. */
853 RTR0MEMOBJ MemObj;
854 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
855 if (RT_SUCCESS(rc))
856 {
857 rc = RTR0MemObjFree(MemObj, true);
858 AssertRC(rc);
859 }
860 else if (rc == VERR_NOT_SUPPORTED)
861 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
862 else
863 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
864# else
865# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
866 pGMM->fLegacyAllocationMode = false;
867# if ARCH_BITS == 32
868 /* Don't reuse possibly partial chunks because of the virtual
869 address space limitation. */
870 pGMM->fBoundMemoryMode = true;
871# else
872 pGMM->fBoundMemoryMode = false;
873# endif
874# else
875 pGMM->fLegacyAllocationMode = true;
876 pGMM->fBoundMemoryMode = true;
877# endif
878# endif
879#endif /* GMM_WITH_LEGACY_MODE */
880
881 /*
882 * Query system page count and guess a reasonable cMaxPages value.
883 */
884 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
885
886 /*
887 * The idFreeGeneration value should be set so we actually trigger the
888 * wrap-around invalidation handling during a typical test run.
889 */
890 pGMM->idFreeGeneration = UINT64_MAX / 4 - 128;
891
892 g_pGMM = pGMM;
893#ifdef GMM_WITH_LEGACY_MODE
894 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
895#elif defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
896 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
897#else
898 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
899#endif
900 return VINF_SUCCESS;
901 }
902
903 /*
904 * Bail out.
905 */
906 RTSpinlockDestroy(pGMM->hSpinLockTree);
907 while (iMtx-- > 0)
908 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
909#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
910 RTCritSectDelete(&pGMM->GiantCritSect);
911#else
912 RTSemFastMutexDestroy(pGMM->hMtx);
913#endif
914 }
915
916 pGMM->u32Magic = 0;
917 RTMemFree(pGMM);
918 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
919 return rc;
920}
921
922
923/**
924 * Terminates the GMM component.
925 */
926GMMR0DECL(void) GMMR0Term(void)
927{
928 LogFlow(("GMMTerm:\n"));
929
930 /*
931 * Take care / be paranoid...
932 */
933 PGMM pGMM = g_pGMM;
934 if (!RT_VALID_PTR(pGMM))
935 return;
936 if (pGMM->u32Magic != GMM_MAGIC)
937 {
938 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
939 return;
940 }
941
942 /*
943 * Undo what init did and free all the resources we've acquired.
944 */
945 /* Destroy the fundamentals. */
946 g_pGMM = NULL;
947 pGMM->u32Magic = ~GMM_MAGIC;
948#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
949 RTCritSectDelete(&pGMM->GiantCritSect);
950#else
951 RTSemFastMutexDestroy(pGMM->hMtx);
952 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
953#endif
954 RTSpinlockDestroy(pGMM->hSpinLockTree);
955 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
956
957 /* Free any chunks still hanging around. */
958 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
959
960 /* Destroy the chunk locks. */
961 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
962 {
963 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
964 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
965 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
966 }
967
968 /* Finally the instance data itself. */
969 RTMemFree(pGMM);
970 LogFlow(("GMMTerm: done\n"));
971}
972
973
974/**
975 * RTAvlU32Destroy callback.
976 *
977 * @returns 0
978 * @param pNode The node to destroy.
979 * @param pvGMM The GMM handle.
980 */
981static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
982{
983 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
984
985 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
986 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
987 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
988
989 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
990 if (RT_FAILURE(rc))
991 {
992 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
993 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
994 AssertRC(rc);
995 }
996 pChunk->hMemObj = NIL_RTR0MEMOBJ;
997
998 RTMemFree(pChunk->paMappingsX);
999 pChunk->paMappingsX = NULL;
1000
1001 RTMemFree(pChunk);
1002 NOREF(pvGMM);
1003 return 0;
1004}
1005
1006
1007/**
1008 * Initializes the per-VM data for the GMM.
1009 *
1010 * This is called from within the GVMM lock (from GVMMR0CreateVM)
1011 * and should only initialize the data members so GMMR0CleanupVM
1012 * can deal with them. We reserve no memory or anything here,
1013 * that's done later in GMMR0InitVM.
1014 *
1015 * @param pGVM Pointer to the Global VM structure.
1016 */
1017GMMR0DECL(int) GMMR0InitPerVMData(PGVM pGVM)
1018{
1019 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
1020
1021 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1022 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1023 pGVM->gmm.s.Stats.fMayAllocate = false;
1024
1025 pGVM->gmm.s.hChunkTlbSpinLock = NIL_RTSPINLOCK;
1026 int rc = RTSpinlockCreate(&pGVM->gmm.s.hChunkTlbSpinLock, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "per-vm-chunk-tlb");
1027 AssertRCReturn(rc, rc);
1028
1029 return VINF_SUCCESS;
1030}
1031
1032
1033/**
1034 * Acquires the GMM giant lock.
1035 *
1036 * @returns Assert status code from RTSemFastMutexRequest.
1037 * @param pGMM Pointer to the GMM instance.
1038 */
1039static int gmmR0MutexAcquire(PGMM pGMM)
1040{
1041 ASMAtomicIncU32(&pGMM->cMtxContenders);
1042#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1043 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
1044#else
1045 int rc = RTSemFastMutexRequest(pGMM->hMtx);
1046#endif
1047 ASMAtomicDecU32(&pGMM->cMtxContenders);
1048 AssertRC(rc);
1049#ifdef VBOX_STRICT
1050 pGMM->hMtxOwner = RTThreadNativeSelf();
1051#endif
1052 return rc;
1053}
1054
1055
1056/**
1057 * Releases the GMM giant lock.
1058 *
1059 * @returns Assert status code from RTSemFastMutexRequest.
1060 * @param pGMM Pointer to the GMM instance.
1061 */
1062static int gmmR0MutexRelease(PGMM pGMM)
1063{
1064#ifdef VBOX_STRICT
1065 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1066#endif
1067#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1068 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
1069#else
1070 int rc = RTSemFastMutexRelease(pGMM->hMtx);
1071 AssertRC(rc);
1072#endif
1073 return rc;
1074}
1075
1076
1077/**
1078 * Yields the GMM giant lock if there is contention and a certain minimum time
1079 * has elapsed since we took it.
1080 *
1081 * @returns @c true if the mutex was yielded, @c false if not.
1082 * @param pGMM Pointer to the GMM instance.
1083 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1084 * (in/out).
1085 */
1086static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1087{
1088 /*
1089 * If nobody is contending the mutex, don't bother checking the time.
1090 */
1091 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1092 return false;
1093
1094 /*
1095 * Don't yield if we haven't executed for at least 2 milliseconds.
1096 */
1097 uint64_t uNanoNow = RTTimeSystemNanoTS();
1098 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1099 return false;
1100
1101 /*
1102 * Yield the mutex.
1103 */
1104#ifdef VBOX_STRICT
1105 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1106#endif
1107 ASMAtomicIncU32(&pGMM->cMtxContenders);
1108#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1109 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1110#else
1111 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1112#endif
1113
1114 RTThreadYield();
1115
1116#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1117 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1118#else
1119 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1120#endif
1121 *puLockNanoTS = RTTimeSystemNanoTS();
1122 ASMAtomicDecU32(&pGMM->cMtxContenders);
1123#ifdef VBOX_STRICT
1124 pGMM->hMtxOwner = RTThreadNativeSelf();
1125#endif
1126
1127 return true;
1128}
1129
1130
1131/**
1132 * Acquires a chunk lock.
1133 *
1134 * The caller must own the giant lock.
1135 *
1136 * @returns Assert status code from RTSemFastMutexRequest.
1137 * @param pMtxState The chunk mutex state info. (Avoids
1138 * passing the same flags and stuff around
1139 * for subsequent release and drop-giant
1140 * calls.)
1141 * @param pGMM Pointer to the GMM instance.
1142 * @param pChunk Pointer to the chunk.
1143 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1144 */
1145static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1146{
1147 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1148 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1149
1150 pMtxState->pGMM = pGMM;
1151 pMtxState->fFlags = (uint8_t)fFlags;
1152
1153 /*
1154 * Get the lock index and reference the lock.
1155 */
1156 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1157 uint32_t iChunkMtx = pChunk->iChunkMtx;
1158 if (iChunkMtx == UINT8_MAX)
1159 {
1160 iChunkMtx = pGMM->iNextChunkMtx++;
1161 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1162
1163 /* Try get an unused one... */
1164 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1165 {
1166 iChunkMtx = pGMM->iNextChunkMtx++;
1167 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1168 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1169 {
1170 iChunkMtx = pGMM->iNextChunkMtx++;
1171 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1172 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1173 {
1174 iChunkMtx = pGMM->iNextChunkMtx++;
1175 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1176 }
1177 }
1178 }
1179
1180 pChunk->iChunkMtx = iChunkMtx;
1181 }
1182 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1183 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1184 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1185
1186 /*
1187 * Drop the giant?
1188 */
1189 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1190 {
1191 /** @todo GMM life cycle cleanup (we may race someone
1192 * destroying and cleaning up GMM)? */
1193 gmmR0MutexRelease(pGMM);
1194 }
1195
1196 /*
1197 * Take the chunk mutex.
1198 */
1199 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1200 AssertRC(rc);
1201 return rc;
1202}
1203
1204
1205/**
1206 * Releases a chunk mutex acquired by gmmR0ChunkMutexAcquire, retaking the giant lock if requested.
1207 *
1208 * @returns Assert status code from RTSemFastMutexRequest.
1209 * @param pMtxState Pointer to the chunk mutex state.
1210 * @param pChunk Pointer to the chunk if it's still
1211 * alive, NULL if it isn't. This is used to deassociate
1212 * the chunk from the mutex on the way out so a new one
1213 * can be selected next time, thus avoiding contented
1214 * mutexes.
1215 */
1216static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1217{
1218 PGMM pGMM = pMtxState->pGMM;
1219
1220 /*
1221 * Release the chunk mutex and reacquire the giant if requested.
1222 */
1223 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1224 AssertRC(rc);
1225 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1226 rc = gmmR0MutexAcquire(pGMM);
1227 else
1228 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1229
1230 /*
1231 * Drop the chunk mutex user reference and deassociate it from the chunk
1232 * when possible.
1233 */
1234 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1235 && pChunk
1236 && RT_SUCCESS(rc) )
1237 {
1238 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1239 pChunk->iChunkMtx = UINT8_MAX;
1240 else
1241 {
1242 rc = gmmR0MutexAcquire(pGMM);
1243 if (RT_SUCCESS(rc))
1244 {
1245 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1246 pChunk->iChunkMtx = UINT8_MAX;
1247 rc = gmmR0MutexRelease(pGMM);
1248 }
1249 }
1250 }
1251
1252 pMtxState->pGMM = NULL;
1253 return rc;
1254}
1255
1256
1257/**
1258 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1259 * chunk locked.
1260 *
1261 * This only works if gmmR0ChunkMutexAcquire was called with
1262 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1263 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1264 *
1265 * @returns VBox status code (assuming success is ok).
1266 * @param pMtxState Pointer to the chunk mutex state.
1267 */
1268static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1269{
1270 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1271 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1272 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1273 /** @todo GMM life cycle cleanup (we may race someone
1274 * destroying and cleaning up GMM)? */
1275 return gmmR0MutexRelease(pMtxState->pGMM);
1276}
1277
1278
1279/**
1280 * For experimenting with NUMA affinity and such.
1281 *
1282 * @returns The current NUMA Node ID.
1283 */
1284static uint16_t gmmR0GetCurrentNumaNodeId(void)
1285{
1286#if 1
1287 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1288#else
1289 return RTMpCpuId() / 16;
1290#endif
1291}
1292
1293
1294
1295/**
1296 * Cleans up when a VM is terminating.
1297 *
1298 * @param pGVM Pointer to the Global VM structure.
1299 */
1300GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1301{
1302 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1303
1304 PGMM pGMM;
1305 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1306
1307#ifdef VBOX_WITH_PAGE_SHARING
1308 /*
1309 * Clean up all registered shared modules first.
1310 */
1311 gmmR0SharedModuleCleanup(pGMM, pGVM);
1312#endif
1313
1314 gmmR0MutexAcquire(pGMM);
1315 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1316 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1317
1318 /*
1319 * The policy is 'INVALID' until the initial reservation
1320 * request has been serviced.
1321 */
1322 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1323 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1324 {
1325 /*
1326 * If it's the last VM around, we can skip walking all the chunks looking
1327 * for the pages owned by this VM and instead flush the whole shebang.
1328 *
1329 * This takes care of the eventuality that a VM has left shared page
1330 * references behind (shouldn't happen of course, but you never know).
1331 */
1332 Assert(pGMM->cRegisteredVMs);
1333 pGMM->cRegisteredVMs--;
1334
1335 /*
1336 * Walk the entire pool looking for pages that belong to this VM
1337 * and leftover mappings. (This'll only catch private pages,
1338 * shared pages will be 'left behind'.)
1339 */
1340 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1341 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1342
1343 unsigned iCountDown = 64;
1344 bool fRedoFromStart;
1345 PGMMCHUNK pChunk;
1346 do
1347 {
1348 fRedoFromStart = false;
1349 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1350 {
1351 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1352 if ( ( !pGMM->fBoundMemoryMode
1353 || pChunk->hGVM == pGVM->hSelf)
1354 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1355 {
1356 /* We left the giant mutex, so reset the yield counters. */
1357 uLockNanoTS = RTTimeSystemNanoTS();
1358 iCountDown = 64;
1359 }
1360 else
1361 {
1362 /* Didn't leave it, so do normal yielding. */
1363 if (!iCountDown)
1364 gmmR0MutexYield(pGMM, &uLockNanoTS);
1365 else
1366 iCountDown--;
1367 }
1368 if (pGMM->cFreedChunks != cFreeChunksOld)
1369 {
1370 fRedoFromStart = true;
1371 break;
1372 }
1373 }
1374 } while (fRedoFromStart);
1375
1376 if (pGVM->gmm.s.Stats.cPrivatePages)
1377 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1378
1379 pGMM->cAllocatedPages -= cPrivatePages;
1380
1381 /*
1382 * Free empty chunks.
1383 */
1384 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1385 do
1386 {
1387 fRedoFromStart = false;
1388 iCountDown = 10240;
1389 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1390 while (pChunk)
1391 {
1392 PGMMCHUNK pNext = pChunk->pFreeNext;
1393 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1394 if ( !pGMM->fBoundMemoryMode
1395 || pChunk->hGVM == pGVM->hSelf)
1396 {
1397 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1398 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1399 {
1400 /* We've left the giant mutex, restart? (+1 for our unlink) */
1401 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1402 if (fRedoFromStart)
1403 break;
1404 uLockNanoTS = RTTimeSystemNanoTS();
1405 iCountDown = 10240;
1406 }
1407 }
1408
1409 /* Advance and maybe yield the lock. */
1410 pChunk = pNext;
1411 if (--iCountDown == 0)
1412 {
1413 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1414 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1415 && pPrivateSet->idGeneration != idGenerationOld;
1416 if (fRedoFromStart)
1417 break;
1418 iCountDown = 10240;
1419 }
1420 }
1421 } while (fRedoFromStart);
1422
1423 /*
1424 * Account for shared pages that weren't freed.
1425 */
1426 if (pGVM->gmm.s.Stats.cSharedPages)
1427 {
1428 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1429 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1430 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1431 }
1432
1433 /*
1434 * Clean up balloon statistics in case the VM process crashed.
1435 */
1436 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1437 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1438
1439 /*
1440 * Update the over-commitment management statistics.
1441 */
1442 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1443 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1444 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1445 switch (pGVM->gmm.s.Stats.enmPolicy)
1446 {
1447 case GMMOCPOLICY_NO_OC:
1448 break;
1449 default:
1450 /** @todo Update GMM->cOverCommittedPages */
1451 break;
1452 }
1453 }
1454
1455 /* zap the GVM data. */
1456 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1457 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1458 pGVM->gmm.s.Stats.fMayAllocate = false;
1459
1460 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1461 gmmR0MutexRelease(pGMM);
1462
1463 /*
1464 * Destroy the spinlock.
1465 */
1466 RTSPINLOCK hSpinlock = NIL_RTSPINLOCK;
1467 ASMAtomicXchgHandle(&pGVM->gmm.s.hChunkTlbSpinLock, NIL_RTSPINLOCK, &hSpinlock);
1468 RTSpinlockDestroy(hSpinlock);
1469
1470 LogFlow(("GMMR0CleanupVM: returns\n"));
1471}
1472
1473
1474/**
1475 * Scan one chunk for private pages belonging to the specified VM.
1476 *
1477 * @note This function may drop the giant mutex!
1478 *
1479 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1480 * we didn't.
1481 * @param pGMM Pointer to the GMM instance.
1482 * @param pGVM The global VM handle.
1483 * @param pChunk The chunk to scan.
1484 */
1485static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1486{
1487 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1488
1489 /*
1490 * Look for pages belonging to the VM.
1491 * (Perform some internal checks while we're scanning.)
1492 */
1493#ifndef VBOX_STRICT
1494 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1495#endif
1496 {
1497 unsigned cPrivate = 0;
1498 unsigned cShared = 0;
1499 unsigned cFree = 0;
1500
1501 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1502
1503 uint16_t hGVM = pGVM->hSelf;
1504 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1505 while (iPage-- > 0)
1506 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1507 {
1508 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1509 {
1510 /*
1511 * Free the page.
1512 *
1513 * The reason for not using gmmR0FreePrivatePage here is that we
1514 * must *not* cause the chunk to be freed from under us - we're in
1515 * an AVL tree walk here.
1516 */
1517 pChunk->aPages[iPage].u = 0;
1518 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1519 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1520 pChunk->iFreeHead = iPage;
1521 pChunk->cPrivate--;
1522 pChunk->cFree++;
1523 pGVM->gmm.s.Stats.cPrivatePages--;
1524 cFree++;
1525 }
1526 else
1527 cPrivate++;
1528 }
1529 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1530 cFree++;
1531 else
1532 cShared++;
1533
1534 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1535
1536 /*
1537 * Did it add up?
1538 */
1539 if (RT_UNLIKELY( pChunk->cFree != cFree
1540 || pChunk->cPrivate != cPrivate
1541 || pChunk->cShared != cShared))
1542 {
1543 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1544 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1545 pChunk->cFree = cFree;
1546 pChunk->cPrivate = cPrivate;
1547 pChunk->cShared = cShared;
1548 }
1549 }
1550
1551 /*
1552 * If not in bound memory mode, we should reset the hGVM field
1553 * if it has our handle in it.
1554 */
1555 if (pChunk->hGVM == pGVM->hSelf)
1556 {
1557 if (!g_pGMM->fBoundMemoryMode)
1558 pChunk->hGVM = NIL_GVM_HANDLE;
1559 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1560 {
1561 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1562 pChunk, pChunk->Core.Key, pChunk->cFree);
1563 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1564
1565 gmmR0UnlinkChunk(pChunk);
1566 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1567 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1568 }
1569 }
1570
1571 /*
1572 * Look for a mapping belonging to the terminating VM.
1573 */
1574 GMMR0CHUNKMTXSTATE MtxState;
1575 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1576 unsigned cMappings = pChunk->cMappingsX;
1577 for (unsigned i = 0; i < cMappings; i++)
1578 if (pChunk->paMappingsX[i].pGVM == pGVM)
1579 {
1580 gmmR0ChunkMutexDropGiant(&MtxState);
1581
1582 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1583
1584 cMappings--;
1585 if (i < cMappings)
1586 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1587 pChunk->paMappingsX[cMappings].pGVM = NULL;
1588 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1589 Assert(pChunk->cMappingsX - 1U == cMappings);
1590 pChunk->cMappingsX = cMappings;
1591
1592 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1593 if (RT_FAILURE(rc))
1594 {
1595 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1596 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1597 AssertRC(rc);
1598 }
1599
1600 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1601 return true;
1602 }
1603
1604 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1605 return false;
1606}
1607
1608
1609/**
1610 * The initial resource reservations.
1611 *
1612 * This will make memory reservations according to policy and priority. If there aren't
1613 * sufficient resources available to sustain the VM this function will fail and all
1614 * future allocation requests will fail as well.
1615 *
1616 * These are just the initial reservations made very early during the VM creation
1617 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1618 * ring-3 init has completed.
1619 *
1620 * @returns VBox status code.
1621 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1622 * @retval VERR_GMM_
1623 *
1624 * @param pGVM The global (ring-0) VM structure.
1625 * @param idCpu The VCPU id - must be zero.
1626 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1627 * This does not include MMIO2 and similar.
1628 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1629 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1630 * hyper heap, MMIO2 and similar.
1631 * @param enmPolicy The OC policy to use on this VM.
1632 * @param enmPriority The priority in an out-of-memory situation.
1633 *
1634 * @thread The creator thread / EMT(0).
1635 */
1636GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1637 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1638{
1639 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1640 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1641
1642 /*
1643 * Validate, get basics and take the semaphore.
1644 */
1645 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1646 PGMM pGMM;
1647 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1648 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1649 if (RT_FAILURE(rc))
1650 return rc;
1651
1652 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1653 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1654 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1655 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1656 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1657
1658 gmmR0MutexAcquire(pGMM);
1659 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1660 {
1661 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1662 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1663 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1664 {
1665 /*
1666 * Check if we can accommodate this.
1667 */
1668 /* ... later ... */
1669 if (RT_SUCCESS(rc))
1670 {
1671 /*
1672 * Update the records.
1673 */
1674 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1675 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1676 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1677 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1678 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1679 pGVM->gmm.s.Stats.fMayAllocate = true;
1680
1681 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1682 pGMM->cRegisteredVMs++;
1683 }
1684 }
1685 else
1686 rc = VERR_WRONG_ORDER;
1687 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1688 }
1689 else
1690 rc = VERR_GMM_IS_NOT_SANE;
1691 gmmR0MutexRelease(pGMM);
1692 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1693 return rc;
1694}
1695
1696
1697/**
1698 * VMMR0 request wrapper for GMMR0InitialReservation.
1699 *
1700 * @returns see GMMR0InitialReservation.
1701 * @param pGVM The global (ring-0) VM structure.
1702 * @param idCpu The VCPU id.
1703 * @param pReq Pointer to the request packet.
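 *
 * A rough sketch of how a ring-3 caller might fill and submit the request; the
 * SUPVMMR0REQHDR_MAGIC constant and the VMMR3CallR0 / VMMR0_DO_GMM_INITIAL_RESERVATION
 * dispatch are assumptions here, only the field names are taken from this wrapper:
 * @code
 *      GMMINITIALRESERVATIONREQ Req;
 *      Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.cBasePages   = cBasePages;      // guest RAM and ROMs
 *      Req.cShadowPages = cShadowPages;    // shadow paging structures
 *      Req.cFixedPages  = cFixedPages;     // hyper heap, MMIO2 and similar
 *      Req.enmPolicy    = enmPolicy;       // strictly between GMMOCPOLICY_INVALID and GMMOCPOLICY_END
 *      Req.enmPriority  = enmPriority;     // strictly between GMMPRIORITY_INVALID and GMMPRIORITY_END
 *      int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_INITIAL_RESERVATION, 0, &Req.Hdr);
 * @endcode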
1704 */
1705GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1706{
1707 /*
1708 * Validate input and pass it on.
1709 */
1710 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1711 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1712 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1713
1714 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1715 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1716}
1717
1718
1719/**
1720 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1721 *
1722 * @returns VBox status code.
1723 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1724 *
1725 * @param pGVM The global (ring-0) VM structure.
1726 * @param idCpu The VCPU id.
1727 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1728 * This does not include MMIO2 and similar.
1729 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1730 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1731 * hyper heap, MMIO2 and similar.
1732 *
1733 * @thread EMT(idCpu)
1734 */
1735GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1736 uint32_t cShadowPages, uint32_t cFixedPages)
1737{
1738 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1739 pGVM, cBasePages, cShadowPages, cFixedPages));
1740
1741 /*
1742 * Validate, get basics and take the semaphore.
1743 */
1744 PGMM pGMM;
1745 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1746 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1747 if (RT_FAILURE(rc))
1748 return rc;
1749
1750 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1751 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1752 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1753
1754 gmmR0MutexAcquire(pGMM);
1755 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1756 {
1757 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1758 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1759 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1760 {
1761 /*
1762 * Check if we can accommodate this.
1763 */
1764 /* ... later ... */
1765 if (RT_SUCCESS(rc))
1766 {
1767 /*
1768 * Update the records.
1769 */
1770 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1771 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1772 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1773 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1774
1775 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1776 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1777 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1778 }
1779 }
1780 else
1781 rc = VERR_WRONG_ORDER;
1782 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1783 }
1784 else
1785 rc = VERR_GMM_IS_NOT_SANE;
1786 gmmR0MutexRelease(pGMM);
1787 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1788 return rc;
1789}
1790
1791
1792/**
1793 * VMMR0 request wrapper for GMMR0UpdateReservation.
1794 *
1795 * @returns see GMMR0UpdateReservation.
1796 * @param pGVM The global (ring-0) VM structure.
1797 * @param idCpu The VCPU id.
1798 * @param pReq Pointer to the request packet.
1799 */
1800GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1801{
1802 /*
1803 * Validate input and pass it on.
1804 */
1805 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1806 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1807
1808 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1809}
1810
1811#ifdef GMMR0_WITH_SANITY_CHECK
1812
1813/**
1814 * Performs sanity checks on a free set.
1815 *
1816 * @returns Error count.
1817 *
1818 * @param pGMM Pointer to the GMM instance.
1819 * @param pSet Pointer to the set.
1820 * @param pszSetName The set name.
1821 * @param pszFunction The function from which it was called.
1822  * @param uLineNo The line number.
1823 */
1824static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1825 const char *pszFunction, unsigned uLineNo)
1826{
1827 uint32_t cErrors = 0;
1828
1829 /*
1830 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1831 */
1832 uint32_t cPages = 0;
1833 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1834 {
1835 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1836 {
1837 /** @todo check that the chunk is hashed into the right set. */
1838 cPages += pCur->cFree;
1839 }
1840 }
1841 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1842 {
1843 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1844 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1845 cErrors++;
1846 }
1847
1848 return cErrors;
1849}
1850
1851
1852/**
1853  * Performs some sanity checks on the GMM while owning the lock.
1854 *
1855 * @returns Error count.
1856 *
1857 * @param pGMM Pointer to the GMM instance.
1858 * @param pszFunction The function from which it is called.
1859 * @param uLineNo The line number.
1860 */
1861static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1862{
1863 uint32_t cErrors = 0;
1864
1865 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1866 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1867 /** @todo add more sanity checks. */
1868
1869 return cErrors;
1870}
1871
1872#endif /* GMMR0_WITH_SANITY_CHECK */
1873
1874/**
1875  * Looks up a chunk in the tree and fills in the TLB entry for it.
1876 *
1877 * This is not expected to fail and will bitch if it does.
1878 *
1879 * @returns Pointer to the allocation chunk, NULL if not found.
1880 * @param pGMM Pointer to the GMM instance.
1881 * @param idChunk The ID of the chunk to find.
1882 * @param pTlbe Pointer to the TLB entry.
1883 *
1884 * @note Caller owns spinlock.
1885 */
1886static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1887{
1888 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1889 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1890 pTlbe->idChunk = idChunk;
1891 pTlbe->pChunk = pChunk;
1892 return pChunk;
1893}
1894
1895
1896/**
1897  * Finds an allocation chunk, spin-locked.
1898 *
1899 * This is not expected to fail and will bitch if it does.
1900 *
1901 * @returns Pointer to the allocation chunk, NULL if not found.
1902 * @param pGMM Pointer to the GMM instance.
1903 * @param idChunk The ID of the chunk to find.
1904 */
1905DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1906{
1907 /*
1908 * Do a TLB lookup, branch if not in the TLB.
1909 */
1910 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1911 PGMMCHUNK pChunk = pTlbe->pChunk;
1912 if ( pChunk == NULL
1913 || pTlbe->idChunk != idChunk)
1914 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1915 return pChunk;
1916}
1917
1918
1919/**
1920  * Finds an allocation chunk.
1921 *
1922 * This is not expected to fail and will bitch if it does.
1923 *
1924 * @returns Pointer to the allocation chunk, NULL if not found.
1925 * @param pGMM Pointer to the GMM instance.
1926 * @param idChunk The ID of the chunk to find.
1927 */
1928DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1929{
1930 RTSpinlockAcquire(pGMM->hSpinLockTree);
1931 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1932 RTSpinlockRelease(pGMM->hSpinLockTree);
1933 return pChunk;
1934}
1935
1936
1937/**
1938 * Finds a page.
1939 *
1940 * This is not expected to fail and will bitch if it does.
1941 *
1942 * @returns Pointer to the page, NULL if not found.
1943 * @param pGMM Pointer to the GMM instance.
1944 * @param idPage The ID of the page to find.
1945 */
1946DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1947{
1948 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1949 if (RT_LIKELY(pChunk))
1950 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1951 return NULL;
1952}
1953
1954
1955#if 0 /* unused */
1956/**
1957  * Gets the host physical address for a page given by its ID.
1958 *
1959 * @returns The host physical address or NIL_RTHCPHYS.
1960 * @param pGMM Pointer to the GMM instance.
1961 * @param idPage The ID of the page to find.
1962 */
1963DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1964{
1965 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1966 if (RT_LIKELY(pChunk))
1967 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1968 return NIL_RTHCPHYS;
1969}
1970#endif /* unused */
1971
1972
1973/**
1974 * Selects the appropriate free list given the number of free pages.
1975 *
1976 * @returns Free list index.
1977 * @param cFree The number of free pages in the chunk.
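 *
 * As a worked example: the index is simply cFree >> GMM_CHUNK_FREE_SET_SHIFT, so
 * assuming a shift value of 4 (an assumption here), chunks with 0..15 free pages
 * land in list 0, 16..31 in list 1, and so on, while completely free chunks
 * presumably end up in the last list, GMM_CHUNK_FREE_SET_UNUSED_LIST (see
 * gmmR0AllocatePagesFromEmptyChunksOnSameNode below).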
1978 */
1979DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1980{
1981 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1982 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1983 ("%d (%u)\n", iList, cFree));
1984 return iList;
1985}
1986
1987
1988/**
1989 * Unlinks the chunk from the free list it's currently on (if any).
1990 *
1991 * @param pChunk The allocation chunk.
1992 */
1993DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1994{
1995 PGMMCHUNKFREESET pSet = pChunk->pSet;
1996 if (RT_LIKELY(pSet))
1997 {
1998 pSet->cFreePages -= pChunk->cFree;
1999 pSet->idGeneration++;
2000
2001 PGMMCHUNK pPrev = pChunk->pFreePrev;
2002 PGMMCHUNK pNext = pChunk->pFreeNext;
2003 if (pPrev)
2004 pPrev->pFreeNext = pNext;
2005 else
2006 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
2007 if (pNext)
2008 pNext->pFreePrev = pPrev;
2009
2010 pChunk->pSet = NULL;
2011 pChunk->pFreeNext = NULL;
2012 pChunk->pFreePrev = NULL;
2013 }
2014 else
2015 {
2016 Assert(!pChunk->pFreeNext);
2017 Assert(!pChunk->pFreePrev);
2018 Assert(!pChunk->cFree);
2019 }
2020}
2021
2022
2023/**
2024 * Links the chunk onto the appropriate free list in the specified free set.
2025 *
2026  * If the chunk has no free entries, it's not linked into any list.
2027 *
2028 * @param pChunk The allocation chunk.
2029 * @param pSet The free set.
2030 */
2031DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
2032{
2033 Assert(!pChunk->pSet);
2034 Assert(!pChunk->pFreeNext);
2035 Assert(!pChunk->pFreePrev);
2036
2037 if (pChunk->cFree > 0)
2038 {
2039 pChunk->pSet = pSet;
2040 pChunk->pFreePrev = NULL;
2041 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
2042 pChunk->pFreeNext = pSet->apLists[iList];
2043 if (pChunk->pFreeNext)
2044 pChunk->pFreeNext->pFreePrev = pChunk;
2045 pSet->apLists[iList] = pChunk;
2046
2047 pSet->cFreePages += pChunk->cFree;
2048 pSet->idGeneration++;
2049 }
2050}
2051
2052
2053/**
2054  * Selects the appropriate free set and links the chunk onto the matching free list there.
2055  *
2056  * If the chunk has no free entries, it's not linked into any list.
2057 *
2058 * @param pGMM Pointer to the GMM instance.
2059  * @param pGVM Pointer to the kernel-only VM instance data.
2060 * @param pChunk The allocation chunk.
2061 */
2062DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
2063{
2064 PGMMCHUNKFREESET pSet;
2065 if (pGMM->fBoundMemoryMode)
2066 pSet = &pGVM->gmm.s.Private;
2067 else if (pChunk->cShared)
2068 pSet = &pGMM->Shared;
2069 else
2070 pSet = &pGMM->PrivateX;
2071 gmmR0LinkChunk(pChunk, pSet);
2072}
2073
2074
2075/**
2076 * Frees a Chunk ID.
2077 *
2078 * @param pGMM Pointer to the GMM instance.
2079 * @param idChunk The Chunk ID to free.
2080 */
2081static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
2082{
2083 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
2084 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
2085 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2086}
2087
2088
2089/**
2090 * Allocates a new Chunk ID.
2091 *
2092 * @returns The Chunk ID.
2093 * @param pGMM Pointer to the GMM instance.
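 *
 * A caller is expected to treat NIL_GMM_CHUNKID as failure and to release the
 * ID again with gmmR0FreeChunkId, e.g. (mirroring the checks made in
 * gmmR0RegisterChunk below):
 * @code
 *      uint32_t idChunk = gmmR0AllocateChunkId(pGMM);
 *      if (idChunk != NIL_GMM_CHUNKID && idChunk <= GMM_CHUNKID_LAST)
 *      {
 *          // ... use the ID; eventually gmmR0FreeChunkId(pGMM, idChunk) ...
 *      }
 * @endcode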
2094 */
2095static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2096{
2097 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2098 AssertCompile(NIL_GMM_CHUNKID == 0);
2099
2100 /*
2101 * Try the next sequential one.
2102 */
2103 int32_t idChunk = ++pGMM->idChunkPrev;
2104#if 0 /** @todo enable this code */
2105 if ( idChunk <= GMM_CHUNKID_LAST
2106 && idChunk > NIL_GMM_CHUNKID
2107 && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2108 return idChunk;
2109#endif
2110
2111 /*
2112 * Scan sequentially from the last one.
2113 */
2114 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2115 && idChunk > NIL_GMM_CHUNKID)
2116 {
2117 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2118 if (idChunk > NIL_GMM_CHUNKID)
2119 {
2120 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2121 return pGMM->idChunkPrev = idChunk;
2122 }
2123 }
2124
2125 /*
2126 * Ok, scan from the start.
2127 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2128 */
2129 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2130 AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2131 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2132
2133 return pGMM->idChunkPrev = idChunk;
2134}
2135
2136
2137/**
2138 * Allocates one private page.
2139 *
2140 * Worker for gmmR0AllocatePages.
2141 *
2142 * @param pChunk The chunk to allocate it from.
2143 * @param hGVM The GVM handle of the VM requesting memory.
2144 * @param pPageDesc The page descriptor.
2145 */
2146static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2147{
2148 /* update the chunk stats. */
2149 if (pChunk->hGVM == NIL_GVM_HANDLE)
2150 pChunk->hGVM = hGVM;
2151 Assert(pChunk->cFree);
2152 pChunk->cFree--;
2153 pChunk->cPrivate++;
2154
2155 /* unlink the first free page. */
2156 const uint32_t iPage = pChunk->iFreeHead;
2157 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2158 PGMMPAGE pPage = &pChunk->aPages[iPage];
2159 Assert(GMM_PAGE_IS_FREE(pPage));
2160 pChunk->iFreeHead = pPage->Free.iNext;
2161 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2162 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2163 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2164
2165 /* make the page private. */
2166 pPage->u = 0;
2167 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2168 pPage->Private.hGVM = hGVM;
2169 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2170 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2171 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2172 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2173 else
2174 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2175
2176 /* update the page descriptor. */
2177 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2178 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2179 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2180 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2181}
2182
2183
2184/**
2185 * Picks the free pages from a chunk.
2186 *
2187 * @returns The new page descriptor table index.
2188 * @param pChunk The chunk.
2189 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2190 * affinity.
2191 * @param iPage The current page descriptor table index.
2192 * @param cPages The total number of pages to allocate.
2193 * @param paPages The page descriptor table (input + output).
2194 */
2195static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2196 PGMMPAGEDESC paPages)
2197{
2198 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2199 gmmR0UnlinkChunk(pChunk);
2200
2201 for (; pChunk->cFree && iPage < cPages; iPage++)
2202 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2203
2204 gmmR0LinkChunk(pChunk, pSet);
2205 return iPage;
2206}
2207
2208
2209/**
2210 * Registers a new chunk of memory.
2211 *
2212 * This is called by gmmR0AllocateChunkNew, GMMR0AllocateLargePage and GMMR0SeedChunk.
2213 *
2214 * @returns VBox status code. On success, the giant GMM lock will be held, the
2215 * caller must release it (ugly).
2216 * @param pGMM Pointer to the GMM instance.
2217 * @param pSet Pointer to the set.
2218 * @param hMemObj The memory object for the chunk.
2219 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2220 * affinity.
2221 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2222 * @param ppChunk Chunk address (out). Optional.
2223 *
2224 * @remarks The caller must not own the giant GMM mutex.
2225 * The giant GMM mutex will be acquired and returned acquired in
2226 * the success path. On failure, no locks will be held.
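 *
 * A sketch of the expected calling pattern (mirroring gmmR0AllocateChunkNew
 * below; error handling trimmed):
 * @code
 *      // the giant GMM mutex must not be held here
 *      PGMMCHUNK pChunk;
 *      int rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, hGVM, 0, &pChunk);   // fChunkFlags = 0
 *      if (RT_SUCCESS(rc))
 *      {
 *          // the giant mutex is held again at this point; use pChunk, then eventually:
 *          gmmR0MutexRelease(pGMM);
 *      }
 *      else
 *          RTR0MemObjFree(hMemObj, true);   // no GMM locks are held on failure
 * @endcode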
2227 */
2228static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, uint16_t fChunkFlags,
2229 PGMMCHUNK *ppChunk)
2230{
2231 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2232 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2233#ifdef GMM_WITH_LEGACY_MODE
2234 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE || fChunkFlags == GMM_CHUNK_FLAGS_SEEDED);
2235#else
2236 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2237#endif
2238
2239#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2240 /*
2241 * Get a ring-0 mapping of the object.
2242 */
2243# ifdef GMM_WITH_LEGACY_MODE
2244 uint8_t *pbMapping = !(fChunkFlags & GMM_CHUNK_FLAGS_SEEDED) ? (uint8_t *)RTR0MemObjAddress(hMemObj) : NULL;
2245# else
2246 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2247# endif
2248 if (!pbMapping)
2249 {
2250 RTR0MEMOBJ hMapObj;
2251 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2252 if (RT_SUCCESS(rc))
2253 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2254 else
2255 return rc;
2256 AssertPtr(pbMapping);
2257 }
2258#endif
2259
2260 /*
2261 * Allocate a chunk.
2262 */
2263 int rc;
2264 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2265 if (pChunk)
2266 {
2267 /*
2268 * Initialize it.
2269 */
2270 pChunk->hMemObj = hMemObj;
2271#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2272 pChunk->pbMapping = pbMapping;
2273#endif
2274 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2275 pChunk->hGVM = hGVM;
2276 /*pChunk->iFreeHead = 0;*/
2277 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2278 pChunk->iChunkMtx = UINT8_MAX;
2279 pChunk->fFlags = fChunkFlags;
2280 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2281 {
2282 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2283 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2284 }
2285 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2286 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2287
2288 /*
2289 * Allocate a Chunk ID and insert it into the tree.
2290 * This has to be done behind the mutex of course.
2291 */
2292 rc = gmmR0MutexAcquire(pGMM);
2293 if (RT_SUCCESS(rc))
2294 {
2295 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2296 {
2297 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2298 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2299 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2300 {
2301 RTSpinlockAcquire(pGMM->hSpinLockTree);
2302 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2303 {
2304 pGMM->cChunks++;
2305 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2306 RTSpinlockRelease(pGMM->hSpinLockTree);
2307
2308 gmmR0LinkChunk(pChunk, pSet);
2309
2310 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2311
2312 if (ppChunk)
2313 *ppChunk = pChunk;
2314 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2315 return VINF_SUCCESS;
2316 }
2317 RTSpinlockRelease(pGMM->hSpinLockTree);
2318 }
2319
2320 /* bail out */
2321 rc = VERR_GMM_CHUNK_INSERT;
2322 }
2323 else
2324 rc = VERR_GMM_IS_NOT_SANE;
2325 gmmR0MutexRelease(pGMM);
2326 }
2327
2328 RTMemFree(pChunk);
2329 }
2330 else
2331 rc = VERR_NO_MEMORY;
2332 return rc;
2333}
2334
2335
2336/**
2337 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2338 * what's remaining to the specified free set.
2339 *
2340 * @note This will leave the giant mutex while allocating the new chunk!
2341 *
2342 * @returns VBox status code.
2343 * @param pGMM Pointer to the GMM instance data.
2344 * @param pGVM Pointer to the kernel-only VM instance data.
2345 * @param pSet Pointer to the free set.
2346 * @param cPages The number of pages requested.
2347 * @param paPages The page descriptor table (input + output).
2348 * @param piPage The pointer to the page descriptor table index variable.
2349 * This will be updated.
2350 */
2351static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2352 PGMMPAGEDESC paPages, uint32_t *piPage)
2353{
2354 gmmR0MutexRelease(pGMM);
2355
2356 RTR0MEMOBJ hMemObj;
2357#ifndef GMM_WITH_LEGACY_MODE
2358 int rc;
2359# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2360 if (pGMM->fHasWorkingAllocPhysNC)
2361 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2362 else
2363# endif
2364 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2365#else
2366 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2367#endif
2368 if (RT_SUCCESS(rc))
2369 {
2370 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2371 * free pages first and then unchaining them right afterwards. Instead
2372 * do as much work as possible without holding the giant lock. */
2373 PGMMCHUNK pChunk;
2374 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2375 if (RT_SUCCESS(rc))
2376 {
2377 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2378 return VINF_SUCCESS;
2379 }
2380
2381 /* bail out */
2382 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2383 }
2384
2385 int rc2 = gmmR0MutexAcquire(pGMM);
2386 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2387 return rc;
2388
2389}
2390
2391
2392/**
2393 * As a last resort we'll pick any page we can get.
2394 *
2395 * @returns The new page descriptor table index.
2396 * @param pSet The set to pick from.
2397 * @param pGVM Pointer to the global VM structure.
2398 * @param iPage The current page descriptor table index.
2399 * @param cPages The total number of pages to allocate.
2400 * @param paPages The page descriptor table (input + output).
2401 */
2402static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2403 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2404{
2405 unsigned iList = RT_ELEMENTS(pSet->apLists);
2406 while (iList-- > 0)
2407 {
2408 PGMMCHUNK pChunk = pSet->apLists[iList];
2409 while (pChunk)
2410 {
2411 PGMMCHUNK pNext = pChunk->pFreeNext;
2412
2413 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2414 if (iPage >= cPages)
2415 return iPage;
2416
2417 pChunk = pNext;
2418 }
2419 }
2420 return iPage;
2421}
2422
2423
2424/**
2425 * Pick pages from empty chunks on the same NUMA node.
2426 *
2427 * @returns The new page descriptor table index.
2428 * @param pSet The set to pick from.
2429 * @param pGVM Pointer to the global VM structure.
2430 * @param iPage The current page descriptor table index.
2431 * @param cPages The total number of pages to allocate.
2432 * @param paPages The page descriptor table (input + output).
2433 */
2434static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2435 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2436{
2437 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2438 if (pChunk)
2439 {
2440 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2441 while (pChunk)
2442 {
2443 PGMMCHUNK pNext = pChunk->pFreeNext;
2444
2445 if (pChunk->idNumaNode == idNumaNode)
2446 {
2447 pChunk->hGVM = pGVM->hSelf;
2448 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2449 if (iPage >= cPages)
2450 {
2451 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2452 return iPage;
2453 }
2454 }
2455
2456 pChunk = pNext;
2457 }
2458 }
2459 return iPage;
2460}
2461
2462
2463/**
2464 * Pick pages from non-empty chunks on the same NUMA node.
2465 *
2466 * @returns The new page descriptor table index.
2467 * @param pSet The set to pick from.
2468 * @param pGVM Pointer to the global VM structure.
2469 * @param iPage The current page descriptor table index.
2470 * @param cPages The total number of pages to allocate.
2471 * @param paPages The page descriptor table (input + output).
2472 */
2473static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2474 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2475{
2476 /** @todo start by picking from chunks with about the right size first? */
2477 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2478 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2479 while (iList-- > 0)
2480 {
2481 PGMMCHUNK pChunk = pSet->apLists[iList];
2482 while (pChunk)
2483 {
2484 PGMMCHUNK pNext = pChunk->pFreeNext;
2485
2486 if (pChunk->idNumaNode == idNumaNode)
2487 {
2488 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2489 if (iPage >= cPages)
2490 {
2491 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2492 return iPage;
2493 }
2494 }
2495
2496 pChunk = pNext;
2497 }
2498 }
2499 return iPage;
2500}
2501
2502
2503/**
2504 * Pick pages that are in chunks already associated with the VM.
2505 *
2506 * @returns The new page descriptor table index.
2507 * @param pGMM Pointer to the GMM instance data.
2508 * @param pGVM Pointer to the global VM structure.
2509 * @param pSet The set to pick from.
2510 * @param iPage The current page descriptor table index.
2511 * @param cPages The total number of pages to allocate.
2512 * @param paPages The page descriptor table (input + output).
2513 */
2514static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2515 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2516{
2517 uint16_t const hGVM = pGVM->hSelf;
2518
2519 /* Hint. */
2520 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2521 {
2522 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2523 if (pChunk && pChunk->cFree)
2524 {
2525 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2526 if (iPage >= cPages)
2527 return iPage;
2528 }
2529 }
2530
2531 /* Scan. */
2532 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2533 {
2534 PGMMCHUNK pChunk = pSet->apLists[iList];
2535 while (pChunk)
2536 {
2537 PGMMCHUNK pNext = pChunk->pFreeNext;
2538
2539 if (pChunk->hGVM == hGVM)
2540 {
2541 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2542 if (iPage >= cPages)
2543 {
2544 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2545 return iPage;
2546 }
2547 }
2548
2549 pChunk = pNext;
2550 }
2551 }
2552 return iPage;
2553}
2554
2555
2556
2557/**
2558 * Pick pages in bound memory mode.
2559 *
2560 * @returns The new page descriptor table index.
2561 * @param pGVM Pointer to the global VM structure.
2562 * @param iPage The current page descriptor table index.
2563 * @param cPages The total number of pages to allocate.
2564 * @param paPages The page descriptor table (input + output).
2565 */
2566static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2567{
2568 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2569 {
2570 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2571 while (pChunk)
2572 {
2573 Assert(pChunk->hGVM == pGVM->hSelf);
2574 PGMMCHUNK pNext = pChunk->pFreeNext;
2575 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2576 if (iPage >= cPages)
2577 return iPage;
2578 pChunk = pNext;
2579 }
2580 }
2581 return iPage;
2582}
2583
2584
2585/**
2586 * Checks if we should start picking pages from chunks of other VMs because
2587 * we're getting close to the system memory or reserved limit.
2588 *
2589 * @returns @c true if we should, @c false if we should first try to allocate more
2590 * chunks.
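 *
 * As a worked example: with 4 KiB pages (an assumption here) a 2 MB chunk holds
 * 512 pages, so the "4 chunks" threshold in the code below means we start
 * borrowing from other VMs' chunks once less than 2048 pages (8 MB) of the
 * reservation remain unallocated.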
2591 */
2592static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2593{
2594 /*
2595 * Don't allocate a new chunk if we're getting close to the reservation limit.
2596 */
2597 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2598 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2599 - pGVM->gmm.s.Stats.cBalloonedPages
2600 /** @todo what about shared pages? */;
2601 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2602 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2603 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2604 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2605 return true;
2606 /** @todo make the threshold configurable, also test the code to see if
2607 * this ever kicks in (we might be reserving too much or something). */
2608
2609 /*
2610 * Check how close we are to the max memory limit and how many fragments
2611 * there are...
2612 */
2613 /** @todo */
2614
2615 return false;
2616}
2617
2618
2619/**
2620 * Checks if we should start picking pages from chunks of other VMs because
2621 * there is a lot of free pages around.
2622 *
2623 * @returns @c true if we should, @c false if we should first try to allocate more
2624 * chunks.
2625 */
2626static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2627{
2628 /*
2629 * Setting the limit at 16 chunks (32 MB) at the moment.
2630 */
2631 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2632 return true;
2633 return false;
2634}
2635
2636
2637/**
2638 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2639 *
2640 * @returns VBox status code:
2641 * @retval VINF_SUCCESS on success.
2642 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2643 * gmmR0AllocateChunkNew is necessary.
2644 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2645 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2646 * that is we're trying to allocate more than we've reserved.
2647 *
2648 * @param pGMM Pointer to the GMM instance data.
2649 * @param pGVM Pointer to the VM.
2650 * @param cPages The number of pages to allocate.
2651 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2652 * details on what is expected on input.
2653 * @param enmAccount The account to charge.
2654 *
2655 * @remarks Caller must own the giant GMM lock.
2656 */
2657static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2658{
2659 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2660
2661 /*
2662 * Check allocation limits.
2663 */
2664 if (RT_LIKELY(pGMM->cAllocatedPages + cPages <= pGMM->cMaxPages))
2665 { /* likely */ }
2666 else
2667 return VERR_GMM_HIT_GLOBAL_LIMIT;
2668
2669 switch (enmAccount)
2670 {
2671 case GMMACCOUNT_BASE:
2672 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2673 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
2674 { /* likely */ }
2675 else
2676 {
2677 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2678 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2679 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2680 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2681 }
2682 break;
2683 case GMMACCOUNT_SHADOW:
2684 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages <= pGVM->gmm.s.Stats.Reserved.cShadowPages))
2685 { /* likely */ }
2686 else
2687 {
2688 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2689 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2690 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2691 }
2692 break;
2693 case GMMACCOUNT_FIXED:
2694 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages <= pGVM->gmm.s.Stats.Reserved.cFixedPages))
2695 { /* likely */ }
2696 else
2697 {
2698 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2699 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2700 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2701 }
2702 break;
2703 default:
2704 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2705 }
2706
2707#ifdef GMM_WITH_LEGACY_MODE
2708 /*
2709 * If we're in legacy memory mode, it's easy to figure if we have
2710 * sufficient number of pages up-front.
2711 */
2712 if ( pGMM->fLegacyAllocationMode
2713 && pGVM->gmm.s.Private.cFreePages < cPages)
2714 {
2715 Assert(pGMM->fBoundMemoryMode);
2716 return VERR_GMM_SEED_ME;
2717 }
2718#endif
2719
2720 /*
2721 * Update the accounts before we proceed because we might be leaving the
2722 * protection of the global mutex and thus run the risk of permitting
2723 * too much memory to be allocated.
2724 */
2725 switch (enmAccount)
2726 {
2727 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2728 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2729 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2730 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2731 }
2732 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2733 pGMM->cAllocatedPages += cPages;
2734
2735#ifdef GMM_WITH_LEGACY_MODE
2736 /*
2737 * Part two of it's-easy-in-legacy-memory-mode.
2738 */
2739 if (pGMM->fLegacyAllocationMode)
2740 {
2741 uint32_t iPage = gmmR0AllocatePagesInBoundMode(pGVM, 0, cPages, paPages);
2742 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2743 return VINF_SUCCESS;
2744 }
2745#endif
2746
2747 /*
2748 * Bound mode is also relatively straightforward.
2749 */
2750 uint32_t iPage = 0;
2751 int rc = VINF_SUCCESS;
2752 if (pGMM->fBoundMemoryMode)
2753 {
2754 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2755 if (iPage < cPages)
2756 do
2757 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2758 while (iPage < cPages && RT_SUCCESS(rc));
2759 }
2760 /*
2761 * Shared mode is trickier as we should try to achieve the same locality as
2762 * in bound mode, but smartly make use of non-full chunks allocated by
2763 * other VMs if we're low on memory.
2764 */
2765 else
2766 {
2767 /* Pick the most optimal pages first. */
2768 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2769 if (iPage < cPages)
2770 {
2771 /* Maybe we should try getting pages from chunks "belonging" to
2772 other VMs before allocating more chunks? */
2773 bool fTriedOnSameAlready = false;
2774 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2775 {
2776 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2777 fTriedOnSameAlready = true;
2778 }
2779
2780 /* Allocate memory from empty chunks. */
2781 if (iPage < cPages)
2782 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2783
2784 /* Grab empty shared chunks. */
2785 if (iPage < cPages)
2786 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2787
2788 /* If there are a lot of free pages spread around, try not to waste
2789 system memory on more chunks. (Should trigger defragmentation.) */
2790 if ( !fTriedOnSameAlready
2791 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2792 {
2793 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2794 if (iPage < cPages)
2795 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2796 }
2797
2798 /*
2799 * Ok, try allocate new chunks.
2800 */
2801 if (iPage < cPages)
2802 {
2803 do
2804 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2805 while (iPage < cPages && RT_SUCCESS(rc));
2806
2807 /* If the host is out of memory, take whatever we can get. */
2808 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2809 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2810 {
2811 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2812 if (iPage < cPages)
2813 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2814 AssertRelease(iPage == cPages);
2815 rc = VINF_SUCCESS;
2816 }
2817 }
2818 }
2819 }
2820
2821 /*
2822 * Clean up on failure. Since this is bound to be a low-memory condition
2823 * we will give back any empty chunks that might be hanging around.
2824 */
2825 if (RT_SUCCESS(rc))
2826 { /* likely */ }
2827 else
2828 {
2829 /* Update the statistics. */
2830 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2831 pGMM->cAllocatedPages -= cPages - iPage;
2832 switch (enmAccount)
2833 {
2834 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2835 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2836 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2837 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2838 }
2839
2840 /* Release the pages. */
2841 while (iPage-- > 0)
2842 {
2843 uint32_t idPage = paPages[iPage].idPage;
2844 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2845 if (RT_LIKELY(pPage))
2846 {
2847 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2848 Assert(pPage->Private.hGVM == pGVM->hSelf);
2849 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2850 }
2851 else
2852 AssertMsgFailed(("idPage=%#x\n", idPage));
2853
2854 paPages[iPage].idPage = NIL_GMM_PAGEID;
2855 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2856 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2857 }
2858
2859 /* Free empty chunks. */
2860 /** @todo */
2861
2862 /* return the fail status on failure */
2863 return rc;
2864 }
2865 return VINF_SUCCESS;
2866}
2867
2868
2869/**
2870 * Updates the previous allocations and allocates more pages.
2871 *
2872 * The handy pages are always taken from the 'base' memory account.
2873 * The allocated pages are not cleared and will contain random garbage.
2874 *
2875 * @returns VBox status code:
2876 * @retval VINF_SUCCESS on success.
2877 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2878 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2879 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2880 * private page.
2881 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2882 * shared page.
2883 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2884 * owned by the VM.
2885 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2886 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2887 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2888 * that is we're trying to allocate more than we've reserved.
2889 *
2890 * @param pGVM The global (ring-0) VM structure.
2891 * @param idCpu The VCPU id.
2892 * @param cPagesToUpdate The number of pages to update (starting from the head).
2893 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2894 * @param paPages The array of page descriptors.
2895 * See GMMPAGEDESC for details on what is expected on input.
2896 * @thread EMT(idCpu)
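 *
 * A rough sketch of the descriptor contents expected on input (the validation
 * code below is authoritative; the variable names here are illustrative only):
 * @code
 *      // entry updating an existing private page and/or releasing a shared one:
 *      paPages[i].idPage       = idPrivatePage;   // page owned by this VM
 *      paPages[i].HCPhysGCPhys = GCPhysNew;       // new guest address, GMM_GCPHYS_UNSHAREABLE or NIL_RTHCPHYS
 *      paPages[i].idSharedPage = idSharedPage;    // or NIL_GMM_PAGEID if nothing to release
 *      // entry that only requests a new page (index beyond cPagesToUpdate):
 *      paPages[j].idPage       = NIL_GMM_PAGEID;
 *      paPages[j].idSharedPage = NIL_GMM_PAGEID;
 *      paPages[j].HCPhysGCPhys = NIL_RTHCPHYS;
 * @endcode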
2897 */
2898GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2899 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2900{
2901 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2902 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2903
2904 /*
2905 * Validate, get basics and take the semaphore.
2906 * (This is a relatively busy path, so make predictions where possible.)
2907 */
2908 PGMM pGMM;
2909 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2910 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2911 if (RT_FAILURE(rc))
2912 return rc;
2913
2914 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2915 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2916 || (cPagesToAlloc && cPagesToAlloc < 1024),
2917 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2918 VERR_INVALID_PARAMETER);
2919
2920 unsigned iPage = 0;
2921 for (; iPage < cPagesToUpdate; iPage++)
2922 {
2923 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2924 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2925 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2926 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2927 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2928 VERR_INVALID_PARAMETER);
2929 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2930 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2931 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2932 AssertMsgReturn( paPages[iPage].idSharedPage == NIL_GMM_PAGEID
2933 || paPages[iPage].idSharedPage <= GMM_PAGEID_LAST,
2934 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2935 }
2936
2937 for (; iPage < cPagesToAlloc; iPage++)
2938 {
2939 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2940 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2941 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2942 }
2943
2944 gmmR0MutexAcquire(pGMM);
2945 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2946 {
2947 /* No allocations before the initial reservation has been made! */
2948 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2949 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2950 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2951 {
2952 /*
2953 * Perform the updates.
2954 * Stop on the first error.
2955 */
2956 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2957 {
2958 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2959 {
2960 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2961 if (RT_LIKELY(pPage))
2962 {
2963 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2964 {
2965 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2966 {
2967 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2968 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2969 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2970 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2971 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2972 /* else: NIL_RTHCPHYS nothing */
2973
2974 paPages[iPage].idPage = NIL_GMM_PAGEID;
2975 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2976 }
2977 else
2978 {
2979 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2980 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2981 rc = VERR_GMM_NOT_PAGE_OWNER;
2982 break;
2983 }
2984 }
2985 else
2986 {
2987 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2988 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2989 break;
2990 }
2991 }
2992 else
2993 {
2994 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2995 rc = VERR_GMM_PAGE_NOT_FOUND;
2996 break;
2997 }
2998 }
2999
3000 if (paPages[iPage].idSharedPage == NIL_GMM_PAGEID)
3001 { /* likely */ }
3002 else
3003 {
3004 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
3005 if (RT_LIKELY(pPage))
3006 {
3007 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3008 {
3009 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
3010 Assert(pPage->Shared.cRefs);
3011 Assert(pGVM->gmm.s.Stats.cSharedPages);
3012 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
3013
3014 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
3015 pGVM->gmm.s.Stats.cSharedPages--;
3016 pGVM->gmm.s.Stats.Allocated.cBasePages--;
3017 if (!--pPage->Shared.cRefs)
3018 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
3019 else
3020 {
3021 Assert(pGMM->cDuplicatePages);
3022 pGMM->cDuplicatePages--;
3023 }
3024
3025 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
3026 }
3027 else
3028 {
3029 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
3030 rc = VERR_GMM_PAGE_NOT_SHARED;
3031 break;
3032 }
3033 }
3034 else
3035 {
3036 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
3037 rc = VERR_GMM_PAGE_NOT_FOUND;
3038 break;
3039 }
3040 }
3041 } /* for each page to update */
3042
3043 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
3044 {
3045#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
3046 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
3047 {
3048 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
3049 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
3050 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
3051 }
3052#endif
3053
3054 /*
3055 * Join paths with GMMR0AllocatePages for the allocation.
3056 * Note! gmmR0AllocateChunkNew may leave the protection of the mutex!
3057 */
3058 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
3059 }
3060 }
3061 else
3062 rc = VERR_WRONG_ORDER;
3063 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3064 }
3065 else
3066 rc = VERR_GMM_IS_NOT_SANE;
3067 gmmR0MutexRelease(pGMM);
3068 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3069 return rc;
3070}
3071
3072
3073/**
3074 * Allocate one or more pages.
3075 *
3076 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3077 * The allocated pages are not cleared and will contain random garbage.
3078 *
3079 * @returns VBox status code:
3080 * @retval VINF_SUCCESS on success.
3081 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3082 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3083 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3084 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3085 * that is we're trying to allocate more than we've reserved.
3086 *
3087 * @param pGVM The global (ring-0) VM structure.
3088 * @param idCpu The VCPU id.
3089 * @param cPages The number of pages to allocate.
3090 * @param paPages Pointer to the page descriptors.
3091 * See GMMPAGEDESC for details on what is expected on
3092 * input.
3093 * @param enmAccount The account to charge.
3094 *
3095 * @thread EMT.
3096 */
3097GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3098{
3099 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3100
3101 /*
3102 * Validate, get basics and take the semaphore.
3103 */
3104 PGMM pGMM;
3105 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3106 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3107 if (RT_FAILURE(rc))
3108 return rc;
3109
3110 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3111 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3112 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3113
3114 for (unsigned iPage = 0; iPage < cPages; iPage++)
3115 {
3116 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
3117 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3118 || ( enmAccount == GMMACCOUNT_BASE
3119 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3120 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3121 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3122 VERR_INVALID_PARAMETER);
3123 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3124 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3125 }
3126
3127 gmmR0MutexAcquire(pGMM);
3128 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3129 {
3130
3131 /* No allocations before the initial reservation has been made! */
3132 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3133 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3134 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3135 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3136 else
3137 rc = VERR_WRONG_ORDER;
3138 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3139 }
3140 else
3141 rc = VERR_GMM_IS_NOT_SANE;
3142 gmmR0MutexRelease(pGMM);
3143 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3144 return rc;
3145}
3146
3147
3148/**
3149 * VMMR0 request wrapper for GMMR0AllocatePages.
3150 *
3151 * @returns see GMMR0AllocatePages.
3152 * @param pGVM The global (ring-0) VM structure.
3153 * @param idCpu The VCPU id.
3154 * @param pReq Pointer to the request packet.
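 *
 * A minimal sketch of how a ring-3 caller might size and fill the variable
 * length request; RTMemTmpAllocZ and the VMMR3CallR0 / VMMR0_DO_GMM_ALLOCATE_PAGES
 * dispatch are assumptions here, the field names come from the validation below:
 * @code
 *      uint32_t cbReq = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *      PGMMALLOCATEPAGESREQ pReq = (PGMMALLOCATEPAGESREQ)RTMemTmpAllocZ(cbReq);
 *      pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *      pReq->Hdr.cbReq    = cbReq;
 *      pReq->enmAccount   = GMMACCOUNT_BASE;
 *      pReq->cPages       = cPages;
 *      for (uint32_t i = 0; i < cPages; i++)
 *      {
 *          pReq->aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 *          pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
 *          pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      }
 *      int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_ALLOCATE_PAGES, 0, &pReq->Hdr);
 * @endcode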
3155 */
3156GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3157{
3158 /*
3159 * Validate input and pass it on.
3160 */
3161 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3162 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3163 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3164 VERR_INVALID_PARAMETER);
3165 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3166 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3167 VERR_INVALID_PARAMETER);
3168
3169 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3170}
3171
3172
3173/**
3174 * Allocate a large page to represent guest RAM.
3175 *
3176 * The allocated pages are not cleared and will contain random garbage.
3177 *
3178 * @returns VBox status code:
3179 * @retval VINF_SUCCESS on success.
3180 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3181 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3182 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3183 * that is we're trying to allocate more than we've reserved.
3184 * @retval VERR_TRY_AGAIN if the host is temporarily out of large pages.
3185 * @returns see GMMR0AllocatePages.
3186 *
3187 * @param pGVM The global (ring-0) VM structure.
3188 * @param idCpu The VCPU id.
3189 * @param cbPage Large page size.
3190 * @param pIdPage Where to return the GMM page ID of the page.
3191 * @param pHCPhys Where to return the host physical address of the page.
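 *
 * A minimal ring-0 usage sketch (variable names are illustrative only; note
 * that cbPage must be GMM_CHUNK_SIZE, i.e. 2 MB, see the assertion below):
 * @code
 *      uint32_t idPage = NIL_GMM_PAGEID;
 *      RTHCPHYS HCPhys = NIL_RTHCPHYS;
 *      int rc = GMMR0AllocateLargePage(pGVM, idCpu, GMM_CHUNK_SIZE, &idPage, &HCPhys);
 *      if (rc == VERR_TRY_AGAIN)
 *      {
 *          // the host is temporarily out of large pages (see the retval list above)
 *      }
 * @endcode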
3192 */
3193GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3194{
3195 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3196
3197 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3198 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3199 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3200
3201 /*
3202 * Validate, get basics and take the semaphore.
3203 */
3204 PGMM pGMM;
3205 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3206 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3207 if (RT_FAILURE(rc))
3208 return rc;
3209
3210#ifdef GMM_WITH_LEGACY_MODE
3211 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3212 // if (pGMM->fLegacyAllocationMode)
3213 // return VERR_NOT_SUPPORTED;
3214#endif
3215
3216 *pHCPhys = NIL_RTHCPHYS;
3217 *pIdPage = NIL_GMM_PAGEID;
3218
3219 gmmR0MutexAcquire(pGMM);
3220 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3221 {
3222 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3223 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3224 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3225 {
3226 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3227 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3228 gmmR0MutexRelease(pGMM);
3229 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3230 }
3231
3232 /*
3233 * Allocate a new large page chunk.
3234 *
3235 * Note! We leave the giant GMM lock temporarily as the allocation might
3236 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3237 */
3238 AssertCompile(GMM_CHUNK_SIZE == _2M);
3239 gmmR0MutexRelease(pGMM);
3240
3241 RTR0MEMOBJ hMemObj;
3242 rc = RTR0MemObjAllocLarge(&hMemObj, GMM_CHUNK_SIZE, GMM_CHUNK_SIZE, RTMEMOBJ_ALLOC_LARGE_F_FAST);
3243 if (RT_SUCCESS(rc))
3244 {
3245 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3246 PGMMCHUNK pChunk;
3247 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3248 if (RT_SUCCESS(rc))
3249 {
3250 /*
3251 * Allocate all the pages in the chunk.
3252 */
3253 /* Unlink the new chunk from the free list. */
3254 gmmR0UnlinkChunk(pChunk);
3255
3256 /** @todo rewrite this to skip the looping. */
3257 /* Allocate all pages. */
3258 GMMPAGEDESC PageDesc;
3259 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3260
3261 /* Return the first page as we'll use the whole chunk as one big page. */
3262 *pIdPage = PageDesc.idPage;
3263 *pHCPhys = PageDesc.HCPhysGCPhys;
3264
3265 for (unsigned i = 1; i < cPages; i++)
3266 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3267
3268 /* Update accounting. */
3269 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3270 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3271 pGMM->cAllocatedPages += cPages;
3272
3273 gmmR0LinkChunk(pChunk, pSet);
3274 gmmR0MutexRelease(pGMM);
3275 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3276 return VINF_SUCCESS;
3277 }
3278 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3279 }
3280 }
3281 else
3282 {
3283 gmmR0MutexRelease(pGMM);
3284 rc = VERR_GMM_IS_NOT_SANE;
3285 }
3286
3287 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3288 return rc;
3289}
3290
3291
3292/**
3293 * Free a large page.
3294 *
3295 * @returns VBox status code:
3296 * @param pGVM The global (ring-0) VM structure.
3297 * @param idCpu The VCPU id.
3298 * @param idPage The large page id.
3299 */
3300GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3301{
3302 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3303
3304 /*
3305 * Validate, get basics and take the semaphore.
3306 */
3307 PGMM pGMM;
3308 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3309 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3310 if (RT_FAILURE(rc))
3311 return rc;
3312
3313#ifdef GMM_WITH_LEGACY_MODE
3314 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3315 // if (pGMM->fLegacyAllocationMode)
3316 // return VERR_NOT_SUPPORTED;
3317#endif
3318
3319 gmmR0MutexAcquire(pGMM);
3320 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3321 {
3322 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3323
3324 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3325 {
3326 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3327 gmmR0MutexRelease(pGMM);
3328 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3329 }
3330
3331 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3332 if (RT_LIKELY( pPage
3333 && GMM_PAGE_IS_PRIVATE(pPage)))
3334 {
3335 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3336 Assert(pChunk);
3337 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3338 Assert(pChunk->cPrivate > 0);
3339
3340 /* Release the memory immediately. */
3341 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3342
3343 /* Update accounting. */
3344 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3345 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3346 pGMM->cAllocatedPages -= cPages;
3347 }
3348 else
3349 rc = VERR_GMM_PAGE_NOT_FOUND;
3350 }
3351 else
3352 rc = VERR_GMM_IS_NOT_SANE;
3353
3354 gmmR0MutexRelease(pGMM);
3355 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3356 return rc;
3357}
3358
3359
3360/**
3361 * VMMR0 request wrapper for GMMR0FreeLargePage.
3362 *
3363 * @returns see GMMR0FreeLargePage.
3364 * @param pGVM The global (ring-0) VM structure.
3365 * @param idCpu The VCPU id.
3366 * @param pReq Pointer to the request packet.
3367 */
3368GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3369{
3370 /*
3371 * Validate input and pass it on.
3372 */
3373 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3374 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3375 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3376 VERR_INVALID_PARAMETER);
3377
3378 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3379}
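/* Illustrative sketch (not part of the original source): how a ring-3 caller
 * could drive GMMR0FreeLargePageReq through the VMM request path. The header
 * magic constant and the VMMR0_DO_GMM_FREE_LARGE_PAGE operation name are
 * assumptions following the usual SUPVMMR0REQHDR / VMMR3CallR0 conventions.
 * @code
 *      GMMFREELARGEPAGEREQ Req;
 *      Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;    // assumed request header magic
 *      Req.Hdr.cbReq    = sizeof(Req);             // must match the size check above
 *      Req.idPage       = idLargePage;             // ID returned by GMMR0AllocateLargePage
 *      int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_FREE_LARGE_PAGE, 0, &Req.Hdr);
 * @endcode
 */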
3380
3381
3382/**
3383 * @callback_method_impl{FNGVMMR0ENUMCALLBACK,
3384 * Used by gmmR0FreeChunkFlushPerVmTlbs().}
3385 */
3386static DECLCALLBACK(int) gmmR0InvalidatePerVmChunkTlbCallback(PGVM pGVM, void *pvUser)
3387{
3388 RT_NOREF(pvUser);
3389 if (pGVM->gmm.s.hChunkTlbSpinLock != NIL_RTSPINLOCK)
3390 {
3391 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
3392 uintptr_t i = RT_ELEMENTS(pGVM->gmm.s.aChunkTlbEntries);
3393 while (i-- > 0)
3394 {
3395 pGVM->gmm.s.aChunkTlbEntries[i].idGeneration = UINT64_MAX;
3396 pGVM->gmm.s.aChunkTlbEntries[i].pChunk = NULL;
3397 }
3398 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
3399 }
3400 return VINF_SUCCESS;
3401}
3402
3403
3404/**
3405 * Called by gmmR0FreeChunk when we reach the threshold for wrapping around the
3406 * free generation ID value.
3407 *
3408 * This is done at 2^62 - 1 (UINT64_MAX / 4), i.e. long before a real
3409 * wrap-around, which lets us drop all locks while flushing: it takes
3410 * roughly 4.6 exa (4 611 686 018 427 387 903) further gmmR0FreeChunk calls
3411 * to reach the threshold again. We do two invalidation passes and reset
3412 * the generation ID between them, making sure there are no false positives.
3413 *
3414 * @param pGMM Pointer to the GMM instance.
3415 */
3416static void gmmR0FreeChunkFlushPerVmTlbs(PGMM pGMM)
3417{
3418 /*
3419 * First invalidation pass.
3420 */
3421 int rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3422 AssertRCSuccess(rc);
3423
3424 /*
3425 * Reset the generation number.
3426 */
3427 RTSpinlockAcquire(pGMM->hSpinLockTree);
3428 ASMAtomicWriteU64(&pGMM->idFreeGeneration, 1);
3429 RTSpinlockRelease(pGMM->hSpinLockTree);
3430
3431 /*
3432 * Second invalidation pass.
3433 */
3434 rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3435 AssertRCSuccess(rc);
3436}
3437
3438
3439/**
3440 * Frees a chunk, giving it back to the host OS.
3441 *
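 * @returns true if the giant GMM lock was temporarily dropped while freeing
 *          the chunk, false otherwise (also when the chunk could not be freed
 *          because it is still mapped).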
3442 * @param pGMM Pointer to the GMM instance.
3443 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3444 * unmap and free the chunk in one go; NULL otherwise.
3445 * @param pChunk The chunk to free.
3446 * @param fRelaxedSem Whether we can release the semaphore while doing the
3447 * freeing (@c true) or not.
3448 */
3449static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3450{
3451 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3452
3453 GMMR0CHUNKMTXSTATE MtxState;
3454 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3455
3456 /*
3457 * Cleanup hack! Unmap the chunk from the caller's address space.
3458 * This shouldn't happen, so screw lock contention...
3459 */
3460 if ( pChunk->cMappingsX
3461#ifdef GMM_WITH_LEGACY_MODE
3462 && (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
3463#endif
3464 && pGVM)
3465 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3466
3467 /*
3468 * If there are current mappings of the chunk, then request the
3469 * VMs to unmap them. Reposition the chunk in the free list so
3470 * it won't be a likely candidate for allocations.
3471 */
3472 if (pChunk->cMappingsX)
3473 {
3474 /** @todo R0 -> VM request */
3475 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3476 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3477 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3478 return false;
3479 }
3480
3481
3482 /*
3483 * Save and trash the handle.
3484 */
3485 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3486 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3487
3488 /*
3489 * Unlink it from everywhere.
3490 */
3491 gmmR0UnlinkChunk(pChunk);
3492
3493 RTSpinlockAcquire(pGMM->hSpinLockTree);
3494
3495 RTListNodeRemove(&pChunk->ListNode);
3496
3497 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3498 Assert(pCore == &pChunk->Core); NOREF(pCore);
3499
3500 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3501 if (pTlbe->pChunk == pChunk)
3502 {
3503 pTlbe->idChunk = NIL_GMM_CHUNKID;
3504 pTlbe->pChunk = NULL;
3505 }
3506
3507 Assert(pGMM->cChunks > 0);
3508 pGMM->cChunks--;
3509
3510 uint64_t const idFreeGeneration = ASMAtomicIncU64(&pGMM->idFreeGeneration);
3511
3512 RTSpinlockRelease(pGMM->hSpinLockTree);
3513
3514 /*
3515 * Free the Chunk ID before dropping the locks and freeing the rest.
3516 */
3517 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3518 pChunk->Core.Key = NIL_GMM_CHUNKID;
3519
3520 pGMM->cFreedChunks++;
3521
3522 gmmR0ChunkMutexRelease(&MtxState, NULL);
3523 if (fRelaxedSem)
3524 gmmR0MutexRelease(pGMM);
3525
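    /* UINT64_MAX / 4 == 2^62 - 1: flush and reset the per-VM chunk TLB
       generation long before the counter could ever wrap for real. */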
3526 if (idFreeGeneration == UINT64_MAX / 4)
3527 gmmR0FreeChunkFlushPerVmTlbs(pGMM);
3528
3529 RTMemFree(pChunk->paMappingsX);
3530 pChunk->paMappingsX = NULL;
3531
3532 RTMemFree(pChunk);
3533
3534#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
3535 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3536#else
3537 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3538#endif
3539 AssertLogRelRC(rc);
3540
3541 if (fRelaxedSem)
3542 gmmR0MutexAcquire(pGMM);
3543 return fRelaxedSem;
3544}
3545
3546
3547/**
3548 * Free page worker.
3549 *
3550 * The caller does all the statistic decrementing, we do all the incrementing.
3551 *
3552 * @param pGMM Pointer to the GMM instance data.
3553 * @param pGVM Pointer to the GVM instance.
3554 * @param pChunk Pointer to the chunk this page belongs to.
3555 * @param idPage The Page ID.
3556 * @param pPage Pointer to the page.
3557 */
3558static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3559{
3560 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3561 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3562
3563 /*
3564 * Put the page on the free list.
3565 */
3566 pPage->u = 0;
3567 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3568 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3569 pPage->Free.iNext = pChunk->iFreeHead;
3570 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3571
3572 /*
3573 * Update statistics (the cShared/cPrivate stats are up to date already),
3574 * and relink the chunk if necessary.
3575 */
3576 unsigned const cFree = pChunk->cFree;
3577 if ( !cFree
3578 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3579 {
3580 gmmR0UnlinkChunk(pChunk);
3581 pChunk->cFree++;
3582 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3583 }
3584 else
3585 {
3586 pChunk->cFree = cFree + 1;
3587 pChunk->pSet->cFreePages++;
3588 }
3589
3590 /*
3591 * If the chunk becomes empty, consider giving memory back to the host OS.
3592 *
3593 * The current strategy is to try to give it back if there are other chunks
3594 * in this free list, meaning if there are at least 240 free pages in this
3595 * category. Note that since there are probably mappings of the chunk,
3596 * it won't be freed up instantly, which probably screws up this logic
3597 * a bit...
3598 */
3599 /** @todo Do this on the way out. */
3600 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3601 || pChunk->pFreeNext == NULL
3602 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3603 { /* likely */ }
3604#ifdef GMM_WITH_LEGACY_MODE
3605 else if (RT_LIKELY(pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE)))
3606 { /* likely */ }
3607#endif
3608 else
3609 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3610
3611}
3612
3613
3614/**
3615 * Frees a shared page, the page is known to exist and be valid and such.
3616 *
3617 * @param pGMM Pointer to the GMM instance.
3618 * @param pGVM Pointer to the GVM instance.
3619 * @param idPage The page id.
3620 * @param pPage The page structure.
3621 */
3622DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3623{
3624 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3625 Assert(pChunk);
3626 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3627 Assert(pChunk->cShared > 0);
3628 Assert(pGMM->cSharedPages > 0);
3629 Assert(pGMM->cAllocatedPages > 0);
3630 Assert(!pPage->Shared.cRefs);
3631
3632 pChunk->cShared--;
3633 pGMM->cAllocatedPages--;
3634 pGMM->cSharedPages--;
3635 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3636}
3637
3638
3639/**
3640 * Frees a private page, the page is known to exist and be valid and such.
3641 *
3642 * @param pGMM Pointer to the GMM instance.
3643 * @param pGVM Pointer to the GVM instance.
3644 * @param idPage The page id.
3645 * @param pPage The page structure.
3646 */
3647DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3648{
3649 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3650 Assert(pChunk);
3651 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3652 Assert(pChunk->cPrivate > 0);
3653 Assert(pGMM->cAllocatedPages > 0);
3654
3655 pChunk->cPrivate--;
3656 pGMM->cAllocatedPages--;
3657 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3658}
3659
3660
3661/**
3662 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3663 *
3664 * @returns VBox status code:
3665 * @retval VINF_SUCCESS, VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH, VERR_GMM_NOT_PAGE_OWNER, VERR_GMM_PAGE_ALREADY_FREE or VERR_GMM_PAGE_NOT_FOUND.
3666 *
3667 * @param pGMM Pointer to the GMM instance data.
3668 * @param pGVM Pointer to the VM.
3669 * @param cPages The number of pages to free.
3670 * @param paPages Pointer to the page descriptors.
3671 * @param enmAccount The account this relates to.
3672 */
3673static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3674{
3675 /*
3676 * Check that the request isn't impossible wrt to the account status.
3677 */
3678 switch (enmAccount)
3679 {
3680 case GMMACCOUNT_BASE:
3681 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3682 {
3683 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3684 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3685 }
3686 break;
3687 case GMMACCOUNT_SHADOW:
3688 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3689 {
3690 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3691 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3692 }
3693 break;
3694 case GMMACCOUNT_FIXED:
3695 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3696 {
3697 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3698 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3699 }
3700 break;
3701 default:
3702 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3703 }
3704
3705 /*
3706 * Walk the descriptors and free the pages.
3707 *
3708 * Statistics (except the account) are being updated as we go along,
3709 * unlike the alloc code. Also, stop on the first error.
3710 */
3711 int rc = VINF_SUCCESS;
3712 uint32_t iPage;
3713 for (iPage = 0; iPage < cPages; iPage++)
3714 {
3715 uint32_t idPage = paPages[iPage].idPage;
3716 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3717 if (RT_LIKELY(pPage))
3718 {
3719 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3720 {
3721 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3722 {
3723 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3724 pGVM->gmm.s.Stats.cPrivatePages--;
3725 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3726 }
3727 else
3728 {
3729 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3730 pPage->Private.hGVM, pGVM->hSelf));
3731 rc = VERR_GMM_NOT_PAGE_OWNER;
3732 break;
3733 }
3734 }
3735 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3736 {
3737 Assert(pGVM->gmm.s.Stats.cSharedPages);
3738 Assert(pPage->Shared.cRefs);
3739#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3740 if (pPage->Shared.u14Checksum)
3741 {
3742 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3743 uChecksum &= UINT32_C(0x00003fff);
3744 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3745 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3746 }
3747#endif
3748 pGVM->gmm.s.Stats.cSharedPages--;
3749 if (!--pPage->Shared.cRefs)
3750 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3751 else
3752 {
3753 Assert(pGMM->cDuplicatePages);
3754 pGMM->cDuplicatePages--;
3755 }
3756 }
3757 else
3758 {
3759 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3760 rc = VERR_GMM_PAGE_ALREADY_FREE;
3761 break;
3762 }
3763 }
3764 else
3765 {
3766 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3767 rc = VERR_GMM_PAGE_NOT_FOUND;
3768 break;
3769 }
3770 paPages[iPage].idPage = NIL_GMM_PAGEID;
3771 }
3772
3773 /*
3774 * Update the account.
3775 */
3776 switch (enmAccount)
3777 {
3778 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3779 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3780 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3781 default:
3782 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3783 }
3784
3785 /*
3786 * Any threshold stuff to be done here?
3787 */
3788
3789 return rc;
3790}
3791
3792
3793/**
3794 * Free one or more pages.
3795 *
3796 * This is typically used at reset time or power off.
3797 *
3798 * @returns VBox status code:
3799 * @retval VERR_INVALID_PARAMETER, VERR_GMM_IS_NOT_SANE or any of the gmmR0FreePages statuses.
3800 *
3801 * @param pGVM The global (ring-0) VM structure.
3802 * @param idCpu The VCPU id.
3803 * @param cPages The number of pages to free.
3804 * @param paPages Pointer to the page descriptors containing the page IDs
3805 * for each page.
3806 * @param enmAccount The account this relates to.
3807 * @thread EMT.
3808 */
3809GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3810{
3811 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3812
3813 /*
3814 * Validate input and get the basics.
3815 */
3816 PGMM pGMM;
3817 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3818 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3819 if (RT_FAILURE(rc))
3820 return rc;
3821
3822 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3823 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3824 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3825
3826 for (unsigned iPage = 0; iPage < cPages; iPage++)
3827 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3828 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3829 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3830
3831 /*
3832 * Take the semaphore and call the worker function.
3833 */
3834 gmmR0MutexAcquire(pGMM);
3835 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3836 {
3837 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3838 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3839 }
3840 else
3841 rc = VERR_GMM_IS_NOT_SANE;
3842 gmmR0MutexRelease(pGMM);
3843 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3844 return rc;
3845}
3846
3847
3848/**
3849 * VMMR0 request wrapper for GMMR0FreePages.
3850 *
3851 * @returns see GMMR0FreePages.
3852 * @param pGVM The global (ring-0) VM structure.
3853 * @param idCpu The VCPU id.
3854 * @param pReq Pointer to the request packet.
3855 */
3856GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3857{
3858 /*
3859 * Validate input and pass it on.
3860 */
3861 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3862 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3863 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3864 VERR_INVALID_PARAMETER);
3865 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3866 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3867 VERR_INVALID_PARAMETER);
3868
3869 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3870}
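/* Illustrative sketch (not part of the original source): building the variable
 * sized GMMFREEPAGESREQ that the size checks above expect. Only the fields and
 * macros used by those checks are taken from this file; the header magic and
 * the VMMR0_DO_GMM_FREE_PAGES operation name are assumptions.
 * @code
 *      uint32_t const   cbReq = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *      PGMMFREEPAGESREQ pReq  = (PGMMFREEPAGESREQ)RTMemAllocZ(cbReq);
 *      pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;  // assumed request header magic
 *      pReq->Hdr.cbReq    = cbReq;                 // must equal the dynamic offset above
 *      pReq->enmAccount   = GMMACCOUNT_BASE;
 *      pReq->cPages       = cPages;
 *      for (uint32_t i = 0; i < cPages; i++)
 *          pReq->aPages[i].idPage = paidPages[i];  // page IDs to free
 *      int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_FREE_PAGES, 0, &pReq->Hdr);
 *      RTMemFree(pReq);
 * @endcode
 */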
3871
3872
3873/**
3874 * Report back on a memory ballooning request.
3875 *
3876 * The request may or may not have been initiated by the GMM. If it was initiated
3877 * by the GMM it is important that this function is called even if no pages were
3878 * ballooned.
3879 *
3880 * @returns VBox status code:
3881 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3882 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3883 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3884 * indicating that we won't necessarily have sufficient RAM to boot
3885 * the VM again and that it should pause until this changes (we'll try
3886 * balloon some other VM). (For standard deflate we have little choice
3887 * but to hope the VM won't use the memory that was returned to it.)
3888 *
3889 * @param pGVM The global (ring-0) VM structure.
3890 * @param idCpu The VCPU id.
3891 * @param enmAction Inflate/deflate/reset.
3892 * @param cBalloonedPages The number of pages that was ballooned.
3893 *
3894 * @thread EMT(idCpu)
3895 */
3896GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3897{
3898 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3899 pGVM, enmAction, cBalloonedPages));
3900
3901 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3902
3903 /*
3904 * Validate input and get the basics.
3905 */
3906 PGMM pGMM;
3907 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3908 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3909 if (RT_FAILURE(rc))
3910 return rc;
3911
3912 /*
3913 * Take the semaphore and do some more validations.
3914 */
3915 gmmR0MutexAcquire(pGMM);
3916 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3917 {
3918 switch (enmAction)
3919 {
3920 case GMMBALLOONACTION_INFLATE:
3921 {
3922 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3923 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3924 {
3925 /*
3926 * Record the ballooned memory.
3927 */
3928 pGMM->cBalloonedPages += cBalloonedPages;
3929 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3930 {
3931 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low-memory conditions... */
3932 AssertFailed();
3933
3934 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3935 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3936 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3937 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3938 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3939 }
3940 else
3941 {
3942 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3943 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3944 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3945 }
3946 }
3947 else
3948 {
3949 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3950 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3951 pGVM->gmm.s.Stats.Reserved.cBasePages));
3952 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3953 }
3954 break;
3955 }
3956
3957 case GMMBALLOONACTION_DEFLATE:
3958 {
3959 /* Deflate. */
3960 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3961 {
3962 /*
3963 * Record the ballooned memory.
3964 */
3965 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3966 pGMM->cBalloonedPages -= cBalloonedPages;
3967 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3968 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3969 {
3970 AssertFailed(); /* This path is for later. */
3971 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3972 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3973
3974 /*
3975 * Anything we need to do here now when the request has been completed?
3976 */
3977 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3978 }
3979 else
3980 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3981 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3982 }
3983 else
3984 {
3985 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3986 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3987 }
3988 break;
3989 }
3990
3991 case GMMBALLOONACTION_RESET:
3992 {
3993 /* Reset to an empty balloon. */
3994 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3995
3996 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3997 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3998 break;
3999 }
4000
4001 default:
4002 rc = VERR_INVALID_PARAMETER;
4003 break;
4004 }
4005 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4006 }
4007 else
4008 rc = VERR_GMM_IS_NOT_SANE;
4009
4010 gmmR0MutexRelease(pGMM);
4011 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
4012 return rc;
4013}
4014
4015
4016/**
4017 * VMMR0 request wrapper for GMMR0BalloonedPages.
4018 *
4019 * @returns see GMMR0BalloonedPages.
4020 * @param pGVM The global (ring-0) VM structure.
4021 * @param idCpu The VCPU id.
4022 * @param pReq Pointer to the request packet.
4023 */
4024GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
4025{
4026 /*
4027 * Validate input and pass it on.
4028 */
4029 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4030 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
4031 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
4032 VERR_INVALID_PARAMETER);
4033
4034 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
4035}
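/* Illustrative sketch (not part of the original source): reporting an inflate
 * of cPages ballooned pages from ring-3. The structure fields match the checks
 * in the wrapper above; the header magic and VMMR0_DO_GMM_BALLOONED_PAGES
 * operation name are assumptions.
 * @code
 *      GMMBALLOONEDPAGESREQ Req;
 *      Req.Hdr.u32Magic    = SUPVMMR0REQHDR_MAGIC; // assumed request header magic
 *      Req.Hdr.cbReq       = sizeof(Req);
 *      Req.enmAction       = GMMBALLOONACTION_INFLATE;
 *      Req.cBalloonedPages = cPages;
 *      int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_BALLOONED_PAGES, 0, &Req.Hdr);
 * @endcode
 */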
4036
4037
4038/**
4039 * Return memory statistics for the hypervisor
4040 *
4041 * @returns VBox status code.
4042 * @param pReq Pointer to the request packet.
4043 */
4044GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
4045{
4046 /*
4047 * Validate input and pass it on.
4048 */
4049 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4050 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4051 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4052 VERR_INVALID_PARAMETER);
4053
4054 /*
4055 * Validate input and get the basics.
4056 */
4057 PGMM pGMM;
4058 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4059 pReq->cAllocPages = pGMM->cAllocatedPages;
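    /* Each chunk contributes GMM_CHUNK_SIZE / PAGE_SIZE pages (512 with 2 MiB
       chunks and 4 KiB host pages), so the free count is the total page
       capacity of all chunks minus what is currently allocated. */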
4060 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
4061 pReq->cBalloonedPages = pGMM->cBalloonedPages;
4062 pReq->cMaxPages = pGMM->cMaxPages;
4063 pReq->cSharedPages = pGMM->cDuplicatePages;
4064 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4065
4066 return VINF_SUCCESS;
4067}
4068
4069
4070/**
4071 * Return memory statistics for the VM
4072 *
4073 * @returns VBox status code.
4074 * @param pGVM The global (ring-0) VM structure.
4075 * @param idCpu Cpu id.
4076 * @param pReq Pointer to the request packet.
4077 *
4078 * @thread EMT(idCpu)
4079 */
4080GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
4081{
4082 /*
4083 * Validate input and pass it on.
4084 */
4085 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4086 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4087 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4088 VERR_INVALID_PARAMETER);
4089
4090 /*
4091 * Validate input and get the basics.
4092 */
4093 PGMM pGMM;
4094 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4095 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4096 if (RT_FAILURE(rc))
4097 return rc;
4098
4099 /*
4100 * Take the semaphore and do some more validations.
4101 */
4102 gmmR0MutexAcquire(pGMM);
4103 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4104 {
4105 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4106 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4107 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4108 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4109 }
4110 else
4111 rc = VERR_GMM_IS_NOT_SANE;
4112
4113 gmmR0MutexRelease(pGMM);
4114 LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4115 return rc;
4116}
4117
4118
4119/**
4120 * Worker for gmmR0UnmapChunk and gmmr0FreeChunk.
4121 *
4122 * Don't call this in legacy allocation mode!
4123 *
4124 * @returns VBox status code.
4125 * @param pGMM Pointer to the GMM instance data.
4126 * @param pGVM Pointer to the Global VM structure.
4127 * @param pChunk Pointer to the chunk to be unmapped.
4128 */
4129static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4130{
4131 RT_NOREF_PV(pGMM);
4132#ifdef GMM_WITH_LEGACY_MODE
4133 Assert(!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE));
4134#endif
4135
4136 /*
4137 * Find the mapping and try unmapping it.
4138 */
4139 uint32_t cMappings = pChunk->cMappingsX;
4140 for (uint32_t i = 0; i < cMappings; i++)
4141 {
4142 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4143 if (pChunk->paMappingsX[i].pGVM == pGVM)
4144 {
4145 /* unmap */
4146 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4147 if (RT_SUCCESS(rc))
4148 {
4149 /* update the record. */
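                /* (Order does not matter, so close the gap by moving the last
                    entry into the freed slot and shrink the count.) */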
4150 cMappings--;
4151 if (i < cMappings)
4152 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4153 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4154 pChunk->paMappingsX[cMappings].pGVM = NULL;
4155 Assert(pChunk->cMappingsX - 1U == cMappings);
4156 pChunk->cMappingsX = cMappings;
4157 }
4158
4159 return rc;
4160 }
4161 }
4162
4163 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4164 return VERR_GMM_CHUNK_NOT_MAPPED;
4165}
4166
4167
4168/**
4169 * Unmaps a chunk previously mapped into the address space of the current process.
4170 *
4171 * @returns VBox status code.
4172 * @param pGMM Pointer to the GMM instance data.
4173 * @param pGVM Pointer to the Global VM structure.
4174 * @param pChunk Pointer to the chunk to be unmapped.
4175 * @param fRelaxedSem Whether we can release the semaphore while doing the
4176 * mapping (@c true) or not.
4177 */
4178static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4179{
4180#ifdef GMM_WITH_LEGACY_MODE
4181 if (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4182 {
4183#endif
4184 /*
4185 * Lock the chunk and if possible leave the giant GMM lock.
4186 */
4187 GMMR0CHUNKMTXSTATE MtxState;
4188 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4189 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4190 if (RT_SUCCESS(rc))
4191 {
4192 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4193 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4194 }
4195 return rc;
4196#ifdef GMM_WITH_LEGACY_MODE
4197 }
4198
4199 if (pChunk->hGVM == pGVM->hSelf)
4200 return VINF_SUCCESS;
4201
4202 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4203 return VERR_GMM_CHUNK_NOT_MAPPED;
4204#endif
4205}
4206
4207
4208/**
4209 * Worker for gmmR0MapChunk.
4210 *
4211 * @returns VBox status code.
4212 * @param pGMM Pointer to the GMM instance data.
4213 * @param pGVM Pointer to the Global VM structure.
4214 * @param pChunk Pointer to the chunk to be mapped.
4215 * @param ppvR3 Where to store the ring-3 address of the mapping.
4216 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4217 * contain the address of the existing mapping.
4218 */
4219static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4220{
4221#ifdef GMM_WITH_LEGACY_MODE
4222 /*
4223 * If we're in legacy mode this is simple.
4224 */
4225 if (pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4226 {
4227 if (pChunk->hGVM != pGVM->hSelf)
4228 {
4229 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4230 return VERR_GMM_CHUNK_NOT_FOUND;
4231 }
4232
4233 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4234 return VINF_SUCCESS;
4235 }
4236#else
4237 RT_NOREF(pGMM);
4238#endif
4239
4240 /*
4241 * Check to see if the chunk is already mapped.
4242 */
4243 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4244 {
4245 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4246 if (pChunk->paMappingsX[i].pGVM == pGVM)
4247 {
4248 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4249 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4250#ifdef VBOX_WITH_PAGE_SHARING
4251 /* The ring-3 chunk cache can be out of sync; don't fail. */
4252 return VINF_SUCCESS;
4253#else
4254 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4255#endif
4256 }
4257 }
4258
4259 /*
4260 * Do the mapping.
4261 */
4262 RTR0MEMOBJ hMapObj;
4263 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4264 if (RT_SUCCESS(rc))
4265 {
4266 /* reallocate the array? assumes few users per chunk (usually one). */
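        /* (Growth pattern: capacity goes 1, 2, 3, 4 one entry at a time, then
            grows in steps of four - 8, 12, 16, ... - whenever the current
            count is a multiple of four.) */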
4267 unsigned iMapping = pChunk->cMappingsX;
4268 if ( iMapping <= 3
4269 || (iMapping & 3) == 0)
4270 {
4271 unsigned cNewSize = iMapping <= 3
4272 ? iMapping + 1
4273 : iMapping + 4;
4274 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4275 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4276 {
4277 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4278 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4279 }
4280
4281 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4282 if (RT_UNLIKELY(!pvMappings))
4283 {
4284 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4285 return VERR_NO_MEMORY;
4286 }
4287 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4288 }
4289
4290 /* insert new entry */
4291 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4292 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4293 Assert(pChunk->cMappingsX == iMapping);
4294 pChunk->cMappingsX = iMapping + 1;
4295
4296 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4297 }
4298
4299 return rc;
4300}
4301
4302
4303/**
4304 * Maps a chunk into the user address space of the current process.
4305 *
4306 * @returns VBox status code.
4307 * @param pGMM Pointer to the GMM instance data.
4308 * @param pGVM Pointer to the Global VM structure.
4309 * @param pChunk Pointer to the chunk to be mapped.
4310 * @param fRelaxedSem Whether we can release the semaphore while doing the
4311 * mapping (@c true) or not.
4312 * @param ppvR3 Where to store the ring-3 address of the mapping.
4313 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4314 * contain the address of the existing mapping.
4315 */
4316static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4317{
4318 /*
4319 * Take the chunk lock and leave the giant GMM lock when possible, then
4320 * call the worker function.
4321 */
4322 GMMR0CHUNKMTXSTATE MtxState;
4323 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4324 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4325 if (RT_SUCCESS(rc))
4326 {
4327 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4328 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4329 }
4330
4331 return rc;
4332}
4333
4334
4335
4336#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4337/**
4338 * Check if a chunk is mapped into the specified VM
4339 *
4340 * @returns mapped yes/no
4341 * @param pGMM Pointer to the GMM instance.
4342 * @param pGVM Pointer to the Global VM structure.
4343 * @param pChunk Pointer to the chunk to check for a mapping.
4344 * @param ppvR3 Where to store the ring-3 address of the mapping.
4345 */
4346static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4347{
4348 GMMR0CHUNKMTXSTATE MtxState;
4349 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4350 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4351 {
4352 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4353 if (pChunk->paMappingsX[i].pGVM == pGVM)
4354 {
4355 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4356 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4357 return true;
4358 }
4359 }
4360 *ppvR3 = NULL;
4361 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4362 return false;
4363}
4364#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4365
4366
4367/**
4368 * Map a chunk and/or unmap another chunk.
4369 *
4370 * The mapping and unmapping applies to the current process.
4371 *
4372 * This API does two things because it saves a kernel call per mapping
4373 * when the ring-3 mapping cache is full.
4374 *
4375 * @returns VBox status code.
4376 * @param pGVM The global (ring-0) VM structure.
4377 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4378 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4379 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4380 * @thread EMT ???
4381 */
4382GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4383{
4384 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4385 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4386
4387 /*
4388 * Validate input and get the basics.
4389 */
4390 PGMM pGMM;
4391 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4392 int rc = GVMMR0ValidateGVM(pGVM);
4393 if (RT_FAILURE(rc))
4394 return rc;
4395
4396 AssertCompile(NIL_GMM_CHUNKID == 0);
4397 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4398 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4399
4400 if ( idChunkMap == NIL_GMM_CHUNKID
4401 && idChunkUnmap == NIL_GMM_CHUNKID)
4402 return VERR_INVALID_PARAMETER;
4403
4404 if (idChunkMap != NIL_GMM_CHUNKID)
4405 {
4406 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4407 *ppvR3 = NIL_RTR3PTR;
4408 }
4409
4410 /*
4411 * Take the semaphore and do the work.
4412 *
4413 * The unmapping is done last since it's easier to undo a mapping than
4414 * an unmapping. The ring-3 mapping cache cannot be so big that it pushes
4415 * the user virtual address space to within a chunk of its limits, so no
4416 * problem here.
4417 */
4418 gmmR0MutexAcquire(pGMM);
4419 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4420 {
4421 PGMMCHUNK pMap = NULL;
4422 if (idChunkMap != NIL_GMM_CHUNKID)
4423 {
4424 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4425 if (RT_LIKELY(pMap))
4426 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4427 else
4428 {
4429 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4430 rc = VERR_GMM_CHUNK_NOT_FOUND;
4431 }
4432 }
4433/** @todo split this operation, the bail out might (theoretically) not be
4434 * entirely safe. */
4435
4436 if ( idChunkUnmap != NIL_GMM_CHUNKID
4437 && RT_SUCCESS(rc))
4438 {
4439 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4440 if (RT_LIKELY(pUnmap))
4441 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4442 else
4443 {
4444 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4445 rc = VERR_GMM_CHUNK_NOT_FOUND;
4446 }
4447
4448 if (RT_FAILURE(rc) && pMap)
4449 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4450 }
4451
4452 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4453 }
4454 else
4455 rc = VERR_GMM_IS_NOT_SANE;
4456 gmmR0MutexRelease(pGMM);
4457
4458 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4459 return rc;
4460}
4461
4462
4463/**
4464 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4465 *
4466 * @returns see GMMR0MapUnmapChunk.
4467 * @param pGVM The global (ring-0) VM structure.
4468 * @param pReq Pointer to the request packet.
4469 */
4470GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4471{
4472 /*
4473 * Validate input and pass it on.
4474 */
4475 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4476 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4477
4478 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4479}
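/* Illustrative sketch (not part of the original source): how the ring-3 chunk
 * mapping cache could use the combined request to map a new chunk while
 * evicting an old one in a single call. The header magic and the
 * VMMR0_DO_GMM_MAP_UNMAP_CHUNK operation name are assumptions.
 * @code
 *      GMMMAPUNMAPCHUNKREQ Req;
 *      Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;    // assumed request header magic
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.idChunkMap   = idNewChunk;              // NIL_GMM_CHUNKID if nothing to map
 *      Req.idChunkUnmap = idEvictedChunk;          // NIL_GMM_CHUNKID if nothing to unmap
 *      Req.pvR3         = NIL_RTR3PTR;             // receives the new ring-3 mapping address
 *      int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_MAP_UNMAP_CHUNK, 0, &Req.Hdr);
 * @endcode
 */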
4480
4481
4482/**
4483 * Legacy mode API for supplying pages.
4484 *
4485 * The specified user address points to an allocation-chunk-sized block that
4486 * will be locked down and used by the GMM when the GM asks for pages.
4487 *
4488 * @returns VBox status code.
4489 * @param pGVM The global (ring-0) VM structure.
4490 * @param idCpu The VCPU id.
4491 * @param pvR3 Pointer to the chunk size memory block to lock down.
4492 */
4493GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4494{
4495#ifdef GMM_WITH_LEGACY_MODE
4496 /*
4497 * Validate input and get the basics.
4498 */
4499 PGMM pGMM;
4500 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4501 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4502 if (RT_FAILURE(rc))
4503 return rc;
4504
4505 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4506 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4507
4508 if (!pGMM->fLegacyAllocationMode)
4509 {
4510 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4511 return VERR_NOT_SUPPORTED;
4512 }
4513
4514 /*
4515 * Lock the memory and add it as new chunk with our hGVM.
4516 * (The GMM locking is done inside gmmR0RegisterChunk.)
4517 */
4518 RTR0MEMOBJ hMemObj;
4519 rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4520 if (RT_SUCCESS(rc))
4521 {
4522 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_SEEDED, NULL);
4523 if (RT_SUCCESS(rc))
4524 gmmR0MutexRelease(pGMM);
4525 else
4526 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4527 }
4528
4529 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4530 return rc;
4531#else
4532 RT_NOREF(pGVM, idCpu, pvR3);
4533 return VERR_NOT_SUPPORTED;
4534#endif
4535}
4536
4537
4538#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
4539/**
4540 * Gets the ring-0 virtual address for the given page.
4541 *
4542 * This is used by PGM when IEM and such wants to access guest RAM from ring-0.
4543 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4544 * corresponding chunk will remain valid beyond the call (at least till the EMT
4545 * returns to ring-3).
4546 *
4547 * @returns VBox status code.
4548 * @param pGVM Pointer to the kernel-only VM instance data.
4549 * @param idPage The page ID.
4550 * @param ppv Where to store the address.
4551 * @thread EMT
4552 */
4553GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4554{
4555 *ppv = NULL;
4556 PGMM pGMM;
4557 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4558
4559 uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
4560
4561 /*
4562 * Start with the per-VM TLB.
4563 */
4564 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
4565
4566 PGMMPERVMCHUNKTLBE pTlbe = &pGVM->gmm.s.aChunkTlbEntries[GMMPERVM_CHUNKTLB_IDX(idChunk)];
4567 PGMMCHUNK pChunk = pTlbe->pChunk;
4568 if ( pChunk != NULL
4569 && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
4570 && pChunk->Core.Key == idChunk)
4571 pGVM->R0Stats.gmm.cChunkTlbHits++; /* hopefully this is a likely outcome */
4572 else
4573 {
4574 pGVM->R0Stats.gmm.cChunkTlbMisses++;
4575
4576 /*
4577 * Look it up in the chunk tree.
4578 */
4579 RTSpinlockAcquire(pGMM->hSpinLockTree);
4580 pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
4581 if (RT_LIKELY(pChunk))
4582 {
4583 pTlbe->idGeneration = pGMM->idFreeGeneration;
4584 RTSpinlockRelease(pGMM->hSpinLockTree);
4585 pTlbe->pChunk = pChunk;
4586 }
4587 else
4588 {
4589 RTSpinlockRelease(pGMM->hSpinLockTree);
4590 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4591 AssertMsgFailed(("idPage=%#x\n", idPage));
4592 return VERR_GMM_PAGE_NOT_FOUND;
4593 }
4594 }
4595
4596 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4597
4598 /*
4599 * Got a chunk, now validate the page ownership and calculate its address.
4600 */
4601 const GMMPAGE * const pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4602 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4603 && pPage->Private.hGVM == pGVM->hSelf)
4604 || GMM_PAGE_IS_SHARED(pPage)))
4605 {
4606 AssertPtr(pChunk->pbMapping);
4607 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4608 return VINF_SUCCESS;
4609 }
4610 AssertMsgFailed(("idPage=%#x is-private=%RTbool Private.hGVM=%u pGVM->hGVM=%u\n",
4611 idPage, GMM_PAGE_IS_PRIVATE(pPage), pPage->Private.hGVM, pGVM->hSelf));
4612 return VERR_GMM_NOT_PAGE_OWNER;
4613}
4614#endif /* !VBOX_WITH_LINEAR_HOST_PHYS_MEM */
4615
4616#ifdef VBOX_WITH_PAGE_SHARING
4617
4618# ifdef VBOX_STRICT
4619/**
4620 * For checksumming shared pages in strict builds.
4621 *
4622 * The purpose is making sure that a page doesn't change.
4623 *
4624 * @returns Checksum, 0 on failure.
4625 * @param pGMM The GMM instance data.
4626 * @param pGVM Pointer to the kernel-only VM instance data.
4627 * @param idPage The page ID.
4628 */
4629static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4630{
4631 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4632 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4633
4634 uint8_t *pbChunk;
4635 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4636 return 0;
4637 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4638
4639 return RTCrc32(pbPage, PAGE_SIZE);
4640}
4641# endif /* VBOX_STRICT */
4642
4643
4644/**
4645 * Calculates the module hash value.
4646 *
4647 * @returns Hash value.
4648 * @param pszModuleName The module name.
4649 * @param pszVersion The module version string.
4650 */
4651static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4652{
4653 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4654}
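/* Example (illustrative): for a module named "VBoxGuest.dll" at version
 * "6.1.0" the value is in effect the hash of the string "VBoxGuest.dll::6.1.0",
 * so a given name/version pair always lands in the same bucket of the global
 * shared module tree. */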
4655
4656
4657/**
4658 * Finds a global module.
4659 *
4660 * @returns Pointer to the global module on success, NULL if not found.
4661 * @param pGMM The GMM instance data.
4662 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4663 * @param cbModule The module size.
4664 * @param enmGuestOS The guest OS type.
4665 * @param cRegions The number of regions.
4666 * @param pszModuleName The module name.
4667 * @param pszVersion The module version.
4668 * @param paRegions The region descriptions.
4669 */
4670static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4671 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4672 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4673{
4674 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4675 pGblMod;
4676 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4677 {
4678 if (pGblMod->cbModule != cbModule)
4679 continue;
4680 if (pGblMod->enmGuestOS != enmGuestOS)
4681 continue;
4682 if (pGblMod->cRegions != cRegions)
4683 continue;
4684 if (strcmp(pGblMod->szName, pszModuleName))
4685 continue;
4686 if (strcmp(pGblMod->szVersion, pszVersion))
4687 continue;
4688
4689 uint32_t i;
4690 for (i = 0; i < cRegions; i++)
4691 {
4692 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4693 if (pGblMod->aRegions[i].off != off)
4694 break;
4695
4696 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4697 if (pGblMod->aRegions[i].cb != cb)
4698 break;
4699 }
4700
4701 if (i == cRegions)
4702 return pGblMod;
4703 }
4704
4705 return NULL;
4706}
4707
4708
4709/**
4710 * Creates a new global module.
4711 *
4712 * @returns VBox status code.
4713 * @param pGMM The GMM instance data.
4714 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4715 * @param cbModule The module size.
4716 * @param enmGuestOS The guest OS type.
4717 * @param cRegions The number of regions.
4718 * @param pszModuleName The module name.
4719 * @param pszVersion The module version.
4720 * @param paRegions The region descriptions.
4721 * @param ppGblMod Where to return the new module on success.
4722 */
4723static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4724 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4725 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4726{
4727 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4728 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4729 {
4730 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4731 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4732 }
4733
4734 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4735 if (!pGblMod)
4736 {
4737 Log(("gmmR0ShModNewGlobal: No memory\n"));
4738 return VERR_NO_MEMORY;
4739 }
4740
4741 pGblMod->Core.Key = uHash;
4742 pGblMod->cbModule = cbModule;
4743 pGblMod->cRegions = cRegions;
4744 pGblMod->cUsers = 1;
4745 pGblMod->enmGuestOS = enmGuestOS;
4746 strcpy(pGblMod->szName, pszModuleName);
4747 strcpy(pGblMod->szVersion, pszVersion);
4748
4749 for (uint32_t i = 0; i < cRegions; i++)
4750 {
4751 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4752 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4753 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4754 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4755 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4756 }
4757
4758 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4759 Assert(fInsert); NOREF(fInsert);
4760 pGMM->cShareableModules++;
4761
4762 *ppGblMod = pGblMod;
4763 return VINF_SUCCESS;
4764}
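/* Worked example (illustrative, assuming 4 KiB pages): a region descriptor
 * with GCRegionAddr 0x7fff1234 and cbRegion 0x1800 is stored with
 * off = 0x234 and cb = RT_ALIGN_32(0x1800 + 0x234, PAGE_SIZE) = 0x2000, i.e.
 * the page-aligned span covering the region. gmmR0ShModFindGlobal compares
 * exactly these rounded values when matching a registration against an
 * existing global module. */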
4765
4766
4767/**
4768 * Deletes a global module which is no longer referenced by anyone.
4769 *
4770 * @param pGMM The GMM instance data.
4771 * @param pGblMod The module to delete.
4772 */
4773static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4774{
4775 Assert(pGblMod->cUsers == 0);
4776 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4777
4778 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4779 Assert(pvTest == pGblMod); NOREF(pvTest);
4780 pGMM->cShareableModules--;
4781
4782 uint32_t i = pGblMod->cRegions;
4783 while (i-- > 0)
4784 {
4785 if (pGblMod->aRegions[i].paidPages)
4786 {
4787 /* We don't do anything to the pages as they are handled by the
4788 copy-on-write mechanism in PGM. */
4789 RTMemFree(pGblMod->aRegions[i].paidPages);
4790 pGblMod->aRegions[i].paidPages = NULL;
4791 }
4792 }
4793 RTMemFree(pGblMod);
4794}
4795
4796
4797static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4798 PGMMSHAREDMODULEPERVM *ppRecVM)
4799{
4800 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4801 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4802
4803 PGMMSHAREDMODULEPERVM pRecVM;
4804 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4805 if (!pRecVM)
4806 return VERR_NO_MEMORY;
4807
4808 pRecVM->Core.Key = GCBaseAddr;
4809 for (uint32_t i = 0; i < cRegions; i++)
4810 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4811
4812 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4813 Assert(fInsert); NOREF(fInsert);
4814 pGVM->gmm.s.Stats.cShareableModules++;
4815
4816 *ppRecVM = pRecVM;
4817 return VINF_SUCCESS;
4818}
4819
4820
4821static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4822{
4823 /*
4824 * Free the per-VM module.
4825 */
4826 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4827 pRecVM->pGlobalModule = NULL;
4828
4829 if (fRemove)
4830 {
4831 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4832 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4833 }
4834
4835 RTMemFree(pRecVM);
4836
4837 /*
4838 * Release the global module.
4839 * (In the registration bailout case, it might not be.)
4840 */
4841 if (pGblMod)
4842 {
4843 Assert(pGblMod->cUsers > 0);
4844 pGblMod->cUsers--;
4845 if (pGblMod->cUsers == 0)
4846 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4847 }
4848}
4849
4850#endif /* VBOX_WITH_PAGE_SHARING */
4851
4852/**
4853 * Registers a new shared module for the VM.
4854 *
4855 * @returns VBox status code.
4856 * @param pGVM The global (ring-0) VM structure.
4857 * @param idCpu The VCPU id.
4858 * @param enmGuestOS The guest OS type.
4859 * @param pszModuleName The module name.
4860 * @param pszVersion The module version.
4861 * @param GCPtrModBase The module base address.
4862 * @param cbModule The module size.
4863 * @param cRegions The number of shared region descriptors.
4864 * @param paRegions Pointer to an array of shared region(s).
4865 * @thread EMT(idCpu)
4866 */
4867GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4868 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4869 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4870{
4871#ifdef VBOX_WITH_PAGE_SHARING
4872 /*
4873 * Validate input and get the basics.
4874 *
4875 * Note! Turns out the module size does not necessarily match the size of the
4876 * regions. (iTunes on XP)
4877 */
4878 PGMM pGMM;
4879 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4880 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4881 if (RT_FAILURE(rc))
4882 return rc;
4883
4884 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4885 return VERR_GMM_TOO_MANY_REGIONS;
4886
4887 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4888 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4889
4890 uint32_t cbTotal = 0;
4891 for (uint32_t i = 0; i < cRegions; i++)
4892 {
4893 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4894 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4895
4896 cbTotal += paRegions[i].cbRegion;
4897 if (RT_UNLIKELY(cbTotal > _1G))
4898 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4899 }
4900
4901 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4902 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4903 return VERR_GMM_MODULE_NAME_TOO_LONG;
4904
4905 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4906 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4907 return VERR_GMM_MODULE_NAME_TOO_LONG;
4908
4909 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4910 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4911
4912 /*
4913 * Take the semaphore and do some more validations.
4914 */
4915 gmmR0MutexAcquire(pGMM);
4916 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4917 {
4918 /*
4919 * Check if this module is already locally registered and register
4920 * it if it isn't. The base address is a unique module identifier
4921 * locally.
4922 */
4923 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4924 bool fNewModule = pRecVM == NULL;
4925 if (fNewModule)
4926 {
4927 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4928 if (RT_SUCCESS(rc))
4929 {
4930 /*
4931 * Find a matching global module, register a new one if needed.
4932 */
4933 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4934 pszModuleName, pszVersion, paRegions);
4935 if (!pGblMod)
4936 {
4937 Assert(fNewModule);
4938 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4939 pszModuleName, pszVersion, paRegions, &pGblMod);
4940 if (RT_SUCCESS(rc))
4941 {
4942 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4943 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4944 }
4945 else
4946 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4947 }
4948 else
4949 {
4950 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4951 pGblMod->cUsers++;
4952 pRecVM->pGlobalModule = pGblMod;
4953
4954 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4955 }
4956 }
4957 }
4958 else
4959 {
4960 /*
4961 * Attempt to re-register an existing module.
4962 */
4963 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4964 pszModuleName, pszVersion, paRegions);
4965 if (pRecVM->pGlobalModule == pGblMod)
4966 {
4967 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4968 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4969 }
4970 else
4971 {
4972 /** @todo may have to unregister+register when this happens in case it's caused
4973 * by VBoxService crashing and being restarted... */
4974 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4975 " incoming at %RGvLB%#x %s %s rgns %u\n"
4976 " existing at %RGvLB%#x %s %s rgns %u\n",
4977 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4978 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4979 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4980 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4981 }
4982 }
4983 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4984 }
4985 else
4986 rc = VERR_GMM_IS_NOT_SANE;
4987
4988 gmmR0MutexRelease(pGMM);
4989 return rc;
4990#else
4991
4992 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4993 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4994 return VERR_NOT_IMPLEMENTED;
4995#endif
4996}
4997
4998
4999/**
5000 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
5001 *
5002 * @returns see GMMR0RegisterSharedModule.
5003 * @param pGVM The global (ring-0) VM structure.
5004 * @param idCpu The VCPU id.
5005 * @param pReq Pointer to the request packet.
5006 */
5007GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
5008{
5009 /*
5010 * Validate input and pass it on.
5011 */
5012 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5013 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
5014 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
5015 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5016
5017 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
5018 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
5019 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
5020 return VINF_SUCCESS;
5021}
5022
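/*
 * A minimal, illustrative sketch of how a ring-3 caller might size and fill
 * the request consumed by the wrapper above.  Only fields the wrapper
 * validates or forwards are shown; the request header magic, the actual
 * VMMR0 call and error handling are omitted, and the local variable names
 * are hypothetical.
 *
 * @code
 *    uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cRegions]);
 *    PGMMREGISTERSHAREDMODULEREQ pReq = (PGMMREGISTERSHAREDMODULEREQ)RTMemAllocZ(cbReq);
 *    pReq->Hdr.cbReq  = cbReq;            // must match the RT_UOFFSETOF_DYN check above
 *    pReq->enmGuestOS = enmGuestOS;
 *    pReq->GCBaseAddr = GCPtrModBase;
 *    pReq->cbModule   = cbModule;
 *    pReq->cRegions   = cRegions;
 *    RTStrCopy(pReq->szName, sizeof(pReq->szName), pszModuleName);
 *    RTStrCopy(pReq->szVersion, sizeof(pReq->szVersion), pszVersion);
 *    // ... copy the cRegions region descriptors into pReq->aRegions[] and
 *    //     submit the request ...
 * @endcode
 */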
5023
5024/**
5025 * Unregisters a shared module for the VM
5026 *
5027 * @returns VBox status code.
5028 * @param pGVM The global (ring-0) VM structure.
5029 * @param idCpu The VCPU id.
5030 * @param pszModuleName The module name.
5031 * @param pszVersion The module version.
5032 * @param GCPtrModBase The module base address.
5033 * @param cbModule The module size.
5034 */
5035GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
5036 RTGCPTR GCPtrModBase, uint32_t cbModule)
5037{
5038#ifdef VBOX_WITH_PAGE_SHARING
5039 /*
5040 * Validate input and get the basics.
5041 */
5042 PGMM pGMM;
5043 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5044 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5045 if (RT_FAILURE(rc))
5046 return rc;
5047
5048 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
5049 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
5050 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
5051 return VERR_GMM_MODULE_NAME_TOO_LONG;
5052 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
5053 return VERR_GMM_MODULE_NAME_TOO_LONG;
5054
5055 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
5056
5057 /*
5058 * Take the semaphore and do some more validations.
5059 */
5060 gmmR0MutexAcquire(pGMM);
5061 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5062 {
5063 /*
5064 * Locate and remove the specified module.
5065 */
5066 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
5067 if (pRecVM)
5068 {
5069 /** @todo Do we need to do more validations here, like that the
5070 * name + version + cbModule matches? */
5071 NOREF(cbModule);
5072 Assert(pRecVM->pGlobalModule);
5073 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
5074 }
5075 else
5076 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
5077
5078 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5079 }
5080 else
5081 rc = VERR_GMM_IS_NOT_SANE;
5082
5083 gmmR0MutexRelease(pGMM);
5084 return rc;
5085#else
5086
5087 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
5088 return VERR_NOT_IMPLEMENTED;
5089#endif
5090}
5091
5092
5093/**
5094 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
5095 *
5096 * @returns see GMMR0UnregisterSharedModule.
5097 * @param pGVM The global (ring-0) VM structure.
5098 * @param idCpu The VCPU id.
5099 * @param pReq Pointer to the request packet.
5100 */
5101GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
5102{
5103 /*
5104 * Validate input and pass it on.
5105 */
5106 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5107 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5108
5109 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
5110}
5111
5112#ifdef VBOX_WITH_PAGE_SHARING
5113
5114/**
5115 * Increases the use count of a shared page. The page is known to exist and be valid.
5116 *
5117 * @param pGMM Pointer to the GMM instance.
5118 * @param pGVM Pointer to the GVM instance.
5119 * @param pPage The page structure.
5120 */
5121DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
5122{
5123 Assert(pGMM->cSharedPages > 0);
5124 Assert(pGMM->cAllocatedPages > 0);
5125
5126 pGMM->cDuplicatePages++;
5127
5128 pPage->Shared.cRefs++;
5129 pGVM->gmm.s.Stats.cSharedPages++;
5130 pGVM->gmm.s.Stats.Allocated.cBasePages++;
5131}
5132
5133
5134/**
5135 * Converts a private page to a shared page. The page is known to exist and be valid.
5136 *
5137 * @param pGMM Pointer to the GMM instance.
5138 * @param pGVM Pointer to the GVM instance.
5139 * @param HCPhys Host physical address
5140 * @param idPage The Page ID
5141 * @param pPage The page structure.
5142 * @param pPageDesc Shared page descriptor
5143 */
5144DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
5145 PGMMSHAREDPAGEDESC pPageDesc)
5146{
5147 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
5148 Assert(pChunk);
5149 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
5150 Assert(GMM_PAGE_IS_PRIVATE(pPage));
5151
5152 pChunk->cPrivate--;
5153 pChunk->cShared++;
5154
5155 pGMM->cSharedPages++;
5156
5157 pGVM->gmm.s.Stats.cSharedPages++;
5158 pGVM->gmm.s.Stats.cPrivatePages--;
5159
5160 /* Modify the page structure. */
5161 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
5162 pPage->Shared.cRefs = 1;
5163#ifdef VBOX_STRICT
5164 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
5165 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5166#else
5167 NOREF(pPageDesc);
5168 pPage->Shared.u14Checksum = 0;
5169#endif
5170 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5171}
5172
5173
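/**
 * Worker for GMMR0SharedModuleCheckPage, handling a page seen for the first
 * time.
 *
 * The caller's private page is converted to a shared page and its ID is
 * recorded in the global region descriptor so later checks can find it.
 *
 * @returns VBox status code.
 * @param   pGMM            Pointer to the GMM instance.
 * @param   pGVM            Pointer to the GVM instance.
 * @param   pModule         The shared module (currently unused).
 * @param   idxRegion       The region index (assertions only).
 * @param   idxPage         The page index within the region.
 * @param   pPageDesc       The page descriptor (idPage, GCPhys, HCPhys on input).
 * @param   pGlobalRegion   The global region descriptor; paidPages[idxPage] is updated.
 */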
5174static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5175 unsigned idxRegion, unsigned idxPage,
5176 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5177{
5178 NOREF(pModule);
5179
5180 /* Easy case: just change the internal page type. */
5181 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5182 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5183 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5184 VERR_PGM_PHYS_INVALID_PAGE_ID);
5185 NOREF(idxRegion);
5186
5187    AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5188
5189 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5190
5191 /* Keep track of these references. */
5192 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5193
5194 return VINF_SUCCESS;
5195}
5196
5197/**
5198 * Checks a page in the specified shared module region for changes.
5199 *
5200 * Performs the following tasks:
5201 * - If a shared page is new, then it changes the GMM page type to shared and
5202 * returns it in the pPageDesc descriptor.
5203 * - If a shared page already exists, then it checks if the VM page is
5204 * identical and if so frees the VM page and returns the shared page in
5205 * pPageDesc descriptor.
5206 *
5207 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5208 *
5209 * @returns VBox status code.
5210 * @param pGVM Pointer to the GVM instance data.
5211 * @param pModule Module description
5212 * @param idxRegion Region index
5213 * @param idxPage Page index
5214 * @param pPageDesc Page descriptor
5215 */
5216GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5217 PGMMSHAREDPAGEDESC pPageDesc)
5218{
5219 int rc;
5220 PGMM pGMM;
5221 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5222 pPageDesc->u32StrictChecksum = 0;
5223
5224 AssertMsgReturn(idxRegion < pModule->cRegions,
5225 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5226 VERR_INVALID_PARAMETER);
5227
5228 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5229 AssertMsgReturn(idxPage < cPages,
5230                    ("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
5231 VERR_INVALID_PARAMETER);
5232
5233    LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5234
5235 /*
5236 * First time; create a page descriptor array.
5237 */
5238 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5239 if (!pGlobalRegion->paidPages)
5240 {
5241 Log(("Allocate page descriptor array for %d pages\n", cPages));
5242 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5243 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5244
5245 /* Invalidate all descriptors. */
5246 uint32_t i = cPages;
5247 while (i-- > 0)
5248 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5249 }
5250
5251 /*
5252 * We've seen this shared page for the first time?
5253 */
5254 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5255 {
5256 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5257 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5258 }
5259
5260 /*
5261 * We've seen it before...
5262 */
5263 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5264 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5265 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5266
5267 /*
5268 * Get the shared page source.
5269 */
5270 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5271    AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pGlobalRegion->paidPages[idxPage], idxRegion, idxPage),
5272 VERR_PGM_PHYS_INVALID_PAGE_ID);
5273
5274 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5275 {
5276 /*
5277 * Page was freed at some point; invalidate this entry.
5278 */
5279 /** @todo this isn't really bullet proof. */
5280 Log(("Old shared page was freed -> create a new one\n"));
5281 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5282 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5283 }
5284
5285    Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5286
5287 /*
5288 * Calculate the virtual address of the local page.
5289 */
5290 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5291 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5292 VERR_PGM_PHYS_INVALID_PAGE_ID);
5293
5294 uint8_t *pbChunk;
5295 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5296 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5297 VERR_PGM_PHYS_INVALID_PAGE_ID);
5298 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5299
5300 /*
5301 * Calculate the virtual address of the shared page.
5302 */
5303 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5304 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5305
5306 /*
5307 * Get the virtual address of the physical page; map the chunk into the VM
5308 * process if not already done.
5309 */
5310 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5311 {
5312 Log(("Map chunk into process!\n"));
5313 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5314 AssertRCReturn(rc, rc);
5315 }
5316 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5317
5318#ifdef VBOX_STRICT
5319 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5320 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5321 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5322 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5323 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5324#endif
5325
5326 /** @todo write ASMMemComparePage. */
5327 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5328 {
5329 Log(("Unexpected differences found between local and shared page; skip\n"));
5330 /* Signal to the caller that this one hasn't changed. */
5331 pPageDesc->idPage = NIL_GMM_PAGEID;
5332 return VINF_SUCCESS;
5333 }
5334
5335 /*
5336 * Free the old local page.
5337 */
5338 GMMFREEPAGEDESC PageDesc;
5339 PageDesc.idPage = pPageDesc->idPage;
5340 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5341 AssertRCReturn(rc, rc);
5342
5343 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5344
5345 /*
5346 * Pass along the new physical address & page id.
5347 */
5348 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5349 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5350
5351 return VINF_SUCCESS;
5352}
5353
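/*
 * A rough sketch of the caller-side pattern implied by the function above
 * (illustrative only; the real caller lives in PGM, and the descriptor setup
 * and remap step shown here are hypothetical):
 *
 * @code
 *    GMMSHAREDPAGEDESC PageDesc;
 *    PageDesc.idPage = idPage;       // the VM's current (private) page
 *    PageDesc.GCPhys = GCPhys;
 *    PageDesc.HCPhys = HCPhys;
 *    int rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
 *    if (RT_SUCCESS(rc) && PageDesc.idPage != NIL_GMM_PAGEID)
 *    {
 *        // The page is now backed by the shared copy; update the guest
 *        // physical page to use PageDesc.HCPhys / PageDesc.idPage.
 *    }
 *    // A NIL_GMM_PAGEID result means the contents differed and nothing changed.
 * @endcode
 */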
5354
5355/**
5356 * RTAvlGCPtrDestroy callback.
5357 *
5358 * @returns VINF_SUCCESS.
5359 * @param pNode The node to destroy.
5360 * @param pvArgs Pointer to an argument packet.
5361 */
5362static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5363{
5364 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5365 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5366 (PGMMSHAREDMODULEPERVM)pNode,
5367 false /*fRemove*/);
5368 return VINF_SUCCESS;
5369}
5370
5371
5372/**
5373 * Used by GMMR0CleanupVM to clean up shared modules.
5374 *
5375 * This is called without taking the GMM lock so that it can be yielded as
5376 * needed here.
5377 *
5378 * @param pGMM The GMM handle.
5379 * @param pGVM The global VM handle.
5380 */
5381static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5382{
5383 gmmR0MutexAcquire(pGMM);
5384 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5385
5386 GMMR0SHMODPERVMDTORARGS Args;
5387 Args.pGVM = pGVM;
5388 Args.pGMM = pGMM;
5389 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5390
5391 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5392 pGVM->gmm.s.Stats.cShareableModules = 0;
5393
5394 gmmR0MutexRelease(pGMM);
5395}
5396
5397#endif /* VBOX_WITH_PAGE_SHARING */
5398
5399/**
5400 * Removes all shared modules for the specified VM
5401 *
5402 * @returns VBox status code.
5403 * @param pGVM The global (ring-0) VM structure.
5404 * @param idCpu The VCPU id.
5405 */
5406GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5407{
5408#ifdef VBOX_WITH_PAGE_SHARING
5409 /*
5410 * Validate input and get the basics.
5411 */
5412 PGMM pGMM;
5413 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5414 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5415 if (RT_FAILURE(rc))
5416 return rc;
5417
5418 /*
5419 * Take the semaphore and do some more validations.
5420 */
5421 gmmR0MutexAcquire(pGMM);
5422 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5423 {
5424 Log(("GMMR0ResetSharedModules\n"));
5425 GMMR0SHMODPERVMDTORARGS Args;
5426 Args.pGVM = pGVM;
5427 Args.pGMM = pGMM;
5428 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5429 pGVM->gmm.s.Stats.cShareableModules = 0;
5430
5431 rc = VINF_SUCCESS;
5432 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5433 }
5434 else
5435 rc = VERR_GMM_IS_NOT_SANE;
5436
5437 gmmR0MutexRelease(pGMM);
5438 return rc;
5439#else
5440 RT_NOREF(pGVM, idCpu);
5441 return VERR_NOT_IMPLEMENTED;
5442#endif
5443}
5444
5445#ifdef VBOX_WITH_PAGE_SHARING
5446
5447/**
5448 * Tree enumeration callback for checking a shared module.
5449 */
5450static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5451{
5452 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5453 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5454 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5455
5456 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5457 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5458
5459 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5460 if (RT_FAILURE(rc))
5461 return rc;
5462 return VINF_SUCCESS;
5463}
5464
5465#endif /* VBOX_WITH_PAGE_SHARING */
5466
5467/**
5468 * Check all shared modules for the specified VM.
5469 *
5470 * @returns VBox status code.
5471 * @param pGVM The global (ring-0) VM structure.
5472 * @param idCpu The calling EMT number.
5473 * @thread EMT(idCpu)
5474 */
5475GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5476{
5477#ifdef VBOX_WITH_PAGE_SHARING
5478 /*
5479 * Validate input and get the basics.
5480 */
5481 PGMM pGMM;
5482 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5483 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5484 if (RT_FAILURE(rc))
5485 return rc;
5486
5487# ifndef DEBUG_sandervl
5488 /*
5489 * Take the semaphore and do some more validations.
5490 */
5491 gmmR0MutexAcquire(pGMM);
5492# endif
5493 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5494 {
5495 /*
5496 * Walk the tree, checking each module.
5497 */
5498 Log(("GMMR0CheckSharedModules\n"));
5499
5500 GMMCHECKSHAREDMODULEINFO Args;
5501 Args.pGVM = pGVM;
5502 Args.idCpu = idCpu;
5503 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5504
5505 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5506 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5507 }
5508 else
5509 rc = VERR_GMM_IS_NOT_SANE;
5510
5511# ifndef DEBUG_sandervl
5512 gmmR0MutexRelease(pGMM);
5513# endif
5514 return rc;
5515#else
5516 RT_NOREF(pGVM, idCpu);
5517 return VERR_NOT_IMPLEMENTED;
5518#endif
5519}
5520
5521#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5522
5523/**
5524 * Worker for GMMR0FindDuplicatePageReq.
5525 *
5526 * @returns true if duplicate, false if not.
5527 */
5528static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5529{
5530 bool fFoundDuplicate = false;
5531 /* Only take chunks not mapped into this VM process; not entirely correct. */
5532 uint8_t *pbChunk;
5533 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5534 {
5535 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5536 if (RT_SUCCESS(rc))
5537 {
5538 /*
5539 * Look for duplicate pages
5540 */
5541 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5542 while (iPage-- > 0)
5543 {
5544 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5545 {
5546 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5547 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5548 {
5549 fFoundDuplicate = true;
5550 break;
5551 }
5552 }
5553 }
5554 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5555 }
5556 }
5557 return fFoundDuplicate;
5558}
5559
5560
5561/**
5562 * Find a duplicate of the specified page in other active VMs
5563 *
5564 * @returns VBox status code.
5565 * @param pGVM The global (ring-0) VM structure.
5566 * @param pReq Pointer to the request packet.
5567 */
5568GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5569{
5570 /*
5571 * Validate input and pass it on.
5572 */
5573 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5574 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5575
5576 PGMM pGMM;
5577 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5578
5579 int rc = GVMMR0ValidateGVM(pGVM);
5580 if (RT_FAILURE(rc))
5581 return rc;
5582
5583 /*
5584 * Take the semaphore and do some more validations.
5585 */
5586 rc = gmmR0MutexAcquire(pGMM);
5587 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5588 {
5589 uint8_t *pbChunk;
5590 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5591 if (pChunk)
5592 {
5593 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5594 {
5595 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5596 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5597 if (pPage)
5598 {
5599 /*
5600 * Walk the chunks
5601 */
5602 pReq->fDuplicate = false;
5603 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5604 {
5605 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5606 {
5607 pReq->fDuplicate = true;
5608 break;
5609 }
5610 }
5611 }
5612 else
5613 {
5614 AssertFailed();
5615 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5616 }
5617 }
5618 else
5619 AssertFailed();
5620 }
5621 else
5622 AssertFailed();
5623 }
5624 else
5625 rc = VERR_GMM_IS_NOT_SANE;
5626
5627 gmmR0MutexRelease(pGMM);
5628 return rc;
5629}
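
/*
 * Illustrative only: a strict-build debugging caller would fill the request
 * along these lines before handing it to ring-0 (the request header magic and
 * the actual dispatch are omitted; idPage is whatever page is being probed):
 *
 * @code
 *    GMMFINDDUPLICATEPAGEREQ Req;
 *    Req.Hdr.cbReq  = sizeof(Req);
 *    Req.idPage     = idPage;
 *    Req.fDuplicate = false;
 *    // After a successful GMMR0FindDuplicatePageReq call, Req.fDuplicate
 *    // tells whether an identical private page exists in another chunk.
 * @endcode
 */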
5630
5631#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5632
5633
5634/**
5635 * Retrieves the GMM statistics visible to the caller.
5636 *
5637 * @returns VBox status code.
5638 *
5639 * @param pStats Where to put the statistics.
5640 * @param pSession The current session.
5641 * @param pGVM The GVM to obtain statistics for. Optional.
5642 */
5643GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5644{
5645    LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5646
5647 /*
5648 * Validate input.
5649 */
5650 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5651 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5652 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5653
5654 PGMM pGMM;
5655 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5656
5657 /*
5658 * Validate the VM handle, if not NULL, and lock the GMM.
5659 */
5660 int rc;
5661 if (pGVM)
5662 {
5663 rc = GVMMR0ValidateGVM(pGVM);
5664 if (RT_FAILURE(rc))
5665 return rc;
5666 }
5667
5668 rc = gmmR0MutexAcquire(pGMM);
5669 if (RT_FAILURE(rc))
5670 return rc;
5671
5672 /*
5673 * Copy out the GMM statistics.
5674 */
5675 pStats->cMaxPages = pGMM->cMaxPages;
5676 pStats->cReservedPages = pGMM->cReservedPages;
5677 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5678 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5679 pStats->cSharedPages = pGMM->cSharedPages;
5680 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5681 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5682 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5683 pStats->cChunks = pGMM->cChunks;
5684 pStats->cFreedChunks = pGMM->cFreedChunks;
5685 pStats->cShareableModules = pGMM->cShareableModules;
5686 pStats->idFreeGeneration = pGMM->idFreeGeneration;
5687 RT_ZERO(pStats->au64Reserved);
5688
5689 /*
5690 * Copy out the VM statistics.
5691 */
5692 if (pGVM)
5693 pStats->VMStats = pGVM->gmm.s.Stats;
5694 else
5695 RT_ZERO(pStats->VMStats);
5696
5697 gmmR0MutexRelease(pGMM);
5698 return rc;
5699}
5700
5701
5702/**
5703 * VMMR0 request wrapper for GMMR0QueryStatistics.
5704 *
5705 * @returns see GMMR0QueryStatistics.
5706 * @param pGVM The global (ring-0) VM structure. Optional.
5707 * @param pReq Pointer to the request packet.
5708 */
5709GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5710{
5711 /*
5712 * Validate input and pass it on.
5713 */
5714 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5715 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5716
5717 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5718}
5719
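/*
 * A minimal usage sketch (illustrative; assumes a valid session and, when
 * per-VM numbers are wanted, a valid GVM handle):
 *
 * @code
 *    GMMSTATS Stats;
 *    int rc = GMMR0QueryStatistics(&Stats, pSession, pGVM);  // pGVM may be NULL for global stats only
 *    if (RT_SUCCESS(rc))
 *    {
 *        // Global numbers: Stats.cMaxPages, Stats.cAllocatedPages,
 *        // Stats.cSharedPages, Stats.cBalloonedPages, Stats.cChunks, ...
 *        // Per-VM numbers (when pGVM was given): Stats.VMStats.
 *    }
 * @endcode
 */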
5720
5721/**
5722 * Resets the specified GMM statistics.
5723 *
5724 * @returns VBox status code.
5725 *
5726 * @param pStats Which statistics to reset, that is, non-zero fields
5727 * indicates which to reset.
5728 * @param pSession The current session.
5729 * @param pGVM The GVM to reset statistics for. Optional.
5730 */
5731GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5732{
5733 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5734    /* Currently there is nothing to reset. */
5735 return VINF_SUCCESS;
5736}
5737
5738
5739/**
5740 * VMMR0 request wrapper for GMMR0ResetStatistics.
5741 *
5742 * @returns see GMMR0ResetStatistics.
5743 * @param pGVM The global (ring-0) VM structure. Optional.
5744 * @param pReq Pointer to the request packet.
5745 */
5746GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5747{
5748 /*
5749 * Validate input and pass it on.
5750 */
5751 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5752 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5753
5754 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5755}
5756