VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@88839

Last change on this file since 88839 was 82992, checked in by vboxsync, 5 years ago

VMM/GMMR0: Added a per-VM chunk TLB to avoid having everyone hammer the global spinlock. [doxyfix] bugref:9627

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 200.8 KB
1/* $Id: GMMR0.cpp 82992 2020-02-05 12:16:50Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint, all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * to be set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
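 *
 * For illustration, a page ID can be split back into its two components by
 * inverting the formula above (a sketch, not code taken verbatim from this
 * file):
 * @code
 * uint32_t idChunk = idPage >> GMM_CHUNK_SHIFT;
 * uint32_t iPage   = idPage & ((GMM_CHUNK_SIZE >> PAGE_SHIFT) - 1);
 * @endcode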
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it'll cost 8-bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-15 free pages, the second covers 16-31, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
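 *
 * As a minimal sketch of the bucketing (illustrative only, the actual list
 * selection code may differ): with 256 pages per chunk and 16 lists, the
 * list index of a partially free chunk could be picked as
 * @code
 * unsigned iList = pChunk->cFree >> 4; // 1-15 -> 0, 16-31 -> 1, ...
 * @endcode
 * while completely free chunks are kept on a separate 'unused' list.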
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
99 * and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
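 *
 * A rough worked example with illustrative numbers only (64-bit host, 4 KB
 * pages): 8 bytes of GMMPAGE inside the chunk structure, about 8 bytes of
 * per-page RTR0MEMOBJ bookkeeping, plus the fixed per-chunk overhead divided
 * by the number of pages in the chunk, i.e. somewhere in the region of
 * 16-17 bytes of tracking data per guest page.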
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC; this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here, the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to do runtime adjustments of the
136 * config? And how to prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads), which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and the allocator interface will have to supply some
150 * NUMA info, and we'll need to have a way to calculate access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/spinlock.h>
180#include <iprt/string.h>
181#include <iprt/time.h>
182
183
184/*********************************************************************************************************************************
185* Defined Constants And Macros *
186*********************************************************************************************************************************/
187/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
188 * Use a critical section instead of a fast mutex for the giant GMM lock.
189 *
190 * @remarks This is primarily a way of avoiding the deadlock checks in the
191 * windows driver verifier. */
192#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
193# define VBOX_USE_CRIT_SECT_FOR_GIANT
194#endif
195
196#if (!defined(VBOX_WITH_RAM_IN_KERNEL) || defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)) \
197 && !defined(RT_OS_DARWIN)
198/** Enable the legacy mode code (will be dropped soon). */
199# define GMM_WITH_LEGACY_MODE
200#endif
201
202
203/*********************************************************************************************************************************
204* Structures and Typedefs *
205*********************************************************************************************************************************/
206/** Pointer to set of free chunks. */
207typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
208
209/**
210 * The per-page tracking structure employed by the GMM.
211 *
212 * On 32-bit hosts some trickery is necessary to compress all
213 * the information into 32 bits. When the fSharedFree member is set,
214 * the 30th bit decides whether it's a free page or not.
215 *
216 * Because of the different layout on 32-bit and 64-bit hosts, macros
217 * are used to get and set some of the data.
218 */
219typedef union GMMPAGE
220{
221#if HC_ARCH_BITS == 64
222 /** Unsigned integer view. */
223 uint64_t u;
224
225 /** The common view. */
226 struct GMMPAGECOMMON
227 {
228 uint32_t uStuff1 : 32;
229 uint32_t uStuff2 : 30;
230 /** The page state. */
231 uint32_t u2State : 2;
232 } Common;
233
234 /** The view of a private page. */
235 struct GMMPAGEPRIVATE
236 {
237 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
238 uint32_t pfn;
239 /** The GVM handle. (64K VMs) */
240 uint32_t hGVM : 16;
241 /** Reserved. */
242 uint32_t u16Reserved : 14;
243 /** The page state. */
244 uint32_t u2State : 2;
245 } Private;
246
247 /** The view of a shared page. */
248 struct GMMPAGESHARED
249 {
250 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
251 uint32_t pfn;
252 /** The reference count (64K VMs). */
253 uint32_t cRefs : 16;
254 /** Used for debug checksumming. */
255 uint32_t u14Checksum : 14;
256 /** The page state. */
257 uint32_t u2State : 2;
258 } Shared;
259
260 /** The view of a free page. */
261 struct GMMPAGEFREE
262 {
263 /** The index of the next page in the free list. UINT16_MAX is NIL. */
264 uint16_t iNext;
265 /** Reserved. Checksum or something? */
266 uint16_t u16Reserved0;
267 /** Reserved. Checksum or something? */
268 uint32_t u30Reserved1 : 30;
269 /** The page state. */
270 uint32_t u2State : 2;
271 } Free;
272
273#else /* 32-bit */
274 /** Unsigned integer view. */
275 uint32_t u;
276
277 /** The common view. */
278 struct GMMPAGECOMMON
279 {
280 uint32_t uStuff : 30;
281 /** The page state. */
282 uint32_t u2State : 2;
283 } Common;
284
285 /** The view of a private page. */
286 struct GMMPAGEPRIVATE
287 {
288 /** The guest page frame number. (Max addressable: 2 ^ 36) */
289 uint32_t pfn : 24;
290 /** The GVM handle. (127 VMs) */
291 uint32_t hGVM : 7;
292 /** The top page state bit, MBZ. */
293 uint32_t fZero : 1;
294 } Private;
295
296 /** The view of a shared page. */
297 struct GMMPAGESHARED
298 {
299 /** The reference count. */
300 uint32_t cRefs : 30;
301 /** The page state. */
302 uint32_t u2State : 2;
303 } Shared;
304
305 /** The view of a free page. */
306 struct GMMPAGEFREE
307 {
308 /** The index of the next page in the free list. UINT16_MAX is NIL. */
309 uint32_t iNext : 16;
310 /** Reserved. Checksum or something? */
311 uint32_t u14Reserved : 14;
312 /** The page state. */
313 uint32_t u2State : 2;
314 } Free;
315#endif
316} GMMPAGE;
317AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
318/** Pointer to a GMMPAGE. */
319typedef GMMPAGE *PGMMPAGE;
320
321
322/** @name The Page States.
323 * @{ */
324/** A private page. */
325#define GMM_PAGE_STATE_PRIVATE 0
326/** A private page - alternative value used on the 32-bit implementation.
327 * This will never be used on 64-bit hosts. */
328#define GMM_PAGE_STATE_PRIVATE_32 1
329/** A shared page. */
330#define GMM_PAGE_STATE_SHARED 2
331/** A free page. */
332#define GMM_PAGE_STATE_FREE 3
333/** @} */
334
335
336/** @def GMM_PAGE_IS_PRIVATE
337 *
338 * @returns true if private, false if not.
339 * @param pPage The GMM page.
340 */
341#if HC_ARCH_BITS == 64
342# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
343#else
344# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
345#endif
346
347/** @def GMM_PAGE_IS_SHARED
348 *
349 * @returns true if shared, false if not.
350 * @param pPage The GMM page.
351 */
352#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
353
354/** @def GMM_PAGE_IS_FREE
355 *
356 * @returns true if free, false if not.
357 * @param pPage The GMM page.
358 */
359#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
360
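/* A sketch (not code used elsewhere in this file) of classifying a page with
 * the state macros above; pPage is assumed to point into a chunk's aPages[]
 * array:
 *
 *     PGMMPAGE pPage = &pChunk->aPages[iPage];
 *     if (GMM_PAGE_IS_PRIVATE(pPage))
 *         hOwner = pPage->Private.hGVM;    // private: backing one VM page
 *     else if (GMM_PAGE_IS_SHARED(pPage))
 *         cRefs = pPage->Shared.cRefs;     // shared: reference counted, COW
 *     else
 *         Assert(GMM_PAGE_IS_FREE(pPage)); // free: chained via Free.iNext
 */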
361/** @def GMM_PAGE_PFN_LAST
362 * The last valid guest pfn range.
363 * @remark Some of the values outside the range have special meaning,
364 * see GMM_PAGE_PFN_UNSHAREABLE.
365 */
366#if HC_ARCH_BITS == 64
367# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
368#else
369# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
370#endif
371AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
372
373/** @def GMM_PAGE_PFN_UNSHAREABLE
374 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
375 */
376#if HC_ARCH_BITS == 64
377# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
378#else
379# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
380#endif
381AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
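/* As the two AssertCompile checks above indicate, the pfn fields hold guest
 * physical addresses shifted down by PAGE_SHIFT. A sketch of the conversion
 * (illustrative only):
 *
 *     pPage->Private.pfn = (uint32_t)(GCPhys >> PAGE_SHIFT);
 *     RTGCPHYS GCPhys2   = (RTGCPHYS)pPage->Private.pfn << PAGE_SHIFT;
 */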
382
383
384/**
385 * A GMM allocation chunk ring-3 mapping record.
386 *
387 * This should really be associated with a session and not a VM, but
388 * it's simpler to associate it with a VM and clean up when the VM object
389 * is destroyed.
390 */
391typedef struct GMMCHUNKMAP
392{
393 /** The mapping object. */
394 RTR0MEMOBJ hMapObj;
395 /** The VM owning the mapping. */
396 PGVM pGVM;
397} GMMCHUNKMAP;
398/** Pointer to a GMM allocation chunk mapping. */
399typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
400
401
402/**
403 * A GMM allocation chunk.
404 */
405typedef struct GMMCHUNK
406{
407 /** The AVL node core.
408 * The Key is the chunk ID. (Giant mtx.) */
409 AVLU32NODECORE Core;
410 /** The memory object.
411 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
412 * what the host can dish up with. (Chunk mtx protects mapping accesses
413 * and related frees.) */
414 RTR0MEMOBJ hMemObj;
415#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
416 /** Pointer to the kernel mapping. */
417 uint8_t *pbMapping;
418#endif
419 /** Pointer to the next chunk in the free list. (Giant mtx.) */
420 PGMMCHUNK pFreeNext;
421 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
422 PGMMCHUNK pFreePrev;
423 /** Pointer to the free set this chunk belongs to. NULL for
424 * chunks with no free pages. (Giant mtx.) */
425 PGMMCHUNKFREESET pSet;
426 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
427 RTLISTNODE ListNode;
428 /** Pointer to an array of mappings. (Chunk mtx.) */
429 PGMMCHUNKMAP paMappingsX;
430 /** The number of mappings. (Chunk mtx.) */
431 uint16_t cMappingsX;
432 /** The index of the chunk mutex this chunk is using. UINT8_MAX if nobody is
433 * mapping or freeing anything. (Giant mtx.) */
434 uint8_t volatile iChunkMtx;
435 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
436 uint8_t fFlags;
437 /** The head of the list of free pages. UINT16_MAX is the NIL value.
438 * (Giant mtx.) */
439 uint16_t iFreeHead;
440 /** The number of free pages. (Giant mtx.) */
441 uint16_t cFree;
442 /** The GVM handle of the VM that first allocated pages from this chunk; this
443 * is used as a preference when there are several chunks to choose from.
444 * When in bound memory mode this isn't a preference any longer. (Giant
445 * mtx.) */
446 uint16_t hGVM;
447 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
448 * future use.) (Giant mtx.) */
449 uint16_t idNumaNode;
450 /** The number of private pages. (Giant mtx.) */
451 uint16_t cPrivate;
452 /** The number of shared pages. (Giant mtx.) */
453 uint16_t cShared;
454 /** The pages. (Giant mtx.) */
455 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
456} GMMCHUNK;
457
458 /** Indicates that the NUMA properties of the memory are unknown. */
459#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
460
461/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
462 * @{ */
463/** Indicates that the chunk is a large page (2MB). */
464#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
465#ifdef GMM_WITH_LEGACY_MODE
466/** Indicates that the chunk was locked rather than allocated directly. */
467# define GMM_CHUNK_FLAGS_SEEDED UINT16_C(0x0002)
468#endif
469/** @} */
470
471
472/**
473 * An allocation chunk TLB entry.
474 */
475typedef struct GMMCHUNKTLBE
476{
477 /** The chunk id. */
478 uint32_t idChunk;
479 /** Pointer to the chunk. */
480 PGMMCHUNK pChunk;
481} GMMCHUNKTLBE;
482/** Pointer to an allocation chunk TLB entry. */
483typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
484
485
486/** The number of entries in the allocation chunk TLB. */
487#define GMM_CHUNKTLB_ENTRIES 32
488/** Gets the TLB entry index for the given Chunk ID. */
489#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
490
491/**
492 * An allocation chunk TLB.
493 */
494typedef struct GMMCHUNKTLB
495{
496 /** The TLB entries. */
497 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
498} GMMCHUNKTLB;
499/** Pointer to an allocation chunk TLB. */
500typedef GMMCHUNKTLB *PGMMCHUNKTLB;
501
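/* A minimal sketch of a chunk TLB lookup with an AVL tree fallback
 * (illustrative only; the real GMM lookup also takes the tree spinlock and,
 * since bugref:9627, a per-VM chunk TLB into account):
 *
 *     PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
 *     PGMMCHUNK     pChunk = pTlbe->pChunk;
 *     if (!pChunk || pTlbe->idChunk != idChunk)
 *     {
 *         pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
 *         if (pChunk)
 *         {
 *             pTlbe->idChunk = idChunk;
 *             pTlbe->pChunk  = pChunk;
 *         }
 *     }
 */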
502
503/**
504 * The GMM instance data.
505 */
506typedef struct GMM
507{
508 /** Magic / eye catcher. GMM_MAGIC */
509 uint32_t u32Magic;
510 /** The number of threads waiting on the mutex. */
511 uint32_t cMtxContenders;
512#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
513 /** The critical section protecting the GMM.
514 * More fine grained locking can be implemented later if necessary. */
515 RTCRITSECT GiantCritSect;
516#else
517 /** The fast mutex protecting the GMM.
518 * More fine grained locking can be implemented later if necessary. */
519 RTSEMFASTMUTEX hMtx;
520#endif
521#ifdef VBOX_STRICT
522 /** The current mutex owner. */
523 RTNATIVETHREAD hMtxOwner;
524#endif
525 /** Spinlock protecting the AVL tree.
526 * @todo Make this a read-write spinlock as we should allow concurrent
527 * lookups. */
528 RTSPINLOCK hSpinLockTree;
529 /** The chunk tree.
530 * Protected by hSpinLockTree. */
531 PAVLU32NODECORE pChunks;
532 /** Chunk freeing generation - incremented whenever a chunk is freed. Used
533 * for validating the per-VM chunk TLB entries. Valid range is 1 to 2^62
534 * (exclusive), though higher numbers may temporarily occur while
535 * invalidating the individual TLBs during wrap-around processing. */
536 uint64_t volatile idFreeGeneration;
537 /** The chunk TLB.
538 * Protected by hSpinLockTree. */
539 GMMCHUNKTLB ChunkTLB;
540 /** The private free set. */
541 GMMCHUNKFREESET PrivateX;
542 /** The shared free set. */
543 GMMCHUNKFREESET Shared;
544
545 /** Shared module tree (global).
546 * @todo separate trees for distinctly different guest OSes. */
547 PAVLLU32NODECORE pGlobalSharedModuleTree;
548 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
549 uint32_t cShareableModules;
550
551 /** The chunk list. For simplifying the cleanup process and avoiding tree
552 * traversal. */
553 RTLISTANCHOR ChunkList;
554
555 /** The maximum number of pages we're allowed to allocate.
556 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
557 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
558 uint64_t cMaxPages;
559 /** The number of pages that have been reserved.
560 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
561 uint64_t cReservedPages;
562 /** The number of pages that we have over-committed in reservations. */
563 uint64_t cOverCommittedPages;
564 /** The number of actually allocated (committed if you like) pages. */
565 uint64_t cAllocatedPages;
566 /** The number of pages that are shared. A subset of cAllocatedPages. */
567 uint64_t cSharedPages;
568 /** The number of pages that are actually shared between VMs. */
569 uint64_t cDuplicatePages;
570 /** The number of pages that are shared that have been left behind by
571 * VMs not doing proper cleanups. */
572 uint64_t cLeftBehindSharedPages;
573 /** The number of allocation chunks.
574 * (The number of pages we've allocated from the host can be derived from this.) */
575 uint32_t cChunks;
576 /** The number of current ballooned pages. */
577 uint64_t cBalloonedPages;
578
579#ifndef GMM_WITH_LEGACY_MODE
580# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
581 /** Whether #RTR0MemObjAllocPhysNC works. */
582 bool fHasWorkingAllocPhysNC;
583# else
584 bool fPadding;
585# endif
586#else
587 /** The legacy allocation mode indicator.
588 * This is determined at initialization time. */
589 bool fLegacyAllocationMode;
590#endif
591 /** The bound memory mode indicator.
592 * When set, the memory will be bound to a specific VM and never
593 * shared. This is always set if fLegacyAllocationMode is set.
594 * (Also determined at initialization time.) */
595 bool fBoundMemoryMode;
596 /** The number of registered VMs. */
597 uint16_t cRegisteredVMs;
598
599 /** The number of freed chunks ever. This is used as a list generation to
600 * avoid restarting the cleanup scanning when the list wasn't modified. */
601 uint32_t cFreedChunks;
602 /** The previously allocated Chunk ID.
603 * Used as a hint to avoid scanning the whole bitmap. */
604 uint32_t idChunkPrev;
605 /** Chunk ID allocation bitmap.
606 * Bits of allocated IDs are set, free ones are clear.
607 * The NIL id (0) is marked allocated. */
608 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
609
610 /** The index of the next mutex to use. */
611 uint32_t iNextChunkMtx;
612 /** Chunk locks for reducing lock contention without having to allocate
613 * one lock per chunk. */
614 struct
615 {
616 /** The mutex */
617 RTSEMFASTMUTEX hMtx;
618 /** The number of threads currently using this mutex. */
619 uint32_t volatile cUsers;
620 } aChunkMtx[64];
621} GMM;
622/** Pointer to the GMM instance. */
623typedef GMM *PGMM;
624
625/** The value of GMM::u32Magic (Katsuhiro Otomo). */
626#define GMM_MAGIC UINT32_C(0x19540414)
627
628
629/**
630 * GMM chunk mutex state.
631 *
632 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
633 * gmmR0ChunkMutex* methods.
634 */
635typedef struct GMMR0CHUNKMTXSTATE
636{
637 PGMM pGMM;
638 /** The index of the chunk mutex. */
639 uint8_t iChunkMtx;
640 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
641 uint8_t fFlags;
642} GMMR0CHUNKMTXSTATE;
643/** Pointer to a chunk mutex state. */
644typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
645
646/** @name GMMR0CHUNK_MTX_XXX
647 * @{ */
648#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
649#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
650#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
651#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
652#define GMMR0CHUNK_MTX_END UINT32_C(4)
653/** @} */
654
655
656/** The maximum number of shared modules per-vm. */
657#define GMM_MAX_SHARED_PER_VM_MODULES 2048
658/** The maximum number of shared modules GMM is allowed to track. */
659#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
660
661
662/**
663 * Argument packet for gmmR0SharedModuleCleanup.
664 */
665typedef struct GMMR0SHMODPERVMDTORARGS
666{
667 PGVM pGVM;
668 PGMM pGMM;
669} GMMR0SHMODPERVMDTORARGS;
670
671/**
672 * Argument packet for gmmR0CheckSharedModule.
673 */
674typedef struct GMMCHECKSHAREDMODULEINFO
675{
676 PGVM pGVM;
677 VMCPUID idCpu;
678} GMMCHECKSHAREDMODULEINFO;
679
680
681/*********************************************************************************************************************************
682* Global Variables *
683*********************************************************************************************************************************/
684/** Pointer to the GMM instance data. */
685static PGMM g_pGMM = NULL;
686
687/** Macro for obtaining and validating the g_pGMM pointer.
688 *
689 * On failure it will return from the invoking function with the specified
690 * return value.
691 *
692 * @param pGMM The name of the pGMM variable.
693 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
694 * status codes.
695 */
696#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
697 do { \
698 (pGMM) = g_pGMM; \
699 AssertPtrReturn((pGMM), (rc)); \
700 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
701 } while (0)
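/* Typical use at the top of a GMMR0 entry point (the same pattern appears in
 * GMMR0InitialReservation and other functions further down):
 *
 *     PGMM pGMM;
 *     GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
 */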
702
703/** Macro for obtaining and validating the g_pGMM pointer, void function
704 * variant.
705 *
706 * On failure it will return from the invoking function.
707 *
708 * @param pGMM The name of the pGMM variable.
709 */
710#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
711 do { \
712 (pGMM) = g_pGMM; \
713 AssertPtrReturnVoid((pGMM)); \
714 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
715 } while (0)
716
717
718/** @def GMM_CHECK_SANITY_UPON_ENTERING
719 * Checks the sanity of the GMM instance data before making changes.
720 *
721 * This macro is a stub by default and must be enabled manually in the code.
722 *
723 * @returns true if sane, false if not.
724 * @param pGMM The name of the pGMM variable.
725 */
726#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
727# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
728#else
729# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
730#endif
731
732/** @def GMM_CHECK_SANITY_UPON_LEAVING
733 * Checks the sanity of the GMM instance data after making changes.
734 *
735 * This macro is a stub by default and must be enabled manually in the code.
736 *
737 * @returns true if sane, false if not.
738 * @param pGMM The name of the pGMM variable.
739 */
740#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
741# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
742#else
743# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
744#endif
745
746/** @def GMM_CHECK_SANITY_IN_LOOPS
747 * Checks the sanity of the GMM instance in the allocation loops.
748 *
749 * This macro is a stub by default and must be enabled manually in the code.
750 *
751 * @returns true if sane, false if not.
752 * @param pGMM The name of the pGMM variable.
753 */
754#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
755# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
756#else
757# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
758#endif
759
760
761/*********************************************************************************************************************************
762* Internal Functions *
763*********************************************************************************************************************************/
764static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
765static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
766DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
767DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
768DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
769#ifdef GMMR0_WITH_SANITY_CHECK
770static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
771#endif
772static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
773DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
774DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
775static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
776#ifdef VBOX_WITH_PAGE_SHARING
777static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
778# ifdef VBOX_STRICT
779static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
780# endif
781#endif
782
783
784
785/**
786 * Initializes the GMM component.
787 *
788 * This is called when the VMMR0.r0 module is loaded and protected by the
789 * loader semaphore.
790 *
791 * @returns VBox status code.
792 */
793GMMR0DECL(int) GMMR0Init(void)
794{
795 LogFlow(("GMMInit:\n"));
796
797 /*
798 * Allocate the instance data and the locks.
799 */
800 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
801 if (!pGMM)
802 return VERR_NO_MEMORY;
803
804 pGMM->u32Magic = GMM_MAGIC;
805 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
806 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
807 RTListInit(&pGMM->ChunkList);
808 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
809
810#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
811 int rc = RTCritSectInit(&pGMM->GiantCritSect);
812#else
813 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
814#endif
815 if (RT_SUCCESS(rc))
816 {
817 unsigned iMtx;
818 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
819 {
820 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
821 if (RT_FAILURE(rc))
822 break;
823 }
824 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
825 if (RT_SUCCESS(rc))
826 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
827 if (RT_SUCCESS(rc))
828 {
829#ifndef GMM_WITH_LEGACY_MODE
830 /*
831 * Figure out how we're going to allocate stuff (only applicable to
832 * host with linear physical memory mappings).
833 */
834 pGMM->fBoundMemoryMode = false;
835# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
836 pGMM->fHasWorkingAllocPhysNC = false;
837
838 RTR0MEMOBJ hMemObj;
839 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
840 if (RT_SUCCESS(rc))
841 {
842 rc = RTR0MemObjFree(hMemObj, true);
843 AssertRC(rc);
844 pGMM->fHasWorkingAllocPhysNC = true;
845 }
846 else if (rc != VERR_NOT_SUPPORTED)
847 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
848# endif
849#else /* GMM_WITH_LEGACY_MODE */
850 /*
851 * Check and see if RTR0MemObjAllocPhysNC works.
852 */
853 # if 0 /* later, see @bugref{3170}. */
854 RTR0MEMOBJ MemObj;
855 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
856 if (RT_SUCCESS(rc))
857 {
858 rc = RTR0MemObjFree(MemObj, true);
859 AssertRC(rc);
860 }
861 else if (rc == VERR_NOT_SUPPORTED)
862 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
863 else
864 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
865# else
866# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
867 pGMM->fLegacyAllocationMode = false;
868# if ARCH_BITS == 32
869 /* Don't reuse possibly partial chunks because of the virtual
870 address space limitation. */
871 pGMM->fBoundMemoryMode = true;
872# else
873 pGMM->fBoundMemoryMode = false;
874# endif
875# else
876 pGMM->fLegacyAllocationMode = true;
877 pGMM->fBoundMemoryMode = true;
878# endif
879# endif
880#endif /* GMM_WITH_LEGACY_MODE */
881
882 /*
883 * Query system page count and guess a reasonable cMaxPages value.
884 */
885 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
886
887 /*
888 * The idFreeGeneration value should be set so we actually trigger the
889 * wrap-around invalidation handling during a typical test run.
890 */
891 pGMM->idFreeGeneration = UINT64_MAX / 4 - 128;
892
893 g_pGMM = pGMM;
894#ifdef GMM_WITH_LEGACY_MODE
895 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
896#elif defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
897 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
898#else
899 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
900#endif
901 return VINF_SUCCESS;
902 }
903
904 /*
905 * Bail out.
906 */
907 RTSpinlockDestroy(pGMM->hSpinLockTree);
908 while (iMtx-- > 0)
909 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
910#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
911 RTCritSectDelete(&pGMM->GiantCritSect);
912#else
913 RTSemFastMutexDestroy(pGMM->hMtx);
914#endif
915 }
916
917 pGMM->u32Magic = 0;
918 RTMemFree(pGMM);
919 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
920 return rc;
921}
922
923
924/**
925 * Terminates the GMM component.
926 */
927GMMR0DECL(void) GMMR0Term(void)
928{
929 LogFlow(("GMMTerm:\n"));
930
931 /*
932 * Take care / be paranoid...
933 */
934 PGMM pGMM = g_pGMM;
935 if (!VALID_PTR(pGMM))
936 return;
937 if (pGMM->u32Magic != GMM_MAGIC)
938 {
939 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
940 return;
941 }
942
943 /*
944 * Undo what init did and free all the resources we've acquired.
945 */
946 /* Destroy the fundamentals. */
947 g_pGMM = NULL;
948 pGMM->u32Magic = ~GMM_MAGIC;
949#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
950 RTCritSectDelete(&pGMM->GiantCritSect);
951#else
952 RTSemFastMutexDestroy(pGMM->hMtx);
953 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
954#endif
955 RTSpinlockDestroy(pGMM->hSpinLockTree);
956 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
957
958 /* Free any chunks still hanging around. */
959 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
960
961 /* Destroy the chunk locks. */
962 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
963 {
964 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
965 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
966 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
967 }
968
969 /* Finally the instance data itself. */
970 RTMemFree(pGMM);
971 LogFlow(("GMMTerm: done\n"));
972}
973
974
975/**
976 * RTAvlU32Destroy callback.
977 *
978 * @returns 0
979 * @param pNode The node to destroy.
980 * @param pvGMM The GMM handle.
981 */
982static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
983{
984 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
985
986 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
987 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
988 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
989
990 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
991 if (RT_FAILURE(rc))
992 {
993 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
994 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
995 AssertRC(rc);
996 }
997 pChunk->hMemObj = NIL_RTR0MEMOBJ;
998
999 RTMemFree(pChunk->paMappingsX);
1000 pChunk->paMappingsX = NULL;
1001
1002 RTMemFree(pChunk);
1003 NOREF(pvGMM);
1004 return 0;
1005}
1006
1007
1008/**
1009 * Initializes the per-VM data for the GMM.
1010 *
1011 * This is called from within the GVMM lock (from GVMMR0CreateVM)
1012 * and should only initialize the data members so GMMR0CleanupVM
1013 * can deal with them. We reserve no memory or anything here,
1014 * that's done later in GMMR0InitVM.
1015 *
1016 * @param pGVM Pointer to the Global VM structure.
1017 */
1018GMMR0DECL(int) GMMR0InitPerVMData(PGVM pGVM)
1019{
1020 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
1021
1022 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1023 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1024 pGVM->gmm.s.Stats.fMayAllocate = false;
1025
1026 pGVM->gmm.s.hChunkTlbSpinLock = NIL_RTSPINLOCK;
1027 int rc = RTSpinlockCreate(&pGVM->gmm.s.hChunkTlbSpinLock, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "per-vm-chunk-tlb");
1028 AssertRCReturn(rc, rc);
1029
1030 return VINF_SUCCESS;
1031}
1032
1033
1034/**
1035 * Acquires the GMM giant lock.
1036 *
1037 * @returns Assert status code from RTSemFastMutexRequest.
1038 * @param pGMM Pointer to the GMM instance.
1039 */
1040static int gmmR0MutexAcquire(PGMM pGMM)
1041{
1042 ASMAtomicIncU32(&pGMM->cMtxContenders);
1043#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1044 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
1045#else
1046 int rc = RTSemFastMutexRequest(pGMM->hMtx);
1047#endif
1048 ASMAtomicDecU32(&pGMM->cMtxContenders);
1049 AssertRC(rc);
1050#ifdef VBOX_STRICT
1051 pGMM->hMtxOwner = RTThreadNativeSelf();
1052#endif
1053 return rc;
1054}
1055
1056
1057/**
1058 * Releases the GMM giant lock.
1059 *
1060 * @returns Assert status code from RTSemFastMutexRequest.
1061 * @param pGMM Pointer to the GMM instance.
1062 */
1063static int gmmR0MutexRelease(PGMM pGMM)
1064{
1065#ifdef VBOX_STRICT
1066 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1067#endif
1068#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1069 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
1070#else
1071 int rc = RTSemFastMutexRelease(pGMM->hMtx);
1072 AssertRC(rc);
1073#endif
1074 return rc;
1075}
1076
1077
1078/**
1079 * Yields the GMM giant lock if there is contention and a certain minimum time
1080 * has elapsed since we took it.
1081 *
1082 * @returns @c true if the mutex was yielded, @c false if not.
1083 * @param pGMM Pointer to the GMM instance.
1084 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1085 * (in/out).
1086 */
1087static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1088{
1089 /*
1090 * If nobody is contending the mutex, don't bother checking the time.
1091 */
1092 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1093 return false;
1094
1095 /*
1096 * Don't yield if we haven't executed for at least 2 milliseconds.
1097 */
1098 uint64_t uNanoNow = RTTimeSystemNanoTS();
1099 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1100 return false;
1101
1102 /*
1103 * Yield the mutex.
1104 */
1105#ifdef VBOX_STRICT
1106 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1107#endif
1108 ASMAtomicIncU32(&pGMM->cMtxContenders);
1109#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1110 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1111#else
1112 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1113#endif
1114
1115 RTThreadYield();
1116
1117#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1118 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1119#else
1120 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1121#endif
1122 *puLockNanoTS = RTTimeSystemNanoTS();
1123 ASMAtomicDecU32(&pGMM->cMtxContenders);
1124#ifdef VBOX_STRICT
1125 pGMM->hMtxOwner = RTThreadNativeSelf();
1126#endif
1127
1128 return true;
1129}
1130
1131
1132/**
1133 * Acquires a chunk lock.
1134 *
1135 * The caller must own the giant lock.
1136 *
1137 * @returns Assert status code from RTSemFastMutexRequest.
1138 * @param pMtxState The chunk mutex state info. (Avoids
1139 * passing the same flags and stuff around
1140 * for subsequent release and drop-giant
1141 * calls.)
1142 * @param pGMM Pointer to the GMM instance.
1143 * @param pChunk Pointer to the chunk.
1144 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1145 */
1146static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1147{
1148 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1149 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1150
1151 pMtxState->pGMM = pGMM;
1152 pMtxState->fFlags = (uint8_t)fFlags;
1153
1154 /*
1155 * Get the lock index and reference the lock.
1156 */
1157 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1158 uint32_t iChunkMtx = pChunk->iChunkMtx;
1159 if (iChunkMtx == UINT8_MAX)
1160 {
1161 iChunkMtx = pGMM->iNextChunkMtx++;
1162 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1163
1164 /* Try get an unused one... */
1165 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1166 {
1167 iChunkMtx = pGMM->iNextChunkMtx++;
1168 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1169 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1170 {
1171 iChunkMtx = pGMM->iNextChunkMtx++;
1172 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1173 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1174 {
1175 iChunkMtx = pGMM->iNextChunkMtx++;
1176 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1177 }
1178 }
1179 }
1180
1181 pChunk->iChunkMtx = iChunkMtx;
1182 }
1183 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1184 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1185 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1186
1187 /*
1188 * Drop the giant?
1189 */
1190 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1191 {
1192 /** @todo GMM life cycle cleanup (we may race someone
1193 * destroying and cleaning up GMM)? */
1194 gmmR0MutexRelease(pGMM);
1195 }
1196
1197 /*
1198 * Take the chunk mutex.
1199 */
1200 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1201 AssertRC(rc);
1202 return rc;
1203}
1204
1205
1206/**
1207 * Releases a chunk lock acquired with gmmR0ChunkMutexAcquire.
1208 *
1209 * @returns Assert status code from RTSemFastMutexRequest.
1210 * @param pMtxState Pointer to the chunk mutex state.
1211 * @param pChunk Pointer to the chunk if it's still
1212 * alive, NULL if it isn't. This is used to deassociate
1213 * the chunk from the mutex on the way out so a new one
1214 * can be selected next time, thus avoiding contended
1215 * mutexes.
1216 */
1217static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1218{
1219 PGMM pGMM = pMtxState->pGMM;
1220
1221 /*
1222 * Release the chunk mutex and reacquire the giant if requested.
1223 */
1224 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1225 AssertRC(rc);
1226 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1227 rc = gmmR0MutexAcquire(pGMM);
1228 else
1229 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1230
1231 /*
1232 * Drop the chunk mutex user reference and deassociate it from the chunk
1233 * when possible.
1234 */
1235 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1236 && pChunk
1237 && RT_SUCCESS(rc) )
1238 {
1239 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1240 pChunk->iChunkMtx = UINT8_MAX;
1241 else
1242 {
1243 rc = gmmR0MutexAcquire(pGMM);
1244 if (RT_SUCCESS(rc))
1245 {
1246 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1247 pChunk->iChunkMtx = UINT8_MAX;
1248 rc = gmmR0MutexRelease(pGMM);
1249 }
1250 }
1251 }
1252
1253 pMtxState->pGMM = NULL;
1254 return rc;
1255}
1256
1257
1258/**
1259 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1260 * chunk locked.
1261 *
1262 * This only works if gmmR0ChunkMutexAcquire was called with
1263 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1264 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1265 *
1266 * @returns VBox status code (assuming success is ok).
1267 * @param pMtxState Pointer to the chunk mutex state.
1268 */
1269static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1270{
1271 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1272 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1273 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1274 /** @todo GMM life cycle cleanup (we may race someone
1275 * destroying and cleaning up GMM)? */
1276 return gmmR0MutexRelease(pMtxState->pGMM);
1277}
1278
1279
1280/**
1281 * For experimenting with NUMA affinity and such.
1282 *
1283 * @returns The current NUMA Node ID.
1284 */
1285static uint16_t gmmR0GetCurrentNumaNodeId(void)
1286{
1287#if 1
1288 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1289#else
1290 return RTMpCpuId() / 16;
1291#endif
1292}
1293
1294
1295
1296/**
1297 * Cleans up when a VM is terminating.
1298 *
1299 * @param pGVM Pointer to the Global VM structure.
1300 */
1301GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1302{
1303 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1304
1305 PGMM pGMM;
1306 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1307
1308#ifdef VBOX_WITH_PAGE_SHARING
1309 /*
1310 * Clean up all registered shared modules first.
1311 */
1312 gmmR0SharedModuleCleanup(pGMM, pGVM);
1313#endif
1314
1315 gmmR0MutexAcquire(pGMM);
1316 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1317 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1318
1319 /*
1320 * The policy is 'INVALID' until the initial reservation
1321 * request has been serviced.
1322 */
1323 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1324 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1325 {
1326 /*
1327 * If it's the last VM around, we can skip walking all the chunks looking
1328 * for the pages owned by this VM and instead flush the whole shebang.
1329 *
1330 * This takes care of the eventuality that a VM has left shared page
1331 * references behind (shouldn't happen of course, but you never know).
1332 */
1333 Assert(pGMM->cRegisteredVMs);
1334 pGMM->cRegisteredVMs--;
1335
1336 /*
1337 * Walk the entire pool looking for pages that belong to this VM
1338 * and leftover mappings. (This'll only catch private pages,
1339 * shared pages will be 'left behind'.)
1340 */
1341 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1342 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1343
1344 unsigned iCountDown = 64;
1345 bool fRedoFromStart;
1346 PGMMCHUNK pChunk;
1347 do
1348 {
1349 fRedoFromStart = false;
1350 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1351 {
1352 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1353 if ( ( !pGMM->fBoundMemoryMode
1354 || pChunk->hGVM == pGVM->hSelf)
1355 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1356 {
1357 /* We left the giant mutex, so reset the yield counters. */
1358 uLockNanoTS = RTTimeSystemNanoTS();
1359 iCountDown = 64;
1360 }
1361 else
1362 {
1363 /* Didn't leave it, so do normal yielding. */
1364 if (!iCountDown)
1365 gmmR0MutexYield(pGMM, &uLockNanoTS);
1366 else
1367 iCountDown--;
1368 }
1369 if (pGMM->cFreedChunks != cFreeChunksOld)
1370 {
1371 fRedoFromStart = true;
1372 break;
1373 }
1374 }
1375 } while (fRedoFromStart);
1376
1377 if (pGVM->gmm.s.Stats.cPrivatePages)
1378 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1379
1380 pGMM->cAllocatedPages -= cPrivatePages;
1381
1382 /*
1383 * Free empty chunks.
1384 */
1385 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1386 do
1387 {
1388 fRedoFromStart = false;
1389 iCountDown = 10240;
1390 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1391 while (pChunk)
1392 {
1393 PGMMCHUNK pNext = pChunk->pFreeNext;
1394 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1395 if ( !pGMM->fBoundMemoryMode
1396 || pChunk->hGVM == pGVM->hSelf)
1397 {
1398 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1399 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1400 {
1401 /* We've left the giant mutex, restart? (+1 for our unlink) */
1402 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1403 if (fRedoFromStart)
1404 break;
1405 uLockNanoTS = RTTimeSystemNanoTS();
1406 iCountDown = 10240;
1407 }
1408 }
1409
1410 /* Advance and maybe yield the lock. */
1411 pChunk = pNext;
1412 if (--iCountDown == 0)
1413 {
1414 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1415 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1416 && pPrivateSet->idGeneration != idGenerationOld;
1417 if (fRedoFromStart)
1418 break;
1419 iCountDown = 10240;
1420 }
1421 }
1422 } while (fRedoFromStart);
1423
1424 /*
1425 * Account for shared pages that weren't freed.
1426 */
1427 if (pGVM->gmm.s.Stats.cSharedPages)
1428 {
1429 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1430 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1431 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1432 }
1433
1434 /*
1435 * Clean up balloon statistics in case the VM process crashed.
1436 */
1437 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1438 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1439
1440 /*
1441 * Update the over-commitment management statistics.
1442 */
1443 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1444 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1445 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1446 switch (pGVM->gmm.s.Stats.enmPolicy)
1447 {
1448 case GMMOCPOLICY_NO_OC:
1449 break;
1450 default:
1451 /** @todo Update GMM->cOverCommittedPages */
1452 break;
1453 }
1454 }
1455
1456 /* zap the GVM data. */
1457 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1458 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1459 pGVM->gmm.s.Stats.fMayAllocate = false;
1460
1461 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1462 gmmR0MutexRelease(pGMM);
1463
1464 /*
1465 * Destroy the spinlock.
1466 */
1467 RTSPINLOCK hSpinlock = NIL_RTSPINLOCK;
1468 ASMAtomicXchgHandle(&pGVM->gmm.s.hChunkTlbSpinLock, NIL_RTSPINLOCK, &hSpinlock);
1469 RTSpinlockDestroy(hSpinlock);
1470
1471 LogFlow(("GMMR0CleanupVM: returns\n"));
1472}
1473
1474
1475/**
1476 * Scan one chunk for private pages belonging to the specified VM.
1477 *
1478 * @note This function may drop the giant mutex!
1479 *
1480 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1481 * we didn't.
1482 * @param pGMM Pointer to the GMM instance.
1483 * @param pGVM The global VM handle.
1484 * @param pChunk The chunk to scan.
1485 */
1486static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1487{
1488 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1489
1490 /*
1491 * Look for pages belonging to the VM.
1492 * (Perform some internal checks while we're scanning.)
1493 */
1494#ifndef VBOX_STRICT
1495 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1496#endif
1497 {
1498 unsigned cPrivate = 0;
1499 unsigned cShared = 0;
1500 unsigned cFree = 0;
1501
1502 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1503
1504 uint16_t hGVM = pGVM->hSelf;
1505 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1506 while (iPage-- > 0)
1507 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1508 {
1509 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1510 {
1511 /*
1512 * Free the page.
1513 *
1514 * The reason for not using gmmR0FreePrivatePage here is that we
1515 * must *not* cause the chunk to be freed from under us - we're in
1516 * an AVL tree walk here.
1517 */
1518 pChunk->aPages[iPage].u = 0;
1519 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1520 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1521 pChunk->iFreeHead = iPage;
1522 pChunk->cPrivate--;
1523 pChunk->cFree++;
1524 pGVM->gmm.s.Stats.cPrivatePages--;
1525 cFree++;
1526 }
1527 else
1528 cPrivate++;
1529 }
1530 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1531 cFree++;
1532 else
1533 cShared++;
1534
1535 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1536
1537 /*
1538 * Did it add up?
1539 */
1540 if (RT_UNLIKELY( pChunk->cFree != cFree
1541 || pChunk->cPrivate != cPrivate
1542 || pChunk->cShared != cShared))
1543 {
1544 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1545 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1546 pChunk->cFree = cFree;
1547 pChunk->cPrivate = cPrivate;
1548 pChunk->cShared = cShared;
1549 }
1550 }
1551
1552 /*
1553 * If not in bound memory mode, we should reset the hGVM field
1554 * if it has our handle in it.
1555 */
1556 if (pChunk->hGVM == pGVM->hSelf)
1557 {
1558 if (!g_pGMM->fBoundMemoryMode)
1559 pChunk->hGVM = NIL_GVM_HANDLE;
1560 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1561 {
1562 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1563 pChunk, pChunk->Core.Key, pChunk->cFree);
1564 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1565
1566 gmmR0UnlinkChunk(pChunk);
1567 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1568 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1569 }
1570 }
1571
1572 /*
1573 * Look for a mapping belonging to the terminating VM.
1574 */
1575 GMMR0CHUNKMTXSTATE MtxState;
1576 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1577 unsigned cMappings = pChunk->cMappingsX;
1578 for (unsigned i = 0; i < cMappings; i++)
1579 if (pChunk->paMappingsX[i].pGVM == pGVM)
1580 {
1581 gmmR0ChunkMutexDropGiant(&MtxState);
1582
1583 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1584
1585 cMappings--;
1586 if (i < cMappings)
1587 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1588 pChunk->paMappingsX[cMappings].pGVM = NULL;
1589 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1590 Assert(pChunk->cMappingsX - 1U == cMappings);
1591 pChunk->cMappingsX = cMappings;
1592
1593 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1594 if (RT_FAILURE(rc))
1595 {
1596 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1597 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1598 AssertRC(rc);
1599 }
1600
1601 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1602 return true;
1603 }
1604
1605 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1606 return false;
1607}
1608
1609
1610/**
1611 * The initial resource reservations.
1612 *
1613 * This will make memory reservations according to policy and priority. If there aren't
1614 * sufficient resources available to sustain the VM this function will fail and all
1615 * future allocation requests will fail as well.
1616 *
1617 * These are just the initial reservations made very very early during the VM creation
1618 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1619 * ring-3 init has completed.
1620 *
1621 * @returns VBox status code.
1622 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1623 * @retval VERR_GMM_
1624 *
1625 * @param pGVM The global (ring-0) VM structure.
1626 * @param idCpu The VCPU id - must be zero.
1627 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1628 * This does not include MMIO2 and similar.
1629 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1630 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1631 * hyper heap, MMIO2 and similar.
1632 * @param enmPolicy The OC policy to use on this VM.
1633 * @param enmPriority The priority in an out-of-memory situation.
1634 *
1635 * @thread The creator thread / EMT(0).
1636 */
1637GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1638 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1639{
1640 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1641 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1642
1643 /*
1644 * Validate, get basics and take the semaphore.
1645 */
1646 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1647 PGMM pGMM;
1648 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1649 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1650 if (RT_FAILURE(rc))
1651 return rc;
1652
1653 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1654 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1655 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1656 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1657 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1658
1659 gmmR0MutexAcquire(pGMM);
1660 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1661 {
1662 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1663 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1664 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1665 {
1666 /*
1667 * Check if we can accommodate this.
1668 */
1669 /* ... later ... */
1670 if (RT_SUCCESS(rc))
1671 {
1672 /*
1673 * Update the records.
1674 */
1675 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1676 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1677 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1678 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1679 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1680 pGVM->gmm.s.Stats.fMayAllocate = true;
1681
1682 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1683 pGMM->cRegisteredVMs++;
1684 }
1685 }
1686 else
1687 rc = VERR_WRONG_ORDER;
1688 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1689 }
1690 else
1691 rc = VERR_GMM_IS_NOT_SANE;
1692 gmmR0MutexRelease(pGMM);
1693 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1694 return rc;
1695}
1696
1697
1698/**
1699 * VMMR0 request wrapper for GMMR0InitialReservation.
1700 *
1701 * @returns see GMMR0InitialReservation.
1702 * @param pGVM The global (ring-0) VM structure.
1703 * @param idCpu The VCPU id.
1704 * @param pReq Pointer to the request packet.
1705 */
1706GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1707{
1708 /*
1709 * Validate input and pass it on.
1710 */
1711 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1712 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1713 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1714
1715 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1716 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1717}
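/*
 * Illustrative caller-side sketch (hypothetical; the exact ring-3 dispatch
 * path is not shown in this file).  It only demonstrates how the fields
 * validated by the wrapper above are expected to be filled in; the request
 * header magic constant is an assumed convention.
 *
 *   GMMINITIALRESERVATIONREQ Req;
 *   Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;   // assumed request header convention
 *   Req.Hdr.cbReq    = sizeof(Req);            // must match the check above
 *   Req.cBasePages   = cbRam >> PAGE_SHIFT;    // base RAM + ROMs, no MMIO2
 *   Req.cShadowPages = cShadowPages;           // shadow paging structures
 *   Req.cFixedPages  = cFixedPages;            // hyper heap, MMIO2 and similar
 *   Req.enmPolicy    = enmPolicy;              // in (GMMOCPOLICY_INVALID, GMMOCPOLICY_END)
 *   Req.enmPriority  = enmPriority;            // in (GMMPRIORITY_INVALID, GMMPRIORITY_END)
 */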
1718
1719
1720/**
1721 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1722 *
1723 * @returns VBox status code.
1724 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1725 *
1726 * @param pGVM The global (ring-0) VM structure.
1727 * @param idCpu The VCPU id.
1728 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1729 * This does not include MMIO2 and similar.
1730 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1731 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1732 * hyper heap, MMIO2 and similar.
1733 *
1734 * @thread EMT(idCpu)
1735 */
1736GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1737 uint32_t cShadowPages, uint32_t cFixedPages)
1738{
1739 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1740 pGVM, cBasePages, cShadowPages, cFixedPages));
1741
1742 /*
1743 * Validate, get basics and take the semaphore.
1744 */
1745 PGMM pGMM;
1746 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1747 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1748 if (RT_FAILURE(rc))
1749 return rc;
1750
1751 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1752 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1753 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1754
1755 gmmR0MutexAcquire(pGMM);
1756 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1757 {
1758 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1759 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1760 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1761 {
1762 /*
1763 * Check if we can accommodate this.
1764 */
1765 /* ... later ... */
1766 if (RT_SUCCESS(rc))
1767 {
1768 /*
1769 * Update the records.
1770 */
1771 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1772 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1773 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1774 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1775
1776 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1777 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1778 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1779 }
1780 }
1781 else
1782 rc = VERR_WRONG_ORDER;
1783 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1784 }
1785 else
1786 rc = VERR_GMM_IS_NOT_SANE;
1787 gmmR0MutexRelease(pGMM);
1788 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1789 return rc;
1790}
1791
1792
1793/**
1794 * VMMR0 request wrapper for GMMR0UpdateReservation.
1795 *
1796 * @returns see GMMR0UpdateReservation.
1797 * @param pGVM The global (ring-0) VM structure.
1798 * @param idCpu The VCPU id.
1799 * @param pReq Pointer to the request packet.
1800 */
1801GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1802{
1803 /*
1804 * Validate input and pass it on.
1805 */
1806 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1807 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1808
1809 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1810}
1811
1812#ifdef GMMR0_WITH_SANITY_CHECK
1813
1814/**
1815 * Performs sanity checks on a free set.
1816 *
1817 * @returns Error count.
1818 *
1819 * @param pGMM Pointer to the GMM instance.
1820 * @param pSet Pointer to the set.
1821 * @param pszSetName The set name.
1822 * @param pszFunction The function from which it was called.
1823 * @param uLineNo The line number.
1824 */
1825static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1826 const char *pszFunction, unsigned uLineNo)
1827{
1828 uint32_t cErrors = 0;
1829
1830 /*
1831 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1832 */
1833 uint32_t cPages = 0;
1834 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1835 {
1836 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1837 {
1838             /** @todo check that the chunk is hashed into the right set. */
1839 cPages += pCur->cFree;
1840 }
1841 }
1842 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1843 {
1844 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1845 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1846 cErrors++;
1847 }
1848
1849 return cErrors;
1850}
1851
1852
1853/**
1854 * Performs some sanity checks on the GMM while owning the lock.
1855 *
1856 * @returns Error count.
1857 *
1858 * @param pGMM Pointer to the GMM instance.
1859 * @param pszFunction The function from which it is called.
1860 * @param uLineNo The line number.
1861 */
1862static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1863{
1864 uint32_t cErrors = 0;
1865
1866 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1867 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1868 /** @todo add more sanity checks. */
1869
1870 return cErrors;
1871}
1872
1873#endif /* GMMR0_WITH_SANITY_CHECK */
1874
1875/**
1876 * Looks up a chunk in the tree and fills in the TLB entry for it.
1877 *
1878 * This is not expected to fail and will bitch if it does.
1879 *
1880 * @returns Pointer to the allocation chunk, NULL if not found.
1881 * @param pGMM Pointer to the GMM instance.
1882 * @param idChunk The ID of the chunk to find.
1883 * @param pTlbe Pointer to the TLB entry.
1884 *
1885 * @note Caller owns spinlock.
1886 */
1887static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1888{
1889 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1890 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1891 pTlbe->idChunk = idChunk;
1892 pTlbe->pChunk = pChunk;
1893 return pChunk;
1894}
1895
1896
1897/**
1898 * Finds an allocation chunk, spin-locked.
1899 *
1900 * This is not expected to fail and will bitch if it does.
1901 *
1902 * @returns Pointer to the allocation chunk, NULL if not found.
1903 * @param pGMM Pointer to the GMM instance.
1904 * @param idChunk The ID of the chunk to find.
1905 */
1906DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1907{
1908 /*
1909 * Do a TLB lookup, branch if not in the TLB.
1910 */
1911 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1912 PGMMCHUNK pChunk = pTlbe->pChunk;
1913 if ( pChunk == NULL
1914 || pTlbe->idChunk != idChunk)
1915 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1916 return pChunk;
1917}
1918
1919
1920/**
1921 * Finds an allocation chunk.
1922 *
1923 * This is not expected to fail and will bitch if it does.
1924 *
1925 * @returns Pointer to the allocation chunk, NULL if not found.
1926 * @param pGMM Pointer to the GMM instance.
1927 * @param idChunk The ID of the chunk to find.
1928 */
1929DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1930{
1931 RTSpinlockAcquire(pGMM->hSpinLockTree);
1932 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1933 RTSpinlockRelease(pGMM->hSpinLockTree);
1934 return pChunk;
1935}
1936
1937
1938/**
1939 * Finds a page.
1940 *
1941 * This is not expected to fail and will bitch if it does.
1942 *
1943 * @returns Pointer to the page, NULL if not found.
1944 * @param pGMM Pointer to the GMM instance.
1945 * @param idPage The ID of the page to find.
1946 */
1947DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1948{
1949 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1950 if (RT_LIKELY(pChunk))
1951 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1952 return NULL;
1953}
1954
1955
1956#if 0 /* unused */
1957/**
1958 * Gets the host physical address for a page given by its ID.
1959 *
1960 * @returns The host physical address or NIL_RTHCPHYS.
1961 * @param pGMM Pointer to the GMM instance.
1962 * @param idPage The ID of the page to find.
1963 */
1964DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1965{
1966 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1967 if (RT_LIKELY(pChunk))
1968 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1969 return NIL_RTHCPHYS;
1970}
1971#endif /* unused */
1972
1973
1974/**
1975 * Selects the appropriate free list given the number of free pages.
1976 *
1977 * @returns Free list index.
1978 * @param cFree The number of free pages in the chunk.
1979 */
1980DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1981{
1982 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1983 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1984 ("%d (%u)\n", iList, cFree));
1985 return iList;
1986}
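/*
 * Worked example (illustrative assumption: GMM_CHUNK_FREE_SET_SHIFT == 4):
 * a chunk with 37 free pages would be put on list index 37 >> 4 = 2, so each
 * list holds chunks within a fixed-size band of free-page counts and chunks
 * with fewer free pages land on lower-indexed lists.
 */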
1987
1988
1989/**
1990 * Unlinks the chunk from the free list it's currently on (if any).
1991 *
1992 * @param pChunk The allocation chunk.
1993 */
1994DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1995{
1996 PGMMCHUNKFREESET pSet = pChunk->pSet;
1997 if (RT_LIKELY(pSet))
1998 {
1999 pSet->cFreePages -= pChunk->cFree;
2000 pSet->idGeneration++;
2001
2002 PGMMCHUNK pPrev = pChunk->pFreePrev;
2003 PGMMCHUNK pNext = pChunk->pFreeNext;
2004 if (pPrev)
2005 pPrev->pFreeNext = pNext;
2006 else
2007 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
2008 if (pNext)
2009 pNext->pFreePrev = pPrev;
2010
2011 pChunk->pSet = NULL;
2012 pChunk->pFreeNext = NULL;
2013 pChunk->pFreePrev = NULL;
2014 }
2015 else
2016 {
2017 Assert(!pChunk->pFreeNext);
2018 Assert(!pChunk->pFreePrev);
2019 Assert(!pChunk->cFree);
2020 }
2021}
2022
2023
2024/**
2025 * Links the chunk onto the appropriate free list in the specified free set.
2026 *
2027 * If the chunk has no free entries, it is not linked into any list.
2028 *
2029 * @param pChunk The allocation chunk.
2030 * @param pSet The free set.
2031 */
2032DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
2033{
2034 Assert(!pChunk->pSet);
2035 Assert(!pChunk->pFreeNext);
2036 Assert(!pChunk->pFreePrev);
2037
2038 if (pChunk->cFree > 0)
2039 {
2040 pChunk->pSet = pSet;
2041 pChunk->pFreePrev = NULL;
2042 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
2043 pChunk->pFreeNext = pSet->apLists[iList];
2044 if (pChunk->pFreeNext)
2045 pChunk->pFreeNext->pFreePrev = pChunk;
2046 pSet->apLists[iList] = pChunk;
2047
2048 pSet->cFreePages += pChunk->cFree;
2049 pSet->idGeneration++;
2050 }
2051}
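/*
 * Usage note: callers bracket updates of pChunk->cFree with an unlink/link
 * pair so the chunk ends up on the list matching its new free count, e.g.
 * (see gmmR0AllocatePagesFromChunk below):
 *
 *   gmmR0UnlinkChunk(pChunk);
 *   ... allocate or free pages, adjusting pChunk->cFree ...
 *   gmmR0LinkChunk(pChunk, pSet);
 */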
2052
2053
2054/**
2055 * Selects the appropriate free set for the chunk and links the chunk onto it.
2056 *
2057 * If the chunk has no free entries, it is not linked into any list.
2058 *
2059 * @param pGMM Pointer to the GMM instance.
2060 * @param pGVM Pointer to the kernel-only VM instance data.
2061 * @param pChunk The allocation chunk.
2062 */
2063DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
2064{
2065 PGMMCHUNKFREESET pSet;
2066 if (pGMM->fBoundMemoryMode)
2067 pSet = &pGVM->gmm.s.Private;
2068 else if (pChunk->cShared)
2069 pSet = &pGMM->Shared;
2070 else
2071 pSet = &pGMM->PrivateX;
2072 gmmR0LinkChunk(pChunk, pSet);
2073}
2074
2075
2076/**
2077 * Frees a Chunk ID.
2078 *
2079 * @param pGMM Pointer to the GMM instance.
2080 * @param idChunk The Chunk ID to free.
2081 */
2082static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
2083{
2084 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
2085 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
2086 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2087}
2088
2089
2090/**
2091 * Allocates a new Chunk ID.
2092 *
2093 * @returns The Chunk ID.
2094 * @param pGMM Pointer to the GMM instance.
2095 */
2096static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2097{
2098 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2099 AssertCompile(NIL_GMM_CHUNKID == 0);
2100
2101 /*
2102 * Try the next sequential one.
2103 */
2104 int32_t idChunk = ++pGMM->idChunkPrev;
2105#if 0 /** @todo enable this code */
2106 if ( idChunk <= GMM_CHUNKID_LAST
2107 && idChunk > NIL_GMM_CHUNKID
2108         && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2109 return idChunk;
2110#endif
2111
2112 /*
2113 * Scan sequentially from the last one.
2114 */
2115 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2116 && idChunk > NIL_GMM_CHUNKID)
2117 {
2118 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2119 if (idChunk > NIL_GMM_CHUNKID)
2120 {
2121 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2122 return pGMM->idChunkPrev = idChunk;
2123 }
2124 }
2125
2126 /*
2127 * Ok, scan from the start.
2128 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2129 */
2130 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2131     AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2132 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2133
2134 return pGMM->idChunkPrev = idChunk;
2135}
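/*
 * Search order summary (descriptive): (1) the sequential successor of the
 * last ID handed out (currently disabled above), (2) a forward scan of the
 * ID bitmap starting at that point, (3) a scan from the beginning.  In every
 * case the bit is test-and-set atomically before the ID is returned.
 */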
2136
2137
2138/**
2139 * Allocates one private page.
2140 *
2141 * Worker for gmmR0AllocatePages.
2142 *
2143 * @param pChunk The chunk to allocate it from.
2144 * @param hGVM The GVM handle of the VM requesting memory.
2145 * @param pPageDesc The page descriptor.
2146 */
2147static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2148{
2149 /* update the chunk stats. */
2150 if (pChunk->hGVM == NIL_GVM_HANDLE)
2151 pChunk->hGVM = hGVM;
2152 Assert(pChunk->cFree);
2153 pChunk->cFree--;
2154 pChunk->cPrivate++;
2155
2156 /* unlink the first free page. */
2157 const uint32_t iPage = pChunk->iFreeHead;
2158 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2159 PGMMPAGE pPage = &pChunk->aPages[iPage];
2160 Assert(GMM_PAGE_IS_FREE(pPage));
2161 pChunk->iFreeHead = pPage->Free.iNext;
2162 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2163 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2164 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2165
2166 /* make the page private. */
2167 pPage->u = 0;
2168 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2169 pPage->Private.hGVM = hGVM;
2170 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2171 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2172 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2173 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2174 else
2175 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2176
2177 /* update the page descriptor. */
2178 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2179 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2180 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2181 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2182}
2183
2184
2185/**
2186 * Picks the free pages from a chunk.
2187 *
2188 * @returns The new page descriptor table index.
2189 * @param pChunk The chunk.
2190 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2191 * affinity.
2192 * @param iPage The current page descriptor table index.
2193 * @param cPages The total number of pages to allocate.
2194 * @param paPages The page descriptor table (input + output).
2195 */
2196static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2197 PGMMPAGEDESC paPages)
2198{
2199 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2200 gmmR0UnlinkChunk(pChunk);
2201
2202 for (; pChunk->cFree && iPage < cPages; iPage++)
2203 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2204
2205 gmmR0LinkChunk(pChunk, pSet);
2206 return iPage;
2207}
2208
2209
2210/**
2211 * Registers a new chunk of memory.
2212 *
2213 * This is called by both gmmR0AllocateOneChunk and GMMR0SeedChunk.
2214 *
2215 * @returns VBox status code. On success, the giant GMM lock will be held, the
2216 * caller must release it (ugly).
2217 * @param pGMM Pointer to the GMM instance.
2218 * @param pSet Pointer to the set.
2219 * @param hMemObj The memory object for the chunk.
2220 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2221 * affinity.
2222 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2223 * @param ppChunk Chunk address (out). Optional.
2224 *
2225 * @remarks The caller must not own the giant GMM mutex.
2226 * The giant GMM mutex will be acquired and returned acquired in
2227 * the success path. On failure, no locks will be held.
2228 */
2229static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, uint16_t fChunkFlags,
2230 PGMMCHUNK *ppChunk)
2231{
2232 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2233 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2234#ifdef GMM_WITH_LEGACY_MODE
2235 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE || fChunkFlags == GMM_CHUNK_FLAGS_SEEDED);
2236#else
2237 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2238#endif
2239
2240#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2241 /*
2242 * Get a ring-0 mapping of the object.
2243 */
2244# ifdef GMM_WITH_LEGACY_MODE
2245 uint8_t *pbMapping = !(fChunkFlags & GMM_CHUNK_FLAGS_SEEDED) ? (uint8_t *)RTR0MemObjAddress(hMemObj) : NULL;
2246# else
2247 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2248# endif
2249 if (!pbMapping)
2250 {
2251 RTR0MEMOBJ hMapObj;
2252 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2253 if (RT_SUCCESS(rc))
2254 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2255 else
2256 return rc;
2257 AssertPtr(pbMapping);
2258 }
2259#endif
2260
2261 /*
2262 * Allocate a chunk.
2263 */
2264 int rc;
2265 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2266 if (pChunk)
2267 {
2268 /*
2269 * Initialize it.
2270 */
2271 pChunk->hMemObj = hMemObj;
2272#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2273 pChunk->pbMapping = pbMapping;
2274#endif
2275 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2276 pChunk->hGVM = hGVM;
2277 /*pChunk->iFreeHead = 0;*/
2278 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2279 pChunk->iChunkMtx = UINT8_MAX;
2280 pChunk->fFlags = fChunkFlags;
2281 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2282 {
2283 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2284 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2285 }
2286 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2287 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2288
2289 /*
2290 * Allocate a Chunk ID and insert it into the tree.
2291 * This has to be done behind the mutex of course.
2292 */
2293 rc = gmmR0MutexAcquire(pGMM);
2294 if (RT_SUCCESS(rc))
2295 {
2296 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2297 {
2298 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2299 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2300 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2301 {
2302 RTSpinlockAcquire(pGMM->hSpinLockTree);
2303 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2304 {
2305 pGMM->cChunks++;
2306 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2307 RTSpinlockRelease(pGMM->hSpinLockTree);
2308
2309 gmmR0LinkChunk(pChunk, pSet);
2310
2311 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2312
2313 if (ppChunk)
2314 *ppChunk = pChunk;
2315 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2316 return VINF_SUCCESS;
2317 }
2318 RTSpinlockRelease(pGMM->hSpinLockTree);
2319 }
2320
2321 /* bail out */
2322 rc = VERR_GMM_CHUNK_INSERT;
2323 }
2324 else
2325 rc = VERR_GMM_IS_NOT_SANE;
2326 gmmR0MutexRelease(pGMM);
2327 }
2328
2329 RTMemFree(pChunk);
2330 }
2331 else
2332 rc = VERR_NO_MEMORY;
2333 return rc;
2334}
2335
2336
2337/**
2338 * Allocates a new chunk, immediately picks the requested pages from it, and
2339 * adds what's remaining to the specified free set.
2340 *
2341 * @note This will leave the giant mutex while allocating the new chunk!
2342 *
2343 * @returns VBox status code.
2344 * @param pGMM Pointer to the GMM instance data.
2345 * @param pGVM Pointer to the kernel-only VM instance data.
2346 * @param pSet Pointer to the free set.
2347 * @param cPages The number of pages requested.
2348 * @param paPages The page descriptor table (input + output).
2349 * @param piPage The pointer to the page descriptor table index variable.
2350 * This will be updated.
2351 */
2352static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2353 PGMMPAGEDESC paPages, uint32_t *piPage)
2354{
2355 gmmR0MutexRelease(pGMM);
2356
2357 RTR0MEMOBJ hMemObj;
2358#ifndef GMM_WITH_LEGACY_MODE
2359 int rc;
2360# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2361 if (pGMM->fHasWorkingAllocPhysNC)
2362 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2363 else
2364# endif
2365 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2366#else
2367 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2368#endif
2369 if (RT_SUCCESS(rc))
2370 {
2371 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2372 * free pages first and then unchaining them right afterwards. Instead
2373 * do as much work as possible without holding the giant lock. */
2374 PGMMCHUNK pChunk;
2375 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2376 if (RT_SUCCESS(rc))
2377 {
2378 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2379 return VINF_SUCCESS;
2380 }
2381
2382 /* bail out */
2383 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2384 }
2385
2386 int rc2 = gmmR0MutexAcquire(pGMM);
2387 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2388 return rc;
2389
2390}
2391
2392
2393/**
2394 * As a last resort we'll pick any page we can get.
2395 *
2396 * @returns The new page descriptor table index.
2397 * @param pSet The set to pick from.
2398 * @param pGVM Pointer to the global VM structure.
2399 * @param iPage The current page descriptor table index.
2400 * @param cPages The total number of pages to allocate.
2401 * @param paPages The page descriptor table (input + output).
2402 */
2403static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2404 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2405{
2406 unsigned iList = RT_ELEMENTS(pSet->apLists);
2407 while (iList-- > 0)
2408 {
2409 PGMMCHUNK pChunk = pSet->apLists[iList];
2410 while (pChunk)
2411 {
2412 PGMMCHUNK pNext = pChunk->pFreeNext;
2413
2414 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2415 if (iPage >= cPages)
2416 return iPage;
2417
2418 pChunk = pNext;
2419 }
2420 }
2421 return iPage;
2422}
2423
2424
2425/**
2426 * Pick pages from empty chunks on the same NUMA node.
2427 *
2428 * @returns The new page descriptor table index.
2429 * @param pSet The set to pick from.
2430 * @param pGVM Pointer to the global VM structure.
2431 * @param iPage The current page descriptor table index.
2432 * @param cPages The total number of pages to allocate.
2433 * @param paPages The page descriptor table (input + output).
2434 */
2435static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2436 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2437{
2438 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2439 if (pChunk)
2440 {
2441 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2442 while (pChunk)
2443 {
2444 PGMMCHUNK pNext = pChunk->pFreeNext;
2445
2446 if (pChunk->idNumaNode == idNumaNode)
2447 {
2448 pChunk->hGVM = pGVM->hSelf;
2449 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2450 if (iPage >= cPages)
2451 {
2452 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2453 return iPage;
2454 }
2455 }
2456
2457 pChunk = pNext;
2458 }
2459 }
2460 return iPage;
2461}
2462
2463
2464/**
2465 * Pick pages from non-empty chunks on the same NUMA node.
2466 *
2467 * @returns The new page descriptor table index.
2468 * @param pSet The set to pick from.
2469 * @param pGVM Pointer to the global VM structure.
2470 * @param iPage The current page descriptor table index.
2471 * @param cPages The total number of pages to allocate.
2472 * @param paPages The page descriptor table (input + output).
2473 */
2474static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2475 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2476{
2477 /** @todo start by picking from chunks with about the right size first? */
2478 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2479 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2480 while (iList-- > 0)
2481 {
2482 PGMMCHUNK pChunk = pSet->apLists[iList];
2483 while (pChunk)
2484 {
2485 PGMMCHUNK pNext = pChunk->pFreeNext;
2486
2487 if (pChunk->idNumaNode == idNumaNode)
2488 {
2489 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2490 if (iPage >= cPages)
2491 {
2492 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2493 return iPage;
2494 }
2495 }
2496
2497 pChunk = pNext;
2498 }
2499 }
2500 return iPage;
2501}
2502
2503
2504/**
2505 * Pick pages that are in chunks already associated with the VM.
2506 *
2507 * @returns The new page descriptor table index.
2508 * @param pGMM Pointer to the GMM instance data.
2509 * @param pGVM Pointer to the global VM structure.
2510 * @param pSet The set to pick from.
2511 * @param iPage The current page descriptor table index.
2512 * @param cPages The total number of pages to allocate.
2513 * @param paPages The page descriptor table (input + output).
2514 */
2515static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2516 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2517{
2518 uint16_t const hGVM = pGVM->hSelf;
2519
2520 /* Hint. */
2521 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2522 {
2523 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2524 if (pChunk && pChunk->cFree)
2525 {
2526 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2527 if (iPage >= cPages)
2528 return iPage;
2529 }
2530 }
2531
2532 /* Scan. */
2533 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2534 {
2535 PGMMCHUNK pChunk = pSet->apLists[iList];
2536 while (pChunk)
2537 {
2538 PGMMCHUNK pNext = pChunk->pFreeNext;
2539
2540 if (pChunk->hGVM == hGVM)
2541 {
2542 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2543 if (iPage >= cPages)
2544 {
2545 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2546 return iPage;
2547 }
2548 }
2549
2550 pChunk = pNext;
2551 }
2552 }
2553 return iPage;
2554}
2555
2556
2557
2558/**
2559 * Pick pages in bound memory mode.
2560 *
2561 * @returns The new page descriptor table index.
2562 * @param pGVM Pointer to the global VM structure.
2563 * @param iPage The current page descriptor table index.
2564 * @param cPages The total number of pages to allocate.
2565 * @param paPages The page descriptor table (input + output).
2566 */
2567static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2568{
2569 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2570 {
2571 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2572 while (pChunk)
2573 {
2574 Assert(pChunk->hGVM == pGVM->hSelf);
2575 PGMMCHUNK pNext = pChunk->pFreeNext;
2576 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2577 if (iPage >= cPages)
2578 return iPage;
2579 pChunk = pNext;
2580 }
2581 }
2582 return iPage;
2583}
2584
2585
2586/**
2587 * Checks if we should start picking pages from chunks of other VMs because
2588 * we're getting close to the system memory or reserved limit.
2589 *
2590 * @returns @c true if we should, @c false if we should first try to allocate
2591 * more chunks.
2592 */
2593static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2594{
2595 /*
2596     * Don't allocate a new chunk if we're getting close to the reservation limit.
2597 */
2598 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2599 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2600 - pGVM->gmm.s.Stats.cBalloonedPages
2601 /** @todo what about shared pages? */;
2602 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2603 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2604 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2605 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2606 return true;
2607 /** @todo make the threshold configurable, also test the code to see if
2608 * this ever kicks in (we might be reserving too much or smth). */
2609
2610 /*
2611 * Check how close we're to the max memory limit and how many fragments
2612 * there are?...
2613 */
2614 /** @todo */
2615
2616 return false;
2617}
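/*
 * Back-of-the-envelope for the threshold above (illustrative, assuming 2 MB
 * chunks of 4 KB pages, i.e. GMM_CHUNK_NUM_PAGES == 512): the function
 * returns true once fewer than 512 * 4 = 2048 reserved-but-unallocated pages
 * remain, i.e. when less than about 8 MB of headroom is left before the
 * reservation is exhausted.
 */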
2618
2619
2620/**
2621 * Checks if we should start picking pages from chunks of other VMs because
2622 * there are a lot of free pages around.
2623 *
2624 * @returns @c true if we should, @c false if we should first try to allocate
2625 * more chunks.
2626 */
2627static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2628{
2629 /*
2630 * Setting the limit at 16 chunks (32 MB) at the moment.
2631 */
2632 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2633 return true;
2634 return false;
2635}
2636
2637
2638/**
2639 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2640 *
2641 * @returns VBox status code:
2642 * @retval VINF_SUCCESS on success.
2643 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2644 * gmmR0AllocateMoreChunks is necessary.
2645 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2646 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2647 * that is we're trying to allocate more than we've reserved.
2648 *
2649 * @param pGMM Pointer to the GMM instance data.
2650 * @param pGVM Pointer to the VM.
2651 * @param cPages The number of pages to allocate.
2652 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2653 * details on what is expected on input.
2654 * @param enmAccount The account to charge.
2655 *
2656 * @remarks Call takes the giant GMM lock.
2657 */
2658static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2659{
2660 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2661
2662 /*
2663 * Check allocation limits.
2664 */
2665 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2666 return VERR_GMM_HIT_GLOBAL_LIMIT;
2667
2668 switch (enmAccount)
2669 {
2670 case GMMACCOUNT_BASE:
2671 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2672 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2673 {
2674 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2675 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2676 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2677 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2678 }
2679 break;
2680 case GMMACCOUNT_SHADOW:
2681 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2682 {
2683 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2684 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2685 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2686 }
2687 break;
2688 case GMMACCOUNT_FIXED:
2689 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2690 {
2691 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2692 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2693 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2694 }
2695 break;
2696 default:
2697 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2698 }
2699
2700#ifdef GMM_WITH_LEGACY_MODE
2701 /*
2702 * If we're in legacy memory mode, it's easy to figure if we have
2703 * sufficient number of pages up-front.
2704 */
2705 if ( pGMM->fLegacyAllocationMode
2706 && pGVM->gmm.s.Private.cFreePages < cPages)
2707 {
2708 Assert(pGMM->fBoundMemoryMode);
2709 return VERR_GMM_SEED_ME;
2710 }
2711#endif
2712
2713 /*
2714 * Update the accounts before we proceed because we might be leaving the
2715 * protection of the global mutex and thus run the risk of permitting
2716 * too much memory to be allocated.
2717 */
2718 switch (enmAccount)
2719 {
2720 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2721 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2722 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2723 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2724 }
2725 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2726 pGMM->cAllocatedPages += cPages;
2727
2728#ifdef GMM_WITH_LEGACY_MODE
2729 /*
2730 * Part two of it's-easy-in-legacy-memory-mode.
2731 */
2732 if (pGMM->fLegacyAllocationMode)
2733 {
2734 uint32_t iPage = gmmR0AllocatePagesInBoundMode(pGVM, 0, cPages, paPages);
2735 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2736 return VINF_SUCCESS;
2737 }
2738#endif
2739
2740 /*
2741 * Bound mode is also relatively straightforward.
2742 */
2743 uint32_t iPage = 0;
2744 int rc = VINF_SUCCESS;
2745 if (pGMM->fBoundMemoryMode)
2746 {
2747 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2748 if (iPage < cPages)
2749 do
2750 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2751 while (iPage < cPages && RT_SUCCESS(rc));
2752 }
2753 /*
2754      * Shared mode is trickier as we should try to achieve the same locality as
2755 * in bound mode, but smartly make use of non-full chunks allocated by
2756 * other VMs if we're low on memory.
2757 */
2758 else
2759 {
2760 /* Pick the most optimal pages first. */
2761 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2762 if (iPage < cPages)
2763 {
2764 /* Maybe we should try getting pages from chunks "belonging" to
2765 other VMs before allocating more chunks? */
2766 bool fTriedOnSameAlready = false;
2767 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2768 {
2769 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2770 fTriedOnSameAlready = true;
2771 }
2772
2773 /* Allocate memory from empty chunks. */
2774 if (iPage < cPages)
2775 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2776
2777 /* Grab empty shared chunks. */
2778 if (iPage < cPages)
2779 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2780
2781             /* If there are a lot of free pages spread around, try not to waste
2782                system memory on more chunks. (Should trigger defragmentation.) */
2783 if ( !fTriedOnSameAlready
2784 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2785 {
2786 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2787 if (iPage < cPages)
2788 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2789 }
2790
2791 /*
2792 * Ok, try allocate new chunks.
2793 */
2794 if (iPage < cPages)
2795 {
2796 do
2797 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2798 while (iPage < cPages && RT_SUCCESS(rc));
2799
2800 /* If the host is out of memory, take whatever we can get. */
2801 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2802 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2803 {
2804 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2805 if (iPage < cPages)
2806 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2807 AssertRelease(iPage == cPages);
2808 rc = VINF_SUCCESS;
2809 }
2810 }
2811 }
2812 }
2813
2814 /*
2815 * Clean up on failure. Since this is bound to be a low-memory condition
2816 * we will give back any empty chunks that might be hanging around.
2817 */
2818 if (RT_FAILURE(rc))
2819 {
2820 /* Update the statistics. */
2821 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2822 pGMM->cAllocatedPages -= cPages - iPage;
2823 switch (enmAccount)
2824 {
2825 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2826 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2827 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2828 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2829 }
2830
2831 /* Release the pages. */
2832 while (iPage-- > 0)
2833 {
2834 uint32_t idPage = paPages[iPage].idPage;
2835 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2836 if (RT_LIKELY(pPage))
2837 {
2838 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2839 Assert(pPage->Private.hGVM == pGVM->hSelf);
2840 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2841 }
2842 else
2843 AssertMsgFailed(("idPage=%#x\n", idPage));
2844
2845 paPages[iPage].idPage = NIL_GMM_PAGEID;
2846 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2847 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2848 }
2849
2850 /* Free empty chunks. */
2851 /** @todo */
2852
2853 /* return the fail status on failure */
2854 return rc;
2855 }
2856 return VINF_SUCCESS;
2857}
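/*
 * Recap of the shared-mode (non-bound) allocation order implemented above:
 *   1. chunks already associated with this VM (last-chunk hint, then a scan),
 *   2. if close to the reservation limit: partially used chunks on the same
 *      NUMA node in the global private set,
 *   3. completely unused chunks on the same NUMA node, first from the private
 *      set, then from the shared set,
 *   4. if plenty of free pages exist overall (and step 2 was skipped):
 *      same-node chunks and then any chunk in the private set,
 *   5. newly allocated chunks,
 *   6. if the host is out of memory but enough free pages remain: any page in
 *      the private and shared sets.
 */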
2858
2859
2860/**
2861 * Updates the previous allocations and allocates more pages.
2862 *
2863 * The handy pages are always taken from the 'base' memory account.
2864 * The allocated pages are not cleared and will contain random garbage.
2865 *
2866 * @returns VBox status code:
2867 * @retval VINF_SUCCESS on success.
2868 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2869 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2870 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2871 * private page.
2872 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2873 * shared page.
2874 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2875 * owned by the VM.
2876 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2877 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2878 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2879 * that is we're trying to allocate more than we've reserved.
2880 *
2881 * @param pGVM The global (ring-0) VM structure.
2882 * @param idCpu The VCPU id.
2883 * @param cPagesToUpdate The number of pages to update (starting from the head).
2884 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2885 * @param paPages The array of page descriptors.
2886 * See GMMPAGEDESC for details on what is expected on input.
2887 * @thread EMT(idCpu)
2888 */
2889GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2890 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2891{
2892 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2893 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2894
2895 /*
2896 * Validate, get basics and take the semaphore.
2897 * (This is a relatively busy path, so make predictions where possible.)
2898 */
2899 PGMM pGMM;
2900 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2901 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2902 if (RT_FAILURE(rc))
2903 return rc;
2904
2905 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2906 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2907 || (cPagesToAlloc && cPagesToAlloc < 1024),
2908 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2909 VERR_INVALID_PARAMETER);
2910
2911 unsigned iPage = 0;
2912 for (; iPage < cPagesToUpdate; iPage++)
2913 {
2914 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2915 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2916 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2917 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2918 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2919 VERR_INVALID_PARAMETER);
2920 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2921 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2922 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2923 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2924 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2925 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2926 }
2927
2928 for (; iPage < cPagesToAlloc; iPage++)
2929 {
2930 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2931 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2932 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2933 }
2934
2935 gmmR0MutexAcquire(pGMM);
2936 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2937 {
2938 /* No allocations before the initial reservation has been made! */
2939 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2940 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2941 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2942 {
2943 /*
2944 * Perform the updates.
2945 * Stop on the first error.
2946 */
2947 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2948 {
2949 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2950 {
2951 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2952 if (RT_LIKELY(pPage))
2953 {
2954 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2955 {
2956 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2957 {
2958 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2959 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2960 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2961 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2962 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2963 /* else: NIL_RTHCPHYS nothing */
2964
2965 paPages[iPage].idPage = NIL_GMM_PAGEID;
2966 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2967 }
2968 else
2969 {
2970 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2971 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2972 rc = VERR_GMM_NOT_PAGE_OWNER;
2973 break;
2974 }
2975 }
2976 else
2977 {
2978 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2979 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2980 break;
2981 }
2982 }
2983 else
2984 {
2985 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2986 rc = VERR_GMM_PAGE_NOT_FOUND;
2987 break;
2988 }
2989 }
2990
2991 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2992 {
2993 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2994 if (RT_LIKELY(pPage))
2995 {
2996 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2997 {
2998 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2999 Assert(pPage->Shared.cRefs);
3000 Assert(pGVM->gmm.s.Stats.cSharedPages);
3001 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
3002
3003 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
3004 pGVM->gmm.s.Stats.cSharedPages--;
3005 pGVM->gmm.s.Stats.Allocated.cBasePages--;
3006 if (!--pPage->Shared.cRefs)
3007 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
3008 else
3009 {
3010 Assert(pGMM->cDuplicatePages);
3011 pGMM->cDuplicatePages--;
3012 }
3013
3014 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
3015 }
3016 else
3017 {
3018 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
3019 rc = VERR_GMM_PAGE_NOT_SHARED;
3020 break;
3021 }
3022 }
3023 else
3024 {
3025 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
3026 rc = VERR_GMM_PAGE_NOT_FOUND;
3027 break;
3028 }
3029 }
3030 } /* for each page to update */
3031
3032 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
3033 {
3034#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
3035 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
3036 {
3037 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
3038 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
3039 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
3040 }
3041#endif
3042
3043 /*
3044 * Join paths with GMMR0AllocatePages for the allocation.
3045 * Note! gmmR0AllocateMoreChunks may leave the protection of the mutex!
3046 */
3047 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
3048 }
3049 }
3050 else
3051 rc = VERR_WRONG_ORDER;
3052 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3053 }
3054 else
3055 rc = VERR_GMM_IS_NOT_SANE;
3056 gmmR0MutexRelease(pGMM);
3057 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3058 return rc;
3059}
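/*
 * Note on the flow above: the update phase rewrites the guest physical
 * addresses of previously handed-out private pages and drops references to
 * shared pages that were replaced (freeing them when the reference count
 * reaches zero); only then are the cPagesToAlloc fresh pages allocated and
 * charged to the base account via gmmR0AllocatePagesNew.
 */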
3060
3061
3062/**
3063 * Allocate one or more pages.
3064 *
3065 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3066 * The allocated pages are not cleared and will contain random garbage.
3067 *
3068 * @returns VBox status code:
3069 * @retval VINF_SUCCESS on success.
3070 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3071 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3072 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3073 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3074 * that is we're trying to allocate more than we've reserved.
3075 *
3076 * @param pGVM The global (ring-0) VM structure.
3077 * @param idCpu The VCPU id.
3078 * @param cPages The number of pages to allocate.
3079 * @param paPages Pointer to the page descriptors.
3080 * See GMMPAGEDESC for details on what is expected on
3081 * input.
3082 * @param enmAccount The account to charge.
3083 *
3084 * @thread EMT.
3085 */
3086GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3087{
3088 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3089
3090 /*
3091 * Validate, get basics and take the semaphore.
3092 */
3093 PGMM pGMM;
3094 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3095 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3096 if (RT_FAILURE(rc))
3097 return rc;
3098
3099 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3100 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3101 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3102
3103 for (unsigned iPage = 0; iPage < cPages; iPage++)
3104 {
3105 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
3106 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3107 || ( enmAccount == GMMACCOUNT_BASE
3108 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3109 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3110 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3111 VERR_INVALID_PARAMETER);
3112 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3113 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3114 }
3115
3116 gmmR0MutexAcquire(pGMM);
3117 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3118 {
3119
3120 /* No allocations before the initial reservation has been made! */
3121 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3122 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3123 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3124 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3125 else
3126 rc = VERR_WRONG_ORDER;
3127 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3128 }
3129 else
3130 rc = VERR_GMM_IS_NOT_SANE;
3131 gmmR0MutexRelease(pGMM);
3132 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3133 return rc;
3134}
3135
3136
3137/**
3138 * VMMR0 request wrapper for GMMR0AllocatePages.
3139 *
3140 * @returns see GMMR0AllocatePages.
3141 * @param pGVM The global (ring-0) VM structure.
3142 * @param idCpu The VCPU id.
3143 * @param pReq Pointer to the request packet.
3144 */
3145GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3146{
3147 /*
3148 * Validate input and pass it on.
3149 */
3150 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3151 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3152 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3153 VERR_INVALID_PARAMETER);
3154 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3155 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3156 VERR_INVALID_PARAMETER);
3157
3158 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3159}
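/*
 * Hypothetical caller-side sketch (for illustration only; packet allocation
 * and the ring-0 dispatch are assumptions not shown in this file).  The field
 * values mirror what GMMR0AllocatePages validates on entry:
 *
 *   uint32_t const cPages = 32;
 *   uint32_t const cbReq  = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *   PGMMALLOCATEPAGESREQ pReq = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *   pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;   // assumed request header convention
 *   pReq->Hdr.cbReq    = cbReq;
 *   pReq->enmAccount   = GMMACCOUNT_BASE;
 *   pReq->cPages       = cPages;
 *   for (uint32_t i = 0; i < cPages; i++)
 *   {
 *       pReq->aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;   // no guest physical address assigned yet
 *       pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
 *       pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *   }
 */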
3160
3161
3162/**
3163 * Allocate a large page to represent guest RAM.
3164 *
3165 * The allocated pages are not cleared and will contain random garbage.
3166 *
3167 * @returns VBox status code:
3168 * @retval VINF_SUCCESS on success.
3169 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3170 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3171 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3172 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3173 * that is we're trying to allocate more than we've reserved.
3174 * @returns see GMMR0AllocatePages.
3175 *
3176 * @param pGVM The global (ring-0) VM structure.
3177 * @param idCpu The VCPU id.
3178 * @param cbPage Large page size.
3179 * @param pIdPage Where to return the GMM page ID of the page.
3180 * @param pHCPhys Where to return the host physical address of the page.
3181 */
3182GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3183{
3184 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3185
3186 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3187 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3188 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3189
3190 /*
3191 * Validate, get basics and take the semaphore.
3192 */
3193 PGMM pGMM;
3194 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3195 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3196 if (RT_FAILURE(rc))
3197 return rc;
3198
3199#ifdef GMM_WITH_LEGACY_MODE
3200 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3201 // if (pGMM->fLegacyAllocationMode)
3202 // return VERR_NOT_SUPPORTED;
3203#endif
3204
3205 *pHCPhys = NIL_RTHCPHYS;
3206 *pIdPage = NIL_GMM_PAGEID;
3207
3208 gmmR0MutexAcquire(pGMM);
3209 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3210 {
3211 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3212 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3213 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3214 {
3215 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3216 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3217 gmmR0MutexRelease(pGMM);
3218 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3219 }
3220
3221 /*
3222 * Allocate a new large page chunk.
3223 *
3224 * Note! We leave the giant GMM lock temporarily as the allocation might
3225 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3226 */
3227 AssertCompile(GMM_CHUNK_SIZE == _2M);
3228 gmmR0MutexRelease(pGMM);
3229
3230 RTR0MEMOBJ hMemObj;
3231 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3232 if (RT_SUCCESS(rc))
3233 {
3234 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3235 PGMMCHUNK pChunk;
3236 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3237 if (RT_SUCCESS(rc))
3238 {
3239 /*
3240 * Allocate all the pages in the chunk.
3241 */
3242 /* Unlink the new chunk from the free list. */
3243 gmmR0UnlinkChunk(pChunk);
3244
3245 /** @todo rewrite this to skip the looping. */
3246 /* Allocate all pages. */
3247 GMMPAGEDESC PageDesc;
3248 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3249
3250 /* Return the first page as we'll use the whole chunk as one big page. */
3251 *pIdPage = PageDesc.idPage;
3252 *pHCPhys = PageDesc.HCPhysGCPhys;
3253
3254 for (unsigned i = 1; i < cPages; i++)
3255 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3256
3257 /* Update accounting. */
3258 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3259 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3260 pGMM->cAllocatedPages += cPages;
3261
3262 gmmR0LinkChunk(pChunk, pSet);
3263 gmmR0MutexRelease(pGMM);
3264 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3265 return VINF_SUCCESS;
3266 }
3267 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3268 }
3269 }
3270 else
3271 {
3272 gmmR0MutexRelease(pGMM);
3273 rc = VERR_GMM_IS_NOT_SANE;
3274 }
3275
3276 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3277 return rc;
3278}
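/*
 * Descriptive note: a large page is backed by a whole, physically contiguous
 * and chunk-size aligned allocation chunk.  All pages in the chunk are marked
 * private for the calling VM, but only the ID and host physical address of
 * the first page are returned; they stand for the entire large page.
 */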
3279
3280
3281/**
3282 * Free a large page.
3283 *
3284 * @returns VBox status code:
3285 * @param pGVM The global (ring-0) VM structure.
3286 * @param idCpu The VCPU id.
3287 * @param idPage The large page id.
3288 */
3289GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3290{
3291 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3292
3293 /*
3294 * Validate, get basics and take the semaphore.
3295 */
3296 PGMM pGMM;
3297 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3298 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3299 if (RT_FAILURE(rc))
3300 return rc;
3301
3302#ifdef GMM_WITH_LEGACY_MODE
3303 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3304 // if (pGMM->fLegacyAllocationMode)
3305 // return VERR_NOT_SUPPORTED;
3306#endif
3307
3308 gmmR0MutexAcquire(pGMM);
3309 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3310 {
3311 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3312
3313 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3314 {
3315 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3316 gmmR0MutexRelease(pGMM);
3317 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3318 }
3319
3320 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3321 if (RT_LIKELY( pPage
3322 && GMM_PAGE_IS_PRIVATE(pPage)))
3323 {
3324 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3325 Assert(pChunk);
3326 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3327 Assert(pChunk->cPrivate > 0);
3328
3329 /* Release the memory immediately. */
3330 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3331
3332 /* Update accounting. */
3333 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3334 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3335 pGMM->cAllocatedPages -= cPages;
3336 }
3337 else
3338 rc = VERR_GMM_PAGE_NOT_FOUND;
3339 }
3340 else
3341 rc = VERR_GMM_IS_NOT_SANE;
3342
3343 gmmR0MutexRelease(pGMM);
3344 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3345 return rc;
3346}
3347
3348
3349/**
3350 * VMMR0 request wrapper for GMMR0FreeLargePage.
3351 *
3352 * @returns see GMMR0FreeLargePage.
3353 * @param pGVM The global (ring-0) VM structure.
3354 * @param idCpu The VCPU id.
3355 * @param pReq Pointer to the request packet.
3356 */
3357GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3358{
3359 /*
3360 * Validate input and pass it on.
3361 */
3362 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3363    AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3364                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3365 VERR_INVALID_PARAMETER);
3366
3367 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3368}
3369
3370
3371/**
3372 * @callback_method_impl{FNGVMMR0ENUMCALLBACK,
3373 * Used by gmmR0FreeChunkFlushPerVmTlbs().}
3374 */
3375static DECLCALLBACK(int) gmmR0InvalidatePerVmChunkTlbCallback(PGVM pGVM, void *pvUser)
3376{
3377 RT_NOREF(pvUser);
3378 if (pGVM->gmm.s.hChunkTlbSpinLock != NIL_RTSPINLOCK)
3379 {
3380 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
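        /* Mark all entries stale: UINT64_MAX is never used as a live generation
           value (the counter is reset long before it could get that high), so
           these entries can only produce misses until they are refilled. */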
3381 uintptr_t i = RT_ELEMENTS(pGVM->gmm.s.aChunkTlbEntries);
3382 while (i-- > 0)
3383 {
3384 pGVM->gmm.s.aChunkTlbEntries[i].idGeneration = UINT64_MAX;
3385 pGVM->gmm.s.aChunkTlbEntries[i].pChunk = NULL;
3386 }
3387 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
3388 }
3389 return VINF_SUCCESS;
3390}
3391
3392
3393/**
3394 * Called by gmmR0FreeChunk when we reach the threshold for wrapping around the
3395 * free generation ID value.
3396 *
3397 * This is done at 2^62 - 1, which allows us to drop all locks, as it will
3398 * take a while before 12 exa (2 305 843 009 213 693 952) calls to
3399 * gmmR0FreeChunk can be made and cause a real wrap-around. We do two
3400 * invalidation passes and reset the generation ID between them. This will
3401 * make sure there are no false positives.
3402 *
3403 * @param pGMM Pointer to the GMM instance.
3404 */
3405static void gmmR0FreeChunkFlushPerVmTlbs(PGMM pGMM)
3406{
3407 /*
3408 * First invalidation pass.
3409 */
3410 int rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3411 AssertRCSuccess(rc);
3412
3413 /*
3414 * Reset the generation number.
3415 */
3416 RTSpinlockAcquire(pGMM->hSpinLockTree);
3417 ASMAtomicWriteU64(&pGMM->idFreeGeneration, 1);
3418 RTSpinlockRelease(pGMM->hSpinLockTree);
3419
3420 /*
3421 * Second invalidation pass.
3422 */
3423 rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3424 AssertRCSuccess(rc);
3425}
3426
3427
3428/**
3429 * Frees a chunk, giving it back to the host OS.
3430 *
3431 * @param pGMM Pointer to the GMM instance.
3432 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3433 * unmap and free the chunk in one go.
3434 * @param pChunk The chunk to free.
3435 * @param fRelaxedSem Whether we can release the semaphore while doing the
3436 * freeing (@c true) or not.
3437 */
3438static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3439{
3440 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3441
3442 GMMR0CHUNKMTXSTATE MtxState;
3443 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3444
3445 /*
3446 * Cleanup hack! Unmap the chunk from the caller's address space.
3447 * This shouldn't happen, so screw lock contention...
3448 */
3449 if ( pChunk->cMappingsX
3450#ifdef GMM_WITH_LEGACY_MODE
3451 && (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
3452#endif
3453 && pGVM)
3454 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3455
3456 /*
3457 * If there are current mappings of the chunk, then request the
3458 * VMs to unmap them. Reposition the chunk in the free list so
3459 * it won't be a likely candidate for allocations.
3460 */
3461 if (pChunk->cMappingsX)
3462 {
3463 /** @todo R0 -> VM request */
3464 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3465 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3466 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3467 return false;
3468 }
3469
3470
3471 /*
3472 * Save and trash the handle.
3473 */
3474 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3475 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3476
3477 /*
3478 * Unlink it from everywhere.
3479 */
3480 gmmR0UnlinkChunk(pChunk);
3481
3482 RTSpinlockAcquire(pGMM->hSpinLockTree);
3483
3484 RTListNodeRemove(&pChunk->ListNode);
3485
3486 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3487 Assert(pCore == &pChunk->Core); NOREF(pCore);
3488
3489 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3490 if (pTlbe->pChunk == pChunk)
3491 {
3492 pTlbe->idChunk = NIL_GMM_CHUNKID;
3493 pTlbe->pChunk = NULL;
3494 }
3495
3496 Assert(pGMM->cChunks > 0);
3497 pGMM->cChunks--;
3498
3499 uint64_t const idFreeGeneration = ASMAtomicIncU64(&pGMM->idFreeGeneration);
3500
3501 RTSpinlockRelease(pGMM->hSpinLockTree);
3502
3503 /*
3504 * Free the Chunk ID before dropping the locks and freeing the rest.
3505 */
3506 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3507 pChunk->Core.Key = NIL_GMM_CHUNKID;
3508
3509 pGMM->cFreedChunks++;
3510
3511 gmmR0ChunkMutexRelease(&MtxState, NULL);
3512 if (fRelaxedSem)
3513 gmmR0MutexRelease(pGMM);
3514
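    /* Once the free generation counter reaches a quarter of the 64-bit range it
       is reset to 1 via a double flush of all per-VM chunk TLBs, see
       gmmR0FreeChunkFlushPerVmTlbs above. */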
3515 if (idFreeGeneration == UINT64_MAX / 4)
3516 gmmR0FreeChunkFlushPerVmTlbs(pGMM);
3517
3518 RTMemFree(pChunk->paMappingsX);
3519 pChunk->paMappingsX = NULL;
3520
3521 RTMemFree(pChunk);
3522
3523#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
3524 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3525#else
3526 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3527#endif
3528 AssertLogRelRC(rc);
3529
3530 if (fRelaxedSem)
3531 gmmR0MutexAcquire(pGMM);
3532 return fRelaxedSem;
3533}
3534
3535
3536/**
3537 * Free page worker.
3538 *
3539 * The caller does all the statistic decrementing, we do all the incrementing.
3540 *
3541 * @param pGMM Pointer to the GMM instance data.
3542 * @param pGVM Pointer to the GVM instance.
3543 * @param pChunk Pointer to the chunk this page belongs to.
3544 * @param idPage The Page ID.
3545 * @param pPage Pointer to the page.
3546 */
3547static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3548{
3549 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3550 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3551
3552 /*
3553 * Put the page on the free list.
3554 */
3555 pPage->u = 0;
3556 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3557 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3558 pPage->Free.iNext = pChunk->iFreeHead;
3559 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3560
3561 /*
3562 * Update statistics (the cShared/cPrivate stats are up to date already),
3563 * and relink the chunk if necessary.
3564 */
3565 unsigned const cFree = pChunk->cFree;
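    /* Only unlink/relink the chunk when this free page moves it into a different
       free-list bucket (or it previously had no free pages at all); otherwise
       just update the counters. */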
3566 if ( !cFree
3567 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3568 {
3569 gmmR0UnlinkChunk(pChunk);
3570 pChunk->cFree++;
3571 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3572 }
3573 else
3574 {
3575 pChunk->cFree = cFree + 1;
3576 pChunk->pSet->cFreePages++;
3577 }
3578
3579 /*
3580 * If the chunk becomes empty, consider giving memory back to the host OS.
3581 *
3582 * The current strategy is to try to give it back if there are other chunks
3583 * in this free list, meaning if there are at least 240 free pages in this
3584 * category. Note that since there are probably mappings of the chunk,
3585 * it won't be freed up instantly, which probably screws up this logic
3586 * a bit...
3587 */
3588 /** @todo Do this on the way out. */
3589 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3590 || pChunk->pFreeNext == NULL
3591 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3592 { /* likely */ }
3593#ifdef GMM_WITH_LEGACY_MODE
3594 else if (RT_LIKELY(pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE)))
3595 { /* likely */ }
3596#endif
3597 else
3598 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3599
3600}
3601
3602
3603/**
3604 * Frees a shared page, the page is known to exist and be valid and such.
3605 *
3606 * @param pGMM Pointer to the GMM instance.
3607 * @param pGVM Pointer to the GVM instance.
3608 * @param idPage The page id.
3609 * @param pPage The page structure.
3610 */
3611DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3612{
3613 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3614 Assert(pChunk);
3615 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3616 Assert(pChunk->cShared > 0);
3617 Assert(pGMM->cSharedPages > 0);
3618 Assert(pGMM->cAllocatedPages > 0);
3619 Assert(!pPage->Shared.cRefs);
3620
3621 pChunk->cShared--;
3622 pGMM->cAllocatedPages--;
3623 pGMM->cSharedPages--;
3624 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3625}
3626
3627
3628/**
3629 * Frees a private page, the page is known to exist and be valid and such.
3630 *
3631 * @param pGMM Pointer to the GMM instance.
3632 * @param pGVM Pointer to the GVM instance.
3633 * @param idPage The page id.
3634 * @param pPage The page structure.
3635 */
3636DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3637{
3638 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3639 Assert(pChunk);
3640 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3641 Assert(pChunk->cPrivate > 0);
3642 Assert(pGMM->cAllocatedPages > 0);
3643
3644 pChunk->cPrivate--;
3645 pGMM->cAllocatedPages--;
3646 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3647}
3648
3649
3650/**
3651 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3652 *
3653 * @returns VBox status code:
3654 * @retval xxx
3655 *
3656 * @param pGMM Pointer to the GMM instance data.
3657 * @param pGVM Pointer to the VM.
3658 * @param cPages The number of pages to free.
3659 * @param paPages Pointer to the page descriptors.
3660 * @param enmAccount The account this relates to.
3661 */
3662static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3663{
3664 /*
3665 * Check that the request isn't impossible wrt to the account status.
3666 */
3667 switch (enmAccount)
3668 {
3669 case GMMACCOUNT_BASE:
3670 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3671 {
3672 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3673 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3674 }
3675 break;
3676 case GMMACCOUNT_SHADOW:
3677 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3678 {
3679 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3680 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3681 }
3682 break;
3683 case GMMACCOUNT_FIXED:
3684 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3685 {
3686 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3687 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3688 }
3689 break;
3690 default:
3691 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3692 }
3693
3694 /*
3695 * Walk the descriptors and free the pages.
3696 *
3697 * Statistics (except the account) are being updated as we go along,
3698 * unlike the alloc code. Also, stop on the first error.
3699 */
3700 int rc = VINF_SUCCESS;
3701 uint32_t iPage;
3702 for (iPage = 0; iPage < cPages; iPage++)
3703 {
3704 uint32_t idPage = paPages[iPage].idPage;
3705 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3706 if (RT_LIKELY(pPage))
3707 {
3708 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3709 {
3710 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3711 {
3712 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3713 pGVM->gmm.s.Stats.cPrivatePages--;
3714 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3715 }
3716 else
3717 {
3718                        Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3719 pPage->Private.hGVM, pGVM->hSelf));
3720 rc = VERR_GMM_NOT_PAGE_OWNER;
3721 break;
3722 }
3723 }
3724 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3725 {
3726 Assert(pGVM->gmm.s.Stats.cSharedPages);
3727 Assert(pPage->Shared.cRefs);
3728#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3729 if (pPage->Shared.u14Checksum)
3730 {
3731 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3732 uChecksum &= UINT32_C(0x00003fff);
3733 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3734 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3735 }
3736#endif
3737 pGVM->gmm.s.Stats.cSharedPages--;
3738 if (!--pPage->Shared.cRefs)
3739 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3740 else
3741 {
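                    /* Other VMs still reference this shared page, so only the
                       duplicate page accounting changes here. */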
3742 Assert(pGMM->cDuplicatePages);
3743 pGMM->cDuplicatePages--;
3744 }
3745 }
3746 else
3747 {
3748                    Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3749 rc = VERR_GMM_PAGE_ALREADY_FREE;
3750 break;
3751 }
3752 }
3753 else
3754 {
3755            Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3756 rc = VERR_GMM_PAGE_NOT_FOUND;
3757 break;
3758 }
3759 paPages[iPage].idPage = NIL_GMM_PAGEID;
3760 }
3761
3762 /*
3763 * Update the account.
3764 */
3765 switch (enmAccount)
3766 {
3767 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3768 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3769 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3770 default:
3771 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3772 }
3773
3774 /*
3775 * Any threshold stuff to be done here?
3776 */
3777
3778 return rc;
3779}
3780
3781
3782/**
3783 * Free one or more pages.
3784 *
3785 * This is typically used at reset time or power off.
3786 *
3787 * @returns VBox status code:
3788 * @retval xxx
3789 *
3790 * @param pGVM The global (ring-0) VM structure.
3791 * @param idCpu The VCPU id.
3792 * @param cPages The number of pages to free.
3793 * @param paPages Pointer to the page descriptors containing the page IDs
3794 * for each page.
3795 * @param enmAccount The account this relates to.
3796 * @thread EMT.
3797 */
3798GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3799{
3800 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3801
3802 /*
3803 * Validate input and get the basics.
3804 */
3805 PGMM pGMM;
3806 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3807 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3808 if (RT_FAILURE(rc))
3809 return rc;
3810
3811 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3812 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3813 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3814
3815 for (unsigned iPage = 0; iPage < cPages; iPage++)
3816 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3817 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3818 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3819
3820 /*
3821 * Take the semaphore and call the worker function.
3822 */
3823 gmmR0MutexAcquire(pGMM);
3824 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3825 {
3826 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3827 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3828 }
3829 else
3830 rc = VERR_GMM_IS_NOT_SANE;
3831 gmmR0MutexRelease(pGMM);
3832 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3833 return rc;
3834}
3835
3836
3837/**
3838 * VMMR0 request wrapper for GMMR0FreePages.
3839 *
3840 * @returns see GMMR0FreePages.
3841 * @param pGVM The global (ring-0) VM structure.
3842 * @param idCpu The VCPU id.
3843 * @param pReq Pointer to the request packet.
3844 */
3845GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3846{
3847 /*
3848 * Validate input and pass it on.
3849 */
3850 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3851 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3852 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3853 VERR_INVALID_PARAMETER);
3854 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3855 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3856 VERR_INVALID_PARAMETER);
3857
3858 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3859}
3860
3861
3862/**
3863 * Report back on a memory ballooning request.
3864 *
3865 * The request may or may not have been initiated by the GMM. If it was initiated
3866 * by the GMM it is important that this function is called even if no pages were
3867 * ballooned.
3868 *
3869 * @returns VBox status code:
3870 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3871 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3872 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3873 * indicating that we won't necessarily have sufficient RAM to boot
3874 * the VM again and that it should pause until this changes (we'll try
3875 * balloon some other VM). (For standard deflate we have little choice
3876 * but to hope the VM won't use the memory that was returned to it.)
3877 *
3878 * @param pGVM The global (ring-0) VM structure.
3879 * @param idCpu The VCPU id.
3880 * @param enmAction Inflate/deflate/reset.
3881 * @param cBalloonedPages The number of pages that was ballooned.
3882 *
3883 * @thread EMT(idCpu)
3884 */
3885GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3886{
3887 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3888 pGVM, enmAction, cBalloonedPages));
3889
3890 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3891
3892 /*
3893 * Validate input and get the basics.
3894 */
3895 PGMM pGMM;
3896 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3897 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3898 if (RT_FAILURE(rc))
3899 return rc;
3900
3901 /*
3902 * Take the semaphore and do some more validations.
3903 */
3904 gmmR0MutexAcquire(pGMM);
3905 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3906 {
3907 switch (enmAction)
3908 {
3909 case GMMBALLOONACTION_INFLATE:
3910 {
3911 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3912 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3913 {
3914 /*
3915 * Record the ballooned memory.
3916 */
3917 pGMM->cBalloonedPages += cBalloonedPages;
3918 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3919 {
3920                        /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low-memory conditions. */
3921 AssertFailed();
3922
3923 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3924 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3925 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3926 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3927 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3928 }
3929 else
3930 {
3931 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3932 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3933 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3934 }
3935 }
3936 else
3937 {
3938 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3939 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3940 pGVM->gmm.s.Stats.Reserved.cBasePages));
3941 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3942 }
3943 break;
3944 }
3945
3946 case GMMBALLOONACTION_DEFLATE:
3947 {
3948 /* Deflate. */
3949 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3950 {
3951 /*
3952 * Record the ballooned memory.
3953 */
3954 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3955 pGMM->cBalloonedPages -= cBalloonedPages;
3956 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3957 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3958 {
3959                        AssertFailed(); /* This path is for later. */
3960 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3961 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3962
3963 /*
3964 * Anything we need to do here now when the request has been completed?
3965 */
3966 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3967 }
3968 else
3969 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3970 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3971 }
3972 else
3973 {
3974 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3975 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3976 }
3977 break;
3978 }
3979
3980 case GMMBALLOONACTION_RESET:
3981 {
3982 /* Reset to an empty balloon. */
3983 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3984
3985 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3986 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3987 break;
3988 }
3989
3990 default:
3991 rc = VERR_INVALID_PARAMETER;
3992 break;
3993 }
3994 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3995 }
3996 else
3997 rc = VERR_GMM_IS_NOT_SANE;
3998
3999 gmmR0MutexRelease(pGMM);
4000 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
4001 return rc;
4002}
4003
4004
4005/**
4006 * VMMR0 request wrapper for GMMR0BalloonedPages.
4007 *
4008 * @returns see GMMR0BalloonedPages.
4009 * @param pGVM The global (ring-0) VM structure.
4010 * @param idCpu The VCPU id.
4011 * @param pReq Pointer to the request packet.
4012 */
4013GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
4014{
4015 /*
4016 * Validate input and pass it on.
4017 */
4018 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4019 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
4020                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
4021 VERR_INVALID_PARAMETER);
4022
4023 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
4024}
4025
4026
4027/**
4028 * Return memory statistics for the hypervisor
4029 *
4030 * @returns VBox status code.
4031 * @param pReq Pointer to the request packet.
4032 */
4033GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
4034{
4035 /*
4036 * Validate input and pass it on.
4037 */
4038 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4039 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4040                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4041 VERR_INVALID_PARAMETER);
4042
4043 /*
4044 * Validate input and get the basics.
4045 */
4046 PGMM pGMM;
4047 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4048 pReq->cAllocPages = pGMM->cAllocatedPages;
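    /* Free pages = total pages in all registered chunks minus the allocated ones. */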
4049    pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
4050 pReq->cBalloonedPages = pGMM->cBalloonedPages;
4051 pReq->cMaxPages = pGMM->cMaxPages;
4052 pReq->cSharedPages = pGMM->cDuplicatePages;
4053 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4054
4055 return VINF_SUCCESS;
4056}
4057
4058
4059/**
4060 * Return memory statistics for the VM
4061 *
4062 * @returns VBox status code.
4063 * @param pGVM The global (ring-0) VM structure.
4064 * @param idCpu Cpu id.
4065 * @param pReq Pointer to the request packet.
4066 *
4067 * @thread EMT(idCpu)
4068 */
4069GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
4070{
4071 /*
4072 * Validate input and pass it on.
4073 */
4074 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4075 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4076                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4077 VERR_INVALID_PARAMETER);
4078
4079 /*
4080 * Validate input and get the basics.
4081 */
4082 PGMM pGMM;
4083 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4084 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4085 if (RT_FAILURE(rc))
4086 return rc;
4087
4088 /*
4089 * Take the semaphore and do some more validations.
4090 */
4091 gmmR0MutexAcquire(pGMM);
4092 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4093 {
4094 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4095 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4096 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4097 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4098 }
4099 else
4100 rc = VERR_GMM_IS_NOT_SANE;
4101
4102 gmmR0MutexRelease(pGMM);
4103    LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4104 return rc;
4105}
4106
4107
4108/**
4109 * Worker for gmmR0UnmapChunk and gmmr0FreeChunk.
4110 * Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
4111 * Don't call this in legacy allocation mode!
4112 *
4113 * @returns VBox status code.
4114 * @param pGMM Pointer to the GMM instance data.
4115 * @param pGVM Pointer to the Global VM structure.
4116 * @param pChunk Pointer to the chunk to be unmapped.
4117 */
4118static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4119{
4120 RT_NOREF_PV(pGMM);
4121#ifdef GMM_WITH_LEGACY_MODE
4122 Assert(!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE));
4123#endif
4124
4125 /*
4126 * Find the mapping and try unmapping it.
4127 */
4128 uint32_t cMappings = pChunk->cMappingsX;
4129 for (uint32_t i = 0; i < cMappings; i++)
4130 {
4131 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4132 if (pChunk->paMappingsX[i].pGVM == pGVM)
4133 {
4134 /* unmap */
4135 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4136 if (RT_SUCCESS(rc))
4137 {
4138 /* update the record. */
4139 cMappings--;
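                /* Close the gap by moving the last entry into it; the order of
                   the mappings is not preserved. */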
4140 if (i < cMappings)
4141 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4142 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4143 pChunk->paMappingsX[cMappings].pGVM = NULL;
4144 Assert(pChunk->cMappingsX - 1U == cMappings);
4145 pChunk->cMappingsX = cMappings;
4146 }
4147
4148 return rc;
4149 }
4150 }
4151
4152 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4153 return VERR_GMM_CHUNK_NOT_MAPPED;
4154}
4155
4156
4157/**
4158 * Unmaps a chunk previously mapped into the address space of the current process.
4159 *
4160 * @returns VBox status code.
4161 * @param pGMM Pointer to the GMM instance data.
4162 * @param pGVM Pointer to the Global VM structure.
4163 * @param pChunk Pointer to the chunk to be unmapped.
4164 * @param fRelaxedSem Whether we can release the semaphore while doing the
4165 * mapping (@c true) or not.
4166 */
4167static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4168{
4169#ifdef GMM_WITH_LEGACY_MODE
4170 if (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4171 {
4172#endif
4173 /*
4174 * Lock the chunk and if possible leave the giant GMM lock.
4175 */
4176 GMMR0CHUNKMTXSTATE MtxState;
4177 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4178 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4179 if (RT_SUCCESS(rc))
4180 {
4181 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4182 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4183 }
4184 return rc;
4185#ifdef GMM_WITH_LEGACY_MODE
4186 }
4187
4188 if (pChunk->hGVM == pGVM->hSelf)
4189 return VINF_SUCCESS;
4190
4191 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4192 return VERR_GMM_CHUNK_NOT_MAPPED;
4193#endif
4194}
4195
4196
4197/**
4198 * Worker for gmmR0MapChunk.
4199 *
4200 * @returns VBox status code.
4201 * @param pGMM Pointer to the GMM instance data.
4202 * @param pGVM Pointer to the Global VM structure.
4203 * @param pChunk Pointer to the chunk to be mapped.
4204 * @param ppvR3 Where to store the ring-3 address of the mapping.
4205 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4206 * contain the address of the existing mapping.
4207 */
4208static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4209{
4210#ifdef GMM_WITH_LEGACY_MODE
4211 /*
4212 * If we're in legacy mode this is simple.
4213 */
4214 if (pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4215 {
4216 if (pChunk->hGVM != pGVM->hSelf)
4217 {
4218 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4219 return VERR_GMM_CHUNK_NOT_FOUND;
4220 }
4221
4222 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4223 return VINF_SUCCESS;
4224 }
4225#else
4226 RT_NOREF(pGMM);
4227#endif
4228
4229 /*
4230 * Check to see if the chunk is already mapped.
4231 */
4232 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4233 {
4234 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4235 if (pChunk->paMappingsX[i].pGVM == pGVM)
4236 {
4237 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4238 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4239#ifdef VBOX_WITH_PAGE_SHARING
4240 /* The ring-3 chunk cache can be out of sync; don't fail. */
4241 return VINF_SUCCESS;
4242#else
4243 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4244#endif
4245 }
4246 }
4247
4248 /*
4249 * Do the mapping.
4250 */
4251 RTR0MEMOBJ hMapObj;
4252 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4253 if (RT_SUCCESS(rc))
4254 {
4255 /* reallocate the array? assumes few users per chunk (usually one). */
4256 unsigned iMapping = pChunk->cMappingsX;
4257 if ( iMapping <= 3
4258 || (iMapping & 3) == 0)
4259 {
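            /* Grow one entry at a time up to four entries, then in batches of
               four so the array size stays 4-aligned. */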
4260 unsigned cNewSize = iMapping <= 3
4261 ? iMapping + 1
4262 : iMapping + 4;
4263 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4264 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4265 {
4266 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4267 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4268 }
4269
4270 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4271 if (RT_UNLIKELY(!pvMappings))
4272 {
4273 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4274 return VERR_NO_MEMORY;
4275 }
4276 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4277 }
4278
4279 /* insert new entry */
4280 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4281 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4282 Assert(pChunk->cMappingsX == iMapping);
4283 pChunk->cMappingsX = iMapping + 1;
4284
4285 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4286 }
4287
4288 return rc;
4289}
4290
4291
4292/**
4293 * Maps a chunk into the user address space of the current process.
4294 *
4295 * @returns VBox status code.
4296 * @param pGMM Pointer to the GMM instance data.
4297 * @param pGVM Pointer to the Global VM structure.
4298 * @param pChunk Pointer to the chunk to be mapped.
4299 * @param fRelaxedSem Whether we can release the semaphore while doing the
4300 * mapping (@c true) or not.
4301 * @param ppvR3 Where to store the ring-3 address of the mapping.
4302 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4303 * contain the address of the existing mapping.
4304 */
4305static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4306{
4307 /*
4308 * Take the chunk lock and leave the giant GMM lock when possible, then
4309 * call the worker function.
4310 */
4311 GMMR0CHUNKMTXSTATE MtxState;
4312 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4313 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4314 if (RT_SUCCESS(rc))
4315 {
4316 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4317 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4318 }
4319
4320 return rc;
4321}
4322
4323
4324
4325#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4326/**
4327 * Check if a chunk is mapped into the specified VM
4328 *
4329 * @returns mapped yes/no
4330 * @param pGMM Pointer to the GMM instance.
4331 * @param pGVM Pointer to the Global VM structure.
4332 * @param pChunk Pointer to the chunk to be mapped.
4333 * @param ppvR3 Where to store the ring-3 address of the mapping.
4334 */
4335static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4336{
4337 GMMR0CHUNKMTXSTATE MtxState;
4338 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4339 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4340 {
4341 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4342 if (pChunk->paMappingsX[i].pGVM == pGVM)
4343 {
4344 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4345 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4346 return true;
4347 }
4348 }
4349 *ppvR3 = NULL;
4350 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4351 return false;
4352}
4353#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4354
4355
4356/**
4357 * Map a chunk and/or unmap another chunk.
4358 *
4359 * The mapping and unmapping applies to the current process.
4360 *
4361 * This API does two things because it saves a kernel call per mapping
4362 * when the ring-3 mapping cache is full.
4363 *
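 * A minimal ring-3 sketch of driving this via the request wrapper below (the
 * dispatch through VMMR3CallR0 and the VMMR0_DO_GMM_MAP_UNMAP_CHUNK operation
 * are assumptions about the caller's context; idChunkToMap and idChunkToEvict
 * are hypothetical values):
 * @code
 *      GMMMAPUNMAPCHUNKREQ Req;
 *      Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.idChunkMap   = idChunkToMap;   // or NIL_GMM_CHUNKID when only unmapping
 *      Req.idChunkUnmap = idChunkToEvict; // or NIL_GMM_CHUNKID when only mapping
 *      Req.pvR3         = NIL_RTR3PTR;
 *      int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_MAP_UNMAP_CHUNK, 0, &Req.Hdr);
 * @endcode
 *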
4364 * @returns VBox status code.
4365 * @param pGVM The global (ring-0) VM structure.
4366 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4367 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4368 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4369 * @thread EMT ???
4370 */
4371GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4372{
4373 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4374 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4375
4376 /*
4377 * Validate input and get the basics.
4378 */
4379 PGMM pGMM;
4380 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4381 int rc = GVMMR0ValidateGVM(pGVM);
4382 if (RT_FAILURE(rc))
4383 return rc;
4384
4385 AssertCompile(NIL_GMM_CHUNKID == 0);
4386 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4387 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4388
4389 if ( idChunkMap == NIL_GMM_CHUNKID
4390 && idChunkUnmap == NIL_GMM_CHUNKID)
4391 return VERR_INVALID_PARAMETER;
4392
4393 if (idChunkMap != NIL_GMM_CHUNKID)
4394 {
4395 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4396 *ppvR3 = NIL_RTR3PTR;
4397 }
4398
4399 /*
4400 * Take the semaphore and do the work.
4401 *
4402 * The unmapping is done last since it's easier to undo a mapping than
4403 * undoing an unmapping. The ring-3 mapping cache cannot be so big
4404 * that it pushes the user virtual address space to within a chunk of
4405 * its limits, so no problem here.
4406 */
4407 gmmR0MutexAcquire(pGMM);
4408 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4409 {
4410 PGMMCHUNK pMap = NULL;
4411        if (idChunkMap != NIL_GMM_CHUNKID)
4412 {
4413 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4414 if (RT_LIKELY(pMap))
4415 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4416 else
4417 {
4418 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4419 rc = VERR_GMM_CHUNK_NOT_FOUND;
4420 }
4421 }
4422/** @todo split this operation, the bail out might (theoretically) not be
4423 * entirely safe. */
4424
4425 if ( idChunkUnmap != NIL_GMM_CHUNKID
4426 && RT_SUCCESS(rc))
4427 {
4428 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4429 if (RT_LIKELY(pUnmap))
4430 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4431 else
4432 {
4433 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4434 rc = VERR_GMM_CHUNK_NOT_FOUND;
4435 }
4436
4437 if (RT_FAILURE(rc) && pMap)
4438 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4439 }
4440
4441 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4442 }
4443 else
4444 rc = VERR_GMM_IS_NOT_SANE;
4445 gmmR0MutexRelease(pGMM);
4446
4447 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4448 return rc;
4449}
4450
4451
4452/**
4453 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4454 *
4455 * @returns see GMMR0MapUnmapChunk.
4456 * @param pGVM The global (ring-0) VM structure.
4457 * @param pReq Pointer to the request packet.
4458 */
4459GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4460{
4461 /*
4462 * Validate input and pass it on.
4463 */
4464 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4465 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4466
4467 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4468}
4469
4470
4471/**
4472 * Legacy mode API for supplying pages.
4473 *
4474 * The specified user address points to an allocation chunk sized block that
4475 * will be locked down and used by the GMM when the GM asks for pages.
4476 *
4477 * @returns VBox status code.
4478 * @param pGVM The global (ring-0) VM structure.
4479 * @param idCpu The VCPU id.
4480 * @param pvR3 Pointer to the chunk size memory block to lock down.
4481 */
4482GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4483{
4484#ifdef GMM_WITH_LEGACY_MODE
4485 /*
4486 * Validate input and get the basics.
4487 */
4488 PGMM pGMM;
4489 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4490 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4491 if (RT_FAILURE(rc))
4492 return rc;
4493
4494 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4495 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4496
4497 if (!pGMM->fLegacyAllocationMode)
4498 {
4499 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4500 return VERR_NOT_SUPPORTED;
4501 }
4502
4503 /*
4504 * Lock the memory and add it as new chunk with our hGVM.
4505 * (The GMM locking is done inside gmmR0RegisterChunk.)
4506 */
4507 RTR0MEMOBJ hMemObj;
4508 rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4509 if (RT_SUCCESS(rc))
4510 {
4511 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_SEEDED, NULL);
4512 if (RT_SUCCESS(rc))
4513 gmmR0MutexRelease(pGMM);
4514 else
4515 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4516 }
4517
4518 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4519 return rc;
4520#else
4521 RT_NOREF(pGVM, idCpu, pvR3);
4522 return VERR_NOT_SUPPORTED;
4523#endif
4524}
4525
4526#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
4527
4528/**
4529 * Gets the ring-0 virtual address for the given page.
4530 *
4531 * This is used by PGM when IEM and such wants to access guest RAM from ring-0.
4532 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4533 * corresponding chunk will remain valid beyond the call (at least till the EMT
4534 * returns to ring-3).
4535 *
4536 * @returns VBox status code.
4537 * @param pGVM Pointer to the kernel-only VM instance data.
4538 * @param idPage The page ID.
4539 * @param ppv Where to store the address.
4540 * @thread EMT
4541 */
4542GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4543{
4544 *ppv = NULL;
4545 PGMM pGMM;
4546 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4547
4548 uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
4549
4550 /*
4551 * Start with the per-VM TLB.
4552 */
4553 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
4554
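    /* An entry only counts as a hit when both the chunk ID matches and its
       recorded generation equals the current global free generation; freeing a
       chunk bumps the generation, so stale entries miss and get refilled. */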
4555 PGMMPERVMCHUNKTLBE pTlbe = &pGVM->gmm.s.aChunkTlbEntries[GMMPERVM_CHUNKTLB_IDX(idChunk)];
4556 PGMMCHUNK pChunk = pTlbe->pChunk;
4557 if ( pChunk != NULL
4558 && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
4559 && pChunk->Core.Key == idChunk)
4560 pGVM->R0Stats.gmm.cChunkTlbHits++; /* hopefully this is a likely outcome */
4561 else
4562 {
4563 pGVM->R0Stats.gmm.cChunkTlbMisses++;
4564
4565 /*
4566 * Look it up in the chunk tree.
4567 */
4568 RTSpinlockAcquire(pGMM->hSpinLockTree);
4569 pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
4570 if (RT_LIKELY(pChunk))
4571 {
4572 pTlbe->idGeneration = pGMM->idFreeGeneration;
4573 RTSpinlockRelease(pGMM->hSpinLockTree);
4574 pTlbe->pChunk = pChunk;
4575 }
4576 else
4577 {
4578 RTSpinlockRelease(pGMM->hSpinLockTree);
4579 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4580 AssertMsgFailed(("idPage=%#x\n", idPage));
4581 return VERR_GMM_PAGE_NOT_FOUND;
4582 }
4583 }
4584
4585 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4586
4587 /*
4588 * Got a chunk, now validate the page ownership and calculate its address.
4589 */
4590 const GMMPAGE * const pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4591 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4592 && pPage->Private.hGVM == pGVM->hSelf)
4593 || GMM_PAGE_IS_SHARED(pPage)))
4594 {
4595 AssertPtr(pChunk->pbMapping);
4596 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4597 return VINF_SUCCESS;
4598 }
4599    AssertMsgFailed(("idPage=%#x is-private=%RTbool Private.hGVM=%u pGVM->hSelf=%u\n",
4600 idPage, GMM_PAGE_IS_PRIVATE(pPage), pPage->Private.hGVM, pGVM->hSelf));
4601 return VERR_GMM_NOT_PAGE_OWNER;
4602}
4603
4604#endif
4605
4606#ifdef VBOX_WITH_PAGE_SHARING
4607
4608# ifdef VBOX_STRICT
4609/**
4610 * For checksumming shared pages in strict builds.
4611 *
4612 * The purpose is making sure that a page doesn't change.
4613 *
4614 * @returns Checksum, 0 on failure.
4615 * @param pGMM The GMM instance data.
4616 * @param pGVM Pointer to the kernel-only VM instance data.
4617 * @param idPage The page ID.
4618 */
4619static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4620{
4621 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4622 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4623
4624 uint8_t *pbChunk;
4625 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4626 return 0;
4627 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4628
4629 return RTCrc32(pbPage, PAGE_SIZE);
4630}
4631# endif /* VBOX_STRICT */
4632
4633
4634/**
4635 * Calculates the module hash value.
4636 *
4637 * @returns Hash value.
4638 * @param pszModuleName The module name.
4639 * @param pszVersion The module version string.
4640 */
4641static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4642{
4643 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4644}
4645
4646
4647/**
4648 * Finds a global module.
4649 *
4650 * @returns Pointer to the global module on success, NULL if not found.
4651 * @param pGMM The GMM instance data.
4652 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4653 * @param cbModule The module size.
4654 * @param enmGuestOS The guest OS type.
4655 * @param cRegions The number of regions.
4656 * @param pszModuleName The module name.
4657 * @param pszVersion The module version.
4658 * @param paRegions The region descriptions.
4659 */
4660static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4661 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4662 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4663{
4664 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4665 pGblMod;
4666 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4667 {
4668 if (pGblMod->cbModule != cbModule)
4669 continue;
4670 if (pGblMod->enmGuestOS != enmGuestOS)
4671 continue;
4672 if (pGblMod->cRegions != cRegions)
4673 continue;
4674 if (strcmp(pGblMod->szName, pszModuleName))
4675 continue;
4676 if (strcmp(pGblMod->szVersion, pszVersion))
4677 continue;
4678
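        /* All regions must also match on page offset and page-aligned size
           before the module is considered identical. */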
4679 uint32_t i;
4680 for (i = 0; i < cRegions; i++)
4681 {
4682 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4683 if (pGblMod->aRegions[i].off != off)
4684 break;
4685
4686 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4687 if (pGblMod->aRegions[i].cb != cb)
4688 break;
4689 }
4690
4691 if (i == cRegions)
4692 return pGblMod;
4693 }
4694
4695 return NULL;
4696}
4697
4698
4699/**
4700 * Creates a new global module.
4701 *
4702 * @returns VBox status code.
4703 * @param pGMM The GMM instance data.
4704 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4705 * @param cbModule The module size.
4706 * @param enmGuestOS The guest OS type.
4707 * @param cRegions The number of regions.
4708 * @param pszModuleName The module name.
4709 * @param pszVersion The module version.
4710 * @param paRegions The region descriptions.
4711 * @param ppGblMod Where to return the new module on success.
4712 */
4713static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4714 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4715 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4716{
4717 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4718 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4719 {
4720 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4721 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4722 }
4723
4724 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4725 if (!pGblMod)
4726 {
4727 Log(("gmmR0ShModNewGlobal: No memory\n"));
4728 return VERR_NO_MEMORY;
4729 }
4730
4731 pGblMod->Core.Key = uHash;
4732 pGblMod->cbModule = cbModule;
4733 pGblMod->cRegions = cRegions;
4734 pGblMod->cUsers = 1;
4735 pGblMod->enmGuestOS = enmGuestOS;
4736 strcpy(pGblMod->szName, pszModuleName);
4737 strcpy(pGblMod->szVersion, pszVersion);
4738
4739 for (uint32_t i = 0; i < cRegions; i++)
4740 {
4741 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4742 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4743 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4744 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4745 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4746 }
4747
4748 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4749 Assert(fInsert); NOREF(fInsert);
4750 pGMM->cShareableModules++;
4751
4752 *ppGblMod = pGblMod;
4753 return VINF_SUCCESS;
4754}
4755
4756
4757/**
4758 * Deletes a global module which is no longer referenced by anyone.
4759 *
4760 * @param pGMM The GMM instance data.
4761 * @param pGblMod The module to delete.
4762 */
4763static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4764{
4765 Assert(pGblMod->cUsers == 0);
4766 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4767
4768 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4769 Assert(pvTest == pGblMod); NOREF(pvTest);
4770 pGMM->cShareableModules--;
4771
4772 uint32_t i = pGblMod->cRegions;
4773 while (i-- > 0)
4774 {
4775 if (pGblMod->aRegions[i].paidPages)
4776 {
4777            /* We don't do anything to the pages as they are handled by the
4778 copy-on-write mechanism in PGM. */
4779 RTMemFree(pGblMod->aRegions[i].paidPages);
4780 pGblMod->aRegions[i].paidPages = NULL;
4781 }
4782 }
4783 RTMemFree(pGblMod);
4784}
4785
4786
4787static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4788 PGMMSHAREDMODULEPERVM *ppRecVM)
4789{
4790 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4791 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4792
4793 PGMMSHAREDMODULEPERVM pRecVM;
4794 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4795 if (!pRecVM)
4796 return VERR_NO_MEMORY;
4797
4798 pRecVM->Core.Key = GCBaseAddr;
4799 for (uint32_t i = 0; i < cRegions; i++)
4800 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4801
4802 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4803 Assert(fInsert); NOREF(fInsert);
4804 pGVM->gmm.s.Stats.cShareableModules++;
4805
4806 *ppRecVM = pRecVM;
4807 return VINF_SUCCESS;
4808}
4809
4810
4811static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4812{
4813 /*
4814 * Free the per-VM module.
4815 */
4816 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4817 pRecVM->pGlobalModule = NULL;
4818
4819 if (fRemove)
4820 {
4821 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4822 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4823 }
4824
4825 RTMemFree(pRecVM);
4826
4827 /*
4828 * Release the global module.
4829 * (In the registration bailout case, it might not be.)
4830 */
4831 if (pGblMod)
4832 {
4833 Assert(pGblMod->cUsers > 0);
4834 pGblMod->cUsers--;
4835 if (pGblMod->cUsers == 0)
4836 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4837 }
4838}
4839
4840#endif /* VBOX_WITH_PAGE_SHARING */
4841
4842/**
4843 * Registers a new shared module for the VM.
4844 *
4845 * @returns VBox status code.
4846 * @param pGVM The global (ring-0) VM structure.
4847 * @param idCpu The VCPU id.
4848 * @param enmGuestOS The guest OS type.
4849 * @param pszModuleName The module name.
4850 * @param pszVersion The module version.
4851 * @param GCPtrModBase The module base address.
4852 * @param cbModule The module size.
4853 * @param cRegions The number of shared region descriptors.
4854 * @param paRegions Pointer to an array of shared region(s).
4855 * @thread EMT(idCpu)
4856 */
4857GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4858 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4859 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4860{
4861#ifdef VBOX_WITH_PAGE_SHARING
4862 /*
4863 * Validate input and get the basics.
4864 *
4865 * Note! Turns out the module size does not necessarily match the size of the
4866 * regions. (iTunes on XP)
4867 */
4868 PGMM pGMM;
4869 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4870 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4871 if (RT_FAILURE(rc))
4872 return rc;
4873
4874 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4875 return VERR_GMM_TOO_MANY_REGIONS;
4876
4877 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4878 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4879
4880 uint32_t cbTotal = 0;
4881 for (uint32_t i = 0; i < cRegions; i++)
4882 {
4883 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4884 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4885
4886 cbTotal += paRegions[i].cbRegion;
4887 if (RT_UNLIKELY(cbTotal > _1G))
4888 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4889 }
4890
4891 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4892 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4893 return VERR_GMM_MODULE_NAME_TOO_LONG;
4894
4895 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4896 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4897 return VERR_GMM_MODULE_NAME_TOO_LONG;
4898
4899 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4900 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4901
4902 /*
4903 * Take the semaphore and do some more validations.
4904 */
4905 gmmR0MutexAcquire(pGMM);
4906 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4907 {
4908 /*
4909 * Check if this module is already locally registered and register
4910 * it if it isn't. The base address is a unique module identifier
4911 * locally.
4912 */
4913 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4914 bool fNewModule = pRecVM == NULL;
4915 if (fNewModule)
4916 {
4917 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4918 if (RT_SUCCESS(rc))
4919 {
4920 /*
4921 * Find a matching global module, register a new one if needed.
4922 */
4923 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4924 pszModuleName, pszVersion, paRegions);
4925 if (!pGblMod)
4926 {
4927 Assert(fNewModule);
4928 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4929 pszModuleName, pszVersion, paRegions, &pGblMod);
4930 if (RT_SUCCESS(rc))
4931 {
4932                        pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4933 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4934 }
4935 else
4936 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4937 }
4938 else
4939 {
4940 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4941 pGblMod->cUsers++;
4942 pRecVM->pGlobalModule = pGblMod;
4943
4944 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4945 }
4946 }
4947 }
4948 else
4949 {
4950 /*
4951 * Attempt to re-register an existing module.
4952 */
4953 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4954 pszModuleName, pszVersion, paRegions);
4955 if (pRecVM->pGlobalModule == pGblMod)
4956 {
4957 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4958 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4959 }
4960 else
4961 {
4962 /** @todo may have to unregister+register when this happens in case it's caused
4963 * by VBoxService crashing and being restarted... */
4964 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4965 " incoming at %RGvLB%#x %s %s rgns %u\n"
4966 " existing at %RGvLB%#x %s %s rgns %u\n",
4967 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4968 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4969 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4970 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4971 }
4972 }
4973 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4974 }
4975 else
4976 rc = VERR_GMM_IS_NOT_SANE;
4977
4978 gmmR0MutexRelease(pGMM);
4979 return rc;
4980#else
4981
4982 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4983 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4984 return VERR_NOT_IMPLEMENTED;
4985#endif
4986}
4987
4988
4989/**
4990 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4991 *
4992 * @returns see GMMR0RegisterSharedModule.
4993 * @param pGVM The global (ring-0) VM structure.
4994 * @param idCpu The VCPU id.
4995 * @param pReq Pointer to the request packet.
4996 */
4997GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4998{
4999 /*
5000 * Validate input and pass it on.
5001 */
5002 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5003 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
5004 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
5005 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5006
5007 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
5008 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
5009 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
5010 return VINF_SUCCESS;
5011}
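
/*
 * Illustration only: a minimal sketch of how a caller might size and fill the
 * variable-length registration request validated by the wrapper above.  Real
 * callers go through the VMMR3CallR0 / VMMR0 request path and set up the full
 * request header; the local names (cMyRegions, aMyRegions, GCPtrMyModBase and
 * friends) are hypothetical and only the GMMREGISTERSHAREDMODULEREQ fields
 * referenced above are assumed.
 *
 * @code
 *  uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cMyRegions]);
 *  PGMMREGISTERSHAREDMODULEREQ pReq = (PGMMREGISTERSHAREDMODULEREQ)RTMemAllocZ(cbReq);
 *  if (!pReq)
 *      return VERR_NO_MEMORY;
 *  pReq->Hdr.cbReq  = cbReq;            // must equal RT_UOFFSETOF_DYN(..., aRegions[cRegions])
 *  pReq->enmGuestOS = enmMyGuestOS;
 *  pReq->GCBaseAddr = GCPtrMyModBase;
 *  pReq->cbModule   = cbMyModule;
 *  pReq->cRegions   = cMyRegions;
 *  RTStrCopy(pReq->szName, sizeof(pReq->szName), pszMyName);
 *  RTStrCopy(pReq->szVersion, sizeof(pReq->szVersion), pszMyVersion);
 *  memcpy(&pReq->aRegions[0], &aMyRegions[0], cMyRegions * sizeof(pReq->aRegions[0]));
 *
 *  int rc = GMMR0RegisterSharedModuleReq(pGVM, idCpu, pReq);
 *  if (RT_SUCCESS(rc))
 *      rc = pReq->rc;                   // informational status is passed back in the packet
 *  RTMemFree(pReq);
 * @endcode
 */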
5012
5013
5014/**
5015 * Unregisters a shared module for the VM
5016 *
5017 * @returns VBox status code.
5018 * @param pGVM The global (ring-0) VM structure.
5019 * @param idCpu The VCPU id.
5020 * @param pszModuleName The module name.
5021 * @param pszVersion The module version.
5022 * @param GCPtrModBase The module base address.
5023 * @param cbModule The module size.
5024 */
5025GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
5026 RTGCPTR GCPtrModBase, uint32_t cbModule)
5027{
5028#ifdef VBOX_WITH_PAGE_SHARING
5029 /*
5030 * Validate input and get the basics.
5031 */
5032 PGMM pGMM;
5033 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5034 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5035 if (RT_FAILURE(rc))
5036 return rc;
5037
5038 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
5039 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
5040 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
5041 return VERR_GMM_MODULE_NAME_TOO_LONG;
5042 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
5043 return VERR_GMM_MODULE_NAME_TOO_LONG;
5044
5045 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
5046
5047 /*
5048 * Take the semaphore and do some more validations.
5049 */
5050 gmmR0MutexAcquire(pGMM);
5051 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5052 {
5053 /*
5054 * Locate and remove the specified module.
5055 */
5056 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
5057 if (pRecVM)
5058 {
5059 /** @todo Do we need to do more validations here, like that the
5060 * name + version + cbModule matches? */
5061 NOREF(cbModule);
5062 Assert(pRecVM->pGlobalModule);
5063 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
5064 }
5065 else
5066 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
5067
5068 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5069 }
5070 else
5071 rc = VERR_GMM_IS_NOT_SANE;
5072
5073 gmmR0MutexRelease(pGMM);
5074 return rc;
5075#else
5076
5077 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
5078 return VERR_NOT_IMPLEMENTED;
5079#endif
5080}
5081
5082
5083/**
5084 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
5085 *
5086 * @returns see GMMR0UnregisterSharedModule.
5087 * @param pGVM The global (ring-0) VM structure.
5088 * @param idCpu The VCPU id.
5089 * @param pReq Pointer to the request packet.
5090 */
5091GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
5092{
5093 /*
5094 * Validate input and pass it on.
5095 */
5096 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5097 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5098
5099 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
5100}
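
/*
 * Illustration only: the unregistration request is fixed size, so a caller
 * sketch is simpler than the registration case above.  Only the fields used
 * by the wrapper and GMMR0UnregisterSharedModule (Hdr.cbReq, szName,
 * szVersion, GCBaseAddr, cbModule) are assumed; the local names are
 * hypothetical.
 *
 * @code
 *  GMMUNREGISTERSHAREDMODULEREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.cbReq  = sizeof(Req);
 *  Req.GCBaseAddr = GCPtrMyModBase;
 *  Req.cbModule   = cbMyModule;
 *  RTStrCopy(Req.szName, sizeof(Req.szName), pszMyName);
 *  RTStrCopy(Req.szVersion, sizeof(Req.szVersion), pszMyVersion);
 *  int rc = GMMR0UnregisterSharedModuleReq(pGVM, idCpu, &Req);
 * @endcode
 */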
5101
5102#ifdef VBOX_WITH_PAGE_SHARING
5103
5104/**
5105 * Increases the use count of a shared page; the page is known to exist and be valid.
5106 *
5107 * @param pGMM Pointer to the GMM instance.
5108 * @param pGVM Pointer to the GVM instance.
5109 * @param pPage The page structure.
5110 */
5111DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
5112{
5113 Assert(pGMM->cSharedPages > 0);
5114 Assert(pGMM->cAllocatedPages > 0);
5115
5116 pGMM->cDuplicatePages++;
5117
5118 pPage->Shared.cRefs++;
5119 pGVM->gmm.s.Stats.cSharedPages++;
5120 pGVM->gmm.s.Stats.Allocated.cBasePages++;
5121}
5122
5123
5124/**
5125 * Converts a private page to a shared page; the page is known to exist and be valid.
5126 *
5127 * @param pGMM Pointer to the GMM instance.
5128 * @param pGVM Pointer to the GVM instance.
5129 * @param   HCPhys      The host physical address.
5130 * @param   idPage      The page ID.
5131 * @param   pPage       The page structure.
5132 * @param   pPageDesc   The shared page descriptor.
5133 */
5134DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
5135 PGMMSHAREDPAGEDESC pPageDesc)
5136{
5137 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
5138 Assert(pChunk);
5139 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
5140 Assert(GMM_PAGE_IS_PRIVATE(pPage));
5141
5142 pChunk->cPrivate--;
5143 pChunk->cShared++;
5144
5145 pGMM->cSharedPages++;
5146
5147 pGVM->gmm.s.Stats.cSharedPages++;
5148 pGVM->gmm.s.Stats.cPrivatePages--;
5149
5150 /* Modify the page structure. */
5151 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
5152 pPage->Shared.cRefs = 1;
5153#ifdef VBOX_STRICT
5154 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
5155 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5156#else
5157 NOREF(pPageDesc);
5158 pPage->Shared.u14Checksum = 0;
5159#endif
5160 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5161}
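
/*
 * Illustration only: a small sketch of the two encodings used above, the host
 * physical address stored as a 32-bit page frame number and the strict-build
 * checksum truncated to the 14-bit u14Checksum field.  The variable names are
 * hypothetical; the shift and mask values mirror the code above.
 *
 * @code
 *  // What goes into pPage->Shared.pfn ...
 *  uint32_t const uPfn    = (uint32_t)(HCPhys >> PAGE_SHIFT);
 *  // ... and how GMMR0SharedModuleCheckPage reconstructs the address later.
 *  RTHCPHYS const HCPhys2 = (RTHCPHYS)uPfn << PAGE_SHIFT;
 *
 *  // Strict builds keep only the low 14 bits of the CRC32 for cross-checking.
 *  uint32_t const u32Crc      = RTCrc32(pbPage, PAGE_SIZE);
 *  uint16_t const u14Checksum = (uint16_t)(u32Crc & UINT32_C(0x00003fff));
 * @endcode
 */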
5162
5163
5164static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5165 unsigned idxRegion, unsigned idxPage,
5166 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5167{
5168 NOREF(pModule);
5169
5170 /* Easy case: just change the internal page type. */
5171 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5172 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5173 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5174 VERR_PGM_PHYS_INVALID_PAGE_ID);
5175 NOREF(idxRegion);
5176
5177    AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5178
5179 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5180
5181 /* Keep track of these references. */
5182 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5183
5184 return VINF_SUCCESS;
5185}
5186
5187/**
5188 * Checks the specified shared module range for changes.
5189 *
5190 * Performs the following tasks:
5191 * - If a shared page is new, then it changes the GMM page type to shared and
5192 * returns it in the pPageDesc descriptor.
5193 * - If a shared page already exists, then it checks if the VM page is
5194 * identical and if so frees the VM page and returns the shared page in
5195 *   the pPageDesc descriptor.
5196 *
5197 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5198 *
5199 * @returns VBox status code.
5200 * @param pGVM Pointer to the GVM instance data.
5201 * @param   pModule     The module description.
5202 * @param   idxRegion   The region index.
5203 * @param   idxPage     The page index.
5204 * @param   pPageDesc   The page descriptor.
5205 */
5206GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5207 PGMMSHAREDPAGEDESC pPageDesc)
5208{
5209 int rc;
5210 PGMM pGMM;
5211 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5212 pPageDesc->u32StrictChecksum = 0;
5213
5214 AssertMsgReturn(idxRegion < pModule->cRegions,
5215 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5216 VERR_INVALID_PARAMETER);
5217
5218 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5219 AssertMsgReturn(idxPage < cPages,
5220                    ("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
5221 VERR_INVALID_PARAMETER);
5222
5223    LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5224
5225 /*
5226 * First time; create a page descriptor array.
5227 */
5228 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5229 if (!pGlobalRegion->paidPages)
5230 {
5231 Log(("Allocate page descriptor array for %d pages\n", cPages));
5232 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5233 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5234
5235 /* Invalidate all descriptors. */
5236 uint32_t i = cPages;
5237 while (i-- > 0)
5238 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5239 }
5240
5241 /*
5242 * We've seen this shared page for the first time?
5243 */
5244 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5245 {
5246 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5247 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5248 }
5249
5250 /*
5251 * We've seen it before...
5252 */
5253 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5254 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5255 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5256
5257 /*
5258 * Get the shared page source.
5259 */
5260 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5261    AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pGlobalRegion->paidPages[idxPage], idxRegion, idxPage),
5262 VERR_PGM_PHYS_INVALID_PAGE_ID);
5263
5264 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5265 {
5266 /*
5267 * Page was freed at some point; invalidate this entry.
5268 */
5269 /** @todo this isn't really bullet proof. */
5270 Log(("Old shared page was freed -> create a new one\n"));
5271 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5272 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5273 }
5274
5275    Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5276
5277 /*
5278 * Calculate the virtual address of the local page.
5279 */
5280 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5281 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5282 VERR_PGM_PHYS_INVALID_PAGE_ID);
5283
5284 uint8_t *pbChunk;
5285 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5286 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5287 VERR_PGM_PHYS_INVALID_PAGE_ID);
5288 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5289
5290 /*
5291 * Calculate the virtual address of the shared page.
5292 */
5293 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5294 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5295
5296 /*
5297 * Get the virtual address of the physical page; map the chunk into the VM
5298 * process if not already done.
5299 */
5300 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5301 {
5302 Log(("Map chunk into process!\n"));
5303 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5304 AssertRCReturn(rc, rc);
5305 }
5306 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5307
5308#ifdef VBOX_STRICT
5309 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5310 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5311 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5312 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5313 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5314#endif
5315
5316 /** @todo write ASMMemComparePage. */
5317 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5318 {
5319 Log(("Unexpected differences found between local and shared page; skip\n"));
5320 /* Signal to the caller that this one hasn't changed. */
5321 pPageDesc->idPage = NIL_GMM_PAGEID;
5322 return VINF_SUCCESS;
5323 }
5324
5325 /*
5326 * Free the old local page.
5327 */
5328 GMMFREEPAGEDESC PageDesc;
5329 PageDesc.idPage = pPageDesc->idPage;
5330 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5331 AssertRCReturn(rc, rc);
5332
5333 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5334
5335 /*
5336 * Pass along the new physical address & page id.
5337 */
5338 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5339 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5340
5341 return VINF_SUCCESS;
5342}
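
/*
 * Illustration only: a rough sketch of how a per-page caller (in practice
 * PGMR0SharedModuleCheck on the PGM side) might drive
 * GMMR0SharedModuleCheckPage.  Only the GMMSHAREDPAGEDESC fields used above
 * (idPage, GCPhys, HCPhys, u32StrictChecksum) are assumed; resolving the
 * guest page's current id and addresses is PGM territory and is left out.
 *
 * @code
 *  GMMSHAREDPAGEDESC PageDesc;
 *  PageDesc.idPage = idMyPage;    // private page currently backing the guest page
 *  PageDesc.GCPhys = GCPhysMy;    // guest physical address of the page
 *  PageDesc.HCPhys = HCPhysMy;    // current host physical address
 *  int rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
 *  if (RT_SUCCESS(rc))
 *  {
 *      if (PageDesc.idPage != NIL_GMM_PAGEID)
 *      {
 *          // The page is now shared: remap the guest page to
 *          // PageDesc.HCPhys / PageDesc.idPage.
 *      }
 *      // else: the contents differed, nothing changed for this page.
 *  }
 * @endcode
 */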
5343
5344
5345/**
5346 * RTAvlGCPtrDestroy callback.
5347 *
5348 * @returns VINF_SUCCESS.
5349 * @param pNode The node to destroy.
5350 * @param pvArgs Pointer to an argument packet.
5351 */
5352static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5353{
5354 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5355 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5356 (PGMMSHAREDMODULEPERVM)pNode,
5357 false /*fRemove*/);
5358 return VINF_SUCCESS;
5359}
5360
5361
5362/**
5363 * Used by GMMR0CleanupVM to clean up shared modules.
5364 *
5365 * This is called without taking the GMM lock so that it can be yielded as
5366 * needed here.
5367 *
5368 * @param pGMM The GMM handle.
5369 * @param pGVM The global VM handle.
5370 */
5371static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5372{
5373 gmmR0MutexAcquire(pGMM);
5374 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5375
5376 GMMR0SHMODPERVMDTORARGS Args;
5377 Args.pGVM = pGVM;
5378 Args.pGMM = pGMM;
5379 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5380
5381 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5382 pGVM->gmm.s.Stats.cShareableModules = 0;
5383
5384 gmmR0MutexRelease(pGMM);
5385}
5386
5387#endif /* VBOX_WITH_PAGE_SHARING */
5388
5389/**
5390 * Removes all shared modules for the specified VM
5391 *
5392 * @returns VBox status code.
5393 * @param pGVM The global (ring-0) VM structure.
5394 * @param idCpu The VCPU id.
5395 */
5396GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5397{
5398#ifdef VBOX_WITH_PAGE_SHARING
5399 /*
5400 * Validate input and get the basics.
5401 */
5402 PGMM pGMM;
5403 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5404 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5405 if (RT_FAILURE(rc))
5406 return rc;
5407
5408 /*
5409 * Take the semaphore and do some more validations.
5410 */
5411 gmmR0MutexAcquire(pGMM);
5412 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5413 {
5414 Log(("GMMR0ResetSharedModules\n"));
5415 GMMR0SHMODPERVMDTORARGS Args;
5416 Args.pGVM = pGVM;
5417 Args.pGMM = pGMM;
5418 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5419 pGVM->gmm.s.Stats.cShareableModules = 0;
5420
5421 rc = VINF_SUCCESS;
5422 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5423 }
5424 else
5425 rc = VERR_GMM_IS_NOT_SANE;
5426
5427 gmmR0MutexRelease(pGMM);
5428 return rc;
5429#else
5430 RT_NOREF(pGVM, idCpu);
5431 return VERR_NOT_IMPLEMENTED;
5432#endif
5433}
5434
5435#ifdef VBOX_WITH_PAGE_SHARING
5436
5437/**
5438 * Tree enumeration callback for checking a shared module.
5439 */
5440static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5441{
5442 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5443 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5444 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5445
5446 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5447 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5448
5449 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5450 if (RT_FAILURE(rc))
5451 return rc;
5452 return VINF_SUCCESS;
5453}
5454
5455#endif /* VBOX_WITH_PAGE_SHARING */
5456
5457/**
5458 * Checks all shared modules for the specified VM.
5459 *
5460 * @returns VBox status code.
5461 * @param pGVM The global (ring-0) VM structure.
5462 * @param idCpu The calling EMT number.
5463 * @thread EMT(idCpu)
5464 */
5465GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5466{
5467#ifdef VBOX_WITH_PAGE_SHARING
5468 /*
5469 * Validate input and get the basics.
5470 */
5471 PGMM pGMM;
5472 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5473 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5474 if (RT_FAILURE(rc))
5475 return rc;
5476
5477# ifndef DEBUG_sandervl
5478 /*
5479 * Take the semaphore and do some more validations.
5480 */
5481 gmmR0MutexAcquire(pGMM);
5482# endif
5483 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5484 {
5485 /*
5486 * Walk the tree, checking each module.
5487 */
5488 Log(("GMMR0CheckSharedModules\n"));
5489
5490 GMMCHECKSHAREDMODULEINFO Args;
5491 Args.pGVM = pGVM;
5492 Args.idCpu = idCpu;
5493 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5494
5495 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5496 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5497 }
5498 else
5499 rc = VERR_GMM_IS_NOT_SANE;
5500
5501# ifndef DEBUG_sandervl
5502 gmmR0MutexRelease(pGMM);
5503# endif
5504 return rc;
5505#else
5506 RT_NOREF(pGVM, idCpu);
5507 return VERR_NOT_IMPLEMENTED;
5508#endif
5509}
5510
5511#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5512
5513/**
5514 * Worker for GMMR0FindDuplicatePageReq.
5515 *
5516 * @returns true if duplicate, false if not.
5517 */
5518static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5519{
5520 bool fFoundDuplicate = false;
5521 /* Only take chunks not mapped into this VM process; not entirely correct. */
5522 uint8_t *pbChunk;
5523 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5524 {
5525 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5526 if (RT_SUCCESS(rc))
5527 {
5528 /*
5529 * Look for duplicate pages
5530 */
5531 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5532 while (iPage-- > 0)
5533 {
5534 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5535 {
5536 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5537 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5538 {
5539 fFoundDuplicate = true;
5540 break;
5541 }
5542 }
5543 }
5544 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5545 }
5546 }
5547 return fFoundDuplicate;
5548}
5549
5550
5551/**
5552 * Finds a duplicate of the specified page in other active VMs.
5553 *
5554 * @returns VBox status code.
5555 * @param pGVM The global (ring-0) VM structure.
5556 * @param pReq Pointer to the request packet.
5557 */
5558GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5559{
5560 /*
5561 * Validate input and pass it on.
5562 */
5563 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5564 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5565
5566 PGMM pGMM;
5567 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5568
5569 int rc = GVMMR0ValidateGVM(pGVM);
5570 if (RT_FAILURE(rc))
5571 return rc;
5572
5573 /*
5574 * Take the semaphore and do some more validations.
5575 */
5576 rc = gmmR0MutexAcquire(pGMM);
5577 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5578 {
5579 uint8_t *pbChunk;
5580 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5581 if (pChunk)
5582 {
5583 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5584 {
5585 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5586 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5587 if (pPage)
5588 {
5589 /*
5590 * Walk the chunks
5591 */
5592 pReq->fDuplicate = false;
5593 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5594 {
5595 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5596 {
5597 pReq->fDuplicate = true;
5598 break;
5599 }
5600 }
5601 }
5602 else
5603 {
5604 AssertFailed();
5605 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5606 }
5607 }
5608 else
5609 AssertFailed();
5610 }
5611 else
5612 AssertFailed();
5613 }
5614 else
5615 rc = VERR_GMM_IS_NOT_SANE;
5616
5617 gmmR0MutexRelease(pGMM);
5618 return rc;
5619}
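
/*
 * Illustration only: a minimal caller sketch for the strict-build duplicate
 * page query above.  Only the request fields touched by the code above
 * (Hdr.cbReq, idPage, fDuplicate) are assumed; idMyPage is hypothetical.
 *
 * @code
 *  GMMFINDDUPLICATEPAGEREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.cbReq = sizeof(Req);
 *  Req.idPage    = idMyPage;                      // page to look for duplicates of
 *  int rc = GMMR0FindDuplicatePageReq(pGVM, &Req);
 *  if (RT_SUCCESS(rc) && Req.fDuplicate)
 *      Log(("Page %#x has an identical private page in some other chunk\n", idMyPage));
 * @endcode
 */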
5620
5621#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5622
5623
5624/**
5625 * Retrieves the GMM statistics visible to the caller.
5626 *
5627 * @returns VBox status code.
5628 *
5629 * @param pStats Where to put the statistics.
5630 * @param pSession The current session.
5631 * @param pGVM The GVM to obtain statistics for. Optional.
5632 */
5633GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5634{
5635    LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5636
5637 /*
5638 * Validate input.
5639 */
5640 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5641 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5642 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5643
5644 PGMM pGMM;
5645 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5646
5647 /*
5648 * Validate the VM handle, if not NULL, and lock the GMM.
5649 */
5650 int rc;
5651 if (pGVM)
5652 {
5653 rc = GVMMR0ValidateGVM(pGVM);
5654 if (RT_FAILURE(rc))
5655 return rc;
5656 }
5657
5658 rc = gmmR0MutexAcquire(pGMM);
5659 if (RT_FAILURE(rc))
5660 return rc;
5661
5662 /*
5663 * Copy out the GMM statistics.
5664 */
5665 pStats->cMaxPages = pGMM->cMaxPages;
5666 pStats->cReservedPages = pGMM->cReservedPages;
5667 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5668 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5669 pStats->cSharedPages = pGMM->cSharedPages;
5670 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5671 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5672 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5673 pStats->cChunks = pGMM->cChunks;
5674 pStats->cFreedChunks = pGMM->cFreedChunks;
5675 pStats->cShareableModules = pGMM->cShareableModules;
5676 pStats->idFreeGeneration = pGMM->idFreeGeneration;
5677 RT_ZERO(pStats->au64Reserved);
5678
5679 /*
5680 * Copy out the VM statistics.
5681 */
5682 if (pGVM)
5683 pStats->VMStats = pGVM->gmm.s.Stats;
5684 else
5685 RT_ZERO(pStats->VMStats);
5686
5687 gmmR0MutexRelease(pGMM);
5688 return rc;
5689}
5690
5691
5692/**
5693 * VMMR0 request wrapper for GMMR0QueryStatistics.
5694 *
5695 * @returns see GMMR0QueryStatistics.
5696 * @param pGVM The global (ring-0) VM structure. Optional.
5697 * @param pReq Pointer to the request packet.
5698 */
5699GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5700{
5701 /*
5702 * Validate input and pass it on.
5703 */
5704 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5705 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5706
5707 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5708}
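
/*
 * Illustration only: a minimal sketch of querying the statistics through the
 * request wrapper above.  Only the request fields used by the wrapper
 * (Hdr.cbReq, Stats, pSession) and the GMMSTATS fields copied out above are
 * assumed; pMySession is hypothetical.
 *
 * @code
 *  GMMQUERYSTATISTICSSREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.cbReq = sizeof(Req);
 *  Req.pSession  = pMySession;
 *  int rc = GMMR0QueryStatisticsReq(pGVM, &Req);  // pGVM is optional, NULL gives global stats only
 *  if (RT_SUCCESS(rc))
 *      Log(("GMM: %RU64 allocated, %RU64 shared, %RU64 ballooned pages\n",
 *           Req.Stats.cAllocatedPages, Req.Stats.cSharedPages, Req.Stats.cBalloonedPages));
 * @endcode
 */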
5709
5710
5711/**
5712 * Resets the specified GMM statistics.
5713 *
5714 * @returns VBox status code.
5715 *
5716 * @param pStats        The statistics to reset; non-zero fields indicate
5717 *                      which ones to reset.
5718 * @param pSession The current session.
5719 * @param pGVM The GVM to reset statistics for. Optional.
5720 */
5721GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5722{
5723 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5724    /* Nothing to reset at the moment. */
5725 return VINF_SUCCESS;
5726}
5727
5728
5729/**
5730 * VMMR0 request wrapper for GMMR0ResetStatistics.
5731 *
5732 * @returns see GMMR0ResetStatistics.
5733 * @param pGVM The global (ring-0) VM structure. Optional.
5734 * @param pReq Pointer to the request packet.
5735 */
5736GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5737{
5738 /*
5739 * Validate input and pass it on.
5740 */
5741 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5742 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5743
5744 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5745}
5746