D3D12 Memory Allocator
This library tries to automatically make optimal choices for the resources you create, so you don't need to care about them. There are some advanced features of Direct3D 12 that you may use to optimize your memory management. There are also some settings in D3D12MA that you may change to alter its default behavior. This page provides miscellaneous advice about features of D3D12 and D3D12MA that are non-essential, but may improve the stability or performance of your app.
When trying to allocate more memory than available in the current heap (e.g., video memory on the graphics card, system memory), one of a few bad things can happen. For example, the resource creation call can fail with an HRESULT value other than S_OK.
Unfortunately, there is no way to be 100% protected against memory overcommitment. The best approach is to avoid allocating too much memory.
The full capacity of the memory can be queried using function D3D12MA::Allocator::GetMemoryCapacity. However, it is not recommended, because the amount of memory available to the application is typically smaller than the full capacity, as some portion of it is reserved by the operating system or used by other processes.
Because of this, the recommended way of fetching the memory budget available to the application is using function D3D12MA::Allocator::GetBudget. Preventing value D3D12MA::Budget::UsageBytes from exceeding the D3D12MA::Budget::BudgetBytes is probably the best we can do in trying to avoid the consequences of over-commitment. For more information, see also: Statistics.
Example:
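A minimal sketch of how the remaining headroom could be derived from the two budget values. BudgetSnapshot and FreeBudgetBytes are hypothetical names used only for illustration; in a real application the two numbers would come from D3D12MA::Budget::UsageBytes and D3D12MA::Budget::BudgetBytes as filled in by D3D12MA::Allocator::GetBudget.

```cpp
#include <cstdint>

// Illustrative stand-in for the two fields of D3D12MA::Budget used here.
struct BudgetSnapshot
{
    std::uint64_t usageBytes;  // corresponds to D3D12MA::Budget::UsageBytes
    std::uint64_t budgetBytes; // corresponds to D3D12MA::Budget::BudgetBytes
};

// Bytes that can still be allocated before the usage reaches the budget.
// Returns 0 when the usage already meets or exceeds the budget, which also
// avoids unsigned wrap-around in the subtraction.
inline std::uint64_t FreeBudgetBytes(const BudgetSnapshot& b)
{
    return b.budgetBytes > b.usageBytes ? b.budgetBytes - b.usageBytes : 0;
}
```

A streaming system could poll this value once per frame and postpone loading optional assets while it is too small.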
DXGI offers the function IDXGIAdapter3::QueryVideoMemoryInfo that queries the current memory usage and budget. This library automatically makes use of it when available (when you use a recent enough version of the DirectX SDK). If not, it falls back to estimating the usage and budget based on the total amount of the allocated memory and 80% of the full memory capacity, respectively.
When creating non-essential resources, you can use D3D12MA::ALLOCATION_FLAG_WITHIN_BUDGET. Then, in case the allocation would exceed the budget, the library will return failure from the function without attempting to allocate the actual D3D12 memory.
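The fallback estimate and the within-budget decision described above can be sketched as follows. This is an illustration of the idea under stated assumptions, not the library's actual implementation; EstimateBudgetBytes and FitsWithinBudget are hypothetical names.

```cpp
#include <cstdint>

// Fallback budget estimate: 80% of the full memory capacity.
// Dividing before multiplying avoids overflow for very large capacities;
// the result is rounded down.
inline std::uint64_t EstimateBudgetBytes(std::uint64_t capacityBytes)
{
    return capacityBytes / 10 * 8;
}

// Returns true when a new allocation of newSize bytes keeps the usage
// at or below the budget. This is the kind of check that
// D3D12MA::ALLOCATION_FLAG_WITHIN_BUDGET requests from the library.
inline bool FitsWithinBudget(std::uint64_t usageBytes,
                             std::uint64_t budgetBytes,
                             std::uint64_t newSize)
{
    if (usageBytes > budgetBytes)
        return false; // already over budget
    return newSize <= budgetBytes - usageBytes;
}
```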
It may also be a good idea to support failed resource creation. For non-essential resources, when function D3D12MA::Allocator::CreateResource fails with a result other than S_OK, it is worth implementing some way of recovery instead of terminating or crashing the entire app.
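One possible recovery strategy is to retry with progressively cheaper variants of the resource, for example a texture without its top mip levels. The sketch below is only an assumption about how an application might structure this; CreateWithFallback and the tryCreate callback are hypothetical, and the callback would wrap a call such as D3D12MA::Allocator::CreateResource and return true on S_OK.

```cpp
#include <functional>

// Tries quality levels from best (0) to worst (levelCount - 1) and returns
// the first level for which creation succeeded, or -1 if every attempt failed.
// tryCreate is a hypothetical, application-provided callback.
inline int CreateWithFallback(int levelCount,
                              const std::function<bool(int)>& tryCreate)
{
    for (int level = 0; level < levelCount; ++level)
    {
        if (tryCreate(level))
            return level;
    }
    return -1; // caller can skip the resource or use a placeholder
}
```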
Creating D3D12 resources (buffers and textures) can be a time-consuming operation. The duration can be unpredictable, spanning from a small fraction of a millisecond to a significant fraction of a second. Thus, it is recommended to allocate all the memory and create all the resources needed upfront rather than doing it during application runtime. For example, a video game can try to create its resources on startup or when loading a new level. Of course, it is not always possible. For example, open-world games may require loading and unloading some graphical assets in the background (often called "streaming").
Creating and releasing D3D12 resources on a separate thread in the background may help. Both ID3D12Device and D3D12MA::Allocator objects are thread-safe, synchronized internally. However, cases were observed where resource creation calls like ID3D12Device::CreateCommittedResource were blocking other D3D12 calls like ExecuteCommandLists or Present somewhere inside the graphics driver, so hitches can happen even when using multithreading.
The most expensive part is typically the allocation of a new D3D12 memory heap. This library tackles this problem by automatically allocating large heaps (64 MB by default) and creating resources as placed inside of them. When a new requested resource can be placed in a free space of an existing heap and doesn't require allocating a new heap, this operation is typically much faster, as it only requires creating a new ID3D12Resource object and not allocating new memory. This is the main benefit of using D3D12MA compared to the naive approach of using Direct3D 12 directly and creating each resource as committed with CreateCommittedResource, which would result in a separate allocation of an implicit heap every time.
When a large number of small buffers needs to be created, the overhead of creating even just separate ID3D12Resource objects can be significant. It can be avoided by creating one or a few larger buffers and manually sub-allocating parts of them for specific needs. This library can also help with it. See section "Sub-allocating buffers" below.
Another reason for the slowness of D3D12 memory allocation is the guarantee that the newly allocated memory is filled with zeros. When creating and destroying resources placed in an existing heap, this overhead is not present, and the memory is not zeroed - it may contain random data left by the resource previously allocated in that place. In recent versions of the DirectX 12 SDK, clearing the memory of the newly created D3D12 heaps can also be disabled for improved performance. D3D12MA can use this feature, for example, when D3D12_HEAP_FLAG_CREATE_NOT_ZEROED is passed to D3D12MA::POOL_DESC::HeapFlags during the creation of a custom pool.
It is recommended to always use such flags. The downside is that when the memory is not filled with zeros and you don't properly clear it or otherwise initialize its content before use (which is required by D3D12), you may observe incorrect behavior. This problem mostly affects render-target and depth-stencil textures.
When an allocation needs to be made in performance-critical code, you can use D3D12MA::ALLOCATION_FLAG_STRATEGY_MIN_TIME. It influences multiple heuristics inside the library to prefer faster allocation at the expense of possibly less optimal placement in the memory.
If the resource to be created is non-essential, while the performance is paramount, you can also use D3D12MA::ALLOCATION_FLAG_NEVER_ALLOCATE. It will create the resource only if it can be placed inside an existing memory heap and return failure from the function if a new heap would need to be allocated, which should guarantee good performance of such a function call.
When a large number of small buffers needs to be created, the overhead of creating separate ID3D12Resource objects can be significant. It can also cause a significant waste of memory, as placed buffers need to be aligned to D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT = 64 KB by default. These problems can be avoided by creating one or a few larger buffers and manually sub-allocating parts of them for specific needs.
It requires implementing a custom allocator for the data inside the buffer and using offsets to individual regions. When all the regions can be allocated linearly and freed all at once, implementing such an allocator is trivial. When every region has the same size, implementing an allocator is also quite simple when using a "free list" algorithm. However, when regions can have different sizes and can be allocated and freed in random order, it requires a full allocation algorithm. D3D12MA can help with it by exposing its core allocation algorithm for custom usages. For more details and example code, see chapter: Virtual allocator. It can be used for all the cases mentioned above without too much performance overhead, because the D3D12MA::VirtualAllocation object is just a lightweight handle.
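For the case where every region has the same size, a "free list" allocator can be sketched as below. It is independent of D3D12MA and only tracks byte offsets inside one large buffer; the class name and its interface are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Returned by Allocate when no free region is left.
constexpr std::uint64_t kInvalidOffset = ~std::uint64_t(0);

// Hands out fixed-size regions of one large buffer, identified by offset.
class FixedSizeRegionAllocator
{
public:
    FixedSizeRegionAllocator(std::uint64_t regionSize, std::uint32_t regionCount)
        : m_regionSize(regionSize)
    {
        // Push indices in reverse so the lowest offsets are handed out first.
        m_freeList.reserve(regionCount);
        for (std::uint32_t i = regionCount; i-- > 0; )
            m_freeList.push_back(i);
    }

    // Returns the byte offset of a free region, or kInvalidOffset if full.
    std::uint64_t Allocate()
    {
        if (m_freeList.empty())
            return kInvalidOffset;
        const std::uint32_t index = m_freeList.back();
        m_freeList.pop_back();
        return index * m_regionSize;
    }

    // offset must be a value previously returned by Allocate.
    void Free(std::uint64_t offset)
    {
        m_freeList.push_back(static_cast<std::uint32_t>(offset / m_regionSize));
    }

private:
    std::uint64_t m_regionSize;
    std::vector<std::uint32_t> m_freeList; // indices of free regions
};
```

Both operations run in constant time. Choosing a region size that is a multiple of the required alignment keeps every returned offset aligned.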
When sub-allocating a buffer, you need to remember to explicitly request proper alignment required for each region. For example, data used as a constant buffer must be aligned to D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT = 256 B.
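The simplest case mentioned above, regions allocated linearly and freed all at once, combined with explicit alignment, might look like the following sketch. LinearRegionAllocator and AlignUp are hypothetical helpers, not part of D3D12MA; the alignment is assumed to be a power of 2, such as 256 for constant buffer data.

```cpp
#include <cstdint>

// Rounds value up to the nearest multiple of alignment (a power of 2).
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Sub-allocates regions of one large buffer front to back.
// Individual regions cannot be freed; Reset releases all of them at once.
class LinearRegionAllocator
{
public:
    explicit LinearRegionAllocator(std::uint64_t capacity)
        : m_capacity(capacity) {}

    // On success, writes the aligned offset to outOffset and returns true.
    bool Allocate(std::uint64_t size, std::uint64_t alignment,
                  std::uint64_t& outOffset)
    {
        const std::uint64_t offset = AlignUp(m_head, alignment);
        if (offset > m_capacity || size > m_capacity - offset)
            return false; // not enough space left
        outOffset = offset;
        m_head = offset + size;
        return true;
    }

    void Reset() { m_head = 0; }

private:
    std::uint64_t m_capacity;
    std::uint64_t m_head = 0; // first unused byte
};
```

A typical use is a per-frame constant buffer ring: allocate with alignment 256 while recording the frame and call Reset once the GPU has finished using that frame's data.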
When too much video memory is allocated, one of the things that can happen is the system demoting some heaps to the system memory. Moving data between memory pools or reaching out directly to the system memory through the PCI Express bus can have a large performance overhead, which can slow down the application or even make a game unplayable. Unfortunately, it is not possible to fully control or prevent this demotion. The best thing to do is to avoid memory over-commitment. For more information, see section "Avoiding running out of memory" above.
Recent versions of the DirectX 12 SDK offer the function ID3D12Device1::SetResidencyPriority that sets a hint about the priority of a resource - how important it is to stay resident in the video memory. Setting the priority happens at the level of an entire memory heap. D3D12MA offers an interface to set this priority in the form of the D3D12MA::POOL_DESC::ResidencyPriority parameter. It affects all allocations made out of the custom pool created with it, both placed inside large heaps and created as committed.
It is recommended to create a custom pool with high residency priority for all resources that are critical for performance, especially those that are written by the GPU, like render-target and depth-stencil textures, and UAV textures and buffers. It is also worth creating them as committed, so that each one will have its own implicit heap. This can minimize the chance that an entire large heap is demoted to system memory, degrading performance of all the resources placed in it.
Example:
When you have a committed allocation created, you can also set the residency priority of its resource using the D3D12 function ID3D12Device1::SetResidencyPriority.
Note that this is not the same as explicit eviction controlled using the ID3D12Device::Evict and MakeResident functions. Resources evicted explicitly are illegal to access until they are made resident again, while the demotion described here happens automatically and only slows down the execution.
Direct3D 12 offers a fixed set of memory heap types:
- D3D12_HEAP_TYPE_DEFAULT: Represents the video memory. It is available and fast to access for the GPU. It should be used for all resources that are written by the GPU (like render-target and depth-stencil textures, UAV) and resources that are frequently read by the GPU (like textures intended for sampling, vertex, index, and constant buffers).
- D3D12_HEAP_TYPE_UPLOAD: Represents the system memory that is uncached and write-combined. It can be mapped and accessed by the CPU code using a pointer. It supports only buffers, not textures. It is intended for "staging buffers" that are filled by the CPU code and then used as a source of copy operations to the DEFAULT heap. It can also be accessed directly by the GPU - shaders can read from buffers created in this memory.
- D3D12_HEAP_TYPE_READBACK: Represents the system memory that is cached. It is intended for buffers used as a destination of copy operations from the DEFAULT heap.
Note that in systems with a discrete graphics card, access to system memory is fast from the CPU code (like the C++ code mapping D3D12 buffers and accessing them through a pointer), while access to the video memory is fast from the GPU code (like shaders reading and writing buffers and textures). Any copy operation or direct access between these memory heap types happens through the PCI Express bus, which can be relatively slow.
Modern systems offer a feature called Resizable BAR (ReBAR) that gives the CPU direct access to the full video memory. To be available, this feature needs to be supported by the whole hardware-software environment.
Recent versions of the DirectX 12 SDK give access to this feature in the form of a new, 4th memory heap type: D3D12_HEAP_TYPE_GPU_UPLOAD. Resources created in it behave logically similarly to the D3D12_HEAP_TYPE_UPLOAD heap. In particular, such memory is best written sequentially through a mapped pointer (e.g., using memcpy). It shouldn't be accessed randomly or read, because it is extremely slow for uncached memory.
The main difference is that resources created in the new D3D12_HEAP_TYPE_GPU_UPLOAD are placed in the video memory, while resources created in the old D3D12_HEAP_TYPE_UPLOAD are placed in the system memory. This implies which budgets are consumed by new resources allocated in those heaps. This also implies which operations involve transferring data through the PCI Express bus.
- D3D12_HEAP_TYPE_UPLOAD uses the system memory, so writes from the CPU code through a mapped pointer are faster, while copies or direct access from the GPU are slower because they need to go through PCIe.
- D3D12_HEAP_TYPE_GPU_UPLOAD uses the video memory, so copies or direct access from the GPU are faster, while writes from the CPU code through a mapped pointer can be slower, because they need to go through PCIe. For maximum performance of copy operations from this heap, a graphics or compute queue should be used, not a copy queue.
GPU Upload Heap can be used for performance optimization of some resources that need to be written by the CPU and read by the GPU. It can be beneficial especially for resources that need to change frequently (often called "dynamic").
D3D12MA supports GPU upload heap when a recent enough version of the DirectX 12 SDK is used and when the current system supports it. The support can be queried using function D3D12MA::Allocator::IsGPUUploadHeapSupported(). When it returns TRUE, you can create resources using D3D12_HEAP_TYPE_GPU_UPLOAD. You can also just try creating such a resource. Example:
When using the D3D12 API directly, there are 3 ways of creating resources:
- Committed, using ID3D12Device::CreateCommittedResource. It creates the resource with its own memory heap, which is called an "implicit heap" and cannot be accessed directly.
- Placed, using ID3D12Device::CreatePlacedResource. An ID3D12Heap needs to be created beforehand using ID3D12Device::CreateHeap. Then, the resource can be created as placed inside the heap at a specific offset.
- Reserved, using ID3D12Device::CreateReservedResource. This library doesn't support them directly.
A naive solution would be to create all the resources as committed. It works, because in D3D12 there is no strict limit on the number of resources or heaps that can be created. However, there are certain advantages and disadvantages of using committed versus placed resources:
- The ID3D12Device::Evict and MakeResident functions work at the level of the entire heap, and so does ID3D12Device1::SetResidencyPriority, so creating resources as committed allows more fine-grained control over the eviction and residency priority of individual resources.
When creating resources with the help of D3D12MA using function D3D12MA::Allocator::CreateResource, you typically don't need to care about all this. The library automatically makes the choice of creating the new resource as committed or placed. However, in cases when you need the information or the control over this choice between committed and placed, the library offers facilities to do that, described below.
You can check whether an allocation was created as a committed resource by checking if its heap is null, i.e., whether D3D12MA::Allocation::GetHeap returns NULL. Committed resources have an implicit heap that is not directly accessible.
You can request a new resource to be created as committed by using D3D12MA::ALLOCATION_FLAG_COMMITTED. Note that committed resources can also be created out of Custom memory pools.
You can also request all resources to be created as committed globally for the entire allocator by using D3D12MA::ALLOCATOR_FLAG_ALWAYS_COMMITTED. However, this contradicts the main purpose of using this library. It can also prevent certain other features of the library from being used. This flag should be used only for debugging purposes.
You can create a custom pool with an explicit block size by specifying a non-zero D3D12MA::POOL_DESC::BlockSize. When doing this, all resources created in such a pool are placed in those blocks (heaps) and never created as committed. Example:
You can request a new resource to be created as placed by using D3D12MA::ALLOCATION_FLAG_CAN_ALIAS. This is required especially if you plan to create another resource in the same region of memory, aliasing with your resource - hence the name of this flag.
Note that D3D12MA::ALLOCATION_FLAG_CAN_ALIAS can even be combined with D3D12MA::ALLOCATION_FLAG_COMMITTED. In this case, the resource is not created as committed, but it is also not placed as part of a larger heap. What happens instead is that a new heap is created with the exact size required for the resource, and the resource is created in it, placed at offset 0.
Certain types of resources require certain alignment in memory. An alignment is a requirement for the address or offset to the beginning of the resource to be a multiple of some value, which is always a power of 2. For committed resources, the problem is non-existent, because committed resources have their own implicit heaps where they are created at offset 0, which meets any alignment requirement. For placed resources, D3D12MA takes care of the alignment automatically.
Default alignment required for MSAA textures is D3D12_DEFAULT_MSAA_RESOURCE_PLACEMENT_ALIGNMENT = 4 MB. Default alignment required for buffers and other textures is D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT = 64 KB.
Because the alignment required for buffers is 64 KB, small buffers can waste a lot of memory in between when created as placed. When such small buffers are created as committed, some graphics drivers are able to pack them better. D3D12MA automatically takes advantage of this by preferring to create small buffers as committed. This heuristic is enabled by default. It is also a tradeoff - it can make the allocation of these buffers slower. It can be disabled for an individual resource by using D3D12MA::ALLOCATION_FLAG_STRATEGY_MIN_TIME and for the entire allocator by using D3D12MA::ALLOCATOR_FLAG_DONT_PREFER_SMALL_BUFFERS_COMMITTED.
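To get a feeling for how much memory the 64 KB alignment can waste, the padding behind a single placed buffer can be computed as in this sketch. The helper names are hypothetical, and the calculation assumes that the next resource starts at the next 64 KB boundary.

```cpp
#include <cstdint>

// Value of D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT (64 KB).
constexpr std::uint64_t kDefaultPlacementAlignment = 64 * 1024;

// Rounds value up to the nearest multiple of alignment (a power of 2).
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Unused bytes between the end of a placed buffer of the given size
// and the next 64 KB boundary.
constexpr std::uint64_t PaddingPerBuffer(std::uint64_t bufferSize)
{
    return AlignUp(bufferSize, kDefaultPlacementAlignment) - bufferSize;
}
```

For a 256-byte buffer this gives 65280 bytes of padding, so a thousand such buffers placed back to back would occupy roughly 64 MB while holding only 250 KB of data.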
For certain textures that meet a complex set of requirements, special "small alignment" can be applied. Details can be found in Microsoft documentation of the D3D12_RESOURCE_DESC structure. For MSAA textures, the small alignment is D3D12_SMALL_MSAA_RESOURCE_PLACEMENT_ALIGNMENT = 64 KB. For other textures, the small alignment is D3D12_SMALL_RESOURCE_PLACEMENT_ALIGNMENT = 4 KB. D3D12MA uses this feature automatically. Detailed behavior can be disabled or controlled by predefining macro D3D12MA_USE_SMALL_RESOURCE_PLACEMENT_ALIGNMENT.
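The four alignment values listed above can be summarized in a small lookup. This is only an illustration: ResourceKind and PlacementAlignment are hypothetical names, and whether a texture qualifies for small alignment is assumed to be decided elsewhere according to the rules in the D3D12_RESOURCE_DESC documentation.

```cpp
#include <cstdint>

enum class ResourceKind { Buffer, Texture, MsaaTexture };

// Returns the placement alignment in bytes for a resource of the given kind.
// smallAlignmentEligible is ignored for buffers.
inline std::uint64_t PlacementAlignment(ResourceKind kind,
                                        bool smallAlignmentEligible)
{
    constexpr std::uint64_t KB = 1024;
    constexpr std::uint64_t MB = 1024 * KB;
    switch (kind)
    {
    case ResourceKind::MsaaTexture:
        // D3D12_SMALL_MSAA_... = 64 KB, D3D12_DEFAULT_MSAA_... = 4 MB
        return smallAlignmentEligible ? 64 * KB : 4 * MB;
    case ResourceKind::Texture:
        // D3D12_SMALL_... = 4 KB, D3D12_DEFAULT_... = 64 KB
        return smallAlignmentEligible ? 4 * KB : 64 * KB;
    case ResourceKind::Buffer:
    default:
        return 64 * KB; // D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT
    }
}
```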
D3D12 also has a concept of alignment of the entire heap, passed through D3D12_HEAP_DESC::Alignment. This library automatically sets the alignment as small as possible. Unfortunately, any heap that has a chance of hosting an MSAA texture needs to have the alignment set to 4 MB. This problem can be overcome by passing D3D12MA::ALLOCATOR_FLAG_MSAA_TEXTURES_ALWAYS_COMMITTED on the creation of the main allocator object and D3D12MA::POOL_FLAG_MSAA_TEXTURES_ALWAYS_COMMITTED on the creation of any custom pool that supports textures, not only buffers. With those flags, the alignment of the heaps created by D3D12MA can be lower, but any MSAA textures are created as committed. You should always use these flags in your code unless you really need to create some MSAA textures as placed.