AMD GPU Services (AGS)
Breadcrumb API

API for writing top-of-pipe and bottom-of-pipe markers to help track down GPU hangs.

Classes

struct  AGSBreadcrumbMarker
 The breadcrumb marker struct used by agsDriverExtensionsDX11_WriteBreadcrumb.
 

Typedefs

typedef struct AGSBreadcrumbMarker AGSBreadcrumbMarker
 The breadcrumb marker struct used by agsDriverExtensionsDX11_WriteBreadcrumb.
 

Functions

AMD_AGS_API AGSReturnCode agsDriverExtensionsDX11_WriteBreadcrumb (AGSContext *context, const AGSBreadcrumbMarker *marker)
 Function to write a breadcrumb marker.
 

Detailed Description

API for writing top-of-pipe and bottom-of-pipe markers to help track down GPU hangs.

The API is available if the AGSDX11ReturnedParams::ExtensionsSupported::breadcrumbMarkers flag is set.

To use the API, a nonzero value must be specified in AGSDX11ExtensionParams::numBreadcrumbMarkers. This enables the API (if available) and allocates a system memory buffer, which is returned to the user in AGSDX11ReturnedParams::breadcrumbBuffer.

The user can now write markers before and after draw calls using agsDriverExtensionsDX11_WriteBreadcrumb.

Background

A top-of-pipe (TOP) command is scheduled for execution as soon as the command processor (CP) reaches the command. A bottom-of-pipe (BOP) command is scheduled for execution once the previous rendering commands (draw and dispatch) finish execution. TOP and BOP commands do not block the CP: the CP schedules the command for execution, then proceeds to the next command without waiting. To use TOP and BOP commands effectively, it is important to understand how they interact with rendering commands:

When the CP encounters a rendering command it queues it for execution and moves to the next command. The queued rendering commands are issued in order. There can be multiple rendering commands running in parallel. When a rendering command is issued we say it is at the top of the pipe. When a rendering command finishes execution we say it has reached the bottom of the pipe.

A BOP command remains in a waiting queue and is executed once prior rendering commands finish. The queue of BOP commands is limited to 64 entries on GCN generations 1 through 5. Once that limit is reached, the CP stops queueing both BOP commands and rendering commands. Developers should limit the number of BOP commands that write markers to avoid contention. In general, developers should limit both TOP and BOP commands to avoid stalling the CP.
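The scheduling rules above can be explored with a small host-side model. The following is only a sketch under simplifying assumptions: the CP never stalls (so the 64-entry BOP queue limit is ignored), every draw except the hung one eventually completes, and the names Kind, Command and WrittenMarkers are hypothetical helpers that are not part of AGS.

```cpp
#include <vector>

// Hypothetical command model for illustration; these are not AGS types.
enum class Kind { Draw, TopMarker, BottomMarker };

struct Command {
    Kind kind;
    int  id;  // marker value, or draw index when kind == Kind::Draw
};

// Returns the marker values that reach memory when the draw with index
// hungDraw never finishes (pass -1 to model a run without a hang).
// Results are listed in command order, not in the order they are written.
std::vector<int> WrittenMarkers(const std::vector<Command>& cmds, int hungDraw)
{
    std::vector<int> written;
    bool hungDrawSeen = false;  // true once the hung draw precedes the current command
    for (const Command& c : cmds) {
        switch (c.kind) {
        case Kind::Draw:
            if (c.id == hungDraw) hungDrawSeen = true;
            break;
        case Kind::TopMarker:
            // TOP: written as soon as the CP reaches the command.
            written.push_back(c.id);
            break;
        case Kind::BottomMarker:
            // BOP: written only once every prior draw has finished.
            if (!hungDrawSeen) written.push_back(c.id);
            break;
        }
    }
    return written;
}
```

Feeding this model the command list of Example 2 with DrawX as the hung draw yields markers 1 and 3, which matches the reasoning given for that example.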

Example 1:

// Start of a command buffer
WriteMarker(TopOfPipe, 1)
WriteMarker(BottomOfPipe, 2)
WriteMarker(BottomOfPipe, 3)
DrawX
WriteMarker(BottomOfPipe, 4)
WriteMarker(BottomOfPipe, 5)
WriteMarker(TopOfPipe, 6)
// End of command buffer

In the above example, the CP writes markers 1, 2 and 3 without waiting:
  - Marker 1 is TOP, so it is independent of other commands.
  - There is no wait for markers 2 and 3 because no draws precede those BOP commands.
Marker 4 is only written once DrawX finishes execution.
Marker 5 does not wait for additional draws, so it is written right after marker 4.
Marker 6 can be written as soon as the CP reaches the command. For instance, it is very possible that the CP writes marker 6 while DrawX is running, so marker 6 gets written before markers 4 and 5.

Example 2:

WriteMarker(TopOfPipe, 1)
DrawX
WriteMarker(BottomOfPipe, 2)
WriteMarker(TopOfPipe, 3)
DrawY
WriteMarker(BottomOfPipe, 4)

In this example:
- Marker 1 is written before the start of DrawX.
- Marker 2 is written once DrawX finishes execution.
- Similarly, marker 3 is written before the start of DrawY.
- Marker 4 is written once DrawY finishes execution.
In case of a GPU hang, if markers 1 and 3 are written but markers 2 and 4 are missing, we can conclude that:
- The CP has reached both the DrawX and DrawY commands, since markers 1 and 3 are present.
- Since markers 2 and 4 are missing, either DrawX is hanging while DrawY is at the top of the pipe, or both DrawX and DrawY started and both are simultaneously hanging.

Example 3:

// Start of a command buffer
WriteMarker(BottomOfPipe, 1)
DrawX
WriteMarker(BottomOfPipe, 2)
DrawY
WriteMarker(BottomOfPipe, 3)
DrawZ
WriteMarker(BottomOfPipe, 4)
// End of command buffer

In this example:
- Marker 1 is written before the start of DrawX.
- Marker 2 is written once DrawX finishes.
- Marker 3 is written once DrawY finishes.
- Marker 4 is written once DrawZ finishes.
- If the GPU hangs and only marker 1 is written, we can conclude that the hang is happening in DrawX, DrawY or DrawZ.
- If the GPU hangs and only markers 1 and 2 are written, we can conclude that the hang is happening in DrawY or DrawZ.
- If the GPU hangs and only marker 4 is missing, we can conclude that the hang is happening in DrawZ.

Example 4:

// Start of a command buffer
WriteMarker(TopOfPipe, 1)
DrawX
WriteMarker(TopOfPipe, 2)
DrawY
WriteMarker(TopOfPipe, 3)
DrawZ
// End of command buffer

In this example:
- If the GPU hangs and only marker 1 is written, we can conclude that the hang is happening in DrawX.
- If the GPU hangs and only markers 1 and 2 are written, we can conclude that the hang is happening in DrawX or DrawY.
- If the GPU hangs and all 3 markers are written, we can conclude that the hang is happening in any of DrawX, DrawY or DrawZ.

Example 5:

DrawX
WriteMarker(TopOfPipe, 1)
WriteMarker(BottomOfPipe, 2)
DrawY
WriteMarker(TopOfPipe, 3)
WriteMarker(BottomOfPipe, 4)

In this example:
- Marker 1 is written right after DrawX is queued for execution.
- Marker 2 is only written once DrawX finishes execution.
- Marker 3 is written right after DrawY is queued for execution.
- Marker 4 is only written once DrawY finishes execution.
- If marker 1 is written, we know that the CP has reached the DrawX command (DrawX is at the top of the pipe).
- If marker 2 is written, we can say that DrawX has finished execution (DrawX is at the bottom of the pipe).
- If the GPU hangs and only markers 1 and 3 are written, we can conclude that the hang is happening in DrawX or DrawY.
- If the GPU hangs and only marker 1 is written, we can conclude that the hang is happening in DrawX.
- If the GPU hangs and only marker 4 is missing, we can conclude that the hang is happening in DrawY.
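The inference rules of Example 5 reduce to a simple test: a draw is a hang candidate when the TOP marker recorded after it is present but the matching BOP marker is missing. A minimal sketch of that test follows; DrawMarkers and SuspectDraws are hypothetical names introduced for illustration and are not part of AGS.

```cpp
#include <cstddef>
#include <vector>

// One entry per draw, in submission order (hypothetical bookkeeping type).
struct DrawMarkers {
    bool topWritten;     // the TOP marker recorded after the draw is present
    bool bottomWritten;  // the BOP marker recorded after the draw is present
};

// Returns the indices of draws that the CP has reached but that have not
// finished, i.e. the candidates for the hang under the Example 5 layout.
std::vector<std::size_t> SuspectDraws(const std::vector<DrawMarkers>& draws)
{
    std::vector<std::size_t> suspects;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (draws[i].topWritten && !draws[i].bottomWritten)
            suspects.push_back(i);
    }
    return suspects;
}
```

With DrawX at index 0 and DrawY at index 1, the three hang cases described for Example 5 give {0, 1} when only markers 1 and 3 are written, {0} when only marker 1 is written, and {1} when only marker 4 is missing.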

Retrieving GPU Data

In the event of a GPU hang, the user can inspect the system memory buffer to determine which draw has caused the hang. For example:

// Flush queued work so the CPU does not run ahead of the GPU
g_pImmediateContext->Flush();
// Present the back buffer to the front buffer (the screen)
HRESULT hr = g_pSwapChain->Present( 0, 0 );
// Read the marker buffer once device loss is detected
if ( FAILED( hr ) )
{
    // Each draw owns two 64-bit slots: the top marker, then the bottom marker
    const UINT64* pTempData = static_cast<const UINT64*>(pMarkerBuffer);
    setlocale(LC_NUMERIC, "en_US.iso88591");
    for (UINT i = 0; i < g_NumMarkerWritten; i++)
    {
        // Write the marker data to file (draw index in decimal, markers in hex)
        ofs << std::dec << i << "\r\n";
        ofs << std::hex << pTempData[i * 2] << "\r\n";
        ofs << std::hex << pTempData[i * 2 + 1] << "\r\n";
        // Output the marker data to the debugger console
        WCHAR s1[256];
        swprintf(s1, 256, L"The Draw count is %u; The Top marker is %016llX and the Bottom marker is %016llX\r\n", i, pTempData[i * 2], pTempData[i * 2 + 1]);
        OutputDebugStringW(s1);
    }
}

The console output would resemble:

D3D11: Removing Device.
D3D11 ERROR: ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #378: DEVICE_REMOVAL_PROCESS_AT_FAULT]
The Draw count is 0; The Top marker is 00000000DEADCAFE and the Bottom marker is 00000000DEADBEEF
The Draw count is 1; The Top marker is 00000000DEADCAFE and the Bottom marker is 00000000DEADBEEF
The Draw count is 2; The Top marker is 00000000DEADCAFE and the Bottom marker is 00000000DEADBEEF
The Draw count is 3; The Top marker is 00000000DEADCAFE and the Bottom marker is 00000000DEADBEEF
The Draw count is 4; The Top marker is 00000000DEADCAFE and the Bottom marker is 00000000DEADBEEF
The Draw count is 5; The Top marker is CDCDCDCDCDCDCDCD and the Bottom marker is CDCDCDCDCDCDCDCD
The Draw count is 6; The Top marker is CDCDCDCDCDCDCDCD and the Bottom marker is CDCDCDCDCDCDCDCD
The Draw count is 7; The Top marker is CDCDCDCDCDCDCDCD and the Bottom marker is CDCDCDCDCDCDCDCD
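Rather than reading the dump by eye, the buffer can be classified programmatically. The sketch below assumes the layout used in the sample above (two 64-bit slots per draw, top marker first) and assumes that untouched slots hold the 0xCD fill pattern seen in the sample output; both should be checked against the actual application. DrawState, kUntouched and ClassifyDraws are hypothetical names, not part of AGS.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class DrawState { NotReached, InFlight, Finished };

// Assumption: slots the GPU never wrote still hold this fill pattern,
// as in the sample console output.
constexpr std::uint64_t kUntouched = 0xCDCDCDCDCDCDCDCDull;

// Classifies each draw from its (top, bottom) marker pair.
// buffer must hold at least drawCount * 2 entries.
std::vector<DrawState> ClassifyDraws(const std::uint64_t* buffer, std::size_t drawCount)
{
    std::vector<DrawState> states;
    states.reserve(drawCount);
    for (std::size_t i = 0; i < drawCount; ++i) {
        const bool top    = buffer[i * 2]     != kUntouched;
        const bool bottom = buffer[i * 2 + 1] != kUntouched;
        if (bottom)   states.push_back(DrawState::Finished);    // bottom marker landed
        else if (top) states.push_back(DrawState::InFlight);    // reached, not finished
        else          states.push_back(DrawState::NotReached);  // neither marker written
    }
    return states;
}
```

Draws reported as InFlight are the hang candidates. In the sample output above, draws 0 to 4 would be classified as Finished and draws 5 to 7 as NotReached.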

Function Documentation

agsDriverExtensionsDX11_WriteBreadcrumb()

AMD_AGS_API AGSReturnCode agsDriverExtensionsDX11_WriteBreadcrumb (AGSContext *context, const AGSBreadcrumbMarker *marker)

Function to write a breadcrumb marker.

This method inserts a write-marker operation in the GPU command stream. If the GPU hangs before reaching the write command, the marker is never written to memory.

In order to use this function, AGSDX11ExtensionParams::numBreadcrumbMarkers must be set to a nonzero value.

Parameters
[in]  context  Pointer to a context.
[in]  marker   Pointer to a marker.