Merge pull request #1051 from FRaNk090/feature/cuda

Add CUDA Backend Support to Tracy profiler
2026-06-08 08:33:48 +00:00 · 2025-05-23 23:29:55 +02:00
parent 71fc3bc747 0eb3a82673
commit 8b3a421153
6 changed files with 1345 additions and 4 deletions
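
For reviewers, a minimal usage sketch of the lifecycle described in the manual section added below. The macro names come from that manual text; everything else is an assumption for illustration: `run_gpu_workload` is a hypothetical stand-in for the application's CUDA work, the no-op fallback macros exist only so the sketch compiles without Tracy and CUPTI installed, and passing the name length without the terminating NUL is assumed to mirror the other GPU backends.

```cpp
// Sketch only, not the implementation from this PR.
// Macro names are taken from the manual text; the fallbacks below are
// assumptions that let the file build when Tracy/CUPTI are unavailable.
#if defined(TRACY_ENABLE) && __has_include(<tracy/TracyCUDA.hpp>)
#  include <tracy/TracyCUDA.hpp>
#else
#  define TracyCUDAContext() nullptr
#  define TracyCUDAContextDestroy(ctx) ((void)(ctx))
#  define TracyCUDAContextName(ctx, name, size) ((void)(ctx), (void)(name), (void)(size))
#  define TracyCUDAStartProfiling(ctx) ((void)(ctx))
#  define TracyCUDAStopProfiling(ctx) ((void)(ctx))
#endif

// Hypothetical placeholder for kernel launches, memory transfers, etc.
static int run_gpu_workload()
{
    return 42;
}

int profile_workload()
{
    // A single context instruments all CUDA contexts and streams.
    auto ctx = TracyCUDAContext();

    // Optional: give the context a custom name (length assumed to exclude NUL).
    static const char name[] = "Main CUDA context";
    TracyCUDAContextName(ctx, name, sizeof(name) - 1);

    // Everything between start and stop is instrumented automatically;
    // no per-kernel zone annotations are needed.
    TracyCUDAStartProfiling(ctx);
    const int result = run_gpu_workload();
    TracyCUDAStopProfiling(ctx);

    // Release the reference-counted context.
    TracyCUDAContextDestroy(ctx);
    return result;
}
```

With the default background collector thread, no `TracyCUDACollect(ctx)` call appears in the sketch; per the manual, that call only becomes the client's responsibility when `TRACY_CUDA_ENABLE_COLLECTOR_THREAD` is defined as `0` before including the header.
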
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@

 ### A real time, nanosecond resolution, remote telemetry, hybrid frame and sampling profiler for games and other applications.

-Tracy supports profiling CPU (Direct support is provided for C, C++, Lua, Python and Fortran integration. At the same time, third-party bindings to many other languages exist on the internet, such as [Rust](https://github.com/nagisa/rust_tracy_client), [Zig](https://github.com/tealsnow/zig-tracy), [C#](https://github.com/clibequilibrium/Tracy-CSharp), [OCaml](https://github.com/imandra-ai/ocaml-tracy), [Odin](https://github.com/oskarnp/odin-tracy), etc.), GPU (All major graphic APIs: OpenGL, Vulkan, Direct3D 11/12, Metal, OpenCL.), memory allocations, locks, context switches, automatically attribute screenshots to captured frames, and much more.
+Tracy supports profiling CPU (Direct support is provided for C, C++, Lua, Python and Fortran integration. At the same time, third-party bindings to many other languages exist on the internet, such as [Rust](https://github.com/nagisa/rust_tracy_client), [Zig](https://github.com/tealsnow/zig-tracy), [C#](https://github.com/clibequilibrium/Tracy-CSharp), [OCaml](https://github.com/imandra-ai/ocaml-tracy), [Odin](https://github.com/oskarnp/odin-tracy), etc.), GPU (All major graphic APIs: OpenGL, Vulkan, Direct3D 11/12, Metal, OpenCL, CUDA.), memory allocations, locks, context switches, automatically attribute screenshots to captured frames, and much more.

 - [Documentation](https://github.com/wolfpld/tracy/releases/latest/download/tracy.pdf) for usage and build process instructions
 - [Releases](https://github.com/wolfpld/tracy/releases) containing the documentation (`tracy.pdf`) and compiled Windows x64 binaries (`Tracy-<version>.7z`) as assets
--- a/manual/tracy.tex
+++ b/manual/tracy.tex
@@ -1692,6 +1692,20 @@ OpenCL zones can be created with the \texttt{TracyCLZone(ctx, name)} where \text

 Similar to Vulkan and OpenGL, you also need to periodically collect the OpenCL events using the \texttt{TracyCLCollect(ctx)} macro. An excellent place to perform this operation is after a \texttt{clFinish} since this will ensure that any previously queued OpenCL commands will have finished by this point.

+\subsubsection{CUDA}
+
+CUDA support is enabled by including the \texttt{public/tracy/TracyCUDA.hpp} header file. To use it, the NVIDIA CUPTI library is required. This library comes with the NVIDIA CUDA Toolkit and is located at \texttt{CUDA\_INSTALLATION\_PATH/extras/CUPTI}.
+
+Tracing CUDA requires the creation of a Tracy CUDA context using the macro \texttt{TracyCUDAContext()}, which returns an instance of a \texttt{tracy::CUDACtx} object. TracyCUDA allows only a single \texttt{tracy::CUDACtx} object at any given time. Subsequent calls to \texttt{TracyCUDAContext()} will return the same reference-counted object. There is no need for clients to instantiate multiple \texttt{tracy::CUDACtx} objects, as a single context is capable of instrumenting all CUDA contexts and streams.
+
+Cleanup is handled using the \texttt{TracyCUDAContextDestroy(ctx)} macro. To assign a custom name to the context, use the \texttt{TracyCUDAContextName(ctx, name, size)} macro.
+
+To begin instrumentation of all CUDA API calls, use the \texttt{TracyCUDAStartProfiling(ctx)} macro. This initiates the profiling of CUDA events, including relevant GPU activity such as kernel execution, memory transfers, and synchronization. This instrumentation is automatic and requires no code annotation\footnote{CUDA does not provide an API to retrieve timestamps associated with events. Therefore, the typical GPU instrumentation design of Tracy cannot be applied.}.
+
+Unlike other GPU backends in Tracy, there is no need to call \texttt{TracyCUDACollect(ctx)} periodically, since a background collector thread is enabled by default. This behavior can be disabled by defining \texttt{TRACY\_CUDA\_ENABLE\_COLLECTOR\_THREAD} as \texttt{0} prior to including \texttt{TracyCUDA.hpp}.
+
+To stop profiling, call the \texttt{TracyCUDAStopProfiling(ctx)} macro.
+
 \subsubsection{Multiple zones in one scope}

 Putting more than one GPU zone macro in a single scope features the same issue as with the \texttt{ZoneScoped} macros, described in section~\ref{multizone} (but this time the variable name is \texttt{\_\_\_tracy\_gpu\_zone}).
--- a/profiler/src/profiler/TracyView.hpp
+++ b/profiler/src/profiler/TracyView.hpp
@@ -40,7 +40,8 @@ constexpr const char* GpuContextNames[] = {
    "Direct3D 12",
    "Direct3D 11",
    "Metal",
-    "Custom"
+    "Custom",
+    "CUDA"
 };

 struct MemoryPage;
--- a/public/common/TracyProtocol.hpp
+++ b/public/common/TracyProtocol.hpp
@@ -9,7 +9,7 @@ namespace tracy

 constexpr unsigned Lz4CompressBound( unsigned isize ) { return isize + ( isize / 255 ) + 16; }

-enum : uint32_t { ProtocolVersion = 73 };
+enum : uint32_t { ProtocolVersion = 74 };
 enum : uint16_t { BroadcastVersion = 3 };

 using lz4sz_t = uint32_t;
--- a/public/common/TracyQueue.hpp
+++ b/public/common/TracyQueue.hpp
@@ -405,7 +405,8 @@ enum class GpuContextType : uint8_t
    Direct3D12,
    Direct3D11,
    Metal,
-    Custom
+    Custom,
+    CUDA
 };

 enum GpuContextFlags : uint8_t
--- a/public/tracy/TracyCUDA.hpp
+++ b/public/tracy/TracyCUDA.hpp