Mirror of https://github.com/wolfpld/tracy.git (synced 2026-06-08 08:33:48 +00:00)
Merge pull request #1051 from FRaNk090/feature/cuda
Add CUDA Backend Support to Tracy profiler
@@ -4,7 +4,7 @@

 ### A real time, nanosecond resolution, remote telemetry, hybrid frame and sampling profiler for games and other applications.

-Tracy supports profiling CPU (Direct support is provided for C, C++, Lua, Python and Fortran integration. At the same time, third-party bindings to many other languages exist on the internet, such as [Rust](https://github.com/nagisa/rust_tracy_client), [Zig](https://github.com/tealsnow/zig-tracy), [C#](https://github.com/clibequilibrium/Tracy-CSharp), [OCaml](https://github.com/imandra-ai/ocaml-tracy), [Odin](https://github.com/oskarnp/odin-tracy), etc.), GPU (All major graphic APIs: OpenGL, Vulkan, Direct3D 11/12, Metal, OpenCL.), memory allocations, locks, context switches, automatically attribute screenshots to captured frames, and much more.
+Tracy supports profiling CPU (Direct support is provided for C, C++, Lua, Python and Fortran integration. At the same time, third-party bindings to many other languages exist on the internet, such as [Rust](https://github.com/nagisa/rust_tracy_client), [Zig](https://github.com/tealsnow/zig-tracy), [C#](https://github.com/clibequilibrium/Tracy-CSharp), [OCaml](https://github.com/imandra-ai/ocaml-tracy), [Odin](https://github.com/oskarnp/odin-tracy), etc.), GPU (All major graphic APIs: OpenGL, Vulkan, Direct3D 11/12, Metal, OpenCL, CUDA.), memory allocations, locks, context switches, automatically attribute screenshots to captured frames, and much more.

 - [Documentation](https://github.com/wolfpld/tracy/releases/latest/download/tracy.pdf) for usage and build process instructions
 - [Releases](https://github.com/wolfpld/tracy/releases) containing the documentation (`tracy.pdf`) and compiled Windows x64 binaries (`Tracy-<version>.7z`) as assets
@@ -1692,6 +1692,20 @@ OpenCL zones can be created with the \texttt{TracyCLZone(ctx, name)} where \text

 Similar to Vulkan and OpenGL, you also need to periodically collect the OpenCL events using the \texttt{TracyCLCollect(ctx)} macro. An excellent place to perform this operation is after a \texttt{clFinish} since this will ensure that any previously queued OpenCL commands will have finished by this point.

+\subsubsection{CUDA}
+
+CUDA support is enabled by including the \texttt{public/tracy/TracyCUDA.hpp} header file. To use it, the NVIDIA CUPTI library is required. This library comes with the NVIDIA CUDA Toolkit and is located at \texttt{CUDA\_INSTALLATION\_PATH/extras/CUPTI}.
+
+Tracing CUDA requires the creation of a Tracy CUDA context using the macro \texttt{TracyCUDAContext()}, which returns an instance of a \texttt{tracy::CUDACtx} object. TracyCUDA allows only a single \texttt{tracy::CUDACtx} object at any given time. Subsequent calls to \texttt{TracyCUDAContext()} will return the same reference-counted object. There is no need for clients to instantiate multiple \texttt{tracy::CUDACtx} objects, as a single context is capable of instrumenting all CUDA contexts and streams.
+
+Cleanup is handled using the \texttt{TracyCUDAContextDestroy(ctx)} macro. To assign a custom name to the context, use the \texttt{TracyCUDAContextName(ctx, name, size)} macro.
+
+To begin instrumentation of all CUDA API calls, use the \texttt{TracyCUDAStartProfiling(ctx)} macro. This initiates the profiling of CUDA events, including relevant GPU activity such as kernel execution, memory transfers, and synchronization. This instrumentation is automatic and requires no code annotation\footnote{CUDA does not provide an API to retrieve timestamps associated with events. Therefore, the typical GPU instrumentation design of Tracy cannot be applied.}.
+
+Unlike other GPU backends in Tracy, there is no need to call \texttt{TracyCUDACollect(ctx)} periodically, since a background collector thread is enabled by default. This behavior can be disabled by defining \texttt{TRACY\_CUDA\_ENABLE\_COLLECTOR\_THREAD} as \texttt{0} prior to including \texttt{TracyCUDA.hpp}.
+
+To stop profiling, call the \texttt{TracyCUDAStopProfiling(ctx)} macro.
+
 \subsubsection{Multiple zones in one scope}

 Putting more than one GPU zone macro in a single scope features the same issue as with the \texttt{ZoneScoped} macros, described in section~\ref{multizone} (but this time the variable name is \texttt{\_\_\_tracy\_gpu\_zone}).
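The call order described in the new CUDA manual section (create the context, optionally name it, start profiling, run the CUDA work, stop profiling, destroy the context) might be sketched as follows. This is only an illustration under stated assumptions, not code from the commit: the include path `tracy/TracyCUDA.hpp`, the helper `profileCudaWork`, the `g_calls` log, and the stand-in macros are all hypothetical, and `size` is assumed to be the name length without the terminating null. The stand-ins exist solely so the sketch compiles when Tracy and CUPTI are not available; they do not reflect how Tracy implements these macros.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Illustration only: records which macro ran when the stand-ins are active.
static std::vector<std::string> g_calls;

#if defined(TRACY_ENABLE) && defined(__has_include)
#  if __has_include("tracy/TracyCUDA.hpp")
#    include "tracy/TracyCUDA.hpp"  // assumed include path; needs CUPTI at link time
#    define HAVE_TRACY_CUDA 1
#  endif
#endif

#ifndef HAVE_TRACY_CUDA
// Hypothetical stand-ins, NOT Tracy's implementation: they only log the call order.
struct StandInCtx {};
static StandInCtx g_standInCtx;
#  define TracyCUDAContext()            (g_calls.push_back("context"), &g_standInCtx)
#  define TracyCUDAContextName(c, n, s) (g_calls.push_back("name:" + std::string((n), (s))), (void)(c))
#  define TracyCUDAStartProfiling(c)    (g_calls.push_back("start"), (void)(c))
#  define TracyCUDAStopProfiling(c)     (g_calls.push_back("stop"), (void)(c))
#  define TracyCUDAContextDestroy(c)    (g_calls.push_back("destroy"), (void)(c))
#endif

// Hypothetical helper: brackets a block of CUDA work with the profiling macros.
void profileCudaWork(const std::function<void()>& launchWork) {
    assert(launchWork && "launchWork must be callable");

    // A single context is enough; the manual says it covers all CUDA contexts and streams.
    auto ctx = TracyCUDAContext();

    // Assumption: size is the string length without the terminating null.
    const char* name = "Main CUDA context";
    TracyCUDAContextName(ctx, name, std::strlen(name));

    TracyCUDAStartProfiling(ctx);
    launchWork();  // kernels, copies, synchronization: no per-call annotation
    TracyCUDAStopProfiling(ctx);

    // No collect call here: the manual says a collector thread runs by default.
    TracyCUDAContextDestroy(ctx);
}
```

A caller would pass its kernel launches and memory transfers as the callback, for example `profileCudaWork([] { /* launch kernels here */ });`. Keeping start and stop in one helper is just one possible arrangement; a long-running application could equally start profiling once at startup and stop it at shutdown.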
@@ -40,7 +40,8 @@ constexpr const char* GpuContextNames[] = {
     "Direct3D 12",
     "Direct3D 11",
     "Metal",
-    "Custom"
+    "Custom",
+    "CUDA"
 };

 struct MemoryPage;
@@ -9,7 +9,7 @@ namespace tracy

 constexpr unsigned Lz4CompressBound( unsigned isize ) { return isize + ( isize / 255 ) + 16; }

-enum : uint32_t { ProtocolVersion = 73 };
+enum : uint32_t { ProtocolVersion = 74 };
 enum : uint16_t { BroadcastVersion = 3 };

 using lz4sz_t = uint32_t;
@@ -405,7 +405,8 @@ enum class GpuContextType : uint8_t
     Direct3D12,
     Direct3D11,
     Metal,
-    Custom
+    Custom,
+    CUDA
 };

 enum GpuContextFlags : uint8_t
public/tracy/TracyCUDA.hpp (normal file, 1325 lines changed): file diff suppressed because it is too large