Tracy CUDA Graph GPU Zone Repro
Demonstrates that unpatched Tracy fails to show GPU zones for kernels
launched via CUDA Graphs (cudaGraphLaunch).
Root cause
When kernels are launched through CUDA Graphs, CUPTI delivers
CONCURRENT_KERNEL and MEMCPY activity records, but no corresponding
API callback fires for the individual kernel launches. With no API call
to pair each record with, Tracy's matchActivityToAPICall() fails for
every such record, and matchError() silently drops the GPU zone.
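
The failure mode above can be illustrated with a small host-only model. This is a minimal sketch, not Tracy's code: `Matcher`, `onApiCallback`, `onActivity`, `countZones`, and the `emitUnmatched` flag are all hypothetical names, and the fallback of emitting unmatched records is an assumption for illustration rather than a description of what the patch does. The only thing taken from CUPTI is that activity records carry a correlation ID.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical, simplified model of pairing GPU activity records with
// host-side API callbacks by correlation ID. Not Tracy's implementation.
struct Matcher {
    std::unordered_set<uint32_t> seenApiCalls;  // IDs registered by API callbacks
    int zonesEmitted = 0;
    int zonesDropped = 0;

    // Called when an API callback fires for a launch or copy.
    void onApiCallback(uint32_t correlationId) {
        seenApiCalls.insert(correlationId);
    }

    // Called when an activity record (kernel or memcpy) arrives.
    // emitUnmatched is an assumed fallback: keep the zone even without a match.
    void onActivity(uint32_t correlationId, bool emitUnmatched) {
        const bool matched = seenApiCalls.count(correlationId) != 0;
        if (matched || emitUnmatched) {
            ++zonesEmitted;
        } else {
            ++zonesDropped;  // the silent drop described above
        }
    }
};

// Simulates 10 launches of 3 operations each and returns the zones emitted.
// viaGraph == true models graph launches: activity arrives, callbacks do not.
int countZones(bool viaGraph, bool emitUnmatched) {
    constexpr int kLaunches = 10;
    constexpr int kOpsPerLaunch = 3;

    Matcher m;
    uint32_t nextId = 1;
    for (int launch = 0; launch < kLaunches; ++launch) {
        for (int op = 0; op < kOpsPerLaunch; ++op) {
            const uint32_t id = nextId++;
            if (!viaGraph) {
                m.onApiCallback(id);  // per-operation callback fires
            }
            m.onActivity(id, emitUnmatched);
        }
    }
    // Every record is either emitted or dropped.
    assert(m.zonesEmitted + m.zonesDropped == kLaunches * kOpsPerLaunch);
    return m.zonesEmitted;
}
```

In this model, `countZones(false, false)` returns 30, `countZones(true, false)` returns 0 because every lookup misses, and `countZones(true, true)` returns 30 again.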
Build and run
cmake -S . -B ./build
cmake --build ./build --parallel --config Release
ctest --test-dir ./build -C Release -R repro
What to expect
| Tracy version | GPU zones shown |
|---|---|
| Unpatched | 0 |
| Patched (cuda-graph-gpu-zones.patch) | ~30 (10 launches × 3 ops) |
The graph structure
Each graph launch contains:
- vector_add kernel (c = a + b)
- Device-to-device memcpy
- vector_add kernel (c = a + c)
The graph is launched 10 times, so 30 GPU operations total.