Files
tracy/examples
Basil Milanich 7bca9dcd90 Fix CUDA Graph GPU zones with proper cuGraphLaunch correlation
Replace the synthetic APICallInfo hack with proper correlation via
CUPTI_ACTIVITY_KIND_GRAPH_TRACE. When cuGraphLaunch fires an API
callback, its correlationId is stored in cudaCallSiteInfo. The
GRAPH_TRACE activity record carries the same correlationId plus the
graphId, which lets us build a graphId→APICallInfo map. Kernel/memcpy/
memset activities then look up this map via their graphId field.

Key changes:
- Add cuGraphLaunch/cuGraphLaunch_ptsz to cbidDriverTrackers so the
  API callback machinery captures the CPU call site
- Enable CUPTI_ACTIVITY_KIND_GRAPH_TRACE and handle it in
  DoProcessDeviceEvent to populate cudaGraphCurrentLaunch[graphId]
- Add cudaGraphCurrentLaunch map to PersistentState
- Two-pass buffer processing in OnBufferCompleted so GRAPH_TRACE
  records (which complete last on GPU) are processed before the
  kernel/memcpy/memset records that depend on them
- Replace graphId=0 fallback in kernel/memcpy/memset with proper
  cudaGraphCurrentLaunch lookup; fall through to matchError if
  the graphId is not found
- Update repro to include TracyCUDA headers and properly test
  GPU zone correlation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 11:03:58 -05:00
..