mirror of
https://github.com/wolfpld/tracy.git
synced 2026-06-08 08:33:48 +00:00
Replace the synthetic APICallInfo hack with proper correlation via
CUPTI_ACTIVITY_KIND_GRAPH_TRACE.

When cuGraphLaunch fires an API callback, its correlationId is stored in
cudaCallSiteInfo. The GRAPH_TRACE activity record carries the same
correlationId plus the graphId, which lets us build a graphId→APICallInfo
map. Kernel/memcpy/memset activities then look up this map via their
graphId field.

Key changes:
- Add cuGraphLaunch/cuGraphLaunch_ptsz to cbidDriverTrackers so the API
  callback machinery captures the CPU call site
- Enable CUPTI_ACTIVITY_KIND_GRAPH_TRACE and handle it in
  DoProcessDeviceEvent to populate cudaGraphCurrentLaunch[graphId]
- Add cudaGraphCurrentLaunch map to PersistentState
- Two-pass buffer processing in OnBufferCompleted so GRAPH_TRACE records
  (which complete last on GPU) are processed before the
  kernel/memcpy/memset records that depend on them
- Replace graphId=0 fallback in kernel/memcpy/memset with proper
  cudaGraphCurrentLaunch lookup; fall through to matchError if the
  graphId is not found
- Update repro to include TracyCUDA headers and properly test GPU zone
  correlation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
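
For reference, the correlation scheme described above can be sketched roughly as follows. This is a minimal illustration and not the actual TracyCUDA code: `Record`, `State`, `onGraphLaunchCallback`, and `processBuffer` are hypothetical stand-ins, the record types only mimic the two fields of the CUPTI activity records that matter here, and only graph-launched work is modelled. It assumes C++17.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for CUPTI activity records.
enum class Kind { GraphTrace, Kernel };

struct Record {
    Kind kind;
    uint32_t correlationId;  // meaningful for GraphTrace records in this sketch
    uint32_t graphId;        // graph the record belongs to
};

// Hypothetical reduction of the per-call information captured on the CPU side.
struct APICallInfo {
    uint64_t cpuStart;
};

struct State {
    // correlationId -> call site, filled by the API callback.
    std::unordered_map<uint32_t, APICallInfo> callSiteByCorrelation;
    // graphId -> call site of the launch, filled from GRAPH_TRACE records.
    std::unordered_map<uint32_t, APICallInfo> graphCurrentLaunch;
};

// Models the API callback for a graph launch: remember the CPU call site.
void onGraphLaunchCallback(State& s, uint32_t correlationId, uint64_t cpuStart) {
    s.callSiteByCorrelation[correlationId] = APICallInfo{cpuStart};
}

// Two-pass processing of one completed buffer. Returns one entry per kernel
// record, in buffer order; std::nullopt models the matchError path.
std::vector<std::optional<APICallInfo>> processBuffer(State& s,
                                                      const std::vector<Record>& buf) {
    // Pass 1: graph trace records populate the graphId -> call site map,
    // even if they appear after the kernels in the buffer.
    for (const Record& r : buf) {
        if (r.kind != Kind::GraphTrace) continue;
        auto it = s.callSiteByCorrelation.find(r.correlationId);
        if (it != s.callSiteByCorrelation.end()) {
            s.graphCurrentLaunch[r.graphId] = it->second;
        }
    }

    // Pass 2: kernel records resolve their call site through graphId.
    std::vector<std::optional<APICallInfo>> matches;
    for (const Record& r : buf) {
        if (r.kind != Kind::Kernel) continue;
        auto it = s.graphCurrentLaunch.find(r.graphId);
        if (it == s.graphCurrentLaunch.end()) {
            matches.push_back(std::nullopt);  // unknown graphId
        } else {
            matches.push_back(it->second);
        }
    }
    return matches;
}
```

The first pass exists only because of ordering: a kernel record can precede the graph trace record it depends on within the same buffer, so a single pass would report a spurious mismatch for it.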