Document known MEMORY2 limitation for graph-launched alloc nodes

CUpti_ActivityMemory3 has no graphId field, so matchGraphActivityToAPICall
cannot be applied. Graph-launched cudaGraphAddMemAllocNode emits multiple
MEMORY2 records sharing the launch correlationId. Only the first is
tracked; subsequent ones fire a spurious matchError and are not tracked.
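
The failure mode described above can be sketched with a simplified model. This is not Tracy's actual implementation: the `callSites` table, the layout of `APICallInfo`, and the `countUnmatched` helper are hypothetical stand-ins, and the only behavior taken from this commit is that the first record consumes the call-site entry for a given correlationId.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the data recorded when the API call is observed.
struct APICallInfo { uint64_t start = 0; uint64_t end = 0; };

// Simplified model of the call-site table: one entry per correlationId.
static std::unordered_map<uint32_t, APICallInfo> callSites;

// Consume-on-match lookup (assumed behavior): the entry is erased as soon as
// one activity record claims it.
static bool matchActivityToAPICall(uint32_t correlationId, APICallInfo& out) {
    auto it = callSites.find(correlationId);
    if (it == callSites.end()) return false;
    out = it->second;
    callSites.erase(it);
    return true;
}

// Feeds `records` memory records that share one correlationId through the
// matcher and returns how many of them failed to match.
static int countUnmatched(uint32_t correlationId, int records) {
    assert(records >= 0);
    int unmatched = 0;
    for (int i = 0; i < records; ++i) {
        APICallInfo apiCall;
        if (!matchActivityToAPICall(correlationId, apiCall)) ++unmatched;
    }
    return unmatched;
}
```

In this model, registering one entry and then delivering three records with the same correlationId leaves two records unmatched, which corresponds to the spurious matchError calls noted above.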

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Basil Milanich
2026-04-07 15:30:27 -05:00
parent c8ebc6f21e
commit 703df05529


@@ -1110,6 +1110,11 @@ namespace tracy
{
ZoneNamedN(kernel, "tracy::CUDACtx::DoProcessDeviceEvent[malloc/free]", instrument);
CUpti_ActivityMemory3* memory3 = (CUpti_ActivityMemory3*)record;
// NOTE: CUpti_ActivityMemory3 has no graphId field, so matchGraphActivityToAPICall
// cannot be used here. Graph-launched memory alloc nodes (cudaGraphAddMemAllocNode)
// share the launch's correlationId and CUPTI emits multiple MEMORY2 records per node.
// The first record consumes the cudaCallSiteInfo entry; subsequent ones will fire a
// spurious matchError and skip memory tracking. This is a known limitation.
APICallInfo apiCall;
if (!matchActivityToAPICall(memory3->correlationId, apiCall)) {
return matchError(memory3->correlationId, "MEMORY");