9906 Commits

Author SHA1 Message Date
Alan Tse
9a7233ced5 Add MCP server for AI-assisted trace analysis (#1347)
* Add MCP server for AI-assisted trace analysis.

Introduce an optional Model Context Protocol (MCP) server that lets AI
assistants analyze Tracy captures and live sessions through Tracy's own
server engine. The server runs as a Python sidecar and talks to the
existing C++ analysis code through new pybind11 bindings.

- python/bindings/ServerModule.cpp: TracyServerBindings module exposing
  Worker, file I/O, zones, GPU zones, frame data, plots, messages, locks,
  source locations, and summary statistics (zone/GPU child stats, frame
  timing, etc.).
- python/CMakeLists.txt: builds and installs TracyServerBindings alongside
  TracyClientBindings.
- extra/mcp/tracy_mcp.py: FastMCP SSE singleton with dynamic port
  discovery, PID-file based singleton detection, session-isolated worker
  instances, synchronous and background eval, task polling, and a
  shutdown tool to release the .pyd lock during development.
- extra/mcp/start_mcp.sh, .gitignore: launcher with local override hook;
  ignores generated port/pid files.
- manual/tracy.md: documents building, running, and integrating the
  server with an AI assistant.
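
The PID-file based singleton detection listed above can be sketched roughly as follows. This is a minimal, POSIX-only illustration and not the code from tracy_mcp.py; the function names and the pid-file location are hypothetical, and the check-then-write step is not atomic.

```python
import os
from pathlib import Path
from typing import Optional


def read_live_pid(pid_file: Path) -> Optional[int]:
    """Return the PID recorded in pid_file if that process is alive, else None."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None  # no file, or unparsable contents: treat as "not running"
    if pid <= 0:
        return None  # never probe process groups (pid 0 or negative)
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence (POSIX behavior)
    except ProcessLookupError:
        return None  # stale file left behind by a dead process
    except PermissionError:
        return pid  # process exists but belongs to another user
    return pid


def claim_singleton(pid_file: Path) -> bool:
    """Record our PID unless a live instance already owns the file.

    Hypothetical helper: a real server would also need to close the race
    between the liveness check and the write.
    """
    if read_live_pid(pid_file) is not None:
        return False
    pid_file.write_text(str(os.getpid()))
    return True
```

A second launcher that gets `False` back would typically read the recorded port and connect to the running instance instead of starting a new one.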

* Improve Tracy MCP cold-start guidance.

Cold-start usability testing showed an LLM agent burned ~7 exploratory
calls discovering the ctx object model, time-unit conventions, and join
keys before producing useful analysis. Surface that information up front
through MCP resources and entry-point tool guidance.

- extra/mcp/eval_guide.md: new bindings-layer reference covering the
  Worker object graph (zone / GPU zone / frame / thread / message /
  plot / lock / memory entry points), nanosecond time units, ZoneStats
  field semantics including self-time via get_child_zone_stats, the
  opaque 'name (addr)[arch] <srcloc_id>' key format, and worked
  examples translating common queries into ctx Python.
- extra/mcp/tracy_mcp.py: expose system.prompt.md and eval_guide.md as
  MCP resources (tracy://prompt and tracy://eval-guide) so external
  agents and Tracy Assist share the same guidance source. Resource
  content is re-read per request — edits propagate without a server
  restart.
- Point load_capture and live_connect return values plus the eval tool
  description at the resources, so the agent reads them before its
  first eval rather than introspecting blind.
- Expand load_capture docstring: name the path parameter explicitly,
  show Windows path syntax, and direct agents to list_captures plus
  TRACY_CAPTURES_DIR for capture discovery.
- Probe is_connected() briefly after Worker construction in
  live_connect and surface an actionable error on silent handshake
  failures (typically a Tracy client/server version mismatch or
  TRACY_ON_DEMAND) instead of returning misleading success.

Reduces a fresh agent's cold-start overhead from 7 exploratory calls
to 4, where the remaining 4 are unavoidable harness/schema-fetch
overhead, not API-design friction.
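
The brief is_connected() probe described in the list above might look something like this. It is a sketch under assumptions: `is_connected` is passed in as a plain callable, and the timeout and poll interval are illustrative defaults rather than the values used in tracy_mcp.py.

```python
import time
from typing import Callable


def wait_until_connected(is_connected: Callable[[], bool],
                         timeout_s: float = 2.0,
                         poll_s: float = 0.05) -> bool:
    """Poll is_connected() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_connected():
            return True
        if time.monotonic() >= deadline:
            return False  # caller should surface an actionable error here
        time.sleep(poll_s)
```

Returning `False` rather than raising leaves the choice of error text (version mismatch, TRACY_ON_DEMAND, wrong port) to the caller.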

* Detect Tracy protocol mismatches via UDP broadcast pre-flight.

Tracy clients announce themselves on UDP port 8086 every ~3 seconds with
a BroadcastMessage carrying the protocol version, listen port, and
program name (public/common/TracyProtocol.hpp). The Tracy GUI reads this
and refuses to attempt a TCP connection on protocol mismatch, surfacing
a precise error. live_connect previously had no equivalent check, so a
mismatch produced an opaque 2-second handshake timeout with no
diagnostic about what was wrong.

- Add a broadcast parser handling versions 0-3, with variable-length
  programName (Tracy sends only the actual name + null terminator on
  the wire, not the full 64-byte buffer).
- Add a non-blocking UDP listener that binds 8086 with SO_REUSEADDR
  and waits up to 3.5s — enough to guarantee catching at least one
  beat at the 3s broadcast cadence.
- Read our bindings' ProtocolVersion at startup by parsing
  TracyProtocol.hpp, so the comparison stays in sync with the build
  without new C++ wiring.
- live_connect runs the broadcast pre-flight before constructing
  Worker. On a matched listen_port with a differing protocol_version,
  it returns a single-line error naming the program, both versions,
  and the remediation, without ever opening a TCP connection. If no
  matching broadcast arrives, it falls through to the existing
  handshake probe, which now reports any other broadcasts seen as a
  hint (helpful when the target uses a non-default port).
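
As a rough illustration of the pre-flight comparison described above, the sketch below reads a protocol version out of header text and parses one broadcast datagram. The field layout (u16 broadcast version, u16 listen port, u32 protocol version, u64 pid, i32 active time, then a NUL-terminated name) is an assumption about the version 3 message only; versions 0-2 are not handled, and all function names and the error wording are hypothetical.

```python
import re
import struct
from typing import NamedTuple, Optional


class Broadcast(NamedTuple):
    broadcast_version: int
    listen_port: int
    protocol_version: int
    program_name: str


# ASSUMED version 3 layout, little-endian, no padding:
# u16 broadcastVersion, u16 listenPort, u32 protocolVersion, u64 pid, i32 activeTime
_V3_HEADER = struct.Struct("<HHIQi")


def read_protocol_version(header_text: str) -> Optional[int]:
    """Find 'ProtocolVersion = N' in the text of TracyProtocol.hpp."""
    m = re.search(r"\bProtocolVersion\s*=\s*(\d+)", header_text)
    return int(m.group(1)) if m else None


def parse_broadcast_v3(data: bytes) -> Optional[Broadcast]:
    """Parse one datagram; the name is variable-length and NUL-terminated."""
    if len(data) < _V3_HEADER.size:
        return None
    bver, port, proto, _pid, _active = _V3_HEADER.unpack_from(data)
    if bver != 3:
        return None  # older layouts are out of scope for this sketch
    name = data[_V3_HEADER.size:].split(b"\0", 1)[0].decode("utf-8", "replace")
    return Broadcast(bver, port, proto, name)


def mismatch_error(b: Broadcast, ours: int, port: int) -> Optional[str]:
    """Return a single-line error if b targets our port with another version."""
    if b.listen_port != port or b.protocol_version == ours:
        return None
    return (f"Protocol mismatch: '{b.program_name}' on port {port} uses "
            f"protocol {b.protocol_version}, bindings use {ours}; "
            f"rebuild client and server from the same Tracy version.")
```

Because the check works on the datagram alone, a mismatch can be reported without opening a TCP connection to the client.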

* Add MCP Server section to LaTeX manual.

The markdown manual is auto-generated from the LaTeX source; add the
corresponding \subsection{MCP Server} so the two stay in sync.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove hand-written MCP section from tracy.md.

tracy.md is generated from tracy.tex via latex2md.sh. The MCP section
was previously written by hand directly in the markdown; now that the
LaTeX source has been updated, the markdown section should be
regenerated by running latex2md.sh rather than maintained manually.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 16:17:55 +02:00
Bartosz Taudul
460352d0d5 Allow opening callstack window from LLM attachment. 2026-04-30 21:42:22 +02:00
Bartosz Taudul
89bd8b2ba5 Add public View method for opening callstack window. 2026-04-30 21:41:53 +02:00
Bartosz Taudul
cf2c07d905 Mark GetSelectThread() const. 2026-04-30 21:38:33 +02:00
Bartosz Taudul
529c688ff2 Callstack thread display is not gated by LLM. 2026-04-30 18:40:54 +02:00
Bartosz Taudul
af665ef3dc Update NEWS. 2026-04-30 18:13:11 +02:00
Bartosz Taudul
6c5e0cdf1d Draw call stack thread name in call stack window. 2026-04-30 18:08:15 +02:00
Bartosz Taudul
80f504019f Store callstack window callstack id together with origin thread id. 2026-04-30 17:06:59 +02:00
Bartosz Taudul
6ccad8a94d Generic thread id naming in crash callstack attachments. 2026-04-30 12:48:58 +02:00
Bartosz Taudul
f0df6112dc Move callstack summary in callstack window to its own line. 2026-04-30 12:46:22 +02:00
Bartosz Taudul
170b0b4edf More compact listing format. 2026-04-30 12:19:13 +02:00
Bartosz Taudul
39cd16a92b Move specialized LLM skills out of the system prompt. 2026-04-30 02:41:48 +02:00
Bartosz Taudul
eaf66e7af9 Cleanup. 2026-04-28 00:27:27 +02:00
Bartosz Taudul
ab75aeeac9 Add section headers to the in-program user manual. 2026-04-27 23:32:34 +02:00
Bartosz Taudul
4fe2949033 Instruct LLM how to provide user manual links. 2026-04-27 23:21:49 +02:00
Bartosz Taudul
ac20896f57 Add links to manual search results. 2026-04-27 23:15:52 +02:00
Bartosz Taudul
afddce2538 Implement going to manual link. 2026-04-27 23:05:43 +02:00
Bartosz Taudul
14aa5dafe5 Add manual section link tooltips. 2026-04-27 22:47:10 +02:00
Bartosz Taudul
4d62b3a573 Move view and worker checks to initial isSource check. 2026-04-27 22:46:35 +02:00
Bartosz Taudul
d4241d987d Pass view, worker to user manual markdown renderer. 2026-04-27 22:23:08 +02:00
Bartosz Taudul
04d30fb487 Add manual chunk retrieval function. 2026-04-27 22:20:58 +02:00
Bartosz Taudul
d653e984b5 Really fix first-word-continuation word wrap.
The previous solution (75c173) didn't account for the fact that the text
to print may start with a space, in which case the text width calculation
results in 0. In effect, first word length was never greater than the
space left for printing, and the problem was still there.

Fix by walking through all initial spaces in firstWord.

If fwLen > left, then a line break is needed. In this case ignore initial
spaces in text.
2026-04-26 00:40:55 +02:00
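
The fix described in this commit can be illustrated with a small sketch. It is not the Tracy implementation (which is C++ and measures rendered text width); here widths are simply character counts, and both function names are hypothetical.

```python
from typing import Tuple


def first_word_end(text: str) -> int:
    """Index just past the first word, counting leading spaces as part of it."""
    i = 0
    while i < len(text) and text[i] == " ":
        i += 1  # walk through all initial spaces
    while i < len(text) and text[i] != " ":
        i += 1  # then through the word itself
    return i


def place_first_word(text: str, left: int) -> Tuple[bool, str]:
    """Return (needs_line_break, text_to_print) for `left` cells of space."""
    fw_len = first_word_end(text)  # width in character cells (assumption)
    if fw_len > left:
        # Break the line and drop the initial spaces on the new line.
        return True, text.lstrip(" ")
    return False, text
```

Without the first loop, a text starting with a space would yield a zero-length first word, so the comparison against `left` could never trigger a break.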
Bartosz Taudul
afe51ec5f3 Merge pull request #1348 from Casqade/fix-multiline-messages-tooltips
Show tooltips for multiline messages
2026-04-25 18:54:24 +02:00
casqade
29d21fe1f7 Show tooltips for multiline messages 2026-04-25 17:27:13 +03:00
Bartosz Taudul
43643ba0b6 Update NEWS. 2026-04-25 13:08:01 +02:00
Bartosz Taudul
efb1973210 Dim out external frames in callstack tooltips. 2026-04-25 13:04:49 +02:00
Bartosz Taudul
0bd56feb2d Only reset start time when role changes to non-assistant.
Fixes time reset on tool replies.
2026-04-25 12:49:44 +02:00
Bartosz Taudul
83a31730ce Reply timing logic is only relevant for assistant replies. 2026-04-25 12:49:10 +02:00
Bartosz Taudul
5f7a36cf44 Do not assert on early TracyDebug calls.
TracyDebug fires from SysPower's ctor while it scans intel-rapl, which
runs as a Profiler member initializer -- before s_instance is set in
the Profiler ctor body. Under TRACY_MANUAL_LIFETIME without
TRACY_ON_DEMAND, the TracyInternalMessage path guarded this with
assert(ProfilerAvailable()), which aborted tracy-monitor whenever it
was run as root (only then is intel-rapl readable, so the log is
actually reached).

Soften the assert to an early-out, matching the TRACY_ON_DEMAND branch.
A TracyDebug issued before the profiler is up now silently skips
instead of aborting.
2026-04-24 21:27:58 +02:00
Bartosz Taudul
46bccd9a92 Refresh external image cache on symbolization misses.
FindExternalImageRefresh already re-parsed /proc/<pid>/maps on miss,
but only one of three external decode paths used it. Switch the
DecodeCallstackPtrFastExternal and DecodeSymbolAddressExternal paths
over so symbol-name and file/line lookups stay fresh after the target
dlopens a library.

Rate-limit the re-parse to once per wall-clock second so samples
landing on permanently unresolvable regions (JIT, vDSO, stacks) do
not trigger a full parse each time.
2026-04-24 21:16:42 +02:00
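
The rate limit on re-parsing described in this commit can be sketched as below. This is an illustration only: the class name is hypothetical, the clock is injectable for testing, and a monotonic clock is used here where the commit speaks of wall-clock seconds.

```python
import time
from typing import Callable, Optional


class RateLimitedRefresh:
    """Run `refresh` on a lookup miss, but at most once per interval."""

    def __init__(self, refresh: Callable[[], None],
                 min_interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._refresh = refresh
        self._min_interval = min_interval_s
        self._clock = clock
        self._last: Optional[float] = None  # time of the last refresh, if any

    def on_miss(self) -> bool:
        """Return True if a refresh ran, False if it was suppressed."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval:
            return False  # too soon: skip the expensive re-parse
        self._last = now
        self._refresh()
        return True
```

Addresses that can never be resolved then cost one cheap time comparison per sample instead of a full re-parse of the maps file.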
Bartosz Taudul
7448c0fbe1 Cover all target threads in tracy-monitor.
perf_event_open(pid>0, cpu>=0) binds to a single task, so the previous
setup only sampled the target's main thread. In monitor mode, enumerate
/proc/<pid>/task/ and open one per-task event per existing thread with
cpu=-1; inherit=1 then covers every descendant. Self-profiling behavior
is preserved byte-for-byte: the iter list becomes (currentPid, i) for
each CPU, exactly what the old code did inline.
2026-04-24 21:14:33 +02:00
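
Enumerating the existing threads of a target, as this commit describes, amounts to listing /proc/<pid>/task/. A minimal Linux-only sketch follows; the function name and the `proc_root` parameter (added so the function can be pointed at a test directory) are hypothetical.

```python
import os
from typing import List


def list_task_ids(pid: int, proc_root: str = "/proc") -> List[int]:
    """Return the thread ids found under <proc_root>/<pid>/task, sorted."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())
```

Each returned id would then get its own per-task event opened with cpu=-1; that step is left out because the Python standard library has no perf_event_open wrapper.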
Bartosz Taudul
2755166543 Flush stdout before perf preflight, so it's printed before stderr. 2026-04-24 21:10:11 +02:00
Bartosz Taudul
cbfa625fb9 Harden tracy-monitor startup and shutdown paths.
- Loop startup waitpid on EINTR; kill and reap the child on fatal error
  or when interrupted, instead of leaking a ptrace-stopped process.
- Treat PTRACE_DETACH failure as fatal -- otherwise the child is stuck
  stopped forever.
- Zero-initialize procName so the memcpy into ___tracy_magic_process_name
  does not copy uninitialized stack past the NUL.
- Forward SIGINT to the child from the signal handler when in forked
  mode, so Ctrl-C during a blocking waitpid unblocks cleanly.
- Preflight perf_event_open on the target before StartupProfiler so
  permission failures surface with actionable guidance instead of
  silently producing no samples.
- Also handle SIGHUP and SIGQUIT.
2026-04-24 21:07:41 +02:00
Bartosz Taudul
23930e998b Display label with assistant model name and reply duration for each message. 2026-04-24 19:53:45 +02:00
Bartosz Taudul
9b708c433f Store model and response time for assistant messages. 2026-04-24 18:15:49 +02:00
Bartosz Taudul
2c6adfb416 Regenerate markdown manual. 2026-04-23 00:28:59 +02:00
Bartosz Taudul
bde6e06cd7 Update manual. 2026-04-23 00:24:09 +02:00
Bartosz Taudul
4022494934 Update NEWS. 2026-04-22 22:07:39 +02:00
Bartosz Taudul
68ee8704d2 Merge pull request #1335 from siliceum/feature/check-macros-mismatch
Detecting macro definitions mismatches at link time
2026-04-21 18:12:32 +02:00
Bartosz Taudul
8d9a8494b8 Merge pull request #1344 from siliceum/fix/etw-compat
Fix TracyETW_compat.h structs and values based on docs
2026-04-21 18:11:01 +02:00
Clément Grégoire
c1ab158f6c Update manual with mismatch detection info 2026-04-21 13:51:24 +02:00
Clément Grégoire
ca076b4a60 Add TracyMangle.hpp file to centralize config name mangling
Also rename MANGLED_NAME_BASED_ON_DEFINES => MANGLED_NAME_BASED_ON_CONFIG
2026-04-21 13:28:48 +02:00
Clément Grégoire
a4f245550b Macro incompatibility experiment
We redirect GetProfiler() (most likely used by any project consuming tracy, since it's used by `tracy::ScopedZone`) to its implementation which now has a different function name based on the macros that can impact ABI (and enabled/disabled).
That way, when linking with mismatched defines you'd get an error such as

> main.obj : error LNK2019: unresolved external symbol "int __cdecl GetProfiler_CFG_E0_OD0_DI0_ML0_F0_DHT0_TF0(void)" (?GetProfiler_CFG_E0_OD0_DI0_ML0_F0_DHT0_TF0@@YAHXZ) referenced in function "int __cdecl GetProfiler(void)" (?GetProfiler@@YAHXZ)

Or

> [build] /usr/bin/ld: CMakeFiles/app.dir/main.cpp.o: in function `GetProfiler()':
> [build] /..../TracyProfiler.hpp:143: undefined reference to `GetProfiler_CFG_E1_OD0_DI0_ML0_F0_DHT0_TF0()'

The acronym+0/1 form (rather than listing only the acronyms that are enabled) lets us tell users which define is wrong just by looking at the error.

The one case we do not detect is a user building without TRACY_ENABLE while Tracy itself was built with it: the macros become no-ops in that case, so nothing references `GetProfiler`.
There may be a way to cover this by introducing a local variable into each TU, but I don't really like that idea.
We could also add `#pragma detect_mismatch` for a more user-friendly error on Windows (https://learn.microsoft.com/en-us/cpp/preprocessor/detect-mismatch?view=msvc-170).
2026-04-21 13:28:48 +02:00
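
To show how the acronym+0/1 suffix helps pinpoint the offending define, here is a small sketch that decodes the suffix from two symbol names and reports the flags that differ. It only reads the names as they appear in the linker errors quoted above; the function names are hypothetical and no mapping from acronym to macro name is assumed.

```python
import re
from typing import Dict, Optional, Tuple


def decode_config(symbol: str) -> Dict[str, int]:
    """Extract {acronym: 0 or 1} from a name like GetProfiler_CFG_E0_OD0_..."""
    m = re.search(r"_CFG((?:_[A-Z]+[01])+)", symbol)
    if not m:
        return {}
    return {key: int(val) for key, val in re.findall(r"_([A-Z]+)([01])", m.group(1))}


def config_mismatches(a: str, b: str) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return the acronyms whose values differ between two mangled names."""
    ca, cb = decode_config(a), decode_config(b)
    return {key: (ca.get(key), cb.get(key))
            for key in sorted(set(ca) | set(cb))
            if ca.get(key) != cb.get(key)}
```

Comparing the name the application references with the name the library exports shows directly which flag was set differently on the two sides.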
Clément Grégoire
1711d024dd Fix TracyETW_compat.h structs and values based on docs
https://learn.microsoft.com/fr-fr/windows/win32/api/evntprov/ns-evntprov-event_filter_event_id
https://learn.microsoft.com/fr-fr/windows/win32/etw/system-providers

Note: WinSDK does not use ULL in keyword constants so I removed them too.
2026-04-21 13:20:48 +02:00
Bartosz Taudul
217bdcf5a9 Merge pull request #1343 from siliceum/fix/better-tracefs-detection
Better tracefs detection
2026-04-21 11:55:10 +02:00
Clément Grégoire
4aac9e677d Fix formatting/whitespace in SysTraceStart 2026-04-21 09:51:43 +02:00
Clément Grégoire
aba343e429 Pick first debugfs entry only 2026-04-21 09:51:42 +02:00
Bartosz Taudul
25f09bee2c Merge pull request #1342 from siliceum/fix/1337-respect-max-sample-rate
Fix #1337 : On Linux respect max sample rate
2026-04-20 19:33:14 +02:00
Clément Grégoire
cb9ef7814e Use ReadFile and atoi 2026-04-20 18:05:31 +02:00
Clément Grégoire
8f208d732a Fix extra space 2026-04-20 18:04:41 +02:00