Add MCP server for AI-assisted trace analysis (#1347)

* Add MCP server for AI-assisted trace analysis.

Introduce an optional Model Context Protocol (MCP) server that lets AI assistants analyze Tracy captures and live sessions through Tracy's own server engine. The server runs as a Python sidecar and talks to the existing C++ analysis code through new pybind11 bindings.

- python/bindings/ServerModule.cpp: TracyServerBindings module exposing Worker, file I/O, zones, GPU zones, frame data, plots, messages, locks, source locations, and summary statistics (zone/GPU child stats, frame timing, etc.).
- python/CMakeLists.txt: builds and installs TracyServerBindings alongside TracyClientBindings.
- extra/mcp/tracy_mcp.py: FastMCP SSE singleton with dynamic port discovery, PID-file based singleton detection, session-isolated worker instances, synchronous and background eval, task polling, and a shutdown tool to release the .pyd lock during development.
- extra/mcp/start_mcp.sh, .gitignore: launcher with local override hook; ignores generated port/pid files.
- manual/tracy.md: documents building, running, and integrating the server with an AI assistant.

* Improve Tracy MCP cold-start guidance.

Cold-start usability testing showed an LLM agent burned ~7 exploratory calls discovering the ctx object model, time-unit conventions, and join keys before producing useful analysis. Surface that information up front through MCP resources and entry-point tool guidance.

- extra/mcp/eval_guide.md: new bindings-layer reference covering the Worker object graph (zone / GPU zone / frame / thread / message / plot / lock / memory entry points), nanosecond time units, ZoneStats field semantics including self-time via get_child_zone_stats, the opaque 'name (addr)[arch] <srcloc_id>' key format, and worked examples translating common queries into ctx Python.
- extra/mcp/tracy_mcp.py: expose system.prompt.md and eval_guide.md as MCP resources (tracy://prompt and tracy://eval-guide) so external agents and Tracy Assist share the same guidance source. Resource content is re-read per request, so edits propagate without a server restart.
- Point load_capture and live_connect return values plus the eval tool description at the resources, so the agent reads them before its first eval rather than introspecting blind.
- Expand load_capture docstring: name the path parameter explicitly, show Windows path syntax, and direct agents to list_captures plus TRACY_CAPTURES_DIR for capture discovery.
- Probe is_connected() briefly after Worker construction in live_connect and surface an actionable error on silent handshake failures (typically a Tracy client/server version mismatch or TRACY_ON_DEMAND) instead of returning misleading success.

Reduces a fresh agent's cold-start overhead from 7 exploratory calls to 4, where the remaining 4 are unavoidable harness/schema-fetch overhead, not API-design friction.

* Detect Tracy protocol mismatches via UDP broadcast pre-flight.

Tracy clients announce themselves on UDP port 8086 every ~3 seconds with a BroadcastMessage carrying the protocol version, listen port, and program name (public/common/TracyProtocol.hpp). The Tracy GUI reads this and refuses to attempt a TCP connection on protocol mismatch, surfacing a precise error. live_connect previously had no equivalent check, so a mismatch produced an opaque 2-second handshake timeout with no diagnostic about what was wrong.

- Add a broadcast parser handling versions 0-3, with variable-length programName (Tracy sends only the actual name + null terminator on the wire, not the full 64-byte buffer).
- Add a non-blocking UDP listener that binds 8086 with SO_REUSEADDR and waits up to 3.5s, enough to guarantee catching at least one beat at the 3s broadcast cadence.
- Read our bindings' ProtocolVersion at startup by parsing TracyProtocol.hpp, so the comparison stays in sync with the build without new C++ wiring.
- live_connect runs the broadcast pre-flight before constructing Worker. On a matched listen_port with a differing protocol_version, it returns a single-line error naming the program, both versions, and the remediation, without ever opening a TCP connection. If no matching broadcast arrives, it falls through to the existing handshake probe, which now reports any other broadcasts seen as a hint (helpful when the target uses a non-default port).

* Add MCP Server section to LaTeX manual.

The markdown manual is auto-generated from the LaTeX source; add the corresponding \subsection{MCP Server} so the two stay in sync.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove hand-written MCP section from tracy.md.

tracy.md is generated from tracy.tex via latex2md.sh. The MCP section was previously written by hand directly in the markdown; now that the LaTeX source has been updated, the markdown section should be regenerated by running latex2md.sh rather than maintained manually.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
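Reviewer note: the variable-length programName handling can be checked with a small round trip. This is a minimal sketch, not the parser added in tracy_mcp.py. The helper names `pack_v3` and `parse_v3` are hypothetical, the field order follows the v3 layout listed in the `_parse_broadcast` docstring of this commit, little-endian packing is assumed, and the protocol version 76 is an arbitrary placeholder.

```python
import struct

# v3 header: u16 broadcast version, u16 listen port, u32 protocol version,
# u64 pid, i32 active time. "<" means little-endian with no padding.
V3_HEADER = "<HHIQi"
V3_HEADER_SIZE = struct.calcsize(V3_HEADER)  # 20 bytes


def pack_v3(listen_port, protocol_version, pid, active_s, program):
    """Build a v3 announcement: fixed header, then the name plus one null byte."""
    header = struct.pack(V3_HEADER, 3, listen_port, protocol_version, pid, active_s)
    return header + program.encode("utf-8")[:63] + b"\0"


def parse_v3(data):
    """Return the announced fields, or None if this is not a v3 packet."""
    if len(data) <= V3_HEADER_SIZE:
        return None  # header only, or shorter: no name on the wire
    bv, lp, pv, pid, at = struct.unpack_from(V3_HEADER, data, 0)
    if bv != 3:
        return None
    # Only the bytes up to the first null belong to the program name.
    name = data[V3_HEADER_SIZE:V3_HEADER_SIZE + 64].split(b"\0", 1)[0]
    return {"listen_port": lp, "protocol_version": pv, "pid": pid,
            "active_seconds": at, "program": name.decode("utf-8", "replace")}


packet = pack_v3(listen_port=8086, protocol_version=76, pid=4242,
                 active_s=12, program="game")
info = parse_v3(packet)
print(len(packet), info["program"], info["protocol_version"])
```

The packet is 25 bytes here (20-byte header, four name bytes, one terminator), which is why a parser that insisted on the full 64-byte name buffer would reject real announcements.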
2026-06-08 00:23:47 +00:00 · 2026-05-01 07:17:55 -07:00
parent 460352d0d5
commit 9a7233ced5
8 changed files with 1907 additions and 2 deletions
--- a/extra/mcp/.gitignore
+++ b/extra/mcp/.gitignore
@@ -0,0 +1,3 @@
+tracy_mcp.port
+tracy_mcp.pid
+*.local.sh
--- a/extra/mcp/eval_guide.md
+++ b/extra/mcp/eval_guide.md
@@ -0,0 +1,72 @@
+# Tracy MCP eval guide
+
+This document covers the bindings-layer detail that the curated catalog
+(`tracy://catalog`) and analysis guidance (`tracy://prompt`) do not.
+
+## ctx
+
+`ctx` is a `TracyServerBindings.Worker` — the same object Tracy Assist's
+C++ tools query through `Worker::Get*`. The pybind methods are the canonical
+data surface. Common entry points:
+
+- Zones: `get_all_zone_stats()` (every callsite, large), `get_root_zone_stats()`
+  (top-level zones only, useful for "where is the program spending time"),
+  `get_zone_stats(srcloc_id)`, `get_child_zone_stats(srcloc_id)` (subtract for
+  self-time), `get_zone_durations(name)`, `get_zone_count()`,
+  `get_all_zone_source_locations()`
+- GPU zones: `get_all_gpu_zone_stats()`, `get_gpu_zone_durations(...)`,
+  `get_gpu_contexts()`
+- Frames: `get_frame_count()`, `get_frame_times()`, `get_frame_times_named(name)`,
+  `get_frame_boundaries()`, `get_zones_in_frame(...)`
+- Threads: `get_threads()`, `get_thread_name(tid)`, `get_thread_context_switches(tid)`
+- Messages / plots / locks / memory / callstacks: `get_messages()`, `get_plots()`,
+  `get_locks()`, `get_memory_events()`, `get_callstack_frames(...)`
+- Capture metadata: `get_capture_name()`, `get_capture_program()`,
+  `get_first_time()`, `get_last_time()`, `get_resolution()`, `get_host_info()`
+
+Run `print([m for m in dir(ctx) if not m.startswith('_')])` for the full list.
+
+## Units and conventions
+
+- All time values returned by Worker methods are **nanoseconds** (int).
+  `get_first_time()` / `get_last_time()` bound the capture timeline.
+- `ZoneStats` fields: `count`, `total`, `min`, `max`, `avg`, `sum_sq`. `total`
+  is the inclusive aggregate; use `get_child_zone_stats(srcloc_id)` to subtract
+  child time when you need self-time.
+- `get_all_zone_stats()` returns `dict[str, ZoneStats]` keyed by an opaque label
+  of the form `'name (addr)[arch] <srcloc_id>'`. The trailing `<id>` is the
+  source-location ID — the int accepted by `get_zone_stats(int)`,
+  `get_zone_durations_by_id`, and friends. Parse it with a regex if you need
+  to join across calls.
+- Source-location IDs from `get_all_zone_source_locations()` are the join key
+  between zone-name lookups and per-callsite queries.
+
+## Translating catalog entries to ctx Python
+
+The catalog (`tracy://catalog`) lists curated queries. Each maps to a small
+Python snippet:
+
+```python
+# zone_list — top 10 hottest zones by total time
+top = sorted(ctx.get_all_zone_stats().items(),
+             key=lambda kv: kv[1].total, reverse=True)[:10]
+for k, v in top:
+    print(f"{v.total/1e6:.2f}ms  count={v.count}  {k}")
+
+# frame_list — primary frame set timing
+times = ctx.get_frame_times()  # ns per frame
+print(f"frames={len(times)}  avg={sum(times)/len(times)/1e6:.2f}ms  "
+      f"p99={sorted(times)[int(len(times)*0.99)]/1e6:.2f}ms")
+
+# zone_stats for a named zone — find the srcloc id, then drill in
+import re
+matches = [k for k in ctx.get_all_zone_stats() if k.startswith("MyFunc ")]
+sid = int(re.search(r"<(\d+)>$", matches[0]).group(1))
+stats = ctx.get_zone_stats(sid)
+```
+
+## Async mode
+
+For long-running queries pass `async_mode=True` to `eval`; it returns
+`{task_id, status: "running"}`. Poll with the `task` tool
+(`action="poll", task_id=...`).
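Reviewer note: the async flow described in the guide's last section can be sketched from the client side as below. This is an illustration under stated assumptions, not code from this commit: `wait_for_task` is a hypothetical helper, and `poll` stands for whatever function your MCP client uses to call the `task` tool with `action="poll"`. The status values (`pending`, `running`, `completed`, `failed`, `cancelled`) and the string error reply are taken from tracy_mcp.py in this diff.

```python
import time


def wait_for_task(poll, task_id, interval_s=0.5, timeout_s=60.0):
    """Call poll(task_id) until the task leaves the pending/running states.

    `poll` is caller-supplied (hypothetical here): it should invoke the `task`
    tool with action="poll" and return the tool's reply.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        reply = poll(task_id)
        if not isinstance(reply, dict):
            # The tool answers with a plain string for unknown task IDs.
            raise RuntimeError(str(reply))
        if reply.get("status") not in ("pending", "running"):
            return reply  # completed, failed, or cancelled
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {reply.get('status')}")
        time.sleep(interval_s)


# Stand-in for the MCP call: reports "running" twice, then a result.
_replies = iter([
    {"id": "t1", "status": "running"},
    {"id": "t1", "status": "running"},
    {"id": "t1", "status": "completed", "result": "42"},
])
final = wait_for_task(lambda task_id: next(_replies), "t1", interval_s=0.01)
print(final["status"], final["result"])
```

A client-side timeout matters because the server cannot stop thread-pool work once it has started; `cancel` only marks the task.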
--- a/extra/mcp/start_mcp.sh
+++ b/extra/mcp/start_mcp.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+# Start the Tracy MCP server.
+#
+# Set PYTHONPATH to the directory containing TracyServerBindings.so/.pyd.
+# Adjust the Release/Debug suffix to match your CMake build configuration.
+PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$(dirname "$0")/../../build/python/Release"
+export PYTHONPATH
+
+# Machine-local overrides (not committed). Create start_mcp.local.sh next to
+# this file to set TRACY_CAPTURES_DIR, TRACY_MCP_PORT, or any other env var:
+#   export TRACY_CAPTURES_DIR=/path/to/captures
+#   export TRACY_MCP_PORT=47380
+if [ -f "$(dirname "$0")/start_mcp.local.sh" ]; then
+    . "$(dirname "$0")/start_mcp.local.sh"
+fi
+
+exec python3 "$(dirname "$0")/tracy_mcp.py" "$@"
--- a/extra/mcp/tracy_mcp.py
+++ b/extra/mcp/tracy_mcp.py
@@ -0,0 +1,596 @@
+# -*- coding: utf-8 -*-
+from __future__ import annotations
+
+import asyncio
+import atexit
+import builtins
+import concurrent.futures
+import glob
+import io
+import os
+import logging
+import re
+import socket
+import struct
+import sys
+import time
+import uuid
+from contextlib import redirect_stdout
+
+import mcp.server.fastmcp as fastmcp
+
+# Suppress noisy ASGI shutdown errors known to occur with SSE and Control-C.
+# They occur when Starlette tries to send a 500 response after the loop has been
+# cancelled and the SSE 200 OK headers have already been sent. Logger-level
+# suppression is used because filtering individual ASGI exceptions is unreliable in this stack.
+logging.getLogger("uvicorn.error").setLevel(logging.CRITICAL)
+logging.getLogger("starlette").setLevel(logging.CRITICAL)
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_PORT_FILE = os.path.join(_HERE, "tracy_mcp.port")
+_PID_FILE  = os.path.join(_HERE, "tracy_mcp.pid")
+_PREFERRED_PORT = int(os.environ.get("TRACY_MCP_PORT", "47380"))
+
+# Shared documentation surfaces. system.prompt.md is Tracy Assist's source
+# system prompt; exposing it as an MCP resource keeps analysis guidance in
+# sync across both surfaces with no plumbing. eval_guide.md covers
+# bindings-layer detail (ctx object model, units, source-location ID joins).
+_LLM_DIR = os.path.normpath(os.path.join(_HERE, "..", "..", "profiler", "src", "llm"))
+_PROMPT_PATH = os.path.join(_LLM_DIR, "system.prompt.md")
+_EVAL_GUIDE_PATH = os.path.join(_HERE, "eval_guide.md")
+
+
+def _read_text(path: str) -> str:
+    try:
+        with open(path, encoding="utf-8") as f:
+            return f.read()
+    except Exception as e:
+        return f"(unavailable: {e})"
+
+
+# Tracy UDP broadcast packet support. Tracy clients announce themselves on
+# port 8086 with a BroadcastMessage (see public/common/TracyProtocol.hpp).
+# The dev GUI reads protocolVersion from the broadcast and refuses connection
+# on mismatch instead of hitting an opaque TCP timeout. We do the same.
+_PROTOCOL_HPP = os.path.normpath(
+    os.path.join(_HERE, "..", "..", "public", "common", "TracyProtocol.hpp")
+)
+_BROADCAST_PORT = 8086
+_PROGRAM_NAME_SIZE = 64
+
+
+def _read_bindings_protocol_version() -> int | None:
+    """Parse ProtocolVersion from TracyProtocol.hpp at startup so our 'expected'
+    version stays in sync with the bindings build without extra C++ wiring."""
+    try:
+        with open(_PROTOCOL_HPP, encoding="utf-8") as f:
+            for line in f:
+                m = re.search(r"constexpr\s+uint32_t\s+ProtocolVersion\s*=\s*(\d+)", line)
+                if m:
+                    return int(m.group(1))
+    except Exception:
+        pass
+    return None
+
+
+_OUR_PROTOCOL_VERSION = _read_bindings_protocol_version()
+
+
+def _parse_broadcast(data: bytes) -> dict | None:
+    """Parse a Tracy BroadcastMessage. Handles broadcast versions 0-3.
+
+    Fixed-field sizes (from TracyProtocol.hpp, packed):
+      v3: u16 bv, u16 lp, u32 pv, u64 pid, i32 at, char[<=64] name  (>=20 + name)
+      v2: u16 bv, u16 lp, u32 pv, i32 at, char[<=64] name           (>=12 + name)
+      v1: u32 bv, u32 pv, u32 lp, u32 at, char[<=64] name           (>=16 + name)
+      v0: u32 bv, u32 pv, u32 at, char[<=64] name                   (>=12 + name)
+
+    The programName field is variable-length on the wire — the sender writes
+    only the actual name plus null terminator, not the full 64-byte buffer.
+    """
+    if len(data) < 4:
+        return None
+
+    def _name(buf: bytes) -> str:
+        return buf[:_PROGRAM_NAME_SIZE].split(b"\0", 1)[0].decode("utf-8", "replace")
+
+    bv16 = struct.unpack_from("<H", data, 0)[0]
+    if bv16 == 3 and len(data) >= 21:
+        bv, lp, pv, pid, at = struct.unpack_from("<HHIQi", data, 0)
+        return {"broadcast_version": bv, "listen_port": lp,
+                "protocol_version": pv, "pid": pid,
+                "active_seconds": at, "program": _name(data[20:])}
+    if bv16 == 2 and len(data) >= 13:
+        bv, lp, pv, at = struct.unpack_from("<HHIi", data, 0)
+        return {"broadcast_version": bv, "listen_port": lp,
+                "protocol_version": pv, "active_seconds": at,
+                "program": _name(data[12:])}
+    bv32 = struct.unpack_from("<I", data, 0)[0]
+    if bv32 == 1 and len(data) >= 17:
+        bv, pv, lp, at = struct.unpack_from("<IIII", data, 0)
+        return {"broadcast_version": bv, "listen_port": lp,
+                "protocol_version": pv, "active_seconds": at,
+                "program": _name(data[16:])}
+    if bv32 == 0 and len(data) >= 13:
+        bv, pv, at = struct.unpack_from("<III", data, 0)
+        return {"broadcast_version": bv, "listen_port": None,
+                "protocol_version": pv, "active_seconds": at,
+                "program": _name(data[12:])}
+    return None
+
+
+async def _listen_broadcasts(timeout_s: float = 1.5) -> list[dict]:
+    """Listen briefly on UDP 8086 for Tracy client announcements.
+
+    Returns a list of parsed broadcasts (deduplicated by listen_port). Empty
+    list means no broadcast received — the target may use TRACY_ON_DEMAND,
+    a non-default broadcast port, or simply isn't running.
+    """
+    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+    try:
+        s.bind(("", _BROADCAST_PORT))
+    except OSError:
+        s.close()
+        return []
+    s.setblocking(False)
+    loop = asyncio.get_running_loop()
+    seen: dict[int | None, dict] = {}
+    deadline = loop.time() + timeout_s
+    try:
+        while loop.time() < deadline:
+            remaining = deadline - loop.time()
+            if remaining <= 0:
+                break
+            try:
+                fut = loop.sock_recvfrom(s, 2048)
+                data, _addr = await asyncio.wait_for(fut, timeout=remaining)
+            except (asyncio.TimeoutError, BlockingIOError):
+                break
+            parsed = _parse_broadcast(data)
+            if parsed:
+                seen.setdefault(parsed.get("listen_port"), parsed)
+    finally:
+        s.close()
+    return list(seen.values())
+
+
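+def _is_our_server_running() -> tuple[bool, int]:
+    """
+    Check the PID and port files to see if our server is already running.
+    Returns (running, port). On POSIX, os.kill(pid, 0) confirms the process is
+    alive. On Windows, signal 0 is CTRL_C_EVENT, so os.kill(pid, 0) is not a
+    harmless liveness probe; connect to the recorded port instead.
+    """
+    try:
+        with open(_PID_FILE) as f:
+            pid = int(f.read().strip())
+        with open(_PORT_FILE) as f:
+            port = int(f.read().strip())
+        if os.name == "nt":
+            # Raises OSError if nothing is listening on the recorded port.
+            socket.create_connection(("127.0.0.1", port), timeout=0.5).close()
+        else:
+            os.kill(pid, 0)   # raises OSError if process is gone
+        return True, port
+    except Exception:
+        return False, 0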
+
+
+def _find_free_port() -> int:
+    """Scan from preferred port upward; fall back to OS-assigned if the range is exhausted."""
+    for port in range(_PREFERRED_PORT, _PREFERRED_PORT + 16):
+        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+        try:
+            s.bind(("127.0.0.1", port))
+            s.close()
+            return port
+        except OSError:
+            s.close()
+    # Let OS assign any free port
+    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    s.bind(("127.0.0.1", 0))
+    port = s.getsockname()[1]
+    s.close()
+    return port
+
+
+def _write_pid_and_port(port: int) -> None:
+    try:
+        with open(_PID_FILE, "w") as f:
+            f.write(str(os.getpid()))
+        with open(_PORT_FILE, "w") as f:
+            f.write(str(port))
+    except Exception:
+        pass
+
+
+def _cleanup_pid_files() -> None:
+    for path in (_PID_FILE, _PORT_FILE):
+        try:
+            os.unlink(path)
+        except Exception:
+            pass
+
+
+# Attempt to import Tracy Server bindings
+try:
+    import TracyServerBindings as tracy_server
+except ImportError:
+    sys.path.append(os.path.join(os.path.dirname(__file__), "../../build/python"))
+    try:
+        import TracyServerBindings as tracy_server
+    except ImportError:
+        tracy_server = None
+
+mcp_server = fastmcp.FastMCP("Tracy Profiler")
+executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
+
+
+class Task:
+    def __init__(self, task_id: str, code: str):
+        self.id = task_id
+        self.code = code
+        self.status = "pending"
+        self.result = None
+        self.error = None
+        self.start_time = time.time()
+        self.end_time = None
+
+
+class TracyInstance:
+    def __init__(self, name: str, worker: object | None = None):
+        self.name = name
+        self.worker = worker
+        self.path = None
+        self.mtime = None
+
+
+instances: dict[str, TracyInstance] = {}
+tasks: dict[str, Task] = {}
+captures_dir: str | None = os.environ.get("TRACY_CAPTURES_DIR")
+
+
+@mcp_server.resource("tracy://prompt")
+def _prompt_resource() -> str:
+    """Tracy Assist's analysis guidance (system.prompt.md). Contains workflows
+    for optimization, callstack inspection, and privacy rules. %TIME%, %USER%,
+    and %PROGRAMNAME% are placeholders filled by the in-app chat — ignore them
+    when reading from MCP."""
+    return _read_text(_PROMPT_PATH)
+
+
+@mcp_server.resource("tracy://eval-guide")
+def _eval_guide_resource() -> str:
+    """Bindings-layer guide for the eval tool: ctx object model, time units,
+    source-location ID semantics, and worked examples translating catalog
+    entries into ctx Python."""
+    return _read_text(_EVAL_GUIDE_PATH)
+
+
+@mcp_server.tool()
+async def list_captures() -> list[str]:
+    """List .tracy capture files in the TRACY_CAPTURES_DIR directory (non-recursive)."""
+    if not captures_dir:
+        return []
+    return sorted(glob.glob(os.path.join(captures_dir, "*.tracy")))
+
+
+@mcp_server.tool()
+async def list_instances() -> list[dict]:
+    """List all loaded Tracy instances and captures with metadata."""
+    return [
+        {
+            "id": name,
+            "path": inst.path,
+            "mtime": inst.mtime,
+            "live": inst.path is None
+        }
+        for name, inst in instances.items()
+    ]
+
+
+@mcp_server.tool()
+async def discover_instances(port_range: str = "8086-8095") -> list[dict]:
+    """
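+    Scan a local TCP port range for listeners that may be Tracy-instrumented applications.
+
+    Returns the ports that accept a TCP connection. The scan does not verify
+    that a listener speaks the Tracy protocol; live_connect performs that check.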
+    """
+    start_port, end_port = map(int, port_range.split("-"))
+    discovered = []
+
+    async def check_port(port: int) -> None:
+        try:
+            _, writer = await asyncio.wait_for(
+                asyncio.open_connection("127.0.0.1", port), timeout=0.1
+            )
+            writer.close()
+            await writer.wait_closed()
+            discovered.append({"port": port, "address": "127.0.0.1"})
+        except (OSError, asyncio.TimeoutError, ConnectionRefusedError):
+            pass
+
+    await asyncio.gather(*(check_port(p) for p in range(start_port, end_port + 1)))
+    return discovered
+
+
+@mcp_server.tool()
+async def live_connect(address: str = "127.0.0.1", port: int = 8086, alias: str | None = None) -> str:
+    """
+    Connect to a live running Tracy-instrumented application.
+
+    Wraps Worker(addr, port, memoryLimit=-1). Returns a status message that
+    names the instance_id on success, or an error description on failure.
+    """
+    if not tracy_server:
+        return "Error: Tracy Server bindings not found."
+
+    # Pre-flight: read Tracy's UDP broadcast on port 8086 and compare protocol
+    # versions before attempting TCP. Mirrors what the Tracy GUI does so a
+    # version mismatch produces a precise error instead of an opaque timeout.
+    # Tracy clients broadcast every ~3s (TracyProfiler.cpp), so we listen a
+    # little longer to guarantee catching at least one beat.
+    broadcasts = await _listen_broadcasts(timeout_s=3.5)
+    match = next((b for b in broadcasts if b.get("listen_port") == port), None)
+    if match and _OUR_PROTOCOL_VERSION is not None:
+        if match["protocol_version"] != _OUR_PROTOCOL_VERSION:
+            return (
+                f"Protocol mismatch: target program '{match['program']}' "
+                f"announces Tracy protocol v{match['protocol_version']} on "
+                f"{address}:{port}, but these server bindings are built "
+                f"against v{_OUR_PROTOCOL_VERSION}. Rebuild the bindings or "
+                f"the target against a matching Tracy version."
+            )
+
+    try:
+        w = tracy_server.Worker(address, port)
+    except Exception as e:
+        return f"Failed to connect: {str(e)}"
+
+    # Worker construction returns immediately even on protocol failure (the
+    # bindings expose no error state — is_connected() is the only signal).
+    # Probe briefly so silent failures (e.g. TRACY_ON_DEMAND with no profiler
+    # request yet, or a target broadcasting on a non-default port) surface
+    # cleanly even when broadcast pre-flight didn't catch them.
+    deadline_s = 2.0
+    step_s = 0.1
+    elapsed = 0.0
+    while elapsed < deadline_s and not w.is_connected():
+        await asyncio.sleep(step_s)
+        elapsed += step_s
+
+    if not w.is_connected():
+        try:
+            w.shutdown()
+        except Exception:
+            pass
+        hint = ""
+        if broadcasts and not match:
+            seen = ", ".join(
+                f"'{b['program']}' on port {b.get('listen_port')} (protocol v{b['protocol_version']})"
+                for b in broadcasts
+            )
+            hint = f" Detected other Tracy broadcasts: {seen}."
+        elif not broadcasts:
+            hint = (
+                " No Tracy broadcasts were received on port 8086 in 3.5s — "
+                "the target may use TRACY_ON_DEMAND, a non-default broadcast "
+                "port, or isn't running."
+            )
+        return (
+            f"Reached {address}:{port} but the Tracy handshake did not complete "
+            f"within {deadline_s:.1f}s.{hint} Common causes: (1) the Tracy "
+            f"client version embedded in the target program differs from these "
+            f"server bindings; (2) the target was built with TRACY_ON_DEMAND "
+            f"and is awaiting a profiler request; (3) another client is "
+            f"already attached."
+        )
+
+    name = alias or f"live_{address}_{port}"
+    instances[name] = TracyInstance(name, w)
+    return (
+        f"Connected to live instance as '{name}'. "
+        f"Before your first eval, read resources tracy://prompt "
+        f"(analysis guidance) and tracy://eval-guide (ctx object model, "
+        f"ns time units, srcloc IDs)."
+    )
+
+
+@mcp_server.tool()
+async def load_capture(path: str, alias: str | None = None) -> str:
+    """
+    Load a .tracy capture file by absolute path.
+
+    Parameters:
+      path  — absolute path to a .tracy file. On Windows use backslashes
+              (e.g. 'E:\\\\traces\\\\foo.tracy').
+      alias — optional instance name; overwrites existing on collision.
+              If omitted, an ID is derived from filename and mtime.
+
+    If you don't already have a path, call `list_captures` first — it lists
+    .tracy files in the TRACY_CAPTURES_DIR environment directory.
+    """
+    if not tracy_server:
+        return "Error: Tracy Server bindings not found."
+    try:
+        mtime = os.path.getmtime(path)
+        if alias:
+            name = alias
+        else:
+            # unique name including mtime to avoid version collision
+            name = f"{os.path.basename(path)}@{int(mtime):x}"
+
+        if name in instances:
+            inst = instances[name]
+            if inst.path == path and inst.mtime == mtime:
+                return f"Instance '{name}' is already loaded and up to date."
+
+        f = tracy_server.open_file(path)
+        w = tracy_server.create_worker_from_file(f)
+        inst = TracyInstance(name, w)
+        inst.path = path
+        inst.mtime = mtime
+        instances[name] = inst
+        return (
+            f"Loaded as '{name}'. "
+            f"Before your first eval, read resources tracy://prompt "
+            f"(analysis guidance) and tracy://eval-guide (ctx object model, "
+            f"ns time units, srcloc IDs)."
+        )
+    except Exception as e:
+        return f"Failed to load: {str(e)}"
+
+
+@mcp_server.tool()
+async def unload_capture(instance_id: str) -> str:
+    """Unload a Tracy instance and release its memory."""
+    if instance_id in instances:
+        del instances[instance_id]
+        return f"Instance '{instance_id}' unloaded."
+    return f"Instance '{instance_id}' not found."
+
+
+@mcp_server.tool(name="eval")
+async def tracy_eval(code: str, instance_id: str, async_mode: bool = False) -> object:
+    """
+    Execute Python code against a specific Tracy Worker bound as `ctx`.
+
+    On first use, read the `tracy://prompt` (analysis guidance) and
+    `tracy://eval-guide` (ctx object model, units, source-location ID joins)
+    resources. Time values returned by Worker methods are nanoseconds.
+
+    If async_mode=True, returns a task_id immediately; poll via the `task` tool.
+    """
+    if instance_id not in instances:
+        return f"Error: Instance '{instance_id}' not found. Use list_instances to find valid IDs."
+
+    instance = instances[instance_id]
+    if not instance.worker:
+        return f"Error: Instance '{instance_id}' has no worker."
+
+    if not async_mode:
+        return await _execute_eval(code, instance.worker)
+
+    # Async mode: spawn task and return immediately
+    task_id = str(uuid.uuid4())
+    task = Task(task_id, code)
+    tasks[task_id] = task
+    asyncio.get_running_loop().run_in_executor(
+        executor, _run_task_sync, task, instance.worker
+    )
+    return {"task_id": task_id, "status": "running"}
+
+
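+def _run_task_sync(task: Task, worker: object) -> None:
+    """Run a background eval task in the thread pool."""
+    if task.status != "cancelled":
+        task.status = "running"
+    try:
+        result = _execute_eval_sync(task.code, worker)
+        # Keep a "cancelled" mark set by the `task` tool while the code ran;
+        # overwriting it would hide that the caller abandoned the task.
+        if task.status != "cancelled":
+            task.result = result
+            task.status = "completed"
+    except Exception as e:
+        if task.status != "cancelled":
+            task.error = str(e)
+            task.status = "failed"
+    finally:
+        task.end_time = time.time()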
+
+
+def _execute_eval_sync(code: str, ctx: object) -> str:
+    """Execute *code* with `ctx` bound to the Tracy worker. Captures stdout."""
+    global_vars = {
+        "__builtins__": builtins,
+        "ctx": ctx,
+        "tracy": tracy_server,
+        "instances": {name: inst.worker for name, inst in instances.items()},
+    }
+    buf = io.StringIO()
+    with redirect_stdout(buf):
+        try:
+            result = eval(compile(code, "<eval>", "eval"), global_vars)
+        except SyntaxError:
+            exec(compile(code, "<exec>", "exec"), global_vars)
+            result = None
+    output = buf.getvalue()
+    if result is None:
+        return output or ""
+    return str(result)
+
+
+async def _execute_eval(code: str, ctx: object) -> str:
+    """Async wrapper: runs `_execute_eval_sync` in the thread-pool executor."""
+    return await asyncio.get_running_loop().run_in_executor(
+        executor, _execute_eval_sync, code, ctx
+    )
+
+
+@mcp_server.tool()
+async def task(action: str, task_id: str | None = None) -> object:
+    """
+    Manage background analysis tasks.
+
+    Actions: poll, cancel, list
+    """
+    if action == "list":
+        return [
+            {"id": t.id, "status": t.status, "elapsed": time.time() - t.start_time}
+            for t in tasks.values()
+        ]
+
+    if not task_id or task_id not in tasks:
+        return "Error: Task ID not found."
+
+    t = tasks[task_id]
+    if action == "poll":
+        res: dict = {"id": t.id, "status": t.status}
+        if t.status == "completed":
+            res["result"] = t.result
+        elif t.status == "failed":
+            res["error"] = t.error
+        return res
+
+    if action == "cancel":
+        # Cancellation of thread-pool work is not possible post-submission;
+        # mark the task so callers know it was abandoned.
+        if t.status == "running":
+            t.status = "cancelled"
+            return f"Task {task_id} marked as cancelled."
+        return f"Task {task_id} is not running."
+
+    return "Error: Unknown action."
+
+
+@mcp_server.tool()
+async def shutdown_server() -> str:
+    """
+    Shut down the Tracy MCP server.
+
+    Because the server runs as a singleton (SSE transport, one process shared
+    across all VS Code windows), this releases the TracyServerBindings.pyd lock
+    for all clients at once. Restart tracy_mcp.py after rebuilding.
+    """
+    import threading
+    def _exit() -> None:
+        time.sleep(0.2)
+        os._exit(0)
+    threading.Thread(target=_exit, daemon=True).start()
+    return "Server shutting down. Restart tracy_mcp.py to reconnect."
+
+
+if __name__ == "__main__":
+    atexit.register(_cleanup_pid_files)
+
+    running, existing_port = _is_our_server_running()
+    if running:
+        print(
+            f"Tracy MCP already running on port {existing_port}. "
+            "All VS Code windows share that instance.",
+            file=sys.stderr,
+        )
+        sys.exit(0)
+
+    port = _find_free_port()
+    _write_pid_and_port(port)
+
+    print(f"Tracy MCP listening on http://127.0.0.1:{port}/sse", file=sys.stderr)
+
+    mcp_server.settings.host = "127.0.0.1"
+    mcp_server.settings.port = port
+    try:
+        mcp_server.run(transport="sse")
+    except KeyboardInterrupt:
+        print("\nTracy MCP server stopped.", file=sys.stderr)
+        sys.exit(0)
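Reviewer note: because the port is chosen dynamically, a client launcher can read the port file instead of hard-coding 47380. The following is a minimal sketch under assumptions, not part of this commit: `sse_url_from_port_file` is a hypothetical helper, and it relies only on what the diff shows, namely that tracy_mcp.port holds the bare port number and that the endpoint is `http://127.0.0.1:<port>/sse`.

```python
import os
import tempfile


def sse_url_from_port_file(port_file, host="127.0.0.1"):
    """Return the SSE endpoint URL, or None if the port file is missing or invalid."""
    try:
        with open(port_file, encoding="utf-8") as f:
            port = int(f.read().strip())
    except (OSError, ValueError):
        return None
    if not 0 < port < 65536:
        return None
    return f"http://{host}:{port}/sse"


# Demonstration with a temporary file standing in for extra/mcp/tracy_mcp.port.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tracy_mcp.port")
    with open(path, "w", encoding="utf-8") as f:
        f.write("47381")
    url = sse_url_from_port_file(path)
    missing = sse_url_from_port_file(os.path.join(tmp, "absent.port"))
print(url, missing)
```

The port file can be stale: `shutdown_server` exits through `os._exit`, which skips `atexit` handlers, so the PID and port files are left behind. A caller may want to confirm the port accepts connections before trusting the URL.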
--- a/manual/tracy.md
+++ b/manual/tracy.md
@@ -1,4 +1,4 @@
---
+---
 bibliography:
 - tracy.bib
 ---
--- a/manual/tracy.tex
+++ b/manual/tracy.tex
@@ -2473,6 +2473,123 @@ The following additional CMake options are available when building the Python pa

 Be aware that the memory allocated by this buffer is global and is not freed, see section~\ref{uniquepointers}.

+\subsection{MCP Server}
+\label{mcpserver}
+
+Tracy provides an optional MCP (Model Context Protocol\footnote{\url{https://modelcontextprotocol.io}}) server that allows AI coding assistants to load and analyze Tracy captures as part of automated workflows. It runs as a separate Python sidecar process and does not integrate with or depend on Tracy Assist (section~\ref{tracyassist}). No Python interpreter is required to run Tracy itself.
+
+The primary use case is agentic tooling: an AI agent can load a \texttt{.tracy} capture, execute arbitrary analysis code against the \texttt{Worker} bindings (see below), and compare results across multiple captures — for example, validating that a proposed optimization reduced frame time.
+
+\subsubsection{Building}
+
+The MCP server requires the Tracy Server Python bindings, which are built alongside the client bindings when \texttt{TRACY\_CLIENT\_PYTHON} is enabled:
+
+\begin{lstlisting}
+cmake -B build -DTRACY_CLIENT_PYTHON=ON
+cmake --build build --config Release
+\end{lstlisting}
+
+\subsubsection{Running}
+
+\begin{lstlisting}
+pip install mcp
+python extra/mcp/tracy_mcp.py
+\end{lstlisting}
+
+Set the following environment variables before launching (or export them in your shell):
+
+\begin{lstlisting}
+PYTHONPATH=/path/to/tracy/build/python/Release
+TRACY_CAPTURES_DIR=/path/to/captures   # enables list_captures
+TRACY_MCP_PORT=47380                   # optional; default 47380
+\end{lstlisting}
+
+\subsubsection{Integrating with an AI assistant}
+
+The server runs as a singleton on SSE transport (port 47380 by default). Only one process loads \texttt{TracyServerBindings} regardless of how many editor windows are open; subsequent launches detect the running instance, report its port, and exit immediately.
+
+The server prints its URL on startup and writes it to \texttt{extra/mcp/tracy\_mcp.port}:
+
+\begin{lstlisting}
+Tracy MCP listening on http://127.0.0.1:47380/sse
+\end{lstlisting}
+
+Configure your AI assistant using that URL. For example, for a JSON-based MCP configuration:
+
+\begin{lstlisting}
+{
+  "mcpServers": {
+    "tracy": {
+      "url": "http://127.0.0.1:47380/sse"
+    }
+  }
+}
+\end{lstlisting}
+
+\subsubsection{Available tools}
+
+\begin{itemize}
+\item \texttt{list\_captures} --- List \texttt{*.tracy} files in \texttt{TRACY\_CAPTURES\_DIR} (top-level only).
+\item \texttt{list\_instances} --- List all captures currently loaded in the server.
+\item \texttt{load\_capture} --- Load a \texttt{.tracy} file by path, optionally giving it an alias.
+\item \texttt{connect\_instance} --- Set the active instance for subsequent analysis calls.
+\item \texttt{live\_connect} --- Connect to a running Tracy-instrumented application by address and port.
+\item \texttt{discover\_instances} --- Scan a port range for running Tracy-instrumented applications.
+\item \texttt{eval} --- Execute arbitrary Python against the active \texttt{Worker} object (available as \texttt{ctx}). Supports \texttt{async\_mode=True} for long-running queries.
+\item \texttt{task} --- Poll, cancel, or list background analysis tasks started with \texttt{async\_mode=True}.
+\end{itemize}
+
+\subsubsection{Worker API (available via \texttt{eval})}
+
+Inside \texttt{eval}, the variable \texttt{ctx} is a \texttt{Worker} instance. All time values are in nanoseconds. The following methods are available:
+
+\paragraph{Capture metadata}
+\begin{itemize}
+\item \texttt{get\_capture\_name()} / \texttt{get\_capture\_program()} --- Name and program string stored in the trace.
+\item \texttt{get\_host\_info()} --- OS, CPU, RAM, and compiler info as a string.
+\item \texttt{get\_resolution()} --- Timer resolution in nanoseconds.
+\item \texttt{get\_first\_time()} / \texttt{get\_last\_time()} --- Trace time range in nanoseconds.
+\end{itemize}
+
+\paragraph{CPU zones}
+\begin{itemize}
+\item \texttt{get\_all\_zone\_stats()} --- Returns a \texttt{dict[str, ZoneStats]} keyed by zone name. Each \texttt{ZoneStats} has \texttt{min}, \texttt{max}, \texttt{total}, and \texttt{avg} (in nanoseconds), \texttt{count} (number of zone occurrences), and \texttt{sum\_sq} (sum of squared durations). Includes nested zones.
+\item \texttt{get\_root\_zone\_stats()} --- Like \texttt{get\_all\_zone\_stats()} but aggregates only top-level zones per thread. Safe to sum across zones.
+\item \texttt{get\_zone\_stats(srcloc\_id)} --- Stats for a single source location.
+\item \texttt{get\_zone\_durations(name)} --- List of individual zone durations (ns) for distribution analysis.
+\item \texttt{get\_zone\_source\_location(name)} --- Returns \texttt{\{"name", "function", "file", "line", "color"\}} for the named zone.
+\end{itemize}
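+
+As a minimal sketch of how these statistics might be used inside \texttt{eval}, the helper below ranks zones by total time and reports each zone's share of the summed total. The name \texttt{top\_zones} and the stand-in data are illustrative and not part of Tracy; the sketch assumes only that each stats object exposes \texttt{total} and \texttt{count} as described above. Inside \texttt{eval}, the dictionary would presumably come from \texttt{ctx.get\_root\_zone\_stats()}, whose totals are safe to sum.
+
+\begin{lstlisting}
+from types import SimpleNamespace
+
+def top_zones(stats, n=5):
+    """Rank zones by total time.
+
+    stats maps zone name -> object with .total (ns) and .count.
+    Returns (name, total_ns, count, share_of_total) tuples.
+    """
+    grand_total = sum(s.total for s in stats.values())
+    ranked = sorted(stats.items(), key=lambda kv: kv[1].total, reverse=True)
+    return [
+        (name, s.total, s.count, s.total / grand_total if grand_total else 0.0)
+        for name, s in ranked[:n]
+    ]
+
+# Hypothetical stand-in data so the sketch runs outside the server.
+fake_stats = {
+    "Update": SimpleNamespace(total=6_000_000, count=120),
+    "Render": SimpleNamespace(total=14_000_000, count=120),
+}
+for name, total_ns, count, share in top_zones(fake_stats):
+    print(f"{name}: {total_ns / 1e6:.1f} ms over {count} calls ({share:.0%})")
+\end{lstlisting}
+
+Summing shares is only meaningful for root-zone statistics; with \texttt{get\_all\_zone\_stats()} nested zones are counted inside their parents as well, so the shares would overstate the total.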
+
+\paragraph{GPU zones}
+\begin{itemize}
+\item \texttt{get\_all\_gpu\_zone\_stats()} --- Returns a \texttt{dict[str, GpuZoneStats]}.
+\item \texttt{get\_gpu\_contexts()} --- Returns a list of \texttt{GpuContextSummary} objects.
+\item \texttt{get\_gpu\_zone\_durations(name)} --- Individual GPU zone durations (ns).
+\end{itemize}
+
+\paragraph{Frames}
+\begin{itemize}
+\item \texttt{get\_frame\_times()} --- Per-frame durations (ns) for the default frame set.
+\item \texttt{get\_frame\_times\_named(name)} --- Per-frame durations for a named frame set.
+\item \texttt{get\_frame\_boundaries()} --- List of \texttt{(start\_ns, end\_ns)} tuples for each frame.
+\item \texttt{get\_frame\_count()} --- Frame count for the default frame set.
+\end{itemize}
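+
+The frame durations lend themselves to a quick summary. The following is a sketch under stated assumptions, not part of Tracy: \texttt{summarize\_frame\_times} is a hypothetical helper, the 16.67\,ms budget is an assumed 60\,Hz target, and the sample list stands in for the value returned by \texttt{ctx.get\_frame\_times()}.
+
+\begin{lstlisting}
+import statistics
+
+def summarize_frame_times(frame_times_ns, budget_ms=16.67):
+    """Summarize per-frame durations given in nanoseconds."""
+    if not frame_times_ns:
+        return None
+    ms = sorted(t / 1e6 for t in frame_times_ns)
+    # Simple index-based approximation of the 99th percentile.
+    p99 = ms[min(len(ms) - 1, int(0.99 * len(ms)))]
+    return {
+        "frames": len(ms),
+        "mean_ms": statistics.fmean(ms),
+        "median_ms": statistics.median(ms),
+        "p99_ms": p99,
+        "max_ms": ms[-1],
+        "over_budget": sum(1 for t in ms if t > budget_ms),
+    }
+
+# Hypothetical stand-in data so the sketch runs outside the server.
+sample_ns = [16_000_000, 17_500_000, 16_200_000, 33_000_000]
+print(summarize_frame_times(sample_ns))
+\end{lstlisting}
+
+Running the same summary on two loaded captures and comparing the resulting dictionaries is one way to check whether a change reduced frame time.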
+
+\paragraph{Threads, messages, plots, memory, locks, and callstacks}
+\begin{itemize}
+\item \texttt{get\_threads()} --- List of \texttt{ThreadData} objects with \texttt{id}, \texttt{count}, \texttt{is\_fiber}.
+\item \texttt{get\_messages()} --- List of \texttt{MessageInfo} objects with \texttt{time}, \texttt{text}, \texttt{color}, \texttt{thread}.
+\item \texttt{get\_plots()} --- List of \texttt{PlotSummary} objects with \texttt{name}, \texttt{type}, \texttt{min}, \texttt{max}, \texttt{sum}, \texttt{avg}, \texttt{count}.
+\item \texttt{get\_memory\_events()} --- List of raw allocation events including pointer, size, alloc/free times, and callstack index.
+\item \texttt{get\_locks()} --- List of \texttt{LockSummary} objects. Use \texttt{get\_lock\_wait\_stats()} for contention analysis.
+\item \texttt{get\_symbol\_stats()} --- Callstack-sample hit counts per symbol. Sort by \texttt{excl} to find hot functions.
+\item \texttt{get\_callstack\_frames(callstack\_idx)} --- Resolve a callstack index to a list of \texttt{\{"name", "file", "line", "addr"\}} frames.
+\end{itemize}
+
+\subsubsection{Loading a capture}
+
+Traces must be explicitly loaded through the MCP server; opening a file in the Tracy GUI does not make it available to the server. Use \texttt{load\_capture} with the full path to a \texttt{.tracy} file, or use \texttt{list\_captures} first if \texttt{TRACY\_CAPTURES\_DIR} is configured.
+
 \subsection{Fortran API}
 \label{fortranapi}

--- a/python/CMakeLists.txt
+++ b/python/CMakeLists.txt
@@ -20,13 +20,24 @@ target_link_libraries(TracyClientBindings PUBLIC ${Python_LIBRARIES})
 target_compile_definitions(TracyClientBindings PUBLIC BUFFER_SIZE=${BUFFER_SIZE})
 target_compile_definitions(TracyClientBindings PUBLIC NAME_LENGTH=${NAME_LENGTH})

+include(${CMAKE_CURRENT_LIST_DIR}/../cmake/config.cmake)
+include(${CMAKE_CURRENT_LIST_DIR}/../cmake/vendor.cmake)
+include(${CMAKE_CURRENT_LIST_DIR}/../cmake/server.cmake)
+
+pybind11_add_module(TracyServerBindings SHARED bindings/ServerModule.cpp)
+target_link_libraries(TracyServerBindings PUBLIC TracyServer)
+target_link_libraries(TracyServerBindings PUBLIC ${Python_LIBRARIES})
+
 if (UNIX)
  set_target_properties(TracyClientBindings PROPERTIES
    BUILD_RPATH_USE_ORIGIN TRUE
    INSTALL_RPATH "\$ORIGIN/lib")
+  set_target_properties(TracyServerBindings PROPERTIES
+    BUILD_RPATH_USE_ORIGIN TRUE
+    INSTALL_RPATH "\$ORIGIN/lib")
 endif ()

-install(TARGETS TracyClientBindings
+install(TARGETS TracyClientBindings TracyServerBindings
  RUNTIME DESTINATION .
  LIBRARY DESTINATION .
 )
--- a/python/bindings/ServerModule.cpp
+++ b/python/bindings/ServerModule.cpp