Compare commits

41 Commits

Author SHA1 Message Date
Bartosz Taudul
e5aa8eba51 Merge pull request #1387 from Lectem/wip/offline-res-for-any-toolchain
Offline resolution for any toolchain
2026-06-06 14:48:54 +02:00
Clément Grégoire
7437c41514 Escape provided addr2line tool path 2026-06-06 14:47:24 +02:00
Bartosz Taudul
f441a5070b Wrap achievements column contents into child windows. 2026-06-06 13:19:56 +02:00
Bartosz Taudul
00b6abd67b Move achievements text to markdown files. 2026-06-06 13:19:55 +02:00
Bartosz Taudul
e4e3d75eb8 Add PDF link to built-in manual viewer. 2026-06-06 10:56:25 +02:00
Bartosz Taudul
fc5318dcad Some more markdown user manual compatibility fixes. 2026-06-06 01:53:05 +02:00
Bartosz Taudul
661c664b75 Convert LaTeX math in markdown to plain text. 2026-06-06 01:44:27 +02:00
Bartosz Taudul
6dbebca666 Reset user manual view scroll position after changing section from toc. 2026-06-06 01:28:24 +02:00
Bartosz Taudul
73d78ad517 Fix tables in markdown manual. 2026-06-06 01:25:49 +02:00
Bartosz Taudul
e5371d7987 Icons description separator must start at newline.
Otherwise it will clash with wrong things, like middle-of-line table header
separators.
2026-06-06 01:25:08 +02:00
Bartosz Taudul
9806f35714 Increase spacing between admonition icon and header text. 2026-06-06 00:43:09 +02:00
Bartosz Taudul
d40289d594 Add support for markdown admonitions. 2026-06-05 23:28:09 +02:00
Bartosz Taudul
86fbe529ed Bump font awesome to 7.2. 2026-06-05 23:28:09 +02:00
Bartosz Taudul
9b169ef3f9 Merge pull request #1391 from wolfpld/slomp/cuda-examples
Adding CUDA examples
2026-06-05 21:55:02 +02:00
Marcos Slomp
64797dc735 adding basic CUDA graph example 2026-06-05 12:52:08 -07:00
Bartosz Taudul
76797799c0 Merge pull request #1390 from wolfpld/slomp/cuda-tests
Relocating existing CUDA tests
2026-06-05 21:50:11 +02:00
Marcos Slomp
19549693a0 removing Makefile 2026-06-05 12:46:10 -07:00
Marcos Slomp
10d64d69b5 better ctest integration across the board 2026-06-05 12:46:10 -07:00
Marcos Slomp
d89c956394 CUPTI DLL paths... 2026-06-05 12:46:10 -07:00
Marcos Slomp
79467b4b31 cmake version shenanigans 2026-06-05 12:46:10 -07:00
Marcos Slomp
ae275f239d adding cmake recipe file 2026-06-05 12:46:10 -07:00
Marcos Slomp
77fb86155f relocating CUDAGraphRepro "example" to the tests folder 2026-06-05 12:46:10 -07:00
Bartosz Taudul
e627fcce98 Merge pull request #1386 from wolfpld/slomp/test-relocation
Relocating Tracy test app, plus GitHub actions
2026-06-05 21:19:22 +02:00
Bartosz Taudul
e80893ac20 Reset font size when displaying markdown tooltip. 2026-06-05 19:19:37 +02:00
Bartosz Taudul
912f8c048c Render footnotes in smaller size font. 2026-06-05 19:17:46 +02:00
Bartosz Taudul
d16f627cbc Header sizes are 1-6, remove extra entry. 2026-06-05 19:12:04 +02:00
Clément Grégoire
7cb98245ce Batch addr2line invocations by command-line length
The fixed batches of 1024 addresses could overflow the platform's command-line limit (`La ligne de commande est trop longue.`, i.e. "The command line is too long.", from a French-localized cmd.exe on Windows, whose limit is ~8191 characters). Build each batch by appending addresses until a length budget is reached instead. A single conservative budget of 8000 stays under the smallest limit on every platform, and keeps batches in the same ballpark as before (several hundred addresses per invocation).
2026-06-05 19:10:19 +02:00
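The batching described in 7cb98245ce can be sketched as follows. This is a minimal Python illustration of the idea, not the tracy-update code itself; the function name `batch_by_length`, the base command string, and the sample addresses are assumptions made up for the example. Only the 8000-character budget comes from the commit message.

```python
from typing import Iterable, Iterator, List

BUDGET = 8000  # conservative command-line length budget, as in the commit message


def batch_by_length(base_cmd: str, addresses: Iterable[str],
                    budget: int = BUDGET) -> Iterator[List[str]]:
    """Yield batches of addresses whose full command line fits in `budget` characters."""
    batch: List[str] = []
    length = len(base_cmd)
    for addr in addresses:
        cost = 1 + len(addr)  # one separating space plus the address text
        # Flush before the command line would exceed the budget. A batch is
        # never left empty, so a single oversized address still gets sent.
        if batch and length + cost > budget:
            yield batch
            batch, length = [], len(base_cmd)
        batch.append(addr)
        length += cost
    if batch:
        yield batch


# Hypothetical invocation prefix and addresses, for illustration only.
base_cmd = "addr2line -e app.exe -f -C"
addresses = [f"0x{0x140001000 + 16 * i:x}" for i in range(2000)]

batches = list(batch_by_length(base_cmd, addresses))
for b in batches:
    print(len(b), "addresses,", len(base_cmd) + sum(1 + len(a) for a in b), "characters")
```

With 11-character addresses, each batch holds several hundred entries, which matches the "same ballpark as before" remark in the commit message.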
Bartosz Taudul
3974cc8026 Add support for proper rendering of markdown footnotes. 2026-06-05 19:09:49 +02:00
Clément Grégoire
55d5436fb9 Add option to reset callstack frame symbols to the unresolved state
The new `-R` option of tracy-update sets every callstack frame back to `[unresolved]` / `[unknown]`. Since failed lookups leave frames untouched and the image-relative offset in `symAddr` survives patching, this makes it possible to chain several resolution passes over the same capture, each with different `-p` path substitutions (e.g. one pass per symbol directory).
2026-06-05 19:03:35 +02:00
Clément Grégoire
2b11785b05 Allow offline symbol resolution with any addr2line-compatible tool
The addr2line backend of tracy-update now builds on every platform, including Windows, and can be pointed at any addr2line-compatible executable:

- `-a`: path to a custom symbol resolution tool (e.g. `llvm-addr2line` or a cross-compilation toolchain's `addr2line`). Works on all platforms and takes precedence over the platform default (DbgHelp on Windows, the `addr2line` found in `PATH` elsewhere). Path-like values are validated up front so a wrong path fails with an actionable message instead of a cryptic, localized shell error.
- `-A`: extra arguments passed verbatim to the tool, e.g. `--relative-address` so `llvm-addr2line`/`llvm-symbolizer` accept the image-relative offsets Tracy records for images with a non-zero preferred base (PE, Mach-O).
- `-v`: verbose output while patching symbols.
2026-06-05 19:03:35 +02:00
Bartosz Taudul
715815374d Rebuild markdown manual. 2026-06-05 18:42:50 +02:00
Bartosz Taudul
4f64b974c6 Update manual. 2026-06-05 18:42:24 +02:00
Bartosz Taudul
7c58db4c0a Add public sidecar icon. 2026-06-05 18:40:18 +02:00
Bartosz Taudul
bebf20846f Expose setting public sidecar in UI. 2026-06-05 17:45:58 +02:00
Bartosz Taudul
5f6bc2238a Public user data sidecar support. 2026-06-05 17:45:58 +02:00
Bartosz Taudul
4564a626b2 UserData::Save() returns success status. 2026-06-05 17:40:22 +02:00
Bartosz Taudul
e95a757e6c Store trace file path in UserData. 2026-06-05 17:32:46 +02:00
Bartosz Taudul
fc97af4c68 Redirect unlink to _unlink with msvc. 2026-06-05 17:23:58 +02:00
Bartosz Taudul
8eada19734 Add a separate function to retrieve sidecar path. 2026-06-05 17:21:24 +02:00
Bartosz Taudul
268cab7f89 Remove UserData::GetConfigLocation(). 2026-06-05 17:03:23 +02:00
Bartosz Taudul
aeda64d36b Move saving user data to a separate function. 2026-06-05 15:58:07 +02:00
52 changed files with 1333 additions and 512 deletions

@@ -217,7 +217,7 @@ CPMAddPackage(
CPMAddPackage(
NAME md4c
GITHUB_REPOSITORY mity/md4c
GIT_TAG release-0.5.3
GIT_TAG 755ce49acdc7cd682d4502b4796db5ed6a1230fb
EXCLUDE_FROM_ALL TRUE
)

@@ -1,49 +0,0 @@
TRACY_PUBLIC := ../../public
NVCC := nvcc
CXX := g++
CUPTI_INC := /usr/local/cuda/include
CUPTI_LIB := /usr/local/cuda/lib64
TRACY_SRCS := $(TRACY_PUBLIC)/TracyClient.cpp
INCLUDES := -I$(TRACY_PUBLIC) -I$(CUPTI_INC)
LIBS := -L$(CUPTI_LIB) -lcuda -lcupti -lpthread -ldl
CXXFLAGS_REL := -O2 -DTRACY_ENABLE
CXXFLAGS_DBG := -g -O0 -DTRACY_ENABLE
NVCCFLAGS_REL := -arch=native -O2 -DTRACY_ENABLE
NVCCFLAGS_DBG := -arch=native -g -O0 -DTRACY_ENABLE
.PHONY: all debug investigate investigate2 clean
all: repro
debug: repro_debug
investigate: test_corr_reuse
investigate2: test_graphid_recycle
# Release build
repro: repro.cu tracy_client.o
$(NVCC) $(NVCCFLAGS_REL) $(INCLUDES) -o $@ $< tracy_client.o $(LIBS)
tracy_client.o: $(TRACY_SRCS)
$(CXX) $(CXXFLAGS_REL) $(INCLUDES) -c -o $@ $<
# Debug build (asserts enabled, no NDEBUG)
repro_debug: repro.cu tracy_client_debug.o
$(NVCC) $(NVCCFLAGS_DBG) $(INCLUDES) -o $@ $< tracy_client_debug.o $(LIBS)
tracy_client_debug.o: $(TRACY_SRCS)
$(CXX) $(CXXFLAGS_DBG) $(INCLUDES) -c -o $@ $<
# Investigation: correlationId uniqueness per graph launch (no Tracy dependency)
test_corr_reuse: test_corr_reuse.cu
$(NVCC) $(NVCCFLAGS_REL) $(INCLUDES) -o $@ $< $(LIBS)
# Investigation: does CUPTI recycle graphId values after cudaGraphExecDestroy?
test_graphid_recycle: test_graphid_recycle.cu
$(NVCC) $(NVCCFLAGS_REL) $(INCLUDES) -o $@ $< $(LIBS)
clean:
rm -f repro repro_debug test_corr_reuse test_graphid_recycle tracy_client.o tracy_client_debug.o

examples/cuda/README.md Normal file

@@ -0,0 +1,39 @@
cmake_minimum_required(VERSION 3.18)
project(CUDAGraphDemo LANGUAGES CXX CUDA)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.24")
set(CMAKE_CUDA_ARCHITECTURES native)
endif()
set(TRACY_PATH "${CMAKE_CURRENT_SOURCE_DIR}/../../.."
CACHE PATH "Root of the Tracy repository")
set(TRACY_PUBLIC "${TRACY_PATH}/public")
find_package(CUDAToolkit REQUIRED)
find_package(Threads REQUIRED)
# cuda-graph-demo.cu embeds Tracy via #include <TracyClient.cpp> (unity build),
# so no separate TracyClient library is needed — just expose the public headers.
add_executable(cuda-graph-demo cuda-graph-demo.cu)
target_include_directories(cuda-graph-demo PRIVATE ${TRACY_PUBLIC})
target_link_libraries(cuda-graph-demo PRIVATE
CUDA::cupti CUDA::cuda_driver Threads::Threads ${CMAKE_DL_LIBS})
# ctest-related integration below
# to run the binaries via ctest:
# ctest --test-dir <cmake-build-dir> -R <binary-name> -C <build-config>
enable_testing()
add_test(NAME cuda-graph-demo COMMAND cuda-graph-demo)
# On Windows, CUPTI's DLL must be on PATH at runtime.
if(WIN32)
set(_cupti_dir "$<TARGET_FILE_DIR:CUDA::cupti>")
set_target_properties(cuda-graph-demo PROPERTIES
VS_DEBUGGER_ENVIRONMENT "PATH=${_cupti_dir};$ENV{PATH}")
set_tests_properties(cuda-graph-demo PROPERTIES
ENVIRONMENT "PATH=${_cupti_dir};$ENV{PATH}")
endif()

@@ -0,0 +1,11 @@
TRACY_PATH=<path-to-tracy>
CUDA_TOOLKIT_PATH=/usr/local/cuda
CUDA_CUPTI_PATH=${CUDA_TOOLKIT_PATH}/extras/CUPTI
# pass -v to nvcc for verbose build information
nvcc -O2 -std=c++17 cuda-graph-demo.cu \
-o cuda-graph-demo \
-I "${TRACY_PATH}/public" \
-I "${CUDA_CUPTI_PATH}/include" -I "${CUDA_TOOLKIT_PATH}/include" \
-L "${CUDA_CUPTI_PATH}/lib64" -L "${CUDA_TOOLKIT_PATH}/lib64" \
-lcupti -lcuda

@@ -0,0 +1,146 @@
#include <cuda_runtime.h>
// WARN: for simplicity, we enable and "embed" the Tracy client directly into the code
#define TRACY_ENABLE
#include <TracyClient.cpp>
#include <tracy/Tracy.hpp>
#include <tracy/TracyCUDA.hpp>
#include <cstdio>
#include <cstdlib>
#include <vector>
#define CUDA_CHECK(call) \
do { \
cudaError_t err__ = (call); \
if (err__ != cudaSuccess) { \
std::fprintf(stderr, "CUDA error %s at %s:%d: %s\n", \
cudaGetErrorName(err__), __FILE__, __LINE__, \
cudaGetErrorString(err__)); \
std::exit(EXIT_FAILURE); \
} \
} while (0)
__global__ void saxpy(float a, const float* x, float* y, int n)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < n) y[i] = a * x[i] + y[i];
}
int main()
{
// CUPTI-backed Tracy context. Auto-captures all CUDA activity from the
// point StartProfiling() is called until StopProfiling(). The background
// collector thread flushes activity into Tracy; the explicit Collect()
// calls below just force a flush at known phase boundaries.
auto* cudaCtx = TracyCUDAContext();
{
constexpr char ctxName[] = "CUDA Graph Demo";
TracyCUDAContextName(cudaCtx, ctxName, sizeof(ctxName) - 1);
}
TracyCUDAStartProfiling(cudaCtx);
constexpr int N = 1 << 16; // small N => kernel is short => launch overhead dominates
constexpr int KERNELS_PER_GRAPH = 32; // chain length captured into the graph
constexpr int OUTER_ITERS = 2000; // how many times we replay the chain
// allocate device buffers
float *dX = nullptr, *dY = nullptr;
CUDA_CHECK(cudaMalloc(&dX, N * sizeof(float)));
CUDA_CHECK(cudaMalloc(&dY, N * sizeof(float)));
std::vector<float> hX(N, 1.0f);
CUDA_CHECK(cudaMemcpy(dX, hX.data(), N * sizeof(float), cudaMemcpyHostToDevice));
cudaStream_t stream = nullptr;
CUDA_CHECK(cudaStreamCreate(&stream));
const dim3 block(256);
const dim3 grid((N + block.x - 1) / block.x);
cudaEvent_t evStart, evStop;
CUDA_CHECK(cudaEventCreate(&evStart));
CUDA_CHECK(cudaEventCreate(&evStop));
// warm-up (so first-launch lazy-init and/or JIT doesn't bias the measurement)
saxpy<<<grid, block, 0, stream>>>(0.0f, dX, dY, N);
CUDA_CHECK(cudaStreamSynchronize(stream));
// baseline: launch each kernel directly on the stream
float msStream = 0.0f;
{
ZoneScopedN("stream-launches");
CUDA_CHECK(cudaMemsetAsync(dY, 0, N * sizeof(float), stream));
CUDA_CHECK(cudaEventRecord(evStart, stream));
for (int outer = 0; outer < OUTER_ITERS; ++outer) {
for (int k = 0; k < KERNELS_PER_GRAPH; ++k) {
saxpy<<<grid, block, 0, stream>>>(1.0e-6f, dX, dY, N);
}
}
CUDA_CHECK(cudaEventRecord(evStop, stream));
CUDA_CHECK(cudaEventSynchronize(evStop));
CUDA_CHECK(cudaEventElapsedTime(&msStream, evStart, evStop));
TracyCUDACollect(cudaCtx);
}
// capture: record the same kernel chain into a graph
cudaGraph_t graph = nullptr;
cudaGraphExec_t graphExec = nullptr;
{
ZoneScopedN("graph-capture");
// cudaStreamCaptureModeRelaxed allows the calling thread to perform
// unrelated CUDA work during capture; ThreadLocal is stricter if you need
// isolation. Most short, single-stream captures work fine in either mode.
CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeRelaxed));
for (int k = 0; k < KERNELS_PER_GRAPH; ++k) {
saxpy<<<grid, block, 0, stream>>>(1.0e-6f, dX, dY, N);
}
CUDA_CHECK(cudaStreamEndCapture(stream, &graph));
// Instantiate once -> reusable executable graph.
CUDA_CHECK(cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0));
// The template graph isn't needed once instantiated.
CUDA_CHECK(cudaGraphDestroy(graph));
}
// replay: launch the instantiated graph OUTER_ITERS times
float msGraph = 0.0f;
{
ZoneScopedN("graph-launches");
CUDA_CHECK(cudaMemsetAsync(dY, 0, N * sizeof(float), stream));
CUDA_CHECK(cudaEventRecord(evStart, stream));
for (int outer = 0; outer < OUTER_ITERS; ++outer) {
CUDA_CHECK(cudaGraphLaunch(graphExec, stream));
}
CUDA_CHECK(cudaEventRecord(evStop, stream));
CUDA_CHECK(cudaEventSynchronize(evStop));
CUDA_CHECK(cudaEventElapsedTime(&msGraph, evStart, evStop));
TracyCUDACollect(cudaCtx);
}
// sanity check: y[i] = OUTER_ITERS * KERNELS_PER_GRAPH * 1e-6 * x[i]
std::vector<float> hY(N);
CUDA_CHECK(cudaMemcpy(hY.data(), dY, N * sizeof(float), cudaMemcpyDeviceToHost));
const float expected = float(OUTER_ITERS) * float(KERNELS_PER_GRAPH) * 1.0e-6f;
std::printf("Stream launches: %8.3f ms (%d kernels)\n",
msStream, OUTER_ITERS * KERNELS_PER_GRAPH);
std::printf("Graph launches: %8.3f ms (%d graph launches x %d kernels)\n",
msGraph, OUTER_ITERS, KERNELS_PER_GRAPH);
std::printf("Speedup : %8.2fx\n", msStream / msGraph);
std::printf("hY[0] = %.6e (expected %.6e)\n", hY[0], expected);
// shutdown
CUDA_CHECK(cudaGraphExecDestroy(graphExec));
CUDA_CHECK(cudaEventDestroy(evStart));
CUDA_CHECK(cudaEventDestroy(evStop));
CUDA_CHECK(cudaStreamDestroy(stream));
CUDA_CHECK(cudaFree(dX));
CUDA_CHECK(cudaFree(dY));
TracyCUDAStopProfiling(cudaCtx);
TracyCUDAContextDestroy(cudaCtx);
return 0;
}

@@ -3,3 +3,151 @@ function Link(el)
el.attributes['reference'] = nil
return el
end
-- Drop Div wrappers (e.g. table/titlepage containers), keeping their content.
function Div(el)
return el.content
end
-- ---------------------------------------------------------------------------
-- LaTeX math -> plain-text approximation.
--
-- The target Markdown renderer has no math support, so a raw "$\frac{1}{2}$"
-- would show verbatim. We turn each math node into the closest Unicode/ASCII
-- equivalent: fractions become "a/b", \times becomes "x", super/subscripts use
-- Unicode digits, and the one multi-line display equation becomes a fenced
-- code block (Markdown collapses plain newlines, a code block keeps them).
-- ---------------------------------------------------------------------------
local sup = {['0']='⁰',['1']='¹',['2']='²',['3']='³',['4']='⁴',['5']='⁵',
['6']='⁶',['7']='⁷',['8']='⁸',['9']='⁹',['+']='⁺',['-']='⁻',
['=']='⁼',['(']='⁽',[')']='⁾'}
local sub = {['0']='₀',['1']='₁',['2']='₂',['3']='₃',['4']='₄',['5']='₅',
['6']='₆',['7']='₇',['8']='₈',['9']='₉',['+']='₊',['-']='₋',
['=']='₌',['(']='₍',[')']='₎'}
-- Symbol replacements, applied as literal substitutions. Longer commands must
-- precede those that are a prefix of them (e.g. \rightarrow before \right).
local symbols = {
{'\\leftrightarrow','↔'}, {'\\rightarrow','→'}, {'\\leftarrow','←'},
{'\\Rightarrow','⇒'}, {'\\Leftarrow','⇐'}, {'\\to','→'}, {'\\mapsto','↦'},
-- \left/\right go after the arrows but before \le, which is a prefix of \left.
{'\\left',''}, {'\\right',''},
{'\\times','×'}, {'\\cdot','·'}, {'\\div','÷'}, {'\\ast','*'}, {'\\star','*'},
{'\\leq','≤'}, {'\\geq','≥'}, {'\\neq','≠'}, {'\\approx','≈'}, {'\\equiv','≡'},
{'\\ll','«'}, {'\\gg','»'}, {'\\le','≤'}, {'\\ge','≥'},
{'\\ldots','…'}, {'\\cdots','⋯'}, {'\\dots','…'}, {'\\infty','∞'},
{'\\pm','±'}, {'\\mp','∓'}, {'\\propto','∝'}, {'\\sum','Σ'}, {'\\prod','Π'},
{'\\alpha','α'}, {'\\beta','β'}, {'\\gamma','γ'}, {'\\delta','δ'}, {'\\Delta','Δ'},
{'\\mu','µ'}, {'\\sigma','σ'}, {'\\pi','π'}, {'\\lambda','λ'}, {'\\theta','θ'},
{'\\qquad',' '}, {'\\quad',' '}, {'\\,',' '}, {'\\;',' '}, {'\\:',' '},
{'\\ ',' '}, {'\\!',''},
{'\\%','%'}, {'\\#','#'}, {'\\&','&'}, {'\\_','_'}, {'\\{','{'}, {'\\}','}'},
{'\\$','$'},
}
-- Literal (non-pattern) string replacement; avoids Lua pattern magic in keys.
local function lit_replace(s, a, b)
local out, i = {}, 1
while true do
local p = s:find(a, i, true)
if not p then out[#out + 1] = s:sub(i); break end
out[#out + 1] = s:sub(i, p - 1)
out[#out + 1] = b
i = p + #a
end
return table.concat(out)
end
-- Strip the outer braces of a "%b{}" capture.
local function grp(b) return b:sub(2, #b - 1) end
-- Map a string to Unicode super/subscript, or nil if any char is unsupported.
local function map_script(txt, map)
local res = {}
for i = 1, #txt do
local c = txt:sub(i, i)
if not map[c] then return nil end
res[#res + 1] = map[c]
end
return table.concat(res)
end
local function convert(s)
-- Text/font wrappers: keep the content, recurse to handle nesting.
for _, cmd in ipairs({'text', 'mathrm', 'mathit', 'mathbf', 'mathbb',
'mathsf', 'mathtt', 'mathcal', 'operatorname',
'textbf', 'textit', 'textrm'}) do
s = s:gsub('\\' .. cmd .. '(%b{})', function(b) return convert(grp(b)) end)
end
-- Fractions -> "num/den" (spaced when either side has spaces).
local function frac(a, b)
local n, d = convert(grp(a)), convert(grp(b))
local sep = (n:find(' ', 1, true) or d:find(' ', 1, true)) and ' / ' or '/'
return n .. sep .. d
end
s = s:gsub('\\frac(%b{})(%b{})', frac)
s = s:gsub('\\dfrac(%b{})(%b{})', frac)
s = s:gsub('\\tfrac(%b{})(%b{})', frac)
s = s:gsub('\\sfrac(%b{})(%b{})', frac)
-- Roots.
s = s:gsub('\\sqrt(%b{})', function(b) return '√(' .. convert(grp(b)) .. ')' end)
-- Single-char scripts first, so the braced fallback (e.g. "_native") below
-- is not re-scanned and mangled into Unicode subscripts.
s = s:gsub('%^([%w])', function(c) return sup[c] or ('^' .. c) end)
s = s:gsub('_([%w])', function(c) return sub[c] or ('_' .. c) end)
-- Braced scripts: Unicode when the content is all digits/signs, else keep
-- a readable "^(...)" / "_..." form.
s = s:gsub('%^(%b{})', function(b)
local inner = convert(grp(b))
return map_script(inner, sup) or ('^(' .. inner .. ')')
end)
s = s:gsub('_(%b{})', function(b)
local inner = convert(grp(b))
return map_script(inner, sub) or ('_' .. inner)
end)
-- Remaining symbols.
for _, pair in ipairs(symbols) do s = lit_replace(s, pair[1], pair[2]) end
return s
end
-- Convert a display equation, preserving its line structure for a code block.
local function convert_display(s)
s = convert(s)
for _, env in ipairs({'cases', 'aligned', 'align', 'array', 'matrix',
'gathered', 'split'}) do
s = lit_replace(s, '\\begin{' .. env .. '}', '')
s = lit_replace(s, '\\end{' .. env .. '}', '')
end
s = lit_replace(s, '\\\\', '\n') -- row break
s = s:gsub('%s*&%s*', ' ') -- column separator -> spacing
local lines = {}
for line in (s .. '\n'):gmatch('(.-)\n') do
line = line:gsub('^%s+', ''):gsub('%s+$', '')
if line ~= '' then lines[#lines + 1] = line end
end
for i = 2, #lines do lines[i] = ' ' .. lines[i] end -- indent continuations
return table.concat(lines, '\n')
end
function Math(el)
if el.mathtype == 'DisplayMath' then
return el -- handled at block level by Para, to emit a code block
end
return pandoc.Str(convert(el.text))
end
-- A paragraph that is solely a display equation becomes a fenced code block.
function Para(el)
local maths, only_math = {}, true
for _, x in ipairs(el.content) do
if x.t == 'Math' and x.mathtype == 'DisplayMath' then
maths[#maths + 1] = x
elseif x.t ~= 'Space' and x.t ~= 'SoftBreak' and x.t ~= 'LineBreak' then
only_math = false
end
end
if #maths == 0 or not only_math then return nil end
local parts = {}
for _, m in ipairs(maths) do parts[#parts + 1] = convert_display(m.text) end
return pandoc.CodeBlock(table.concat(parts, '\n\n'))
end

@@ -7,12 +7,18 @@ sed -i -e 's@\\ctrl@Ctrl@g' _tmp.tex
sed -i -e 's@\\shift@Shift@g' _tmp.tex
sed -i -e 's@\\Alt@Alt@g' _tmp.tex
sed -i -e 's@\\del@Delete@g' _tmp.tex
python3 fa-icons.py ../profiler/src/profiler/IconsFontAwesome6.h _tmp.tex
python3 fa-icons.py ../profiler/src/profiler/IconsFontAwesome7.h _tmp.tex
sed -i -e 's@\\LMB{}~@@g' _tmp.tex
sed -i -e 's@\\MMB{}~@@g' _tmp.tex
sed -i -e 's@\\RMB{}~@@g' _tmp.tex
sed -i -e 's@\\Scroll{}~@@g' _tmp.tex
# Resolve \circled{} markers and lstlisting escapeinside (@...@) snippets, which
# pandoc would otherwise emit verbatim or drop, to their Unicode equivalents.
sed -i -e 's|@\\circled{a}@|(a)|g' -e 's|@\\circled{b}@|(b)|g' -e 's|@\\circled{c}@|(c)|g' _tmp.tex
sed -i -e 's|\\circled{a}|(a)|g' -e 's|\\circled{b}|(b)|g' -e 's|\\circled{c}|(c)|g' _tmp.tex
sed -i -e 's|@\\ldots@|…|g' _tmp.tex
sed -i -e 's@\\nameref{quicklook}@A quick look at Tracy Profiler@g' _tmp.tex
sed -i -e 's@\\nameref{firststeps}@First steps@g' _tmp.tex
sed -i -e 's@\\nameref{client}@Client markup@g' _tmp.tex
@@ -26,7 +32,10 @@ sed -i -e 's@\\nameref{configurationfiles}@Configuration files@g' _tmp.tex
awk -f bclogo2quote.awk _tmp.tex > _tmp_quoted.tex
mv _tmp_quoted.tex _tmp.tex
pandoc --wrap=none --reference-location=block --number-sections -L filter.lua -s _tmp.tex -o tracy.md
pandoc --wrap=none --reference-location=block --number-sections -L filter.lua -t 'markdown-simple_tables-multiline_tables-grid_tables+pipe_tables' -s _tmp.tex -o tracy.md
awk -f tablecaption.awk tracy.md > _tmp_caption.md
mv _tmp_caption.md tracy.md
sed -i -e 's/^> \*\*IMPORTANT:\([^*]*\)\*\*/> [!IMPORTANT]\
> **\1**/' tracy.md
@@ -37,6 +46,6 @@ sed -i -e 's/^> \*\*CAUTION:\([^*]*\)\*\*/> [!CAUTION]\
sed -i -e 's/^> \*\*NOTE:\([^*]*\)\*\*/> [!NOTE]\
> **\1**/' tracy.md
python3 icon-explain.py ../profiler/src/profiler/IconsFontAwesome6.h tracy.md
python3 icon-explain.py ../profiler/src/profiler/IconsFontAwesome7.h tracy.md
rm -f _tmp.tex

manual/tablecaption.awk Normal file

@@ -0,0 +1,16 @@
# Pandoc emits table captions as a line beginning with ": ", which GitHub
# renders literally instead of as a caption. Strip the marker and italicize
# the caption instead. Captions may span several physical lines when they
# contain a hard line break (a trailing backslash). Underscores are used for
# the emphasis so captions that already contain "*...*" markup are left intact.
!incap && /^: / {
incap = 1
$0 = "_" substr($0, 3)
}
incap && !/\\$/ {
print $0 "_"
incap = 0
next
}
incap { print; next }
{ print }

@@ -3,7 +3,6 @@ bibliography:
- tracy.bib
---
::: titlepage
Tracy Profiler
The user manual
@@ -12,8 +11,7 @@ The user manual
**Bartosz Taudul** [\<wolf@nereid.pl\>](mailto:wolf@nereid.pl)
2026-05-30 <https://github.com/wolfpld/tracy>
:::
2026-06-06 <https://github.com/wolfpld/tracy>
# Quick overview {#quick-overview .unnumbered}
@@ -95,11 +93,11 @@ The concept of Tracy being a real-time profiler may be explained in a couple of
It is hard to imagine how long a nanosecond is. One good analogy is to compare it with a measure of length. Let's say that one second is one meter (the average doorknob is at the height of one meter).
One millisecond ($\frac{1}{1000}$ of a second) would be then the length of a millimeter. The average size of a red ant or the width of a pencil is 5 or 6 mm. A modern game running at 60 frames per second has only 16 ms to update the game world and render the entire scene.
One millisecond (1/1000 of a second) would be then the length of a millimeter. The average size of a red ant or the width of a pencil is 5 or 6 mm. A modern game running at 60 frames per second has only 16 ms to update the game world and render the entire scene.
One microsecond ($\frac{1}{1000}$ of a millisecond) in our comparison equals one micron. The diameter of a typical bacterium ranges from 1 to 10 microns. The diameter of a red blood cell or width of a strand of spider web silk is about 7 μm.
One microsecond (1/1000 of a millisecond) in our comparison equals one micron. The diameter of a typical bacterium ranges from 1 to 10 microns. The diameter of a red blood cell or width of a strand of spider web silk is about 7 μm.
And finally, one nanosecond ($\frac{1}{1000}$ of a microsecond) would be one nanometer. The modern microprocessor transistor gate, the width of the DNA helix, or the thickness of a cell membrane are in the range of 5 nm. In one ns the light can travel only 30 cm.
And finally, one nanosecond (1/1000 of a microsecond) would be one nanometer. The modern microprocessor transistor gate, the width of the DNA helix, or the thickness of a cell membrane are in the range of 5 nm. In one ns the light can travel only 30 cm.
Tracy can achieve single-digit nanosecond measurement resolution due to usage of hardware timing mechanisms on the x86 and ARM architectures[^4]. Other profilers may rely on the timers provided by the operating system, which do have significantly reduced resolution (about 300 ns -- 1 μs). This is enough to hide the subtle impact of cache access optimization, etc.
@@ -115,7 +113,7 @@ It is wrong to think so. Optimizing a function to execute in 430 ns, instead of
[^6]: This is a real optimization case. The values are median function run times and do not reflect the real execution time, which explains the discrepancy in the total reported time.
You also need to understand how timer precision is reflected in measurement errors. Take a look at figure [1](#timer). There you can see three discrete timer tick events, which increase the value reported by the timer by 300 ns. You can also see four readings of time ranges, marked $A_1$, $A_2$; $B_1$, $B_2$; $C_1$, $C_2$ and $D_1$, $D_2$.
You also need to understand how timer precision is reflected in measurement errors. Take a look at figure [1](#timer). There you can see three discrete timer tick events, which increase the value reported by the timer by 300 ns. You can also see four readings of time ranges, marked A₁, A₂; B₁, B₂; C₁, C₂ and D₁, D₂.
<figure id="timer" data-latex-placement="h">
@@ -124,11 +122,11 @@ You also need to understand how timer precision is reflected in measurement erro
Now let's take a look at the timer readings.
- The $A$ and $D$ ranges both take a very short amount of time (10 ns), but the $A$ range is reported as 300 ns, and the $D$ range is reported as 0 ns.
- The A and D ranges both take a very short amount of time (10 ns), but the A range is reported as 300 ns, and the D range is reported as 0 ns.
- The $B$ range takes a considerable amount of time (590 ns), but according to the timer readings, it took the same time (300 ns) as the short lived $A$ range.
- The B range takes a considerable amount of time (590 ns), but according to the timer readings, it took the same time (300 ns) as the short lived A range.
- The $C$ range (610 ns) is only 20 ns longer than the $B$ range, but it is reported as 900 ns, a 600 ns difference!
- The C range (610 ns) is only 20 ns longer than the B range, but it is reported as 900 ns, a 600 ns difference!
Here, you can see why using a high-precision timer is essential. While there is no escape from the measurement errors, a profiler can reduce their impact by increasing the timer accuracy.
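The effect of timer granularity on these four readings can be reproduced with a short sketch. It models a timer that only advances in 300 ns steps. The start and end timestamps below are assumptions chosen so that the real durations match the text (10, 590, 610 and 10 ns); they are not taken from the figure.

```python
TICK_NS = 300  # timer granularity from the text: the value advances in 300 ns steps


def timer_read(t_ns: int) -> int:
    """Value reported by a coarse timer at real time t_ns (rounded down to a tick)."""
    return (t_ns // TICK_NS) * TICK_NS


def reported(start_ns: int, end_ns: int) -> int:
    """Duration computed from two coarse timer readings."""
    return timer_read(end_ns) - timer_read(start_ns)


# Hypothetical (start, end) timestamps in ns; only the durations follow the manual.
ranges = {
    "A": (295, 305),  # 10 ns, straddles the tick at 300 ns
    "B": (305, 895),  # 590 ns, crosses only the tick at 600 ns
    "C": (295, 905),  # 610 ns, crosses the ticks at 300, 600 and 900 ns
    "D": (310, 320),  # 10 ns, crosses no tick
}

for name, (start, end) in ranges.items():
    print(f"{name}: real {end - start:3d} ns, reported {reported(start, end):3d} ns")
```

The reported duration depends on how many tick boundaries a range crosses, not on how long it actually lasts, which is why A and B look identical while B and C differ by 600 ns.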
@@ -190,20 +188,18 @@ You may wonder why you should use Tracy when so many other profilers are availab
## Performance impact {#perfimpact}
Let's profile an example application to check how much slowdown is introduced by using Tracy. For this purpose we have used etcpak[^10]. The input data was a $16384 \times 16384$ pixels test image, and the $4 \times 4$ pixel block compression function was selected to be instrumented. The image was compressed on 12 parallel threads, and the timing data represents a mean compression time of a single image.
Let's profile an example application to check how much slowdown is introduced by using Tracy. For this purpose we have used etcpak[^10]. The input data was a 16384 × 16384 pixels test image, and the 4 × 4 pixel block compression function was selected to be instrumented. The image was compressed on 12 parallel threads, and the timing data represents a mean compression time of a single image.
[^10]: <https://github.com/wolfpld/etcpak>
The results are presented in table [1](#PerformanceImpact). Dividing the average of run time differences (37.7 ms) by the count of captured zones per single image (16777216) shows us that the impact of profiling is only 2.25 ns per zone (this includes two events: start and end of a zone).
::: {#PerformanceImpact}
**Mode** **Zones (total)** **Zones (single image)** **Clean run** **Profiling run** **Difference**
---------- ------------------- -------------------------- --------------- ------------------- ----------------
ETC1 201326592 16777216 110.9 ms 148.2 ms +37.3 ms
ETC2 201326592 16777216 212.4 ms 250.5 ms +38.1 ms
| **Mode** | **Zones (total)** | **Zones (single image)** | **Clean run** | **Profiling run** | **Difference** |
|:--:|:--:|:--:|:--:|:--:|:--:|
| ETC1 | 201326592 | 16777216 | 110.9 ms | 148.2 ms | +37.3 ms |
| ETC2 | 201326592 | 16777216 | 212.4 ms | 250.5 ms | +38.1 ms |
: Zone capture time cost.
:::
_Zone capture time cost._
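The per-zone cost quoted above follows directly from the table. The sketch below redoes the arithmetic; the variable names are illustrative, and the values are the ones listed in the table.

```python
# Run time differences (profiling run minus clean run) from the table, in ms.
differences_ms = [37.3, 38.1]   # ETC1, ETC2
zones_per_image = 16_777_216    # zones captured per single image

# One zone per 4 x 4 pixel block of the 16384 x 16384 image.
assert zones_per_image == (16384 * 16384) // (4 * 4)

mean_diff_ms = sum(differences_ms) / len(differences_ms)
per_zone_ns = mean_diff_ms * 1e6 / zones_per_image  # 1 ms = 1e6 ns

print(f"mean difference: {mean_diff_ms:.1f} ms")
print(f"cost per zone:   {per_zone_ns:.2f} ns")
```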
### Assembly analysis
@@ -401,7 +397,7 @@ Here's a sample command to set up a build directory with profiling enabled. The
### Short-lived applications
In case you want to profile a short-lived program (for example, a compression utility that finishes its work in one second), set the `TRACY_NO_EXIT` environment variable to $1$. With this option enabled, Tracy will not exit until an incoming connection is made, even if the application has already finished executing. If your platform doesn't support an easy setup of environment variables, you may also add the `TRACY_NO_EXIT` define to your build configuration, which has the same effect.
In case you want to profile a short-lived program (for example, a compression utility that finishes its work in one second), set the `TRACY_NO_EXIT` environment variable to 1. With this option enabled, Tracy will not exit until an incoming connection is made, even if the application has already finished executing. If your platform doesn't support an easy setup of environment variables, you may also add the `TRACY_NO_EXIT` define to your build configuration, which has the same effect.
### On-demand profiling {#ondemand}
@@ -426,11 +422,11 @@ The program name that is sent out in the broadcast messages can be customized by
### Client network interface
By default, the Tracy client will listen on all network interfaces. If you want to restrict it to only listening on the localhost interface, define the `TRACY_ONLY_LOCALHOST` macro at compile-time, or set the `TRACY_ONLY_LOCALHOST` environment variable to $1$ at runtime.
By default, the Tracy client will listen on all network interfaces. If you want to restrict it to only listening on the localhost interface, define the `TRACY_ONLY_LOCALHOST` macro at compile-time, or set the `TRACY_ONLY_LOCALHOST` environment variable to 1 at runtime.
If you need to use a specific Tracy client address, such as QNX requires, define the `TRACY_CLIENT_ADDRESS` macro at compile-time as the desired string address.
By default, the Tracy client will listen on IPv6 interfaces, falling back to IPv4 only if IPv6 is unavailable. If you want to restrict it to only listening on IPv4 interfaces, define the `TRACY_ONLY_IPV4` macro at compile-time, or set the `TRACY_ONLY_IPV4` environment variable to $1$ at runtime.
By default, the Tracy client will listen on IPv6 interfaces, falling back to IPv4 only if IPv6 is unavailable. If you want to restrict it to only listening on IPv4 interfaces, define the `TRACY_ONLY_IPV4` macro at compile-time, or set the `TRACY_ONLY_IPV4` environment variable to 1 at runtime.
### Setup for multi-DLL projects
@@ -522,15 +518,13 @@ The best way to run Tracy is on bare metal. Avoid profiling applications in virt
Additionally, you can rebuild your application with the `TRACY_DISALLOW_HW_TIMER` define, which will disable usage of the hardware timer, even if it *appears* to be available. See table [2](#timeroptions) for details.
::: {#timeroptions}
**Scenario** **HW timer** **Fallback timer**
---------------------------------------------------- -------------- -----------------------
Neither defined Used Not compiled in
Only `TRACY_TIMER_FALLBACK` Used Compiled in as backup
`TRACY_DISALLOW_HW_TIMER` + `TRACY_TIMER_FALLBACK` Disabled Used
| **Scenario** | **HW timer** | **Fallback timer** |
|:--:|:--:|:--:|
| Neither defined | Used | Not compiled in |
| Only `TRACY_TIMER_FALLBACK` | Used | Compiled in as backup |
| `TRACY_DISALLOW_HW_TIMER` + `TRACY_TIMER_FALLBACK` | Disabled | Used |
: Timer options interaction
:::
_Timer options interaction_
#### Docker on Linux
@@ -558,13 +552,13 @@ Inside that header, enable any subset of the hooks you need by defining the corr
The available hooks are:
- `TRACY_HAS_CUSTOM_THREAD_ID` $\rightarrow$ `tracy::PlatformGetThreadId()`. Required.
- `TRACY_HAS_CUSTOM_THREAD_ID` `tracy::PlatformGetThreadId()`. Required.
- `TRACY_HAS_CUSTOM_USER_INFO` $\rightarrow$ `tracy::PlatformGetHostname()`, `tracy::PlatformGetUserLogin()`, `tracy::PlatformGetUserFullName()`.
- `TRACY_HAS_CUSTOM_USER_INFO` `tracy::PlatformGetHostname()`, `tracy::PlatformGetUserLogin()`, `tracy::PlatformGetUserFullName()`.
- `TRACY_HAS_CUSTOM_SAFE_COPY` $\rightarrow$ `tracy::PlatformSafeMemcpy()`.
- `TRACY_HAS_CUSTOM_SAFE_COPY` `tracy::PlatformSafeMemcpy()`.
- `TRACY_HAS_CUSTOM_ALLOCATOR` $\rightarrow$ `tracy::PlatformMalloc()`, `tracy::PlatformFree()`, `tracy::PlatformRealloc()`, `tracy::PlatformAllocatorInit()`, `tracy::PlatformAllocatorThreadInit()`, `tracy::PlatformAllocatorFinalize()`, `tracy::PlatformAllocatorThreadFinalize()`.
- `TRACY_HAS_CUSTOM_ALLOCATOR` `tracy::PlatformMalloc()`, `tracy::PlatformFree()`, `tracy::PlatformRealloc()`, `tracy::PlatformAllocatorInit()`, `tracy::PlatformAllocatorThreadInit()`, `tracy::PlatformAllocatorFinalize()`, `tracy::PlatformAllocatorThreadFinalize()`.
Template files are provided in the repository (`examples/CustomPlatform/CustomPlatform(.h|.cpp)`). See `CustomPlatform.h` for the contract each `Platform*` function must satisfy (return values, threading guarantees, and footguns to avoid). Copy these files into your project, fill in the bodies for the hooks you enable, and point Tracy at the header.
@@ -604,11 +598,11 @@ When using Tracy Profiler, keep in mind the following requirements:
- If there are recursive zones at any point in a zone stack, each unique zone source location should not appear more than 255 times.
- Profiling session cannot be longer than 1.6 days ($2^{47}$ ns). This also includes on-demand sessions.
- Profiling session cannot be longer than 1.6 days (2⁴⁷ ns). This also includes on-demand sessions.
- No more than 4 billion ($2^{32}$) memory free events may be recorded.
- No more than 4 billion (2³²) memory free events may be recorded.
- No more than 16 million ($2^{24}$) unique call stacks can be captured.
- No more than 16 million (2²⁴) unique call stacks can be captured.
[^18]: A source location is a place in the code, identified by source file name and line number, for example, the place where you mark up a zone.
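The numeric limits listed above follow from the stated powers of two. As a quick sanity check, the arithmetic can be reproduced with a small standalone sketch. It does not use any Tracy API, and all constant and function names are illustrative assumptions.

```cpp
#include <cstdint>

// Illustrative constants only; these names do not exist in Tracy.
constexpr uint64_t kMaxSessionNs  = uint64_t(1) << 47; // session length limit, in ns
constexpr uint64_t kMaxFreeEvents = uint64_t(1) << 32; // memory free events
constexpr uint64_t kMaxCallStacks = uint64_t(1) << 24; // unique call stacks

// Converts a duration in nanoseconds to days (86400 seconds per day).
constexpr double NsToDays(uint64_t ns)
{
    return double(ns) / 1e9 / 86400.0;
}

// 2^32 is a little over 4 billion, 2^24 a little over 16 million.
static_assert(kMaxFreeEvents == 4294967296ull, "2^32");
static_assert(kMaxCallStacks == 16777216ull, "2^24");
```

Evaluating `NsToDays(kMaxSessionNs)` gives a value slightly above 1.6, which matches the session length limit quoted in the list.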
@@ -900,31 +894,29 @@ This is an automatic process, and it doesn't require user interaction. If you ar
Some features of the profiler are only available on selected platforms. Please refer to table [3](#featuretable) for details.
::: {#featuretable}
**Feature** **Windows** **Linux** **Android** **OSX** **iOS** **BSD** **QNX**
-------------------------- ------------- ----------- ------------- --------- --------- --------- ---------
Profiling program init       
CPU zones       
Locks       
Plots       
Messages       
Memory       
GPU zones (OpenGL)      
GPU zones (Vulkan)      
GPU zones (Metal)    ^*b*^ ^*b*^  
Call stacks       
Symbol resolution       
Crash handling       
CPU usage probing       
Context switches       
Wait stacks       
CPU topology information       
Call stack sampling       
Hardware sampling ^*a*^      
VSync capture       
| **Feature** | **Windows** | **Linux** | **Android** | **OSX** | **iOS** | **BSD** | **QNX** |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| Profiling program init |  |  |  |  |  |  |  |
| CPU zones |  |  |  |  |  |  |  |
| Locks |  |  |  |  |  |  |  |
| Plots |  |  |  |  |  |  |  |
| Messages |  |  |  |  |  |  |  |
| Memory |  |  |  |  |  |  |  |
| GPU zones (OpenGL) |  |  |  |  |  | |  |
| GPU zones (Vulkan) |  |  |  |  |  | |  |
| GPU zones (Metal) |  |  |  | ^*b*^ | ^*b*^ |  |  |
| Call stacks |  |  |  |  |  |  |  |
| Symbol resolution |  |  |  |  |  |  |  |
| Crash handling |  |  |  |  |  |  |  |
| CPU usage probing |  |  |  |  |  |  |  |
| Context switches |  |  |  |  |  |  |  |
| Wait stacks |  |  |  |  |  |  |  |
| CPU topology information |  |  |  |  |  |  |  |
| Call stack sampling |  |  |  |  |  |  |  |
| Hardware sampling | ^*a*^ |  |  |  |  |  |  |
| VSync capture |  |  |  |  |  |  |  |
: Feature support matrix
:::
_Feature support matrix_
 -- Not possible to support due to platform limitations.\
^*a*^Possible through WSL2. ^*b*^Only tested on Apple Silicon M1 series
@@ -1045,7 +1037,7 @@ Images are sent using the `FrameImage(image, width, height, offset, flip)` macro
[^36]: For example, OpenGL flips images, but Vulkan does not.
Handling image data requires a lot of memory and bandwidth[^37]. To achieve sane memory usage, you should scale down taken screenshots to a suitable size, e.g., $320\times180$.
Handling image data requires a lot of memory and bandwidth[^37]. To achieve sane memory usage, you should scale down taken screenshots to a suitable size, e.g., 320×180.
[^37]: One uncompressed 1080p image takes 8 MB.
@@ -1055,18 +1047,16 @@ To further reduce image data size, frame images are internally compressed using
[^39]: One pixel is stored in a nibble (4 bits) instead of 32 bits.
::: {#EtcSimd}
**Implementation** **Required define** **Time**
-------------------- --------------------- ----------
x86 Reference --- 198.2 μs
x86 SSE4.1^a^ `__SSE4_1__` 25.4 μs
x86 AVX2 `__AVX2__` 17.4 μs
ARM Reference --- 1.04 ms
ARM32 NEON^b^ `__ARM_NEON` 529 μs
ARM64 NEON `__ARM_NEON` 438 μs
| **Implementation** | **Required define** | **Time** |
|:------------------:|:-------------------:|:--------:|
| x86 Reference | --- | 198.2 μs |
| x86 SSE4.1^a^ | `__SSE4_1__` | 25.4 μs |
| x86 AVX2 | `__AVX2__` | 17.4 μs |
| ARM Reference | --- | 1.04 ms |
| ARM32 NEON^b^ | `__ARM_NEON` | 529 μs |
| ARM64 NEON | `__ARM_NEON` | 438 μs |
: Client compression time of $320\times180$ image. x86: Ryzen 9 3900X (MSVC); ARM: ODROID-C2 (gcc).
:::
_Client compression time of 320×180 image. x86: Ryzen 9 3900X (MSVC); ARM: ODROID-C2 (gcc)._
^a)^ VEX encoding; ^b)^ ARM32 NEON code compiled for ARM64
@@ -1077,7 +1067,7 @@ To further reduce image data size, frame images are internally compressed using
>
> - This second thread will be periodically woken up, even if there are no frame images to compress[^41]. If you are not using the frame image capture functionality and you don't wish this thread to be running, you can define the `TRACY_NO_FRAME_IMAGE` macro.
>
> - Due to implementation details of the network buffer, a single frame image cannot be greater than 256 KB after compression. Note that a $960\times540$ image fits in this limit.
> - Due to implementation details of the network buffer, a single frame image cannot be greater than 256 KB after compression. Note that a 960×540 image fits in this limit.
[^40]: A small part of the compression task is offloaded to the server.
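The image sizes quoted in this section can be checked with simple arithmetic. The sketch below is standalone and is not Tracy code. It assumes 4 bytes per uncompressed pixel, exactly 4 bits per compressed pixel with no additional header, and that 256 KB means 256 × 1024 bytes. All names are illustrative.

```cpp
#include <cstddef>

// Size in bytes of an uncompressed image with 32 bits per pixel.
constexpr size_t RawImageBytes(size_t w, size_t h) { return w * h * 4; }

// Size in bytes after compression to one nibble (4 bits) per pixel.
// Assumption: no per-image header or padding is added.
constexpr size_t CompressedImageBytes(size_t w, size_t h) { return w * h / 2; }

// Assumption: the 256 KB limit is 256 * 1024 bytes.
constexpr size_t kFrameImageLimit = 256 * 1024;

static_assert(RawImageBytes(1920, 1080) == 8294400, "about 8 MB for 1080p");
static_assert(CompressedImageBytes(320, 180) == 28800, "320x180 compressed");
static_assert(CompressedImageBytes(960, 540) == 259200, "960x540 compressed");
static_assert(CompressedImageBytes(960, 540) <= kFrameImageLimit, "fits in limit");
```

Under the same assumptions, a 1280×720 image would need 460,800 bytes after compression and would not fit in the limit.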
@@ -1118,7 +1108,7 @@ Everything needs to be correctly initialized (the cleanup is left for the reader
glBufferData(GL_PIXEL_PACK_BUFFER, 320*180*4, nullptr, GL_STREAM_READ);
}
We will now set up a screen capture, which will downscale the screen contents to $320\times180$ pixels and copy the resulting image to a buffer accessible by the CPU when the operation is done. This should be placed right before *swap buffers* or *present* call.
We will now set up a screen capture, which will downscale the screen contents to 320×180 pixels and copy the resulting image to a buffer accessible by the CPU when the operation is done. This should be placed right before *swap buffers* or *present* call.
assert(m_fiQueue.empty() || m_fiQueue.front() != m_fiIdx); // check for buffer overrun
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, m_fiFramebuffer[m_fiIdx]);
@@ -1179,25 +1169,19 @@ With all this done, you can perform the screen capture as follows:
While this approach is much more complex than the previously discussed one, the resulting image quality increase makes it worthwhile.
<figure id="highqualityss" data-latex-placement="h">
<div class="minipage">
<img src="images/screenshot-lo.png" style="width:90.0%" />
</div>
<div class="minipage">
<img src="images/screenshot-hi.png" style="width:90.0%" />
</div>
<figcaption>High-quality screen shot</figcaption>
</figure>
You can see the performance results you may expect in a simple application in table [5](#asynccapture). The naïve capture performs synchronous retrieval of a full-screen image and resizes it using *stb_image_resize*. The proper and high-quality captures do things as described in this chapter.
::: {#asynccapture}
**Resolution** **Naïve capture** **Proper capture** **High quality**
------------------ ------------------- -------------------- ------------------
$1280\times720$ 80 FPS 4200 FPS 2800 FPS
$2560\times1440$ 23 FPS 3300 FPS 1600 FPS
| **Resolution** | **Naïve capture** | **Proper capture** | **High quality** |
|:--------------:|:-----------------:|:------------------:|:----------------:|
| 1280×720 | 80 FPS | 4200 FPS | 2800 FPS |
| 2560×1440 | 23 FPS | 3300 FPS | 1600 FPS |
: Frame capture efficiency
:::
_Frame capture efficiency_
## Marking zones {#markingzones}
@@ -1241,15 +1225,15 @@ Zone objects can't be moved or copied.
>
> {
> ZoneNamed(Zone1, true);
> @\circled{a}@
> (a)
> {
> ZoneNamed(Zone2, true);
> @\circled{b}@
> (b)
> }
> @\circled{c}@
> (c)
> }
>
> It is valid to set the `Zone1` text or name *only* in places or . After `Zone2` is created at you can no longer perform operations on `Zone1`, until `Zone2` is destroyed.
> It is valid to set the `Zone1` text or name *only* in places (a) or (c). After `Zone2` is created at (b) you can no longer perform operations on `Zone1`, until `Zone2` is destroyed.
### Filtering zones {#filteringzones}
@@ -1366,7 +1350,7 @@ To configure how plot values are presented by the profiler, you may use the `Tra
- `tracy::PlotFormatType::Memory` -- treats the values as memory sizes. Will display kilobytes, megabytes, etc.
- `tracy::PlotFormatType::Percentage` -- values will be displayed as percentage (with value $100$ being equal to $100\%$).
- `tracy::PlotFormatType::Percentage` -- values will be displayed as percentage (with value 100 being equal to 100%).
The `step` parameter determines whether the plot will be displayed as a staircase or will smoothly change between plot points (see figure [5](#plotconfig)). The `fill` parameter can be used to disable filling the area below the plot with a solid color.
@@ -1585,7 +1569,7 @@ Similar to Vulkan and OpenGL, you also need to periodically collect the OpenCL e
### CUDA
CUDA support is enabled by including the `public/tracy/TracyCUDA.hpp` header file. To use it, the NVIDIA CUPTI library is required. This library comes with the NVIDIA CUDA Toolkit and is located at `CUDA_INSTALLATION_PATH/extras/CUPTI`.
CUDA support is enabled by including the `public/tracy/TracyCUDA.hpp` header file. To use it, make sure you have the NVIDIA CUDA Toolkit v12.4 (or later) installed, and that the NVIDIA CUPTI library is available in the toolkit (located at `CUDA_INSTALLATION_PATH/extras/CUPTI`).
Tracing CUDA requires the creation of a Tracy CUDA context using the macro `TracyCUDAContext()`, which returns an instance of a `tracy::CUDACtx` object. TracyCUDA allows only a single `tracy::CUDACtx` object at any given time. Subsequent calls to `TracyCUDAContext()` will return the same reference-counted object. There is no need for clients to instantiate multiple `tracy::CUDACtx` objects, as a single context is capable of instrumenting all CUDA contexts and streams.
@@ -1678,28 +1662,26 @@ Capture of true calls stacks can be performed by using macros with the `S` postf
Be aware that call stack collection is a relatively slow operation. Table [6](#CallstackTimes) and figure [6](#CallstackPlot) show how long it took to perform a single capture of varying depth on multiple CPU architectures.
::: {#CallstackTimes}
**Depth** **x86** **x64** **ARM** **ARM64**
----------- --------- --------- ---------- -----------
1 34 ns 98 ns 6.62 μs 6.63 μs
2 35 ns 150 ns 8.08 μs 8.25 μs
3 36 ns 168 ns 9.75 μs 10 μs
4 39 ns 190 ns 10.92 μs 11.58 μs
5 42 ns 206 ns 12.5 μs 13.33 μs
10 52 ns 306 ns 19.62 μs 21.71 μs
15 63 ns 415 ns 26.83 μs 30.13 μs
20 77 ns 531 ns 34.25 μs 38.71 μs
25 89 ns 630 ns 41.17 μs 47.17 μs
30 109 ns 735 ns 48.33 μs 55.63 μs
35 123 ns 843 ns 55.87 μs 64.09 μs
40 142 ns 950 ns 63.12 μs 72.59 μs
45 154 ns 1.05 μs 70.54 μs 81 μs
50 167 ns 1.16 μs 78 μs 89.5 μs
55 179 ns 1.26 μs 85.04 μs 98 μs
60 193 ns 1.37 μs 92.75 μs 106.59 μs
| **Depth** | **x86** | **x64** | **ARM** | **ARM64** |
|:---------:|:-------:|:-------:|:--------:|:---------:|
| 1 | 34 ns | 98 ns | 6.62 μs | 6.63 μs |
| 2 | 35 ns | 150 ns | 8.08 μs | 8.25 μs |
| 3 | 36 ns | 168 ns | 9.75 μs | 10 μs |
| 4 | 39 ns | 190 ns | 10.92 μs | 11.58 μs |
| 5 | 42 ns | 206 ns | 12.5 μs | 13.33 μs |
| 10 | 52 ns | 306 ns | 19.62 μs | 21.71 μs |
| 15 | 63 ns | 415 ns | 26.83 μs | 30.13 μs |
| 20 | 77 ns | 531 ns | 34.25 μs | 38.71 μs |
| 25 | 89 ns | 630 ns | 41.17 μs | 47.17 μs |
| 30 | 109 ns | 735 ns | 48.33 μs | 55.63 μs |
| 35 | 123 ns | 843 ns | 55.87 μs | 64.09 μs |
| 40 | 142 ns | 950 ns | 63.12 μs | 72.59 μs |
| 45 | 154 ns | 1.05 μs | 70.54 μs | 81 μs |
| 50 | 167 ns | 1.16 μs | 78 μs | 89.5 μs |
| 55 | 179 ns | 1.26 μs | 85.04 μs | 98 μs |
| 60 | 193 ns | 1.37 μs | 92.75 μs | 106.59 μs |
: Median times of zone capture with call stack. x86, x64: i7 8700K; ARM: Banana Pi; ARM64: ODROID-C2. Selected architectures are plotted on figure [6](#CallstackPlot)
:::
_Median times of zone capture with call stack. x86, x64: i7 8700K; ARM: Banana Pi; ARM64: ODROID-C2. Selected architectures are plotted on figure [6](#CallstackPlot)_
<figure id="CallstackPlot" data-latex-placement="h">
@@ -1845,34 +1827,30 @@ Be aware that for Lua call stack retrieval to work, you need to be on a platform
Cost of performing Lua call stack capture is presented in table [7](#CallstackTimesLua) and figure [7](#CallstackPlotLua). Lua call stacks include native call stacks, which have a capture cost of their own (table [6](#CallstackTimes)), and the `depth` parameter is applied for both captures. The presented data were captured with full Lua stack depth, but only 13 frames were available on the native call stack. Hence, to explain the non-linearity of the graph, you need to consider what was truly measured:
$$\text{Cost}_{\text{total}}(\text{depth}) =
\begin{cases}
\text{Cost}_{\text{Lua}}(\text{depth}) + \text{Cost}_{\text{native}}(\text{depth}) & \text{when depth} \leq 13 \\
\text{Cost}_{\text{Lua}}(\text{depth}) + \text{Cost}_{\text{native}}(13) & \text{when depth} > 13
\end{cases}$$
Cost_total(depth) =
Cost_Lua(depth) + Cost_native(depth) when depth ≤ 13
Cost_Lua(depth) + Cost_native(13) when depth > 13
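The two cases above collapse into a single expression, because the native part simply stops growing once the 13 available native frames are exhausted. A minimal sketch follows, where `costLua` and `costNative` are hypothetical caller-supplied cost models and not measured data.

```cpp
#include <algorithm>
#include <functional>

// Number of native frames available in the measured scenario (from the text).
constexpr int kNativeFrames = 13;

// Total cost of a Lua call stack capture at the given depth.
// costLua and costNative are hypothetical per-depth cost models.
double TotalCost(int depth,
                 const std::function<double(int)>& costLua,
                 const std::function<double(int)>& costNative)
{
    // The native capture cannot go deeper than the frames that exist.
    return costLua(depth) + costNative(std::min(depth, kNativeFrames));
}
```

With any increasing cost models, the total grows at the combined rate up to depth 13 and only at the Lua rate afterwards, which is the kink visible in the graph.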
::: {#CallstackTimesLua}
**Depth** **Time**
----------- ----------
1 707 ns
2 699 ns
3 624 ns
4 727 ns
5 836 ns
10 1.77 μs
15 2.44 μs
20 2.51 μs
25 2.98 μs
30 3.6 μs
35 4.33 μs
40 5.17 μs
45 6.01 μs
50 6.99 μs
55 8.11 μs
60 9.17 μs
| **Depth** | **Time** |
|:---------:|:--------:|
| 1 | 707 ns |
| 2 | 699 ns |
| 3 | 624 ns |
| 4 | 727 ns |
| 5 | 836 ns |
| 10 | 1.77 μs |
| 15 | 2.44 μs |
| 20 | 2.51 μs |
| 25 | 2.98 μs |
| 30 | 3.6 μs |
| 35 | 4.33 μs |
| 40 | 5.17 μs |
| 45 | 6.01 μs |
| 50 | 6.99 μs |
| 55 | 8.11 μs |
| 60 | 9.17 μs |
: Median times of Lua zone capture with call stack (x64, 13 native frames)
:::
_Median times of Lua zone capture with call stack (x64, 13 native frames)_
<figure id="CallstackPlotLua" data-latex-placement="h">
@@ -2677,11 +2655,11 @@ While the call stack sampling is a generic software-implemented functionality of
Tracy can use these counters to present you the following three statistics, which may help guide you in discovering why your code is not as fast as possible:
1. *Instructions Per Cycle (IPC)* -- shows how many instructions were executing concurrently within a single core cycle. Higher values are better. The maximum achievable value depends on the design of the CPU, including things such as the number of execution units and their individual capabilities. Calculated as $\frac{\text{\#instructions retired}}{\text{\#cycles}}$. You can disable it with the `TRACY_NO_SAMPLE_RETIREMENT` macro.
1. *Instructions Per Cycle (IPC)* -- shows how many instructions were executing concurrently within a single core cycle. Higher values are better. The maximum achievable value depends on the design of the CPU, including things such as the number of execution units and their individual capabilities. Calculated as #instructions retired / #cycles. You can disable it with the `TRACY_NO_SAMPLE_RETIREMENT` macro.
2. *Branch miss rate* -- shows how frequently the CPU branch predictor makes a wrong choice. Lower values are better. Calculated as $\frac{\text{\#branch misses}}{\text{\#branch instructions}}$. You can disable it with the `TRACY_NO_SAMPLE_BRANCH` macro.
2. *Branch miss rate* -- shows how frequently the CPU branch predictor makes a wrong choice. Lower values are better. Calculated as #branch misses / #branch instructions. You can disable it with the `TRACY_NO_SAMPLE_BRANCH` macro.
3. *Cache miss rate* -- shows how frequently the CPU has to retrieve data from memory. Lower values are better. The specifics of which cache level is taken into account here vary from one implementation to another. Calculated as $\frac{\text{\#cache misses}}{\text{\#cache references}}$. You can disable it with the `TRACY_NO_SAMPLE_CACHE` macro.
3. *Cache miss rate* -- shows how frequently the CPU has to retrieve data from memory. Lower values are better. The specifics of which cache level is taken into account here vary from one implementation to another. Calculated as #cache misses / #cache references. You can disable it with the `TRACY_NO_SAMPLE_CACHE` macro.
Each performance counter has to be collected by a dedicated Performance Monitoring Unit (PMU). However, the availability of PMUs is very limited, so you may not be able to capture all the statistics mentioned above at the same time (as each requires capture of two different counters). In such a case, you will need to manually select what needs to be sampled with the macros specified above.
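The three statistics are plain ratios of two counters each, which is why every one of them needs two counters to be captured. The sketch below illustrates the calculation only. The structure and function names are assumptions made for this example and do not reflect Tracy's internal data layout.

```cpp
#include <cstdint>

// Raw hardware counter values; field names are illustrative, not Tracy's.
struct HwCounters
{
    uint64_t cycles;
    uint64_t instructionsRetired;
    uint64_t branchInstructions;
    uint64_t branchMisses;
    uint64_t cacheReferences;
    uint64_t cacheMisses;
};

// Returns num / den, or 0 when the denominator is zero (nothing was sampled).
inline double Ratio(uint64_t num, uint64_t den)
{
    return den == 0 ? 0.0 : double(num) / double(den);
}

// Instructions Per Cycle: higher is better.
inline double Ipc(const HwCounters& c) { return Ratio(c.instructionsRetired, c.cycles); }

// Branch miss rate: lower is better.
inline double BranchMissRate(const HwCounters& c) { return Ratio(c.branchMisses, c.branchInstructions); }

// Cache miss rate: lower is better.
inline double CacheMissRate(const HwCounters& c) { return Ratio(c.cacheMisses, c.cacheReferences); }
```

For example, 2500 retired instructions over 1000 cycles give an IPC of 2.5, and 50 cache misses out of 200 cache references give a cache miss rate of 0.25.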
@@ -2918,7 +2896,7 @@ You can also adjust some settings that affect global profiler behavior in this w
- *Zone name shortening* -- Sets the default zone name shortening behavior used in new traces. See section [5.4](#options) for more information.
- *Scroll multipliers* -- Allows you to fine-tune the sensitivity of the horizontal and vertical scroll in the timeline. The default values ($1.0$) are an attempt at the best possible settings, but differences in hardware manufacturers, platform implementations, and user expectations may require adjustments.
- *Scroll multipliers* -- Allows you to fine-tune the sensitivity of the horizontal and vertical scroll in the timeline. The default values (1.0) are an attempt at the best possible settings, but differences in hardware manufacturers, platform implementations, and user expectations may require adjustments.
- *Memory limit* -- When enabled, profiler will stop recording data when memory usage exceeds the specified percentage of the total system memory. This mechanism does not measure the current system memory usage or limits. The upper value is not capped, as you may use swap. See section [4.6](#memoryusage) for more information.
@@ -3004,52 +2982,46 @@ The `update` utility supports optional higher levels of data compression, which
- `-z level` -- selects Zstandard algorithm, with a specified compression level.
::: {#compressiontimes}
**Mode** **Size** **Ratio** **Save time** **Load time**
------------- ----------- ----------- --------------- ---------------
lz4 162.48 MB 17.19% 1.91 s 470 ms
lz4 hc 77.33 MB 8.18% 39.24 s 401 ms
lz4 extreme 72.67 MB 7.68% 4:30 406 ms
zstd 1 63.17 MB 6.68% 2.27 s 868 ms
zstd 2 63.29 MB 6.69% 2.31 s 884 ms
zstd 3 62.94 MB 6.65% 2.43 s 867 ms
zstd 4 62.81 MB 6.64% 2.44 s 855 ms
zstd 5 61.04 MB 6.45% 3.98 s 855 ms
zstd 6 60.27 MB 6.37% 4.19 s 827 ms
zstd 7 61.53 MB 6.5% 6.6 s 761 ms
zstd 8 60.44 MB 6.39% 7.84 s 746 ms
zstd 9 59.58 MB 6.3% 9.6 s 724 ms
zstd 10 59.36 MB 6.28% 10.29 s 706 ms
zstd 11 59.2 MB 6.26% 11.23 s 717 ms
zstd 12 58.51 MB 6.19% 15.43 s 695 ms
zstd 13 56.16 MB 5.94% 35.55 s 642 ms
zstd 14 55.76 MB 5.89% 37.74 s 627 ms
zstd 15 54.65 MB 5.78% 1:01 600 ms
zstd 16 50.94 MB 5.38% 1:34 537 ms
zstd 17 50.18 MB 5.30% 1:44 542 ms
zstd 18 49.91 MB 5.28% 2:17 554 ms
zstd 19 46.99 MB 4.97% 7:09 605 ms
zstd 20 46.81 MB 4.95% 7:08 608 ms
zstd 21 45.77 MB 4.84% 13:01 614 ms
zstd 22 45.52 MB 4.81% 15:11 621 ms
| **Mode** | **Size** | **Ratio** | **Save time** | **Load time** |
|:-----------:|:---------:|:---------:|:-------------:|:-------------:|
| lz4 | 162.48 MB | 17.19% | 1.91 s | 470 ms |
| lz4 hc | 77.33 MB | 8.18% | 39.24 s | 401 ms |
| lz4 extreme | 72.67 MB | 7.68% | 4:30 | 406 ms |
| zstd 1 | 63.17 MB | 6.68% | 2.27 s | 868 ms |
| zstd 2 | 63.29 MB | 6.69% | 2.31 s | 884 ms |
| zstd 3 | 62.94 MB | 6.65% | 2.43 s | 867 ms |
| zstd 4 | 62.81 MB | 6.64% | 2.44 s | 855 ms |
| zstd 5 | 61.04 MB | 6.45% | 3.98 s | 855 ms |
| zstd 6 | 60.27 MB | 6.37% | 4.19 s | 827 ms |
| zstd 7 | 61.53 MB | 6.5% | 6.6 s | 761 ms |
| zstd 8 | 60.44 MB | 6.39% | 7.84 s | 746 ms |
| zstd 9 | 59.58 MB | 6.3% | 9.6 s | 724 ms |
| zstd 10 | 59.36 MB | 6.28% | 10.29 s | 706 ms |
| zstd 11 | 59.2 MB | 6.26% | 11.23 s | 717 ms |
| zstd 12 | 58.51 MB | 6.19% | 15.43 s | 695 ms |
| zstd 13 | 56.16 MB | 5.94% | 35.55 s | 642 ms |
| zstd 14 | 55.76 MB | 5.89% | 37.74 s | 627 ms |
| zstd 15 | 54.65 MB | 5.78% | 1:01 | 600 ms |
| zstd 16 | 50.94 MB | 5.38% | 1:34 | 537 ms |
| zstd 17 | 50.18 MB | 5.30% | 1:44 | 542 ms |
| zstd 18 | 49.91 MB | 5.28% | 2:17 | 554 ms |
| zstd 19 | 46.99 MB | 4.97% | 7:09 | 605 ms |
| zstd 20 | 46.81 MB | 4.95% | 7:08 | 608 ms |
| zstd 21 | 45.77 MB | 4.84% | 13:01 | 614 ms |
| zstd 22 | 45.52 MB | 4.81% | 15:11 | 621 ms |
: Compression results for an example trace.\
Tests performed on Ryzen 9 3900X.
:::
_Compression results for an example trace.\
Tests performed on Ryzen 9 3900X._
<figure id="savetime">
<div class="minipage">
<figure id="savesize" data-latex-placement="H">
<figcaption>Plot of trace sizes for different compression modes (see table <a href="#compressiontimes">8</a>).</figcaption>
</figure>
</div>
<div class="minipage">
<figure id="savetime" data-latex-placement="H">
<figcaption>Logarithmic plot of trace compression times for different compression modes (see table <a href="#compressiontimes">8</a>).</figcaption>
</figure>
</div>
<figcaption>Logarithmic plot of trace compression times for different compression modes (see table <a href="#compressiontimes">8</a>).</figcaption>
</figure>
@@ -3068,37 +3040,33 @@ Saving and loading trace data can be parallelized using the `-j streams` paramet
Going overboard with the number of streams is not recommended, especially with the fast compression modes where it will be difficult to keep each stream busy. Also, complex compression codecs (e.g. zstd at level 22) have significantly worse compression rates when the work is divided. This is a fairly nuanced topic, and you are encouraged to do your own measurements, but for a rough guideline on the behavior, you can refer to tables [9](#streamsize) and [10](#streamspeedup).
::: {#streamsize}
**4** **8** **16** **32**
--------- --------- --------- --------- ---------
lz4 100.30% 100.30% 100.61% 102.73%
lz4 hc 100.80% 101.20% 101.61% 102.41%
lz4 ext 100.40% 101.21% 101.62% 102.02%
zstd 1 100.90% 101.36% 101.81% 102.26%
zstd 3 100.51% 101.02% 101.53% 102.04%
zstd 6 100.55% 101.10% 101.65% 102.75%
zstd 9 101.27% 103.16% 105.06% 108.23%
zstd 18 103.08% 106.15% 109.23% 115.38%
zstd 22 107.08% 113.27% 122.12% 130.97%
| | **4** | **8** | **16** | **32** |
|:-------:|:-------:|:-------:|:-------:|:-------:|
| lz4 | 100.30% | 100.30% | 100.61% | 102.73% |
| lz4 hc | 100.80% | 101.20% | 101.61% | 102.41% |
| lz4 ext | 100.40% | 101.21% | 101.62% | 102.02% |
| zstd 1 | 100.90% | 101.36% | 101.81% | 102.26% |
| zstd 3 | 100.51% | 101.02% | 101.53% | 102.04% |
| zstd 6 | 100.55% | 101.10% | 101.65% | 102.75% |
| zstd 9 | 101.27% | 103.16% | 105.06% | 108.23% |
| zstd 18 | 103.08% | 106.15% | 109.23% | 115.38% |
| zstd 22 | 107.08% | 113.27% | 122.12% | 130.97% |
: The increase in file size for different compression modes, as compared to a single stream.
:::
_The increase in file size for different compression modes, as compared to a single stream._
::: {#streamspeedup}
**4** **8** **16** **32**
--------- ------- ------- -------- --------
lz4 2.04 2.52 2.11 3.24
lz4 hc 3.56 6.73 9.49 15.26
lz4 ext 3.38 6.53 9.57 17.03
zstd 1 2.24 3.68 3.40 3.37
zstd 3 3.23 4.13 4.07 4.50
zstd 6 3.52 6.00 6.53 6.95
zstd 9 3.10 4.26 5.12 5.40
zstd 18 3.22 5.41 8.49 14.51
zstd 22 3.99 7.47 11.10 18.20
| | **4** | **8** | **16** | **32** |
|:-------:|:-----:|:-----:|:------:|:------:|
| lz4 | 2.04 | 2.52 | 2.11 | 3.24 |
| lz4 hc | 3.56 | 6.73 | 9.49 | 15.26 |
| lz4 ext | 3.38 | 6.53 | 9.57 | 17.03 |
| zstd 1 | 2.24 | 3.68 | 3.40 | 3.37 |
| zstd 3 | 3.23 | 4.13 | 4.07 | 4.50 |
| zstd 6 | 3.52 | 6.00 | 6.53 | 6.95 |
| zstd 9 | 3.10 | 4.26 | 5.12 | 5.40 |
| zstd 18 | 3.22 | 5.41 | 8.49 | 14.51 |
| zstd 22 | 3.99 | 7.47 | 11.10 | 18.20 |
: The speedup (*x* times faster) in saving time for different modes of compression, as compared to a single stream.
:::
_The speedup (*x* times faster) in saving time for different modes of compression, as compared to a single stream._
### Frame images dictionary {#fidict}
@@ -3152,7 +3120,7 @@ The workflow is identical, whether you are viewing a previously saved trace or i
In most cases Tracy will display an approximation of time value, depending on how big it is. For example, a short time range will be displayed as 123 ns, and some longer ones will be shortened to 123.45 μs, 123.45 ms, 12.34 s, 1:23.4, 12:34:56, or even 1d12:34:56 to indicate more than a day has passed.
While such a presentation makes time values easy to read, it is not always appropriate. For example, you may have multiple events happen at a time approximated to 1:23.4, giving you the precision of only $\sfrac{1}{10}$ of a second. And there's certainly a lot that can happen in 100 ms.
While such a presentation makes time values easy to read, it is not always appropriate. For example, you may have multiple events happen at a time approximated to 1:23.4, giving you the precision of only 1/10 of a second. And there's certainly a lot that can happen in 100 ms.
An alternative time display is used in appropriate places to solve this problem. It combines a day--hour--minute--second value with full nanosecond resolution, resulting in values such as 1:23 456,789,012 ns.
@@ -3623,7 +3591,7 @@ Annotations are displayed on the timeline, as presented in figure [21](#annotat
<figcaption>Annotation region.</figcaption>
</figure>
Please note that while the annotations persist between profiling sessions, they are not saved in the trace but in the user data files, as described in section [9.2](#tracespecific).
Please note that while the annotations persist between profiling sessions, they are not saved in the trace but in the trace sidecar file, as described in section [9.2](#tracespecific).
-----
@@ -4159,6 +4127,8 @@ The information about the selected memory allocation is displayed in this window
This window contains information about the current trace: captured program name, time of the capture, profiler version which performed the capture, and a custom trace description, which you can fill in.
If the * Public sidecar* option is selected, the file containing trace-specific user settings (see section [9.2](#tracespecific)) will be saved on disk next to the trace file.
Open the *Trace statistics* section to see information about the trace, such as achieved timer resolution, number of captured zones, lock events, plot data points, memory allocations, etc.
There's also a section containing the selected frame set timing statistics and histogram[^90]. As a convenience, you can switch the active frame set here and limit the displayed frame statistics to the frame range visible on the screen.
@@ -4180,12 +4150,17 @@ The *Source location substitutions* section allows adapting the source file path
>
> - `\\` `/`
By default, all source file modification times need to be older than the cature time of the trace. This can be disabled using the *Enforce source file modification time older than trace capture time* check box, i.e. when the source files are under source control and the file modification time is not relevant.
By default, all source file modification times need to be older than the capture time of the trace. This can be disabled using the *Enforce source file modification time older than trace capture time* check box, i.e. when the source files are under source control and the file modification time is not relevant.
In this window, you can view the information about the machine on which the profiled application was running. This includes the operating system, used compiler, CPU name, total available RAM, etc. In addition, if application information was provided (see section [3.7.1](#appinfo)), it will also be displayed here.
If an application should crash during profiling (section [2.5](#crashhandling)), the profiler will display the crash information in this window. It provides you with information about the thread that has crashed, the crash reason, and the crash call stack (section [5.15](#callstackwindow)).
-----
 - User Gear icon
## Zone information window {#zoneinfo}
The zone information window displays detailed information about a single zone. There can be only one zone information window open at any time. While the window is open, the profiler will highlight the zone on the timeline view with a green outline. The following data is presented:
@@ -4307,8 +4282,8 @@ You need to take special care when reading call stacks. Contrary to their name,
Let's say you are looking at the call stack of some function called within `Application::Run`. This is the result you might get:
0. @\ldots@
1. @\ldots@
0.
1.
2. Application::Run
3. std::unique_ptr<Application>::reset
4. main
@@ -4506,9 +4481,9 @@ As described in chapter [3.17.6](#hardwaresampling), on some platforms, Tracy c
- *Cycles* -- an option very similar to the *sample count*, but the data is collected directly by the CPU hardware counters. This may make the results more reliable.
- *Branch impact* -- indicates places where many branch instructions are issued, and at the same time, incorrectly predicted. Calculated as $\sqrt{\text{\#branch instructions}*\text{\#branch misses}}$. This is more useful than the raw branch miss rate, as it considers the number of events taking place.
- *Branch impact* -- indicates places where many branch instructions are issued, and at the same time, incorrectly predicted. Calculated as √(#branch instructions\*#branch misses). This is more useful than the raw branch miss rate, as it considers the number of events taking place.
- *Cache impact* -- similar to *branch impact*, but it shows cache miss data instead. These values are calculated as $\sqrt{\text{\#cache references}*\text{\#cache misses}}$ and will highlight places with lots of cache accesses that also miss.
- *Cache impact* -- similar to *branch impact*, but it shows cache miss data instead. These values are calculated as √(#cache references\*#cache misses) and will highlight places with lots of cache accesses that also miss.
- The rest of the available selections just show raw values gathered from the hardware counters. These are: *Retirements*, *Branches taken*, *Branch miss*, *Cache access* and *Cache miss*.
@@ -4563,7 +4538,7 @@ This window presents information and statistics about a lock. The lock events co
You may view a live replay of the profiled application screen captures (see section [3.3.3](#frameimages)) using this window. Playback is controlled by the * Play* and * Pause* buttons and the *Frame image* slider can be used to scrub to the desired timestamp. Alternatively you may use the ** and ** buttons to change single frame back or forward.
If the *Sync timeline* option is selected, the profiler will focus the timeline view on the frame corresponding to the currently displayed screenshot. The *Zoom 2$\times$* option enlarges the image for easier viewing.
If the *Sync timeline* option is selected, the profiler will focus the timeline view on the frame corresponding to the currently displayed screenshot. The *Zoom 2×* option enlarges the image for easier viewing.
The following parameters also accompany each displayed frame image: *timestamp*, showing at which time the image was captured, *frame*, displaying the numerical value of the corresponding frame, and *ratio*, telling how well the in-memory loss-less compression was able to reduce the image data size.
@@ -4741,7 +4716,7 @@ So, which model should you run and what hardware you need to be able to do so? L
As a rule of thumb, the specified number of parameters is how much total memory is needed to run the model with 8-bit quantization. Another way to get a rough estimate is to look at the model file size. Strive to fit the active parameters completely into VRAM, leaving space for computation scratch space and the context.
To make this practical, the 35B-A3B model at 2 bit quantization requires $35 * 2 / 8 = 8.75$ GB, which fits into the 4 + 16 GB budget in the example above. The 3B active parameters similarly calculate to 0.75 GB, with additional 1 GB or so needed for computation buffer and another 1 GB for the 50K context, which is less than the 4 GB of VRAM available, making everything fit.
To make this practical, the 35B-A3B model at 2 bit quantization requires 35 \* 2 / 8 = 8.75 GB, which fits into the 4 + 16 GB budget in the example above. The 3B active parameters similarly calculate to 0.75 GB, with additional 1 GB or so needed for computation buffer and another 1 GB for the 50K context, which is less than the 4 GB of VRAM available, making everything fit.
## Usage {#llmusage}
@@ -4966,9 +4941,9 @@ Various files at the root configuration directory store common profiler state su
Trace files saved on disk are immutable and can't be changed. Still, it may be desirable to store additional per-trace information to be used by the profiler, for example, a custom description of the trace or the timeline view position used in the previous profiling session.
This external data is stored in the `user/[letter]/[program]/[week]/[epoch]` directory, relative to the configuration's root directory. The `program` part is the name of the profiled application (for example `program.exe`). The `letter` part is the first letter of the profiled application's name. The `week` part is a count of weeks since the Unix epoch, and the `epoch` part is a count of seconds since the Unix epoch. This rather unusual convention prevents the creation of directories with hundreds of entries.
This external sidecar data is stored by default in the `sidecar/[program]/[date].json` file, relative to the configuration root directory. The `program` part is the name of the profiled application (for example `program.exe`). The `date` part is in year-month-day-*dash*-hour-minutes-seconds format.
The profiler never prunes user settings.
The sidecar file can be made public (see section [5.13](#traceinfo)), in which case it will be placed next to the trace file with the `.json` extension, allowing both files to be easily moved or copied.
## Cache files

View File

@@ -14,7 +14,7 @@
\usepackage{verbatim}
\usepackage[hyphens]{url}
\usepackage{hyperref} % For hyperlinks in the PDF
\usepackage{fontawesome6}
\usepackage{fontawesome7}
\usepackage[os=win]{menukeys}
\usepackage{xfrac}
\usepackage[euler]{textgreek}
@@ -2041,6 +2041,20 @@ filesystem setup as the one used to run the tracy instrumented application).
You can do path substitution with the \texttt{-p} option to perform any number of path
substitutions in order to use symbols located elsewhere.
By default symbol resolution is performed with the platform's native facility: the DbgHelp
library on Windows, and the \texttt{addr2line} tool found in \texttt{PATH} elsewhere. You can
override this with the \texttt{-a} option, passing the path to a custom
\texttt{addr2line}-compatible tool (for instance an \texttt{addr2line} from a cross-compilation
toolchain, or \texttt{llvm-addr2line}). The \texttt{-a} option works on all platforms, including
Windows, and takes precedence over the platform default.
Extra arguments can be passed verbatim to the resolution tool with the \texttt{-A} option. Tracy
records callstack frame offsets relative to the image base, but \texttt{addr2line}-compatible
tools expect a full virtual address for images that have a non-zero preferred image base (such as
PE on Windows or Mach-O on Apple). For these, pass \texttt{-A "--relative-address"} so that
\texttt{llvm-addr2line} or \texttt{llvm-symbolizer} adds the image base back. ELF images need no
such adjustment.
\begin{bclogo}[
noborder=true,
couleur=black!5,
@@ -4042,7 +4056,7 @@ Annotations are displayed on the timeline, as presented in figure~\ref{annotatio
\label{annotation}
\end{figure}
Please note that while the annotations persist between profiling sessions, they are not saved in the trace but in the user data files, as described in section~\ref{tracespecific}.
Please note that while the annotations persist between profiling sessions, they are not saved in the trace but in the trace sidecar file, as described in section~\ref{tracespecific}.
\subsection{Options menu}
\label{options}
@@ -4539,6 +4553,8 @@ The information about the selected memory allocation is displayed in this window
This window contains information about the current trace: captured program name, time of the capture, profiler version which performed the capture, and a custom trace description, which you can fill in.
If the \emph{\faUserGear{}~Public sidecar} option is selected, the file containing trace-specific user settings (see section~\ref{tracespecific}) will be saved on disk next to the trace file.
Open the \emph{Trace statistics} section to see information about the trace, such as achieved timer resolution, number of captured zones, lock events, plot data points, memory allocations, etc.
There's also a section containing the selected frame set timing statistics and histogram\footnote{See section~\ref{findzone} for a description of the histogram. Note that there are subtle differences in the available functionality.}. As a convenience, you can switch the active frame set here and limit the displayed frame statistics to the frame range visible on the screen.
@@ -4560,7 +4576,7 @@ Let's say we have an Unix-based operating system with program sources in \texttt
\end{itemize}
\end{bclogo}
By default, all source file modification times need to be older than the cature time of the trace. This can be disabled using the \emph{Enforce source file modification time older than trace capture time} check box, i.e. when the source files are under source control and the file modification time is not relevant.
By default, all source file modification times need to be older than the capture time of the trace. This can be disabled using the \emph{Enforce source file modification time older than trace capture time} check box, i.e. when the source files are under source control and the file modification time is not relevant.
In this window, you can view the information about the machine on which the profiled application was running. This includes the operating system, used compiler, CPU name, total available RAM, etc. In addition, if application information was provided (see section~\ref{appinfo}), it will also be displayed here.
@@ -5218,9 +5234,9 @@ Various files at the root configuration directory store common profiler state su
Trace files saved on disk are immutable and can't be changed. Still, it may be desirable to store additional per-trace information to be used by the profiler, for example, a custom description of the trace or the timeline view position used in the previous profiling session.
This external data is stored in the \texttt{user/[letter]/[program]/[week]/[epoch]} directory, relative to the configuration's root directory. The \texttt{program} part is the name of the profiled application (for example \texttt{program.exe}). The \texttt{letter} part is the first letter of the profiled application's name. The \texttt{week} part is a count of weeks since the Unix epoch, and the \texttt{epoch} part is a count of seconds since the Unix epoch. This rather unusual convention prevents the creation of directories with hundreds of entries.
This external sidecar data is stored by default in the \texttt{sidecar/[program]/[date].json} file, relative to the configuration root directory. The \texttt{program} part is the name of the profiled application (for example \texttt{program.exe}). The \texttt{date} part is in year-month-day-\emph{dash}-hour-minutes-seconds format.
The profiler never prunes user settings.
The sidecar file can be made public (see section~\ref{traceinfo}), in which case it will be placed next to the trace file with the \texttt{.json} extension, allowing both files to be easily moved or copied.
\subsection{Cache files}

View File

@@ -149,15 +149,30 @@ Embed(PROFILER_FILES SystemPrompt src/llm/system.prompt.md)
Embed(PROFILER_FILES SkillCallstack src/llm/skill.callstack.md)
Embed(PROFILER_FILES SkillOptimization src/llm/skill.optimization.md)
Embed(PROFILER_FILES ToolsJson src/llm/tools.json)
Embed(PROFILER_FILES FontFixed src/font/FiraCode-Retina.ttf)
Embed(PROFILER_FILES FontIcons src/font/Font\ Awesome\ 6\ Free-Solid-900.otf)
Embed(PROFILER_FILES FontIcons src/font/Font\ Awesome\ 7\ Free-Solid-900.otf)
Embed(PROFILER_FILES FontNormal src/font/Roboto-Regular.ttf)
Embed(PROFILER_FILES FontBold src/font/Roboto-Bold.ttf)
Embed(PROFILER_FILES FontItalic src/font/Roboto-Italic.ttf)
Embed(PROFILER_FILES FontBoldItalic src/font/Roboto-BoldItalic.ttf)
Embed(PROFILER_FILES FontEmoji src/font/NotoEmoji-Regular.ttf)
Embed(PROFILER_FILES Manual ../manual/tracy.md)
Embed(PROFILER_FILES Text100Million src/achievements/100Million.md)
Embed(PROFILER_FILES TextConnectToClient src/achievements/ConnectToClient.md)
Embed(PROFILER_FILES TextFindZone src/achievements/FindZone.md)
Embed(PROFILER_FILES TextFrameImages src/achievements/FrameImages.md)
Embed(PROFILER_FILES TextGlobalSettings src/achievements/GlobalSettings.md)
Embed(PROFILER_FILES TextInstrumentationIntro src/achievements/InstrumentationIntro.md)
Embed(PROFILER_FILES TextInstrumentationStatistics src/achievements/InstrumentationStatistics.md)
Embed(PROFILER_FILES TextInstrumentFrames src/achievements/InstrumentFrames.md)
Embed(PROFILER_FILES TextIntro src/achievements/Intro.md)
Embed(PROFILER_FILES TextLoadTrace src/achievements/LoadTrace.md)
Embed(PROFILER_FILES TextSamplingIntro src/achievements/SamplingIntro.md)
Embed(PROFILER_FILES TextSaveTrace src/achievements/SaveTrace.md)
set(INCLUDES "${CMAKE_CURRENT_BINARY_DIR}")
set(LIBS "")

View File

@@ -4,7 +4,6 @@
#include <misc/freetype/imgui_freetype.h>
#include "Fonts.hpp"
#include "profiler/IconsFontAwesome6.h"
#include "profiler/TracyEmbed.hpp"
#include "data/FontFixed.hpp"

View File

@@ -0,0 +1,12 @@
# It's over 100 million!
Tracy can handle a lot of data. How about 100 million zones in a single trace? Add a lot of zones to your program and see how it handles it!
Capturing a long-running profile trace is easy. Need to profile an hour of your program execution? You can do it.
Note that it doesn't make much sense to instrument every little function you might have. The cost of the instrumentation itself will be higher than the cost of the function in such a case.
> [!TIP]
> Keep in mind that the more zones you have, the more memory and CPU time the profiler will use. Be careful not to run out of memory.
>
> To capture 100 million zones, you will need approximately 4 GB of RAM.

View File

@@ -0,0 +1,10 @@
# First profiling session
Let's start our adventure by instrumenting your application and connecting it to the profiler. Here's a quick refresher:
1. Integrate Tracy Profiler into your application. This can be done using CMake, Meson, or simply by adding the source files to your project.
2. Make sure that `TracyClient.cpp` (or the Tracy library) is included in your build.
3. Define `TRACY_ENABLE` in your build configuration, for the whole application. Do not do it in a single source file because it won't work.
4. Start your application, and * Connect* to it with the profiler.
Please refer to the [user manual](https://github.com/wolfpld/tracy/releases) for more details.

View File

@@ -0,0 +1,11 @@
# Find some zones
You can search for zones in the trace by opening the search window with the * Find zone* button on the top bar. It will ask you for the zone name, which in most cases will be the function name in the code.
The search may find more than one zone with the same name. A list of all the zones found is displayed, and you can select any of them.
Alternatively, you can open the Statistics window and click an entry there. This will open the Find zone window as if you had searched for that zone.
When a zone is selected, a number of statistics are displayed to help you understand the performance of your application. In addition, a histogram of the zone execution times is displayed to make it easier for you to determine the performance of the profiled code. Be sure to select a zone with a large number of calls to make the histogram look interesting!
Note that you can draw a range on the histogram to limit the number of entries displayed in the zone list below. This list allows you to examine each zone individually. There are also a number of zone groupings that you can select. Each group can be selected and the time associated with the selected group will be highlighted on the histogram.

View File

@@ -0,0 +1,11 @@
# A picture is worth a thousand words
Tracy allows you to add context to each frame, by attaching a screenshot. You can do this with the `FrameImage` macro.
You will have to do the screen capture and resizing yourself, which can be a bit complicated. The manual provides sample code that shows how to do this in a performant way.
The frame images are displayed in the context of a frame, for example, when you hover over the frame in the timeline or in the frame graph at the top of the screen.
You can even view a recording of what your application was doing by clicking the * Tools* icon and then selecting the * Playback* option. Try it out!
The `FrameImage` macro is a great way to see what happened in your application at a particular time. Maybe you have a performance problem that only occurs when a certain object is on the screen?

View File

@@ -0,0 +1,5 @@
# Global settings
Tracy has a variety of settings that can be adjusted to suit your needs. These settings can be found by clicking on the * Wrench* icon on the welcome screen. This will open the about window, where you can expand the * Global settings* menu.
The settings are saved between sessions, so you only need to set them once.

View File

@@ -0,0 +1,22 @@
# Instrumenting frames
In addition to instrumenting functions, you can also instrument frames. This allows you to see how much time is spent in each frame of your application.
To instrument frames, you need to add the `FrameMark` macro at the beginning of each frame. This can be done in the main loop of your application, or in a separate function that is called at the beginning of each frame.
```c++
#include "Tracy.hpp"
void Render()
{
// Render the frame
SwapBuffers();
FrameMark;
}
```
When you profile your application, you will see a new frame appear on the timeline each time the `FrameMark` macro is called. This allows you to see how much time is spent in each frame and how many frames are rendered per second.
The `FrameMark` macro is a great way to see at a glance how your application is performing over time. Maybe there are some performance problems that only appear after a few minutes of running the application? A frame graph is drawn at the top of the profiler window where you can see the timing of all frames.
Note that some applications do not have a frame-based structure, and in such cases, frame instrumentation may not be useful. That's ok.

View File

@@ -0,0 +1,22 @@
# Instrumenting your application
Instrumentation is a powerful feature that allows you to see the exact runtime of each call to the selected set of functions. The downside is that it takes a bit of manual work to get it set up.
To get started, open a source file and include the `Tracy.hpp` header. This will give you access to a variety of macros provided by Tracy. Next, add the `ZoneScoped` macro to the beginning of one of your functions, like this:
```c++
#include "Tracy.hpp"
void SomeFunction()
{
ZoneScoped;
// Your code here
}
```
Now, when you profile your application, you will see a new zone appear on the timeline for each call to the function. This allows you to see how much time is spent in each call and how many times the function is called.
> [!NOTE]
> The `ZoneScoped` macro is just one of the many macros provided by Tracy. See the documentation for more information.
The above description applies to C++ code, but things are done similarly in other programming languages. Refer to the documentation for your language for more information.

View File

@@ -0,0 +1,5 @@
# Show me the stats!
Once you have instrumented your application, you can view the statistics for each zone in the timeline. This allows you to see how much time is spent in each zone and how many times it is called.
To view the statistics, click on the * Statistics* button on the top bar. This will open a new window with a list of all zones in the trace.

View File

@@ -0,0 +1,12 @@
# Click here to discover achievements!
Clicking on the * Achievements* button opens the Achievements List. Here you can see the tasks to be completed along with a short description of what needs to be done.
As you complete each Achievement, new Achievements will appear, so be sure to keep checking the list for new ones!
To make the new things easier to spot, the Achievements List will show a marker next to them. The * Achievements* button will glow yellow when there are new things to see.
- New tasks: orange 
- Completed tasks: green 
Good luck!

View File

@@ -0,0 +1,3 @@
# Load a trace
You can open a previously saved trace file (or one received from a friend) with the * Open saved trace* button on the welcome screen.

View File

@@ -0,0 +1,10 @@
# Sampling program execution
Sampling program execution is a great way to find out where the hot spots are in your program. It can be used to find out which functions take the most time, or which lines of code are executed the most often.
While instrumentation requires changes to your code, sampling does not. However, because of the way it works, the results are coarser and it's not possible to know when functions are called or when they return.
Sampling is automatic on Linux. On Windows, you must run the profiled application as an administrator for it to work.
> [!WARNING]
> Depending on your system configuration, some additional steps may be required. Please refer to the user manual for more information.

View File

@@ -0,0 +1,12 @@
# Save a trace
Now that you have traced your application (or are in the process of doing so), you can save it to disk for future reference. You can do this by clicking on the * Connection* icon in the top left corner of the screen and then clicking on the * Save trace* button.
Keeping old traces on hand can be beneficial, as you can compare the performance of your optimizations with what you had before.
You can also share the trace with your friends or co-workers by sending them the trace file.
> [!WARNING]
> **Warning**
>
> Trace files can contain sensitive information about your application, such as program code, or even the contents of source files. Be careful when sharing them with others.

Binary file not shown.

View File

@@ -39,7 +39,7 @@
#include "profiler/TracyTexture.hpp"
#include "profiler/TracyView.hpp"
#include "profiler/TracyWeb.hpp"
#include "profiler/IconsFontAwesome6.h"
#include "profiler/IconsFontAwesome7.h"
#include "../../server/tracy_pdqsort.h"
#include "../../server/tracy_robin_hood.h"
#include "../../server/TracyFileHeader.hpp"
@@ -1466,9 +1466,17 @@ Would you like to enable achievements?
{
ImGui::Columns( 2 );
ImGui::SetColumnWidth( 0, 300 * dpiScale );
ImGui::BeginChild( "##achievementtoc", ImVec2( 0, 0 ), ImGuiChildFlags_AlwaysUseWindowPadding );
DrawAchievements( c->items );
ImGui::EndChild();
ImGui::NextColumn();
if( s_achievementItem ) s_achievementItem->description();
ImGui::BeginChild( "##achievementtext", ImVec2( 0, 0 ), ImGuiChildFlags_AlwaysUseWindowPadding );
if( s_achievementItem )
{
tracy::Markdown md( nullptr, nullptr );
md.Print( s_achievementItem->text.c_str(), s_achievementItem->text.size() );
}
ImGui::EndChild();
ImGui::EndColumns();
ImGui::EndTabItem();
}

View File

@@ -1,14 +1,17 @@
// Generated by https://github.com/juliettef/IconFontCppHeaders script GenerateIconFontCppHeaders.py for languages C and C++
// from https://github.com/FortAwesome/Font-Awesome/raw/6.x/metadata/icons.yml
// for use with https://github.com/FortAwesome/Font-Awesome/blob/6.x/webfonts/fa-regular-400.ttf, https://github.com/FortAwesome/Font-Awesome/blob/6.x/webfonts/fa-solid-900.ttf
// Generated by https://github.com/juliettef/IconFontCppHeaders script GenerateIconFontCppHeaders.py
// for C and C++
// from codepoints https://github.com/FortAwesome/Font-Awesome/raw/7.x/metadata/icons.yml
// for use with font https://github.com/FortAwesome/Font-Awesome/blob/7.x/webfonts/fa-regular-400.woff2 (You may need to convert the .woff2 files to .ttf depending upon your loader.), https://github.com/FortAwesome/Font-Awesome/blob/7.x/webfonts/fa-solid-900.woff2 (You may need to convert the .woff2 files to .ttf depending upon your loader.)
#pragma once
#define FONT_ICON_FILE_NAME_FAR "fa-regular-400.ttf"
#define FONT_ICON_FILE_NAME_FAS "fa-solid-900.ttf"
#define FONT_ICON_FILE_NAME_FAR "fa-regular-400.woff2"
#define FONT_ICON_FILE_NAME_FAS "fa-solid-900.woff2"
#define ICON_MIN_FA 0xe005
#define ICON_MAX_16_FA 0xf8ff
#define ICON_MAX_FA 0xf8ff
#define ICON_FA_0 "0" // U+0030
#define ICON_FA_1 "1" // U+0031
#define ICON_FA_2 "2" // U+0032
@@ -22,6 +25,7 @@
#define ICON_FA_A "A" // U+0041
#define ICON_FA_ADDRESS_BOOK "\xef\x8a\xb9" // U+f2b9
#define ICON_FA_ADDRESS_CARD "\xef\x8a\xbb" // U+f2bb
#define ICON_FA_ALARM_CLOCK "\xef\x8d\x8e" // U+f34e
#define ICON_FA_ALIGN_CENTER "\xef\x80\xb7" // U+f037
#define ICON_FA_ALIGN_JUSTIFY "\xef\x80\xb9" // U+f039
#define ICON_FA_ALIGN_LEFT "\xef\x80\xb6" // U+f036
@@ -41,7 +45,9 @@
#define ICON_FA_ANGLES_UP "\xef\x84\x82" // U+f102
#define ICON_FA_ANKH "\xef\x99\x84" // U+f644
#define ICON_FA_APPLE_WHOLE "\xef\x97\x91" // U+f5d1
#define ICON_FA_AQUARIUS "\xee\xa1\x85" // U+e845
#define ICON_FA_ARCHWAY "\xef\x95\x97" // U+f557
#define ICON_FA_ARIES "\xee\xa1\x86" // U+e846
#define ICON_FA_ARROW_DOWN "\xef\x81\xa3" // U+f063
#define ICON_FA_ARROW_DOWN_1_9 "\xef\x85\xa2" // U+f162
#define ICON_FA_ARROW_DOWN_9_1 "\xef\xa2\x86" // U+f886
@@ -116,6 +122,7 @@
#define ICON_FA_BAN "\xef\x81\x9e" // U+f05e
#define ICON_FA_BAN_SMOKING "\xef\x95\x8d" // U+f54d
#define ICON_FA_BANDAGE "\xef\x91\xa2" // U+f462
#define ICON_FA_BANGLADESHI_TAKA_SIGN "\xee\x8b\xa6" // U+e2e6
#define ICON_FA_BARCODE "\xef\x80\xaa" // U+f02a
#define ICON_FA_BARS "\xef\x83\x89" // U+f0c9
#define ICON_FA_BARS_PROGRESS "\xef\xa0\xa8" // U+f828
@@ -214,6 +221,7 @@
#define ICON_FA_BURGER "\xef\xa0\x85" // U+f805
#define ICON_FA_BURST "\xee\x93\x9c" // U+e4dc
#define ICON_FA_BUS "\xef\x88\x87" // U+f207
#define ICON_FA_BUS_SIDE "\xee\xa0\x9d" // U+e81d
#define ICON_FA_BUS_SIMPLE "\xef\x95\x9e" // U+f55e
#define ICON_FA_BUSINESS_TIME "\xef\x99\x8a" // U+f64a
#define ICON_FA_C "C" // U+0043
@@ -232,8 +240,10 @@
#define ICON_FA_CAMERA_RETRO "\xef\x82\x83" // U+f083
#define ICON_FA_CAMERA_ROTATE "\xee\x83\x98" // U+e0d8
#define ICON_FA_CAMPGROUND "\xef\x9a\xbb" // U+f6bb
#define ICON_FA_CANCER "\xee\xa1\x87" // U+e847
#define ICON_FA_CANDY_CANE "\xef\x9e\x86" // U+f786
#define ICON_FA_CANNABIS "\xef\x95\x9f" // U+f55f
#define ICON_FA_CAPRICORN "\xee\xa1\x88" // U+e848
#define ICON_FA_CAPSULES "\xef\x91\xab" // U+f46b
#define ICON_FA_CAR "\xef\x86\xb9" // U+f1b9
#define ICON_FA_CAR_BATTERY "\xef\x97\x9f" // U+f5df
@@ -266,6 +276,7 @@
#define ICON_FA_CHART_AREA "\xef\x87\xbe" // U+f1fe
#define ICON_FA_CHART_BAR "\xef\x82\x80" // U+f080
#define ICON_FA_CHART_COLUMN "\xee\x83\xa3" // U+e0e3
#define ICON_FA_CHART_DIAGRAM "\xee\x9a\x95" // U+e695
#define ICON_FA_CHART_GANTT "\xee\x83\xa4" // U+e0e4
#define ICON_FA_CHART_LINE "\xef\x88\x81" // U+f201
#define ICON_FA_CHART_PIE "\xef\x88\x80" // U+f200
@@ -287,9 +298,9 @@
#define ICON_FA_CHEVRON_RIGHT "\xef\x81\x94" // U+f054
#define ICON_FA_CHEVRON_UP "\xef\x81\xb7" // U+f077
#define ICON_FA_CHILD "\xef\x86\xae" // U+f1ae
#define ICON_FA_CHILD_COMBATANT "\xee\x93\xa0" // U+e4e0
#define ICON_FA_CHILD_DRESS "\xee\x96\x9c" // U+e59c
#define ICON_FA_CHILD_REACHING "\xee\x96\x9d" // U+e59d
#define ICON_FA_CHILD_RIFLE "\xee\x93\xa0" // U+e4e0
#define ICON_FA_CHILDREN "\xee\x93\xa1" // U+e4e1
#define ICON_FA_CHURCH "\xef\x94\x9d" // U+f51d
#define ICON_FA_CIRCLE "\xef\x84\x91" // U+f111
@@ -334,6 +345,7 @@
#define ICON_FA_CLOCK_ROTATE_LEFT "\xef\x87\x9a" // U+f1da
#define ICON_FA_CLONE "\xef\x89\x8d" // U+f24d
#define ICON_FA_CLOSED_CAPTIONING "\xef\x88\x8a" // U+f20a
#define ICON_FA_CLOSED_CAPTIONING_SLASH "\xee\x84\xb5" // U+e135
#define ICON_FA_CLOUD "\xef\x83\x82" // U+f0c2
#define ICON_FA_CLOUD_ARROW_DOWN "\xef\x83\xad" // U+f0ed
#define ICON_FA_CLOUD_ARROW_UP "\xef\x83\xae" // U+f0ee
@@ -360,6 +372,7 @@
#define ICON_FA_COMMENT_DOLLAR "\xef\x99\x91" // U+f651
#define ICON_FA_COMMENT_DOTS "\xef\x92\xad" // U+f4ad
#define ICON_FA_COMMENT_MEDICAL "\xef\x9f\xb5" // U+f7f5
#define ICON_FA_COMMENT_NODES "\xee\x9a\x96" // U+e696
#define ICON_FA_COMMENT_SLASH "\xef\x92\xb3" // U+f4b3
#define ICON_FA_COMMENT_SMS "\xef\x9f\x8d" // U+f7cd
#define ICON_FA_COMMENTS "\xef\x82\x86" // U+f086
@@ -522,6 +535,8 @@
#define ICON_FA_FILE_CSV "\xef\x9b\x9d" // U+f6dd
#define ICON_FA_FILE_EXCEL "\xef\x87\x83" // U+f1c3
#define ICON_FA_FILE_EXPORT "\xef\x95\xae" // U+f56e
#define ICON_FA_FILE_FRAGMENT "\xee\x9a\x97" // U+e697
#define ICON_FA_FILE_HALF_DASHED "\xee\x9a\x98" // U+e698
#define ICON_FA_FILE_IMAGE "\xef\x87\x85" // U+f1c5
#define ICON_FA_FILE_IMPORT "\xef\x95\xaf" // U+f56f
#define ICON_FA_FILE_INVOICE "\xef\x95\xb0" // U+f570
@@ -585,6 +600,7 @@
#define ICON_FA_GEAR "\xef\x80\x93" // U+f013
#define ICON_FA_GEARS "\xef\x82\x85" // U+f085
#define ICON_FA_GEM "\xef\x8e\xa5" // U+f3a5
#define ICON_FA_GEMINI "\xee\xa1\x89" // U+e849
#define ICON_FA_GENDERLESS "\xef\x88\xad" // U+f22d
#define ICON_FA_GHOST "\xef\x9b\xa2" // U+f6e2
#define ICON_FA_GIFT "\xef\x81\xab" // U+f06b
@@ -642,8 +658,6 @@
#define ICON_FA_HANDS_PRAYING "\xef\x9a\x84" // U+f684
#define ICON_FA_HANDSHAKE "\xef\x8a\xb5" // U+f2b5
#define ICON_FA_HANDSHAKE_ANGLE "\xef\x93\x84" // U+f4c4
#define ICON_FA_HANDSHAKE_SIMPLE "\xef\x93\x86" // U+f4c6
#define ICON_FA_HANDSHAKE_SIMPLE_SLASH "\xee\x81\x9f" // U+e05f
#define ICON_FA_HANDSHAKE_SLASH "\xee\x81\xa0" // U+e060
#define ICON_FA_HANUKIAH "\xef\x9b\xa6" // U+f6e6
#define ICON_FA_HARD_DRIVE "\xef\x82\xa0" // U+f0a0
@@ -657,7 +671,6 @@
#define ICON_FA_HEAD_SIDE_VIRUS "\xee\x81\xa4" // U+e064
#define ICON_FA_HEADING "\xef\x87\x9c" // U+f1dc
#define ICON_FA_HEADPHONES "\xef\x80\xa5" // U+f025
#define ICON_FA_HEADPHONES_SIMPLE "\xef\x96\x8f" // U+f58f
#define ICON_FA_HEADSET "\xef\x96\x90" // U+f590
#define ICON_FA_HEART "\xef\x80\x84" // U+f004
#define ICON_FA_HEART_CIRCLE_BOLT "\xee\x93\xbc" // U+e4fc
@@ -672,6 +685,9 @@
#define ICON_FA_HELICOPTER_SYMBOL "\xee\x94\x82" // U+e502
#define ICON_FA_HELMET_SAFETY "\xef\xa0\x87" // U+f807
#define ICON_FA_HELMET_UN "\xee\x94\x83" // U+e503
#define ICON_FA_HEXAGON "\xef\x8c\x92" // U+f312
#define ICON_FA_HEXAGON_NODES "\xee\x9a\x99" // U+e699
#define ICON_FA_HEXAGON_NODES_BOLT "\xee\x9a\x9a" // U+e69a
#define ICON_FA_HIGHLIGHTER "\xef\x96\x91" // U+f591
#define ICON_FA_HILL_AVALANCHE "\xee\x94\x87" // U+e507
#define ICON_FA_HILL_ROCKSLIDE "\xee\x94\x88" // U+e508
@@ -767,8 +783,10 @@
#define ICON_FA_LEFT_LONG "\xef\x8c\x8a" // U+f30a
#define ICON_FA_LEFT_RIGHT "\xef\x8c\xb7" // U+f337
#define ICON_FA_LEMON "\xef\x82\x94" // U+f094
#define ICON_FA_LEO "\xee\xa1\x8a" // U+e84a
#define ICON_FA_LESS_THAN "<" // U+003c
#define ICON_FA_LESS_THAN_EQUAL "\xef\x94\xb7" // U+f537
#define ICON_FA_LIBRA "\xee\xa1\x8b" // U+e84b
#define ICON_FA_LIFE_RING "\xef\x87\x8d" // U+f1cd
#define ICON_FA_LIGHTBULB "\xef\x83\xab" // U+f0eb
#define ICON_FA_LINES_LEANING "\xee\x94\x9e" // U+e51e
@@ -842,6 +860,7 @@
#define ICON_FA_MOBILE_RETRO "\xee\x94\xa7" // U+e527
#define ICON_FA_MOBILE_SCREEN "\xef\x8f\x8f" // U+f3cf
#define ICON_FA_MOBILE_SCREEN_BUTTON "\xef\x8f\x8d" // U+f3cd
#define ICON_FA_MOBILE_VIBRATE "\xee\xa0\x96" // U+e816
#define ICON_FA_MONEY_BILL "\xef\x83\x96" // U+f0d6
#define ICON_FA_MONEY_BILL_1 "\xef\x8f\x91" // U+f3d1
#define ICON_FA_MONEY_BILL_1_WAVE "\xef\x94\xbb" // U+f53b
@@ -871,6 +890,7 @@
#define ICON_FA_NETWORK_WIRED "\xef\x9b\xbf" // U+f6ff
#define ICON_FA_NEUTER "\xef\x88\xac" // U+f22c
#define ICON_FA_NEWSPAPER "\xef\x87\xaa" // U+f1ea
#define ICON_FA_NON_BINARY "\xee\xa0\x87" // U+e807
#define ICON_FA_NOT_EQUAL "\xef\x94\xbe" // U+f53e
#define ICON_FA_NOTDEF "\xee\x87\xbe" // U+e1fe
#define ICON_FA_NOTE_STICKY "\xef\x89\x89" // U+f249
@@ -878,6 +898,7 @@
#define ICON_FA_O "O" // U+004f
#define ICON_FA_OBJECT_GROUP "\xef\x89\x87" // U+f247
#define ICON_FA_OBJECT_UNGROUP "\xef\x89\x88" // U+f248
#define ICON_FA_OCTAGON "\xef\x8c\x86" // U+f306
#define ICON_FA_OIL_CAN "\xef\x98\x93" // U+f613
#define ICON_FA_OIL_WELL "\xee\x94\xb2" // U+e532
#define ICON_FA_OM "\xef\x99\xb9" // U+f679
@@ -906,6 +927,7 @@
#define ICON_FA_PEN_RULER "\xef\x96\xae" // U+f5ae
#define ICON_FA_PEN_TO_SQUARE "\xef\x81\x84" // U+f044
#define ICON_FA_PENCIL "\xef\x8c\x83" // U+f303
#define ICON_FA_PENTAGON "\xee\x9e\x90" // U+e790
#define ICON_FA_PEOPLE_ARROWS "\xee\x81\xa8" // U+e068
#define ICON_FA_PEOPLE_CARRY_BOX "\xef\x93\x8e" // U+f4ce
#define ICON_FA_PEOPLE_GROUP "\xee\x94\xb3" // U+e533
@@ -968,8 +990,10 @@
#define ICON_FA_PHONE_SLASH "\xef\x8f\x9d" // U+f3dd
#define ICON_FA_PHONE_VOLUME "\xef\x8a\xa0" // U+f2a0
#define ICON_FA_PHOTO_FILM "\xef\xa1\xbc" // U+f87c
#define ICON_FA_PICTURE_IN_PICTURE "\xee\xa0\x8b" // U+e80b
#define ICON_FA_PIGGY_BANK "\xef\x93\x93" // U+f4d3
#define ICON_FA_PILLS "\xef\x92\x84" // U+f484
#define ICON_FA_PISCES "\xee\xa1\x8c" // U+e84c
#define ICON_FA_PIZZA_SLICE "\xef\xa0\x98" // U+f818
#define ICON_FA_PLACE_OF_WORSHIP "\xef\x99\xbf" // U+f67f
#define ICON_FA_PLANE "\xef\x81\xb2" // U+f072
@@ -1060,6 +1084,7 @@
#define ICON_FA_S "S" // U+0053
#define ICON_FA_SACK_DOLLAR "\xef\xa0\x9d" // U+f81d
#define ICON_FA_SACK_XMARK "\xee\x95\xaa" // U+e56a
#define ICON_FA_SAGITTARIUS "\xee\xa1\x8d" // U+e84d
#define ICON_FA_SAILBOAT "\xee\x91\x85" // U+e445
#define ICON_FA_SATELLITE "\xef\x9e\xbf" // U+f7bf
#define ICON_FA_SATELLITE_DISH "\xef\x9f\x80" // U+f7c0
@@ -1073,6 +1098,7 @@
#define ICON_FA_SCHOOL_FLAG "\xee\x95\xae" // U+e56e
#define ICON_FA_SCHOOL_LOCK "\xee\x95\xaf" // U+e56f
#define ICON_FA_SCISSORS "\xef\x83\x84" // U+f0c4
#define ICON_FA_SCORPIO "\xee\xa1\x8e" // U+e84e
#define ICON_FA_SCREWDRIVER "\xef\x95\x8a" // U+f54a
#define ICON_FA_SCREWDRIVER_WRENCH "\xef\x9f\x99" // U+f7d9
#define ICON_FA_SCROLL "\xef\x9c\x8e" // U+f70e
@@ -1080,6 +1106,7 @@
#define ICON_FA_SD_CARD "\xef\x9f\x82" // U+f7c2
#define ICON_FA_SECTION "\xee\x91\x87" // U+e447
#define ICON_FA_SEEDLING "\xef\x93\x98" // U+f4d8
#define ICON_FA_SEPTAGON "\xee\xa0\xa0" // U+e820
#define ICON_FA_SERVER "\xef\x88\xb3" // U+f233
#define ICON_FA_SHAPES "\xef\x98\x9f" // U+f61f
#define ICON_FA_SHARE "\xef\x81\xa4" // U+f064
@@ -1108,6 +1135,8 @@
#define ICON_FA_SIGNATURE "\xef\x96\xb7" // U+f5b7
#define ICON_FA_SIGNS_POST "\xef\x89\xb7" // U+f277
#define ICON_FA_SIM_CARD "\xef\x9f\x84" // U+f7c4
#define ICON_FA_SINGLE_QUOTE_LEFT "\xee\xa0\x9b" // U+e81b
#define ICON_FA_SINGLE_QUOTE_RIGHT "\xee\xa0\x9c" // U+e81c
#define ICON_FA_SINK "\xee\x81\xad" // U+e06d
#define ICON_FA_SITEMAP "\xef\x83\xa8" // U+f0e8
#define ICON_FA_SKULL "\xef\x95\x8c" // U+f54c
@@ -1131,12 +1160,14 @@
#define ICON_FA_SPELL_CHECK "\xef\xa2\x91" // U+f891
#define ICON_FA_SPIDER "\xef\x9c\x97" // U+f717
#define ICON_FA_SPINNER "\xef\x84\x90" // U+f110
#define ICON_FA_SPIRAL "\xee\xa0\x8a" // U+e80a
#define ICON_FA_SPLOTCH "\xef\x96\xbc" // U+f5bc
#define ICON_FA_SPOON "\xef\x8b\xa5" // U+f2e5
#define ICON_FA_SPRAY_CAN "\xef\x96\xbd" // U+f5bd
#define ICON_FA_SPRAY_CAN_SPARKLES "\xef\x97\x90" // U+f5d0
#define ICON_FA_SQUARE "\xef\x83\x88" // U+f0c8
#define ICON_FA_SQUARE_ARROW_UP_RIGHT "\xef\x85\x8c" // U+f14c
#define ICON_FA_SQUARE_BINARY "\xee\x9a\x9b" // U+e69b
#define ICON_FA_SQUARE_CARET_DOWN "\xef\x85\x90" // U+f150
#define ICON_FA_SQUARE_CARET_LEFT "\xef\x86\x91" // U+f191
#define ICON_FA_SQUARE_CARET_RIGHT "\xef\x85\x92" // U+f152
@@ -1194,7 +1225,10 @@
#define ICON_FA_T "T" // U+0054
#define ICON_FA_TABLE "\xef\x83\x8e" // U+f0ce
#define ICON_FA_TABLE_CELLS "\xef\x80\x8a" // U+f00a
#define ICON_FA_TABLE_CELLS_COLUMN_LOCK "\xee\x99\xb8" // U+e678
#define ICON_FA_TABLE_CELLS_LARGE "\xef\x80\x89" // U+f009
#define ICON_FA_TABLE_CELLS_ROW_LOCK "\xee\x99\xba" // U+e67a
#define ICON_FA_TABLE_CELLS_ROW_UNLOCK "\xee\x9a\x91" // U+e691
#define ICON_FA_TABLE_COLUMNS "\xef\x83\x9b" // U+f0db
#define ICON_FA_TABLE_LIST "\xef\x80\x8b" // U+f00b
#define ICON_FA_TABLE_TENNIS_PADDLE_BALL "\xef\x91\x9d" // U+f45d
@@ -1208,6 +1242,7 @@
#define ICON_FA_TAPE "\xef\x93\x9b" // U+f4db
#define ICON_FA_TARP "\xee\x95\xbb" // U+e57b
#define ICON_FA_TARP_DROPLET "\xee\x95\xbc" // U+e57c
#define ICON_FA_TAURUS "\xee\xa1\x8f" // U+e84f
#define ICON_FA_TAXI "\xef\x86\xba" // U+f1ba
#define ICON_FA_TEETH "\xef\x98\xae" // U+f62e
#define ICON_FA_TEETH_OPEN "\xef\x98\xaf" // U+f62f
@@ -1235,6 +1270,7 @@
#define ICON_FA_THUMBS_DOWN "\xef\x85\xa5" // U+f165
#define ICON_FA_THUMBS_UP "\xef\x85\xa4" // U+f164
#define ICON_FA_THUMBTACK "\xef\x82\x8d" // U+f08d
#define ICON_FA_THUMBTACK_SLASH "\xee\x9a\x8f" // U+e68f
#define ICON_FA_TICKET "\xef\x85\x85" // U+f145
#define ICON_FA_TICKET_SIMPLE "\xef\x8f\xbf" // U+f3ff
#define ICON_FA_TIMELINE "\xee\x8a\x9c" // U+e29c
@@ -1310,8 +1346,6 @@
#define ICON_FA_USER_GRADUATE "\xef\x94\x81" // U+f501
#define ICON_FA_USER_GROUP "\xef\x94\x80" // U+f500
#define ICON_FA_USER_INJURED "\xef\x9c\xa8" // U+f728
#define ICON_FA_USER_LARGE "\xef\x90\x86" // U+f406
#define ICON_FA_USER_LARGE_SLASH "\xef\x93\xba" // U+f4fa
#define ICON_FA_USER_LOCK "\xef\x94\x82" // U+f502
#define ICON_FA_USER_MINUS "\xef\x94\x83" // U+f503
#define ICON_FA_USER_NINJA "\xef\x94\x84" // U+f504
@@ -1336,7 +1370,6 @@
#define ICON_FA_V "V" // U+0056
#define ICON_FA_VAN_SHUTTLE "\xef\x96\xb6" // U+f5b6
#define ICON_FA_VAULT "\xee\x8b\x85" // U+e2c5
#define ICON_FA_VECTOR_SQUARE "\xef\x97\x8b" // U+f5cb
#define ICON_FA_VENUS "\xef\x88\xa1" // U+f221
#define ICON_FA_VENUS_DOUBLE "\xef\x88\xa6" // U+f226
#define ICON_FA_VENUS_MARS "\xef\x88\xa8" // U+f228
@@ -1349,6 +1382,7 @@
#define ICON_FA_VIDEO "\xef\x80\xbd" // U+f03d
#define ICON_FA_VIDEO_SLASH "\xef\x93\xa2" // U+f4e2
#define ICON_FA_VIHARA "\xef\x9a\xa7" // U+f6a7
#define ICON_FA_VIRGO "\xee\xa1\x90" // U+e850
#define ICON_FA_VIRUS "\xee\x81\xb4" // U+e074
#define ICON_FA_VIRUS_COVID "\xee\x92\xa8" // U+e4a8
#define ICON_FA_VIRUS_COVID_SLASH "\xee\x92\xa9" // U+e4a9
@@ -1357,6 +1391,7 @@
#define ICON_FA_VOICEMAIL "\xef\xa2\x97" // U+f897
#define ICON_FA_VOLCANO "\xef\x9d\xb0" // U+f770
#define ICON_FA_VOLLEYBALL "\xef\x91\x9f" // U+f45f
#define ICON_FA_VOLUME "\xef\x9a\xa8" // U+f6a8
#define ICON_FA_VOLUME_HIGH "\xef\x80\xa8" // U+f028
#define ICON_FA_VOLUME_LOW "\xef\x80\xa7" // U+f027
#define ICON_FA_VOLUME_OFF "\xef\x80\xa6" // U+f026
@@ -1372,6 +1407,7 @@
#define ICON_FA_WATER "\xef\x9d\xb3" // U+f773
#define ICON_FA_WATER_LADDER "\xef\x97\x85" // U+f5c5
#define ICON_FA_WAVE_SQUARE "\xef\xa0\xbe" // U+f83e
#define ICON_FA_WEB_AWESOME "\xee\x9a\x82" // U+e682
#define ICON_FA_WEIGHT_HANGING "\xef\x97\x8d" // U+f5cd
#define ICON_FA_WEIGHT_SCALE "\xef\x92\x96" // U+f496
#define ICON_FA_WHEAT_AWN "\xee\x8b\x8d" // U+e2cd


@@ -1,52 +1,60 @@
#include "IconsFontAwesome6.h"
#include "TracyAchievements.hpp"
#include "TracyImGui.hpp"
#include "TracySourceContents.hpp"
#include "TracyWeb.hpp"
#include "../Fonts.hpp"
#include "TracyEmbed.hpp"
#include "data/Text100Million.hpp"
#include "data/TextConnectToClient.hpp"
#include "data/TextFindZone.hpp"
#include "data/TextFrameImages.hpp"
#include "data/TextGlobalSettings.hpp"
#include "data/TextInstrumentFrames.hpp"
#include "data/TextInstrumentationIntro.hpp"
#include "data/TextInstrumentationStatistics.hpp"
#include "data/TextIntro.hpp"
#include "data/TextLoadTrace.hpp"
#include "data/TextSamplingIntro.hpp"
#include "data/TextSaveTrace.hpp"
namespace tracy::data
{
AchievementItem ai_samplingIntro = { "samplingIntro", "Sampling program execution", [](){
ImGui::TextWrapped( "Sampling program execution is a great way to find out where the hot spots are in your program. It can be used to find out which functions take the most time, or which lines of code are executed the most often." );
ImGui::TextWrapped( "While instrumentation requires changes to your code, sampling does not. However, because of the way it works, the results are coarser and it's not possible to know when functions are called or when they return." );
ImGui::TextWrapped( "Sampling is automatic on Linux. On Windows, you must run the profiled application as an administrator for it to work." );
ImGui::PushFont( g_fonts.normal, FontSmall );
ImGui::PushStyleColor( ImGuiCol_Text, GImGui->Style.Colors[ImGuiCol_TextDisabled] );
ImGui::TextWrapped( "Depending on your system configuration, some additional steps may be required. Please refer to the user manual for more information." );
ImGui::PopStyleColor();
ImGui::PopFont();
} };
static std::string UnpackImpl( size_t size, size_t lz4Size, const uint8_t* data )
{
std::string ret;
const EmbedData unembed( size, lz4Size, data );
ret.assign( unembed.data(), unembed.size() );
return ret;
}
#define Unpack( name ) UnpackImpl( Embed::name##Size, Embed::name##Lz4Size, Embed::name##Data )
AchievementItem ai_samplingIntro = {
.id = "samplingIntro",
.name = "Sampling program execution",
.text = Unpack( TextSamplingIntro ),
};
AchievementItem* ac_samplingItems[] = { &ai_samplingIntro, nullptr };
AchievementCategory ac_sampling = { "sampling", "Sampling", ac_samplingItems };
AchievementItem ai_100million = { "100million", "It's over 100 million!", [](){
ImGui::TextWrapped( "Tracy can handle a lot of data. How about 100 million zones in a single trace? Add a lot of zones to your program and see how it handles it!" );
ImGui::TextWrapped( "Capturing a long-running profile trace is easy. Need to profile an hour of your program execution? You can do it." );
ImGui::TextWrapped( "Note that it doesn't make much sense to instrument every little function you might have. The cost of the instrumentation itself will be higher than the cost of the function in such a case." );
ImGui::PushFont( g_fonts.normal, FontSmall );
ImGui::PushStyleColor( ImGuiCol_Text, GImGui->Style.Colors[ImGuiCol_TextDisabled] );
ImGui::TextWrapped( "Keep in mind that the more zones you have, the more memory and CPU time the profiler will use. Be careful not to run out of memory." );
ImGui::TextWrapped( "To capture 100 million zones, you will need approximately 4 GB of RAM." );
ImGui::PopStyleColor();
ImGui::PopFont();
} };
AchievementItem ai_100million = {
.id = "100million",
.name = "It's over 100 million!",
.text = Unpack( Text100Million )
};
AchievementItem ai_instrumentationStatistics = { "instrumentationStatistics", "Show me the stats!", [](){
ImGui::TextWrapped( "Once you have instrumented your application, you can view the statistics for each zone in the timeline. This allows you to see how much time is spent in each zone and how many times it is called." );
ImGui::TextWrapped( "To view the statistics, click on the \"" ICON_FA_ARROW_UP_WIDE_SHORT " Statistics\" button on the top bar. This will open a new window with a list of all zones in the trace." );
} };
AchievementItem ai_instrumentationStatistics = {
.id = "instrumentationStatistics",
.name = "Show me the stats!",
.text = Unpack( TextInstrumentationStatistics )
};
AchievementItem ai_findZone = { "findZone", "Find some zones", [](){
ImGui::TextWrapped( "You can search for zones in the trace by opening the search window with the \"" ICON_FA_MAGNIFYING_GLASS " Find zone\" button on the top bar. It will ask you for the zone name, which in most cases will be the function name in the code." );
ImGui::TextWrapped( "The search may find more than one zone with the same name. A list of all the zones found is displayed, and you can select any of them." );
ImGui::TextWrapped( "Alternatively, you can open the Statistics window and click an entry there. This will open the Find zone window as if you had searched for that zone." );
ImGui::TextWrapped( "When a zone is selected, a number of statistics are displayed to help you understand the performance of your application. In addition, a histogram of the zone execution times is displayed to make it easier for you to determine the performance of the profiled code. Be sure to select a zone with a large number of calls to make the histogram look interesting!" );
ImGui::TextWrapped( "Note that you can draw a range on the histogram to limit the number of entries displayed in the zone list below. This list allows you to examine each zone individually. There are also a number of zone groupings that you can select. Each group can be selected and the time associated with the selected group will be highlighted on the histogram." );
} };
AchievementItem ai_findZone = {
.id = "findZone",
.name = "Find some zones",
.text = Unpack( TextFindZone )
};
AchievementItem* ac_instrumentationIntroItems[] = {
&ai_100million,
@@ -55,90 +63,46 @@ AchievementItem* ac_instrumentationIntroItems[] = {
nullptr
};
AchievementItem ai_instrumentationIntro = { "instrumentationIntro", "Instrumentating your application", [](){
constexpr const char* src = R"(#include "Tracy.hpp"
AchievementItem ai_instrumentationIntro = {
.id = "instrumentationIntro",
.name = "Instrumentating your application",
.text = Unpack( TextInstrumentationIntro ),
.items = ac_instrumentationIntroItems
};
void SomeFunction()
{
ZoneScoped;
// Your code here
}
)";
static SourceContents sc;
sc.Parse( src );
ImGui::TextWrapped( "Instrumentation is a powerful feature that allows you to see the exact runtime of each call to the selected set of functions. The downside is that it takes a bit of manual work to get it set up." );
ImGui::TextWrapped( "To get started, open a source file and include the Tracy.hpp header. This will give you access to a variety of macros provided by Tracy. Next, add the ZoneScoped macro to the beginning of one of your functions, like this:" );
ImGui::PushFont( g_fonts.mono, FontNormal );
PrintSource( sc.get() );
ImGui::PopFont();
ImGui::TextWrapped( "Now, when you profile your application, you will see a new zone appear on the timeline for each call to the function. This allows you to see how much time is spent in each call and how many times the function is called." );
ImGui::PushFont( g_fonts.normal, FontSmall );
ImGui::PushStyleColor( ImGuiCol_Text, GImGui->Style.Colors[ImGuiCol_TextDisabled] );
ImGui::TextWrapped( "Note: The ZoneScoped macro is just one of the many macros provided by Tracy. See the documentation for more information." );
ImGui::TextWrapped( "The above description applies to C++ code, but things are done similarly in other programming languages. Refer to the documentation for your language for more information." );
ImGui::PopStyleColor();
ImGui::PopFont();
}, ac_instrumentationIntroItems };
AchievementItem ai_frameImages = { "frameImages", "A picture is worth a thousand words", [](){
ImGui::TextWrapped( "Tracy allows you to add context to each frame, by attaching a screenshot. You can do this with the FrameImage macro." );
ImGui::TextWrapped( "You will have to do the screen capture and resizing yourself, which can be a bit complicated. The manual provides a sample code that shows how to do this in a performant way." );
ImGui::TextWrapped( "The frame images are displayed in the context of a frame, for example, when you hover over the frame in the timeline or in the frame graph at the top of the screen." );
ImGui::TextWrapped( "You can even view a recording of what your application was doing by clicking the " ICON_FA_SCREWDRIVER_WRENCH " icon and then selecting the \"" ICON_FA_PLAY " Playback\" option. Try it out!" );
ImGui::TextWrapped( "The FrameImage macro is a great way to see what happened in your application at a particular time. Maybe you have a performance problem that only occurs when a certain object is on the screen?" );
} };
AchievementItem ai_frameImages = {
.id = "frameImages",
.name = "A picture is worth a thousand words",
.text = Unpack( TextFrameImages )
};
AchievementItem* ac_instrumentFramesItems[] = {
&ai_frameImages,
nullptr
};
AchievementItem ai_instrumentFrames = { "instrumentFrames", "Instrumenting frames", [](){
constexpr const char* src = R"(#include "Tracy.hpp"
void Render()
{
// Render the frame
SwapBuffers();
FrameMark;
}
)";
static SourceContents sc;
sc.Parse( src );
ImGui::TextWrapped( "In addition to instrumenting functions, you can also instrument frames. This allows you to see how much time is spent in each frame of your application." );
ImGui::TextWrapped( "To instrument frames, you need to add the FrameMark macro at the beginning of each frame. This can be done in the main loop of your application, or in a separate function that is called at the beginning of each frame." );
ImGui::PushFont( g_fonts.mono, FontNormal );
PrintSource( sc.get() );
ImGui::PopFont();
ImGui::TextWrapped( "When you profile your application, you will see a new frame appear on the timeline each time the FrameMark macro is called. This allows you to see how much time is spent in each frame and how many frames are rendered per second." );
ImGui::TextWrapped( "The FrameMark macro is a great way to see at a glance how your application is performing over time. Maybe there are some performance problems that only appear after a few minutes of running the application? A frame graph is drawn at the top of the profiler window where you can see the timing of all frames." );
ImGui::TextWrapped( "Note that some applications do not have a frame-based structure, and in such cases, frame instrumentation may not be useful. That's ok." );
}, ac_instrumentFramesItems };
AchievementItem ai_instrumentFrames = {
.id = "instrumentFrames",
.name = "Instrumenting frames",
.text = Unpack( TextInstrumentFrames ),
.items = ac_instrumentFramesItems
};
AchievementItem* ac_instrumentationItems[] = { &ai_instrumentationIntro, &ai_instrumentFrames, nullptr };
AchievementCategory ac_instrumentation = { "instrumentation", "Instrumentation", ac_instrumentationItems };
AchievementItem ai_loadTrace = { "loadTrace", "Load a trace", [](){
ImGui::TextWrapped( "You can open a previously saved trace file (or one received from a friend) with the \"" ICON_FA_FOLDER_OPEN " Open saved trace\" button on the welcome screen." );
} };
AchievementItem ai_loadTrace = {
.id = "loadTrace",
.name = "Load a trace",
.text = Unpack( TextLoadTrace )
};
AchievementItem ai_saveTrace = { "saveTrace", "Save a trace", [](){
ImGui::TextWrapped( "Now that you have traced your application (or are in the process of doing so), you can save it to disk for future reference. You can do this by clicking on the " ICON_FA_WIFI " icon in the top left corner of the screen and then clicking on the \"" ICON_FA_FLOPPY_DISK " Save trace\" button." );
ImGui::TextWrapped( "Keeping old traces on hand can be beneficial, as you can compare the performance of your optimizations with what you had before." );
ImGui::TextWrapped( "You can also share the trace with your friends or co-workers by sending them the trace file." );
ImGui::Spacing();
tracy::TextColoredUnformatted( 0xFF44FFFF, ICON_FA_TRIANGLE_EXCLAMATION );
ImGui::SameLine();
ImGui::TextUnformatted( "Warning" );
ImGui::SameLine();
tracy::TextColoredUnformatted( 0xFF44FFFF, ICON_FA_TRIANGLE_EXCLAMATION );
ImGui::TextWrapped( "Trace files can contain sensitive information about your application, such as program code, or even the contents of source files. Be careful when sharing them with others." );
} };
AchievementItem ai_saveTrace = {
.id = "saveTrace",
.name = "Save a trace",
.text = Unpack( TextSaveTrace )
};
AchievementItem* ac_connectToServerItems[] = {
&ai_saveTrace,
@@ -152,23 +116,19 @@ AchievementItem* ac_connectToServerUnlock[] = {
nullptr
};
AchievementItem ai_connectToServer = { "connectToClient", "First profiling session", [](){
ImGui::TextWrapped( "Let's start our adventure by instrumenting your application and connecting it to the profiler. Here's a quick refresher:" );
ImGui::TextWrapped( " 1. Integrate Tracy Profiler into your application. This can be done using CMake, Meson, or simply by adding the source files to your project." );
ImGui::TextWrapped( " 2. Make sure that TracyClient.cpp (or the Tracy library) is included in your build." );
ImGui::TextWrapped( " 3. Define TRACY_ENABLE in your build configuration, for the whole application. Do not do it in a single source file because it won't work." );
ImGui::TextWrapped( " 4. Start your application, and \"" ICON_FA_WIFI " Connect\" to it with the profiler." );
ImGui::TextWrapped( "Please refer to the user manual for more details." );
if( ImGui::SmallButton( "Download the user manual" ) )
{
tracy::OpenWebpage( "https://github.com/wolfpld/tracy/releases" );
}
}, ac_connectToServerItems, ac_connectToServerUnlock };
AchievementItem ai_connectToServer = {
.id = "connectToClient",
.name = "First profiling session",
.text = Unpack( TextConnectToClient ),
.items = ac_connectToServerItems,
.unlocks = ac_connectToServerUnlock
};
AchievementItem ai_globalSettings = { "globalSettings", "Global settings", [](){
ImGui::TextWrapped( "Tracy has a variety of settings that can be adjusted to suit your needs. These settings can be found by clicking on the " ICON_FA_WRENCH " icon on the welcome screen. This will open the about window, where you can expand the \"" ICON_FA_TOOLBOX " Global settings\" menu." );
ImGui::TextWrapped( "The settings are saved between sessions, so you only need to set them once." );
} };
AchievementItem ai_globalSettings = {
.id = "globalSettings",
.name = "Global settings",
.text = Unpack( TextGlobalSettings )
};
AchievementItem* ac_achievementsIntroItems[] = {
&ai_connectToServer,
@@ -176,18 +136,14 @@ AchievementItem* ac_achievementsIntroItems[] = {
nullptr
};
AchievementItem ai_achievementsIntro = { "achievementsIntro", "Click here to discover achievements!", [](){
ImGui::TextWrapped( "Clicking on the " ICON_FA_STAR " button opens the Achievements List. Here you can see the tasks to be completed along with a short description of what needs to be done." );
ImGui::TextWrapped( "As you complete each Achievement, new Achievements will appear, so be sure to keep checking the list for new ones!" );
ImGui::TextWrapped( "To make the new things easier to spot, the Achievements List will show a marker next to them. The achievements " ICON_FA_STAR " button will glow yellow when there are new things to see." );
ImGui::TextUnformatted( "New tasks:" );
ImGui::SameLine();
TextColoredUnformatted( 0xFF4488FF, ICON_FA_CIRCLE_EXCLAMATION );
ImGui::TextUnformatted( "Completed tasks:" );
ImGui::SameLine();
TextColoredUnformatted( 0xFF44FF44, ICON_FA_CIRCLE_CHECK );
ImGui::TextWrapped( "Good luck!" );
}, ac_achievementsIntroItems, nullptr, true, 1 };
AchievementItem ai_achievementsIntro = {
.id = "achievementsIntro",
.name = "Click here to discover achievements!",
.text = Unpack( TextIntro ),
.items = ac_achievementsIntroItems,
.keepOpen = true,
.unlockTime = 1
};
AchievementItem* ac_firstStepsItems[] = { &ai_achievementsIntro, nullptr };
AchievementCategory ac_firstSteps = { "firstSteps", "First steps", ac_firstStepsItems, 1 };
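
The `Unpack( name )` macro in the new code uses preprocessor token pasting to turn one name into three symbols. Below is a minimal, self-contained sketch of that pattern. The `Embed::TextDemo*` symbols, the `DemoText()` helper and the simplified `UnpackImpl` body (a plain copy, with no LZ4 decompression) are hypothetical stand-ins for illustration, not Tracy's actual `EmbedData` implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for generated embed data. The real data headers are
// assumed to provide <name>Size, <name>Lz4Size and <name>Data per text file.
namespace Embed
{
constexpr std::size_t TextDemoSize = 5;
constexpr std::size_t TextDemoLz4Size = 5;
constexpr std::uint8_t TextDemoData[] = { 'h', 'e', 'l', 'l', 'o' };
}

// Simplified assumption: copies the bytes verbatim instead of decompressing.
static std::string UnpackImpl( std::size_t size, std::size_t /*lz4Size*/, const std::uint8_t* data )
{
    return std::string( reinterpret_cast<const char*>( data ), size );
}

// `name##Size` pastes tokens, so Unpack( TextDemo ) expands to
// UnpackImpl( Embed::TextDemoSize, Embed::TextDemoLz4Size, Embed::TextDemoData ).
#define Unpack( name ) UnpackImpl( Embed::name##Size, Embed::name##Lz4Size, Embed::name##Data )

// Hypothetical helper showing the macro in use.
std::string DemoText()
{
    return Unpack( TextDemo );
}
```

With this shape, each achievement only has to name its text once, and the three generated symbols stay in sync by construction.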


@@ -20,7 +20,7 @@ struct AchievementItem
{
const char* id;
const char* name;
void(*description)();
std::string text;
AchievementItem** items;
AchievementItem** unlocks;
bool keepOpen;


@@ -3,7 +3,7 @@
#include "imgui.h"
#include "../Fonts.hpp"
#include "IconsFontAwesome6.h"
#include "IconsFontAwesome7.h"
#include "TracyBadVersion.hpp"
#include "TracyImGui.hpp"
#include "TracyWeb.hpp"


@@ -13,7 +13,7 @@
#include "imgui_internal.h"
#include "../public/common/TracyForceInline.hpp"
#include "IconsFontAwesome6.h"
#include "IconsFontAwesome7.h"
#include "TracySourceTokenizer.hpp"
ImTextureID GetProfilerIconTexture();


@@ -32,6 +32,17 @@ void* memmem( const void* haystack, size_t hsize, const char* needle, size_t nsi
namespace tracy
{
static constexpr std::array FontSizes = {
1.f, // normal text
1.6f, // h1
1.5f, // h2
1.4f, // h3
1.3f, // h4
1.2f, // h5
1.1f, // h6
0.75f, // footnote
};
class MarkdownContext
{
struct List
@@ -140,6 +151,55 @@ public:
case MD_BLOCK_TD:
ImGui::TableNextColumn();
break;
case MD_BLOCK_FOOTNOTE_DEF_SECTION:
Separate();
ImGui::Separator();
header = 7;
break;
case MD_BLOCK_FOOTNOTE_DEF:
{
ImGui::Dummy( ImVec2( 0, ImGui::GetTextLineHeight() * 0.5f ) );
auto footnote = ((MD_BLOCK_FOOTNOTE_DEF_DETAIL*)detail);
ImGui::PushFont( g_fonts.normal, FontNormal * FontSizes[header] );
PrintTextExt( footnote->label.text, footnote->label.text + footnote->label.size, false );
Glue();
ImGui::TextUnformatted( ". " );
break;
}
case MD_BLOCK_ADMONITION:
{
Separate();
ImGui::Indent();
origin = ImGui::GetCursorScreenPos();
auto admonition = ((MD_BLOCK_ADMONITION_DETAIL*)detail);
switch( admonition->type.text[0] )
{
case 'n': // note
color = 0xFFEB6F1F;
TextColoredUnformatted( color, ICON_FA_CIRCLE_INFO " " );
break;
case 't': // tip
color = 0xFF368623;
TextColoredUnformatted( color, ICON_FA_LIGHTBULB " " );
break;
case 'i': // important
color = 0xFFE55789;
TextColoredUnformatted( color, ICON_FA_MESSAGE " " );
break;
case 'w': // warning
color = 0xFF036A9E;
TextColoredUnformatted( color, ICON_FA_TRIANGLE_EXCLAMATION " " );
break;
case 'c': // caution
color = 0xFF3336DA;
TextColoredUnformatted( color, ICON_FA_HAND " " );
break;
default:
assert( false );
}
Glue();
break;
}
default:
break;
}
@@ -194,6 +254,18 @@ public:
case MD_BLOCK_TD:
glue = false;
break;
case MD_BLOCK_FOOTNOTE_DEF:
ImGui::PopFont();
break;
case MD_BLOCK_ADMONITION:
{
const auto scale = GetScale();
const auto pos = ImGui::GetCursorScreenPos();
const auto offset = ImVec2( 8.f * scale, 0 );
ImGui::Unindent();
ImGui::GetWindowDrawList()->AddLine( origin - offset, pos - offset, color, 2.f * scale );
break;
}
default:
break;
}
@@ -216,6 +288,14 @@ public:
case MD_SPAN_DEL:
strikethrough = true;
break;
case MD_SPAN_FOOTNOTE_REF:
{
auto footnote = ((MD_SPAN_FOOTNOTE_REF_DETAIL*)detail);
ImGui::PushFont( g_fonts.normal, FontSmall );
Glue();
PrintTextExt( footnote->label.text, footnote->label.text + footnote->label.size );
break;
}
default:
break;
}
@@ -246,17 +326,6 @@ public:
int Text( MD_TEXTTYPE type, const MD_CHAR* text, MD_SIZE size )
{
constexpr std::array FontSizes = {
1.f,
1.7f,
1.6f,
1.5f,
1.4f,
1.3f,
1.2f,
1.1f
};
switch( type )
{
case MD_TEXT_NORMAL:
@@ -373,6 +442,7 @@ private:
ImGui::SetMouseCursor( ImGuiMouseCursor_Hand );
ImGui::BeginTooltip();
ImGui::PushFont( g_fonts.normal, FontNormal );
ImGui::PushStyleColor( ImGuiCol_Text, ImVec4( 1.f, 1.f, 1.f, 1.f ) );
if( isSource )
{
@@ -430,6 +500,7 @@ private:
ImGui::TextUnformatted( link.c_str() );
}
ImGui::PopStyleColor();
ImGui::PopFont();
ImGui::EndTooltip();
if( IsMouseClicked( ImGuiMouseButton_Left ) )
{
@@ -465,6 +536,9 @@ private:
int idx = 0;
uint32_t color;
ImVec2 origin;
std::vector<List> lists;
std::string link;
@@ -479,7 +553,7 @@ Markdown::Markdown( View* view, Worker* worker )
, m_worker( worker )
{
memset( m_parser, 0, sizeof( MD_PARSER ) );
m_parser->flags = MD_FLAG_COLLAPSEWHITESPACE | MD_FLAG_PERMISSIVEAUTOLINKS | MD_FLAG_NOHTML | MD_FLAG_TABLES | MD_FLAG_TASKLISTS | MD_FLAG_STRIKETHROUGH;
m_parser->flags = MD_FLAG_COLLAPSEWHITESPACE | MD_FLAG_PERMISSIVEAUTOLINKS | MD_FLAG_NOHTML | MD_FLAG_TABLES | MD_FLAG_TASKLISTS | MD_FLAG_STRIKETHROUGH | MD_FLAG_FOOTNOTES | MD_FLAG_ADMONITIONS;
m_parser->enter_block = []( MD_BLOCKTYPE type, void* detail, void* ud ) -> int { return ((MarkdownContext*)ud)->EnterBlock( type, detail ); };
m_parser->leave_block = []( MD_BLOCKTYPE type, void* detail, void* ud ) -> int { return ((MarkdownContext*)ud)->LeaveBlock( type, detail ); };
m_parser->enter_span = []( MD_SPANTYPE type, void* detail, void* ud ) -> int { return ((MarkdownContext*)ud)->EnterSpan( type, detail ); };
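
The admonition handling above picks an icon and color by switching on the first character of the admonition type. A rough standalone sketch of that dispatch follows. `AdmonitionColor` and `UnpackColor` are hypothetical helper names; the color constants are copied from the diff; the channel layout assumes Dear ImGui's default packed color order (alpha in the top byte, red in the lowest), which may differ in builds that change the packing.

```cpp
#include <cstdint>

// Hypothetical helper: maps an admonition type string to the packed color
// used in the diff. Returns 0 for unknown types instead of asserting.
std::uint32_t AdmonitionColor( const char* type )
{
    if( !type ) return 0;
    switch( type[0] )
    {
    case 'n': return 0xFFEB6F1F;    // note
    case 't': return 0xFF368623;    // tip
    case 'i': return 0xFFE55789;    // important
    case 'w': return 0xFF036A9E;    // warning
    case 'c': return 0xFF3336DA;    // caution
    default: return 0;
    }
}

struct Rgb
{
    std::uint8_t r, g, b;
};

// Splits a packed color into channels, assuming the default ABGR byte order.
Rgb UnpackColor( std::uint32_t c )
{
    return { std::uint8_t( c & 0xFF ), std::uint8_t( ( c >> 8 ) & 0xFF ), std::uint8_t( ( c >> 16 ) & 0xFF ) };
}
```

Dispatching on a single character is cheap, but it only works while no two admonition types share a first letter.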


@@ -20,7 +20,7 @@
#include "tracy_pdqsort.h"
#include "../Fonts.hpp"
#include "IconsFontAwesome6.h"
#include "IconsFontAwesome7.h"
namespace tracy
{


@@ -1,9 +1,13 @@
#include <assert.h>
#include <memory>
#include <nlohmann/json.hpp>
#include <sys/stat.h>
#ifdef _WIN32
# include <stdio.h>
# ifdef _MSC_VER
# define unlink _unlink
# endif
#else
# include <unistd.h>
#endif
@@ -19,15 +23,32 @@ namespace tracy
UserData::UserData()
: m_preserveState( false )
, m_sidecarPublic( false )
{
}
UserData::UserData( const char* program, uint64_t time )
UserData::UserData( const char* program, uint64_t time, const char* filePath )
: m_program( program )
, m_time( time )
, m_preserveState( false )
, m_sidecarPublic( false )
{
if( m_program.empty() ) m_program = "_";
if( filePath )
{
m_filePath = filePath;
m_sidecarPublic = true;
auto sidecar = GetSidecarPath( false );
if( sidecar.empty() )
{
m_sidecarPublic = false;
}
else
{
struct stat st;
if( stat( sidecar.c_str(), &st ) != 0 ) m_sidecarPublic = false;
}
}
if( !Load() )
{
@@ -38,15 +59,23 @@ UserData::UserData( const char* program, uint64_t time )
}
}
void UserData::Init( const char* program, uint64_t time )
void UserData::Init( const char* program, uint64_t time, const char* filePath )
{
assert( !Valid() );
m_program = program;
m_time = time;
if( filePath ) m_filePath = filePath;
if( m_program.empty() ) m_program = "_";
}
void UserData::SetFilePath( const char* filePath )
{
assert( filePath );
m_filePath = filePath;
if( m_sidecarPublic ) Save();
}
void UserData::SetDescription( const char* description )
{
m_description = description;
@@ -71,6 +100,24 @@ void UserData::StateShouldBePreserved()
m_preserveState = true;
}
void UserData::SetSidecarPublic( bool state )
{
assert( Valid() );
assert( m_sidecarPublic != state );
const auto oldFn = GetSidecarPath( false );
m_sidecarPublic = state;
if( Save() )
{
unlink( oldFn.c_str() );
}
else
{
m_sidecarPublic = !state;
}
}
void UserData::LoadAnnotations( std::vector<std::shared_ptr<Annotation>>& data )
{
assert( m_preserveState );
@@ -95,9 +142,9 @@ void UserData::StoreSourceSubstitutions( const std::vector<SourceRegex>& data )
m_sourceSubstitutions = data;
}
void UserData::Save()
bool UserData::Save()
{
if( !m_preserveState ) return;
if( !m_preserveState ) return false;
assert( Valid() );
nlohmann::json json = {
@@ -158,12 +205,14 @@ void UserData::Save()
}
auto f = OpenFile( true );
if( f )
{
auto str = json.dump( 2 );
fwrite( str.c_str(), 1, str.size(), f );
fclose( f );
}
if( !f ) return false;
auto str = json.dump( 2 );
const auto sz = str.size();
const auto wrote = fwrite( str.c_str(), 1, sz, f );
fclose( f );
return sz == wrote;
}
template<typename T>
@@ -262,9 +311,9 @@ bool UserData::Load()
FILE* UserData::OpenFile( bool write )
{
const auto path = GetSavePath( m_program.c_str(), m_time, write );
if( !path ) return nullptr;
FILE* f = fopen( path, write ? "wb" : "rb" );
const auto path = GetSidecarPath( write );
if( path.empty() ) return nullptr;
FILE* f = fopen( path.c_str(), write ? "wb" : "rb" );
return f;
}
@@ -276,10 +325,17 @@ FILE* UserData::OpenFileLegacy( const char* filename )
return f;
}
const char* UserData::GetConfigLocation() const
std::string UserData::GetSidecarPath( bool write ) const
{
assert( Valid() );
return GetSavePathLegacy( m_program.c_str(), m_time, nullptr );
if( m_sidecarPublic )
{
assert( !m_filePath.empty() );
return m_filePath + ".json";
}
auto path = GetSavePath( m_program.c_str(), m_time, write );
if( !path ) return {};
return path;
}
void UserData::LoadLegacyDescription()
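
Two ideas in this file can be shown in isolation: the sidecar path is the trace path plus `.json` when the sidecar is public, and `Save()` now reports success only when the whole payload was written. The sketch below is a simplified approximation under stated assumptions. `SidecarPath` and `WriteAll` are hypothetical free functions, and `privatePath` stands in for whatever `GetSavePath()` would return.

```cpp
#include <cstdio>
#include <string>

// Hypothetical free-function version of the choice made in GetSidecarPath():
// a public sidecar lives next to the trace file, otherwise the private
// per-user location is used (an empty string means "no location available").
std::string SidecarPath( bool sidecarPublic, const std::string& tracePath, const std::string& privatePath )
{
    if( sidecarPublic ) return tracePath + ".json";
    return privatePath;
}

// Writes the payload and returns true only if every byte reached fwrite().
bool WriteAll( const std::string& path, const std::string& payload )
{
    if( path.empty() ) return false;
    FILE* f = fopen( path.c_str(), "wb" );
    if( !f ) return false;
    const auto wrote = fwrite( payload.data(), 1, payload.size(), f );
    fclose( f );
    return wrote == payload.size();
}
```

Returning a success flag is what lets a caller such as `SetSidecarPublic()` delete the old file only after the new one has been written, and roll the flag back otherwise.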


@@ -20,10 +20,11 @@ class UserData
{
public:
UserData();
UserData( const char* program, uint64_t time );
UserData( const char* program, uint64_t time, const char* filePath );
bool Valid() const { return !m_program.empty(); }
void Init( const char* program, uint64_t time );
void Init( const char* program, uint64_t time, const char* filePath );
void SetFilePath( const char* filePath );
const std::string& GetDescription() const { return m_description; }
void SetDescription( const char* description );
@@ -38,14 +39,17 @@ public:
void LoadSourceSubstitutions( std::vector<SourceRegex>& data );
void StoreSourceSubstitutions( const std::vector<SourceRegex>& data );
void Save();
bool Save();
const char* GetConfigLocation() const;
bool IsSidecarPublic() const { return m_sidecarPublic; }
void SetSidecarPublic( bool state );
private:
FILE* OpenFile( bool write );
FILE* OpenFileLegacy( const char* filename );
std::string GetSidecarPath( bool write ) const;
bool Load();
void LoadLegacyDescription();
@@ -55,6 +59,7 @@ private:
std::string m_program;
uint64_t m_time;
std::string m_filePath;
std::string m_description;
ViewData m_viewData;
@@ -62,6 +67,7 @@ private:
std::vector<SourceRegex> m_sourceSubstitutions;
bool m_preserveState;
bool m_sidecarPublic;
};
}


@@ -26,7 +26,7 @@
#include "../Fonts.hpp"
#include "imgui_internal.h"
#include "IconsFontAwesome6.h"
#include "IconsFontAwesome7.h"
namespace tracy
{
@@ -78,7 +78,7 @@ View::View( void(*cbMainThread)(const std::function<void()>&, bool), FileRead& f
, m_stcb( stcb )
, m_sscb( sscb )
, m_acb( acb )
, m_userData( m_worker.GetCaptureProgram().c_str(), m_worker.GetCaptureTime() )
, m_userData( m_worker.GetCaptureProgram().c_str(), m_worker.GetCaptureTime(), f.GetFilename().c_str() )
, m_cbMainThread( cbMainThread )
, m_achievementsMgr( amgr )
, m_achievements( s_config.achievements )
@@ -115,11 +115,7 @@ View::View( void(*cbMainThread)(const std::function<void()>&, bool), FileRead& f
View::~View()
{
m_worker.Shutdown();
m_userData.StoreState( m_vd );
m_userData.StoreAnnotations( m_annotations );
m_userData.StoreSourceSubstitutions( m_sourceSubstitutions );
m_userData.Save();
SaveUserData();
if( m_compare.loadThread.joinable() ) m_compare.loadThread.join();
if( m_saveThread.joinable() ) m_saveThread.join();
@@ -156,6 +152,14 @@ void View::Achieve( const char* id )
m_achievementsMgr->Achieve( id );
}
void View::SaveUserData()
{
m_userData.StoreState( m_vd );
m_userData.StoreAnnotations( m_annotations );
m_userData.StoreSourceSubstitutions( m_sourceSubstitutions );
m_userData.Save();
}
void View::ViewSource( const char* fileName, int line )
{
assert( fileName );
@@ -734,7 +738,7 @@ bool View::DrawImpl()
m_uarchSet = true;
m_sourceView->SetCpuId( m_worker.GetCpuId() );
}
if( !m_userData.Valid() ) m_userData.Init( m_worker.GetCaptureProgram().c_str(), m_worker.GetCaptureTime() );
if( !m_userData.Valid() ) m_userData.Init( m_worker.GetCaptureProgram().c_str(), m_worker.GetCaptureTime(), nullptr );
if( m_saveThreadState.load( std::memory_order_acquire ) == SaveThreadState::NeedsJoin )
{
m_saveThread.join();
@@ -1472,6 +1476,7 @@ bool View::Save( const char* fn, FileCompression comp, int zlevel, bool buildDic
if( !f ) return false;
m_userData.StateShouldBePreserved();
m_userData.SetFilePath( fn );
m_saveThreadState.store( SaveThreadState::Saving, std::memory_order_relaxed );
m_saveThread = std::thread( [this, f{std::move( f )}, buildDict] {
Worker::MainThreadDataLockGuard lock = m_worker.ObtainLockForMainThread();


@@ -269,6 +269,7 @@ private:
void InitTextEditor();
void SetupConfig();
void Achieve( const char* id );
void SaveUserData();
bool DrawImpl();
void DrawFrameImage( FrameImageCache& cache, const FrameImage& fi, float scale = GetScale() );


@@ -226,7 +226,7 @@ void View::DrawCompare()
try
{
m_compare.second = std::make_unique<Worker>( *f, EventType::SourceCache );
m_compare.userData = std::make_unique<UserData>( m_compare.second->GetCaptureProgram().c_str(), m_compare.second->GetCaptureTime() );
m_compare.userData = std::make_unique<UserData>( m_compare.second->GetCaptureProgram().c_str(), m_compare.second->GetCaptureTime(), nullptr );
m_compare.diffDirection = m_worker.GetCaptureTime() < m_compare.second->GetCaptureTime();
}
catch( const tracy::UnsupportedVersion& e )


@@ -25,6 +25,8 @@ void View::DrawManual()
ImGui::PopStyleColor();
ImGui::SameLine();
TextDisabledUnformatted( "This user manual is missing features. See the PDF file for the proper version." );
ImGui::SameLine();
if( ImGui::Button( ICON_FA_BOOK " PDF Manual" ) ) OpenWebpage( "https://github.com/wolfpld/tracy/releases" );
ImGui::Separator();
ImGui::BeginChild( "##usermanual" );
@@ -88,6 +90,7 @@ void View::DrawManual()
if( ImGui::IsItemClicked() && !ImGui::IsItemToggledOpen() )
{
m_activeManualChunk = i;
m_manualPositionReset = true;
}
}
while( level-- > 0 ) ImGui::TreePop();
@@ -142,8 +145,8 @@ void View::DrawManual()
ImGui::Dummy( ImVec2( 0, ImGui::GetTextLineHeight() * 0.25f ) );
ImGui::PopFont();
const auto separator = chunk.text.find( "-----" );
const auto size = separator == std::string::npos ? chunk.text.size() : separator;
const auto separator = chunk.text.find( "\n-----" );
const auto size = separator == std::string::npos ? chunk.text.size() : ( separator + 1 );
m_markdown.Print( chunk.text.c_str(), size );
}
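
The separator change above can be illustrated in isolation. This is a minimal sketch, not the code from the repository: `BodyLength` is a hypothetical helper name, and the only behavior taken from the diff is the search for `"\n-----"` and the `+ 1` that keeps the newline ending the body.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (name invented for illustration): returns how many bytes
// of `text` precede the icon-description separator. The separator only counts
// when "-----" starts a line, i.e. directly follows a '\n'.
std::size_t BodyLength( const std::string& text )
{
    assert( text.size() < std::string::npos );
    const auto separator = text.find( "\n-----" );
    // Keep the newline that terminates the body, hence the + 1.
    return separator == std::string::npos ? text.size() : separator + 1;
}
```

With a plain `find( "-----" )`, a table header separator such as `--|-----` in the middle of a line would be mistaken for the separator; anchoring the search at a newline avoids that.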


@@ -41,15 +41,11 @@ void View::DrawInfo()
TextFocused( "File:", m_filename.c_str() );
if( m_userData.Valid() )
{
const auto save = m_userData.GetConfigLocation();
if( save )
ImGui::SameLine();
auto sidecarPublic = m_userData.IsSidecarPublic();
if( SmallCheckbox( ICON_FA_USER_GEAR " Public sidecar", &sidecarPublic ) )
{
ImGui::SameLine();
if( ImGui::SmallButton( ICON_FA_FOLDER ) )
{
ImGui::SetClipboardText( save );
}
TooltipIfHovered( "Copy user settings location to clipboard." );
m_userData.SetSidecarPublic( sidecarPublic );
}
}
}


@@ -0,0 +1,60 @@
cmake_minimum_required(VERSION 3.18)
project(CUDAGraphReproTests LANGUAGES CXX CUDA)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.24")
set(CMAKE_CUDA_ARCHITECTURES native)
endif()
set(TRACY_PATH "${CMAKE_CURRENT_SOURCE_DIR}/../../../.."
CACHE PATH "Root of the Tracy repository")
set(TRACY_PUBLIC "${TRACY_PATH}/public")
option(TRACY_ENABLE "Enable profiling" ON)
find_package(CUDAToolkit REQUIRED)
find_package(Threads REQUIRED)
# Tracy client (CXX-only: TracyClient.cpp is built by the host C++ compiler, not as CUDA)
add_library(TracyClient STATIC ${TRACY_PUBLIC}/TracyClient.cpp)
target_include_directories(TracyClient PUBLIC ${TRACY_PUBLIC})
target_link_libraries(TracyClient PUBLIC Threads::Threads ${CMAKE_DL_LIBS})
if(TRACY_ENABLE)
target_compile_definitions(TracyClient PUBLIC TRACY_ENABLE)
endif()
# repro: Tracy-integrated CUDA Graph reproducer
add_executable(repro repro.cu)
target_link_libraries(repro PRIVATE TracyClient CUDA::cupti CUDA::cuda_driver)
# Standalone CUPTI probes (no Tracy dependency)
add_executable(test_corr_reuse test_corr_reuse.cu)
target_link_libraries(test_corr_reuse PRIVATE CUDA::cupti CUDA::cuda_driver)
add_executable(test_graphid_recycle test_graphid_recycle.cu)
target_link_libraries(test_graphid_recycle PRIVATE CUDA::cupti CUDA::cuda_driver)
set(_all_targets repro test_corr_reuse test_graphid_recycle)
# ctest-related integration below
# to run the binaries via ctest:
# ctest --test-dir <cmake-build-dir> -R <binary-name> -C <build-config>
enable_testing()
foreach(_target ${_all_targets})
add_test(NAME ${_target} COMMAND ${_target})
endforeach()
# On Windows, CUPTI's DLL must be on PATH at runtime.
# Propagate the DLL directory to both the VS debugger and ctest.
if(WIN32)
set(_cupti_dir "$<TARGET_FILE_DIR:CUDA::cupti>")
foreach(_target ${_all_targets})
set_target_properties(${_target} PROPERTIES
VS_DEBUGGER_ENVIRONMENT "PATH=${_cupti_dir};$ENV{PATH}")
set_tests_properties(${_target} PROPERTIES
ENVIRONMENT "PATH=${_cupti_dir};$ENV{PATH}")
endforeach()
endif()


@@ -14,8 +14,9 @@ drops every GPU zone.
## Build and run
```bash
make
./repro
cmake -S . -B ./build
cmake --build ./build --parallel --config Release
ctest --test-dir ./build -C Release -R repro
```
## What to expect


@@ -11,6 +11,22 @@
#include "OfflineSymbolResolver.h"
bool ResolveSymbols( const std::string& addr2lineToolPath, const std::string& addr2lineArgs,
const std::string& imagePath, const FrameEntryList& inputEntryList,
SymbolEntryList& resolvedEntries )
{
#ifdef _WIN32
// On Windows the default (no custom tool given) is the DbgHelp backend.
if( addr2lineToolPath.empty() )
{
return ResolveSymbolsDbgHelp( imagePath, inputEntryList, resolvedEntries );
}
#endif
// Everywhere else, and whenever a custom tool is given, use the addr2line-compatible backend.
// An empty path lets that backend fall back to the 'addr2line' found in PATH.
return ResolveSymbolsAddr2Line( addr2lineToolPath, addr2lineArgs, imagePath, inputEntryList, resolvedEntries );
}
bool ApplyPathSubstitutions( std::string& path, const PathSubstitutionList& pathSubstitutionlist )
{
for( const auto& substitution : pathSubstitutionlist )
@@ -31,7 +47,35 @@ tracy::StringIdx AddSymbolString( tracy::Worker& worker, const std::string& str
return tracy::StringIdx( location.idx );
}
bool PatchSymbolsWithRegex( tracy::Worker& worker, const PathSubstitutionList& pathSubstitutionlist, bool verbose )
void ResetSymbols( tracy::Worker& worker )
{
std::cout << "Resetting callstack frame symbols to the unresolved state..." << std::endl;
const tracy::StringIdx unresolvedName = AddSymbolString( worker, "[unresolved]" );
const tracy::StringIdx unknownFile = AddSymbolString( worker, "[unknown]" );
uint64_t frameCount = 0;
auto& callstackFrameMap = worker.GetCallstackFrameMap();
for( auto it = callstackFrameMap.begin(); it != callstackFrameMap.end(); ++it )
{
if( !it->second ) continue;
tracy::CallstackFrameData& frameData = *it->second;
for( uint8_t f = 0; f < frameData.size; f++ )
{
tracy::CallstackFrame& frame = frameData.data[f];
frame.name = unresolvedName;
frame.file = unknownFile;
frame.line = 0;
++frameCount;
}
}
std::cout << "Reset " << frameCount << " callstack frames." << std::endl;
}
bool PatchSymbolsWithRegex( tracy::Worker& worker, const PathSubstitutionList& pathSubstitutionlist,
const std::string& addr2lineToolPath, const std::string& addr2lineArgs, bool verbose )
{
uint64_t callstackFrameCount = worker.GetCallstackFrameCount();
std::string relativeSoNameMatch = "[unresolved]";
@@ -91,7 +135,7 @@ bool PatchSymbolsWithRegex( tracy::Worker& worker, const PathSubstitutionList& p
}
SymbolEntryList resolvedEntries;
ResolveSymbols( imagePath, entries, resolvedEntries );
ResolveSymbols( addr2lineToolPath, addr2lineArgs, imagePath, entries, resolvedEntries );
if( resolvedEntries.size() != entries.size() )
{
@@ -131,7 +175,8 @@ bool PatchSymbolsWithRegex( tracy::Worker& worker, const PathSubstitutionList& p
return true;
}
void PatchSymbols( tracy::Worker& worker, const std::vector<std::string>& pathSubstitutionsStrings, bool verbose )
void PatchSymbols( tracy::Worker& worker, const std::vector<std::string>& pathSubstitutionsStrings,
const std::string& addr2lineToolPath, const std::string& addr2lineArgs, bool verbose )
{
std::cout << "Resolving and patching symbols..." << std::endl;
@@ -160,7 +205,7 @@ void PatchSymbols( tracy::Worker& worker, const std::vector<std::string>& pathSu
}
}
if ( !PatchSymbolsWithRegex(worker, pathSubstitutionList, verbose) )
if ( !PatchSymbolsWithRegex(worker, pathSubstitutionList, addr2lineToolPath, addr2lineArgs, verbose) )
{
std::cerr << "Failed to patch symbols" << std::endl;
}
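
For readers unfamiliar with the `"REGEX_MATCH;REPLACEMENT"` strings that feed `PathSubstitutionList`, here is one plausible way such a string could be parsed and applied. This is a sketch under stated assumptions: `ParseSubstitution` and `Substitute` are hypothetical names, splitting at the first `;` is an assumption, and the actual bodies of `PatchSymbols` and `ApplyPathSubstitutions` are not shown in this diff. Only the `PathSubstitutionList` alias is taken from the header.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Same alias as in the header shown in this diff.
using PathSubstitutionList = std::vector<std::pair<std::regex, std::string>>;

// Hypothetical parser: splits "REGEX_MATCH;REPLACEMENT" at the first ';'.
// Returns false when there is no ';' or the regex part is empty.
// Note: std::regex throws std::regex_error if the pattern itself is invalid.
bool ParseSubstitution( const std::string& spec, PathSubstitutionList& out )
{
    const auto pos = spec.find( ';' );
    if( pos == std::string::npos || pos == 0 ) return false;
    out.emplace_back( std::regex( spec.substr( 0, pos ) ), spec.substr( pos + 1 ) );
    return true;
}

// Hypothetical applier: rewrites `path` with the first substitution that matches,
// and returns it unchanged when none does.
std::string Substitute( const std::string& path, const PathSubstitutionList& list )
{
    for( const auto& s : list )
    {
        if( std::regex_search( path, s.first ) )
        {
            return std::regex_replace( path, s.first, s.second );
        }
    }
    return path;
}
```

Under these assumptions, the spec `^/build/;/home/user/src/` would map an image path recorded on a build machine to its location on the machine doing the offline resolution.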


@@ -29,12 +29,41 @@ struct SymbolEntry
using SymbolEntryList = std::vector<SymbolEntry>;
bool ResolveSymbols( const std::string& imagePath, const FrameEntryList& inputEntryList,
// Dispatches to the appropriate backend depending on the platform and whether a custom
// addr2line-compatible tool was specified. When addr2lineToolPath is non-empty, the tool at
// that path is invoked (on any platform); otherwise the platform default is used (DbgHelp on
// Windows, the 'addr2line' found in PATH elsewhere). addr2lineArgs are extra arguments passed
// verbatim to the addr2line-compatible tool (e.g. "--relative-address").
bool ResolveSymbols( const std::string& addr2lineToolPath, const std::string& addr2lineArgs,
const std::string& imagePath, const FrameEntryList& inputEntryList,
SymbolEntryList& resolvedEntries );
void PatchSymbols( tracy::Worker& worker, const std::vector<std::string>& pathSubstitutionsStrings, bool verbose = false );
// Backend invoking an addr2line-compatible tool. Available on all platforms. An empty
// addr2lineToolPath falls back to the 'addr2line' found in PATH. addr2lineArgs are inserted
// verbatim into the tool's command line.
bool ResolveSymbolsAddr2Line( const std::string& addr2lineToolPath, const std::string& addr2lineArgs,
const std::string& imagePath, const FrameEntryList& inputEntryList,
SymbolEntryList& resolvedEntries );
#ifdef _WIN32
// Backend using the Windows DbgHelp library.
bool ResolveSymbolsDbgHelp( const std::string& imagePath, const FrameEntryList& inputEntryList,
SymbolEntryList& resolvedEntries );
#endif
// Resets all callstack frame symbols back to the unresolved state ("[unresolved]" / "[unknown]"),
// so a subsequent PatchSymbols pass re-resolves every frame. This is useful to chain several
// resolution passes with different path substitutions. Only meaningful for traces captured with
// TRACY_SYMBOL_OFFLINE_RESOLVE, where each frame's symAddr holds the image-relative offset.
void ResetSymbols( tracy::Worker& worker );
void PatchSymbols( tracy::Worker& worker, const std::vector<std::string>& pathSubstitutionsStrings,
const std::string& addr2lineToolPath = std::string(),
const std::string& addr2lineArgs = std::string(), bool verbose = false );
using PathSubstitutionList = std::vector<std::pair<std::regex, std::string> >;
bool PatchSymbolsWithRegex( tracy::Worker& worker, const PathSubstitutionList& pathSubstituionlist, bool verbose = false );
bool PatchSymbolsWithRegex( tracy::Worker& worker, const PathSubstitutionList& pathSubstituionlist,
const std::string& addr2lineToolPath = std::string(),
const std::string& addr2lineArgs = std::string(), bool verbose = false );
#endif // __SYMBOLRESOLVER_HPP__
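
The backend selection described in the comments above reduces to a small decision rule. The sketch below restates it as a pure function so it can be checked on any platform; the `Backend` enum and `ChooseBackend` are hypothetical names introduced for illustration, and the platform is passed as a flag rather than detected with `_WIN32` as the real dispatcher does.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum, for illustration only.
enum class Backend { DbgHelp, Addr2Line };

// Mirrors the documented rule: DbgHelp is used only on Windows and only when no
// custom tool path was given. Every other combination uses the
// addr2line-compatible backend, where an empty path means "addr2line from PATH".
Backend ChooseBackend( bool isWindows, const std::string& addr2lineToolPath )
{
    if( isWindows && addr2lineToolPath.empty() ) return Backend::DbgHelp;
    return Backend::Addr2Line;
}
```

One consequence of this rule is that a Windows host can resolve symbols for non-Windows images by pointing the tool path at an addr2line-compatible binary.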


@@ -1,5 +1,3 @@
#ifndef _WIN32
#include "OfflineSymbolResolver.h"
#include <fstream>
@@ -10,6 +8,11 @@
#include <memory>
#include <stdio.h>
#ifdef _WIN32
# define popen _popen
# define pclose _pclose
#endif
std::string ExecShellCommand( const char* cmd )
{
std::array<char, 128> buffer;
@@ -29,23 +32,66 @@ std::string ExecShellCommand( const char* cmd )
class SymbolResolver
{
public:
SymbolResolver()
SymbolResolver( const std::string& addr2lineToolPath, const std::string& addr2lineArgs )
{
// Extra arguments are inserted verbatim into the tool invocation. Tracy records frame
// offsets as RVAs; for images with a non-zero preferred image base (PE, Mach-O) the user
// can pass "--relative-address" here so llvm-addr2line / llvm-symbolizer add the base back.
if( !addr2lineArgs.empty() )
{
m_addr2LineArgs = " " + addr2lineArgs;
}
if( !addr2lineToolPath.empty() )
{
// If the value looks like a path (not a bare command name resolved via PATH), verify
// it exists so a wrong path fails with an actionable error instead of a cryptic shell one.
const bool looksLikePath = addr2lineToolPath.find( '/' ) != std::string::npos ||
addr2lineToolPath.find( '\\' ) != std::string::npos;
if( looksLikePath && !std::ifstream( addr2lineToolPath ).good() )
{
std::cerr << "Specified symbol resolution tool not found: '" << addr2lineToolPath
<< "' (check the path passed to the '-a' option)" << std::endl;
return;
}
// A user-provided path may contain spaces or other shell-special characters.
escapeShellParam( addr2lineToolPath, m_addr2LinePath );
std::cout << "Using user-specified symbol resolution tool: '" << addr2lineToolPath.c_str() << "'" << std::endl;
return;
}
#ifdef _WIN32
std::cerr << "No symbol resolution tool specified (use the '-a' option to provide one)" << std::endl;
#else
std::stringstream result( ExecShellCommand("which addr2line") );
std::getline(result, m_addr2LinePath);
if( !m_addr2LinePath.length() )
{
std::cerr << "'addr2line' was not found in the system, please installed it" << std::endl;
std::cerr << "'addr2line' was not found in the system, please install it" << std::endl;
}
else
{
std::cout << "Using 'addr2line' found at: '" << m_addr2LinePath.c_str() << "'" << std::endl;
}
#endif
}
static void escapeShellParam(std::string const& s, std::string& out)
{
#ifdef _WIN32
// cmd.exe / the CRT command parser do not understand POSIX backslash escapes, and
// backslashes are path separators on Windows. Wrap the parameter in double quotes
// (which handles spaces) and drop any embedded quotes, which cannot appear in a path.
out.reserve( s.size() + 2 );
out.push_back( '"' );
for( char c : s )
{
if( c != '"' ) out.push_back( c );
}
out.push_back( '"' );
#else
out.reserve( s.size() + 2 );
out.push_back( '"' );
for( unsigned char c : s )
@@ -73,34 +119,51 @@ public:
}
}
out.push_back( '"' );
#endif
}
bool ResolveSymbols( const std::string& imagePath, const FrameEntryList& inputEntryList,
SymbolEntryList& resolvedEntries )
{
if( !m_addr2LinePath.length() ) return false;
std::string escapedPath;
escapeShellParam( imagePath, escapedPath );
// Command-line length limits: cmd.exe (used by _popen on Windows) allows ~8191 characters;
// a single POSIX 'sh -c' argument is capped by MAX_ARG_STRLEN (128 KiB on Linux).
// 8000 stays under all of these, so a single conservative budget works on every platform.
const size_t maxCmdLength = 8000;
size_t entryIdx = 0;
while( entryIdx < inputEntryList.size() )
{
const size_t startIdx = entryIdx;
const size_t batchEndIdx = std::min( inputEntryList.size(), startIdx + (size_t)1024 );
printf( "Resolving symbols [%zu-%zu]\n", startIdx, batchEndIdx );
// generate a single addr2line cmd line for all addresses in one invocation
// generate a single addr2line cmd line for as many addresses as fit the length budget
std::stringstream ss;
ss << m_addr2LinePath << " -C -f -e " << escapedPath << " -a ";
for( ; entryIdx < batchEndIdx; entryIdx++ )
ss << m_addr2LinePath << " -C -f" << m_addr2LineArgs << " -e " << escapedPath << " -a ";
while( entryIdx < inputEntryList.size() )
{
const FrameEntry& entry = inputEntryList[entryIdx];
ss << " 0x" << std::hex << entry.symbolOffset;
entryIdx++;
// always include at least one address, then stop once near the length limit
if( static_cast<size_t>( ss.tellp() ) >= maxCmdLength ) break;
}
const size_t batchEndIdx = entryIdx;
std::string resultStr = ExecShellCommand( ss.str().c_str() );
printf( "Resolving symbols [%zu-%zu]\n", startIdx, batchEndIdx );
std::string cmd = ss.str();
#ifdef _WIN32
// _popen runs the command through 'cmd.exe /c', which strips the outermost pair of
// quotes. Wrap the whole command so the quoting around the (possibly spaced) tool
// and image paths survives.
cmd = "\"" + cmd + "\"";
#endif
std::string resultStr = ExecShellCommand( cmd.c_str() );
std::stringstream result( resultStr );
//printf("executing: '%s' got '%s'\n", ss.str().c_str(), result.str().c_str());
@@ -147,13 +210,13 @@ public:
private:
std::string m_addr2LinePath;
std::string m_addr2LineArgs;
};
bool ResolveSymbols( const std::string& imagePath, const FrameEntryList& inputEntryList,
SymbolEntryList& resolvedEntries )
bool ResolveSymbolsAddr2Line( const std::string& addr2lineToolPath, const std::string& addr2lineArgs,
const std::string& imagePath, const FrameEntryList& inputEntryList,
SymbolEntryList& resolvedEntries )
{
static SymbolResolver symbolResolver;
static SymbolResolver symbolResolver( addr2lineToolPath, addr2lineArgs );
return symbolResolver.ResolveSymbols( imagePath, inputEntryList, resolvedEntries );
}
#endif // #ifndef _WIN32
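
The length-budgeted batching introduced above can be sketched independently of the resolver class. This is a simplified illustration, not the repository code: `BuildCommands` is a hypothetical function, it takes raw offsets instead of `FrameEntry` objects, and it measures the length with `str().size()` instead of `tellp()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `offsets` into as few command lines as possible.
// Each command starts with `prefix` and always carries at least one address;
// a batch is closed once its length reaches `maxCmdLength`.
std::vector<std::string> BuildCommands( const std::string& prefix,
                                        const std::vector<std::uint64_t>& offsets,
                                        std::size_t maxCmdLength )
{
    assert( maxCmdLength > 0 );
    std::vector<std::string> commands;
    std::size_t idx = 0;
    while( idx < offsets.size() )
    {
        std::ostringstream ss;
        ss << prefix;
        while( idx < offsets.size() )
        {
            ss << " 0x" << std::hex << offsets[idx];
            idx++;
            // The check runs after appending, so a batch can exceed the budget
            // by at most one address.
            if( ss.str().size() >= maxCmdLength ) break;
        }
        commands.push_back( ss.str() );
    }
    return commands;
}
```

Because the check happens after an address is appended, the budget needs some headroom below the real shell limit, which is consistent with the diff choosing 8000 against a limit of roughly 8191 characters for `cmd.exe`.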


@@ -122,8 +122,8 @@ private:
char SymbolResolver::s_symbolResolutionBuffer[symbolResolutionBufferSize];
bool ResolveSymbols( const std::string& imagePath, const FrameEntryList& inputEntryList,
SymbolEntryList& resolvedEntries )
bool ResolveSymbolsDbgHelp( const std::string& imagePath, const FrameEntryList& inputEntryList,
SymbolEntryList& resolvedEntries )
{
static SymbolResolver resolver;
return resolver.ResolveSymbolsForModule( imagePath, inputEntryList, resolvedEntries );


@@ -38,7 +38,12 @@ void Usage()
printf( " c: context switches, s: sampling data, C: symbol code, S: source cache\n" );
printf( " -c: scan for source files missing in cache and add if found\n" );
printf( " -r: resolve symbols and patch callstack frames\n");
printf( " -R: reset all callstack frame symbols to unresolved (e.g. to re-run resolution)\n");
printf( " -p: substitute symbol resolution path with an alternative: \"REGEX_MATCH;REPLACEMENT\"\n");
printf( " -a: path to a custom addr2line-compatible tool to use for symbol resolution\n");
printf( " -A: extra arguments passed verbatim to the symbol resolution tool,\n");
printf( " e.g. \"--relative-address\" for llvm-addr2line on PE/Mach-O images\n");
printf( " -v: verbose output while resolving symbols\n");
printf( " -j: number of threads to use for compression (-1 to use all cores)\n" );
exit( 1 );
@@ -61,10 +66,14 @@ int main( int argc, char** argv )
bool buildDict = false;
bool cacheSource = false;
bool resolveSymbols = false;
bool resetSymbols = false;
std::vector<std::string> pathSubstitutions;
std::string addr2lineToolPath;
std::string addr2lineArgs;
bool verboseSymbols = false;
int c;
while( ( c = getopt( argc, argv, "4hez:ds:crp:j:" ) ) != -1 )
while( ( c = getopt( argc, argv, "4hez:ds:crRp:a:A:vj:" ) ) != -1 )
{
switch( c )
{
@@ -137,9 +146,21 @@ int main( int argc, char** argv )
case 'r':
resolveSymbols = true;
break;
case 'R':
resetSymbols = true;
break;
case 'p':
pathSubstitutions.push_back(optarg);
break;
case 'a':
addr2lineToolPath = optarg;
break;
case 'A':
addr2lineArgs = optarg;
break;
case 'v':
verboseSymbols = true;
break;
case 'j':
streams = atoi( optarg );
break;
@@ -171,7 +192,7 @@ int main( int argc, char** argv )
{
const auto t0 = std::chrono::high_resolution_clock::now();
const bool allowBgThreads = false;
const bool allowStringModification = resolveSymbols;
const bool allowStringModification = resolveSymbols || resetSymbols;
tracy::Worker worker( *f, (tracy::EventType::Type)events, allowBgThreads, allowStringModification );
#ifndef TRACY_NO_STATISTICS
@@ -181,7 +202,8 @@ int main( int argc, char** argv )
const auto t1 = std::chrono::high_resolution_clock::now();
if( cacheSource ) worker.CacheSourceFiles();
if( resolveSymbols ) PatchSymbols( worker, pathSubstitutions );
if( resetSymbols ) ResetSymbols( worker );
if( resolveSymbols ) PatchSymbols( worker, pathSubstitutions, addr2lineToolPath, addr2lineArgs, verboseSymbols );
auto w = std::unique_ptr<tracy::FileWrite>( tracy::FileWrite::Open( output, clev, zstdLevel, streams ) );
if( !w )