Commit Graph

1024 Commits

Author SHA1 Message Date
Bartosz Taudul
f0f579172b Force inline fast check. 2026-05-17 16:06:56 +02:00
Bartosz Taudul
4c6157d249 Remember last retrieved external check result. 2026-05-17 15:43:26 +02:00
Bartosz Taudul
6ae6fb741e Check if image is external once, before checking subframe filenames.
Cache is shared between image names and source file names, because the
underlying StringIdx storage makes indices unique. Both name sets should
be completely separate, but if you have conflicts here, you have much
more pressing problems to solve.
2026-05-17 15:30:03 +02:00
Bartosz Taudul
4ab7ef301e Split IsFrameExternalImpl into image + filename parts. 2026-05-17 15:30:03 +02:00
Bartosz Taudul
41f1172774 Change order of IsFrameExternal checks.
Check image first, then perform the expensive filename check.
2026-05-17 14:30:45 +02:00
Bartosz Taudul
4e0259148f Change IsFrameExternal interface to work with external cache.
Locks are dominating the execution time, making the global cache non-viable.
2026-05-14 22:29:51 +02:00
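The move from a global, lock-protected cache to a caller-owned one can be sketched roughly as below. This is an illustration only, not the code from the commit: `ExternalCache`, `IsExternalCached` and the `slowCheck` callback are hypothetical names, and the key is assumed to be a plain 32-bit string index.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical caller-owned cache: string index -> "is external" verdict.
// Each caller (e.g. each thread) keeps its own instance, so no lock is needed.
using ExternalCache = std::unordered_map<uint32_t, bool>;

// Returns the cached verdict for idx, running the expensive check only on a miss.
inline bool IsExternalCached( uint32_t idx, ExternalCache& cache,
                              const std::function<bool( uint32_t )>& slowCheck )
{
    const auto it = cache.find( idx );
    if( it != cache.end() ) return it->second;

    const bool result = slowCheck( idx );
    cache.emplace( idx, result );
    return result;
}
```

Because the cache is passed in by reference, the function itself holds no shared state, which is the property the commit message is after.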
Bartosz Taudul
0dfd7fb20b Cache IsFrameExternal() queries. 2026-05-14 19:51:41 +02:00
Bartosz Taudul
744bd21423 Change IsFrameExternal() interface to operate on StringIdx, move to Worker. 2026-05-14 19:01:04 +02:00
Bartosz Taudul
f7ab78893c Don't query inline-symbol frames as native addresses.
Frames whose symbol data is shipped inline with the callstack payload
(sel=1, e.g. Lua-side stack entries) were being passed to
GetCanonicalPointer() in the AddCallstackAllocPayload() query loop,
tripping its sel==0 assertion. They have no native pointer to query
and were already registered in callstackFrameMap earlier in the same
function, so just skip them.

Regression from c704f909, which hoisted the per-call-site dedup into
QueryCallstackFrame(). Three of the four updated call sites were
equivalent before and after, because the old guard and the new one
keyed on the same value. The fourth, this one, was not: the old guard
tested the frame as-is and matched the entry inserted a few lines above,
short-circuiting before GetCanonicalPointer() ran. The new guard keys on
PackPointer(addr), so GetCanonicalPointer() must run first to compute
addr, and the assert fires.
2026-05-09 12:13:45 +02:00
Bartosz Taudul
305382453d Add callstack sample events with 32 and 16 bit timestamps. 2026-05-07 02:16:53 +02:00
Bartosz Taudul
ccaef5ba0b ZoneBegin / ZoneBeginCallstack with 32 and 16 bit time data. 2026-05-07 02:16:53 +02:00
Bartosz Taudul
4d094c108d Add zone end messages with 32 and 16 bit time deltas.
Change in test application:
    compressed data: 130 Mbps -> 105 Mbps
    uncompressed: 830 Mbps -> 740 Mbps
2026-05-06 19:09:19 +02:00
Bartosz Taudul
d6e77b3f40 Remove server query fast path.
The profiler will typically want to send bursts of queries (e.g. 3 queries
to retrieve source location strings, or multiple queries to get all the call
stack frames, etc.).

Each of these queries will be sent immediately, if available space in the
network buffer permits. Each of these sends is a separate syscall.

Remove this and instead batch all queries with the already existing network
buffer overflow handling functionality.
2026-05-06 00:42:24 +02:00
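The difference between sending each query immediately and batching a burst can be sketched as follows. This is a simplified illustration under stated assumptions, not the profiler's actual network code: `QueryBatcher`, its 9-byte record layout and the `SendFn` callback (standing in for a single send syscall) are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical batcher: queries accumulate in a buffer and go out in one send.
class QueryBatcher
{
public:
    // Stand-in for one send() syscall on the socket.
    using SendFn = std::function<void( const uint8_t*, size_t )>;

    explicit QueryBatcher( SendFn send ) : m_send( std::move( send ) ) {}

    // Appends one query (1 byte type + 8 byte pointer, little-endian) to the buffer.
    void Enqueue( uint8_t type, uint64_t ptr )
    {
        m_buf.push_back( type );
        for( int i = 0; i < 8; i++ ) m_buf.push_back( uint8_t( ptr >> ( i * 8 ) ) );
    }

    // Sends everything queued so far with a single call, then clears the buffer.
    void Flush()
    {
        if( m_buf.empty() ) return;
        m_send( m_buf.data(), m_buf.size() );
        m_buf.clear();
    }

private:
    SendFn m_send;
    std::vector<uint8_t> m_buf;
};
```

With this shape, a burst of three source location queries results in one send call instead of three.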
Bartosz Taudul
e300d56f68 Force resort of possibly broken plots. 2026-03-27 20:16:27 +01:00
Bartosz Taudul
d0222ef7d9 Worker::GetZoneEnd() can be const. 2026-03-19 19:53:58 +01:00
Bartosz Taudul
9442517f30 Remove trailing whitespace. 2026-03-02 19:40:41 +01:00
Bartosz Taudul
2b201e6f59 RetrieveThread() uses a cache that cannot be shared between threads.
Note: IsThreadFiber() uses the same functionality, but is only called from
the main thread.
2026-02-01 18:01:39 +01:00
Bartosz Taudul
73694c7a24 Cleanup enums. 2026-01-24 01:50:11 +01:00
Clément Grégoire
5ef64841cc Remove MessageMetadata type and replace by uint8_t everywhere 2025-12-28 15:04:18 +01:00
Clément Grégoire
1e61dc88de Add source and severity to the server's MessageData + bump minor version for serialization
Note this does not change `sizeof(MessageData)` as there were 5 bytes left due to alignment. (now 3)
2025-12-28 14:50:38 +01:00
Clément Grégoire
f981330f66 Replace all messages text addr by TaggedUserlandAddress and send metadata over the network
There are two changes to the protocol:

- `QueueMessageLiteral*` were changed and what used to be addresses are now addresses+metadata
- Other messages now send `QueueMessage*Metadata` with added metadata.

This will later be used to store and transmit message sources, level, etc.
2025-12-28 14:44:40 +01:00
Trevor L. McDonell
d1b0406801 Add option to ignore memory free faults
This replaces the IsApple flag, which was previously only used for this purpose.
2025-12-02 16:16:50 +01:00
Clément Grégoire
255f465a8f Fix uninitialized Worker::m_pending* loaded traces.
This was causing issues in the Infos -> Trace Statistics window, as `GetCallstackFrameCount` uses `m_pendingCallstackFrames`. Just in case, initialize all those variables where they are declared instead of in the constructor.
2025-10-08 07:47:51 +02:00
Antoine Mura
80126ed1e0 Add lock struct with condition variable to main thread 2025-07-27 11:05:45 +02:00
Bartosz Taudul
3c1c444a15 Unbreak loading traces from previous versions. 2025-07-22 20:18:55 +02:00
Bartosz Taudul
c03fdaec1e Merge pull request #1097 from erieaton-amd/rocprofv3-2
Collect dispatches and counter values with Rocprofv3
2025-07-22 13:33:15 +02:00
Eric Eaton
1639598d62 Update documentation
This provides some instructions and tips for the manual. Also:
* Made the calibration feature a CMake option
* Cleaned up some minor code issues
* Fixed an issue with the calibration
* Incremented patch number
2025-07-21 15:30:42 -07:00
Bartosz Taudul
2f17e33851 Prevent duplicate callstack frame queries.
Callstack frames will now have nullptr as the value in the callstackFrameMap
map, as a way to signal that a query for the given key is already pending.
Duplicate queries should no longer happen.

@slomp provided an alternative implementation, which produced the following
results:

Queries made: 195,778
Duplicate queries skipped: 9,518,910

Co-authored-by: Marcos Slomp <slomp@adobe.com>
2025-07-18 01:16:46 +02:00
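The null-as-pending idea described in this commit can be sketched as below. It is a minimal illustration, not the actual Worker code: `FrameData`, `FrameQueryTracker` and its counters are hypothetical, and a 64-bit integer is assumed as the map key.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the real callstack frame data.
struct FrameData { int size = 0; };

class FrameQueryTracker
{
public:
    // Returns true if a query has to be sent for this key.
    // A null entry marks "query pending", so repeated requests are skipped.
    bool Query( uint64_t key )
    {
        const auto [it, inserted] = m_map.try_emplace( key, nullptr );
        if( inserted ) m_sent++; else m_skipped++;
        return inserted;
    }

    // Called when the reply arrives: replaces the pending marker with data.
    void Resolve( uint64_t key, std::unique_ptr<FrameData> data )
    {
        const auto it = m_map.find( key );
        assert( it != m_map.end() && it->second == nullptr );
        it->second = std::move( data );
    }

    uint64_t Sent() const { return m_sent; }
    uint64_t Skipped() const { return m_skipped; }

private:
    std::unordered_map<uint64_t, std::unique_ptr<FrameData>> m_map;
    uint64_t m_sent = 0;
    uint64_t m_skipped = 0;
};
```

Any code that reads the map has to tolerate null values, which is what the earlier "Assume callstackFrameMap can store null ptrs" commit prepares for.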
Bartosz Taudul
c704f909be Move duplication check into QueryCallstackFrame() to make things clear. 2025-07-18 01:15:16 +02:00
Bartosz Taudul
c4a6cf3456 Use contains() to check if map contains element. 2025-07-18 00:45:26 +02:00
Bartosz Taudul
f578d14553 Extract callstack frame query to a separate function. 2025-07-18 00:33:09 +02:00
Bartosz Taudul
38ff7a6697 Make sure count is right. 2025-07-18 00:24:23 +02:00
Bartosz Taudul
320eb67581 Assume callstackFrameMap can store null ptrs. 2025-07-18 00:23:43 +02:00
Eric Eaton
9caa91f06f Move the annotations data to the GPU context 2025-07-16 16:02:54 -07:00
Eric Eaton
21a34c5a38 Fix windows build error 2025-07-14 12:58:47 -07:00
Eric Eaton
114f6ef096 Supply the correct thread ID for annotations 2025-07-11 17:18:17 -07:00
Eric Eaton
3fce5c1280 Use a map to record counter values
This removes the limitation of 10 counters.
2025-07-11 17:17:05 -07:00
Eric Eaton
1494fe5671 Apply code style 2025-07-11 17:17:05 -07:00
Eric Eaton
9413436ba1 Search backward for the correct zone
Needed to make tracing of more intensive applications work.
2025-07-11 17:16:00 -07:00
Eric Eaton
d1f9df8058 Get counter names from environment variable
Allows user to customize the collected information with environment
variable TRACY_ROCPROF_COUNTERS.
2025-07-11 17:16:00 -07:00
Eric Eaton
a754db16f8 Show counter name in GUI 2025-07-11 17:16:00 -07:00
Eric Eaton
2e49bdf4cb Add counter value collection 2025-07-11 17:16:00 -07:00
Bartosz Taudul
880c600506 Remove queue delay calibration.
This value is not used for anything, it was just a number displayed in
the UI without much meaning to anyone.

Operations on the queue during early init may not work correctly, stopping
some programs from running past the calibration loop.
2025-07-11 23:23:31 +02:00
jcl1234
791904a37f Fix FiberEnter start time always being processed as 0 2025-07-01 22:38:10 -05:00
Clément Grégoire
c100b32b48 Fix crash when reaching the source location limit
Instead of crashing when reaching the maximum number of source locations, display an empty source location (with "???" everywhere).
The assert is kept so the limit stays discoverable in debug builds, but the profiler no longer crashes later on (or in release).
2025-05-12 13:49:31 +02:00
Bartosz Taudul
5140a5a411 Merge pull request #1021 from siliceum/wakeup
Thread wakeup visualization
2025-05-10 13:55:51 +02:00
Tomaž Vöröš
a088ebe337 fix some buffer sizes 2025-05-10 00:27:32 +02:00
Clément Grégoire
2c2d126967 Remove added spaces 2025-05-05 15:52:54 +02:00
Clément Grégoire
58401a93ab Fix file read when skipping context switches data 2025-05-05 15:29:30 +02:00
Clément Grégoire
31859042bd Remove ContextSwitchData SetStartCpu and SetEndReasonState 2025-04-09 17:02:51 +02:00