Compare commits

2841 Commits
v0.1 ... v0.6

Author SHA1 Message Date
Bartosz Taudul
31b6ff4bae Release 0.6.0. 2019-11-17 19:56:42 +01:00
Bartosz Taudul
3854ae11b2 Revert "Remove dead code."
This reverts commit a36b73f745.
2019-11-17 17:38:02 +01:00
Bartosz Taudul
c62732804a Round CPU data font height. 2019-11-17 14:30:10 +01:00
Bartosz Taudul
1e29d12819 More saturation in dynamic colors. 2019-11-16 23:03:46 +01:00
Bartosz Taudul
8ca67e49e4 Scale frame images in tooltips according to DPI scaling. 2019-11-16 22:58:51 +01:00
Bartosz Taudul
2d22372de3 Scale playback contents according to DPI scale. 2019-11-16 22:54:52 +01:00
Bartosz Taudul
37a658d933 Add srcloc color box to zone tooltips. 2019-11-16 22:38:43 +01:00
Bartosz Taudul
a36b73f745 Remove dead code. 2019-11-16 18:34:05 +01:00
Bartosz Taudul
134f108a30 Update NEWS. 2019-11-16 17:58:04 +01:00
Bartosz Taudul
f670d82796 Show migrations when thread is hovered in CPU data window. 2019-11-16 16:51:56 +01:00
Bartosz Taudul
2318bc8553 Pre-v0.5 support will be dropped after v0.6. 2019-11-16 16:38:18 +01:00
Bartosz Taudul
41f9dc0aa1 Cosmetics. 2019-11-16 16:37:08 +01:00
Bartosz Taudul
d9f71643ac Lock event time is known, don't reconstruct it. 2019-11-15 22:50:08 +01:00
Bartosz Taudul
a46731996d Thread list size is known from iteration. 2019-11-15 22:44:44 +01:00
Bartosz Taudul
db930f7f93 Reserve space for thread map, list. 2019-11-15 22:44:36 +01:00
Bartosz Taudul
18fd928a9d Don't display callstack message column if there are no callstacks. 2019-11-15 20:34:19 +01:00
Bartosz Taudul
31e1558467 Use standard includes. 2019-11-15 20:17:55 +01:00
Bartosz Taudul
d7d6a0fa9d More consistent srcloc/thread colors in zone info windows. 2019-11-15 20:13:13 +01:00
Bartosz Taudul
a15e83e590 Use default zone coloring for Lua zones. 2019-11-15 20:07:01 +01:00
Bartosz Taudul
49e3bc8b21 Don't draw unneeded separator. 2019-11-15 20:04:59 +01:00
Bartosz Taudul
5f0cab6b63 Display call stack calls in memory allocation window. 2019-11-15 20:02:21 +01:00
Bartosz Taudul
973fd941d5 Extract call stack calls drawing functionality. 2019-11-15 19:59:13 +01:00
Bartosz Taudul
a518564006 No extended font in no-extended-font path. 2019-11-15 19:58:50 +01:00
Bartosz Taudul
8fd019b474 Update manual. 2019-11-15 01:37:19 +01:00
Bartosz Taudul
26ab2f633f Update NEWS. 2019-11-15 01:23:01 +01:00
Bartosz Taudul
12037b88ff Display messages callstack in messages list. 2019-11-15 01:22:26 +01:00
Bartosz Taudul
49945c7198 Process message callstacks. 2019-11-15 01:22:26 +01:00
Bartosz Taudul
60ae748635 Add C API no-callstack redirect macros. 2019-11-14 23:53:35 +01:00
Bartosz Taudul
95e3fb1663 Add missing C API empty macros. 2019-11-14 23:51:01 +01:00
Bartosz Taudul
9f53150a6a Update macros. 2019-11-14 23:50:52 +01:00
Bartosz Taudul
8286b0b72f Plumbing for message call stacks. 2019-11-14 23:40:41 +01:00
Bartosz Taudul
0befc75f83 Fix conflicts with X.h. 2019-11-14 18:24:29 +01:00
Bartosz Taudul
9cf46e6ae6 Fix lock time announce/terminate in older traces. 2019-11-13 02:04:35 +01:00
Bartosz Taudul
ce997cf6b0 Update manual. 2019-11-11 22:11:26 +01:00
Bartosz Taudul
f7ff0781b6 Properly set background done state in no-statistics builds. 2019-11-11 00:20:33 +01:00
Bartosz Taudul
b946c1d39e Only enable magic fitted vectors in no-statistics builds.
Source location zones pointer fixup is just too slow to be feasible.

Note: no-statistics builds of the graphical profiler don't perform fixup
of view-related pointers (e.g. zone info window zone pointer). This
won't cause crashes, because the pointers are still valid, but the
displayed data will be incorrect and may change over time, as
the pointer can be reused for a completely different zone.

Memory usage of ToyPathTracer data, in various scenarios:

Capture + statistics:   7121 MB
Load + statistics:      6057 MB
Capture - statistics:   4876 MB
Load - statistics:      4521 MB
2019-11-11 00:20:33 +01:00
Bartosz Taudul
e1e3bbbe3e Fixup source location zones pointers. 2019-11-11 00:20:33 +01:00
Bartosz Taudul
ae33aa4869 Fitted zone vectors are now magic vectors.
The pointed-to zones in the original children vector can't be freed, so
they are put into a zone pool for re-use by future zones.
2019-11-11 00:20:33 +01:00
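The re-use idea described in ae33aa4869 can be illustrated with a minimal sketch. This is not Tracy's implementation: `ZoneRecord` and `ZonePool` are hypothetical names, and plain `new`/`delete` stand in for whatever allocator the profiler actually uses. The point is only that a record which cannot be freed individually is parked on a free list and handed out again to the next zone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a zone record; the real ZoneEvent has more fields.
struct ZoneRecord
{
    int64_t start = 0;
    int64_t end = 0;
};

// Hypothetical re-use pool: released records are kept on a free list and
// handed back before any new record is allocated.
class ZonePool
{
public:
    ZonePool() = default;
    ZonePool( const ZonePool& ) = delete;
    ZonePool& operator=( const ZonePool& ) = delete;
    ~ZonePool() { for( ZoneRecord* zone : m_owned ) delete zone; }

    ZoneRecord* Acquire()
    {
        if( !m_free.empty() )
        {
            ZoneRecord* zone = m_free.back();
            m_free.pop_back();
            *zone = ZoneRecord();   // reset stale contents before re-use
            return zone;
        }
        m_owned.push_back( new ZoneRecord() );
        return m_owned.back();
    }

    // The record is not freed, only made available to future zones.
    void Release( ZoneRecord* zone ) { m_free.push_back( zone ); }

    std::size_t FreeCount() const { return m_free.size(); }

private:
    std::vector<ZoneRecord*> m_owned;   // every record ever allocated
    std::vector<ZoneRecord*> m_free;    // records waiting for re-use
};
```

In this sketch a released record is returned by the very next `Acquire()` call, so the memory behind it is recycled rather than leaked.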
Bartosz Taudul
4f962d2fcc Add ZoneEvent re-use pool. 2019-11-11 00:20:33 +01:00
Bartosz Taudul
85ae52b725 Update manual. 2019-11-10 23:30:49 +01:00
Bartosz Taudul
fd7ad586af Make display of zone time in frames toggleable.
And disable it by default, as it can be heavy on resources.
2019-11-10 23:27:37 +01:00
Bartosz Taudul
fa53c2e683 Don't care about memory usage tracking data races. 2019-11-10 19:21:24 +01:00
Bartosz Taudul
9504d6c68f Don't try to delete empty Vectors. 2019-11-10 17:54:50 +01:00
Bartosz Taudul
f2801491bf Don't copy back pointer. 2019-11-10 17:48:54 +01:00
Bartosz Taudul
44f1d3dc1c Use proper memory ordering. 2019-11-10 17:30:38 +01:00
Bartosz Taudul
d4a1168491 Messages are inserted for current thread context. 2019-11-10 17:23:04 +01:00
Bartosz Taudul
003bed573c Use ThreadData cache in zone validation. 2019-11-10 17:20:55 +01:00
Bartosz Taudul
b1c88cd1f2 Cache ThreadData pointer for current thread context. 2019-11-10 17:17:07 +01:00
Bartosz Taudul
ded49edf4c Fix magic vectors in single-threaded Vulkan tooltip. 2019-11-10 16:50:19 +01:00
Bartosz Taudul
672093cf0e Adapt WriteTimeline() to magic vectors. 2019-11-10 16:34:38 +01:00
Bartosz Taudul
4eb8acc973 Magic vectors in automatic GPU drift detection. 2019-11-10 02:27:46 +01:00
Bartosz Taudul
1b6c79fa7b More magic vector fixes. 2019-11-10 02:10:21 +01:00
Bartosz Taudul
226a7b7cfb Magic vectors in GPU children list. 2019-11-10 02:03:31 +01:00
Bartosz Taudul
c65d524725 Magic vectors in GPU zone info window. 2019-11-10 01:59:20 +01:00
Bartosz Taudul
d32e3cb867 Adapt GPU zone utility functions to magic vectors. 2019-11-10 01:56:28 +01:00
Bartosz Taudul
9b52152e77 Adapt GetZoneEnd() for magic vectors. 2019-11-10 01:43:28 +01:00
Bartosz Taudul
7c277234e7 Load GPU zones into magic vectors. 2019-11-10 01:36:13 +01:00
Bartosz Taudul
4ed4e1005c Magic vectors in GPU drawing setup. 2019-11-10 01:35:57 +01:00
Bartosz Taudul
675e6a8d1a Support magic vectors for GPU zones. 2019-11-10 01:30:10 +01:00
Bartosz Taudul
06ad948abc Adapt zone children to magic vectors. 2019-11-10 01:23:44 +01:00
Bartosz Taudul
50efa8f672 Adapt time distribution calculation to magic vectors. 2019-11-10 01:08:15 +01:00
Bartosz Taudul
0c1f3ac16d Adapt zone getters to magic vectors. 2019-11-10 00:57:44 +01:00
Bartosz Taudul
f8edd3a37b Zone statistics reconstruction has to use magic vectors. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
065ba4ce5a Load zones into magic vectors. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
8ab2cf09b7 Handle magic vectors during dispatch. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
60c2b53d47 Add magic field to Vector. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
7be19193d9 Use adapters during zone level iteration. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
85e7125fee Add Vector iterator adapters. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
40e9c8807d Remove unused lambda capture. 2019-11-10 00:00:15 +01:00
Bartosz Taudul
3a317c81c6 Fix logic error. 2019-11-09 23:57:08 +01:00
Bartosz Taudul
b3698ebb0f Merge read calls. 2019-11-09 00:48:20 +01:00
Bartosz Taudul
3e65532eaa Add Read3(), Read4() helpers. 2019-11-09 00:27:49 +01:00
Bartosz Taudul
2131eed4e7 Support multiple types in Read2(). 2019-11-09 00:25:12 +01:00
Bartosz Taudul
e80a19234e Don't store and read compressed thread. 2019-11-09 00:23:09 +01:00
Bartosz Taudul
467d675262 Zone reads can be merged. 2019-11-09 00:08:26 +01:00
Bartosz Taudul
23c59a6fc9 Use query cache. 2019-11-08 23:59:20 +01:00
Bartosz Taudul
ec895372b7 Thread is not needed in ReadTimeline(). 2019-11-08 23:56:11 +01:00
Bartosz Taudul
6ec734c264 Split ReadTimelineUpdateStatistics(). 2019-11-08 23:53:43 +01:00
Bartosz Taudul
c20da5ea70 Move unimportant fields to back of FileRead class. 2019-11-08 23:31:17 +01:00
Bartosz Taudul
31e2bc1141 Free Vector's memory during move assignment. 2019-11-08 22:52:23 +01:00
Bartosz Taudul
a1488a74a1 Perform Vector's swap() as a bitwise move. 2019-11-08 22:50:22 +01:00
Bartosz Taudul
b6213cfbc5 Define Vector's max capacity in one place. 2019-11-08 22:48:44 +01:00
Bartosz Taudul
4b0654afe5 Update manual. 2019-11-07 23:59:12 +01:00
Bartosz Taudul
5df7444cbb Replace djb hash with xxh3. 2019-11-07 23:52:52 +01:00
Bartosz Taudul
17ee1aed5f Add xxhash.
https://github.com/Cyan4973/xxHash/tree/master
e2f4695899e831171ecd2e780078474712ea61d3
2019-11-07 23:52:12 +01:00
Bartosz Taudul
4a9138fc51 Reduce FrameEvent size by 4 bytes.
While it would be nice to store frame times in 48 bits, it is not
currently possible, as older traces have full 64-bit frame time stamps,
which are only then offset to first frame start time.
2019-11-07 23:05:13 +01:00
Bartosz Taudul
77a449a8f0 Update manual. 2019-11-07 22:37:11 +01:00
Bartosz Taudul
675cbc51cc Store memory free indices as 32 bit.
More than 4 billion memory events seems unlikely.

Memory savings in "mem" trace: 5747 MB -> 5427 MB.
2019-11-07 22:36:51 +01:00
Bartosz Taudul
655864eb7c Enable crash handler on cygwin.
Crash is properly recorded, but the profiler hangs while waiting for
shutdown to finish.
2019-11-07 19:20:13 +01:00
Bartosz Taudul
3fd74a92f9 Native threads are used on mingw. 2019-11-07 19:02:54 +01:00
Bartosz Taudul
0f6101b19a Fix mingw/cygwin thread name setter/getter. 2019-11-07 18:58:08 +01:00
Bartosz Taudul
bb2d44ae08 All time deltas must be processed. 2019-11-07 16:14:23 +01:00
Bartosz Taudul
351e220d30 Don't calculate queue delay if delayed init is used.
Queue calibration requires queue access during profiler construction. This
in turn requires construction of profiler data block, *which at this point
is underway*, because the profiler is being constructed.
2019-06-19 17:29:04 +02:00
Bartosz Taudul
c98f1f0b6b Make sure profiler is initialized only once in delayed init scenario. 2019-06-19 17:28:18 +02:00
Bartosz Taudul
9702461b09 Display elapsed time in capture utility. 2019-11-07 01:51:45 +01:00
Bartosz Taudul
ea2c329510 Input data *must not* be changed.
Not even for a short moment.
2019-11-07 01:29:11 +01:00
Bartosz Taudul
4a4fe82a1b No need to inject string terminator.
Comparison in m_data.stringMap already takes string size into account,
as a charutil::StringKey optimization.
2019-11-07 01:28:29 +01:00
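Why a size-aware comparison makes the terminator unnecessary (4a4fe82a1b) can be shown with a small sketch. `SizedKey` and `KeysEqual` are hypothetical names used for illustration; they are not the actual `charutil::StringKey` interface. The assumption is only that a key carries a pointer and an explicit length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical key type: a pointer into some buffer plus an explicit length.
struct SizedKey
{
    const char* ptr;
    std::size_t size;
};

// Lengths are compared first, then exactly `size` bytes. Nothing past the
// end of either key is read, so no '\0' has to be written into the input.
inline bool KeysEqual( const SizedKey& a, const SizedKey& b )
{
    return a.size == b.size && std::memcmp( a.ptr, b.ptr, a.size ) == 0;
}
```

Two keys that sit back to back in the same buffer stay distinct, and the buffer itself is never modified.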
Bartosz Taudul
dfad9695d2 Compress frame image data right as it arrives.
This removes the need to store temporary uncompressed image buffers,
which involves constant memory allocation and freeing. Instead, just one
permanent buffer is used, and only because the input data cannot change
during processing.
2019-11-06 23:29:59 +01:00
Bartosz Taudul
46d33f45bf Frame image packer doesn't care about width and height. 2019-11-06 22:53:01 +01:00
Bartosz Taudul
10a3516099 Delete uncompressed frame image data. 2019-11-06 22:38:19 +01:00
Bartosz Taudul
d741fb0af9 Plot can be empty if it was only configured. 2019-11-06 12:08:20 +01:00
Bartosz Taudul
d4f58ddaf3 Use native windows threads on cygwin, mingw. 2019-11-06 01:42:14 +01:00
Bartosz Taudul
df0e28a61f Remove more unneeded includes. 2019-11-06 01:37:58 +01:00
Bartosz Taudul
3abdd7cdaf Remove LZ4 include from TracyProtocol.hpp. 2019-11-06 01:30:20 +01:00
Bartosz Taudul
f53637891a Remove LZ4 include from TracyWorker.hpp. 2019-11-06 01:25:38 +01:00
Bartosz Taudul
5d3392428e Remove unneeded includes. 2019-11-06 01:21:22 +01:00
Bartosz Taudul
6015c964a9 Enable LZ4 fast decompression loop on MSVC. 2019-11-05 22:00:13 +01:00
Bartosz Taudul
ca198e44d3 Remove dead code from concurrentqueue. 2019-11-05 21:40:52 +01:00
Bartosz Taudul
b5590ed197 Include <mutex> for std::once. 2019-11-05 21:40:35 +01:00
Bartosz Taudul
3e9bb80217 More header cleanup. 2019-11-05 20:15:53 +01:00
Bartosz Taudul
6bbf273581 Partial header inclusion cleanup. 2019-11-05 20:09:40 +01:00
Bartosz Taudul
25c39a3311 Update manual. 2019-11-05 18:16:58 +01:00
Bartosz Taudul
c558a9a436 Update NEWS. 2019-11-05 18:10:32 +01:00
Bartosz Taudul
cfce429fca Format plot values according to requested formatting. 2019-11-05 18:08:42 +01:00
Bartosz Taudul
661c4a417b Process and store plot value formatting. 2019-11-05 18:02:08 +01:00
Bartosz Taudul
907574e637 Allow remote plot configuration. 2019-11-05 17:45:19 +01:00
Bartosz Taudul
a7a739eea9 Use precalculated context switch usage data. 2019-11-05 01:41:27 +01:00
Bartosz Taudul
51090e5fb9 Implement ctx switch usage reconstruction. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
8128b3894a Add vector debug macro.
Natvis is lacking in functionality, so this has to do.
2019-11-05 01:28:44 +01:00
Bartosz Taudul
946e328198 Fix 32-bit short_ptr. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
6a500ccdb3 Don't display CPU usage until data is ready. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
50b96c757e Context switch usage reconstruction skeleton. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
a62c4135ad Add context switch usage struct. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
09d6f3f917 Check if CPU graph is not obscured. 2019-11-04 01:15:49 +01:00
Bartosz Taudul
9bc6a3e0ee Add zone color boxes to parent groups in find zone menu. 2019-11-03 22:52:24 +01:00
Bartosz Taudul
68bc82c11b Simplify README. 2019-11-03 22:45:30 +01:00
Bartosz Taudul
9034c9d9e6 Update profiler screenshot. 2019-11-03 22:39:14 +01:00
Bartosz Taudul
209c1fdc72 Small radio buttons in find zone menu. 2019-11-03 22:32:34 +01:00
Bartosz Taudul
f34609fd9b Set per-cpu kernel buffer size to 512 KB.
The default setting was causing events to be lost on Android.
2019-11-03 21:52:20 +01:00
Bartosz Taudul
b8d459d48b Use proper string size (for consistency).
On Android code path this value is ignored.
2019-11-03 21:51:49 +01:00
Bartosz Taudul
9b5ec8451f Remove dead assignment. 2019-11-03 16:57:31 +01:00
Bartosz Taudul
dfc35c1bf1 Fix crashes when callstack frames are not yet available. 2019-11-03 16:44:26 +01:00
Bartosz Taudul
5620597fb4 Use short ptr in VarArray. 2019-11-03 16:29:45 +01:00
Bartosz Taudul
390558b627 Update memory requirements. 2019-11-03 16:29:45 +01:00
Bartosz Taudul
1b33bfd522 Update manual. 2019-11-03 16:29:45 +01:00
Bartosz Taudul
d9c3238462 Save 2 bytes per PlotItem.
Memory savings:

android     2614 MB -> 2487 MB (95%)
chicken     1932 MB -> 1852 MB (95%)
mem         6067 MB -> 5747 MB (94%)
q3bsp-mt    5059 MB -> 5017 MB (99%)
q3bsp-st    1211 MB -> 1171 MB (96%)
2019-11-03 16:29:45 +01:00
Bartosz Taudul
29dcc5c8bc Don't zero-initialize Int48. 2019-11-03 14:33:13 +01:00
Bartosz Taudul
acce6867f1 Selecting a zone in time distribution list opens zone statistics. 2019-11-03 03:08:23 +01:00
Bartosz Taudul
13a7444f03 Add zone color boxes to time distribution table. 2019-11-02 23:14:49 +01:00
Bartosz Taudul
c294e62f5e Add zone color boxes to child zone list. 2019-11-02 23:11:37 +01:00
Bartosz Taudul
1a6f04f6ce Add zone color boxes to zone trace. 2019-11-02 23:05:11 +01:00
Bartosz Taudul
3a304ad054 Add zone color boxes to statistics menu. 2019-11-02 23:00:42 +01:00
Bartosz Taudul
04cb7732b8 Add zone color boxes to compare menu. 2019-11-02 22:58:50 +01:00
Bartosz Taudul
4dde1ca070 Add zone color boxes to find zone menu. 2019-11-02 22:48:00 +01:00
Bartosz Taudul
b7cd28ef72 Add source location color retriever. 2019-11-02 22:45:11 +01:00
Bartosz Taudul
8d7299fe1f Get 64-bit file size. 2019-11-02 22:11:40 +01:00
Bartosz Taudul
4bc1588a5e Clear proper vector. 2019-11-02 16:57:18 +01:00
Bartosz Taudul
ce82bb816b Use short ptr for find zone grouping data.
Overall, the short ptr changes have the following effect on memory
usage:

big         9007 MB -> 8670 MB (96%)
chicken     2007 MB -> 1932 MB (96%)
drl-l-b     1383 MB -> 1304 MB (94%)
q3bsp-mt    5252 MB -> 5059 MB (96%)
long        5152 MB -> 4799 MB (93%)
fi-big      4141 MB -> 4000 MB (96%)
2019-11-02 16:54:12 +01:00
Bartosz Taudul
0df29d1e0b Use short ptr for source location payload data. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
04c92f8d19 Use short ptr for callstack payload storage. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
b0e52f20f8 Use short ptr for FrameImage storage. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
72efbe28ed Use short ptr for message data. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
52062f96d0 Use short ptr for GPU context map. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
308c280e40 Use short ptr for GPU context query data. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
1e4022e05b Use proper comparison. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
03656b2320 Remove unused variable. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
a40bbacb17 Use short ptr for CPU zone data. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
cb20bf01f9 Use short ptr for GPU zone data. 2019-11-02 16:54:11 +01:00
Bartosz Taudul
c7664b0a98 Use short ptr in LockEventPtr. 2019-11-02 16:17:45 +01:00
Bartosz Taudul
181d16459c Use short ptr for Vector data. 2019-11-02 16:17:45 +01:00
Bartosz Taudul
ea23d2b91a Use short ptr for frame images. 2019-11-02 15:43:32 +01:00
Bartosz Taudul
2a28c6cc72 Use short ptr for callstack frame data. 2019-11-02 15:43:32 +01:00
Bartosz Taudul
654f54d877 Add short pointer class, storing 6 bytes. 2019-11-02 15:43:32 +01:00
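A 6-byte pointer of the kind named in 654f54d877 could look roughly like the sketch below. It is an illustration under stated assumptions, not Tracy's class: the name `ShortPtr6` is hypothetical, and the code assumes that the upper 16 bits of every stored address are zero, which holds for typical user-space addresses on current 64-bit platforms but is not guaranteed everywhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 6-byte pointer wrapper. Assumes stored addresses fit in
// 48 bits; this is checked with an assert in Set().
template<typename T>
class ShortPtr6
{
public:
    ShortPtr6() : m_bytes{} {}
    explicit ShortPtr6( T* ptr ) : m_bytes{} { Set( ptr ); }

    void Set( T* ptr )
    {
        const uint64_t value = static_cast<uint64_t>( reinterpret_cast<uintptr_t>( ptr ) );
        assert( ( value >> 48 ) == 0 );
        // Store the low 48 bits byte by byte, independent of endianness.
        for( int i = 0; i < 6; i++ )
        {
            m_bytes[i] = static_cast<uint8_t>( ( value >> ( 8 * i ) ) & 0xFF );
        }
    }

    T* Get() const
    {
        uint64_t value = 0;
        for( int i = 0; i < 6; i++ )
        {
            value |= static_cast<uint64_t>( m_bytes[i] ) << ( 8 * i );
        }
        return reinterpret_cast<T*>( static_cast<uintptr_t>( value ) );
    }

private:
    uint8_t m_bytes[6];   // alignment 1, so the object occupies exactly 6 bytes
};
```

Because the only member is a byte array, the wrapper has no padding, which is where the 2 bytes saved per stored pointer come from.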
Bartosz Taudul
45ff14d678 Fix saving source location payload data. 2019-11-02 14:28:59 +01:00
Bartosz Taudul
16bc862904 Save sizes of children vectors to prevent reallocation. 2019-11-02 12:38:32 +01:00
Bartosz Taudul
c99dc5c431 Disable SetGpuStart() assert for compat with old traces.
Currently the unknown GPU start is indicated by a -1 value, but previously
it was the maximum int value. While the assert check is valid for newly
created traces, it will fire if an older trace is loaded.

Temporarily disabling the check (effectively until only 0.6 traces are
supported) fixes the problem, as the max int value (0x7f...) has its
high bits removed and the low bytes will be sign extended during number
reconstruction, making it -1, as intended.
2019-11-02 02:41:51 +01:00
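The arithmetic behind the last paragraph of c99dc5c431 can be checked with a short sketch. It assumes that "max int value" means the largest signed 64-bit value and that times are kept in 48 bits; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: keep only the low 48 bits of a 64-bit value.
inline uint64_t TruncateTo48( int64_t value )
{
    return static_cast<uint64_t>( value ) & 0xFFFFFFFFFFFFull;
}

// Hypothetical helper: rebuild a signed 64-bit number from 48 stored bits.
inline int64_t SignExtend48( uint64_t low48 )
{
    // If bit 47 is set, fill the upper 16 bits with ones.
    if( low48 & ( 1ull << 47 ) ) low48 |= 0xFFFF000000000000ull;
    int64_t result;
    std::memcpy( &result, &low48, sizeof( result ) );
    return result;
}

// 0x7FFFFFFFFFFFFFFF loses its high bits and becomes 0xFFFFFFFFFFFF.
// Bit 47 is set, so sign extension yields 0xFFFFFFFFFFFFFFFF, which is -1.
inline int64_t OldInvalidGpuStartAfterRoundTrip()
{
    return SignExtend48( TruncateTo48( std::numeric_limits<int64_t>::max() ) );
}
```

Ordinary non-negative times below 2^47 pass through the same round trip unchanged, so only the old sentinel is remapped.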
Bartosz Taudul
a9738deae7 Update manual. 2019-11-01 20:49:02 +01:00
Bartosz Taudul
b4103c56a5 Update NEWS. 2019-11-01 20:49:02 +01:00
Bartosz Taudul
0552d75400 Allow filtering entries in statistics menu. 2019-11-01 20:49:02 +01:00
Bartosz Taudul
5ff40b05b3 Update manual. 2019-11-01 20:29:02 +01:00
Bartosz Taudul
f88ec0c141 Convert namespaces combo box to radio buttons. 2019-11-01 20:23:22 +01:00
Bartosz Taudul
13b656fe61 Make srcloc dynamic color depend on function name. 2019-11-01 20:17:25 +01:00
Bartosz Taudul
ca0fae33d1 Remove obsolete assert.
Before-terminate-events now include events that have time delta
processing, with no memory to free.
2019-11-01 20:10:24 +01:00
Bartosz Taudul
2d46b50dd0 Update manual. 2019-11-01 02:13:02 +01:00
Bartosz Taudul
bb25c82110 Update NEWS. 2019-11-01 02:08:47 +01:00
Bartosz Taudul
d38257ea90 Add zone coloring mode based on source location. 2019-11-01 02:07:55 +01:00
Bartosz Taudul
39988ad636 Check for shutdown in background processing thread. 2019-10-31 21:41:21 +01:00
Bartosz Taudul
6a6009dbdf Update manual. 2019-10-31 15:00:35 +01:00
Bartosz Taudul
190dd456a7 Update NEWS. 2019-10-31 15:00:35 +01:00
Bartosz Taudul
978071f2ba Allow grouping zones by parent. 2019-10-31 15:00:22 +01:00
Bartosz Taudul
c0df3dd965 Implement getting zone parent when thread id is known. 2019-10-31 14:59:52 +01:00
Bartosz Taudul
456deefdbc Keep child idx on stack. 2019-10-30 23:55:21 +01:00
Bartosz Taudul
25b610a36f Pack child into GPU start/end in GpuEvent (saves 4 bytes).
long    5152 MB -> 5061 MB
2019-10-30 23:50:37 +01:00
Bartosz Taudul
7319293081 Use proper scale for next time of collapsed items. 2019-10-30 23:17:46 +01:00
Bartosz Taudul
e8286600d1 Use -1 as invalid GPU start time. 2019-10-30 23:12:43 +01:00
Bartosz Taudul
7ce8c772ad Disallow negative GPU times.
Shouldn't happen, but GPU timestamps are a shitshow, so better be safe
than sorry.
2019-10-30 22:37:07 +01:00
Bartosz Taudul
0ac432dd25 Better GPU time check. 2019-10-30 22:35:58 +01:00
Bartosz Taudul
ae4794ab4c Save 2 bytes in ContextSwitchData and ContextSwitchCpu. 2019-10-30 22:25:46 +01:00
Bartosz Taudul
99d198d0bf Pack csAlloc in MemEvent (saves 3 bytes).
Memory usage change on selected traces:

android     2699 MB -> 2613 MB
chicken     2019 MB -> 2007 MB
mem         6308 MB -> 6068 MB
q3bsp-mt    5283 MB -> 5252 MB
q3bsp-st    1241 MB -> 1211 MB
2019-10-30 22:01:13 +01:00
Bartosz Taudul
94da3b8467 Update manual. 2019-10-29 23:11:08 +01:00
Bartosz Taudul
d54ff0f9c2 Update NEWS. 2019-10-29 23:11:03 +01:00
Bartosz Taudul
1f0c18882c Don't collect sys time after application has exited. 2019-10-29 23:05:14 +01:00
Bartosz Taudul
079e21ea43 Leave two threads for smooth operation of profiler. 2019-10-29 22:53:03 +01:00
Bartosz Taudul
3e19fbc2fb Instrument functions. 2019-10-29 22:45:30 +01:00
Bartosz Taudul
516ec6883d Limit number of rendered frames. 2019-10-29 22:45:01 +01:00
Bartosz Taudul
5bcf288333 Integrate Tracy. 2019-10-29 22:27:04 +01:00
Bartosz Taudul
546eeda1cd Ignore compiled shaders. 2019-10-29 22:25:10 +01:00
Bartosz Taudul
0b1eff8b0d Add aras-p's ToyPathTracer.
https://github.com/aras-p/ToyPathTracer
b076563906169aa2f9e6d7218ef85decf81f8f72
2019-10-29 22:21:34 +01:00
Bartosz Taudul
789b95f259 Force inline small functions. 2019-10-29 01:32:09 +01:00
Bartosz Taudul
8c8f15c420 Force inline Slab::AllocInit(). 2019-10-29 01:19:40 +01:00
Bartosz Taudul
0ceba49d78 Update NEWS. 2019-10-28 23:44:54 +01:00
Bartosz Taudul
706e031046 Update manual. 2019-10-28 23:43:44 +01:00
Bartosz Taudul
6f0dc2885f Fix connection abort. 2019-10-28 23:32:51 +01:00
Bartosz Taudul
8050622b0f Read and decompress network data on a separate thread. 2019-10-28 23:22:50 +01:00
Bartosz Taudul
e0356ae01e Cosmetics. 2019-10-28 22:53:06 +01:00
Bartosz Taudul
99b7e8ad92 Close socket when shutting down. 2019-10-28 22:52:52 +01:00
Bartosz Taudul
788ca2e5df Spawn no-op network thread. 2019-10-28 22:45:10 +01:00
Bartosz Taudul
fb71800557 Update manual. 2019-10-28 22:15:12 +01:00
Bartosz Taudul
106411e1f6 Add missing freeaddrinfo(). 2019-10-27 13:39:01 +01:00
Bartosz Taudul
5956366118 Need to explicitly specify gl3w as OpenGL loader. 2019-10-27 12:45:31 +01:00
Bartosz Taudul
7f07f5beb4 Free child time stack. 2019-10-26 23:32:16 +02:00
Bartosz Taudul
312b7190f8 Mention that only release builds should be profiled. 2019-10-26 16:59:54 +02:00
Bartosz Taudul
f024a05a01 Document another funny optimization. 2019-10-26 16:49:52 +02:00
Bartosz Taudul
01985f50ef Cache source location zones counter search. 2019-10-26 16:33:40 +02:00
Bartosz Taudul
dfe99c2604 Update capture utility in the manual. 2019-10-26 16:33:40 +02:00
Bartosz Taudul
f1fe2df780 Add data transferred display to capture utility. 2019-10-26 16:18:03 +02:00
Bartosz Taudul
dda192985a General updates to the manual. 2019-10-26 16:05:43 +02:00
Bartosz Taudul
0b142a7b29 Remove FAQ. 2019-10-26 15:33:40 +02:00
Bartosz Taudul
492b7f9134 Update connection speed in the manual. 2019-10-26 14:37:45 +02:00
Bartosz Taudul
6aab54cfc4 Improve frame time graph in the manual. 2019-10-26 14:10:47 +02:00
Bartosz Taudul
f7155d7a77 Update context switches in the manual. 2019-10-26 14:00:32 +02:00
Bartosz Taudul
cccabe9b64 Update connection popup in the manual. 2019-10-26 13:54:57 +02:00
Bartosz Taudul
1d0084aa28 Add cache for last accessed source location zones. 2019-10-25 21:29:55 +02:00
Bartosz Taudul
b5419944aa Only write to memory if value has changed. 2019-10-25 21:28:55 +02:00
Bartosz Taudul
779063a18b Cache last shrunk source location. 2019-10-25 21:07:28 +02:00
Bartosz Taudul
294793367f Cache last CheckSourceLocation query.
Just knowing that the query was performed is enough here -- this
function adds a new source location entry, if there isn't one already.
2019-10-25 21:01:33 +02:00
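A one-entry cache of the sort described in 294793367f might be sketched as follows. The class and member names are hypothetical and a `std::unordered_set` stands in for the real source location storage. The sketch relies on the property stated in the commit: once a query has been performed for an id, the entry is known to exist, so repeating the same query immediately afterwards can be skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical registry with a single-entry "last query" cache.
class SourceLocationRegistry
{
public:
    // Makes sure an entry exists for `id`. Returns true if the set was
    // consulted, false if the cached last query made that unnecessary.
    bool Check( uint64_t id )
    {
        if( m_hasLast && m_last == id ) return false;
        m_last = id;
        m_hasLast = true;
        m_known.insert( id );   // no effect if the entry is already present
        return true;
    }

    std::size_t Size() const { return m_known.size(); }

private:
    std::unordered_set<uint64_t> m_known;
    uint64_t m_last = 0;
    bool m_hasLast = false;
};
```

Such a cache only helps when consecutive events tend to refer to the same source location, which is an assumption about the workload rather than something the commit states.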
Bartosz Taudul
0f2503d334 Send time deltas in GPU time events. 2019-10-25 19:52:01 +02:00
Bartosz Taudul
1ce25d3aef Init cache in-place. 2019-10-25 19:19:35 +02:00
Bartosz Taudul
8fa5188176 Send delta times for context switches. 2019-10-25 19:13:11 +02:00
Bartosz Taudul
25b3cdc1ee Send thread wakeups when handling disconnect request. 2019-10-25 18:22:42 +02:00
Bartosz Taudul
c8e5489e99 Group caches together. 2019-10-25 18:16:27 +02:00
Bartosz Taudul
29c42cc8d7 Fix assert. 2019-10-25 01:00:32 +02:00
Bartosz Taudul
17a51c898e No need to check if vector is empty. 2019-10-25 00:54:46 +02:00
Bartosz Taudul
b5e759bc5a Don't calculate child index twice. 2019-10-25 00:54:46 +02:00
Bartosz Taudul
70f1074490 Don't iterate over children to calculate zone self time. 2019-10-25 00:33:44 +02:00
Bartosz Taudul
d6a8a8532f Prevent storing variable on stack. 2019-10-24 23:40:21 +02:00
Bartosz Taudul
1fe76be955 Don't reconstruct lock event time on insert. 2019-10-24 23:25:04 +02:00
Bartosz Taudul
b83d0f46d9 Improve updating last time.
Avoid LHS (load-hit-store); don't write if there is no need to.
2019-10-24 23:23:52 +02:00
Bartosz Taudul
721f3c8925 Callstack is already zero-initialized. 2019-10-24 23:05:39 +02:00
Bartosz Taudul
45332fd837 Don't read memory when setting values. 2019-10-24 23:03:13 +02:00
Bartosz Taudul
c9da5f1474 Use cached thread retriever. 2019-10-24 22:34:18 +02:00
Bartosz Taudul
5873561b54 Add cached thread retriever. 2019-10-24 22:33:48 +02:00
Bartosz Taudul
06bc802107 Avoid load-hit-store. 2019-10-24 22:24:00 +02:00
Bartosz Taudul
04b132b6e2 Check if requested data size doesn't overflow buffer. 2019-10-24 21:22:22 +02:00
Bartosz Taudul
01ceedb57a Focus out labels in connection window. 2019-10-24 00:54:19 +02:00
Bartosz Taudul
c5a6c7bf63 Display transferred data size. 2019-10-24 00:47:25 +02:00
Bartosz Taudul
1cfb5adc44 Count transferred data size. 2019-10-24 00:47:16 +02:00
Bartosz Taudul
2d31ca993e Update NEWS. 2019-10-24 00:13:12 +02:00
Bartosz Taudul
ba61a9ed84 Transfer time deltas, not absolute times.
This change significantly reduces network bandwidth requirements.

Implemented for:
- CPU zones,
- GPU zones,
- locks,
- plots,
- memory events.
2019-10-24 00:06:41 +02:00
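The delta scheme from ba61a9ed84 can be sketched as a pair of small state machines. The names are hypothetical and the real protocol keeps such state per event stream; the sketch only shows the principle that the sender transmits the difference from the previously sent time and the receiver accumulates it. The bandwidth saving is presumably due to deltas being small numbers that compress better than large absolute timestamps, though the commit does not spell this out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sender-side state: remembers the last timestamp sent.
struct DeltaEncoder
{
    int64_t last = 0;

    int64_t Encode( int64_t time )
    {
        const int64_t delta = time - last;
        last = time;
        return delta;
    }
};

// Hypothetical receiver-side state: rebuilds absolute times from deltas.
struct DeltaDecoder
{
    int64_t last = 0;

    int64_t Decode( int64_t delta )
    {
        last += delta;
        return last;
    }
};
```

Both sides must see every delta in order; dropping one would shift all later times, which is consistent with the later note in this log that all time deltas must be processed.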
Bartosz Taudul
cf88265304 Full 64-bit register is set by rdtsc. 2019-10-21 01:13:55 +02:00
Bartosz Taudul
699ff43f1e Update timings. 2019-10-20 22:18:20 +02:00
Bartosz Taudul
07b66cd4ab Move fake source location out of loop. 2019-10-20 22:18:05 +02:00
Bartosz Taudul
909503403b Simplify delay calibration. 2019-10-20 22:13:29 +02:00
Bartosz Taudul
411e4d42ac Move disassembly from FAQ to manual. 2019-10-20 21:23:16 +02:00
Bartosz Taudul
c774534b47 Use rdtsc instead of rdtscp.
But rdtscp is serializing!

No, it's not. Quoting the Intel Instruction Set Reference:

"The RDTSCP instruction is not a serializing instruction, but it does
wait until all previous instructions have executed and all previous
loads are globally visible. But it does not wait for previous stores to
be globally visible, and subsequent instructions may begin execution
before the read operation is performed.",

"The RDTSC instruction is not a serializing instruction. It does not
necessarily wait until all previous instructions have been executed
before reading the counter. Similarly, subsequent instructions may begin
execution before the read operation is performed."

So, the difference is in waiting for prior instructions to finish
executing. Notice that even in the rdtscp case, execution of the
following instructions may commence before time measurement is finished
and data stores may be still pending.

But, you may say, Intel in its "How to Benchmark Code Execution Times"
document shows that using rdtscp is superior to rdtsc. Well, not
exactly. What they do show is that when a *single function* is
considered, there are ways to measure its execution time with little to
no error.

This is not what Tracy is doing.

In our case there is no way to determine absolute "this is before" and
"this is after" points of a zone, as we probably already are inside
another zone.  Stopping the CPU execution, so that a deeply nested zone
may be measured with great precision, will skew the measurements of all
parent zones.

And this is not what we want to measure, anyway. We are not interested
in how a *single function* behaves, but how a *whole program* behaves.
The out-of-order CPU behavior may influence the measurements? Good! We
are interested in that. We want to see *how* the code is really
executed. How is *stopping* the CPU to make a timer read an appropriate
thing to do, when we want to see how a program is performing?

At least that's the theory.

And besides all that, the profiling overhead is now reduced.
2019-10-20 20:52:33 +02:00
Bartosz Taudul
30fc2f02ab Omit calculation of on-stack variable address. 2019-10-20 19:42:29 +02:00
Bartosz Taudul
5c92eae3ed Add early exit for invalid times. 2019-10-20 18:47:50 +02:00
Bartosz Taudul
d592af9c2f Fix TRACY_NO_STATISTICS build. 2019-10-20 17:32:20 +02:00
Bartosz Taudul
5816dc2b11 Don't cache timedist data if ctx switch data is incomplete. 2019-10-20 17:03:30 +02:00
Bartosz Taudul
ccdc102d5a Cache zone time distribution data. 2019-10-20 03:24:58 +02:00
Bartosz Taudul
4d761def61 Microoptimize comparison. 2019-10-16 20:26:39 +02:00
Bartosz Taudul
14292f9e35 Update manual. 2019-10-15 21:57:49 +02:00
Bartosz Taudul
f89bc970ee Update NEWS. 2019-10-15 21:50:22 +02:00
Bartosz Taudul
bfbd09b619 Add CPU usage graph tooltip. 2019-10-15 21:47:37 +02:00
Bartosz Taudul
7a9d4aecd3 Fix graph height calculation. 2019-10-15 21:41:06 +02:00
Bartosz Taudul
4372ad1bc3 Allow disabling CPU usage graph. 2019-10-15 21:37:16 +02:00
Bartosz Taudul
c28bab59b5 Improve look of CPU usage graph. 2019-10-15 21:20:00 +02:00
Bartosz Taudul
5aeeefefbd Draw CPU usage graph. 2019-10-15 16:55:15 +02:00
Bartosz Taudul
3ae5c125f6 Implement counting CPU usage (ctx switch) at a given time. 2019-10-15 16:54:43 +02:00
Bartosz Taudul
3ce6b1205f Don't iterate over 256 CPUs. 2019-10-15 16:13:53 +02:00
Bartosz Taudul
eccb0b1e4a Track max CPU present in context switch data. 2019-10-15 16:13:53 +02:00
Bartosz Taudul
bdb8516d04 Make sure context switch end time wasn't set already. 2019-10-15 14:54:28 +02:00
Bartosz Taudul
a20c6604c3 Add natvis for ContextSwitchData and ContextSwitchCpu. 2019-10-15 14:11:02 +02:00
Bartosz Taudul
fefa3b4693 Improve options UI. 2019-10-15 01:49:36 +02:00
Bartosz Taudul
dffe65f8e2 Update manual. 2019-10-14 20:52:18 +02:00
Bartosz Taudul
f0c77b4ef4 Add annotation list window. 2019-10-14 20:52:18 +02:00
Bartosz Taudul
1ad246b4ca Update manual. 2019-10-14 20:17:28 +02:00
Bartosz Taudul
c6207ed0e9 Move extra tools to main window button bar popup. 2019-10-14 20:07:55 +02:00
Bartosz Taudul
fc7f77eb7a Add implementation of disablable button. 2019-10-14 20:06:57 +02:00
Bartosz Taudul
6de8e6987f Sort annotations. 2019-10-14 19:04:37 +02:00
Bartosz Taudul
5c47467c88 Fix includes. 2019-10-13 17:13:15 +02:00
Bartosz Taudul
671a8f673e Don't interact with unfocused annotations. 2019-10-13 17:01:55 +02:00
Bartosz Taudul
98ab83c69b Update manual. 2019-10-13 17:00:07 +02:00
Bartosz Taudul
ae2c9b4859 Update NEWS. 2019-10-13 16:30:07 +02:00
Bartosz Taudul
e462335f83 Save/load annotations. 2019-10-13 16:29:24 +02:00
Bartosz Taudul
c2f38d0db7 Implement removal of user data files. 2019-10-13 16:29:02 +02:00
Bartosz Taudul
9d0316342d Move Annotation struct to a proper place. 2019-10-13 16:28:40 +02:00
Bartosz Taudul
20cf1d9f83 Implement color selection for annotation region. 2019-10-13 16:14:22 +02:00
Bartosz Taudul
f9e860f559 Display annotation text on timeline. 2019-10-13 15:59:48 +02:00
Bartosz Taudul
1527e7bc10 Add annotation modification window. 2019-10-13 15:50:37 +02:00
Bartosz Taudul
5fed86dae7 Allow adding annotations to timeline. 2019-10-13 15:28:52 +02:00
Bartosz Taudul
215dc8a804 More compact GpuEvent struct (save 4 bytes).
Memory usage reduction of various traces:

big          9011 MB -> 9007 MB
frameimages   561 MB ->  552 MB
fi-big       4144 MB -> 4139 MB
long         5253 MB -> 5125 MB
2019-10-13 14:42:52 +02:00
Bartosz Taudul
c044df6324 Display number of GPU zones. 2019-10-13 14:21:28 +02:00
Bartosz Taudul
1ae49c14a2 GPU zone count accessor. 2019-10-13 14:13:28 +02:00
Bartosz Taudul
5e1894dd79 Count GPU zones. 2019-10-13 14:13:04 +02:00
Bartosz Taudul
c3870f8837 Use proper type. 2019-10-10 20:30:08 +02:00
Bartosz Taudul
707f113bda Add missing NOMINMAX definitions. 2019-10-10 20:29:06 +02:00
Bartosz Taudul
d4620b4157 Fix UI. 2019-10-09 22:33:02 +02:00
Bartosz Taudul
0a358ac1f0 Time distribution may now only include running time. 2019-10-09 22:13:52 +02:00
Bartosz Taudul
6ced346e08 Different sorting modes for zone time distribution. 2019-10-09 21:42:46 +02:00
Bartosz Taudul
f03b7f33ed Update NEWS. 2019-10-07 22:33:55 +02:00
Bartosz Taudul
ed1f722c51 Display trace file name in trace info window. 2019-10-07 21:36:19 +02:00
Bartosz Taudul
4c4099877d Track trace file name in TracyView. 2019-10-07 21:36:19 +02:00
Bartosz Taudul
c6f320d2d8 Store file name in FileRead. 2019-10-07 21:32:27 +02:00
Bartosz Taudul
7cf3608493 Avoid unused variables. 2019-10-05 02:11:45 +02:00
Bartosz Taudul
4ba885ac95 Update manual. 2019-10-04 21:47:30 +02:00
Bartosz Taudul
3e3bd30290 Update NEWS. 2019-10-04 21:38:08 +02:00
Bartosz Taudul
1cd5ccb3c1 Display zone time distribution. 2019-10-04 21:34:00 +02:00
Bartosz Taudul
871e1f1c37 Describe workaround for exiting from within a zone. 2019-10-04 20:43:08 +02:00
Bartosz Taudul
e481b5ba22 Add missing thread sent indication. 2019-10-04 19:18:47 +02:00
Bartosz Taudul
4e7e9ee3b1 Update manual. 2019-10-04 18:53:06 +02:00
Bartosz Taudul
5111275770 Highlight hovered zone on the find zone zones list. 2019-10-04 13:02:26 +02:00
Bartosz Taudul
b913c17f5b Add "no grouping" mode to find zone zones list. 2019-10-04 12:42:05 +02:00
Bartosz Taudul
9e1935f070 Make C API symbols visible across dlls. 2019-10-03 22:39:26 +02:00
Bartosz Taudul
eba64b30a3 TracySystem.cpp should be always compiled in. 2019-10-03 22:34:06 +02:00
Bartosz Taudul
f2bb933f49 Use proper background color. 2019-10-02 00:49:30 +02:00
Bartosz Taudul
3b223c64d4 Darken to background color to hide overhang.
This only handles the root window case. When the profiler is embedded in
another application, the window background color is not matched.
2019-10-01 23:17:36 +02:00
Bartosz Taudul
db29d309a2 Lambda capture is not needed here. 2019-10-01 22:42:43 +02:00
Bartosz Taudul
68f476834f Make sure TracyCountBits() always returns uint64_t. 2019-10-01 22:42:29 +02:00
Bartosz Taudul
65ea33a60f Store memory callstack data as 24-bit ints.
This reduces MemEvent size from 40 to 38 bytes.

Memory usage reduction:

chicken     2027 -> 2019
mem         6468 -> 6308
q3bsp-mt    5304 -> 5283
2019-10-01 22:38:17 +02:00
Bartosz Taudul
f0b957ec56 Store callstacks on 24 bits.
ZoneEvent is now 27 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9224 -> 9011  (97%)
chicken         2044 -> 2027  (99%)
drl-l-b         1443 -> 1383  (95%)
long            5327 -> 5253  (98%)
q3bsp-mt        5400 -> 5304  (98%)
selfprofile     1403 -> 1382  (98%)
2019-10-01 22:38:17 +02:00
Bartosz Taudul
c631e33f81 Add 24-bit int implementation. 2019-10-01 21:48:34 +02:00
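A minimal sketch of what a 24-bit integer wrapper along the lines of the commits above could look like. The type name `Int24` and its methods are assumptions made for illustration, not the names used in the repository; the idea is only that an index known to fit in 24 bits can be stored in three bytes instead of four.

```cpp
#include <cstdint>

// Hypothetical 24-bit unsigned integer kept in three bytes (little-endian order).
// Illustrative only; not the repository's actual implementation.
struct Int24
{
    Int24() = default;
    explicit Int24( uint32_t v ) { Set( v ); }

    void Set( uint32_t v )
    {
        // Only the low 24 bits are stored; anything above is silently dropped,
        // so callers are expected to pass values below 2^24.
        m_val[0] = uint8_t( v );
        m_val[1] = uint8_t( v >> 8 );
        m_val[2] = uint8_t( v >> 16 );
    }

    uint32_t Val() const
    {
        return uint32_t( m_val[0] )
             | ( uint32_t( m_val[1] ) << 8 )
             | ( uint32_t( m_val[2] ) << 16 );
    }

private:
    uint8_t m_val[3] = {};
};

// Byte-array storage has alignment 1, so the struct adds no padding.
static_assert( sizeof( Int24 ) == 3, "Int24 must occupy exactly three bytes" );
```

Because the storage is a byte array, such a type can sit next to other odd-sized fields in a packed event struct without forcing 4-byte alignment, at the cost of reassembling the value on every read.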
Bartosz Taudul
472959b29f Remove irrelevant comment. 2019-10-01 01:15:43 +02:00
Bartosz Taudul
717a212563 Save another 2 bytes per ZoneEvent.
ZoneEvent is now 28 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9527 -> 9224  (96%)
chicken         2107 -> 2044  (97%)
drl-l-b         1479 -> 1443  (97%)
long            5412 -> 5327  (98%)
q3bsp-mt        5592 -> 5400  (96%)
selfprofile     1443 -> 1403  (97%)
2019-10-01 01:05:37 +02:00
Bartosz Taudul
4964aa9547 Assert on getting index only for active strings. 2019-10-01 00:40:58 +02:00
Bartosz Taudul
acfcfb09ce Hide context switch options, if no data is available. 2019-09-30 23:46:10 +02:00
Bartosz Taudul
ffdb6d8a3b Update manual. 2019-09-30 23:43:07 +02:00
Bartosz Taudul
36fdc71588 Update NEWS. 2019-09-30 23:40:13 +02:00
Bartosz Taudul
0e56682964 Darkening of inactive thread regions. 2019-09-30 23:37:36 +02:00
Bartosz Taudul
e758e98ca4 Update manual. 2019-09-29 21:16:44 +02:00
Bartosz Taudul
80ff267a77 Update NEWS. 2019-09-29 21:03:40 +02:00
Bartosz Taudul
599fa17e4f Expose extreme compression level in update utility. 2019-09-29 21:03:08 +02:00
Bartosz Taudul
6e7e8eff87 Set extreme compression level to really be extreme. 2019-09-29 21:02:01 +02:00
Bartosz Taudul
2470936050 Don't perform background tasks during trace upgrade. 2019-09-29 20:52:25 +02:00
Bartosz Taudul
947eb56f3d Add loading/saving messages to update utility. 2019-09-29 20:48:18 +02:00
Bartosz Taudul
d228bcb622 Pack StringIdx in 24 bits.
This reduces ZoneEvent size from 32 to 30 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9902 -> 9527  (96%)
chicken         2172 -> 2107  (97%)
ctx-big          311 ->  309  (99%)
drl-l-b         1570 -> 1479  (94%)
long            5496 -> 5412  (98%)
mem             6468 -> 6468  (100%)
q3bsp-mt        5784 -> 5592  (96%)
selfprofile     1486 -> 1443  (97%)
2019-09-29 20:32:42 +02:00
Bartosz Taudul
781ebeb835 Add table initializing alloc to slab allocator. 2019-09-29 20:18:16 +02:00
Bartosz Taudul
59632f0d37 One more place to check if srcloc zones are ready. 2019-09-29 20:17:47 +02:00
Bartosz Taudul
873d536845 Display number of strings. 2019-09-29 19:22:50 +02:00
Bartosz Taudul
c91ae667d1 Add string count getter. 2019-09-29 19:22:15 +02:00
Bartosz Taudul
cb6a3f3334 Highlight CPU data timeline from thread tooltip. 2019-09-29 18:55:31 +02:00
Bartosz Taudul
3b8ab5715f Highlight CPU data timeline from CPU data window. 2019-09-29 18:53:58 +02:00
Bartosz Taudul
cafb5d6a99 Highlight threads on CPU data timeline. 2019-09-29 18:49:48 +02:00
Aleksei Skriabin
05a2fa487f Merged in Vuhdo/tracy/strstr_nocase_fix (pull request #41)
strstr_nocase() typo fix.
2019-09-28 17:55:31 +00:00
Aleksei Skriabin
c0c2f4536a strstr_nocase() typo fix. 2019-09-28 14:20:29 +05:00
Bartosz Taudul
2356069eac Update manual. 2019-09-27 18:15:32 +02:00
Bartosz Taudul
130365f4ff Inject tracy_systrace into filesystem and use instead of cat.
Statistics for a one-minute trace:

  Capture tool | Running time | Running regions
---------------+--------------+-----------------
      cat      |    25.11 s   |     392,300
tracy_systrace |    10.41 s   |      12,249
2019-09-27 15:51:29 +02:00
Bartosz Taudul
3dba4088ee Embed precompiled tracy_systrace for android. 2019-09-27 15:50:58 +02:00
Bartosz Taudul
0850a5e4a3 Use a proper build script. 2019-09-27 00:06:45 +02:00
Bartosz Taudul
6094d69479 Manually load required symbols. 2019-09-27 00:05:41 +02:00
Bartosz Taudul
9de2d312a3 Tiny binary. 2019-09-26 23:54:08 +02:00
Bartosz Taudul
6f5dd44f1f Helper for reading data from kernel more efficiently. 2019-09-26 22:55:02 +02:00
Bartosz Taudul
c09f3c0676 Add thread color boxes to CPU data window. 2019-09-25 02:12:35 +02:00
Bartosz Taudul
0cc0b456cc Update NEWS. 2019-09-24 23:59:20 +02:00
Bartosz Taudul
6c5627d8e4 Add thread color boxes to memory allocations listings. 2019-09-24 23:58:11 +02:00
Bartosz Taudul
581fd920a1 Add thread color boxes to lock info. 2019-09-24 23:52:52 +02:00
Bartosz Taudul
12e2bcb691 Add thread color boxes to zone info windows. 2019-09-24 23:51:47 +02:00
Bartosz Taudul
ad2dd09c25 Add thread color boxes to zone tooltips. 2019-09-24 23:50:00 +02:00
Bartosz Taudul
47f81d0ba4 Add thread color box to memory plot tooltip. 2019-09-24 23:47:51 +02:00
Bartosz Taudul
9c86102bad Add thread color box to CPU data on timeline. 2019-09-24 23:46:54 +02:00
Bartosz Taudul
a7e3324eba Add thread color boxes to GPU context tooltips. 2019-09-24 23:45:36 +02:00
Bartosz Taudul
6ffbd00b0c Add thread color box to crash info. 2019-09-24 23:42:25 +02:00
Bartosz Taudul
c73a74b8d5 Add thread color boxes to memory allocation info. 2019-09-24 23:41:28 +02:00
Bartosz Taudul
e9b815a3b8 Show thread color boxes in find zone menu. 2019-09-24 23:38:29 +02:00
Bartosz Taudul
06fe469598 Add thread color boxes to messages thread list. 2019-09-24 23:33:33 +02:00
Bartosz Taudul
e7578777c3 Update ImGui to 1.73. 2019-09-24 23:32:03 +02:00
Bartosz Taudul
63184f8762 Better Vulkan thread heuristics. 2019-09-24 00:55:24 +02:00
Bartosz Taudul
891e7711e9 Update manual. 2019-09-24 00:20:41 +02:00
Bartosz Taudul
49abad2dec Update manual. 2019-09-23 17:30:00 +02:00
Bartosz Taudul
7503bb1aee Update NEWS. 2019-09-23 17:28:32 +02:00
Bartosz Taudul
a5ba74ed13 Handle multiple Vulkan threads. 2019-09-23 17:27:49 +02:00
Bartosz Taudul
0f68e1e981 Send thread id in GPU zone end message.
We don't care about OpenGL zone thread ids, so the identifier is zeroed.
2019-09-23 16:06:14 +02:00
Bartosz Taudul
daf64c703a Serialize Vulkan GPU profiling messages.
Since Vulkan can be multi-threaded, the guarantee of GPU time data
arriving after CPU time data can't be held with asynchronous messages.
Use serial queue instead.
2019-09-23 15:38:16 +02:00
Bartosz Taudul
9a49f49cfd Also build test with TRACY_ON_DEMAND enabled. 2019-09-21 15:50:27 +02:00
Bartosz Taudul
2a9b1b3cf3 Allow easy adding of tracy flags in test application. 2019-09-21 15:49:54 +02:00
Bartosz Taudul
a5fecc350b Update manual. 2019-09-21 15:47:37 +02:00
Bartosz Taudul
c2728832af Update NEWS. 2019-09-21 15:43:31 +02:00
Bartosz Taudul
82cd667b30 Allow specifying network port in server. 2019-09-21 15:43:01 +02:00
Bartosz Taudul
fb63dd89bc Update manual. 2019-09-21 15:21:29 +02:00
Bartosz Taudul
e13cbf52fd Allow changing tracy port in client. 2019-09-21 15:11:15 +02:00
Bartosz Taudul
140654961c Update NEWS. 2019-09-21 15:03:35 +02:00
Bartosz Taudul
dfb9ae1a90 Update manual. 2019-09-21 15:03:09 +02:00
Bartosz Taudul
a221f121ba Extract lock state handling to a separate context class. 2019-09-21 14:55:14 +02:00
Bartosz Taudul
4c736aecfa Use fibonacci hashing to determine thread colors. 2019-09-21 14:03:42 +02:00
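For readers unfamiliar with the technique named in the commit above, here is a hedged sketch of Fibonacci hashing in general. The function names and the choice of an 8-bit result are assumptions; how the profiler actually turns the hash into a color is not stated in this log.

```cpp
#include <cstdint>

// Fibonacci hashing: multiply the key by 2^32 divided by the golden ratio
// (2654435769 == 0x9E3779B9) and keep the top `bits` bits of the product.
// `bits` must be in the range [1, 32].
inline uint32_t FibonacciHash( uint32_t key, unsigned bits )
{
    // Unsigned multiplication wraps modulo 2^32, which is what the scheme relies on.
    return uint32_t( key * 2654435769u ) >> ( 32 - bits );
}

// Hypothetical use: map a thread id to one of 256 palette slots.
inline uint32_t ThreadColorIndex( uint32_t tid )
{
    return FibonacciHash( tid, 8 );
}
```

Consecutive keys such as small, sequential thread ids land far apart in the output range, which is the property that makes the scheme attractive for picking visually distinct values.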
Bartosz Taudul
7a1fb4e0bd Proper message when call stack trees are not available. 2019-09-21 00:57:12 +02:00
Bartosz Taudul
46f7235e32 Display proper message when there are no active allocations. 2019-09-21 00:54:30 +02:00
Bartosz Taudul
feddd58b46 Better way to scale ImGui style. 2019-09-21 00:52:13 +02:00
Bartosz Taudul
d8e0853cd8 Multithreaded frame image compression. 2019-09-20 23:03:12 +02:00
Bartosz Taudul
6f5a23a198 Add task dispatcher to server. 2019-09-20 22:58:12 +02:00
Bartosz Taudul
e1e5d6bd47 Add const version of PackFrameImage().
Temporary buffer needs to be handled outside of the function.
2019-09-20 22:55:55 +02:00
Bartosz Taudul
b362baed5f Minor UI improvements. 2019-09-19 01:10:33 +02:00
Bartosz Taudul
6fbfd12d1f Update manual. 2019-09-16 22:02:47 +02:00
Bartosz Taudul
0fed97b241 Update NEWS. 2019-09-16 22:02:47 +02:00
Bartosz Taudul
6a0512fe16 Allow comparing frame times. 2019-09-16 22:02:47 +02:00
Bartosz Taudul
8fe9b56b6f Calculate frame statistics. 2019-09-16 22:02:47 +02:00
Bartosz Taudul
b99675ae60 Use thread color for collapsed zones. 2019-09-16 20:34:55 +02:00
Bartosz Taudul
36b2b8f71f Always return static thread color if dynamic colors are disabled. 2019-09-16 20:31:32 +02:00
Bartosz Taudul
5796f19a3b Focus out exact memory plot value. 2019-09-16 20:27:16 +02:00
Bartosz Taudul
7673028dba Fix skipping memory data. 2019-09-16 15:42:25 +02:00
Bartosz Taudul
5429f04614 Don't use source location data before it's ready. 2019-09-16 15:37:57 +02:00
Bartosz Taudul
e0105451f6 Update manual. 2019-09-12 20:14:54 +02:00
Bartosz Taudul
ab5a128674 Update NEWS. 2019-09-12 20:09:37 +02:00
Bartosz Taudul
6d00a56c61 Draw thread migrations across CPU cores. 2019-09-12 20:08:57 +02:00
Bartosz Taudul
c1731f864b Update manual. 2019-09-11 19:05:53 +02:00
Bartosz Taudul
23b6e5156b Display thread color in thread tooltip. 2019-09-11 19:01:27 +02:00
Bartosz Taudul
2872edce5d Use thread colors in context switch graph. 2019-09-11 18:56:54 +02:00
Bartosz Taudul
8ddafe4153 Extract color highlight functionality. 2019-09-11 18:52:25 +02:00
Bartosz Taudul
0850145811 Disable color box drag and drop. 2019-09-11 18:48:28 +02:00
Bartosz Taudul
2cec6f5482 Add thread colors to options menu. 2019-09-11 18:44:06 +02:00
Bartosz Taudul
4ea62ecb06 Extract small color box drawing. 2019-09-11 18:38:10 +02:00
Bartosz Taudul
00409b0b94 Extract thread color getter. 2019-09-11 18:34:48 +02:00
Bartosz Taudul
9cd359f0b9 Update manual. 2019-09-08 14:38:40 +02:00
Bartosz Taudul
2464eedbd4 Update NEWS. 2019-09-08 14:34:35 +02:00
Bartosz Taudul
a5a6b11b63 Zones can now have dynamic colors. 2019-09-08 14:33:30 +02:00
Bartosz Taudul
2714152f84 Allow calculating zone depth. 2019-09-08 14:16:12 +02:00
Bartosz Taudul
cdc4575dba Setup tid -> thread data mapping when loading trace. 2019-09-08 14:15:40 +02:00
Bartosz Taudul
ea6a0a58a7 Thread data accessor. 2019-09-08 14:07:16 +02:00
Bartosz Taudul
3a9ff94580 Update manual. 2019-09-08 13:38:19 +02:00
Bartosz Taudul
1e57ae3f49 Update NEWS. 2019-09-08 13:20:13 +02:00
Bartosz Taudul
c9a1d3d7e5 Display zone color in zone info window. 2019-09-08 13:19:43 +02:00
Bartosz Taudul
b7522ec4c1 Allow getting zone color sans highlights, etc. 2019-09-08 13:16:00 +02:00
Bartosz Taudul
6ef282dd1a Notify user that the data might not be correct. 2019-09-07 18:20:26 +02:00
Bartosz Taudul
17e6a97552 Let's leave this here. 2019-09-07 17:49:54 +02:00
Bartosz Taudul
a0814a2e5c Correctly calculate discontinuous frames time. 2019-09-07 17:39:39 +02:00
Bartosz Taudul
aac0a36a2d Don't use source location zones before they are ready. 2019-09-07 17:23:11 +02:00
Bartosz Taudul
70ae2f763d Update manual. 2019-09-07 17:20:51 +02:00
Bartosz Taudul
b4e019e7e7 Update NEWS. 2019-09-07 17:00:05 +02:00
Bartosz Taudul
3449f0777e Display zone time on frames plot. 2019-09-07 16:55:49 +02:00
Bartosz Taudul
0b1a6047f6 Add different highlight for zones selected on histogram. 2019-09-07 15:33:11 +02:00
Bartosz Taudul
57a2b62edc Display number of threads for pids in CPU data list. 2019-09-04 01:43:56 +02:00
Bartosz Taudul
0837463f05 Describe how wonderful linux interfaces are. 2019-09-03 21:45:19 +02:00
Bartosz Taudul
37661fd2ee Fix 32 bit NEON version of DXT1 compression.
This reverts commit b32e8fa24e.

Apparently it is possible to receive non-uniform data in alpha channel, which
breaks the original assumption about not needing the mask. This seemed to be a
problem only on 32 bit NEON implementation of DXT1 compression. Other
implementations handle such data without degradation of visual output.
2019-09-03 21:37:07 +02:00
Bartosz Taudul
aa2530d442 Display external thread name (if applicable) on CPU data timeline. 2019-08-31 19:37:05 +02:00
Bartosz Taudul
be36e7a19c Update manual. 2019-08-31 01:08:03 +02:00
Bartosz Taudul
86cb477811 Pack ZoneThreadData.
This reduces struct size from 10 to 8 bytes. Assumes 48-bit pointers
(4-level paging)!

Memory savings (MB):

android     2766    ->  2757    (99%)
big         10.29 G ->  9902    (96%)
chicken     2244    ->  2172    (96%)
ctx-android 228     ->  224     (98%)
drl-l-b     1635    ->  1570    (96%)
gn-vulkan   244     ->  240     (98%)
long        5656    ->  5496    (97%)
q3bsp-mt    6043    ->  5784    (95%)
selfprofile 1554    ->  1486    (95%)
2019-08-31 00:55:51 +02:00
Bartosz Taudul
3ec534cdf3 Prevent "ntdll.dll" from appearing as a thread name. 2019-08-30 23:09:07 +02:00
Bartosz Taudul
7a6564feae Only recycle producers, if there's no data in queue.
("The queue" is per-thread partial queue here.)

This fixes a problem where one thread writes to the queue, then is
terminated, making the (partially filled) queue available for other
threads to recycle. If another thread re-owns the queue, it will change
the associated thread id, while part of the queue was filled by the
original thread. This obviously created invalid data during dequeue.

The fix makes the recycling process check not only for queue inactivity
(which is marked when the original thread terminates), but also if the
queue is empty, preventing mixing data from different threads.
2019-08-30 14:28:44 +02:00
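The recycling rule described in the commit above can be sketched as follows. The `Producer` struct, its fields, and `CanRecycle()` are hypothetical names chosen for illustration; the real queue tracks this state differently.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical per-thread partial queue state.
struct Producer
{
    std::atomic<bool> inactive{ false };  // set when the owning thread terminates
    std::atomic<size_t> pending{ 0 };     // items enqueued but not yet dequeued
};

// A producer may be handed to another thread only if its previous owner is gone
// AND nothing written by that owner is still waiting in the queue. Checking
// inactivity alone would let a new owner change the thread id while old data
// is still queued.
inline bool CanRecycle( const Producer& p )
{
    return p.inactive.load( std::memory_order_acquire )
        && p.pending.load( std::memory_order_acquire ) == 0;
}
```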
Bartosz Taudul
1c0c6311ec Fix skipping data when loading traces. 2019-08-30 01:16:42 +02:00
Bartosz Taudul
217a3781e6 Fix possible wrong process name for pid 0. 2019-08-30 00:59:54 +02:00
Bartosz Taudul
19f8f9f101 Use proper type. 2019-08-30 00:56:11 +02:00
Bartosz Taudul
a8d204821e Signed left shift is undefined. 2019-08-29 18:42:29 +02:00
Bartosz Taudul
adfc4eb59b Store UdpListen instance in a unique ptr. 2019-08-29 18:36:55 +02:00
Bartosz Taudul
5e8b2a0723 Display wakeup times in zone wait regions list. 2019-08-28 23:03:16 +02:00
Bartosz Taudul
0e89105bdb Update NEWS. 2019-08-28 21:40:58 +02:00
Bartosz Taudul
fc0593a840 Update manual. 2019-08-28 21:38:51 +02:00
Bartosz Taudul
6f25ad5fcb Save per-trace options. 2019-08-28 21:35:08 +02:00
Bartosz Taudul
fc5293b1ae Only scroll message list to bottom if capture is live. 2019-08-28 21:04:28 +02:00
Bartosz Taudul
a2f968d843 Compress thread id in MessageData. 2019-08-28 21:03:01 +02:00
Bartosz Taudul
ede26b0caf Fix skipping zone levels. 2019-08-28 20:47:19 +02:00
Bartosz Taudul
ee14ff6d6e Update manual. 2019-08-28 20:39:29 +02:00
Bartosz Taudul
85027c185d Extract notification area drawing to a separate function. 2019-08-28 20:27:39 +02:00
Bartosz Taudul
a8eb99efcc Add notification icons when drawing a category is disabled. 2019-08-28 20:24:14 +02:00
Bartosz Taudul
5b0ccef373 Change some icons. 2019-08-28 20:17:38 +02:00
Bartosz Taudul
fd5014be6f GetThreadString() is no longer used. 2019-08-28 20:08:16 +02:00
Bartosz Taudul
28a20e631e Preserve frame graph position and scale. 2019-08-28 19:52:36 +02:00
Bartosz Taudul
17d4a82ca5 Preserve timeline vertical scroll position. 2019-08-28 19:49:27 +02:00
Bartosz Taudul
abde0c252d Update NEWS. 2019-08-28 19:46:08 +02:00
Bartosz Taudul
f37797db44 Save/load view state. 2019-08-28 19:45:22 +02:00
Bartosz Taudul
dc5444ff0f Notify UserData that view state should be preserved.
This is only active when a trace is loaded from a file (and state should
be persistent for future sessions using this trace), or when state is
saved to a file (so that future sessions will use current state).

No state is preserved by default, i.e. when the trace was not saved to a
file.
2019-08-28 19:37:01 +02:00
Bartosz Taudul
949c9cb121 Move some view data to a separate structure. 2019-08-28 19:35:54 +02:00
Bartosz Taudul
38bfae13dd Add helper function for opening files. 2019-08-28 19:28:31 +02:00
Bartosz Taudul
2a0d6ce4ad Add notification area indicator for hidden timeline items. 2019-08-28 18:36:05 +02:00
Bartosz Taudul
ed83762a1a Keep things simple. 2019-08-28 01:29:58 +02:00
Bartosz Taudul
d95e24f66b Update NEWS. 2019-08-27 23:20:56 +02:00
Bartosz Taudul
ef287c8aab Display external thread names of profiled program on CPU data timeline. 2019-08-27 23:17:53 +02:00
Bartosz Taudul
8eb7220dd7 Use the new thread name getter. 2019-08-27 23:08:14 +02:00
Bartosz Taudul
3c092b4bec Add thread name getter combining local and external thread names. 2019-08-27 23:00:13 +02:00
Bartosz Taudul
f8e3d1ad0a Try to fix current program's thread names.
External thread names can be cut-off to include only the first 15-or-so
characters. If a local thread name is known and its beginning matches
the external name, use the local name instead.
2019-08-27 22:41:03 +02:00
Bartosz Taudul
8bb13ca09e Use captured program name in CPU data.
This fixes android application names, which are cut to show only last
15-or-so letters.
2019-08-27 22:35:53 +02:00
Bartosz Taudul
f76f38777e Signed minus unsigned is unsigned... 2019-08-26 19:09:12 +02:00
Bartosz Taudul
eb78ecd0fd Display frame number in playback window. 2019-08-26 19:01:59 +02:00
Bartosz Taudul
3e4d3efbdb Extract frame number getter. 2019-08-26 19:01:51 +02:00
Bartosz Taudul
00b26c1acf Fix TRACY_NO_SYSTEM_TRACING. 2019-08-26 18:02:10 +02:00
Bartosz Taudul
fbeee3cf61 Fix (?) invalid function pointer signature. 2019-08-26 17:59:58 +02:00
Bartosz Taudul
78127dc357 System threads only allow limited information queries. 2019-08-25 00:33:22 +02:00
Bartosz Taudul
e5a11ad593 Allow sorting CPU data table by different columns. 2019-08-25 00:17:06 +02:00
Bartosz Taudul
5e2614bcfa Update NEWS. 2019-08-24 23:50:18 +02:00
Bartosz Taudul
4376757912 Display thread ids in options menu. 2019-08-24 23:43:36 +02:00
Bartosz Taudul
2b9ec14c92 Display threads ids as base-10 numbers. 2019-08-24 23:41:33 +02:00
Bartosz Taudul
deb59b4c38 Somehow fix event ordering. 2019-08-24 01:43:55 +02:00
Bartosz Taudul
1e74a89924 Check if there's data to read from kernel.
Reading from kernel pipe, while being a blocking operation, spin locks the
thread.
2019-08-24 01:06:21 +02:00
Bartosz Taudul
8f6e94d75c Sleep if sys trace pipe buffer underruns. 2019-08-24 00:42:00 +02:00
Bartosz Taudul
2d50d07438 Allow completely disabling system tracing. 2019-08-21 01:16:25 +02:00
Bartosz Taudul
5c8937eba2 Update manual. 2019-08-20 23:59:47 +02:00
Bartosz Taudul
0cbb853945 Add missing SetThreadName() calls. 2019-08-20 16:23:00 +02:00
Bartosz Taudul
332262dd84 Shorter thread names. 2019-08-20 16:22:54 +02:00
Bartosz Taudul
247acd03ee Kernel tracing on android. 2019-08-20 15:49:40 +02:00
Bartosz Taudul
e427d67347 Don't bail out if unimportant variables are not available. 2019-08-20 12:19:05 +02:00
Bartosz Taudul
bfda30be0b Use su on android to set tracing variables. 2019-08-20 12:18:46 +02:00
Bartosz Taudul
1712431dfd Compress external threads. Saves 4 bytes per ctx switch.
Dropped support for loading context switch data in previous versions of
traces.
2019-08-19 23:09:58 +02:00
Bartosz Taudul
21e7a4bb16 Extract thread compression into a separate class. 2019-08-19 22:56:58 +02:00
Bartosz Taudul
94382f54ca Move FileVersion() to TracyFileHeader.hpp. 2019-08-19 22:56:58 +02:00
Bartosz Taudul
9d87a8394d Add missing getline() implementation for android API < 18. 2019-08-19 15:26:09 +02:00
Bartosz Taudul
fd245bb5df Fix includes for gettid() on android. 2019-08-19 15:09:47 +02:00
Bartosz Taudul
9be6f4a414 Fix typo. 2019-08-19 13:03:37 +02:00
Bartosz Taudul
d209bb4d01 Add missing function pointer checks. 2019-08-19 12:47:27 +02:00
Bartosz Taudul
e60b2884f4 Mark local threads with different color. 2019-08-18 14:57:44 +02:00
Bartosz Taudul
19857473e3 Also collect information on local threads. 2019-08-18 14:56:17 +02:00
Bartosz Taudul
9a3974b8f1 Display process times in graphical form. 2019-08-18 14:51:25 +02:00
Bartosz Taudul
2eed28b19f Highlight current process. 2019-08-18 14:46:59 +02:00
Bartosz Taudul
ae9cae781a Display CPU migrations percentage. 2019-08-18 14:44:00 +02:00
Bartosz Taudul
691fe06bfe Compare pids to determine if thread is local untracked. 2019-08-18 14:40:04 +02:00
Bartosz Taudul
95f4162870 Display number of tracked processes. 2019-08-18 14:30:52 +02:00
Bartosz Taudul
7a036b56b1 Add icon to CPU data button. 2019-08-18 14:30:01 +02:00
Bartosz Taudul
c5060da185 Display unknown pid as unknown. 2019-08-18 14:28:56 +02:00
Bartosz Taudul
faac08865a Display basic information about CPU usage. 2019-08-18 12:28:38 +02:00
Bartosz Taudul
3b8518f7b6 Save/load CPU thread data. 2019-08-18 01:53:38 +02:00
Bartosz Taudul
62dbe522c5 Add accessors. 2019-08-18 01:51:02 +02:00
Bartosz Taudul
103645c2fa Calculate cpu thread data statistics. 2019-08-18 01:50:49 +02:00
Bartosz Taudul
1498417a8d Save/load tid to pid mapping. 2019-08-17 22:36:21 +02:00
Bartosz Taudul
20e8a5ecc8 Create tid to pid mapping. 2019-08-17 22:32:41 +02:00
Bartosz Taudul
fa573ef4cf Display PID. 2019-08-17 22:21:02 +02:00
Bartosz Taudul
678e942e9f Transfer PID of profiled program. 2019-08-17 22:19:04 +02:00
Bartosz Taudul
1024992493 React to enter key in "go to frame" dialog. 2019-08-17 22:01:06 +02:00
Bartosz Taudul
258cf38d64 Fix flicker. 2019-08-17 21:59:08 +02:00
Bartosz Taudul
77c636c3fd Retrieve module name for threads with no names on windows. 2019-08-17 21:24:40 +02:00
Bartosz Taudul
0ea8789f39 Display CPU core in waking up thread popup. 2019-08-17 21:24:40 +02:00
Bartosz Taudul
f7589bde02 Trace thread wakeups on linux. 2019-08-17 17:18:11 +02:00
Bartosz Taudul
580944af65 Update manual. 2019-08-17 17:11:12 +02:00
Bartosz Taudul
414f903cc5 Collect thread wakeup data. 2019-08-17 17:05:29 +02:00
Bartosz Taudul
f957f64ce1 No magic numbers. 2019-08-17 16:26:59 +02:00
Bartosz Taudul
26be78530f Use signed number to calculate frame offset. 2019-08-17 15:22:54 +02:00
Bartosz Taudul
e9080bdbcd Hardcode windows PID 4 as "System". 2019-08-17 03:44:47 +02:00
Bartosz Taudul
40eb8a5a03 Proper check for invalid handle. 2019-08-17 03:44:11 +02:00
Bartosz Taudul
65e62dea06 Display thread ids next to thread names in CPU data. 2019-08-17 03:06:54 +02:00
Bartosz Taudul
6c1dd8eaec Cast thread handle to DWORD. 2019-08-16 21:21:37 +02:00
Bartosz Taudul
6c53cac15e Fix uninitialized variable. 2019-08-16 21:20:04 +02:00
Bartosz Taudul
d7104c752a Cygwin compat layer. 2019-08-16 21:16:04 +02:00
Bartosz Taudul
819ef2a82b External process/thread name retrieval on linux. 2019-08-16 21:00:42 +02:00
Bartosz Taudul
26e93b35c6 Update manual. 2019-08-16 20:31:16 +02:00
Bartosz Taudul
f63b6d0985 Update NEWS. 2019-08-16 19:53:08 +02:00
Bartosz Taudul
e975c4d7bf Also retrieve external thread names. 2019-08-16 19:49:16 +02:00
Bartosz Taudul
134a8c5d2a Fix positioning. 2019-08-16 19:32:25 +02:00
Bartosz Taudul
edd5338faa Display untracked threads. 2019-08-16 19:30:46 +02:00
Bartosz Taudul
ccaf92afc4 Save/load external process names. 2019-08-16 19:24:38 +02:00
Bartosz Taudul
fe7f56b022 Implement retrieval of external process names. 2019-08-16 19:22:23 +02:00
Bartosz Taudul
56e6795c76 Add per-cpu context switch tooltips. 2019-08-16 18:39:03 +02:00
Bartosz Taudul
7e81f3250e Add CPU tooltip. 2019-08-16 18:39:03 +02:00
Bartosz Taudul
8e71e2dba5 Draw per-CPU global context switch data. 2019-08-16 18:22:57 +02:00
Bartosz Taudul
c212661714 Allow determining whether thread is local to profiled program. 2019-08-16 17:59:25 +02:00
Bartosz Taudul
cef7e4b8d0 Save/load per-cpu context switches. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
8bc4258e29 Display count of per-cpu context switch data. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
a92034d59d CPU data accessor. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
69527d2f71 Collect per-cpu context switch data. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
9e0fe226df Add small font. 2019-08-16 16:02:57 +02:00
Bartosz Taudul
83fddd9aa6 Fix unicode builds. 2019-08-16 13:09:27 +02:00
Bartosz Taudul
9d5240c597 Mutable char array is required here due to shit API design. 2019-08-16 13:03:20 +02:00
Bartosz Taudul
42c71d7e46 Fix loading old traces. 2019-08-16 00:24:29 +02:00
Bartosz Taudul
95879d2bd9 Use proper UI element to indicate selectable items. 2019-08-16 00:12:03 +02:00
Bartosz Taudul
889eddd646 Pack ContextSwitchData. Saves 3 bytes per context switch region. 2019-08-15 23:53:47 +02:00
Bartosz Taudul
e90ddf7ee5 Don't search whole data set twice. 2019-08-15 23:03:37 +02:00
Bartosz Taudul
c22c259a13 Pack time and thread in MemEvent.
This saves 4 bytes per logged memory allocation. Memory savings for
selected traces:

android     2945 MB -> 2766 MB
chicken     2261 MB -> 2245 MB
q3bsp-mt    6085 MB -> 6043 MB
mem         6788 MB -> 6468 MB
2019-08-15 23:02:43 +02:00
Bartosz Taudul
9618ee3581 Fix skipping locks. 2019-08-15 22:24:27 +02:00
Bartosz Taudul
e43a57f6b3 Remove irrelevant comments. 2019-08-15 21:51:47 +02:00
Bartosz Taudul
a635e54a79 Pack MessageData. 2019-08-15 21:42:24 +02:00
Bartosz Taudul
04c8830f86 Cosmetics. 2019-08-15 21:38:00 +02:00
Bartosz Taudul
45401fc54c Use proper variable name. 2019-08-15 21:34:19 +02:00
Bartosz Taudul
8b73dece98 Preserve magic time values when loading old traces. 2019-08-15 21:30:37 +02:00
Bartosz Taudul
41beff29a9 Remove redundant GetTimeBegin().
Traces now start at zero time.
2019-08-15 21:04:20 +02:00
Bartosz Taudul
c9d7b96c81 Prevent int16_t -> int64_t promotion on negative numbers. 2019-08-15 20:58:16 +02:00
Bartosz Taudul
3db3952135 Hackfix for broken lock terminate times. 2019-08-15 20:45:00 +02:00
Bartosz Taudul
5e20b3f28a Pack time and source location in LockEvent. 2019-08-15 20:39:16 +02:00
Bartosz Taudul
2e31c26ae5 Update manual. 2019-08-15 20:21:09 +02:00
Bartosz Taudul
723e6ac192 Update NEWS. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
bf3ad57456 Pack start time and srcloc together in ZoneEvent.
This reduces ZoneEvent struct size by 2 bytes. Memory savings on various
captures:

10.62 GB -> 10.29 GB
 2342 MB ->  2276 MB
 1706 MB ->  1635 MB
 6277 MB ->  6085 MB
2019-08-15 20:17:36 +02:00
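A sketch of how a start time and a 16-bit source location id might share one 64-bit word, as the commit above describes. The field layout and the names `PackedStart`, `Set`, `Time` and `SrcLoc` are assumptions. The sketch also assumes times are in the range [0, 2^48), which fits the later commit stating that traces start at zero time.

```cpp
#include <cstdint>

// Hypothetical layout: high 48 bits hold the start time, low 16 bits hold the
// source location id. Not the repository's actual struct.
struct PackedStart
{
    uint64_t bits = 0;

    void Set( int64_t time, int16_t srcloc )
    {
        // Cast through uint16_t before widening. OR-ing a negative int16_t
        // directly would promote it to a sign-extended wider integer and
        // overwrite the time bits with ones.
        bits = ( uint64_t( time ) << 16 ) | uint64_t( uint16_t( srcloc ) );
    }

    // Assumes the stored time was non-negative and below 2^48.
    int64_t Time() const { return int64_t( bits >> 16 ); }

    // Narrowing back to int16_t restores negative ids on two's complement targets.
    int16_t SrcLoc() const { return int16_t( bits & 0xFFFF ); }
};

static_assert( sizeof( PackedStart ) == 8, "time and srcloc share one 64-bit word" );
```

The cast through `uint16_t` is the kind of detail the "Prevent int16_t -> int64_t promotion on negative numbers" commit points at: negative source location ids are valid values, so the widening step must not sign-extend.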
Bartosz Taudul
3e06daef31 Update manual. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
3148c5d736 Update NEWS. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
f5775a2d6e Display list of CPUs on which zone was running. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
042e6c9e11 Set initial time of old traces to 0. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
350e526ec0 Fix crash when zone exists before thread context switches appear. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
b322d20c19 Store received timestamps offset to 0. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
c021f4cf2c Update NEWS. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
659907c972 Store srcloc identifiers using 16 bit.
This reduces various structure sizes by 2 bytes. Memory usage reduction
on various traces:

big               11 GB -> 10.62 GB
chicken         2436 MB ->  2342 MB
drl-light-big   1761 MB ->  1706 MB
q3bsp-mt        6469 MB ->  6277 MB
2019-08-15 20:15:48 +02:00
Bartosz Taudul
416113fdcb Drop support for ETC1 frame images. 2019-08-15 16:29:50 +02:00
Bartosz Taudul
32c7d13159 Count size of some more structures. 2019-08-15 14:15:40 +02:00
Bartosz Taudul
8a205ef224 Update NEWS. 2019-08-15 02:29:02 +02:00
Bartosz Taudul
14a373a3b8 Add number of CPU cores to host info. 2019-08-15 02:28:35 +02:00
Bartosz Taudul
aa00b1c4c4 Add Win10 wait reasons. 2019-08-15 01:48:50 +02:00
Bartosz Taudul
69077e4e6f Finish sending context switches during disconnect. 2019-08-14 23:06:13 +02:00
Bartosz Taudul
6dc79cf14e Cosmetics. 2019-08-14 23:05:58 +02:00
Bartosz Taudul
c0b524d8de Add a separate method for clearing serial queue. 2019-08-14 22:39:12 +02:00
Bartosz Taudul
bccb845908 Update manual. 2019-08-14 22:19:11 +02:00
Bartosz Taudul
690a6d12d7 Properly handle incomplete context switch data. 2019-08-14 22:10:54 +02:00
Bartosz Taudul
7549c50bab Fix time range reset condition. 2019-08-14 21:53:09 +02:00
Bartosz Taudul
29819321b9 Update manual. 2019-08-14 21:45:34 +02:00
Bartosz Taudul
77a8e47c7d Update NEWS. 2019-08-14 21:33:43 +02:00
Bartosz Taudul
26f417a841 Add option to display running time in find zone menu. 2019-08-14 21:33:43 +02:00
Bartosz Taudul
9ec0724ffb Support dynamic recalculation of min, max and total time. 2019-08-14 21:33:42 +02:00
Bartosz Taudul
ee77ff020a Optimize calculation of zone running time. 2019-08-14 20:47:21 +02:00
Bartosz Taudul
a194c93740 Allow checking if context switch data is available. 2019-08-14 20:26:55 +02:00
Bartosz Taudul
9a364fe5fe Cache context switch data queries. 2019-08-14 20:16:11 +02:00
Bartosz Taudul
cf4e04440e Update manual. 2019-08-14 18:42:04 +02:00
Bartosz Taudul
a5ef38812e Display list of regions where thread was waiting. 2019-08-14 18:28:52 +02:00
Bartosz Taudul
d520f1cc48 Display zone running time in zone tooltip. 2019-08-14 18:28:52 +02:00
Bartosz Taudul
1ae540c7a1 Display zone running time in zone info window. 2019-08-14 18:28:52 +02:00
Bartosz Taudul
858c94e12e Add interface for calculating zone running time. 2019-08-14 18:28:52 +02:00
Bartosz Taudul
0b12db5ee6 Display number of thread running state regions. 2019-08-14 17:36:19 +02:00
Bartosz Taudul
fadac0b433 Display thread running time. 2019-08-14 17:12:48 +02:00
Bartosz Taudul
3e01ca3269 Calculate how long thread was in running state. 2019-08-14 17:12:48 +02:00
Bartosz Taudul
72918cda19 Include recorded context switches in thread lifetime. 2019-08-14 17:03:33 +02:00
Bartosz Taudul
ca9078845c Update NEWS. 2019-08-14 16:53:19 +02:00
Bartosz Taudul
71b54dd48a Always collect thread names.
This fixes an issue when a thread was destroyed before its name could be
retrieved.
2019-08-14 16:52:04 +02:00
Bartosz Taudul
5e199d1ab3 Support ftrace on ARM. 2019-08-14 16:28:54 +02:00
Bartosz Taudul
5fbb811f5d Degrade ARM timer to monotonic raw clock.
The monotonic raw clock has the same accuracy as reading cntvct registers, but
using clock_gettime() has a measurable impact on queueing time (135 us vs
83 us).

This change is needed to enable ftrace time readings on ARM linux, which
doesn't provide any way to get raw cntvct readings, like x86-tsc on x86.
2019-08-14 16:19:02 +02:00
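The timer change described in this commit can be pictured with a minimal sketch. It assumes a Linux target; `GetTimeMonotonicRaw` is a hypothetical helper name used for illustration, not the function from the repository.

```cpp
#include <cstdint>
#include <time.h>

// Hypothetical helper (not the repository's code): read the raw monotonic
// clock through clock_gettime() and return the value in nanoseconds.
inline int64_t GetTimeMonotonicRaw()
{
    struct timespec ts;
    clock_gettime( CLOCK_MONOTONIC_RAW, &ts );
    return int64_t( ts.tv_sec ) * 1000000000ll + int64_t( ts.tv_nsec );
}
```

Because the value comes from a clock the kernel also knows about, timestamps taken this way can be compared with kernel trace timestamps, which is the motivation given in the commit message.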
Bartosz Taudul
42865d7c7b Don't set x86-tsc clock on non-x86 platforms. 2019-08-14 15:14:36 +02:00
Bartosz Taudul
54a9132bb5 Skip context switch events in on demand mode, if no connection. 2019-08-14 15:09:33 +02:00
Bartosz Taudul
602c38c6c0 Allow checking timer implementation. 2019-08-14 14:35:44 +02:00
Bartosz Taudul
e39b1abce5 Handle linux wait states. 2019-08-14 14:02:31 +02:00
Bartosz Taudul
3988b56c92 Capture context switches on linux. 2019-08-14 13:56:15 +02:00
Bartosz Taudul
0bb0c10e3c Revert "Save one byte on ContextSwitchData."
Counting bits is hard, let's go shopping.
2019-08-14 13:55:05 +02:00
Bartosz Taudul
3996516fce One more SetThreadName() to change. 2019-08-14 02:27:01 +02:00
Bartosz Taudul
92b6da7cc2 SetThreadName() only works on the current thread.
This breaking change is required, because kernel trace facilities use
kernel thread ids, which are inaccessible from the pthread_t level.
2019-08-14 02:22:45 +02:00
Bartosz Taudul
339b7fd2a6 Use kernel thread ids on linux. 2019-08-14 01:57:36 +02:00
Bartosz Taudul
8925d026a9 Cosmetics. 2019-08-14 01:57:36 +02:00
Bartosz Taudul
e5c40b74ee Update manual. 2019-08-13 21:18:52 +02:00
Bartosz Taudul
2be38d912e Update NEWS. 2019-08-13 17:19:43 +02:00
Bartosz Taudul
73cbf2eead Use windows thread ids on cygwin. 2019-08-13 16:22:58 +02:00
Bartosz Taudul
71a5cffc13 Add context switch tooltips. 2019-08-13 16:20:43 +02:00
Bartosz Taudul
f285e0f5cc Save one byte on ContextSwitchData. 2019-08-13 15:16:46 +02:00
Bartosz Taudul
d77c87ae1c Allow disabling context switch drawing. 2019-08-13 15:16:46 +02:00
Bartosz Taudul
874a2596f7 Improve context switches drawing. 2019-08-13 15:16:46 +02:00
Bartosz Taudul
b313e46139 Keep event trace properties to terminate trace on exit. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
7f856a1b16 Very bad context switch visualization. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
9417ad994d Save/load context switch data. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
1c937ad9bb Implement skipping frame image data. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
8c494eabbf Display number of context switch regions. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
0b03fed61c Add context switch accessor. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
419f74280d Store context switches. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
90d26cb1b6 Collect and send context switch events. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
fe0f1aea07 Add system tracing skeleton. 2019-08-12 23:05:34 +02:00
Bartosz Taudul
8aa0be39d5 Drop support for CPU id queries. 2019-08-12 23:05:34 +02:00
Bartosz Taudul
0b944c88bb Add a note about condition variables. 2019-08-12 17:01:01 +02:00
Bartosz Taudul
d6f32a0839 Serialize lock processing.
This makes it much easier to process on the server and opens new
optimization possibilities. It also fixes theoretical problems, which
may be caused by invalid ordering of events with the same timestamp.
2019-08-12 13:51:01 +02:00
Bartosz Taudul
0431c03556 Add serial queue interface. 2019-08-12 13:27:15 +02:00
Bartosz Taudul
760357d6ea Explain why there are two methods for filling serial queue. 2019-08-12 13:19:10 +02:00
Bartosz Taudul
7fbf2fa2ec Update NEWS. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
6398ecb344 Drop support for pre-0.4 traces. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
154c902e03 Handle legacy file versions. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
54076e717c Update font awesome to 5.10.1. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
a9b41eb657 Rework processing bad files. 2019-08-12 12:04:27 +02:00
Bartosz Taudul
9b6328f962 Release 0.5.0. 2019-08-10 22:14:14 +02:00
Bartosz Taudul
530f293c49 Better way to handle auto scrolling. 2019-08-10 22:06:51 +02:00
Bartosz Taudul
6fca188603 Update tech docs. 2019-08-08 19:25:35 +02:00
Bartosz Taudul
4d2c7899ab Allow skipping invariant TSC check. 2019-08-08 19:21:39 +02:00
Bartosz Taudul
3a221dafde Display error messages on console, if available. 2019-08-08 19:18:05 +02:00
Bartosz Taudul
aada588129 Proper buffer reset. 2019-08-04 17:48:19 +02:00
Bartosz Taudul
8ae90a6cbd Merge branch 'connection-popup' 2019-08-04 16:20:02 +02:00
Bartosz Taudul
177b79a528 Update manual. 2019-08-04 16:19:51 +02:00
Bartosz Taudul
d2490bb62b Update NEWS. 2019-08-04 15:58:05 +02:00
Bartosz Taudul
853e9c17e3 Display client address. 2019-08-04 15:56:52 +02:00
Bartosz Taudul
07da2e506a Fix deadlock problems. 2019-08-04 15:55:42 +02:00
Rokas K. (rku)
37b03559f0 Merged in rokups/tracy/mingw-fixes (pull request #39)
Fix multiple build errors when compiling with MinGW.
2019-08-04 13:36:42 +00:00
Rokas Kupstys
b391e4c21a Fix multiple build errors when compiling with MinGW. 2019-08-04 15:49:46 +03:00
Rokas Kupstys
b24ac75111 Move connection window into a popup when connected. 2019-08-04 13:58:43 +03:00
Bartosz Taudul
eed7039853 Another GPU time adjust fix. 2019-08-04 01:42:44 +02:00
Bartosz Taudul
e87b8d455e Use Theil estimator randomized approximation. 2019-08-04 01:40:11 +02:00
Bartosz Taudul
8953a2652e Update manual. 2019-08-04 00:48:29 +02:00
Bartosz Taudul
dee70ea8d9 Update NEWS. 2019-08-04 00:40:40 +02:00
Bartosz Taudul
6898fd9e42 GPU time adjust fixes. 2019-08-04 00:38:08 +02:00
Bartosz Taudul
9b7384b407 Fix multiple GPU drift entry fields. 2019-08-04 00:33:31 +02:00
Bartosz Taudul
323c37bd33 Fix GPU zone search. 2019-08-04 00:30:09 +02:00
Bartosz Taudul
a642abfde0 Implement automatic GPU clock drift calculation. 2019-08-04 00:23:23 +02:00
Bartosz Taudul
401b879ece Update NEWS. 2019-08-03 15:20:54 +02:00
Bartosz Taudul
da88e32887 Display FPS counts next to frame times. 2019-08-03 15:20:31 +02:00
Bartosz Taudul
51bdbdb71f Update manual. 2019-08-03 15:09:19 +02:00
Bartosz Taudul
6f9e3aaa50 Update NEWS. 2019-08-03 14:56:50 +02:00
Bartosz Taudul
6c958f6177 Increase height of frame graph. 2019-08-03 14:55:08 +02:00
Bartosz Taudul
58003e7a6b Draw target frame time lines. 2019-08-03 14:55:08 +02:00
Bartosz Taudul
a76622d17a Cache last searched ThreadData. 2019-08-03 14:35:01 +02:00
Bartosz Taudul
8a0701025d Update imgui to 1.72b. 2019-08-02 20:56:32 +02:00
Bartosz Taudul
b3a1f932c3 Update tech docs. 2019-08-02 20:46:20 +02:00
Bartosz Taudul
fb745de5ed Update NEWS. 2019-08-02 20:30:15 +02:00
Bartosz Taudul
12969ee497 Track thread context.
This change exploits the fact that events are processed in batches
originating from a single thread. A single message changing thread
context is enough to handle multiple messages, as opposed to inclusion
of thread identifier in each message.
2019-08-02 20:18:08 +02:00
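The idea in this commit can be sketched as follows. The message layout, type names and functions below are assumptions made for illustration and do not reflect the actual wire protocol.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical message types; not the real protocol.
enum class MsgType : uint8_t { ThreadContext, ZoneBegin, ZoneEnd };

struct Msg
{
    MsgType type;
    uint64_t payload;   // thread id for ThreadContext, timestamp otherwise
};

struct Event
{
    uint64_t thread;
    MsgType type;
    uint64_t time;
};

// Append one batch: a single ThreadContext message followed by the events,
// none of which carries a thread identifier of its own.
inline void EncodeBatch( std::vector<Msg>& out, uint64_t thread, const std::vector<Msg>& events )
{
    out.push_back( { MsgType::ThreadContext, thread } );
    out.insert( out.end(), events.begin(), events.end() );
}

// Decode a stream, attributing each event to the most recent thread context.
inline std::vector<Event> Decode( const std::vector<Msg>& stream )
{
    std::vector<Event> result;
    uint64_t current = 0;
    for( const Msg& m : stream )
    {
        if( m.type == MsgType::ThreadContext ) current = m.payload;
        else result.push_back( { current, m.type, m.payload } );
    }
    return result;
}
```

With batches of many events per thread, the cost of one extra context message is small compared with the per-event identifier it replaces.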
Bartosz Taudul
9b6c405485 Bin number shouldn't be floating point. 2019-08-02 19:43:08 +02:00
Bartosz Taudul
138743f880 Update manual. 2019-08-01 23:24:51 +02:00
Bartosz Taudul
ba162940a3 Update NEWS. 2019-08-01 23:14:56 +02:00
Bartosz Taudul
a4e7a341c0 Proper handling of disconnect request. 2019-08-01 23:14:09 +02:00
Bartosz Taudul
344d36086f Simplify loop. 2019-07-31 18:53:51 +02:00
Bartosz Taudul
f41834370c Also display number of visible messages. 2019-07-31 02:16:14 +02:00
Bartosz Taudul
ccd88a9e27 Add text coloring to memory window. 2019-07-31 02:06:01 +02:00
Bartosz Taudul
68df815ef6 Display total message count. 2019-07-31 00:34:24 +02:00
Bartosz Taudul
526f3a55bc Update imgui to 1.72. 2019-07-30 22:53:52 +02:00
Bartosz Taudul
ca3571fd2b Still more. 2019-07-30 01:30:31 +02:00
Bartosz Taudul
47423e6263 And more. 2019-07-30 01:29:13 +02:00
Bartosz Taudul
d3783ae359 Remove magic template syntax. 2019-07-30 01:28:21 +02:00
Bartosz Taudul
9c28b82954 RPMallocInit and RPMallocThreadInit are identical. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
28220a5fbf Update manual. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
e289f2b8c0 Update tech docs. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
a6a3f45810 Fill in thread id during dequeue, not during enqueue. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
142ef53b42 Dequeue items from a single thread. 2019-07-29 23:44:08 +02:00
Bartosz Taudul
c7f769c52b Allow dequeuing from a single producer, retrieving thread id. 2019-07-29 23:29:30 +02:00
Bartosz Taudul
6cad76ae67 Store thread id in queue producer. 2019-07-29 23:13:06 +02:00
Bartosz Taudul
7ae9a28e32 Drop BlockingConcurrentQueue. 2019-07-29 22:58:13 +02:00
Bartosz Taudul
480a427e07 No need to hash thread ids anymore. 2019-07-29 22:36:04 +02:00
Bartosz Taudul
c60af95053 Remove unused const. 2019-07-29 22:33:32 +02:00
Bartosz Taudul
2d42abf552 Remove CannotAlloc functions. 2019-07-29 22:31:32 +02:00
Bartosz Taudul
b142860c8d More implicit producer removal. 2019-07-29 22:29:39 +02:00
Bartosz Taudul
db6eceb1a6 Producers must be explicit. 2019-07-29 22:25:28 +02:00
Bartosz Taudul
89928fde7b Queue must be always able to alloc. 2019-07-29 22:13:16 +02:00
Bartosz Taudul
a03734afa6 Remove more debug code. 2019-07-29 22:01:06 +02:00
Bartosz Taudul
e9a0145cd5 Remove MCDBGQ_NOLOCKFREE_IMPLICITPRODBLOCKINDEX. 2019-07-29 21:56:53 +02:00
Bartosz Taudul
b496f1ff90 Remove MOODYCAMEL_QUEUE_INTERNAL_DEBUG. 2019-07-29 21:52:49 +02:00
Bartosz Taudul
beaadc3a56 Remove always disabled MCDBGQ_TRACKMEM code. 2019-07-29 21:51:29 +02:00
Bartosz Taudul
82a4a6d9cc Add tracy_ prefix to concurrentqueue.h file name. 2019-07-29 21:47:50 +02:00
Bartosz Taudul
5dff7b5d1e AVX2 version of plot min max calculation.
Slightly faster (~5%) than the autovectorized serial code.
2019-07-29 20:59:22 +02:00
Bartosz Taudul
7a878cf4c7 Pause playback when playback window is closed. 2019-07-29 01:51:45 +02:00
Bartosz Taudul
461f49feb8 Fix drawing zones at extreme zoom out levels.
This is needed due to int64_t -> uint64_t zone end cast hack. There are
no side effects: zero time represents start of the timer, which would be
unix epoch or system bootup time.
2019-07-28 01:58:59 +02:00
Bartosz Taudul
3e74d041c9 Link with Threading Building Blocks, if available. 2019-07-28 01:53:39 +02:00
Bartosz Taudul
2e8d20b6e8 Keep zone info windows headers at top. 2019-07-27 13:28:18 +02:00
Bartosz Taudul
1afcd24dc6 Use big font in zone info windows. 2019-07-27 13:25:31 +02:00
Bartosz Taudul
245c6f9f01 Use big font in lock info window. 2019-07-27 13:18:59 +02:00
Bartosz Taudul
93195b6647 Move trace version display to trace statistics section. 2019-07-27 13:14:44 +02:00
Bartosz Taudul
2654a3010c Keep trace info header at top of the window. 2019-07-27 13:13:50 +02:00
Bartosz Taudul
e1af87744b Use less space for call stack tree headers. 2019-07-27 13:10:53 +02:00
Bartosz Taudul
fb11d67d8e Keep memory window header at top. 2019-07-27 13:06:23 +02:00
Bartosz Taudul
5c3095707a Filter out invalid Windows filename characters.
Do so even on unix, to allow easy transfer of user config between
different machines.
2019-07-27 01:21:11 +02:00
Bartosz Taudul
705a2fa3f4 Update manual. 2019-07-27 01:09:39 +02:00
Bartosz Taudul
9962873522 Update NEWS. 2019-07-26 23:44:20 +02:00
Bartosz Taudul
a7ef99d0b0 Keep find zone, compare headers at top. 2019-07-26 23:43:41 +02:00
Bartosz Taudul
f2cdb64aae Display trace descriptions in compare menu. 2019-07-26 23:33:49 +02:00
Bartosz Taudul
be3b458f28 Load second trace user data in compare menu. 2019-07-26 23:25:03 +02:00
Bartosz Taudul
e5d5af84fd Allow setting custom description of the trace. 2019-07-26 23:21:28 +02:00
Bartosz Taudul
c7e32a16ec Assert on invalid file names. 2019-07-26 23:15:12 +02:00
Bartosz Taudul
27965e8690 Add user data storage handler. 2019-07-26 23:15:12 +02:00
Bartosz Taudul
34cc7183d0 Trace-specific save path retrieval. 2019-07-26 23:15:12 +02:00
Bartosz Taudul
3ec1771f5a Move config directory retrieval to a separate function. 2019-07-26 22:42:50 +02:00
Bartosz Taudul
c1b70c6519 Display histogram time range on histogram. 2019-07-26 22:25:21 +02:00
Bartosz Taudul
276d764141 Fix cygwin. 2019-07-26 00:02:57 +02:00
Bartosz Taudul
36de7b2cc7 Fix incomplete headers. 2019-07-25 23:41:42 +02:00
Bartosz Taudul
e659220602 Use generic std::call_once() on other platforms. 2019-07-25 23:30:47 +02:00
Bartosz Taudul
5f96c55a3e Add background tasks notification tooltip. 2019-07-25 21:21:20 +02:00
Bartosz Taudul
9f29ddd562 Messages window should be scrollable due to thread list. 2019-07-25 20:50:30 +02:00
Bartosz Taudul
d3e8fe0133 Add messages filter clear button. 2019-07-25 20:49:44 +02:00
Bartosz Taudul
31aeadeba0 Update NEWS. 2019-07-25 20:45:16 +02:00
Bartosz Taudul
0f574b5547 Verify source file modification time against capture time. 2019-07-25 20:44:10 +02:00
Bartosz Taudul
16a40f2e1f Revert "Explicitly link with required libraries."
This reverts commit abaa0e8f6e.
2019-07-25 20:41:58 +02:00
Bartosz Taudul
c4b472b6e0 Ditto for call stack window. 2019-07-25 20:34:35 +02:00
Bartosz Taudul
269c3d4530 Keep statistics window headers always on top of the window. 2019-07-25 19:57:29 +02:00
Bartosz Taudul
2291b91ee0 Remove unnecessary separators. 2019-07-25 19:50:22 +02:00
Bartosz Taudul
d31d1f5946 Detect and report clang-cl. 2019-07-25 19:03:58 +02:00
Bartosz Taudul
55c9f41060 Vulkan queue doesn't need to be stored. 2019-07-25 18:54:32 +02:00
Bartosz Taudul
5a81fd5e6b Try updating to newest vcpkg on CI. 2019-07-25 18:46:57 +02:00
Bartosz Taudul
30f76d34a3 Fix printf warnings. 2019-07-25 18:41:52 +02:00
Bartosz Taudul
37c76edcd8 Explicitly require long long abs(). 2019-07-25 18:36:27 +02:00
Bartosz Taudul
abaa0e8f6e Explicitly link with required libraries.
This fixed clang-cl build.
2019-07-25 18:30:34 +02:00
Bartosz Taudul
1b79c35aac Don't use char8_t. 2019-07-25 12:58:16 +02:00
Bartosz Taudul
bd66c9dc5a Update NEWS. 2019-07-24 23:55:26 +02:00
Bartosz Taudul
7074b8ed8f Display notification popup during trace cleanup. 2019-07-24 23:54:47 +02:00
Bartosz Taudul
076bf1b475 Go back to build directory. 2019-07-24 23:32:41 +02:00
Bartosz Taudul
b19d633d68 Update tech docs. 2019-07-24 23:15:45 +02:00
Bartosz Taudul
a7e0b1614a Update manual. 2019-07-24 23:14:53 +02:00
Bartosz Taudul
52452038ad Update CI config. 2019-07-24 22:33:15 +02:00
Bartosz Taudul
57615775ea Migrate to VS2019, vcpkg. 2019-07-24 22:24:17 +02:00
Bartosz Taudul
e5a3d7aa25 Workaround scroll-to-message regression. 2019-07-24 21:40:39 +02:00
Bartosz Taudul
9ad9045078 Disable messages following when focusing on a message. 2019-07-24 02:21:51 +02:00
Bartosz Taudul
d908e148ec Update DXT1 AVX timings. 2019-07-22 20:01:16 +02:00
Bartosz Taudul
092e830264 Use shifts instead of const vector and. 2019-07-22 19:56:47 +02:00
Bartosz Taudul
cdbaec38eb Update tech docs. 2019-07-20 16:46:54 +02:00
Bartosz Taudul
db80673b9e Update DXT1 AVX timings. 2019-07-20 14:54:52 +02:00
Bartosz Taudul
178dc9eba7 Combine block data directly in AVX registers. 2019-07-20 14:52:34 +02:00
Bartosz Taudul
396c28011e Update DXT1 compression timings. 2019-07-19 22:16:33 +02:00
Bartosz Taudul
a6300ef7d1 Ditto on ARM. 2019-07-19 22:13:56 +02:00
Bartosz Taudul
dc49f2f76a Move DXT1 index conversion to server. 2019-07-19 21:46:58 +02:00
Bartosz Taudul
11ba77ced5 Use pthread_once() to initialize rpmalloc on linux. 2019-07-19 20:15:56 +02:00
Bartosz Taudul
4c28593031 Fix races in rpmalloc initialization.
Ensure rpmalloc_thread_initialize() in worker threads is called only after
rpmalloc_initialize() was called on the main profiler thread.
2019-07-19 19:25:27 +02:00
Bartosz Taudul
cef8124247 Replace or with addition to enable usra instruction. 2019-07-19 01:40:27 +02:00
Bartosz Taudul
fd4689a6e2 Don't perform unnecessary ands. 2019-07-19 01:19:52 +02:00
Bartosz Taudul
06296283b7 Fix texture completeness. 2019-07-19 00:53:34 +02:00
Bartosz Taudul
5da2076214 Add optional 2x zoom to frame images playback. 2019-07-19 00:51:52 +02:00
Bartosz Taudul
1c0c5f5282 Disable bilinear filtering for frame images. 2019-07-19 00:51:42 +02:00
Bartosz Taudul
dc992266fd Simplify OpenGL query checks. 2019-07-16 19:42:06 +02:00
Bartosz Taudul
9ec6c1e12d Basic technical documentation. 2019-07-15 21:00:12 +02:00
Bartosz Taudul
b99315ffbe Add some notes on how to get the most accurate results. 2019-07-13 20:49:56 +02:00
Bartosz Taudul
74a40c230f MinGW is also supported. 2019-07-13 20:49:50 +02:00
Bartosz Taudul
0ce93f714b Cosmetics. 2019-07-13 20:49:36 +02:00
Bartosz Taudul
ff9637e884 Update DXT1 timings table.
Clang is able to get much better times on ARM (around 430 us for both
ARM32 and ARM64 NEON). The reference implementation is 1.13 ms on clang.
2019-07-13 20:24:58 +02:00
Bartosz Taudul
f65373ece7 Replace two packs with one shuffle. 2019-07-13 20:01:12 +02:00
Bartosz Taudul
fc83f97ad3 Same for AVX/SSE. 2019-07-13 19:34:08 +02:00
Bartosz Taudul
62a167541c No need to mask out indices. 2019-07-13 19:07:25 +02:00
Bartosz Taudul
5633dc5a87 Add ARM64 NEON timings for DXT1 compression. 2019-07-13 15:32:07 +02:00
Alex
0c5ea710b0 Merged in z33ky/tracy/const-frame-image (pull request #37)
Constify frame-image pointer in API.
2019-07-13 13:09:21 +00:00
Bartosz Taudul
7bb9549e84 ARM64 specific NEON implementation of DXT1 compression. 2019-07-13 14:31:33 +02:00
Alexander 'z33ky' Hirsch
c6e8dc8d63 Constify frame-image pointer in API. 2019-07-13 12:33:55 +02:00
Bartosz Taudul
4c93952ffb Update manual. 2019-07-13 02:03:26 +02:00
Bartosz Taudul
eceff55f5a Add message filtering. 2019-07-13 01:48:43 +02:00
Bartosz Taudul
4944efa51f Update NEWS. 2019-07-13 01:27:26 +02:00
Bartosz Taudul
387674a40a Auto-scroll message list to bottom. 2019-07-13 01:25:37 +02:00
Bartosz Taudul
bcecd6e3a6 Always keep message list options at top. 2019-07-13 00:40:02 +02:00
Bartosz Taudul
c48ab4cb23 Use big font in trace information window. 2019-07-12 19:19:36 +02:00
Bartosz Taudul
7fb9bde9e9 Pass big font to TracyView. 2019-07-12 19:16:56 +02:00
Bartosz Taudul
fc28f827bc Rearrange trace information window. 2019-07-12 19:12:04 +02:00
Bartosz Taudul
5a4c7518ed Update manual. 2019-07-12 19:03:05 +02:00
Bartosz Taudul
290c895f83 Update NEWS. 2019-07-12 18:47:20 +02:00
Bartosz Taudul
2e774f4626 Save/load application info. 2019-07-12 18:45:35 +02:00
Bartosz Taudul
8c9d46ef29 Display application info in info window. 2019-07-12 18:39:07 +02:00
Bartosz Taudul
d64ab7db5a Store app info messages. 2019-07-12 18:34:46 +02:00
Bartosz Taudul
60d2384a6a Allow sending application information messages. 2019-07-12 18:34:46 +02:00
Bartosz Taudul
cd018e88a4 Update manual. 2019-07-11 20:32:39 +02:00
Bartosz Taudul
689f4999e3 Reorder threads by drag and drop. 2019-07-11 20:29:20 +02:00
Bartosz Taudul
29d8911c6b Fix Vector::erase(). 2019-07-11 20:29:20 +02:00
Bartosz Taudul
a1ce5fc1f6 Add include for built-in __get_cpuid() on gcc/clang. 2019-07-10 02:09:19 +02:00
Bartosz Taudul
90369335cf Update NEWS. 2019-07-10 02:05:27 +02:00
Bartosz Taudul
c164a70b9d Check for rdtscp/invariant tsc support. 2019-07-10 02:04:14 +02:00
Bartosz Taudul
c0670848d2 Reuse variable. 2019-07-08 02:08:06 +02:00
Bartosz Taudul
05dd9a5e59 Update DXT1 timings. 2019-07-08 00:16:06 +02:00
Bartosz Taudul
17dbbe67de Remove dependency on range subtraction. 2019-07-08 00:14:36 +02:00
Bartosz Taudul
a33205e3bd Update DXT1 timings. 2019-07-08 00:01:57 +02:00
Bartosz Taudul
af1bd3e1fa Faster horizontal add. 2019-07-07 23:57:23 +02:00
Bartosz Taudul
bde9045af5 Update DXT1 timings.
SSE takes a hit due to unfavourable codegen.
2019-07-06 00:51:19 +02:00
Bartosz Taudul
b32e8fa24e Ditto for NEON. 2019-07-06 00:18:53 +02:00
Bartosz Taudul
d236d4b70f Ditto for AVX2. 2019-07-06 00:05:32 +02:00
Bartosz Taudul
f62b21c21d Masking alpha out is not needed.
We assume that alpha value is constant for the whole image. The range
calculation is max - min, so alpha zeroes out. The color normalization
to range is color - min, so alpha also zeroes out here.
2019-07-05 23:58:19 +02:00
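The argument in this commit can be checked with a small scalar sketch. The types and function names are hypothetical, and the code works on plain bytes instead of the SIMD registers used by the actual compressor.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

struct Rgba { uint8_t r, g, b, a; };

struct BlockRange
{
    Rgba lo;      // per-channel minimum over the block
    Rgba range;   // per-channel max - min
};

// Compute min and range over a 4x4 block without masking out alpha.
inline BlockRange CalcRange( const std::array<Rgba, 16>& px )
{
    Rgba mn = px[0];
    Rgba mx = px[0];
    for( const Rgba& p : px )
    {
        mn.r = std::min( mn.r, p.r );  mx.r = std::max( mx.r, p.r );
        mn.g = std::min( mn.g, p.g );  mx.g = std::max( mx.g, p.g );
        mn.b = std::min( mn.b, p.b );  mx.b = std::max( mx.b, p.b );
        mn.a = std::min( mn.a, p.a );  mx.a = std::max( mx.a, p.a );
    }
    return { mn, { uint8_t( mx.r - mn.r ), uint8_t( mx.g - mn.g ),
                   uint8_t( mx.b - mn.b ), uint8_t( mx.a - mn.a ) } };
}

// Normalize a color to the block range origin: color - min, per channel.
inline Rgba Normalize( Rgba p, Rgba mn )
{
    return { uint8_t( p.r - mn.r ), uint8_t( p.g - mn.g ),
             uint8_t( p.b - mn.b ), uint8_t( p.a - mn.a ) };
}
```

When every pixel has the same alpha, both `range.a` and the alpha of every normalized color come out as zero, so an explicit mask would have no effect. The sketch relies on the same assumption as the commit: alpha is constant across the image.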
Bartosz Taudul
e9676ea1d5 Update DXT1 timings. 2019-07-05 18:38:52 +02:00
Bartosz Taudul
03189a30b8 Two ands less in NEON DXT1 compression. 2019-07-05 18:37:25 +02:00
Bartosz Taudul
275d992cb1 Two ands less in AVX2 DXT1 compression. 2019-07-05 18:22:42 +02:00
Bartosz Taudul
c89358d6b9 Two ands less in SSE DXT1 compression. 2019-07-05 18:17:50 +02:00
Bartosz Taudul
5bfc62f1bf iOS device name decoding. 2019-06-19 09:59:46 +02:00
Bartosz Taudul
59b4f84ce5 Display unknown implementer, part as hex values. 2019-07-03 21:18:17 +02:00
Bartosz Taudul
c6f6c368b2 Decode ARM CPU names. 2019-07-03 21:01:34 +02:00
Bartosz Taudul
e26ab8e9f6 Make forwarding functions more compact. 2019-07-03 18:05:38 +02:00
Bartosz Taudul
94b470ba66 Chomp newline from end of thread string. 2019-07-03 16:12:37 +02:00
Bartosz Taudul
d664b93ae0 Describe why there's no CPU usage graph in android traces. 2019-07-03 00:08:30 +02:00
Bartosz Taudul
f80a0a87bd Remove const qualifier from TracyCZoneCtx.
Some containers don't support storing const types. This struct, as
visible to user, is immutable, so treat it as if const was declared
here.
2019-07-01 19:16:15 +02:00
Bartosz Taudul
080ec6e836 Expand manual wrt manual zone scope management. 2019-07-01 18:29:24 +02:00
Bartosz Taudul
bdfb568742 Fix div tables for max range on all channels. 2019-07-01 12:31:06 +02:00
Bartosz Taudul
684a119a2c Fix order of checks for including intrinsics. 2019-07-01 11:45:16 +02:00
Bartosz Taudul
6b06b64caf Smaller histogram controls. 2019-06-30 18:11:19 +02:00
Bartosz Taudul
3c45476012 Update timings again. 2019-06-30 12:16:22 +02:00
Bartosz Taudul
983c48994b Write block data directly to memory. 2019-06-30 11:44:32 +02:00
Bartosz Taudul
9b8c18f99e Improve readability. 2019-06-30 11:44:00 +02:00
Bartosz Taudul
43042a2aa8 Update DXT1 timings table. 2019-06-30 03:39:37 +02:00
Bartosz Taudul
52b6bdb55a Force inline ProcessRGB functions. 2019-06-30 03:33:14 +02:00
Bartosz Taudul
ddd89dcce5 Add DXT1 AVX2 timings. 2019-06-30 03:23:20 +02:00
Bartosz Taudul
8c06f7288c AVX2 DXT1 compression. 2019-06-30 03:20:58 +02:00
Bartosz Taudul
a1e3d9765f Update DXT1 SSE timings. 2019-06-29 12:23:29 +02:00
Bartosz Taudul
2e893bba91 Use division tables. 2019-06-29 12:16:49 +02:00
Bartosz Taudul
b73f428739 Add DXT1 div table generator. 2019-06-29 11:49:52 +02:00
Bartosz Taudul
370fead4b2 Update DXT1 timings table. 2019-06-29 02:10:35 +02:00
Bartosz Taudul
ab9f036f5e Integrate CheckSolid into ProcessRGB. 2019-06-29 02:04:08 +02:00
Bartosz Taudul
50ac219e97 Update NEON timings in DXT1 table. 2019-06-28 22:40:04 +02:00
Bartosz Taudul
faf6bb97a4 DXT1 NEON color index packing. 2019-06-28 22:36:44 +02:00
Bartosz Taudul
4ee45259f2 Update SSE timings in DXT1 table. 2019-06-28 22:00:59 +02:00
Bartosz Taudul
2df1eaaa7e Pack color indices using SSE. 2019-06-28 21:58:10 +02:00
Bartosz Taudul
d593e5dfa9 DXT1 SIMD color index table generator. 2019-06-28 21:57:38 +02:00
Bartosz Taudul
3208d6c803 Add ARM NEON DXT1 compression timings to manual. 2019-06-28 14:26:00 +02:00
Bartosz Taudul
fcb5b4b888 NEON DXT1 compression. 2019-06-28 14:24:16 +02:00
Bartosz Taudul
e8d4ba492b Unify shifts. 2019-06-28 13:05:32 +02:00
Bartosz Taudul
be4900c822 NEON CheckSolid. 2019-06-28 01:47:04 +02:00
Bartosz Taudul
33486fa3cf Update ARM timings. 2019-06-27 22:47:26 +02:00
Bartosz Taudul
3c066f1527 Simplify code. 2019-06-27 22:40:03 +02:00
Bartosz Taudul
77c6acbc48 Update manual. 2019-06-27 22:30:05 +02:00
Bartosz Taudul
72a0d4c2ab Rest of SSE DXTC compression. 2019-06-27 22:29:44 +02:00
Bartosz Taudul
137b28e110 SSE CheckSolid. 2019-06-27 22:29:44 +02:00
Bartosz Taudul
3d590b6b8c Initialize rpmalloc in compression thread. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
10bcc8c770 Switch to DXT1 textures in profiler utility. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
1939c31165 Experimental DXT1 compressor. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
79eb1b9029 Swap queue and dequeue only if queue has contents. 2019-06-27 13:37:09 +02:00
Bartosz Taudul
aa4ce30dff Update manual. 2019-06-27 13:32:57 +02:00
Bartosz Taudul
7dc7ece2bd Add staging area for frame images.
Compressing frame images on a separate thread may cause frame image
arrival before frames are sent. Fix this issue by creating a staging
area in which frame images will wait for frames to arrive.

This probably breaks playback functionality, as non-existent frames may
be queried, but this problem seems to be very hard to find, so let's
ignore it for now.
2019-06-27 13:24:35 +02:00
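A staging area of the kind described in this commit might look like the sketch below. The class and member names are hypothetical, and the compressed image payload is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameImage
{
    uint32_t frame;   // index of the frame this image belongs to
};

// Hypothetical staging container: images whose frame has not arrived yet
// wait in m_staging and are released once the frame is known.
class FrameImageStaging
{
public:
    // Called when a compressed image arrives, possibly before its frame.
    void AddImage( FrameImage img )
    {
        if( img.frame < m_frameCount ) m_ready.push_back( img );
        else m_staging.push_back( img );
    }

    // Called when the number of known frames changes.
    void SetFrameCount( uint32_t count )
    {
        m_frameCount = count;
        auto it = m_staging.begin();
        while( it != m_staging.end() )
        {
            if( it->frame < m_frameCount )
            {
                m_ready.push_back( *it );
                it = m_staging.erase( it );
            }
            else
            {
                ++it;
            }
        }
    }

    size_t Ready() const { return m_ready.size(); }
    size_t Staged() const { return m_staging.size(); }

private:
    uint32_t m_frameCount = 0;
    std::vector<FrameImage> m_ready;
    std::vector<FrameImage> m_staging;
};
```

Code that reads only the released images never sees an image that refers to a frame which has not been received yet.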
Bartosz Taudul
bb35f9a897 Compress frame images in a separate thread. 2019-06-27 13:24:35 +02:00
Bartosz Taudul
7ebd2162c6 Add ETC1 compression thread. 2019-06-26 22:57:24 +02:00
Bartosz Taudul
f565e11976 Store frame images in queue. 2019-06-26 22:52:24 +02:00
Bartosz Taudul
fc106079c5 Remove CPU migration highlight for zones. 2019-06-26 21:35:09 +02:00
Bartosz Taudul
3bf23e15bb Update manual. 2019-06-26 21:07:12 +02:00
Bartosz Taudul
e3aa0a5c88 Update NEWS. 2019-06-26 21:03:34 +02:00
Bartosz Taudul
bc3c375b02 Display crash icon in notification area. 2019-06-26 21:02:04 +02:00
Bartosz Taudul
b8794f64be Extract crash tooltip to a separate function. 2019-06-26 21:01:54 +02:00
Bartosz Taudul
281dcf7c1f Cast to proper types. 2019-06-26 19:33:37 +02:00
Bartosz Taudul
8ce41b3543 Proper init order of thread local thread handle. 2019-06-26 19:32:52 +02:00
Bartosz Taudul
64980a1e6f Use async resolv service. 2019-06-26 18:49:21 +02:00
Bartosz Taudul
5e97e83401 Address can't change. 2019-06-26 18:46:51 +02:00
Bartosz Taudul
913c1e57a6 Add threaded resolv service. 2019-06-26 18:46:51 +02:00
Bartosz Taudul
a8cb257474 Revert "Resolve client host name using DNS."
This reverts commit 48df667a37.
2019-06-26 17:58:23 +02:00
Bartosz Taudul
0aa0b4ac8a Try lower query counts in out-of-memory situations. 2019-06-26 16:43:56 +02:00
Bartosz Taudul
659631fc06 Make vulkan query count variable. 2019-06-26 16:42:51 +02:00
Bartosz Taudul
bc7f2c49c8 GetThreadHandle() might be used by application's code. 2019-06-25 15:44:49 +02:00
Bartosz Taudul
0b656c3469 Update manual. 2019-06-24 21:15:33 +02:00
Bartosz Taudul
9ca254307a Add callstack versions of C API macros. 2019-06-24 21:10:41 +02:00
Bartosz Taudul
c749a2e3fe Add C API for plots and messages. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
48e08acb62 Add C API for frame markup. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
ee99ce833c Implement memory allocation tracking for C API. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
281477f7f9 Tokens must be retrieved for each enqueue. 2019-06-24 20:12:14 +02:00
Bartosz Taudul
06a41708a7 Move TLS accesses close together. 2019-06-24 19:38:44 +02:00
Bartosz Taudul
c4f0965851 Don't use cached thread id to retrieve main thread id. 2019-06-24 19:38:07 +02:00
Bartosz Taudul
a56c47a6a0 Store thread handle in a thread local variable.
This saves us a non-inlineable function call. Thread local block is
accessed anyway, since we need to get the token, so we already have the
pointer and don't need to get it a second time (which is done inside
Windows' GetCurrentThreadId()). We also don't need to store the thread
id in ScopedZone anymore, as it was a micro-optimization to save us the
second GetThreadHandle() call.

This change has a measurable effect of reducing enqueue time from ~10 to
~8 ns.

A further optimization would be to completely skip thread handle
retrieval during zone capture and do it instead on retrieval of data
from the queue. Since each thread has its own producer ("token"), the
thread handle should be accessible during the dequeue operation. This is
a much more invasive change, that would require a) modification of the
queue, b) additional processing of dequeued data to inject the thread
handle.
2019-06-24 19:19:47 +02:00
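The caching described in this commit can be sketched with a `thread_local` variable. `QueryThreadIdSlow` is a stand-in for the operating system query and is purely an assumption; it hands out a new number on every call so that caching is observable.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for an expensive, non-inlineable OS call (hypothetical).
// Each call returns a fresh value, which makes repeated calls detectable.
inline uint64_t QueryThreadIdSlow()
{
    static std::atomic<uint64_t> next{ 1 };
    return next.fetch_add( 1, std::memory_order_relaxed );
}

// The slow query runs once per thread; later calls only read the
// thread-local value.
inline uint64_t GetThreadHandle()
{
    thread_local const uint64_t handle = QueryThreadIdSlow();
    return handle;
}
```

Repeated calls on one thread return the same value, while each new thread triggers exactly one slow query on its first call.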
Bartosz Taudul
46b75c5a19 Only enable tracy-internal GetThreadHandle if tracy is enabled. 2019-06-24 19:18:52 +02:00
Bartosz Taudul
79bfac9ca9 Use proper popcnt for gcc/clang (including cygwin). 2019-06-24 18:56:04 +02:00
Bartosz Taudul
9375afdbed All variables must be defined before goto. 2019-06-23 00:36:25 +02:00
Bartosz Taudul
6bdfedead2 Update nfd to ceb75f7abf3. 2019-06-23 00:35:19 +02:00
Bartosz Taudul
815ad7df28 Update manual. 2019-06-23 00:21:56 +02:00
Bartosz Taudul
a8dcd5d153 Ctrl-click on frame in frame overview to show playback window. 2019-06-23 00:11:46 +02:00
Bartosz Taudul
f125254d14 Cosmetics. 2019-06-23 00:00:16 +02:00
Bartosz Taudul
2f707bd152 Improve frame label drawing logic. 2019-06-22 23:49:30 +02:00
Bartosz Taudul
7217a99dc2 Always show at least one pixel of a frame in frame overview. 2019-06-22 22:48:32 +02:00
Bartosz Taudul
c48cd10f35 Don't divide by zero in zero-length zones. 2019-06-22 22:42:57 +02:00
Bartosz Taudul
1d4117f515 Fix typo. 2019-06-22 14:55:01 +02:00
Bartosz Taudul
ad26eaa9f1 Don't put "select/unselect all" buttons in a separate line. 2019-06-22 14:43:58 +02:00
Bartosz Taudul
0944eab707 Add background tasks icon. 2019-06-22 14:37:17 +02:00
Bartosz Taudul
4d4190c825 Update NEWS. 2019-06-22 14:25:35 +02:00
Bartosz Taudul
e33690c5f3 Allow switching whitespace visibility in source code view. 2019-06-22 14:24:39 +02:00
Bartosz Taudul
53fe688bff Update ImGuiColorTextEdit to 0a88824f7de8d. 2019-06-22 14:19:10 +02:00
Bartosz Taudul
18cef20db9 Silence signed/unsigned comparison warnings. 2019-06-22 14:15:25 +02:00
Bartosz Taudul
8f7be5a0fa Allow only 2^32-1 frame images. 2019-06-22 14:11:45 +02:00
Bartosz Taudul
fadf8e3e0a Can't read negative number of bytes.
This completely ignores error handling, which probably should be added.
The code behavior doesn't change, as the existing comparisons and
asserts already promoted the signed value to unsigned.
2019-06-22 14:08:48 +02:00
Bartosz Taudul
1c41229766 Use proper type for buffer size comparison. 2019-06-22 14:07:53 +02:00
Bartosz Taudul
70a7033a64 Use proper type for iteration. 2019-06-22 14:07:26 +02:00
Bartosz Taudul
1ea647a1dd Use proper type for srcloc highlight decay value. 2019-06-22 14:06:25 +02:00
Bartosz Taudul
aaefd6e1d6 Simplify code. 2019-06-22 14:06:10 +02:00
Bartosz Taudul
6a82f666a7 Cosmetics. 2019-06-22 14:05:18 +02:00
Bartosz Taudul
54ae4c84ba Silence warning about unused variable. 2019-06-22 14:04:48 +02:00
Bartosz Taudul
de953bfaa8 Use proper data type for callstack storage in GPU zones. 2019-06-22 14:04:27 +02:00
Bartosz Taudul
323f0e1ae3 Don't create variable for exception in catch block. 2019-06-22 13:41:24 +02:00
Bartosz Taudul
eb4c7ca9ea Ignore useless warnings. 2019-06-22 13:40:00 +02:00
Bartosz Taudul
a3ce08a9f9 Display zone time as percentage of average zone time. 2019-06-22 13:22:13 +02:00
Bartosz Taudul
5fde56d96a Remove hidden zone time without profiling tooltip. 2019-06-22 13:10:46 +02:00
Bartosz Taudul
850815534e Insert frame mark at beginning of on-demand connection. 2019-06-21 19:39:41 +02:00
Bartosz Taudul
fd9fc880a6 Send current time in on-demand welcome message. 2019-06-21 19:39:41 +02:00
Bartosz Taudul
48df667a37 Resolve client host name using DNS.
This operation is blocking and should be made asynchronous.
2019-06-21 19:27:41 +02:00
Bartosz Taudul
659ef87974 Animate highlighted messages on the timeline. 2019-06-21 14:25:51 +02:00
Bartosz Taudul
bb44e80e5a Use smaller UI elements in selected places. 2019-06-21 14:15:46 +02:00
Bartosz Taudul
8259816de3 Improve playback interruptions on user input. 2019-06-21 13:08:41 +02:00
Bartosz Taudul
a916c28269 Build test application on appveyor. 2019-06-19 22:17:11 +02:00
Bartosz Taudul
ae4f9663aa Selecting frames stops playback. 2019-06-19 20:05:23 +02:00
Bartosz Taudul
51135c1d20 Pulse hover-info line on histograms. 2019-06-19 20:01:41 +02:00
Bartosz Taudul
d44c4b00fb Implement outliers cutoff in compare menu. 2019-06-18 22:27:25 +02:00
Bartosz Taudul
d66be0e033 Update manual. 2019-06-18 21:02:49 +02:00
Bartosz Taudul
3fcd73680c Simulate client activity time advancement. 2019-06-18 20:56:42 +02:00
Bartosz Taudul
800d95c089 Display discovered clients activity times. 2019-06-18 20:51:12 +02:00
Bartosz Taudul
5309e6d94a Broadcast client activity time. 2019-06-18 20:46:12 +02:00
Bartosz Taudul
1a32edebf2 Extract text printing functions. 2019-06-18 20:43:28 +02:00
Bartosz Taudul
aa5259b20a Use the same port (8086) for both TCP and UDP traffic. 2019-06-18 20:28:03 +02:00
Bartosz Taudul
0e5a7263d9 Define broadcast message, add versioning. 2019-06-18 20:26:40 +02:00
Bartosz Taudul
0b394c3f53 Don't need to keep last broadcast time in Profiler class. 2019-06-18 20:15:09 +02:00
Bartosz Taudul
99e638b3fc Normalize values during compare by default. 2019-06-18 19:41:20 +02:00
Bartosz Taudul
5e6bc30bab Support GL_EXT_disjoint_timer_query with EXT postfix. 2019-06-18 16:34:27 +02:00
Bartosz Taudul
2d3e7ee796 More aggressive broadcast repeat timeout. 2019-06-18 00:54:58 +02:00
Bartosz Taudul
53863fe0e7 Set sane initial window sizes. 2019-06-17 23:49:10 +02:00
Bartosz Taudul
ae70f694dd Update manual. 2019-06-17 20:25:25 +02:00
Bartosz Taudul
b8b1fae900 Don't confuse user by suggesting the list is complete. 2019-06-17 20:24:47 +02:00
Bartosz Taudul
dd4c61e964 Update NEWS. 2019-06-17 20:04:14 +02:00
Bartosz Taudul
11dc8e67e5 Change broadcast rate from 5s to 3s. 2019-06-17 19:57:17 +02:00
Bartosz Taudul
6bf8081f5b Remove debug leftovers. 2019-06-17 19:52:44 +02:00
Bartosz Taudul
12e44fc605 Missing include. 2019-06-17 19:51:58 +02:00
Bartosz Taudul
5a359aa376 Allow connecting to broadcasting clients. 2019-06-17 19:50:34 +02:00
Bartosz Taudul
67daff1452 Display list of broadcasting clients. 2019-06-17 19:45:47 +02:00
Bartosz Taudul
36989da2c6 Also store client address. 2019-06-17 19:45:36 +02:00
Bartosz Taudul
265913d969 Process client broadcasts. 2019-06-17 19:34:48 +02:00
Bartosz Taudul
e0bbb41976 Add UDP listen socket and IP address wrapper. 2019-06-17 19:23:43 +02:00
Bartosz Taudul
de058d2a0d Don't hardcode broadcast port. 2019-06-17 18:37:34 +02:00
Bartosz Taudul
1b3b3a94a2 Broadcast protocol version and process name. 2019-06-17 18:34:35 +02:00
Bartosz Taudul
0b9ef7e514 Disable broadcast if TRACY_NO_BROADCAST is defined. 2019-06-17 18:18:58 +02:00
Bartosz Taudul
e609c0fdce UDP broadcast loop. 2019-06-17 02:25:09 +02:00
Bartosz Taudul
40e517594b Add UDP broadcast socket. 2019-06-17 02:24:55 +02:00
Bartosz Taudul
5db6cc4eee Update manual. 2019-06-17 01:24:48 +02:00
Bartosz Taudul
60f0b81faf More compact welcome dialog. 2019-06-17 01:21:55 +02:00
Bartosz Taudul
38ebc2e989 Add icon to "go to frame" button. 2019-06-17 01:13:32 +02:00
Bartosz Taudul
eed849c589 Add reset button to min bin value fields. 2019-06-17 01:12:24 +02:00
Bartosz Taudul
add5c0fb87 Perform proper division. 2019-06-17 01:09:25 +02:00
Bartosz Taudul
b2bbd95430 Changing log time requires bin cache reset. 2019-06-17 01:05:46 +02:00
Bartosz Taudul
e30cf7eafa Update NEWS. 2019-06-17 01:02:52 +02:00
Bartosz Taudul
f27cead040 Add hovered frame markers on histogram. 2019-06-17 01:01:56 +02:00
Bartosz Taudul
099933e66d Add outlier removal to frame time histogram. 2019-06-17 00:44:34 +02:00
Bartosz Taudul
b1f49d4c69 Update manual. 2019-06-16 17:22:29 +02:00
Bartosz Taudul
507e4db14b Update NEWS. 2019-06-16 17:15:47 +02:00
Bartosz Taudul
efe65e2e64 Display currently hovered zone on histogram. 2019-06-16 17:14:47 +02:00
Bartosz Taudul
6a4f7ce1ca Track currently hovered zone. 2019-06-16 17:05:56 +02:00
Bartosz Taudul
6e8b5381a5 Ctrl-click on a zone to go straight to zone statistics. 2019-06-16 17:00:25 +02:00
Bartosz Taudul
d361261993 Open playback from frame using ctrl+left click. 2019-06-16 16:49:21 +02:00
Bartosz Taudul
d683699ba9 Don't recalculate histogram bins every frame.
This remedies a slowdown that was only visible when a histogram of a large
number of zones (~100 million) was displayed. The slowdown was caused by
std::accumulate() calls over the whole set of zones.
2019-06-16 16:41:52 +02:00
Bartosz Taudul
14398dd4e8 Move bin setup closer to bin usage. 2019-06-16 16:29:18 +02:00
Bartosz Taudul
761405e2a7 Clip histogram highlight to graph area. 2019-06-16 16:23:24 +02:00
Bartosz Taudul
26178dfb00 Update manual. 2019-06-16 02:23:11 +02:00
Bartosz Taudul
df163f627b Update NEWS. 2019-06-16 01:59:30 +02:00
Bartosz Taudul
89f798158f Implement outlier cutoff on histogram. 2019-06-16 01:58:44 +02:00
Bartosz Taudul
8009c6412e Add "minimum values in bin" parameter to histogram. 2019-06-16 01:58:44 +02:00
Bartosz Taudul
4186a71ee7 Cache sorted begin and end iterators. 2019-06-16 01:28:36 +02:00
Bartosz Taudul
26f223e4cd Don't shrink histogram bin buffers. 2019-06-16 00:25:22 +02:00
Bartosz Taudul
f7a590de98 Improve ETC1 timing table. 2019-06-15 21:08:35 +02:00
Bartosz Taudul
103be314e7 Update NEON ETC1 compression timings. 2019-06-15 15:38:05 +02:00
Bartosz Taudul
014c3ed63b Use non-reference, optimized NEON ETC1 compression. 2019-06-15 15:35:57 +02:00
Bartosz Taudul
31a4a45b14 Ignore memory free faults if running on apple.
There's a case in MoltenVK initialization where overloading operator new
and operator delete works for std::string destruction, but not
construction.
2019-06-13 14:15:17 +02:00
Bartosz Taudul
ab4e99229d Indicate whether client is running on apple shitware. 2019-06-13 14:05:15 +02:00
Bartosz Taudul
e05669a80f Add ETC1 compression timings for ARM. 2019-06-13 02:12:03 +02:00
Bartosz Taudul
e5d5abf59a Add NEON path for ETC1 compression. 2019-06-13 02:04:19 +02:00
Bartosz Taudul
5c1bae812a Add frame images to test application. 2019-06-13 01:53:47 +02:00
Bartosz Taudul
2eb67b4684 Add test image. 2019-06-13 01:48:13 +02:00
Bartosz Taudul
516bdcec9b Rewrite playback logic. 2019-06-13 00:12:06 +02:00
Bartosz Taudul
c43f8562ec Rename "sync view" to "sync timeline". 2019-06-12 23:46:14 +02:00
Bartosz Taudul
756379b9d8 Update manual. 2019-06-12 23:45:27 +02:00
Bartosz Taudul
bdfd2c07be Right-click on a frame to set frame in playback. 2019-06-12 23:14:19 +02:00
Bartosz Taudul
796ca57067 Update imgui to 1.71. 2019-06-12 22:53:23 +02:00
Bartosz Taudul
d3e0163dd4 Add byteswap for apple. 2019-06-12 16:54:44 +02:00
Bartosz Taudul
8827f568e4 Update manual. 2019-06-12 15:35:00 +02:00
Bartosz Taudul
afa967afb0 Flip frame image if need be. 2019-06-12 15:30:08 +02:00
Bartosz Taudul
37d1457b44 Frame image may need flipping. 2019-06-12 15:28:32 +02:00
Bartosz Taudul
29fd4b1fe9 Don't animate frame changes during playback. 2019-06-12 13:25:45 +02:00
Bartosz Taudul
61bad76e5a Update manual. 2019-06-12 01:48:11 +02:00
Bartosz Taudul
a936f22a91 Add frame images playback window. 2019-06-12 01:48:11 +02:00
Bartosz Taudul
eb6ac5e6e1 Store frame reference in frame images. 2019-06-12 00:55:02 +02:00
Bartosz Taudul
38b76ea32d Add frame images vector accessor. 2019-06-12 00:14:44 +02:00
Bartosz Taudul
04dd33f5c4 Fix mismatched linkage. 2019-06-11 23:51:12 +02:00
Rokas K. (rku)
c4e05b6264 Merged in rokups/tracy/dllimport-cleanup (pull request #36)
Clean up imported functions in multi-dll projects.

Approved-by: Till Rathmann <till.rathmann@gmx.de>
2019-06-11 15:04:34 +00:00
Bartosz Taudul
de544ef959 Update manual. 2019-06-11 02:25:03 +02:00
Bartosz Taudul
84a52c5d62 Add join discord button. 2019-06-11 02:12:34 +02:00
Bartosz Taudul
57b8b425ba Discard send buffer data after disconnect. 2019-06-10 02:11:29 +02:00
Bartosz Taudul
2cf50427be Add FastVector to natvis. 2019-06-10 01:50:26 +02:00
Bartosz Taudul
2bc7a9bd30 Close listen socket in destructor. 2019-06-09 18:14:26 +02:00
Bartosz Taudul
5f8eadfb16 Release zone id stack. 2019-06-09 17:56:41 +02:00
Bartosz Taudul
a3173965d6 Same for Vis() reference. 2019-06-09 17:51:37 +02:00
Bartosz Taudul
2aa6f70765 Drawing locks may invalidate Vis() iterator. 2019-06-09 17:46:59 +02:00
Bartosz Taudul
80dff1ede1 Add connection id for on-demand mode.
Long-lived zones could send their end events without begin events in the
following scenario:

1. On-demand connection is made.
2. Zone begin is emitted, m_active is set to true.
3. Connection is terminated.
4. A new connection is made.
5. Zone end is emitted, because m_active is true.

To this point it was assumed that all zone end events will happen before
a new connection is made, but it's not necessarily true.
2019-06-09 17:15:47 +02:00
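
The scenario described in the commit above can be sketched as follows. This is a minimal, single-threaded illustration, not the actual Tracy code: Profiler, ScopedZone, Connect(), Disconnect() and the sent vector are hypothetical names invented for the example. Only m_active comes from the commit message. The idea is that each zone remembers the connection id seen at its begin event and emits the end event only if the same connection is still in use.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the client-side profiler state.
struct Profiler
{
    std::atomic<uint64_t> connectionId{ 0 };  // bumped on every new connection
    std::atomic<bool> connected{ false };
    std::vector<std::string> sent;            // events "sent" to the server

    void Connect()
    {
        connectionId.fetch_add( 1 );
        connected.store( true );
    }

    void Disconnect() { connected.store( false ); }
};

// Hypothetical scoped zone that guards its end event with a connection id.
class ScopedZone
{
public:
    explicit ScopedZone( Profiler& profiler )
        : m_profiler( profiler )
        , m_active( profiler.connected.load() )
        , m_connectionId( profiler.connectionId.load() )
    {
        if( m_active ) m_profiler.sent.push_back( "begin" );
    }

    ~ScopedZone()
    {
        if( !m_active ) return;
        if( !m_profiler.connected.load() ) return;
        // The begin event went to a different connection, so the server
        // would see an end event without a matching begin. Drop it.
        if( m_profiler.connectionId.load() != m_connectionId ) return;
        m_profiler.sent.push_back( "end" );
    }

private:
    Profiler& m_profiler;
    const bool m_active;
    const uint64_t m_connectionId;
};
```

Without the id comparison, a zone that begins on one connection and ends after a reconnect would emit its end event on the new connection, which is step 5 of the list above.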
Bartosz Taudul
0db9c73d76 Immediately react to connection termination. 2019-06-09 16:51:39 +02:00
Bartosz Taudul
cc5bad294a More strict memory ordering for on-demand connection status. 2019-06-09 16:48:00 +02:00
Bartosz Taudul
e2d42fae2f We're done here, don't try to send termination request. 2019-06-09 16:25:52 +02:00
Bartosz Taudul
496f866add Don't send data when connection is terminated.
There are only two cases in which HandleServerQuery() returns false:
either data can't be read from the socket (which is checked by the HasData()
call before calling HandleServerQuery()), or the server sent a
termination query. In both cases there's no need to send data
anymore.
2019-06-09 16:19:40 +02:00
Bartosz Taudul
23e7850162 Make DequeueStatus enum class. 2019-06-09 16:14:30 +02:00
Bartosz Taudul
34d89d39a1 Prevent double freeing of socket. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
b1f8d9fba1 Send server termination query on server disconnect. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
2c780f1af4 Allow sending immediate termination query from server. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
139299389b Add comments to client connection handling. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
d6d7b82529 Ignore invalid frame images in on-demand mode. 2019-06-09 15:37:49 +02:00
Bartosz Taudul
4c2ff80ac8 Restore frame counting for on-demand mode. 2019-06-09 15:23:01 +02:00
Bartosz Taudul
50cda7720f Handle frame image instrumentation failures. 2019-06-09 13:44:53 +02:00
Bartosz Taudul
5d5b12dce4 Add a note about expected lifetime of image data. 2019-06-09 13:20:46 +02:00
Bartosz Taudul
99c8144154 Show performance difference of async capture. 2019-06-09 13:17:08 +02:00
Bartosz Taudul
22d7b2c78d Polishing words. 2019-06-09 12:50:14 +02:00
Bartosz Taudul
bef1988800 Compress frame images using LZ4. 2019-06-08 12:17:18 +02:00
Bartosz Taudul
c3c116317d Fences must be deleted. 2019-06-08 12:08:20 +02:00
Bartosz Taudul
00a468162d Fix signed/unsigned comparison. 2019-06-08 00:57:25 +02:00
Bartosz Taudul
5470dae120 Add AVX2 ETC1 timings to the manual. 2019-06-08 00:54:46 +02:00
Bartosz Taudul
9ef128995a Add AVX2 version of etcpak. 2019-06-08 00:50:39 +02:00
Bartosz Taudul
7e9539ef2d AVX implies SSE 4.1. 2019-06-08 00:39:19 +02:00
Bartosz Taudul
76379a761a Update manual. 2019-06-08 00:06:37 +02:00
Bartosz Taudul
1954caa806 Fix listings. 2019-06-08 00:04:10 +02:00
Bartosz Taudul
2c01f244a8 Update NEWS. 2019-06-07 20:14:21 +02:00
Bartosz Taudul
fc5a8f7e3a Assign frame image to the correct frame (including offset). 2019-06-07 20:13:08 +02:00
Bartosz Taudul
784c4da53a Include frame offset in frame image message. 2019-06-07 20:09:29 +02:00
Rokas Kupstys
9bd1037347 Clean up imported functions in multi-dll projects. 2019-06-07 19:50:08 +03:00
Bartosz Taudul
8c912890f0 Proper case in includes. 2019-06-07 01:35:35 +02:00
Bartosz Taudul
d271634a95 Keep one ETC1 compression buffer. 2019-06-07 01:29:24 +02:00
Bartosz Taudul
34a6fe7055 _bswap may be already defined. 2019-06-07 01:07:51 +02:00
Bartosz Taudul
ff5170d0e9 Silence warnings. 2019-06-07 01:03:28 +02:00
Bartosz Taudul
42a30bffe1 Frame images are now ETC1 compressed. 2019-06-07 00:31:51 +02:00
Bartosz Taudul
a654b642ef Compress frame images to ETC1 before sending. 2019-06-07 00:31:51 +02:00
Bartosz Taudul
aff3246f82 Add ETC1 compressor. 2019-06-07 00:31:51 +02:00
Bartosz Taudul
646e7327b8 Show loading progress of frame images. 2019-06-06 23:40:37 +02:00
Bartosz Taudul
f8a4909c96 Display number of frame images in a trace. 2019-06-06 23:21:36 +02:00
Bartosz Taudul
9cd95db4e3 Delay creation of frame image texture. 2019-06-06 23:14:49 +02:00
Bartosz Taudul
129155946b Actually set current texture pointer. 2019-06-06 23:10:01 +02:00
Bartosz Taudul
6b2741ccdb Save/load frame images. 2019-06-06 23:08:19 +02:00
Bartosz Taudul
6ae4afa0f4 Display frame images also on frame time graph. 2019-06-06 22:43:39 +02:00
Bartosz Taudul
08fb2b7337 Tooltip cosmetics. 2019-06-06 22:32:20 +02:00
Bartosz Taudul
c46576a68c Flip UV. 2019-06-06 22:22:57 +02:00
Bartosz Taudul
cd2f572a2f Use proper index. 2019-06-06 22:22:57 +02:00
Bartosz Taudul
beea31edd0 Show frame images in frame tooltips. 2019-06-06 22:22:57 +02:00
Bartosz Taudul
82d4fe7236 Add texture wrapper. 2019-06-06 22:14:51 +02:00
Bartosz Taudul
af56f41e32 Add frame image accessor. 2019-06-06 22:14:51 +02:00
Bartosz Taudul
34b84bb284 Add frame image index to frame data. 2019-06-06 21:44:48 +02:00
Bartosz Taudul
e5bb6011c5 Frame image transfer prototype. 2019-06-06 21:39:54 +02:00
Bartosz Taudul
a37348c5c7 Increase contrast. 2019-06-06 01:45:41 +02:00
Bartosz Taudul
2b917c6adf Draw index area labels with contrast. 2019-06-06 01:40:23 +02:00
Bartosz Taudul
45039fc417 Don't format colored text where not necessary. 2019-06-03 01:36:03 +02:00
Bartosz Taudul
4f5286a860 Add unformatted colored text extension function. 2019-06-03 01:35:53 +02:00
Bartosz Taudul
ff6768986e Move imgui extension function to an appropriate place. 2019-06-03 01:35:32 +02:00
Bartosz Taudul
c433e76c7a Use TextUnformatted in TextCentered. 2019-06-03 01:28:45 +02:00
Bartosz Taudul
42fefde161 Protect against plot range equal zero. 2019-06-03 01:19:01 +02:00
Bartosz Taudul
76ed1e666f Remove unused variable. 2019-06-02 20:01:19 +02:00
Bartosz Taudul
d5b3ec9503 Locale keeps being changed by system libraries... 2019-06-02 19:59:31 +02:00
Bartosz Taudul
83b838f783 Use C locale for decimal point character. 2019-06-02 19:39:07 +02:00
Bartosz Taudul
557d4d7de4 Add logo to manual. 2019-06-02 18:12:15 +02:00
Bartosz Taudul
7c7e32d49e Set window icon. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
9e3cf3d387 Add 64x64 embedded png icon. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
c0326b9ba0 Add stb_image. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
8ae33fbb1e Add icon to win32 profiler executable. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
55c1e263b7 Add icon files. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
5181f1be5b Update manual. 2019-06-02 15:57:59 +02:00
Bartosz Taudul
07cb6b0e93 Update NEWS. 2019-06-02 15:40:56 +02:00
Bartosz Taudul
79215ea73e Implement linked selection in compare menu. 2019-06-02 15:40:19 +02:00
Bartosz Taudul
794d155fde Update manual. 2019-06-02 15:22:29 +02:00
Bartosz Taudul
ceaf1226b2 Update NEWS. 2019-06-02 15:06:57 +02:00
Bartosz Taudul
c05766e637 Display notification about worker background tasks. 2019-06-02 15:00:50 +02:00
Bartosz Taudul
5681096486 Track status of worker background tasks. 2019-06-02 15:00:38 +02:00
Bartosz Taudul
96b1df67b9 Get proper yMin, yMax values. 2019-06-02 13:58:30 +02:00
Bartosz Taudul
9bbaab8897 Draw on a correct window. 2019-06-02 13:40:35 +02:00
Bartosz Taudul
3a561b4eed Save thread state should be atomic. 2019-06-02 13:15:55 +02:00
Bartosz Taudul
0059cb3ab0 Switch default namespace display to "short". 2019-06-02 12:57:42 +02:00
Bartosz Taudul
7aca6b72d1 Don't block worker when in save file dialog. 2019-05-28 19:57:18 +02:00
Bartosz Taudul
c93170cd42 Move saving trace dump to a separate thread. 2019-05-28 19:56:18 +02:00
Bartosz Taudul
845f3a2ddf Use std::shared_mutex for locking worker access. 2019-05-28 19:21:53 +02:00
Bartosz Taudul
145ca30df9 There's no __popcnt64 in 32 bit winapi. 2019-05-28 18:18:54 +02:00
Bartosz Taudul
b3812146cb Fix atomics initialization. 2019-05-27 14:09:55 +02:00
Bartosz Taudul
18a9741b5d Use proper check. 2019-05-22 14:19:25 +02:00
Bartosz Taudul
340837e202 Callstack decode for android api <= 21.
libbacktrace/elf.cpp:3249:3: error: use of undeclared identifier 'dl_iterate_phdr'
2019-05-22 14:14:30 +02:00
Bartosz Taudul
84efe070fe Make callstack logic more obvious. 2019-05-22 14:05:44 +02:00
Bartosz Taudul
61d775ecc8 Calculate end point before loop. 2019-05-19 16:26:59 +02:00
Bartosz Taudul
8f85a0da2c Don't check twice for the same thing. 2019-05-19 16:23:19 +02:00
Bartosz Taudul
007e434a05 Force inline FillPages(). 2019-05-19 13:46:53 +02:00
Bartosz Taudul
9122d3516c Force inline GetPage(). 2019-05-19 13:45:02 +02:00
Bartosz Taudul
30c398cd96 Don't allocate memory for empty pages in memory map. 2019-05-19 13:15:54 +02:00
Bartosz Taudul
952e466287 Rearrange code. 2019-05-19 12:47:45 +02:00
Bartosz Taudul
860a4625e1 Update manual. 2019-05-12 16:28:59 +02:00
Bartosz Taudul
7cdd8954dc Update NEWS. 2019-05-12 16:27:26 +02:00
Bartosz Taudul
b95d834891 Split contended and uncontended locks in lock list. 2019-05-12 16:26:19 +02:00
Bartosz Taudul
0da1e8551f Track lock contention status. 2019-05-12 16:17:17 +02:00
Bartosz Taudul
a714cd4369 Typo. 2019-05-12 15:59:53 +02:00
Bartosz Taudul
63066cf6a5 Fix logic error. 2019-05-12 15:48:25 +02:00
Bartosz Taudul
e612cef6c2 Optimize drawing frames. 2019-05-11 13:47:06 +02:00
Bartosz Taudul
fcb052cd13 Update manual. 2019-05-10 21:11:58 +02:00
Bartosz Taudul
fd815655ac Update NEWS. 2019-05-10 21:11:54 +02:00
Bartosz Taudul
7cc5149355 Improve timeline message tooltips. 2019-05-10 20:36:06 +02:00
Bartosz Taudul
74575250a5 Save message color data in trace dumps. 2019-05-10 20:32:47 +02:00
Bartosz Taudul
8cbd83c752 Use message color on message lists. 2019-05-10 20:26:27 +02:00
Bartosz Taudul
4850e19ebd Store color in message data. 2019-05-10 20:26:27 +02:00
Bartosz Taudul
797ebd3caf Cosmetics. 2019-05-10 20:20:08 +02:00
Bartosz Taudul
efc54babe3 Transfer of colored messages. 2019-05-10 20:17:44 +02:00
Bartosz Taudul
6a09229ae7 Remove error bars and collection cost from GPU zone display.
There's no way to know how much this takes on a GPU.
2019-05-10 02:31:23 +02:00
Bartosz Taudul
721a818dcc Visual transition of error bars and collection cost markers. 2019-05-10 02:27:42 +02:00
Bartosz Taudul
9051b6e206 Update imgui color text edit to a179931d. 2019-05-09 18:56:13 +02:00
Bartosz Taudul
273a874de9 Update imgui to 1.70. 2019-05-09 18:53:31 +02:00
Bartosz Taudul
ee1a653667 Update manual. 2019-05-09 13:43:28 +02:00
Bartosz Taudul
0c6dde2eef Update NEWS. 2019-05-09 13:38:37 +02:00
Bartosz Taudul
54c8a882c9 Allow restricting call stack frame tree to active allocations. 2019-05-09 13:37:28 +02:00
Bartosz Taudul
98eaacec90 Update LZ4 to 1.9.1. 2019-05-01 16:53:48 +02:00
Bartosz Taudul
9ec8704dad Don't include LZ4 headers in tracy headers.
The LZ4 implementation is wrapped in tracy namespace, but it also adds
some defines, which may conflict with other LZ4 implementations.
2019-05-01 12:57:42 +02:00
Bartosz Taudul
303bbdd512 Update manual. 2019-04-26 23:29:05 +02:00
Bartosz Taudul
209a34e6ab Update NEWS. 2019-04-26 23:20:51 +02:00
Bartosz Taudul
a18a6869bc Allow limiting frame stats to visible frames. 2019-04-26 23:19:31 +02:00
Bartosz Taudul
fdd96fe251 Allow changing frame set from trace info window. 2019-04-26 22:49:36 +02:00
Bartosz Taudul
0e32de293f Update manual. 2019-04-23 17:20:05 +02:00
Bartosz Taudul
26aa3a23fb Display number of visible data points on a plot. 2019-04-23 17:17:25 +02:00
Bartosz Taudul
2c9d9d0d27 /proc/stat might be inaccessible. 2019-04-04 15:25:26 +02:00
Bartosz Taudul
cffc6e21d3 Use open to open webpage on osx. 2019-04-04 13:58:13 +02:00
Bartosz Taudul
a7886cf82c Replace linear search with hash lookup. 2019-04-03 16:24:16 +02:00
Bartosz Taudul
82dad3fb97 Use proper type. 2019-04-01 20:43:42 +02:00
Bartosz Taudul
d0d5184c04 Set options for proper socket. 2019-04-01 20:14:00 +02:00
Bartosz Taudul
687915299a Don't try to set socket options on invalid socket. 2019-04-01 20:08:27 +02:00
Bartosz Taudul
8d0d6b576a Update manual. 2019-04-01 20:06:43 +02:00
Bartosz Taudul
78e8d4aefe Display query backlog. 2019-04-01 19:55:54 +02:00
Bartosz Taudul
20e6813461 Store send queue size in mbps block. 2019-04-01 19:55:37 +02:00
Bartosz Taudul
d8d30bd875 Update NEWS. 2019-04-01 19:49:57 +02:00
Bartosz Taudul
9010b2c142 Put queries into queue if send buffer is full. 2019-04-01 19:47:29 +02:00
Bartosz Taudul
deeea0ee70 Track space left in send buffer. 2019-04-01 19:37:39 +02:00
Bartosz Taudul
57dff0abc9 Add server query queue. 2019-04-01 19:26:50 +02:00
Bartosz Taudul
c07c6d11b7 Define server query packet. 2019-04-01 19:21:53 +02:00
Bartosz Taudul
57cd6d3ed5 Allow retrieval of socket send buffer size. 2019-04-01 18:50:37 +02:00
Bartosz Taudul
45750a05a1 Only smooth zoom now. 2019-04-01 18:39:09 +02:00
Bartosz Taudul
cd774b9e96 Store two entries in zone self time cache.
This accounts for the situation when the zone information window is open and a
tooltip for another zone is displayed.
2019-03-30 00:54:22 +01:00
Bartosz Taudul
48a07bf4f8 Cache zone self times. 2019-03-30 00:52:25 +01:00
Bartosz Taudul
b0ab3c6139 High compression mode is now a bit better. 2019-03-27 02:26:39 +01:00
Bartosz Taudul
fef417f286 Store total number of CPU and GPU zones in trace. 2019-03-27 01:46:54 +01:00
Bartosz Taudul
2e6ac050f4 Use custom vector swap. 2019-03-26 23:02:39 +01:00
Bartosz Taudul
6c5efbfdce Implement custom vector swap. 2019-03-26 23:02:32 +01:00
Bartosz Taudul
a632d9e2a3 Add zone vector cache.
Zone children will now be collected in staging vectors. When the zone is
ended (and no more children can be added to it), a size-fitted vector
is allocated using slab allocation. The over-allocated vector is then
put into a cache for use in future zones.

This is only active for vectors <= 8192 elements, or 64 KB (chosen
arbitrarily), to reduce time spent on copying memory.

Overall, this change should have the following effects:
- System memory allocation pressure reduction, due to reuse of
  vectors, which eliminates the need for constant growth.
- Reduction of memory usage, because children vectors are now fitted to
  required size.
- Slight increase of zone processing time, due to memory copying?
2019-03-26 22:06:00 +01:00
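
The staging-vector scheme described in the commit above might look roughly like this. It is a sketch under stated assumptions, not the actual implementation: ZoneVectorCache, Acquire(), Finalize(), CachedCount() and ZoneId are hypothetical names, and plain std::vector is used in place of the slab allocation the commit mentions. The 8192-element limit is taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using ZoneId = uint64_t;  // hypothetical 8-byte handle to a child zone

class ZoneVectorCache
{
public:
    // Limit from the commit message: 8192 elements (64 KB of 8-byte entries).
    static constexpr size_t MaxFitted = 8192;

    // Returns a staging vector, reusing a cached one (and its capacity)
    // when one is available.
    std::vector<ZoneId> Acquire()
    {
        if( m_cache.empty() ) return {};
        std::vector<ZoneId> v = std::move( m_cache.back() );
        m_cache.pop_back();
        v.clear();  // drops elements, keeps the allocated capacity
        return v;
    }

    // Called when a zone ends and can no longer receive children.
    std::vector<ZoneId> Finalize( std::vector<ZoneId>&& staging )
    {
        // Large vectors are kept as-is to avoid an expensive copy.
        if( staging.size() > MaxFitted ) return std::move( staging );

        // Copy the children into a vector sized to the element count.
        std::vector<ZoneId> fitted( staging.begin(), staging.end() );

        // Keep the over-allocated buffer around for future zones.
        staging.clear();
        m_cache.push_back( std::move( staging ) );
        return fitted;
    }

    size_t CachedCount() const { return m_cache.size(); }

private:
    std::vector<std::vector<ZoneId>> m_cache;
};
```

The copy in Finalize() is the "memory copying" cost the commit asks about. The cache pays for it by letting later zones start with a buffer that has already grown.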
Bartosz Taudul
11f4dcbf1e Consistent variable naming. 2019-03-26 21:41:44 +01:00
Bartosz Taudul
52f76a45ed Display separators for bin counts in compare menu. 2019-03-26 20:27:28 +01:00
Bartosz Taudul
99fca9e069 Fix loading old traces when skipping locks. 2019-03-26 20:25:29 +01:00
Bartosz Taudul
fe675b91be Ditto for frame counts. 2019-03-26 20:19:56 +01:00
Bartosz Taudul
021368fb59 Display bin counts with separators. 2019-03-26 20:18:20 +01:00
Bartosz Taudul
df3e8597c4 Focusing timeline on crash from trace info window. 2019-03-24 23:55:38 +01:00
Bartosz Taudul
7792963e31 Interaction with crash label in options menu. 2019-03-24 23:52:36 +01:00
Bartosz Taudul
2f397c892b Middle click on crash label to center view on it. 2019-03-24 23:50:33 +01:00
Bartosz Taudul
f52c8e9855 Update manual. 2019-03-24 23:47:02 +01:00
Bartosz Taudul
bd838ac84a Update NEWS. 2019-03-24 13:55:12 +01:00
Bartosz Taudul
1c495f077b Allow changing display order of threads. 2019-03-24 13:54:36 +01:00
Bartosz Taudul
f7eca24e18 Use ordered thread vector in message list. 2019-03-24 13:41:14 +01:00
Bartosz Taudul
a633c50991 Use ordered threads vector in options. 2019-03-24 13:41:02 +01:00
Bartosz Taudul
e957590350 Mirror thread data in a reorderable vector. 2019-03-24 13:37:43 +01:00
Bartosz Taudul
6ad820a76a Display tooltip for plot point over limits. 2019-03-23 02:24:45 +01:00
Bartosz Taudul
532bf19efa Don't draw many illegible plot points. 2019-03-22 20:11:24 +01:00
Bartosz Taudul
e6baee2bf9 Reduce number of max plot probes per column. 2019-03-22 20:11:10 +01:00
Bartosz Taudul
ff6034dfbf Change label to us. 2019-03-22 18:54:47 +01:00
Bartosz Taudul
e879016ffa Add Lua callstack capture time measurement. 2019-03-22 14:47:08 +01:00
Bartosz Taudul
302ad87686 Fix typo. 2019-03-21 22:06:37 +01:00
Bartosz Taudul
94ed1c637c Try to check if cntcvt reads are monotonic.
https://lore.kernel.org/patchwork/patch/904607/
2019-03-21 21:59:51 +01:00
Bartosz Taudul
7f57b3dba9 Fallback to reading CLOCK_MONOTONIC_RAW, if available. 2019-03-21 21:49:23 +01:00
Bartosz Taudul
3ccb831efb Fix calculation of frame histogram data. 2019-03-21 21:30:08 +01:00
Bartosz Taudul
e79fa04a8b Don't fail when timer accuracy is low. 2019-03-21 21:24:07 +01:00
Bartosz Taudul
fa556d2d65 Use common access-and-insert pattern for VisData. 2019-03-19 22:12:24 +01:00
Bartosz Taudul
fddba168c6 Track next time to search for. 2019-03-18 19:39:37 +01:00
Bartosz Taudul
f530dfb0e9 Apply the same optimization for GPU zones. 2019-03-18 18:48:27 +01:00
Bartosz Taudul
94a1957338 Optimize zone skipping. 2019-03-18 18:42:58 +01:00
Bartosz Taudul
02db5f52d1 Pass nspx to zone drawing functions. 2019-03-18 18:40:03 +01:00
Bartosz Taudul
2931c83442 Lookup further at the beginning of the collapsed zones area. 2019-03-18 18:32:45 +01:00
Bartosz Taudul
e19f2f26e1 Optimize drawing collapsed CPU zones. 2019-03-18 18:24:27 +01:00
Bartosz Taudul
b5fce70f25 Fix rapid advancing to next frames. 2019-03-17 20:51:54 +01:00
Bartosz Taudul
5fb478a7df Update NEWS. 2019-03-17 17:22:17 +01:00
Bartosz Taudul
e034eabeb8 Animate plot ranges. 2019-03-17 17:21:30 +01:00
Bartosz Taudul
b6ccb9d686 Allocation times may be displayed relative to zone start. 2019-03-17 16:53:09 +01:00
Bartosz Taudul
d2cca5dc3f Allow custom time offset in memory allocation list. 2019-03-17 16:47:44 +01:00
Bartosz Taudul
f0aadfe066 Don't push the same zone on zone info stack multiple times. 2019-03-17 16:43:20 +01:00
Bartosz Taudul
06421cf5ca Always auto-resize memory allocation info window. 2019-03-17 16:39:27 +01:00
Bartosz Taudul
2f22776249 Update manual. 2019-03-17 16:35:50 +01:00
Bartosz Taudul
fdb06fdd1f Update NEWS. 2019-03-17 16:33:44 +01:00
Bartosz Taudul
4914ef6b14 Display zone messages in zone info window. 2019-03-17 16:33:18 +01:00
Bartosz Taudul
016f7ac4b6 Allow retrieval of zone's thread data. 2019-03-17 16:17:47 +01:00
Bartosz Taudul
b4bfdb7872 Dim information about no memory events. 2019-03-17 02:56:26 +01:00
Bartosz Taudul
17718b4d25 Fix asserts. 2019-03-16 20:36:06 +01:00
Bartosz Taudul
28dfa21fda Move conditional out of loop. 2019-03-16 14:46:21 +01:00
Bartosz Taudul
7e6a8135df Remove double indirection in GetNextLockEvent(). 2019-03-16 14:18:43 +01:00
Bartosz Taudul
6db1a9ccd4 Use lock thread ranges in lock tooltips. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
833151b868 Don't search for lock events outside of thread range. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
200621f952 Use lock ranges for early exclusion test. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
67f14be6aa Update lock ranges when loading trace. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
8ced8a457c Update thread time range on lock event insert. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
dc981550a1 Load lock event time to a variable. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
4d66317bc3 Add per-thread time ranges to lock maps. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
71e20e7e7f Store lock map as flat_hash_map with pointer values. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
5fbc14c487 Fix skipping plots in version >= 0.4.5. 2019-03-15 15:27:37 +01:00
Bartosz Taudul
b43d962194 Set labels for input text fields. 2019-03-15 02:35:27 +01:00
Bartosz Taudul
6a36bb2fc2 Add hints to input text fields. 2019-03-15 01:31:06 +01:00
Bartosz Taudul
476287b5f2 Update imguicolortextedit. 2019-03-15 01:06:27 +01:00
Bartosz Taudul
a10ec49a60 Don't use obsolete function. 2019-03-15 01:00:43 +01:00
Bartosz Taudul
98b4b69386 Update imgui to 1.69. 2019-03-15 00:55:53 +01:00
Bartosz Taudul
e6ca8fc75f Update NEWS. 2019-03-14 01:35:40 +01:00
Bartosz Taudul
5177629130 Add standard deviation explanation tooltips. 2019-03-14 01:34:50 +01:00
Bartosz Taudul
18e7b9df11 Add standard deviations to compare menu. 2019-03-14 01:32:50 +01:00
Bartosz Taudul
a0299cc63a Optimize calculation of standard deviation. 2019-03-14 01:23:37 +01:00
Bartosz Taudul
f57cac9042 Initialize SourceLocationZones in-place. 2019-03-14 01:15:19 +01:00
Bartosz Taudul
d3fdd6b1d1 Display standard deviation. 2019-03-14 01:14:06 +01:00
Bartosz Taudul
d64f07f853 Don't search for thread for empty timelines. 2019-03-14 01:10:57 +01:00
Bartosz Taudul
b7fe29f750 Offload timeline statistics update to a background thread. 2019-03-13 01:46:05 +01:00
Bartosz Taudul
737738ac73 Wait for source location zones in update tool.
Not really an issue, as it is built without full-fledged statistics.
2019-03-13 01:28:42 +01:00
Bartosz Taudul
3b051b1119 Add callstack depth vs time plot. 2019-03-12 20:23:36 +01:00
Bartosz Taudul
01c7712c92 More extensive call stack capture times table. 2019-03-10 23:37:41 +01:00
Bartosz Taudul
935f69469b Call stack is limited to 62 frames on windows. 2019-03-10 23:37:27 +01:00
Bartosz Taudul
9563c8316d Optimize lock drawing.
Don't iterate over locks that are present in only one thread, if only
contended lock events are to be displayed. Such locks cannot be
contended.
2019-03-09 14:20:34 +01:00
Bartosz Taudul
e8d9de2f77 Briefly describe contributor's work. 2019-03-09 12:36:54 +01:00
Bartosz Taudul
cbfd524b6c Set sane messages window column widths. 2019-03-09 00:37:58 +01:00
Bartosz Taudul
815d7fdcb4 Set sane callstack window column widths. 2019-03-09 00:34:04 +01:00
Bartosz Taudul
5445ffb149 Set sane statistics window column widths. 2019-03-09 00:30:53 +01:00
Bartosz Taudul
8fc0727d54 Update NEWS. 2019-03-09 00:16:14 +01:00
Bartosz Taudul
0748655797 Allow opening source file view from statistics menu. 2019-03-09 00:15:23 +01:00
Bartosz Taudul
761a08b055 Dim location in statistics menu. 2019-03-09 00:08:57 +01:00
Bartosz Taudul
9fd8a20d7c Use small checkbox in appropriate places. 2019-03-08 18:39:41 +01:00
Bartosz Taudul
e004dc85a9 Display waiting dots in "waiting for connection" window. 2019-03-07 17:00:40 +01:00
Bartosz Taudul
f69f9d4660 Disable window transparency. 2019-03-07 01:18:24 +01:00
Bartosz Taudul
535d7b2da1 Add waiting dots to statistics menu. 2019-03-07 00:59:43 +01:00
Bartosz Taudul
aa054f1f46 Add waiting dots to compare traces menu. 2019-03-07 00:59:02 +01:00
Bartosz Taudul
6e4bc7d9c5 Add waiting dots to memory data in zone info window. 2019-03-07 00:57:32 +01:00
Bartosz Taudul
d547700e50 Update time in a common location. 2019-03-07 00:57:25 +01:00
Bartosz Taudul
d0d7131e35 Properly restore threadMap.
This fixes thread ids returned by CompressThread in loaded traces. The
bug was manifesting by not displaying memory events in zone info window.
2019-03-07 00:49:25 +01:00
Bartosz Taudul
07bcca9dc0 Don't pre-fill threadExpand, if not needed. 2019-03-07 00:49:06 +01:00
Bartosz Taudul
f2f19241e6 Display waiting dots in find zone menu during precompute. 2019-03-06 18:25:39 +01:00
Bartosz Taudul
d5914d2e7b Extract drawing waiting dots. 2019-03-06 18:16:21 +01:00
Bartosz Taudul
a4740c1b1c Add animation to loading progress window. 2019-03-06 02:49:21 +01:00
Bartosz Taudul
cee625b375 Animate frame selection expansion. 2019-03-06 01:45:39 +01:00
Bartosz Taudul
4b1c0ff0c5 Fix frame selection when zoom anim is active. 2019-03-06 01:45:26 +01:00
Bartosz Taudul
a26d0bf2b4 Update NEWS. 2019-03-06 01:17:40 +01:00
Bartosz Taudul
00de21f7e7 Smooth zooming on mouse scroll. 2019-03-06 01:15:38 +01:00
Bartosz Taudul
dc2743bcc0 Manual refinements. 2019-03-05 23:39:36 +01:00
Bartosz Taudul
17fb589415 Try dladdr() resolution if libbacktrace fails. 2019-03-05 20:43:47 +01:00
Bartosz Taudul
49f1277e55 Cast void* to char*. 2019-03-05 20:20:55 +01:00
Bartosz Taudul
3a7ed53c5c Update manual. 2019-03-05 20:09:33 +01:00
Bartosz Taudul
b45224d8de Update NEWS. 2019-03-05 19:50:08 +01:00
Bartosz Taudul
171434075d Group call stack improvements together. 2019-03-05 19:49:43 +01:00
Bartosz Taudul
364c20a771 Correct parameter number. 2019-03-05 19:48:02 +01:00
Bartosz Taudul
e798f0f28d Don't send lua callstacks if platform doesn't support callstacks. 2019-03-05 19:47:31 +01:00
Bartosz Taudul
bc87762012 Merge native callstack with the allocated one. 2019-03-05 19:44:08 +01:00
Bartosz Taudul
ebf09bebae Only one callstack may be in-flight at any time.
Save for the allocated callstack, but this will be solved in another
way. There's no need to keep pending callstacks in a map.
2019-03-05 19:44:08 +01:00
Bartosz Taudul
afe2fad1a7 Send native callstack before allocated one. 2019-03-05 19:18:43 +01:00
Bartosz Taudul
4509412efb Fast callstack retrieval for linux. 2019-03-05 18:56:39 +01:00
Bartosz Taudul
1bbf296351 Use fast callstack frame decoding to cut callstack. 2019-03-05 02:42:51 +01:00
Bartosz Taudul
cb62b63fe2 Fast callstack frame decoder.
Returns only function name, doesn't retrieve inlined functions, doesn't
perform demangling.
2019-03-05 02:42:51 +01:00
Bartosz Taudul
b11f932078 Cut lua callstack at lua_pcall. 2019-03-05 02:42:51 +01:00
Bartosz Taudul
ec73178733 Move callstack cutting to a separate function. 2019-03-05 02:42:51 +01:00
Bartosz Taudul
b37a9055d5 Force inline SendLuaCallstack(). 2019-03-05 02:42:51 +01:00
Bartosz Taudul
3e81e44d75 Separate processing for allocated callstacks.
Only native callstack is used for the moment.
2019-03-05 02:42:50 +01:00
Bartosz Taudul
d229c1bc1b Send native callstack along with allocated callstack. 2019-03-05 02:42:50 +01:00
Bartosz Taudul
e13286936c Mark templated functions inline. 2019-03-03 22:09:20 +01:00
Bartosz Taudul
f6913eecf0 Don't display custom stack frames as pointers. 2019-03-03 18:20:55 +01:00
Bartosz Taudul
dc74297439 Add missing const qualifiers. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
bef31ba073 Separate message for zone begin with alloc src loc and callstack. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
66b8a13e77 Store callstack alloc payloads. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
9fc022346b Replace frame pointers with callstack frame ids. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
664847211c Pack/unpack frame pointer to callstack frame id. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
1feedb17ac Add callstack frame identifier and the required plumbing. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
cf8d17c2ec Move VarArray hash calculation to a separate function. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
e3c31e4a4e Send callstack alloc payload. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
b8501956f9 Lua callstack sending. 2019-03-03 18:04:02 +01:00
Bartosz Taudul
2c16bdd538 Don't go out of tracy namespace. 2019-03-03 01:59:24 +01:00
Bartosz Taudul
c0d0d0d42b Display keys as keys in manual. 2019-03-01 02:08:57 +01:00
Bartosz Taudul
fc63b6b07d Display trace version in trace info window. 2019-03-01 01:47:36 +01:00
Bartosz Taudul
6065d25335 Extend list of tracy callstack frames. 2019-03-01 01:41:10 +01:00
Bartosz Taudul
e284248995 Fix display of last message. 2019-03-01 01:30:56 +01:00
Bartosz Taudul
8fd09fe8f0 Get proper width. 2019-03-01 01:20:20 +01:00
Bartosz Taudul
42c94f7142 Handle discontinuous frame end mismatch failure. 2019-02-28 19:32:42 +01:00
Bartosz Taudul
d80dc82b96 Don't display invalid thread in failure dialog. 2019-02-28 19:31:45 +01:00
Bartosz Taudul
06f83bbdef Update manual. 2019-02-28 19:25:30 +01:00
Bartosz Taudul
d863245b49 Serialize discontinuous frame messages. 2019-02-28 19:21:23 +01:00
Bartosz Taudul
f15bfb88a3 Update manual. 2019-02-27 21:34:00 +01:00
Bartosz Taudul
b4daad684c Display frame numbers in zone trace. 2019-02-27 21:12:56 +01:00
Bartosz Taudul
8d784f7fae Display inline frames in all call stacks. 2019-02-27 20:55:58 +01:00
Bartosz Taudul
38c4909d96 Update NEWS. 2019-02-27 20:38:13 +01:00
Bartosz Taudul
422ed1f452 Special mode for callstack grouping in find zone menu. 2019-02-27 20:37:38 +01:00
Bartosz Taudul
851ae9077b Make small callstack button tooltip optional. 2019-02-27 19:59:49 +01:00
Bartosz Taudul
e20e7caab0 Increase size of frame left/right buttons. 2019-02-27 19:58:44 +01:00
Bartosz Taudul
d08ea6e2bd Update NEWS. 2019-02-25 15:12:17 +01:00
Bartosz Taudul
b89db6e926 Don't send CPU usage data when there are no readings. 2019-02-25 15:11:35 +01:00
Bartosz Taudul
963d2b3ca8 CPU usage getter for apple. 2019-02-25 15:04:06 +01:00
Bartosz Taudul
85f29a0f22 Collect system time before server connection is made. 2019-02-24 19:12:17 +01:00
Bartosz Taudul
3c77ad7179 Update NEWS. 2019-02-24 19:04:49 +01:00
Bartosz Taudul
bafc8a1330 Implement getting CPU usage in linux. 2019-02-24 19:02:49 +01:00
Bartosz Taudul
0c75a5178c Update NEWS. 2019-02-24 18:42:34 +01:00
Bartosz Taudul
06dddd15b4 Update manual. 2019-02-24 18:41:47 +01:00
Bartosz Taudul
e9aeb0c522 Darken timeline outside of capture area. 2019-02-24 18:35:03 +01:00
Bartosz Taudul
fb96d60256 Adjust last time in mem alloc/free and sys time messages. 2019-02-24 18:26:23 +01:00
Bartosz Taudul
4610f79b94 Update manual. 2019-02-24 17:49:20 +01:00
Bartosz Taudul
9f621bf67f Improve lock label tooltips. 2019-02-24 17:44:44 +01:00
Bartosz Taudul
c78aedae62 Zoom-to-range for lock labels. 2019-02-24 17:30:58 +01:00
Bartosz Taudul
021d369b80 Fix calculation of thread lock extent. 2019-02-24 17:30:18 +01:00
Bartosz Taudul
271d7ccaa3 Bring plot tooltip up-to-par. 2019-02-24 17:01:46 +01:00
Bartosz Taudul
bf2ecbae36 Middle-click on thread label to zoom to thread extent. 2019-02-24 17:01:46 +01:00
Bartosz Taudul
62162d4cdb Display count of messages and locks in thread tooltip. 2019-02-24 17:01:46 +01:00
Bartosz Taudul
6dceec4ea8 Improve thread tooltip.
- In addition to the first event time, also display last one and the
  time period.
- Include messages and locks in thread events discovery.
2019-02-24 17:01:46 +01:00
Bartosz Taudul
a0b5ac33cc Always show label of a crashed thread. 2019-02-23 01:34:45 +01:00
Bartosz Taudul
ba769ab23a Update manual. 2019-02-23 01:32:06 +01:00
Bartosz Taudul
ccabba58d9 Update NEWS. 2019-02-23 01:28:57 +01:00
Bartosz Taudul
af3eb93e4a Hide tracks that don't have anything to display. 2019-02-23 01:28:12 +01:00
Bartosz Taudul
7ab326c4fe Don't clip area above track. 2019-02-23 00:59:09 +01:00
Bartosz Taudul
5f3f6d18bb Cosmetics. 2019-02-23 00:37:48 +01:00
Bartosz Taudul
53992b9b6b Don't hide hex thread id in tooltip. 2019-02-23 00:34:01 +01:00
Bartosz Taudul
29fcddca0b Display frame count in frame type selection dropdown. 2019-02-22 21:07:33 +01:00
Bartosz Taudul
ae53c8e6f0 Don't display threads with no messages. 2019-02-22 21:07:33 +01:00
Bartosz Taudul
48b5b25a6a Display count of messages in threads. 2019-02-22 21:07:33 +01:00
Bartosz Taudul
2ee86ef126 Display bigger program name in welcome dialog. 2019-02-22 02:44:41 +01:00
Bartosz Taudul
9bd13b02e9 Small string changes in the welcome dialog. 2019-02-22 02:41:13 +01:00
Bartosz Taudul
81c2515199 Bump protocol version. 2019-02-21 23:25:43 +01:00
Bartosz Taudul
542d081027 Update manual. 2019-02-21 23:24:18 +01:00
Bartosz Taudul
06fcfdd493 Update NEWS. 2019-02-21 23:11:49 +01:00
Bartosz Taudul
0b9fa8f3c8 Track CPU usage also on cygwin. 2019-02-21 23:11:09 +01:00
Bartosz Taudul
d0c1b9bf67 Proper formatting of plot values. 2019-02-21 23:07:32 +01:00
Bartosz Taudul
e190faa7e1 Save/load CPU usage plot. 2019-02-21 22:56:59 +01:00
Bartosz Taudul
e9baa80bf3 Process CPU usage reports. 2019-02-21 22:56:59 +01:00
Bartosz Taudul
9f4f5bcb63 CPU usage retrieval. 2019-02-21 22:45:53 +01:00
Bartosz Taudul
c8fb3f24cc Update NEWS. 2019-02-21 21:19:53 +01:00
Bartosz Taudul
f1dd4ef3d9 Animate thread position and height. 2019-02-21 21:18:41 +01:00
Bartosz Taudul
e945902f40 Merge visibility and show full options into one struct. 2019-02-21 20:24:08 +01:00
Bartosz Taudul
bc713463d8 Improve zooming animation. 2019-02-21 20:00:29 +01:00
Bartosz Taudul
b33c61cead Update imgui to 1.68. 2019-02-21 18:40:27 +01:00
Bartosz Taudul
938d8ce69e Properly initialize demangled pointer. 2019-02-21 15:04:17 +01:00
Bartosz Taudul
44009b6fda Use mach_absolute_time() to get time on iOS. 2019-02-21 14:45:13 +01:00
Bartosz Taudul
e839a3153f Just use getprogname(). 2019-02-21 11:40:56 +01:00
Bartosz Taudul
c4d46f1c24 No libproc.h on iOS. 2019-02-21 11:33:45 +01:00
Till Rathmann
9d7c4a2861 Merged in tillrathmann/tracy (pull request #33)
Fixed DLL support
2019-02-20 17:24:12 +00:00
Till Rathmann
29140afe0c Fixed compiler warnings. 2019-02-20 17:50:49 +01:00
Till Rathmann
77abc3bffd Fixed DLL support. 2019-02-20 16:15:13 +01:00
Bartosz Taudul
5ddc605dd6 Update manual. 2019-02-20 16:04:30 +01:00
Bartosz Taudul
22329ae5d9 Collect call stacks on apple. 2019-02-20 16:01:41 +01:00
Bartosz Taudul
34d24b16bb Retrieve memory size on apple. 2019-02-20 13:52:55 +01:00
Bartosz Taudul
9c966b6224 Process name retrieval on apple. 2019-02-20 13:13:29 +01:00
Bartosz Taudul
8f75839d66 Fix apple target detection. 2019-02-20 12:43:48 +01:00
Bartosz Taudul
0fd3025a7e Fix typo. 2019-02-20 02:51:03 +01:00
Bartosz Taudul
5afadcb11d Fix if condition. 2019-02-19 21:51:41 +01:00
Bartosz Taudul
a75d602f6e Update manual. 2019-02-19 21:37:40 +01:00
Bartosz Taudul
34695fca60 Update README. 2019-02-19 20:46:17 +01:00
Bartosz Taudul
4048ebf7c5 Update NEWS. 2019-02-19 20:44:55 +01:00
Bartosz Taudul
ef5e30056e Implement delayed initialization of the profiler.
Enabled on osx, ios.
2019-02-19 20:43:30 +01:00
Bartosz Taudul
d560f7a203 Cosmetics. 2019-02-19 19:36:30 +01:00
Bartosz Taudul
3f914834b7 Hide rest of statics. 2019-02-19 19:33:37 +01:00
Bartosz Taudul
9fabafbeca Fix DLL code. 2019-02-19 18:46:59 +01:00
Bartosz Taudul
2421e05c27 Prevent direct access to s_profiler. 2019-02-19 18:38:08 +01:00
Bartosz Taudul
d865d1cc87 Disallow direct access to s_token. 2019-02-19 18:27:00 +01:00
Bartosz Taudul
44753dd4ac thread_local implies static. 2019-02-19 16:52:05 +01:00
Bartosz Taudul
3a562ae6c9 Fix display of unresolved call stack frames. 2019-02-19 16:37:34 +01:00
Bartosz Taudul
9040953e13 Proper constructors are needed. 2019-02-18 14:57:46 +01:00
Bartosz Taudul
c9c4d2845a Provide empty {Gpu,Vk}CtxScope classes if tracy is disabled.
This may be needed if some wrapping is done, abstracting the OpenGL and
Vulkan tracing.
2019-02-18 14:45:09 +01:00
Bartosz Taudul
081b1069f6 Properly count number of locks in options menu. 2019-02-17 17:19:17 +01:00
Bartosz Taudul
9dd1d6a744 Don't display locks with no lock events. 2019-02-17 17:13:20 +01:00
Bartosz Taudul
062058315e Update manual. 2019-02-17 17:10:18 +01:00
Bartosz Taudul
a66f1e2135 Update NEWS. 2019-02-17 17:07:35 +01:00
Bartosz Taudul
a2819baa35 Split locks as single and multithreaded in options menu. 2019-02-17 17:06:39 +01:00
Bartosz Taudul
5cc738593f Fix drawing lock highlight. 2019-02-17 16:57:52 +01:00
Bartosz Taudul
92a7e02e73 Highlight locks hovered in the options menu. 2019-02-17 16:53:33 +01:00
Bartosz Taudul
bec27f7d60 Handle highlighting lock in fast-exit code path. 2019-02-17 16:49:18 +01:00
Bartosz Taudul
1e32821097 Move drawing lock header to a separate function. 2019-02-17 16:49:03 +01:00
Bartosz Taudul
d2a39e29e1 Update manual. 2019-02-17 16:35:10 +01:00
Bartosz Taudul
f4fc604845 Update NEWS. 2019-02-17 16:23:07 +01:00
Bartosz Taudul
ea4f4ebb3a Highlight selected/hovered lock. 2019-02-17 16:20:56 +01:00
Bartosz Taudul
d4fb6fde2b Fix printf type. 2019-02-17 00:29:01 +01:00
Bartosz Taudul
4422fce55c Don't decompress GpuZone threads while saving trace.
Saving the threads compressed in GPU zones and memory events has the
following result on trace dump sizes:

043/aa.tracy (0.4.3) {10055 KB} -> 044/aa.tracy (0.4.4) {9975 KB}  99.20% size change
043/android.tracy (0.4.3) {542739 KB} -> 044/android.tracy (0.4.4) {519248 KB}  95.67% size change
043/asset-new.tracy (0.4.3) {78403 KB} -> 044/asset-new.tracy (0.4.4) {75899 KB}  96.81% size change
043/asset-new-id.tracy (0.4.3) {84341 KB} -> 044/asset-new-id.tracy (0.4.4) {81771 KB}  96.95% size change
043/asset-old.tracy (0.4.3) {80688 KB} -> 044/asset-old.tracy (0.4.4) {78410 KB}  97.18% size change
043/big.tracy (0.4.3) {939577 KB} -> 044/big.tracy (0.4.4) {938427 KB}  99.88% size change
043/callstack.tracy (0.4.3) {14557 KB} -> 044/callstack.tracy (0.4.4) {14465 KB}  99.37% size change
043/callstack-linux.tracy (0.4.3) {6949 KB} -> 044/callstack-linux.tracy (0.4.4) {6942 KB}  99.90% size change
043/crash.tracy (0.4.3) {131 KB} -> 044/crash.tracy (0.4.4) {127 KB}  97.10% size change
043/crash2.tracy (0.4.3) {1422 KB} -> 044/crash2.tracy (0.4.4) {1412 KB}  99.29% size change
043/darkrl.tracy (0.4.3) {15767 KB} -> 044/darkrl.tracy (0.4.4) {15663 KB}  99.34% size change
043/darkrl2.tracy (0.4.3) {7947 KB} -> 044/darkrl2.tracy (0.4.4) {7886 KB}  99.23% size change
043/darkrl-old.tracy (0.4.3) {67448 KB} -> 044/darkrl-old.tracy (0.4.4) {67004 KB}  99.34% size change
043/deadlock.tracy (0.4.3) {5984 KB} -> 044/deadlock.tracy (0.4.4) {5986 KB}  100.03% size change
043/gn-opengl.tracy (0.4.3) {29005 KB} -> 044/gn-opengl.tracy (0.4.4) {28885 KB}  99.59% size change
043/gn-vulkan.tracy (0.4.3) {29352 KB} -> 044/gn-vulkan.tracy (0.4.4) {29257 KB}  99.68% size change
043/long.tracy (0.4.3) {1182800 KB} -> 044/long.tracy (0.4.4) {1176584 KB}  99.47% size change
043/mem.tracy (0.4.3) {1369067 KB} -> 044/mem.tracy (0.4.4) {1262406 KB}  92.21% size change
043/multi.tracy (0.4.3) {8004 KB} -> 044/multi.tracy (0.4.4) {7944 KB}  99.24% size change
043/new.tracy (0.4.3) {1108 KB} -> 044/new.tracy (0.4.4) {1099 KB}  99.18% size change
043/q3bsp-mt.tracy (0.4.3) {949855 KB} -> 044/q3bsp-mt.tracy (0.4.4) {937574 KB}  98.71% size change
043/q3bsp-st.tracy (0.4.3) {240347 KB} -> 044/q3bsp-st.tracy (0.4.4) {230092 KB}  95.73% size change
043/selfprofile.tracy (0.4.3) {197708 KB} -> 044/selfprofile.tracy (0.4.4) {197659 KB}  99.98% size change
043/tbrowser.tracy (0.4.3) {9503 KB} -> 044/tbrowser.tracy (0.4.4) {9503 KB}  100.00% size change
043/test.tracy (0.4.3) {40700 KB} -> 044/test.tracy (0.4.4) {40699 KB}  100.00% size change
043/virtualfile_hc.tracy (0.4.3) {72424 KB} -> 044/virtualfile_hc.tracy (0.4.4) {72304 KB}  99.83% size change
043/zfile_hc.tracy (0.4.3) {39419 KB} -> 044/zfile_hc.tracy (0.4.4) {39328 KB}  99.77% size change
2019-02-17 00:29:01 +01:00
Bartosz Taudul
760e9105d0 Don't decompress memory thread data while saving trace. 2019-02-17 00:27:41 +01:00
Bartosz Taudul
9ee494c0f4 Store thread compression layout in trace dump. 2019-02-16 22:48:29 +01:00
Bartosz Taudul
d030674b83 Simplify loading memory events. 2019-02-16 22:32:14 +01:00
Bartosz Taudul
569a9fb9be Change order of file version checks during loading memory events. 2019-02-16 22:26:50 +01:00
Bartosz Taudul
88b7961421 Allocate memory for all zones at the current level at once. 2019-02-16 20:53:07 +01:00
Bartosz Taudul
470600fbc2 Don't thrash memory bandwidth during file load. 2019-02-16 20:42:50 +01:00
Bartosz Taudul
c127f51767 Load time offsets to scratch buffers. 2019-02-15 02:46:25 +01:00
Bartosz Taudul
8fd685c877 Properly track memory usage in slab allocator. 2019-02-15 02:28:31 +01:00
Bartosz Taudul
23d12d2633 Allocate new block, if we're at the end of current one. 2019-02-15 02:04:37 +01:00
Bartosz Taudul
7b023e533d Use big allocation mode for Vector's reserve_exact. 2019-02-15 01:59:33 +01:00
Bartosz Taudul
930190f2cb Support big allocations in slab allocator. 2019-02-15 01:59:33 +01:00
Bartosz Taudul
1cefd4d8ac Don't use reserve_exact for temporary things. 2019-02-15 01:43:30 +01:00
Bartosz Taudul
127be8e995 GpuEvent doesn't need init. 2019-02-15 01:31:58 +01:00
Bartosz Taudul
e8d15e8295 Mirror zone child grouping for GPU zones. 2019-02-14 01:38:34 +01:00
Bartosz Taudul
e24ac42755 Add self time to GPU zone info window. 2019-02-14 01:31:06 +01:00
Bartosz Taudul
0fad23dbae Add GPU zone self time in tooltip. 2019-02-14 01:28:27 +01:00
Bartosz Taudul
f06609eb61 GPU child zones time getter. 2019-02-14 01:28:12 +01:00
Bartosz Taudul
92c1420c30 Improve handling of post-load background jobs.
Background tasks (source location zones sorting, reconstruction of
memory plot) are now started only after trace loading is finished.
Multithreaded sorting was previously impacting trace load times.

Only one thread is used to perform both jobs, one after another.
2019-02-14 01:17:37 +01:00
Bartosz Taudul
080873003b Simplify support for 0.2 traces. 2019-02-14 01:13:11 +01:00
Bartosz Taudul
bd1c1d044b Force inline read/write time offset functions. 2019-02-14 00:17:50 +01:00
Bartosz Taudul
631f81e9dc Use Vector to store children data instead of std::vector. 2019-02-13 02:32:25 +01:00
Bartosz Taudul
40d0c72982 Use memcpy and memset instead of per-element copy and zero. 2019-02-13 02:23:56 +01:00
Bartosz Taudul
d854998856 Support non-trivially-copyable items in Vector. 2019-02-13 02:20:31 +01:00
Bartosz Taudul
08642d034b Preserve string length in string map. 2019-02-12 22:11:15 +01:00
Bartosz Taudul
17e1894034 Add specialized string key for hash map. 2019-02-12 22:11:15 +01:00
Bartosz Taudul
ec37f59c14 Replace manual comparison with memcmp. 2019-02-12 22:11:15 +01:00
Bartosz Taudul
e4e20b47ca Handle dropped connection in capture utility. 2019-02-12 11:13:53 +01:00
Bartosz Taudul
d32c070a9e Two more places where connection can silently drop. 2019-02-12 11:07:12 +01:00
Bartosz Taudul
7f11260bf0 Handle dropped connection during handshake. 2019-02-12 01:41:09 +01:00
Bartosz Taudul
8717fe5730 Window position may be negative. 2019-02-12 01:26:14 +01:00
Bartosz Taudul
e254f049a5 Update manual. 2019-02-10 17:33:39 +01:00
Bartosz Taudul
c22d7f9b62 Update NEWS. 2019-02-10 17:25:19 +01:00
Bartosz Taudul
147b31f014 Implement grouping children zones. 2019-02-10 17:21:01 +01:00
Bartosz Taudul
76186f3221 Allow zone name retrieval from source location. 2019-02-10 16:45:19 +01:00
Bartosz Taudul
48c721c4b9 Fix natvis display of exact reserved vector's capacity. 2019-02-10 16:36:09 +01:00
Bartosz Taudul
740486a0ce Add children locations grouping button. 2019-02-10 16:14:13 +01:00
Bartosz Taudul
b7bd3696b7 Do not draw time subdividers on a nanosecond scale. 2019-02-10 16:04:04 +01:00
Bartosz Taudul
c7e64bb8a8 Replace select() with poll(). 2019-02-10 15:45:23 +01:00
Bartosz Taudul
d18c3432a4 Fix call stack window. 2019-02-10 13:38:14 +01:00
Bartosz Taudul
2d50664180 Use multiply instead of divide. 2019-02-10 13:01:16 +01:00
Bartosz Taudul
f1940aab2e Use help marker helper function. 2019-02-10 03:02:54 +01:00
Bartosz Taudul
96e38501b6 Use unformatted text drawing where possible. 2019-02-10 02:50:34 +01:00
Bartosz Taudul
ecdb672130 Add simple checks against invalid window position. 2019-02-10 02:11:59 +01:00
Bartosz Taudul
3a8abdf9c1 Integer time specialization is not needed anymore. 2019-02-10 01:14:34 +01:00
Bartosz Taudul
2ad0258925 Don't print trailing zeros in fractions (e.g. 2.5 instead of 2.50). 2019-02-10 01:12:22 +01:00
Bartosz Taudul
af16872693 Don't display fractional part if it's 0. 2019-02-10 01:03:35 +01:00
Bartosz Taudul
e4f4fee6d4 Optimize printing days. 2019-02-10 01:02:57 +01:00
Bartosz Taudul
ee66b1354d IntTable10 is not needed. 2019-02-10 00:51:13 +01:00
Bartosz Taudul
d940e315bd Optimize TimeToString(). 2019-02-08 22:11:06 +01:00
Bartosz Taudul
3c4394489c Workaround GCC bug #67274.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67274
2019-02-08 11:54:29 +01:00
Bartosz Taudul
a47202b9ac Let's try building on ubuntu 1804. 2019-02-08 02:38:26 +01:00
Bartosz Taudul
0a03f25c9f Add Dedmen Miller to AUTHORS. 2019-02-08 02:37:30 +01:00
Bartosz Taudul
053932249c Style fixes. 2019-02-08 02:29:24 +01:00
Dedmen Miller (Dedmenmiller)
8fb6c0dfcb Merged in dedmenmiller/tracy/findZoneSorting (pull request #31)
Add sorting for findZone zonelist
2019-02-08 00:53:48 +00:00
Dedmen Miller
ab0dc0da11 Use memcpy 2019-02-07 16:10:28 +01:00
Dedmen Miller (Dedmenmiller)
bfdba8c2a5 Merged in dedmenmiller/tracy/cleanerTimeToString (pull request #32)
Cleaner TimeToString
2019-02-07 14:13:52 +00:00
Dedmen Miller
59ae188a7f Cleanup 2019-02-07 14:51:34 +01:00
Dedmen Miller
7361d696c5 Return proper buf 2019-02-07 14:38:42 +01:00
Dedmen Miller
bfa5386bbe Cleanup 2019-02-07 14:36:33 +01:00
Dedmen Miller
e4ef491fdf Cleaner TimeToString 2019-02-07 13:14:52 +01:00
Dedmen Miller
92c872dfc0 Added sorting for findZone zonelist 2019-02-07 12:25:05 +01:00
Bartosz Taudul
0e6350d95e Grouping by function names is a more sane default. 2019-02-06 23:09:38 +01:00
Bartosz Taudul
90c1428aac Update manual. 2019-02-06 23:05:58 +01:00
Bartosz Taudul
f18aa9c33f Update NEWS. 2019-02-06 22:39:15 +01:00
Bartosz Taudul
b945f83169 Don't separate inclusive/exclusive counts.
There is no way for one frame to have both. Coloring is preserved and is
now determined by presence of children.
2019-02-06 22:36:21 +01:00
Bartosz Taudul
1953a1a1d5 Notify user about pitfalls of function name grouping. 2019-02-06 22:02:59 +01:00
Bartosz Taudul
70ea9e7712 Implement grouping call stack tree by function names. 2019-02-06 21:56:49 +01:00
Bartosz Taudul
044b7e1522 Add function name grouping controls. 2019-02-06 21:45:26 +01:00
Bartosz Taudul
7aa24864bf Make it easier to add new matches against tracy own stack frames. 2019-02-06 21:07:41 +01:00
Bartosz Taudul
104415ced8 Display base frame, not inline frame, if inlines are not shown. 2019-02-06 14:17:18 +01:00
Bartosz Taudul
bb4d390bc7 Update manual. 2019-02-06 14:03:54 +01:00
Bartosz Taudul
bb8002ec08 Update NEWS. 2019-02-06 13:54:23 +01:00
Bartosz Taudul
c2e9c00a38 Add top-down call stack memory tree. 2019-02-06 13:53:14 +01:00
Bartosz Taudul
c689a494da Move call stack paths calculation to a separate function. 2019-02-06 13:46:50 +01:00
Bartosz Taudul
dbf8115771 Same for linux. 2019-02-04 02:33:03 +01:00
Bartosz Taudul
4dc05195ca Skip internal call stack capture inline frames for MSVC. 2019-02-04 02:27:13 +01:00
Bartosz Taudul
9dd869a5eb Fix call stacks on cygwin. 2019-02-02 13:58:17 +01:00
Bartosz Taudul
e801943b90 Array index is changing here. 2019-01-31 18:37:59 +01:00
Bartosz Taudul
52a7f3a39a Update manual. 2019-01-30 01:56:31 +01:00
Bartosz Taudul
b2c57151a6 Update NEWS. 2019-01-30 01:54:52 +01:00
Bartosz Taudul
b0d319890b Allow sorting find zone groups by mean time per call. 2019-01-30 01:54:18 +01:00
Bartosz Taudul
c5fd347401 Initialize variable. 2019-01-29 23:18:36 +01:00
Bartosz Taudul
89ddfd0006 Remove dead code. 2019-01-29 23:18:36 +01:00
Bartosz Taudul
653caf159f Assign return value only once. 2019-01-29 22:21:01 +01:00
Bartosz Taudul
852fe03cbc More references. 2019-01-29 22:10:14 +01:00
Bartosz Taudul
5e3390894d Use preincrementation for iterators. 2019-01-29 22:01:47 +01:00
Bartosz Taudul
d6c616848c Use reference instead of repeated deep dereferences. 2019-01-29 21:59:52 +01:00
Bartosz Taudul
1585be7ff3 Rearrange Socket to reduce struct size. 2019-01-29 21:56:10 +01:00
Bartosz Taudul
b7fd0bdc9c Use proper type. 2019-01-29 21:53:56 +01:00
Bartosz Taudul
1b3f10148d Fix logic snafu. 2019-01-29 21:46:14 +01:00
Bartosz Taudul
a708bebbfd Use language neutral header for callstack capability detection.
This fixes call stack collection in C API when TRACY_CALLSTACK is
defined.
2019-01-27 13:41:32 +01:00
Bartosz Taudul
16b398ffeb Update manual. 2019-01-27 00:22:25 +01:00
Bartosz Taudul
01bddf95a6 Trace inline function calls on MSVC call stacks. 2019-01-26 23:50:58 +01:00
Bartosz Taudul
d86e36cc62 Fix progress of loading CPU zones. 2019-01-26 22:18:07 +01:00
Bartosz Taudul
39680ad315 Boost lock loading time. 2019-01-24 22:44:09 +01:00
Bartosz Taudul
606a4502e0 Fix MSVC build.
"SIGINT is not supported for any Win32 application."
2019-01-24 20:04:08 +01:00
Bartosz Taudul
901d690d55 Update manual. 2019-01-24 19:27:14 +01:00
Bartosz Taudul
f83df89a58 Update NEWS. 2019-01-24 19:14:10 +01:00
Bartosz Taudul
34c9cf512e Disconnect on ^C in capture utility. 2019-01-24 19:13:09 +01:00
Bartosz Taudul
66a5e06803 Allow disconnecting from a client. 2019-01-24 19:00:34 +01:00
Bartosz Taudul
922993cbbb Add placeholder worker disconnect command. 2019-01-24 18:51:55 +01:00
Dedmen Miller (Dedmenmiller)
f474669ab5 Merged in dedmenmiller/tracy-1/dedmenmiller/fixed-offset-in-histogram-with-nonlog-ti-1548339368610 (pull request #30)
Fixed offset in histogram with non-log time
2019-01-24 15:03:22 +00:00
Dedmen Miller (Dedmenmiller)
e83e63caa4 Fix other lines 2019-01-24 15:02:36 +00:00
Dedmen Miller (Dedmenmiller)
72966a24a3 Fixed offset in histogram with non-log time 2019-01-24 14:16:23 +00:00
Bartosz Taudul
c67d91c6ac Display numerical thread id in thread tooltip. 2019-01-23 18:15:19 +01:00
Bartosz Taudul
71f1a0b31e Display self time percentage in find zone menu. 2019-01-23 18:11:47 +01:00
Bartosz Taudul
56b530e99c Fix tooltip active area. 2019-01-23 18:04:31 +01:00
Bartosz Taudul
5019b2e507 Update manual. 2019-01-23 14:31:02 +01:00
Bartosz Taudul
0dc323a74e Update NEWS. 2019-01-23 14:26:31 +01:00
Bartosz Taudul
7f015b1b24 Implement self time in find zone menu. 2019-01-23 14:25:45 +01:00
Bartosz Taudul
92766430d9 Add "self time" checkbox to find zone menu. 2019-01-23 14:25:28 +01:00
Bartosz Taudul
42af2d14cc Calculate self min and max times of source location zones. 2019-01-23 14:24:22 +01:00
Bartosz Taudul
118fab1561 Fast version of zone child time getter.
This one can only be used when all child zones are properly ended.
2019-01-23 13:59:14 +01:00
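The distinction between the fast and the general child time getter can be sketched as below. This is an illustration only, not the profiler's actual code: the `Zone` struct, the convention that a negative `end` marks an unfinished zone, and the function names `ChildTimeFast`, `ChildTime` and `SelfTimeFast` are all assumptions made for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical zone record used only for this sketch.
struct Zone
{
    int64_t start;
    int64_t end;                  // assumption: negative means "not yet ended"
    std::vector<Zone> children;
};

// Fast variant: valid only when every child zone has a proper end time.
int64_t ChildTimeFast( const Zone& zone )
{
    int64_t sum = 0;
    for( const auto& c : zone.children ) sum += c.end - c.start;
    return sum;
}

// General variant: unfinished children are clamped to a caller-provided time.
int64_t ChildTime( const Zone& zone, int64_t now )
{
    int64_t sum = 0;
    for( const auto& c : zone.children )
    {
        const int64_t end = c.end < 0 ? now : c.end;
        sum += end - c.start;
    }
    return sum;
}

// Self time is the zone's own duration minus the time spent in its children.
int64_t SelfTimeFast( const Zone& zone )
{
    return ( zone.end - zone.start ) - ChildTimeFast( zone );
}
```

The fast variant skips the per-child check, which is why it may only be used once all child zones are known to have ended.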
Bartosz Taudul
3d2cc2d54d Display zone self time. 2019-01-23 13:44:11 +01:00
Bartosz Taudul
06292f1a3f Add zone child time getter. 2019-01-23 13:39:44 +01:00
Bartosz Taudul
ef17699887 Fix order of inline and base subframes. 2019-01-21 17:12:01 +01:00
Bartosz Taudul
c4f755e77b Update manual. 2019-01-20 19:44:07 +01:00
Bartosz Taudul
07ff01f4dd Update NEWS. 2019-01-20 19:35:25 +01:00
Bartosz Taudul
49b0a3500d Enable tracing inline functions in callstacks. 2019-01-20 19:33:37 +01:00
Bartosz Taudul
ddad475c19 Make it possible to store multiple frames at single frame address. 2019-01-20 19:11:48 +01:00
Bartosz Taudul
b9dc9f043c Make nohash operator() const. 2019-01-20 18:41:26 +01:00
Bartosz Taudul
bf7cc0a0d5 Add missing header for PRIxMAX. 2019-01-20 17:17:09 +01:00
Bartosz Taudul
481024e4ce Add libbacktrace to list of libraries in manual. 2019-01-20 17:08:56 +01:00
Bartosz Taudul
ed69b04cee Update NEWS. 2019-01-20 16:57:31 +01:00
Bartosz Taudul
9e7714c45a Decode callstack frames using libbacktrace. 2019-01-20 16:55:59 +01:00
Bartosz Taudul
79a5a860a5 Compile with libbacktrace on linux. 2019-01-20 16:55:33 +01:00
Bartosz Taudul
420a50feea Function inlining test. 2019-01-20 16:55:09 +01:00
Bartosz Taudul
0ce3dfaba7 Add modified libbacktrace.
https://github.com/ianlancetaylor/libbacktrace
5a99ff7fed66b8ea8f09c9805c138524a7035ece
2019-01-20 16:53:45 +01:00
Bartosz Taudul
d4e9baa0d9 Display time savings also as time percentage. 2019-01-20 03:16:32 +01:00
Rokas K. (rku)
31bbdfe2f2 Merged in rokups/tracy/mingw-support (pull request #26)
MingW support
2019-01-20 00:44:44 +00:00
Bartosz Taudul
f6edbccfc8 Fix triangle rendering. 2019-01-19 14:22:45 +01:00
Bartosz Taudul
ac791fd19f Update imgui to 1.67. Also update imguicolortextedit. 2019-01-19 14:05:54 +01:00
Rokas Kupstys
36c76456f7 Fix mistakes from MingW support commit. 2019-01-19 15:03:43 +02:00
Rokas Kupstys
8157e3a0b3 Fix builds with MingW. 2019-01-19 13:53:10 +02:00
Bartosz Taudul
32f0a27d3b Update manual. 2019-01-16 02:10:21 +01:00
Bartosz Taudul
92f3a4bba0 Add ZoneText and ZoneName to the C API. 2019-01-16 02:10:21 +01:00
Bartosz Taudul
49e270d8a6 Detect zone end without begin failure. 2019-01-16 00:45:48 +01:00
Bartosz Taudul
7b6b6862ed Add histogram, compare screenshots to README. 2019-01-15 19:55:41 +01:00
Bartosz Taudul
6a84520126 C is also supported. 2019-01-15 19:50:41 +01:00
Bartosz Taudul
c10e6051f2 Update manual. 2019-01-15 19:50:24 +01:00
Bartosz Taudul
b72d30af80 Allow disabling zone verification. 2019-01-15 18:59:05 +01:00
Bartosz Taudul
708fdfea49 Track memory alloc+free matching failures. 2019-01-15 18:56:26 +01:00
Bartosz Taudul
ecf9a299de Check for proper number of failure reasons. 2019-01-15 18:56:17 +01:00
Bartosz Taudul
76ab70a948 Simplify failure detection code. 2019-01-15 18:55:47 +01:00
Bartosz Taudul
3e3ee0ec2f There may be no source location associated with failure. 2019-01-15 18:54:41 +01:00
Bartosz Taudul
9944a73444 Store failure reason strings in Worker. 2019-01-15 18:42:15 +01:00
Bartosz Taudul
3cd97138fc Capture utility also displays failure messages. 2019-01-14 23:52:38 +01:00
Bartosz Taudul
5a3856dff0 Update NEWS. 2019-01-14 23:45:23 +01:00
Bartosz Taudul
57decf5875 Display failure information. 2019-01-14 23:42:58 +01:00
Bartosz Taudul
ac6e7439e2 TODO: track memory allocation tracking failures. 2019-01-14 23:26:32 +01:00
Bartosz Taudul
c3246ca3b5 Gracefully store failure states. 2019-01-14 23:22:31 +01:00
Bartosz Taudul
4dc339c933 Close connection when zone validation fails. 2019-01-14 23:12:11 +01:00
Bartosz Taudul
c3b67e4482 Perform zone stack validation. 2019-01-14 23:08:34 +01:00
Bartosz Taudul
dcc6bee607 Process zone validation messages. 2019-01-14 22:56:10 +01:00
Bartosz Taudul
8e52ab318b Send zone validation messages.
This is only performed for C API, as C++ scoped zones are always
properly ordered, due to RAII. With manual submission of zone begin and
end events there's no such guarantee.
2019-01-14 22:36:54 +01:00
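The ordering check described above, needed because manually submitted begin and end events carry no RAII guarantee, can be sketched as a per-thread stack of zone ids. This is a minimal illustration under assumed names (`ZoneStackValidator`, `Begin`, `End`), not the validation code used by the profiler.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical validator: tracks the ids of currently open zones on one thread.
class ZoneStackValidator
{
public:
    // Record a zone begin event.
    void Begin( uint32_t id ) { m_stack.push_back( id ); }

    // Returns false when the end event does not match the most recently
    // begun zone, or when there is no open zone at all.
    bool End( uint32_t id )
    {
        if( m_stack.empty() || m_stack.back() != id ) return false;
        m_stack.pop_back();
        return true;
    }

private:
    std::vector<uint32_t> m_stack;
};
```

A failed `End()` corresponds to the situation in which the server would report a failure and close the connection.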
Bartosz Taudul
970108fbbf Track zone id for verification purposes. 2019-01-14 22:36:54 +01:00
Bartosz Taudul
a2fd09d938 Add zone validation queue item. 2019-01-14 22:36:54 +01:00
Bartosz Taudul
1a8518dcc2 Allow filtering zones in on-demand mode. 2019-01-14 22:36:54 +01:00
Bartosz Taudul
1f0d1fdfdc C API prototype. 2019-01-14 21:07:29 +01:00
Bartosz Taudul
73cbd7dc3a Add deadlock test. 2019-01-14 18:48:16 +01:00
Bartosz Taudul
a5736a9c1b Change crash visuals in options menu. 2019-01-14 18:48:16 +01:00
Bartosz Taudul
12bd93ca5b Update manual. 2019-01-14 13:16:00 +01:00
Bartosz Taudul
a95e8a5424 Hide internals behind TracyVkCtx typedef. 2019-01-14 12:40:54 +01:00
Bartosz Taudul
070888f80d Make it possible to have multiple vulkan contexts.
API change!
2019-01-10 17:11:17 +01:00
Bartosz Taudul
ae288c6a6a Move vulkan macros to the end of TracyVulkan.hpp. 2019-01-10 16:41:04 +01:00
Bartosz Taudul
da8b01357d Proper skipping of locks in 0.4.1+ (fixes compare menu). 2019-01-08 17:19:04 +01:00
Bartosz Taudul
cb50cf9de6 Last time is stored in worker. 2019-01-08 15:44:29 +01:00
Bartosz Taudul
9c6d037859 Another unneeded capture. 2019-01-06 21:15:49 +01:00
Bartosz Taudul
096022a718 Proper string printing. 2019-01-06 21:15:26 +01:00
Bartosz Taudul
d1beb12dc3 Remove unused variable. 2019-01-06 21:14:02 +01:00
Bartosz Taudul
13a0ddfe03 No need to perform capture here. 2019-01-06 21:11:36 +01:00
Bartosz Taudul
fbe8eb3585 Fix initialization of atomics. 2019-01-06 21:09:56 +01:00
Bartosz Taudul
6a1c552c61 Reduce zone loading time. 2019-01-06 20:49:37 +01:00
Bartosz Taudul
d6953d5e73 Update NEWS. 2019-01-06 19:20:39 +01:00
Bartosz Taudul
dabdf1360f Display trace loading time. 2019-01-06 19:20:17 +01:00
Bartosz Taudul
77c9a8c407 Add support for notification text in View. 2019-01-06 19:14:24 +01:00
Bartosz Taudul
980c54e349 Track trace loading time. 2019-01-06 19:09:50 +01:00
Bartosz Taudul
5ac26ce084 Init common Worker variables in header. 2019-01-06 19:04:50 +01:00
Bartosz Taudul
a313ed4720 Track separate time offset for GPU times.
This is the second version of the 0.4.2 dump file format. The previous 0.4.2
format cannot be read anymore.

041/aa.tracy (0.4.1) {18987 KB} -> 042/aa.tracy (0.4.2) {10051 KB}  52.94% size change
041/android.tracy (0.4.1) {696753 KB} -> 042/android.tracy (0.4.2) {542738 KB}  77.90% size change
041/asset-new.tracy (0.4.1) {97163 KB} -> 042/asset-new.tracy (0.4.2) {78402 KB}  80.69% size change
041/asset-new-id.tracy (0.4.1) {105683 KB} -> 042/asset-new-id.tracy (0.4.2) {84341 KB}  79.81% size change
041/asset-old.tracy (0.4.1) {100205 KB} -> 042/asset-old.tracy (0.4.2) {80688 KB}  80.52% size change
041/big.tracy (0.4.1) {2246014 KB} -> 042/big.tracy (0.4.2) {939578 KB}  41.83% size change
041/crash.tracy (0.4.1) {143 KB} -> 042/crash.tracy (0.4.2) {131 KB}  91.37% size change
041/crash2.tracy (0.4.1) {3411 KB} -> 042/crash2.tracy (0.4.2) {1420 KB}  41.63% size change
041/darkrl.tracy (0.4.1) {31818 KB} -> 042/darkrl.tracy (0.4.2) {15762 KB}  49.54% size change
041/darkrl2.tracy (0.4.1) {18778 KB} -> 042/darkrl2.tracy (0.4.2) {7945 KB}  42.31% size change
041/darkrl-old.tracy (0.4.1) {151346 KB} -> 042/darkrl-old.tracy (0.4.2) {67449 KB}  44.57% size change
041/deadlock.tracy (0.4.1) {53 KB} -> 042/deadlock.tracy (0.4.2) {52 KB}  98.55% size change
041/gn-opengl.tracy (0.4.1) {45860 KB} -> 042/gn-opengl.tracy (0.4.2) {29005 KB}  63.25% size change
041/gn-vulkan.tracy (0.4.1) {45618 KB} -> 042/gn-vulkan.tracy (0.4.2) {29352 KB}  64.34% size change
041/long.tracy (0.4.1) {1583550 KB} -> 042/long.tracy (0.4.2) {1182800 KB}  74.69% size change
041/mem.tracy (0.4.1) {1243058 KB} -> 042/mem.tracy (0.4.2) {1369067 KB}  110.14% size change
041/multi.tracy (0.4.1) {14519 KB} -> 042/multi.tracy (0.4.2) {8000 KB}  55.10% size change
041/new.tracy (0.4.1) {1439 KB} -> 042/new.tracy (0.4.2) {1105 KB}  76.75% size change
041/q3bsp-mt.tracy (0.4.1) {1414323 KB} -> 042/q3bsp-mt.tracy (0.4.2) {949855 KB}  67.16% size change
041/q3bsp-st.tracy (0.4.1) {301334 KB} -> 042/q3bsp-st.tracy (0.4.2) {240347 KB}  79.76% size change
041/selfprofile.tracy (0.4.1) {399648 KB} -> 042/selfprofile.tracy (0.4.2) {197704 KB}  49.47% size change
041/tbrowser.tracy (0.4.1) {13052 KB} -> 042/tbrowser.tracy (0.4.2) {9503 KB}  72.81% size change
041/test.tracy (0.4.1) {60309 KB} -> 042/test.tracy (0.4.2) {40700 KB}  67.49% size change
041/virtualfile_hc.tracy (0.4.1) {108967 KB} -> 042/virtualfile_hc.tracy (0.4.2) {72424 KB}  66.46% size change
041/zfile_hc.tracy (0.4.1) {58814 KB} -> 042/zfile_hc.tracy (0.4.2) {39418 KB}  67.02% size change
2019-01-03 21:52:43 +01:00
Bartosz Taudul
d49b005900 Display dump file size change in the update utility. 2018-12-30 23:47:43 +01:00
Bartosz Taudul
f8ef5b726a Store time deltas, instead of absolute time in trace dumps.
This change greatly reduces the size of saved dumps, but increases the
cost of processing during loading. One notable outlier in the dataset
below is mem.tracy, which increased in size, even though changes in the
memory dump saving scheme decrease the size of the other traces.

041/aa.tracy (0.4.1) {18987 KB} -> 042/aa.tracy (0.4.2) {10140 KB}  53.40% size change
041/android.tracy (0.4.1) {696753 KB} -> 042/android.tracy (0.4.2) {542738 KB}  77.90% size change
041/asset-new.tracy (0.4.1) {97163 KB} -> 042/asset-new.tracy (0.4.2) {78402 KB}  80.69% size change
041/asset-new-id.tracy (0.4.1) {105683 KB} -> 042/asset-new-id.tracy (0.4.2) {84341 KB}  79.81% size change
041/asset-old.tracy (0.4.1) {100205 KB} -> 042/asset-old.tracy (0.4.2) {80688 KB}  80.52% size change
041/big.tracy (0.4.1) {2246014 KB} -> 042/big.tracy (0.4.2) {943083 KB}  41.99% size change
041/crash.tracy (0.4.1) {143 KB} -> 042/crash.tracy (0.4.2) {131 KB}  91.39% size change
041/crash2.tracy (0.4.1) {3411 KB} -> 042/crash2.tracy (0.4.2) {1425 KB}  41.80% size change
041/darkrl.tracy (0.4.1) {31818 KB} -> 042/darkrl.tracy (0.4.2) {15897 KB}  49.96% size change
041/darkrl2.tracy (0.4.1) {18778 KB} -> 042/darkrl2.tracy (0.4.2) {8002 KB}  42.62% size change
041/darkrl-old.tracy (0.4.1) {151346 KB} -> 042/darkrl-old.tracy (0.4.2) {67945 KB}  44.89% size change
041/deadlock.tracy (0.4.1) {53 KB} -> 042/deadlock.tracy (0.4.2) {52 KB}  98.55% size change
041/gn-opengl.tracy (0.4.1) {45860 KB} -> 042/gn-opengl.tracy (0.4.2) {30983 KB}  67.56% size change
041/gn-vulkan.tracy (0.4.1) {45618 KB} -> 042/gn-vulkan.tracy (0.4.2) {31349 KB}  68.72% size change
041/long.tracy (0.4.1) {1583550 KB} -> 042/long.tracy (0.4.2) {1225316 KB}  77.38% size change
041/mem.tracy (0.4.1) {1243058 KB} -> 042/mem.tracy (0.4.2) {1369291 KB}  110.15% size change
041/multi.tracy (0.4.1) {14519 KB} -> 042/multi.tracy (0.4.2) {8110 KB}  55.86% size change
041/new.tracy (0.4.1) {1439 KB} -> 042/new.tracy (0.4.2) {1108 KB}  77.01% size change
041/q3bsp-mt.tracy (0.4.1) {1414323 KB} -> 042/q3bsp-mt.tracy (0.4.2) {949855 KB}  67.16% size change
041/q3bsp-st.tracy (0.4.1) {301334 KB} -> 042/q3bsp-st.tracy (0.4.2) {240347 KB}  79.76% size change
041/selfprofile.tracy (0.4.1) {399648 KB} -> 042/selfprofile.tracy (0.4.2) {197713 KB}  49.47% size change
041/tbrowser.tracy (0.4.1) {13052 KB} -> 042/tbrowser.tracy (0.4.2) {9503 KB}  72.81% size change
041/test.tracy (0.4.1) {60309 KB} -> 042/test.tracy (0.4.2) {40700 KB}  67.49% size change
041/virtualfile_hc.tracy (0.4.1) {108967 KB} -> 042/virtualfile_hc.tracy (0.4.2) {72839 KB}  66.85% size change
041/zfile_hc.tracy (0.4.1) {58814 KB} -> 042/zfile_hc.tracy (0.4.2) {39608 KB}  67.35% size change
2018-12-30 23:42:17 +01:00
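For illustration, the delta scheme described in the commit above can be sketched as follows. This is a minimal sketch under stated assumptions, not Tracy's actual save/load code: `EncodeDeltas` and `DecodeDeltas` are hypothetical names, and timestamps are assumed to be `int64_t` values. Decoding needs a running sum over the whole sequence, which is consistent with the extra loading cost the commit message mentions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: replace each absolute timestamp with its
// difference from the previous one (the first is relative to 0).
std::vector<int64_t> EncodeDeltas( const std::vector<int64_t>& absolute )
{
    std::vector<int64_t> deltas;
    deltas.reserve( absolute.size() );
    int64_t prev = 0;
    for( int64_t t : absolute )
    {
        deltas.push_back( t - prev );
        prev = t;
    }
    return deltas;
}

// Hypothetical helper: rebuild absolute timestamps with a running sum.
std::vector<int64_t> DecodeDeltas( const std::vector<int64_t>& deltas )
{
    std::vector<int64_t> absolute;
    absolute.reserve( deltas.size() );
    int64_t acc = 0;
    for( int64_t d : deltas )
    {
        acc += d;
        absolute.push_back( acc );
    }
    return absolute;
}
```

Timestamps that are close together turn into small numbers, which a general-purpose compressor can usually pack more tightly than large absolute values.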
Bartosz Taudul
59ed5775d9 Release v0.4.1. 2018-12-30 17:57:55 +01:00
Bartosz Taudul
6c9337563d Update year in copyright notice. 2018-12-30 17:51:17 +01:00
Bartosz Taudul
370eda557c Manual improvements. 2018-12-30 17:50:52 +01:00
Bartosz Taudul
5cbe2c6ae5 Reorder tracy_lz4.cpp vs TracyProfiler.cpp in TracyClient.cpp.
This fixes a deprecation warning in tracy_lz4.hpp, which was previously
present due to TracyProfiler.cpp including tracy_lz4.hpp before an
appropriate deprecation restraining macro was defined in tracy_lz4.cpp.
Note that this issue was only present if TracyClient.cpp was used to
include the profiler in a project. Including the profiler as a
collection of separate source files worked correctly, as the deprecated
function is only used by tracy_lz4.cpp.
2018-12-29 01:00:14 +01:00
Bartosz Taudul
b1ba2f9bf7 Fix extern "C" initialization. 2018-12-29 01:00:14 +01:00
Bartosz Taudul
1733961885 Proper printf type for DWORDLONG on cygwin. 2018-12-29 01:00:14 +01:00
Bartosz Taudul
ee718f18d9 Cygwin headers provide their own FORCEINLINE macro. 2018-12-29 01:00:14 +01:00
Bartosz Taudul
0a6c6606bf Don't use MSVC pragmas on gcc/clang (cygwin). 2018-12-29 01:00:14 +01:00
Miguel Fernandez
5e4b5850af Merged in Muitxer/tracy (pull request #28)
Moved NoMinMax before windows.h
2018-12-24 19:06:51 +00:00
Miguel Fernandez
baa870fa8c Moved NoMinMax before windows.h 2018-12-24 18:50:52 +00:00
Miguel Fernandez
1a50a15212 Merged in Muitxer/tracy (pull request #27)
Avoid conflicts with min/max macros
2018-12-24 18:50:19 +00:00
Miguel Fernandez
7c164375a4 Moved NoMinMax inside _MSC_VER 2018-12-24 18:49:53 +00:00
Miguel Fernandez
51bdb004f9 Avoid conflicts with min/max macros 2018-12-24 15:26:50 +00:00
Bartosz Taudul
d80bd2693c Update manual. 2018-12-22 17:46:30 +01:00
Bartosz Taudul
0ac83a27cc Update NEWS. 2018-12-22 17:41:29 +01:00
Bartosz Taudul
ea396354d0 ^F opens find zone menu and focuses on the input box. 2018-12-22 17:39:22 +01:00
Bartosz Taudul
2d143ce516 Add support for handling keyboard shortcuts. 2018-12-22 17:36:20 +01:00
Bartosz Taudul
4bb4a568ca Move initialization of View values to header. 2018-12-22 17:22:26 +01:00
Bartosz Taudul
e1bd5c092b Pressing enter key when entering client address automatically connects. 2018-12-22 17:14:22 +01:00
Bartosz Taudul
cd8d86edf3 Allow hiding "[unknown frames]" entries. 2018-12-21 21:10:29 +01:00
Bartosz Taudul
e9ce8fdfda Flush queues when opening listen socket fails. 2018-12-21 18:14:30 +01:00
Bartosz Taudul
a4be9b51b0 Use common queue clearing function. 2018-12-21 18:12:26 +01:00
Bartosz Taudul
331693d7f1 Use proper pattern for acquiring serial lock.
This fixes a potential hang during crash handling. Also, lock duration
is reduced.
2018-12-21 18:11:09 +01:00
Bartosz Taudul
4893bca12b Update manual. 2018-12-20 17:13:18 +01:00
Bartosz Taudul
2f45fd8b36 Update NEWS. 2018-12-20 17:08:24 +01:00
Bartosz Taudul
6fefffe8a5 Implement automated connection to a given IP address. 2018-12-20 17:07:15 +01:00
Bartosz Taudul
8c5670489c Freeing nullptr is valid. 2018-12-20 17:03:09 +01:00
Bartosz Taudul
407fb61a30 Display maximum number of waiting threads for a lock. 2018-12-19 18:34:53 +01:00
Bartosz Taudul
0f2b61cf24 Display wait and hold times of locks. 2018-12-19 18:28:48 +01:00
Bartosz Taudul
1d70e5e5c3 Document the maximum number of threads supported by locks. 2018-12-18 18:24:27 +01:00
Bartosz Taudul
621f7c891e Document right click on lock label in options menu. 2018-12-18 18:16:24 +01:00
Bartosz Taudul
30c2d0df85 Add lock information window section. 2018-12-18 18:14:26 +01:00
Bartosz Taudul
4e64ba7775 Document lock events interaction. 2018-12-18 17:56:02 +01:00
Bartosz Taudul
83958db840 Add information about collapsing labels. 2018-12-18 17:55:54 +01:00
Bartosz Taudul
70ec4b71e4 Move zone interaction out of view navigation section. 2018-12-18 17:45:18 +01:00
Bartosz Taudul
ac9dbfbc79 Add a note about source button highlight. 2018-12-18 17:28:00 +01:00
Bartosz Taudul
57b4f874cc Menu bar buttons are now toggles. 2018-12-18 17:24:19 +01:00
Bartosz Taudul
bd71190a4c Add separate section describing collapsed items. 2018-12-18 17:21:29 +01:00
Bartosz Taudul
0898873a02 Update NEWS. 2018-12-18 16:58:29 +01:00
Bartosz Taudul
df1a125fc0 Mirror find zone menu changes in compare menu. 2018-12-18 16:56:19 +01:00
Bartosz Taudul
a220f38fbd Add support for matching source locations ignoring case. 2018-12-18 16:52:29 +01:00
Bartosz Taudul
acddcbd9bf Add case-ignoring string matcher. 2018-12-18 16:52:05 +01:00
Bartosz Taudul
24235406a0 Enter key in find zone menu acts the same as pressing "find". 2018-12-18 16:40:23 +01:00
Bartosz Taudul
7fc03736f2 Add "ignore case" toggle to find zone menu. 2018-12-18 16:38:55 +01:00
Bartosz Taudul
a740074da6 Color tweaks. 2018-12-18 16:30:13 +01:00
Bartosz Taudul
b60d5b892a Unify coloring of highlighted buttons. 2018-12-18 16:30:13 +01:00
Bartosz Taudul
c2485fbcb0 Add visual notification of an active toggle. 2018-12-18 16:30:13 +01:00
Bartosz Taudul
9e18db01c9 Menu bar buttons are now toggles. 2018-12-18 16:30:13 +01:00
Rokas K. (rku)
85fbfeccf0 Merged in rokups/tracy/fix-macos-android-builds (pull request #25)
Fix MacOS/android builds
2018-12-18 15:20:44 +00:00
Rokas Kupstys
a931b9eaf1 HOST_NAME_MAX and LOGIN_NAME_MAX availability is not consistent across linux/android/macos platforms. However, all of them do have versions of these macros with the _POSIX_ prefix.
In addition, the hostname and user variables may be left uninitialized in some configurations, yet they are always used. Initializing these arrays fixes a "conditional depends on uninitialized memory" warning uncovered by valgrind.
2018-12-18 17:19:03 +02:00
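The two points made in the commit above (using the `_POSIX_`-prefixed limit and zero-initializing the buffer) can be sketched like this. It is an assumption-laden illustration for POSIX systems only, not the code from the patch: `GetHostName` is a hypothetical helper name and the `"(unknown)"` fallback is invented for the example.

```cpp
#include <limits.h>   // _POSIX_HOST_NAME_MAX
#include <unistd.h>   // gethostname
#include <string>

// Hypothetical helper: query the host name into a buffer that is sized
// with the _POSIX_ macro and zero-initialized, so it is never read
// uninitialized even if the call fails or does not terminate the string.
std::string GetHostName()
{
    char buf[_POSIX_HOST_NAME_MAX + 1] = {};
    if( gethostname( buf, sizeof( buf ) - 1 ) != 0 )
    {
        return "(unknown)";   // fallback value is an assumption of this sketch
    }
    return buf;
}
```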
Bartosz Taudul
083320820f OSX doesn't define HOST_NAME_MAX and LOGIN_NAME_MAX.
Fix based on patch from Jack Skalski.
2018-12-17 15:11:59 +01:00
Bartosz Taudul
a7e615d42e Cosmetics. 2018-12-17 15:09:10 +01:00
Bartosz Taudul
79eb6a5836 Right click on lock in options to open info window. 2018-12-16 21:14:15 +01:00
Bartosz Taudul
aac8d85e6d Update NEWS. 2018-12-16 21:09:37 +01:00
Bartosz Taudul
9a7689c65d Display lock announce, terminate and life time. 2018-12-16 21:09:37 +01:00
Bartosz Taudul
7376ec65b0 Store lock announce and terminate time in trace dump. 2018-12-16 21:09:37 +01:00
Bartosz Taudul
9360df89b1 Store announce and terminate time of locks. 2018-12-16 21:07:26 +01:00
Bartosz Taudul
f42d52923a No-op processing of lock terminate events. 2018-12-16 20:46:33 +01:00
Bartosz Taudul
0b816ce0b7 Add lock termination event. 2018-12-16 20:46:33 +01:00
Bartosz Taudul
61ac0b8afc Send lock creation time. 2018-12-16 20:33:18 +01:00
Bartosz Taudul
91171a6674 Draw zig-zag pattern over collapsed locks. 2018-12-16 20:20:27 +01:00
Bartosz Taudul
abad5574f3 Middle click on lock event to zoom to it. 2018-12-16 20:04:45 +01:00
Bartosz Taudul
8f6f54e412 Clicking on a lock event also opens lock info window. 2018-12-16 20:01:40 +01:00
Bartosz Taudul
444d5e20f0 Add basic lock info window. 2018-12-16 19:58:11 +01:00
Bartosz Taudul
ff8c9ab6dc Properly terminate source file data. 2018-12-16 19:48:34 +01:00
Bartosz Taudul
80bd4275eb Document TRACY_CALLSTACK macro. 2018-12-13 15:42:53 +01:00
Bartosz Taudul
a54fab05e8 Update NEWS. 2018-12-13 14:44:20 +01:00
Bartosz Taudul
537cee911c Allow forcing call stack capture. 2018-12-13 14:43:37 +01:00
Bartosz Taudul
3c1231f5eb Update manual. 2018-11-25 19:36:17 +01:00
Bartosz Taudul
a9a2ca66ca Update NEWS. 2018-11-25 19:31:26 +01:00
Bartosz Taudul
1235a5aa0a Allow discarding active trace. 2018-11-25 19:31:26 +01:00
Bartosz Taudul
fec0017bb6 Add third state (stopped) to the pause/resume button. 2018-11-25 19:15:16 +01:00
Bartosz Taudul
f19b559f6e InitOnceExecuteOnce requires targeting Windows Vista.
Cygwin fix.
2018-11-25 19:03:17 +01:00
Bartosz Taudul
d8a9d6d3bf Update imguicolortextedit.
339d5ef00edcfb849c1281bcf176113199828522
2018-10-30 22:52:57 +01:00
Bartosz Taudul
6f9977638d Update AUTHORS list. 2018-10-29 17:53:28 +01:00
Sherief Farouk
853eec451f Merged in sherief/tracy-rpmalloc-bugfix (pull request #24)
Fix for using Tracy with multithreaded NT loader in Windows 10 RS5 (Issue #26).
2018-10-29 10:08:20 +00:00
Sherief Farouk
591f04ad0f Renamed preprocessor #define for consistency. 2018-10-28 22:41:08 -07:00
Sherief Farouk
5110d55f17 Fix for using Tracy with multithreaded NT loader in Windows 10 RS5 (Issue #26) [Take 2]. 2018-10-28 18:55:55 -07:00
Sherief Farouk
27447902ef Fix for using Tracy with multithreaded NT loader in Windows 10 RS5 (Issue #26). 2018-10-27 18:13:59 -07:00
Bartosz Taudul
d8bcb32951 Add freetype to libraries list. 2018-10-27 20:08:50 +02:00
Bartosz Taudul
dc4c8ef343 Statically link with freetype. 2018-10-27 20:06:54 +02:00
Bartosz Taudul
6f9b4aeac9 Document the connection history drop-down. 2018-10-23 20:12:36 +02:00
Bartosz Taudul
78beb7bd81 Allow removal of addresses from connection history. 2018-10-23 19:59:11 +02:00
Bartosz Taudul
a234510ef3 Update NEWS. 2018-10-23 19:50:09 +02:00
Bartosz Taudul
2af941eadc Allow selection from the 5 most commonly used addresses. 2018-10-23 19:50:09 +02:00
Bartosz Taudul
1eb46042be Track connection history. 2018-10-23 19:50:09 +02:00
Bartosz Taudul
63f0dd72a5 Allow cancelling pending connection. 2018-10-23 19:49:57 +02:00
Bartosz Taudul
c3ba314700 Update profiler screenshot. 2018-10-21 18:10:35 +02:00
Bartosz Taudul
b20b169a88 Document "go to frame" functionality. 2018-10-21 17:51:02 +02:00
Bartosz Taudul
54baec9e7e Fix drawing last collapsed non-contiguous frame. 2018-10-21 17:46:24 +02:00
Bartosz Taudul
56190c9614 Update NEWS. 2018-10-21 17:46:21 +02:00
Bartosz Taudul
556b3e8efe Add "go to frame" functionality. 2018-10-21 17:36:27 +02:00
Bartosz Taudul
793e955480 Fix crash when loading a trace with unresolved strings.
Unresolved strings ("???") are not saved, but the internal string
pointers are saved. Resolving such string pointers caused a crash.
2018-10-21 16:38:20 +02:00
Bartosz Taudul
9342ba0e71 Don't track last frames in offline mode. 2018-10-21 16:03:21 +02:00
Bartosz Taudul
2165881efc Document timeline view frame set changing. 2018-10-21 15:53:21 +02:00
Bartosz Taudul
759a4ac908 Update NEWS. 2018-10-21 15:46:53 +02:00
Bartosz Taudul
5280d6586b Switching active frame set by clicking on a frame. 2018-10-21 15:46:02 +02:00
Bartosz Taudul
ec6e2dc0e1 Also update date in NEWS. 2018-10-09 19:40:18 +02:00
Bartosz Taudul
fae5e15048 Add link to new features in v0.4. 2018-10-09 19:36:46 +02:00
Bartosz Taudul
5f92b08b0d Bump version to 0.4.0. 2018-10-09 19:29:06 +02:00
Bartosz Taudul
ec50d16076 Add new feature videos to tutorial button. 2018-10-09 19:28:24 +02:00
Bartosz Taudul
9a7bee7280 Update manual. 2018-10-09 16:00:55 +02:00
Bartosz Taudul
df57ba3b5b Update NEWS. 2018-10-09 15:55:16 +02:00
Bartosz Taudul
05c9325018 Highlight zones selected in the find zone menu. 2018-10-09 15:54:28 +02:00
Bartosz Taudul
4ca4c85976 Fix an edge case in zone drawing.
If the last zone on a track had not ended and lay in the view's past
(beyond the left edge of the view), it was still included in the
calculation of track height.
2018-10-06 12:58:38 +02:00
Bartosz Taudul
299d1d7a73 Update docs. 2018-10-05 23:47:45 +02:00
Bartosz Taudul
75ab9147d0 Reduce amount of information in "menu" bar.
Zone count, queue delay and timer resolution were moved to the trace
information window.

Time span and View span are now displayed as icons.
2018-10-05 23:02:23 +02:00
Bartosz Taudul
d6c26293e3 Update docs. 2018-10-05 22:40:08 +02:00
Bartosz Taudul
9e94dcd320 Fix zoom-to-allocation not working on selected allocations. 2018-10-05 21:13:31 +02:00
Bartosz Taudul
1a8b184d10 Mute inactive frame sets. 2018-10-05 21:10:37 +02:00
Bartosz Taudul
286a6cfe0a Move check out of loop. 2018-10-05 21:03:04 +02:00
Bartosz Taudul
0d8b79f6c9 Don't miss frame separators. 2018-10-05 20:59:00 +02:00
Bartosz Taudul
81cf024498 Highlight message marker even if it's collapsed. 2018-10-05 20:40:10 +02:00
Bartosz Taudul
28c176d3aa Fix loss of window size and position, if it was maximized. 2018-10-05 20:23:54 +02:00
Bartosz Taudul
3b241baa52 Add list of used libraries. 2018-10-03 15:51:00 +02:00
Bartosz Taudul
06a1c9c752 Update manual. 2018-09-28 12:20:14 +02:00
Bartosz Taudul
c33f60662f Update NEWS. 2018-09-28 11:46:22 +02:00
Bartosz Taudul
b7d2a690d9 Zoom to allocation range when middle clicking on address. 2018-09-28 11:43:45 +02:00
Bartosz Taudul
4960e691d4 Added ability to zoom to allocation range in allocation window. 2018-09-28 11:40:22 +02:00
Bartosz Taudul
560163a031 Add allocations list window entry to the user manual. 2018-09-27 23:35:38 +02:00
Bartosz Taudul
55d66bd31e Update NEWS. 2018-09-27 23:20:21 +02:00
Bartosz Taudul
428b7da1cc The underlying vector might be reallocated. 2018-09-27 23:19:20 +02:00
Bartosz Taudul
6cfd53b274 Add allocations list window. 2018-09-27 23:19:20 +02:00
Bartosz Taudul
01e0bbb5f9 Build list of allocations in a given call stack tree entry. 2018-09-27 23:19:20 +02:00
Bartosz Taudul
9301986bae Collect callstacks for each entry in call stack tree. 2018-09-27 22:56:44 +02:00
Bartosz Taudul
44fae53583 Display lock source location in tooltip. 2018-09-18 16:29:02 +02:00
Bartosz Taudul
cf459c6498 Update NEWS. 2018-09-18 16:25:09 +02:00
Bartosz Taudul
b6e9905155 Display time span during capture in capture utility. 2018-09-18 16:24:32 +02:00
Bartosz Taudul
920f9ac7aa More compact network statistics display. 2018-09-18 16:20:41 +02:00
Bartosz Taudul
d3370856ee Update NEWS. 2018-09-18 16:15:50 +02:00
Bartosz Taudul
06eec51ed9 Display locks source locations in options locks list. 2018-09-18 16:14:32 +02:00
Bartosz Taudul
42360f50b0 Update lz4 to 1.8.3. 2018-09-13 01:52:16 +02:00
Bartosz Taudul
5ddfda9170 Initialize capacity in vector, as it's checked by asserts. 2018-09-10 11:28:12 +02:00
Bartosz Taudul
ecb8bfc53a Update manual. 2018-09-09 19:57:46 +02:00
Bartosz Taudul
59a31a8222 Update NEWS. 2018-09-09 19:46:56 +02:00
Bartosz Taudul
6be66d7a3c Fix on-demand mode. 2018-09-09 19:44:41 +02:00
Bartosz Taudul
9211ce42da Non-on-demand client is only able to handle one connection. 2018-09-09 19:42:06 +02:00
Bartosz Taudul
984a711666 Send protocol version to verify handshake. 2018-09-09 19:28:53 +02:00
Bartosz Taudul
db1d7d2c92 Free socket after disconnection. 2018-09-09 18:31:06 +02:00
Bartosz Taudul
270072b09e Require shibboleth match at start of connection. 2018-09-09 18:26:53 +02:00
Bartosz Taudul
f55f2858c4 Raw socket read function. 2018-09-09 18:24:58 +02:00
Bartosz Taudul
806c8de463 Only one outgoing server connection is supported. 2018-09-09 17:47:20 +02:00
Bartosz Taudul
2ae2399f31 Add a section about variable shadowing. 2018-09-08 21:10:41 +02:00
Bartosz Taudul
cbfdbcbcd2 Add more information about zone filtering. 2018-09-08 20:59:54 +02:00
Bartosz Taudul
9eb04ea817 Update to ImGui 1.65. 2018-09-08 20:31:38 +02:00
Bartosz Taudul
9d4bdf0c28 Update NEWS. 2018-09-08 20:25:44 +02:00
Bartosz Taudul
4471329661 Display time savings in the compare traces menu. 2018-09-08 20:23:49 +02:00
Bartosz Taudul
b4c25e56c1 Add a section on timer accuracy to the manual. 2018-09-08 20:19:15 +02:00
Bartosz Taudul
f540038d0d Remove spacing from source code. 2018-09-08 20:16:19 +02:00
Bartosz Taudul
19f3c5f5ff Ignore frames with 0 time. 2018-09-08 19:04:38 +02:00
Bartosz Taudul
2c43e1337f Fast log10 is no longer needed. 2018-09-08 19:01:51 +02:00
Bartosz Taudul
560fc34ae6 No pthread_setcancelstate on Android. 2018-09-07 17:21:44 +02:00
Bartosz Taudul
d7d4c26990 Add example of overloading operator new and delete. 2018-09-07 01:51:38 +02:00
Bartosz Taudul
2f9d0aa9eb Use improved algorithm in compare trace histogram. 2018-09-03 21:26:50 +02:00
Bartosz Taudul
384a42cc47 Display average and median times in compare traces. 2018-09-03 20:45:51 +02:00
Bartosz Taudul
29d649216e In compare traces put both total times in the same line. 2018-09-03 20:39:34 +02:00
Bartosz Taudul
9fb26b3622 If there's no group selected, dim group selection legend. 2018-09-03 20:36:40 +02:00
Bartosz Taudul
fc40c7bbf6 Calculate compare traces average, median. 2018-09-03 20:34:07 +02:00
Bartosz Taudul
b485aad2a3 Cosmetics. 2018-09-03 20:34:00 +02:00
Bartosz Taudul
e8b4f71f4a Properly initialize sortedNum in find zone. 2018-09-03 20:21:28 +02:00
Bartosz Taudul
a02121d78a Allow disabling average, median markers on frame set histogram. 2018-09-02 13:37:36 +02:00
Bartosz Taudul
0b0fa919d3 Find zone groups are now by default sorted by count. 2018-09-02 13:34:00 +02:00
Bartosz Taudul
d1f9fff7e3 Add information about median, average time checkboxes. 2018-09-02 13:33:31 +02:00
Bartosz Taudul
fb013c0df5 Properly reset state when switching matched source locations. 2018-09-02 13:25:17 +02:00
Bartosz Taudul
ea779905a3 Add crash handling section reference. 2018-09-02 13:13:53 +02:00
Bartosz Taudul
1581e2f20d "Index area" is a name. 2018-09-02 13:12:41 +02:00
Bartosz Taudul
f42219e7f6 There's source file view ability in find zone window. 2018-09-02 13:09:32 +02:00
Bartosz Taudul
08729c2b42 Allow disabling average and median time markers. 2018-09-02 13:06:09 +02:00
Bartosz Taudul
1bff8a7997 Draw group average and median times on histogram. 2018-09-02 13:00:21 +02:00
Bartosz Taudul
c3c48117d4 Display group average and median times. 2018-09-02 13:00:21 +02:00
Bartosz Taudul
2d3ce1bf25 Calculate group average and median times. 2018-09-02 13:00:21 +02:00
Bartosz Taudul
5733b420a1 Use the same algorithm for selection group binning. 2018-09-02 03:46:16 +02:00
Bartosz Taudul
c1630936d4 Use the improved method in find zone histogram. 2018-09-02 02:58:15 +02:00
Bartosz Taudul
854210a7e3 Fix find zone histogram selection start/end. 2018-09-02 02:09:29 +02:00
Bartosz Taudul
8152e213f8 Collapse separate find zone histogram paths into one. 2018-09-02 01:31:09 +02:00
Bartosz Taudul
bb9da36f7e Display compare traces histogram in manual. 2018-09-02 00:57:44 +02:00
Bartosz Taudul
6869f2e9d9 Update user manual. 2018-09-02 00:47:17 +02:00
Bartosz Taudul
fb45d15f5c Update NEWS. 2018-09-02 00:38:47 +02:00
Bartosz Taudul
f43b875b83 Display average and median zone time in find zone histogram. 2018-09-02 00:28:57 +02:00
Bartosz Taudul
f66ed00d71 Calculate sorted zone times for find zone histogram. 2018-09-02 00:19:15 +02:00
Bartosz Taudul
e81218ddaf Radically improve frame set histogram performance.
This change exploits the fact that frame set data is sorted, and the
histogram bins can be calculated as distances in the frame-time vectors.
2018-09-01 14:50:38 +02:00
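The idea in the commit above, counting histogram bins as distances between positions in sorted data, can be sketched as below. This is only a minimal illustration assuming linear bins over `[tmin, tmax]`; `BinSorted` is a hypothetical name, and the real histogram code may differ (for example in how bin edges are chosen).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper: count how many values of an already sorted vector
// fall into each of numBins equal-width bins spanning [tmin, tmax].
// Each bin edge is located with a binary search, so no per-element
// bin computation is needed.
std::vector<std::size_t> BinSorted( const std::vector<int64_t>& sorted,
                                    int64_t tmin, int64_t tmax, std::size_t numBins )
{
    std::vector<std::size_t> bins( numBins, 0 );
    if( numBins == 0 || tmax <= tmin ) return bins;

    auto begin = std::lower_bound( sorted.begin(), sorted.end(), tmin );
    for( std::size_t i = 0; i < numBins; i++ )
    {
        // Upper edge of bin i; the last bin also includes tmax itself.
        const int64_t edge = tmin + ( tmax - tmin ) * int64_t( i + 1 ) / int64_t( numBins );
        auto end = ( i + 1 == numBins )
            ? std::upper_bound( begin, sorted.end(), tmax )
            : std::lower_bound( begin, sorted.end(), edge );
        bins[i] = std::size_t( std::distance( begin, end ) );
        begin = end;
    }
    return bins;
}
```

With this approach the cost depends on the number of bins and the logarithm of the data size rather than on the number of samples.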
Bartosz Taudul
1bef4b45b7 Display continuous/discontinuous info about frame sets. 2018-09-01 14:04:23 +02:00
Bartosz Taudul
213b33a4fa No need to check for zero value in a sorted set. 2018-09-01 13:55:25 +02:00
Bartosz Taudul
0c086e3a30 In-place merge new frames instead of re-sorting the whole set. 2018-09-01 13:34:02 +02:00
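As a hedged illustration of the in-place merge mentioned in the commit above: when the existing data is already sorted, only the newly appended tail has to be sorted before merging. `AppendSorted` is a hypothetical name and the sketch uses the standard `std::inplace_merge`; whether the actual change uses this exact call is not stated in the log.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: add new values to an already sorted vector
// without re-sorting the part that was sorted before.
void AppendSorted( std::vector<int64_t>& sorted, std::vector<int64_t> fresh )
{
    const auto mid = static_cast<std::ptrdiff_t>( sorted.size() );
    std::sort( fresh.begin(), fresh.end() );                  // sort only the new part
    sorted.insert( sorted.end(), fresh.begin(), fresh.end() );
    std::inplace_merge( sorted.begin(), sorted.begin() + mid, sorted.end() );
}
```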
Bartosz Taudul
9f4d6692dc Proper way to get full frame count. 2018-09-01 12:38:12 +02:00
Bartosz Taudul
faea036c16 Ignore last, probably unfinished frame. 2018-09-01 02:07:52 +02:00
Bartosz Taudul
c8a0bfd9be Merge average and median lines, if they overlap. 2018-09-01 01:48:00 +02:00
Bartosz Taudul
8211eb1371 Display FPS ranges. 2018-09-01 01:48:00 +02:00
Bartosz Taudul
0825c40938 Display average and median frame times. 2018-09-01 01:48:00 +02:00
Bartosz Taudul
8df82278a5 Display frame set time as a percentage of profile time. 2018-09-01 01:48:00 +02:00
Bartosz Taudul
98b5363ebc Add frame set histogram. 2018-09-01 01:48:00 +02:00
Bartosz Taudul
9b8a0a8364 Display total frame set time. 2018-09-01 01:48:00 +02:00
Bartosz Taudul
27a2d8595d Time is int64_t. 2018-09-01 01:48:00 +02:00
Bartosz Taudul
cb47ac6165 Actually mark the data as used. 2018-09-01 01:01:41 +02:00
Bartosz Taudul
907da3265d Fix string handling. 2018-08-31 20:08:04 +02:00
Bartosz Taudul
0f72461c3e Fix for GLFW 3.1. 2018-08-31 20:06:38 +02:00
Bartosz Taudul
009c877982 Update NEWS. 2018-08-31 19:44:17 +02:00
Bartosz Taudul
00fb98ed64 Only show window when it's ready. 2018-08-31 19:40:08 +02:00
Bartosz Taudul
a9dd70251b Save window position, size, maximized state. 2018-08-31 19:38:19 +02:00
Bartosz Taudul
bc886e4287 Save path is not persistent. 2018-08-31 19:38:05 +02:00
Bartosz Taudul
9da3364c77 Display non-rounded FPS in a tooltip. 2018-08-31 18:58:39 +02:00
Bartosz Taudul
230ee71368 Do not recalculate frame stats, if frame data didn't change. 2018-08-31 18:51:00 +02:00
Bartosz Taudul
4ee8e7c372 Also display frames per second for average and median frame times. 2018-08-31 18:43:25 +02:00
Bartosz Taudul
fca71e6e0d Update to imgui 1.64. 2018-08-31 18:37:29 +02:00
Bartosz Taudul
0d6d296e94 Display average and median frame times. 2018-08-31 15:32:30 +02:00
Bartosz Taudul
d977fa004d Enable keyboard navigation. 2018-08-30 02:08:08 +02:00
Bartosz Taudul
f1e4d949a0 Update bindings. 2018-08-30 02:01:12 +02:00
Bartosz Taudul
d287250e25 Also update imgui_freetype. 2018-08-30 01:01:23 +02:00
Bartosz Taudul
1875abed91 Fix obsolete enum. 2018-08-30 00:57:01 +02:00
Bartosz Taudul
38a2951e97 Update to ImGui 1.63. 2018-08-30 00:55:19 +02:00
Bartosz Taudul
0f5ee69668 Add missing include. 2018-08-29 23:25:42 +02:00
Bartosz Taudul
411f956db2 Save imgui.ini in a common location. 2018-08-29 23:22:54 +02:00
Bartosz Taudul
204cc019ea Add file storage helpers. 2018-08-29 23:22:44 +02:00
Bartosz Taudul
81655816f0 Display captured program name and capture time. 2018-08-29 01:02:29 +02:00
Bartosz Taudul
8f1acf2571 Store explicit program name and capture time. 2018-08-29 01:02:29 +02:00
Bartosz Taudul
128577d7bf Remove parenthesis from TracyVkDestroy macro. 2018-08-28 16:45:05 +02:00
Bartosz Taudul
dd0556fa0f Update NEWS. 2018-08-28 01:49:59 +02:00
Bartosz Taudul
bc6a553a3a Fetch thread names in memory events. 2018-08-28 01:48:19 +02:00
Bartosz Taudul
0b568d55ba Add thread that only allocates memory. 2018-08-28 01:48:03 +02:00
Bartosz Taudul
00da3ba6eb SEGV_{BND,PKU}ERR might not be defined. 2018-08-27 14:45:07 +02:00
Bartosz Taudul
8ab3409266 Crash handling works on android. 2018-08-27 14:08:54 +02:00
Bartosz Taudul
2ebe9b72d1 There's no getlogin_t() on android. 2018-08-27 13:59:19 +02:00
Bartosz Taudul
c0d140b405 Add a note about external trace compression. 2018-08-26 20:47:08 +02:00
Bartosz Taudul
989c28d1fe Describe high compression mode. 2018-08-26 17:06:14 +02:00
Bartosz Taudul
a5b99b54c8 Allow specifying FileWrite compression level.
Note that extreme compression level is not exposed in the update
utility.

% time update.exe long.tracy out.tracy
long.tracy (0.3.201) -> out.tracy (0.3.204)
update.exe long.tracy   0,00s user 0,00s system 0% cpu 13,464 total
% time update.exe --hc long.tracy outhc.tracy
long.tracy (0.3.201) -> outhc.tracy (0.3.204)
update.exe --hc long.trac  0,00s user 0,00s system 0% cpu 3:46,23 total
% ls -l long.tracy out*
-rw-r--r-- 1 wolf Brak 1621546031 07-30 22:51 long.tracy
-rw-r--r-- 1 wolf Brak 1621579467 08-26 16:44 out.tracy
-rw-r--r-- 1 wolf Brak 1397610127 08-26 16:48 outhc.tracy
2018-08-26 16:49:27 +02:00
Bartosz Taudul
39fd3b3a6f Add optional high compression mode to update utility. 2018-08-26 16:28:46 +02:00
Bartosz Taudul
b3b12f76f3 Add LZ4HC support to FileWrite. 2018-08-26 16:25:43 +02:00
Bartosz Taudul
0f0528ca3d Add LZ4HC. 2018-08-26 16:23:34 +02:00
Bartosz Taudul
9c4909d22f Describe source file window. 2018-08-25 17:31:52 +02:00
Bartosz Taudul
31003690ed Describe call stack window. 2018-08-25 17:16:05 +02:00
Bartosz Taudul
4910a43a24 Rearrange zones, locks, plots. 2018-08-25 17:02:49 +02:00
Bartosz Taudul
2d7b18aa37 Describe zone info window. 2018-08-25 16:55:49 +02:00
Bartosz Taudul
40fc1edd2a Describe trace information window. 2018-08-25 16:17:06 +02:00
Bartosz Taudul
b851072919 Describe memory window. 2018-08-25 16:10:24 +02:00
Bartosz Taudul
d835d4da2a Remove display of found allocations count. 2018-08-25 15:07:23 +02:00
Bartosz Taudul
256d905ed5 Hide memory address search in "allocations" section. 2018-08-25 15:05:22 +02:00
Bartosz Taudul
0beee3f803 Describe: options, messages, find zone, compare traces, statistics. 2018-08-24 20:07:21 +02:00
Bartosz Taudul
2148f7c352 Forgot about the options button. 2018-08-24 17:24:31 +02:00
Bartosz Taudul
5b0fdadf78 Use full mouse button descriptions. Slight reword. 2018-08-24 17:22:12 +02:00
Bartosz Taudul
aa2a4da311 Describe navigating the view. 2018-08-23 21:01:18 +02:00
Bartosz Taudul
77b48ccbd6 Add zones, locks, plots display description. 2018-08-23 20:45:33 +02:00
Bartosz Taudul
c20d86eab1 Adjust frame selection box. 2018-08-23 18:16:58 +02:00
Bartosz Taudul
f8406111a3 Add frame sets description. 2018-08-23 18:14:01 +02:00
Bartosz Taudul
2a6f366414 Add time scale description. 2018-08-23 17:20:47 +02:00
Bartosz Taudul
ae9b385260 Frame time graph description. 2018-08-23 16:35:34 +02:00
Bartosz Taudul
dc9928c3c7 Add mouse button icons. 2018-08-23 15:48:01 +02:00
Bartosz Taudul
4e6d3ee412 Put icons in welcome dialog buttons. 2018-08-23 14:57:22 +02:00
Bartosz Taudul
8be8846892 Control menu description. 2018-08-23 14:56:42 +02:00
Bartosz Taudul
2a9e6e06af Add main profiler window to the manual. 2018-08-22 19:13:08 +02:00
Bartosz Taudul
5000e37155 Use fontawesome5 package. 2018-08-22 18:58:15 +02:00
Bartosz Taudul
c178cd3d16 Add sketches of welcome dialog and connection window. 2018-08-22 18:30:17 +02:00
Bartosz Taudul
a1a9f6d610 Fix printf types. 2018-08-22 16:31:09 +02:00
Bartosz Taudul
6e3909825f Explicitly cast size_t to uint32_t. 2018-08-22 16:30:37 +02:00
Bartosz Taudul
8b3895473d Gag inconsequential MSVC warnings in TracySocket.
Fix your API!
2018-08-22 16:29:15 +02:00
Bartosz Taudul
d3b4a9fb69 Be more elaborate about server integration. 2018-08-21 19:56:13 +02:00
Bartosz Taudul
3ad3e7c5aa Document crash handling. 2018-08-21 19:56:03 +02:00
Bartosz Taudul
e2dc1f391f Add client-server illustration. 2018-08-21 19:20:06 +02:00
Bartosz Taudul
befce97384 Update NEWS. 2018-08-21 17:57:24 +02:00
Bartosz Taudul
6ad184447a Call stack window may now display frame addresses. 2018-08-21 17:55:59 +02:00
Bartosz Taudul
7df12652b1 General improvements to the user manual. 2018-08-21 17:39:41 +02:00
Bartosz Taudul
8a78fcd2f9 Cut off Linux stack trace at sigreturn. 2018-08-21 01:53:00 +02:00
Bartosz Taudul
22346feea3 Fun fact: two threads can crash at the same time. 2018-08-21 01:45:33 +02:00
Bartosz Taudul
47943d6a86 Use proper type. 2018-08-21 01:24:00 +02:00
Bartosz Taudul
facb05f8cb Don't mark FastVector element as used until it's ready.
This should prevent a race condition that would result in an invalid last
element of the queue, in case a frozen thread had already obtained the
queue item, but had not written to it (or had not written it fully).
2018-08-20 22:35:50 +02:00
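The ordering described in the commit above (fill the slot first, publish it afterwards) can be sketched with a tiny fixed-size container. `TinyQueue`, `prepare_next` and `commit_next` are names chosen for this sketch and are not confirmed by the log; the sketch also assumes the caller already holds whatever lock protects the queue, so it shows only the two-step ordering, not a complete thread-safe queue.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity container illustrating two-step insertion.
template<typename T, std::size_t N>
class TinyQueue
{
public:
    // Returns the next free slot without making it visible to readers,
    // or nullptr when the container is full.
    T* prepare_next() { return m_size < N ? &m_data[m_size] : nullptr; }

    // Publishes the slot returned by prepare_next() once it is fully written.
    void commit_next() { ++m_size; }

    // Number of published (fully written) elements.
    std::size_t size() const { return m_size; }

    const T& operator[]( std::size_t idx ) const { return m_data[idx]; }

private:
    T m_data[N] = {};
    std::size_t m_size = 0;
};
```

If a writer is stopped between `prepare_next()` and `commit_next()`, a reader that only looks at `size()` elements never sees the half-written slot.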
Bartosz Taudul
8c0ff67796 Cut windows crash call stack at the exception dispatcher. 2018-08-20 22:21:35 +02:00
Bartosz Taudul
d1adf9e8d6 Allow skipping functions on top of call stack.
Note that this is performance-intensive on the client and shouldn't be
used, except in special situations, like processing crashes.
2018-08-20 22:20:44 +02:00
Bartosz Taudul
b371003336 In case of manual shutdown, don't wait for lock.
All threads are frozen at this point, nothing will release it.
2018-08-20 21:49:23 +02:00
Bartosz Taudul
401ebd6f3d Use spin-lock in DequeueSerial.
A thread frozen during crash processing may hold the lock and never
release it. The old behavior would cause a deadlock in such a situation.
The new one can be modified to work. Also, we don't want to use timed mutex.
2018-08-20 21:40:13 +02:00
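A spin loop built on `try_lock()` can be given an escape hatch, which a blocking `lock()` call cannot. The following is a minimal sketch of that idea, not the code from the commit above: `SpinAcquire` and the `abandon` flag are hypothetical, and the lock type is assumed to provide a `try_lock()` member.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical helper: keep trying to take the lock, but give up as soon
// as `abandon` is set (for example because the owner will never release it).
// Returns true when the lock was acquired, false when the wait was abandoned.
template<typename Lock>
bool SpinAcquire( Lock& lock, const std::atomic<bool>& abandon )
{
    while( !lock.try_lock() )
    {
        if( abandon.load( std::memory_order_relaxed ) ) return false;
        std::this_thread::yield();   // be polite to other threads while spinning
    }
    return true;
}
```

The caller is responsible for calling `unlock()` only when the function returned true.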
Bartosz Taudul
afee61a2cf Use standard naming for try_lock() in NonRecursiveBenaphore. 2018-08-20 21:37:55 +02:00
Bartosz Taudul
def6c674b2 Add crash notification to thread tooltip. 2018-08-20 14:37:14 +02:00
Bartosz Taudul
6d45434cb5 Implement crash handler on Linux. 2018-08-20 14:30:56 +02:00
Bartosz Taudul
53aee0e03d Fix warning. 2018-08-20 12:53:14 +02:00
Bartosz Taudul
ebcdebaa69 Display crash marker on timeline. 2018-08-20 03:00:45 +02:00
Bartosz Taudul
5fa4cf6e5d Display crash information on visible threads lists. 2018-08-20 02:41:11 +02:00
Bartosz Taudul
b1227cf9fd Display crashed thread in red color. 2018-08-20 02:36:58 +02:00
Bartosz Taudul
99b7a39c52 Save/load crash information. 2018-08-20 02:27:24 +02:00
Bartosz Taudul
619fba41ab Display crash information in info window. 2018-08-20 02:23:55 +02:00
Bartosz Taudul
2a696418cd Cosmetics. 2018-08-20 02:23:55 +02:00
Bartosz Taudul
3b526b074e Send crash report. 2018-08-20 02:23:55 +02:00
Bartosz Taudul
49e36c013f Only handle selected subset of exceptions. 2018-08-20 02:06:59 +02:00
Bartosz Taudul
b56a33add1 Update NEWS. 2018-08-20 01:09:11 +02:00
Bartosz Taudul
0258f4a7b4 Handle crashes on windows.
When a crash happens, put all threads (bar the profiler and crash
handling ones) into the freezer, send crash notification message,
request profiler shutdown and when it does, terminate process.

The list of ignored exceptions is sorta-kinda random at the moment and
may need further expansion.
2018-08-20 01:07:33 +02:00
Bartosz Taudul
366ea35593 Allow crash event reporting.
When a crash happens there's no longer anything to profile -- don't wait
for unfinished zones to finish before sending client terminate
confirmation.
2018-08-20 01:03:16 +02:00
Bartosz Taudul
ca939ccd19 Allow external profiler shutdown requests. 2018-08-20 01:02:27 +02:00
Bartosz Taudul
9650162cda Update NEWS. 2018-08-19 22:24:28 +02:00
Bartosz Taudul
aefa2a9573 Display dialog when CPU doesn't support AVX/AVX2. 2018-08-19 22:20:54 +02:00
Bartosz Taudul
7fc1729f3b Reduce required instruction set to SSE2 in winmain.cpp. 2018-08-19 22:20:54 +02:00
Bartosz Taudul
ddf889e8bc Move WinMain entry point to a separate source file. 2018-08-19 22:20:54 +02:00
Bartosz Taudul
d63b5431bf Discover linux kernel version. 2018-08-19 19:00:01 +02:00
Bartosz Taudul
f55b99ba7e Fix signed/unsigned. 2018-08-19 18:53:32 +02:00
Bartosz Taudul
e9170c862e System RAM discovery on Linux. 2018-08-19 18:52:04 +02:00
Bartosz Taudul
790a3ae26f Perform windows version discovery. 2018-08-19 18:43:26 +02:00
Bartosz Taudul
66c839b557 Update NEWS. 2018-08-19 18:29:39 +02:00
Bartosz Taudul
e0a4b9c56a Save/load host info. 2018-08-19 18:28:48 +02:00
Bartosz Taudul
71bfd15d9e Display host info. 2018-08-19 18:24:43 +02:00
Bartosz Taudul
203d9b4b85 Store host info. 2018-08-19 18:21:56 +02:00
Bartosz Taudul
bd76f4cd10 Send host info in welcome message. 2018-08-19 18:19:12 +02:00
Bartosz Taudul
9c0e6620b3 Host info discovery. 2018-08-19 18:15:46 +02:00
Bartosz Taudul
716166bc3a Expose InitWinSock(). 2018-08-19 18:15:46 +02:00
Bartosz Taudul
6224daf9c9 Greatly simplify call stack tree calculation.
Instead of caching paths, compute accumulated cost of each path and only
then create the tree, going through each path just once.
2018-08-19 16:34:26 +02:00
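One possible reading of the two-pass approach described above, as a sketch only. Sample, TreeNode and BuildTree are hypothetical names, and frames are reduced to plain integer ids. The first pass sums the cost of identical paths; the second pass walks each unique path exactly once while creating tree nodes.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using Path = std::vector<uint64_t>;   // frame ids, root first

struct Sample { Path path; uint64_t cost; };

struct TreeNode
{
    uint64_t frame;
    uint64_t cost;
    std::vector<TreeNode> children;
};

// Find the child with the given frame id on this level, creating it if needed.
inline TreeNode* GetChild( std::vector<TreeNode>& level, uint64_t frame )
{
    for( auto& n : level ) if( n.frame == frame ) return &n;
    level.push_back( TreeNode { frame, 0, {} } );
    return &level.back();
}

inline std::vector<TreeNode> BuildTree( const std::vector<Sample>& samples )
{
    // Pass 1: accumulate the total cost of each distinct path.
    std::map<Path, uint64_t> pathCost;
    for( const auto& s : samples ) pathCost[s.path] += s.cost;

    // Pass 2: insert each distinct path into the tree once.
    std::vector<TreeNode> roots;
    for( const auto& kv : pathCost )
    {
        auto* level = &roots;
        for( auto frame : kv.first )
        {
            TreeNode* node = GetChild( *level, frame );
            node->cost += kv.second;
            level = &node->children;
        }
    }
    return roots;
}
```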
Bartosz Taudul
e1821e439a Add icons to error pop-ups. 2018-08-19 02:59:12 +02:00
Bartosz Taudul
2313e6b845 Add chapter on capturing the data to the manual. 2018-08-18 22:29:26 +02:00
Bartosz Taudul
cb51fdec20 Fix multiple file name retrievals in a row. 2018-08-18 20:38:14 +02:00
Bartosz Taudul
42636cfe89 Allow viewing source files from within find zone menu. 2018-08-18 20:35:25 +02:00
Bartosz Taudul
0dbd58c038 Don't change mouse cursor over ImGuiColorTextEdit window. 2018-08-18 20:13:02 +02:00
Bartosz Taudul
2445cc72bc Add icons to memory menu. 2018-08-18 20:10:14 +02:00
Bartosz Taudul
235da1eded Use focused text for source locations count. 2018-08-18 20:01:57 +02:00
Bartosz Taudul
4060a59b4f Call stack tree nodes that have no siblings are expanded by default. 2018-08-18 20:00:24 +02:00
Bartosz Taudul
145949968e Update NEWS. 2018-08-18 19:58:09 +02:00
Bartosz Taudul
a4df805746 Allow filtering messages by thread. 2018-08-18 19:57:36 +02:00
Bartosz Taudul
59293b1850 Enable support for restrict time in call stack tree. 2018-08-18 19:44:29 +02:00
Bartosz Taudul
1410ba6f01 Increase readability. 2018-08-18 19:34:17 +02:00
Bartosz Taudul
79c437ba7f Let's not search in a map. 2018-08-18 19:29:04 +02:00
Bartosz Taudul
bd96c2ce51 Cache call stack tree paths. 2018-08-18 19:13:46 +02:00
Bartosz Taudul
7f0fb851b4 Force inline GetFrameTreeItem(). 2018-08-18 18:46:16 +02:00
Bartosz Taudul
7ef6944246 Remove compare menu visual aids if extended font is not available. 2018-08-18 16:32:26 +02:00
Bartosz Taudul
410616f7f8 Allow viewing source from zone trace. 2018-08-18 14:26:10 +02:00
Bartosz Taudul
fb876344e3 No need for indentVal outside of scope. 2018-08-18 14:14:33 +02:00
Bartosz Taudul
a548bcb470 Update NEWS. 2018-08-18 14:04:11 +02:00
Bartosz Taudul
b9e83871a8 Add visual aids to compare menu. 2018-08-18 14:02:20 +02:00
Bartosz Taudul
2852784f55 Separate global and level indices for call stack tree. 2018-08-18 02:23:55 +02:00
Bartosz Taudul
9cd6d12095 Update NEWS. 2018-08-18 02:13:53 +02:00
Bartosz Taudul
0757930521 Only display "go to parent" if there is a parent. 2018-08-18 02:12:34 +02:00
Bartosz Taudul
91c397f25c Fix nuget. 2018-08-18 01:46:39 +02:00
Bartosz Taudul
69dd0b72c1 Just accept const char ptr in ImGuiColorTextEdit. 2018-08-18 01:16:15 +02:00
Bartosz Taudul
2c7d457755 Highlight source buttons, if source file is displayed. 2018-08-18 00:28:36 +02:00
Bartosz Taudul
e8da52324d Add memory icon to memory usage plot name. 2018-08-18 00:24:23 +02:00
Bartosz Taudul
07952f0a1f Add icons to options menu. 2018-08-18 00:21:01 +02:00
Bartosz Taudul
8db30a9016 Add icon to statistics menu. 2018-08-18 00:09:23 +02:00
Bartosz Taudul
816c91922e Add icons to memory menu. 2018-08-17 23:58:52 +02:00
Bartosz Taudul
441a5e257c Add wifi icon to "waiting for connection" window. 2018-08-17 23:56:06 +02:00
Bartosz Taudul
b613a60c88 Add icons to compare menu. 2018-08-17 23:54:40 +02:00
Bartosz Taudul
4c228fe862 Add icons to find zone menu. 2018-08-17 23:52:03 +02:00
Bartosz Taudul
350fb6a5b0 Add icons to zone info window buttons. 2018-08-17 23:47:01 +02:00
Bartosz Taudul
ddbc7d5ac2 Add hourglass icon to loading pop-up. 2018-08-17 23:29:32 +02:00
Bartosz Taudul
cdee1d4ce4 Remove obsolete frame rounding setting. 2018-08-17 23:27:14 +02:00
Bartosz Taudul
0aebf614db Add icons to pause/resume button. 2018-08-17 23:24:25 +02:00
Bartosz Taudul
940dda8fc1 Use helper header for icons. 2018-08-17 23:22:13 +02:00
Bartosz Taudul
bb6b646d6e Add icons to user manual, homepage and tutorial buttons. 2018-08-17 23:08:07 +02:00
Bartosz Taudul
e1e0e6e140 Centered text helper. 2018-08-17 23:07:58 +02:00
Bartosz Taudul
4c393a2b8d Allow opening source files from within call stack tree. 2018-08-17 22:51:26 +02:00
Bartosz Taudul
4e23ce9a24 Shared index for all call stack tree nodes. 2018-08-17 22:31:55 +02:00
Bartosz Taudul
17273c5987 Update NEWS. 2018-08-17 22:28:18 +02:00
Bartosz Taudul
07d2aaa1ad Play a little animation when source file cannot be opened. 2018-08-17 22:23:16 +02:00
Bartosz Taudul
841f18885e Add simple animation controller. 2018-08-17 22:23:04 +02:00
Bartosz Taudul
12f2080387 Right click on call stack file name to view source. 2018-08-17 22:06:59 +02:00
Bartosz Taudul
5752156695 Use "call stack" instead of "callstack". 2018-08-17 22:00:35 +02:00
Bartosz Taudul
12ae3524a1 Update NEWS. 2018-08-17 21:40:59 +02:00
Bartosz Taudul
914a1713e3 Use freetype to render fonts. 2018-08-17 21:40:15 +02:00
Bartosz Taudul
e6ab7692c8 Use Cousine-Regular as monospaced font. 2018-08-17 20:57:26 +02:00
Bartosz Taudul
fe37c4ab80 Scale scrollbar with DPI. 2018-08-17 19:06:30 +02:00
Bartosz Taudul
8eee82b0ce Scale cosmetic UI elements with DPI. 2018-08-17 19:06:14 +02:00
Bartosz Taudul
0de3e088d9 Use icons in main profiler window buttons. 2018-08-17 19:03:35 +02:00
Bartosz Taudul
7cfcdee053 Slightly reduce icons size. 2018-08-17 18:45:04 +02:00
Bartosz Taudul
92284b65e7 Use save file icon. 2018-08-17 18:36:06 +02:00
Bartosz Taudul
9b1af05472 Use power off icon. 2018-08-17 18:33:56 +02:00
Bartosz Taudul
97eb88a839 Use icons in initial connection window. 2018-08-17 18:28:21 +02:00
Bartosz Taudul
4de04a2df2 Document that extended font now includes icons. 2018-08-17 18:08:47 +02:00
Bartosz Taudul
5125c2487b Use exclamation icon instead of ascii representation. 2018-08-17 17:57:13 +02:00
Bartosz Taudul
3d0b9da592 Merge font awesome into the default font. 2018-08-17 17:56:55 +02:00
Bartosz Taudul
337111f948 Add Font Awesome Solid 5.2.0. 2018-08-17 17:56:33 +02:00
Bartosz Taudul
7f6454d550 Setup window title setter callback. 2018-08-17 17:24:50 +02:00
Bartosz Taudul
2b3490e6f7 Handle window title setter callback in View. 2018-08-17 17:24:18 +02:00
Bartosz Taudul
b24d9fa044 Declare OpenWebpage() static. 2018-08-17 17:19:06 +02:00
Bartosz Taudul
b7ac41ab1b Make the warning signs stand out more. 2018-08-17 17:08:16 +02:00
Bartosz Taudul
d31ba3284e Update NEWS. 2018-08-17 17:05:32 +02:00
Bartosz Taudul
a6584ad3d3 Document TRACY_ROOT_WINDOW macro. 2018-08-17 17:04:15 +02:00
Bartosz Taudul
b76707ffa1 Render main profiler view in whole window. 2018-08-17 17:00:56 +02:00
Bartosz Taudul
df7db3bd2b Notify profiler about root window size. 2018-08-17 16:54:56 +02:00
Bartosz Taudul
9416f5bb49 Add close button to loaded traces (not the window close one). 2018-08-17 16:34:58 +02:00
Bartosz Taudul
f0b6c5b447 Update NEWS. 2018-08-17 15:39:09 +02:00
Bartosz Taudul
facae0b9e1 Draw text editor with potential source code. 2018-08-17 15:33:12 +02:00
Bartosz Taudul
77a875ff6b Always scroll source to selected line. 2018-08-17 15:28:56 +02:00
Bartosz Taudul
d45efbe640 Don't reload source file, if it's already there. 2018-08-17 15:24:52 +02:00
Bartosz Taudul
5dc3d73ad6 Set cursor on proper line. 2018-08-17 15:21:37 +02:00
Bartosz Taudul
c503e3760c Load fixed width font (the default ImGui one for now). 2018-08-17 15:18:37 +02:00
Bartosz Taudul
5cd61c4b07 Text editor needs fixed-width font. 2018-08-17 15:18:09 +02:00
Bartosz Taudul
f3cc5cfd07 Use proper font in ImGuiColorTextEdit. 2018-08-17 15:10:58 +02:00
Bartosz Taudul
5bd35eb34e Open file preview in text editor. 2018-08-17 14:54:28 +02:00
Bartosz Taudul
9dbc56beb6 Initialize text editor. 2018-08-17 14:44:41 +02:00
Bartosz Taudul
a90ed5b4b8 Wrap ImGuiColorTextEdit in tracy namespace. 2018-08-17 14:38:57 +02:00
Bartosz Taudul
1529fa5b42 Add ImGuiColorTextEdit.
https://github.com/BalazsJako/ImGuiColorTextEdit.git
1fbba2fe8da83139a39789ea4ef8ca3077143b79
2018-08-17 14:31:33 +02:00
Bartosz Taudul
ff3774d8f2 List contributors in manual. 2018-08-17 14:31:00 +02:00
Bartosz Taudul
bb3b5f6904 Add better version of tl;dr license. 2018-08-17 14:23:49 +02:00
Bartosz Taudul
68c9d09685 Include license in the manual. 2018-08-17 14:17:47 +02:00
Bartosz Taudul
6bf7b85260 Add file existence check. 2018-08-17 13:35:33 +02:00
Bartosz Taudul
27b2851291 Declare Vector moves as noexcept. 2018-08-17 13:10:27 +02:00
Bartosz Taudul
e8faafefb2 Add URL to license. 2018-08-15 13:19:49 +02:00
Bartosz Taudul
a7561f1956 Add project name to license. 2018-08-15 12:57:22 +02:00
Bartosz Taudul
ebb6e2c453 Update NEWS. 2018-08-14 18:49:13 +02:00
Bartosz Taudul
8cbb518f28 Display average allocation sizes. 2018-08-14 18:48:29 +02:00
Bartosz Taudul
df14cf5330 Implement callstack tree of memory allocations. 2018-08-14 18:37:06 +02:00
Bartosz Taudul
c2c0f887aa Display srcloc, callstack counts. 2018-08-14 16:41:27 +02:00
Bartosz Taudul
b75b198f7e Apple sucks. 2018-08-14 14:22:15 +02:00
Bartosz Taudul
2ba5aeec5a Add performance impact section to the manual. 2018-08-14 14:06:53 +02:00
Bartosz Taudul
6f2a598b6a Not only games use frames. 2018-08-14 13:24:37 +02:00
Bartosz Taudul
823aed185a Move zone filtering to zone markup section. 2018-08-14 13:04:00 +02:00
Arvid Gerstmann
8f961b7c8a Merged in Leandros99/tracy/explicit-init (pull request #22)
Explicit init
2018-08-13 16:47:46 +00:00
Arvid Gerstmann
dab0b34303 Reduce the amount of macros, add docs 2018-08-13 18:32:27 +02:00
Arvid Gerstmann
076e83635b Add possibility to explicitly avoid logging 2018-08-13 14:47:52 +02:00
Bartosz Taudul
c4ea13dab5 Static initialization order test. 2018-08-13 12:13:28 +02:00
Bartosz Taudul
a15a287a6b Don't over-allocate vectors, when exact needed size is known.
This reduces memory usage when loading saved traces. Memory usage
reduction observed on a selected number of traces:

5625.76 MB -> 5330.29 MB
3292.94 MB -> 2978.66 MB
632.77 MB  -> 479.58 MB
681.32 MB  -> 506.27 MB
11.9 GB    -> 11.22 GB
854.21 MB  -> 806.17 MB
10.57 GB   -> 7175.31 MB
67.38 MB   -> 66.63 MB
2026.12 MB -> 1744.2 MB
86.55 MB   -> 85.57 MB
343.64 MB  -> 244.81 MB
201.93 MB  -> 162.25 MB
2018-08-09 19:41:15 +02:00
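To illustrate why exact-size allocation helps when the element count is known up front, here is a small sketch. SmallVector is a hypothetical container written for this example, not Tracy's Vector; it assumes a doubling growth policy and trivially copyable elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

template<typename T>
class SmallVector
{
    static_assert( std::is_trivially_copyable<T>::value, "sketch supports trivially copyable types only" );

public:
    SmallVector() = default;
    SmallVector( const SmallVector& ) = delete;
    SmallVector& operator=( const SmallVector& ) = delete;
    ~SmallVector() { std::free( m_ptr ); }

    // Grows by doubling, so capacity may be close to twice the final size.
    void push_back( const T& v )
    {
        if( m_size == m_capacity ) Realloc( m_capacity == 0 ? 1 : m_capacity * 2 );
        m_ptr[m_size++] = v;
    }

    // Allocates exactly n slots when the final size is known in advance.
    void reserve_exact( size_t n )
    {
        if( n > m_capacity ) Realloc( n );
    }

    size_t size() const { return m_size; }
    size_t capacity() const { return m_capacity; }

private:
    void Realloc( size_t cap )
    {
        T* ptr = static_cast<T*>( std::malloc( cap * sizeof( T ) ) );
        assert( ptr );
        if( m_size > 0 ) std::memcpy( ptr, m_ptr, m_size * sizeof( T ) );
        std::free( m_ptr );
        m_ptr = ptr;
        m_capacity = cap;
    }

    T* m_ptr = nullptr;
    size_t m_size = 0;
    size_t m_capacity = 0;
};
```

With this sketch, pushing 1000 elements one by one ends with a capacity of 1024, while calling reserve_exact( 1000 ) first keeps the capacity at 1000.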
Bartosz Taudul
a14a6fa8fb Don't shadow variables. 2018-08-09 19:41:15 +02:00
Bartosz Taudul
dbf4de0694 Allow exact size allocations in Vector. 2018-08-09 19:41:15 +02:00
Bartosz Taudul
68dd90cb94 Also display exact memory allocation size. 2018-08-09 02:27:55 +02:00
Bartosz Taudul
92c1685528 Fix shortening of negative memory values. 2018-08-09 02:25:47 +02:00
Bartosz Taudul
4a12e14a1b Rewording. 2018-08-08 22:24:50 +02:00
Bartosz Taudul
6d9215ac65 Add a markup quick start guide. 2018-08-08 22:24:11 +02:00
Bartosz Taudul
0e0d5a8d5d Add a notice about client memory usage. 2018-08-08 22:13:09 +02:00
Bartosz Taudul
3e9d7fa3bd Grammar fix (?). 2018-08-08 22:11:54 +02:00
Bartosz Taudul
f9740328d2 Put timing information together. 2018-08-08 22:01:57 +02:00
Bartosz Taudul
e6fa665921 Move lua performance notice. 2018-08-08 21:51:59 +02:00
Bartosz Taudul
887836eca6 Describe defines that change server operation. 2018-08-08 21:45:37 +02:00
Bartosz Taudul
49e9f10438 Split first steps section of the manual. 2018-08-08 21:45:13 +02:00
Bartosz Taudul
96ecf47ecf Add profiler FPS and memory usage to info window. 2018-08-08 20:53:01 +02:00
Bartosz Taudul
596727e135 Update NEWS. 2018-08-08 20:39:16 +02:00
Bartosz Taudul
4a9cbafc7e Proper formatting of memory sizes. 2018-08-08 20:38:58 +02:00
Bartosz Taudul
7d465aab1d Add memory size formatting. 2018-08-08 20:38:58 +02:00
Bartosz Taudul
acf86c91fa Update NEWS. 2018-08-08 19:25:44 +02:00
Bartosz Taudul
29c6498890 Add minimal trace info window. 2018-08-08 19:25:13 +02:00
Bartosz Taudul
a51da71fa4 Add lock, plot counts to worker. 2018-08-08 19:21:53 +02:00
Bartosz Taudul
237bb06dd6 Move frame set selection button to the right. 2018-08-08 18:40:31 +02:00
Bartosz Taudul
3e622cda6b Decapitalize "zone" in "Find Zone". 2018-08-08 18:40:20 +02:00
Bartosz Taudul
9fc8cb9d8b Slight reword. 2018-08-07 22:29:35 +02:00
Bartosz Taudul
a54191cb3c Rename "standalone" to "profiler". 2018-08-07 22:26:37 +02:00
Bartosz Taudul
8d053b5179 Update NEWS. 2018-08-05 17:00:21 +02:00
Bartosz Taudul
2a08687afe Left click on message marker displays it on the msg list. 2018-08-05 16:57:21 +02:00
Bartosz Taudul
1d6f388a81 Middle-click on message marker to center on it. 2018-08-05 16:47:49 +02:00
Bartosz Taudul
1d0203ac17 Abstracted away one-frame-decay values. 2018-08-05 16:45:34 +02:00
Bartosz Taudul
44e027ad11 Highlight message markers on timeline. 2018-08-05 16:37:51 +02:00
Bartosz Taudul
eb7064f13d Display frame set tooltip. 2018-08-05 13:33:18 +02:00
Bartosz Taudul
44fecc4390 Improve rendering of small discontinuous frames. 2018-08-05 13:29:44 +02:00
Bartosz Taudul
d36b0aff45 Fix progress of loading GPU zones. 2018-08-05 13:07:58 +02:00
Bartosz Taudul
3d591520ec Document discontinuous frames. 2018-08-05 02:48:46 +02:00
Bartosz Taudul
0d1ef80a17 Update NEWS. 2018-08-05 02:31:21 +02:00
Bartosz Taudul
d590fa7ce2 Display that frames are discontinuous in options. 2018-08-05 02:30:41 +02:00
Bartosz Taudul
cb9f243987 Fix navigation in discontinuous frames. 2018-08-05 02:27:59 +02:00
Bartosz Taudul
947f829797 Fix drawing discontinuous frames. 2018-08-05 02:23:26 +02:00
Bartosz Taudul
9d051cf5ee Add support for discontinuous frames. 2018-08-05 02:15:54 +02:00
Bartosz Taudul
cbb45160af Disable zoom anim on user interaction. 2018-08-05 01:23:00 +02:00
Bartosz Taudul
1b44b31eff Prevent range-zoom when range has zero length. 2018-08-05 01:20:26 +02:00
Bartosz Taudul
b92087bd95 Fix capture utility. 2018-08-04 23:53:21 +02:00
Bartosz Taudul
2acea5da3c Also draw zig-zag on too-small zones. 2018-08-04 23:32:53 +02:00
Bartosz Taudul
3869c1dbca Count frames from 1, not 0. 2018-08-04 23:21:58 +02:00
Bartosz Taudul
6b8a3b25ba Fix drawing of last frame. 2018-08-04 23:19:35 +02:00
Bartosz Taudul
3d75eb50bf Update NEWS. 2018-08-04 23:13:37 +02:00
Bartosz Taudul
9cd6932b13 Draw zig-zag in place of invisible (too small) frames. 2018-08-04 23:11:47 +02:00
Bartosz Taudul
bf475a4cc2 Describe how text strings in macros are handled. 2018-08-04 22:49:16 +02:00
Bartosz Taudul
11b3d23b37 Document secondary frame sets. 2018-08-04 22:04:03 +02:00
Bartosz Taudul
976c921b85 Update NEWS. 2018-08-04 21:55:08 +02:00
Bartosz Taudul
f385e5520b Draw frame separators only if the frame set is selected. 2018-08-04 21:51:46 +02:00
Bartosz Taudul
1282aa9739 Darken frame set counter, if it is disabled. 2018-08-04 21:48:40 +02:00
Bartosz Taudul
37f42a52fb Proper frame names on frames graph. 2018-08-04 21:46:26 +02:00
Bartosz Taudul
acabdf3c2a Implement switching between frame sets. 2018-08-04 21:43:29 +02:00
Bartosz Taudul
88d9307d7a Allow disabling frame sets. 2018-08-04 21:26:01 +02:00
Bartosz Taudul
1ea1cd57b3 Use proper frame names. 2018-08-04 21:19:24 +02:00
Bartosz Taudul
aad3e941e5 Draw multiple frame sets. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
83eac36949 Add FrameData vector accessor. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
9b4348b497 Handle frame name queries. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
4424a7d7e8 Last time should never be zero. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
5e9b2e36be Make getting start of time less cryptic. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
23dfc2e3fc Multiple frame sets support. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
0b4c2724ce Add strings to map directly in StringDiscovery. 2018-08-04 17:10:45 +02:00
Bartosz Taudul
2f01014a95 Document how to generate debugging symbols. 2018-08-04 16:52:24 +02:00
Bartosz Taudul
f8f10f4776 Add user manual to NEWS. 2018-08-04 16:35:34 +02:00
Bartosz Taudul
ada9f78678 Use StringDiscovery for plots. 2018-08-04 16:33:03 +02:00
Bartosz Taudul
d2c866377e Extract unique string discovery from worker.
This class is responsible for handling data sets that should be grouped
together, but which may come with different name pointers.

It is a generalization of the plot merging functionality.
2018-08-04 16:25:11 +02:00
Bartosz Taudul
e174e2c12a Remove obsolete comment.
Nothing happens with the source data, as the strings are uniquely stored
in the StoreString() function.
2018-08-04 15:46:10 +02:00
Bartosz Taudul
6ef2d2d9a3 Track progress of loading plots. 2018-08-04 15:17:37 +02:00
Bartosz Taudul
85fb87a2f0 Update NEWS. 2018-08-04 15:11:08 +02:00
Bartosz Taudul
8de214b4d5 Display number of events for items in options menu. 2018-08-04 15:09:52 +02:00
Bartosz Taudul
adde6cf4fd Allow sending named frames. 2018-08-04 15:04:18 +02:00
Bartosz Taudul
922882d3b0 Add name field to frame mark message. 2018-08-04 15:03:47 +02:00
Bartosz Taudul
ea588072f2 Stop connection window from jumping around. 2018-08-04 14:50:31 +02:00
Bartosz Taudul
a4e877a89f Add user manual button. 2018-08-04 01:15:56 +02:00
Bartosz Taudul
3d7040d30f Slight reword to avoid overflows. 2018-08-03 01:44:44 +02:00
Bartosz Taudul
0b20d37672 Platform support information. 2018-08-03 01:16:09 +02:00
Bartosz Taudul
373cc5226d Remove usage instructions from README, redirect to user manual. 2018-08-02 22:57:37 +02:00
Bartosz Taudul
2d395d3e72 Now we're getting somewhere with the manual. 2018-08-02 22:49:04 +02:00
Till Rathmann
09e63dafd6 Merged in tillrathmann/tracy_for_multi_dll_projects (pull request #21)
Fixed compiler warning about unused variable in release builds
2018-08-02 10:32:47 +00:00
Till Rathmann
c71d99c134 Minor change: adapted the spaces to tabs at the just inserted line as in tracy_rpmalloc.cpp tabs are used as indentation. 2018-08-02 11:53:04 +02:00
Till Rathmann
4968717313 Fixed compiler warning about unused variable in release builds. 2018-08-02 11:45:15 +02:00
Bartosz Taudul
a211e57e3c Initial draft of the user manual. 2018-08-01 23:45:40 +02:00
Bartosz Taudul
1b42946e25 Update AUTHORS, NEWS. 2018-08-01 19:26:10 +02:00
Till Rathmann
2fe073e195 Merged in tillrathmann/tracy_for_multi_dll_projects (pull request #20)
Master
2018-08-01 17:24:16 +00:00
Till Rathmann
3b302315f9 Fixed __ANDROID_API__ < 21 build and FD_SET usage. 2018-08-01 19:18:40 +02:00
Till Rathmann
df09fe48cf Workaround in nfd_win.cpp for MSVC problem in combaseapi.h. 2018-08-01 14:44:39 +02:00
Till Rathmann
659c8e25eb Revert "Changed from AVX2 to AVX."
This reverts commit 6599ebcf31.
2018-08-01 14:36:31 +02:00
Till Rathmann
37d5736bf5 Fixed compiler warnings. 2018-08-01 14:07:30 +02:00
Till Rathmann
d1dd1d664f Removed MSVC /permissive- compiler flag for nfd_win.cpp because it causes problems in combaseapi.h with Windows SDK 8.1. 2018-08-01 14:06:00 +02:00
Till Rathmann
6599ebcf31 Changed from AVX2 to AVX. 2018-08-01 14:03:21 +02:00
Till Rathmann
af0ad42081 Merged wolfpld/tracy into master 2018-08-01 13:42:58 +02:00
Till Rathmann
468b8d4526 Merged in tillrathmann/tracy_for_multi_dll_projects (pull request #19)
Support for multi-DLL projects.
2018-08-01 08:46:02 +00:00
Till Rathmann
2dcfe5fce0 Made s_threadNameDataInstance and s_profilerInstance static. 2018-07-31 13:03:09 +02:00
Till Rathmann
dd042619e9 Support for multi-DLL projects. 2018-07-31 12:06:04 +02:00
Bartosz Taudul
12c4128460 Update NEWS. 2018-07-29 22:13:01 +02:00
Bartosz Taudul
78e14b4bee Skip rendering if viewer window is minimized. 2018-07-29 22:11:47 +02:00
Bartosz Taudul
43255b01fa Reduce viewer frame rate when it doesn't have focus. 2018-07-29 22:11:24 +02:00
Bartosz Taudul
672925ac04 Add separators to frame numbers. 2018-07-29 21:56:30 +02:00
Bartosz Taudul
0d76ccfb71 Add days to time-to-string converter. 2018-07-29 21:14:56 +02:00
Bartosz Taudul
d04126eabe Make time-to-string more readable. 2018-07-29 21:09:11 +02:00
Bartosz Taudul
173c0c4b26 Update NEWS. 2018-07-29 20:55:07 +02:00
Bartosz Taudul
2eddbeb164 Use ctrl key to zoom-out using selection range. 2018-07-29 20:53:29 +02:00
Bartosz Taudul
1cf168c95e Fix impossible zoom animation. 2018-07-29 20:53:29 +02:00
Bartosz Taudul
310203101f Stop zoom-range-selection when zooming to range.
This also disables zoom range selection when middle click is used to
zoom view to a selected zone.
2018-07-29 20:53:29 +02:00
Bartosz Taudul
b4f4fcfde9 Add zoom-to-middle-mouse-button-selection-range. 2018-07-29 20:15:49 +02:00
Bartosz Taudul
18896044c4 Display explicit names of loaded things. 2018-07-29 16:56:46 +02:00
Bartosz Taudul
821be252d5 Display trace update summary. 2018-07-29 15:37:45 +02:00
Bartosz Taudul
9f13475b52 Track trace version in worker. 2018-07-29 15:33:48 +02:00
Bartosz Taudul
13509c14f1 Save size of 'active' and 'frees' memory data structures. 2018-07-29 15:29:56 +02:00
Bartosz Taudul
00d07e39f7 Save threadExpand size to allow vector preallocation. 2018-07-29 15:19:44 +02:00
Bartosz Taudul
bff6eb4c34 Save source location zones counts.
This allows preallocation of zones-in-source-location vectors.
2018-07-29 14:58:01 +02:00
Bartosz Taudul
5b9ad4bcbf Expose tracy version in UI. 2018-07-29 14:24:24 +02:00
Bartosz Taudul
12b90d1630 Move tracy version to a separate header. 2018-07-29 14:20:44 +02:00
Bartosz Taudul
ccc5c37af5 Always count source location zones. 2018-07-29 14:16:13 +02:00
Bartosz Taudul
4456c8a454 Reserve space for string data. 2018-07-29 14:13:29 +02:00
Bartosz Taudul
766bf45a2b Fix initialization of atomics. 2018-07-28 20:13:06 +02:00
Bartosz Taudul
d2b2e1deb0 Update NEWS. 2018-07-28 19:55:17 +02:00
Bartosz Taudul
8ddf32bc6b Highlight zones with the same srcloc on hover. 2018-07-28 19:53:11 +02:00
Bartosz Taudul
c124e49443 Update NEWS. 2018-07-28 19:30:08 +02:00
Bartosz Taudul
648070e6a1 Include each loaded zone in sub progress. 2018-07-28 19:22:28 +02:00
Bartosz Taudul
4741dab833 Track sub progress. 2018-07-28 19:05:01 +02:00
Bartosz Taudul
a14238c199 Add sub progress display. 2018-07-28 18:56:52 +02:00
Bartosz Taudul
3a401106b0 Display total progress also as text. 2018-07-28 18:50:22 +02:00
Bartosz Taudul
a7e48bd2a9 Loading popup is not resizable. 2018-07-28 18:48:45 +02:00
Bartosz Taudul
6a3a9c0bc5 Load second trace on a separate thread. 2018-07-28 18:47:33 +02:00
Bartosz Taudul
a46425f4e9 Adjust load stages. 2018-07-28 18:26:00 +02:00
Bartosz Taudul
71db7c431f Load main trace on a thread. 2018-07-28 18:17:56 +02:00
Bartosz Taudul
cd6e3ab2c9 Trace loading progress popup. 2018-07-28 18:07:55 +02:00
Bartosz Taudul
0bf0ceed3d Track trace loading progress. 2018-07-28 17:59:17 +02:00
Bartosz Taudul
1b785befa2 Update packages list before install. 2018-07-28 01:41:47 +02:00
Bartosz Taudul
68e40ad250 Unix build can use extended font. 2018-07-28 01:40:48 +02:00
Bartosz Taudul
63b383f7be Add missing TRACY_FILESELECTOR define to unix build. 2018-07-28 01:38:08 +02:00
Bartosz Taudul
a54ff1f56d Use 'μ' instead of 'u' to indicate micro. 2018-07-28 01:06:36 +02:00
Bartosz Taudul
b7ec7f6819 Embed Arimo font. 2018-07-28 01:03:26 +02:00
Bartosz Taudul
31c2ddb8ac Rename client's SourceLocation to SourceLocationData. 2018-07-28 00:34:04 +02:00
Bartosz Taudul
149812c071 Always keep main profiler window on bottom. 2018-07-26 23:38:45 +02:00
Bartosz Taudul
91ab641cc6 Update NEWS. 2018-07-26 20:18:17 +02:00
Bartosz Taudul
dbdc530f1c Named GPU zones. 2018-07-26 20:15:39 +02:00
Bartosz Taudul
3737e122cf Of course, this can't work without stupid fuckery. 2018-07-26 19:59:55 +02:00
Bartosz Taudul
b3f4495825 Provide named versions of ZoneScoped* macro. 2018-07-26 19:59:55 +02:00
Bartosz Taudul
e0799c6556 Provide dummy defines for ZoneScoped*S macros. 2018-07-26 19:56:48 +02:00
Bartosz Taudul
c4bd4e6c70 Fix SourceLocation qualifiers for ZoneNamedCS. 2018-07-26 19:56:41 +02:00
Bartosz Taudul
1111980f1f Make source location names unique. 2018-07-26 19:22:19 +02:00
Arvid Gerstmann
dfe3285252 Merged in Leandros99/tracy/pr-1 (pull request #11)
Implement pthread_getname_np alternative if it's not available
2018-07-24 11:57:21 +00:00
Arvid Gerstmann
69dac3f611 Fix accessing the thread id on Android 2018-07-24 13:43:25 +02:00
Bartosz Taudul
d84d0b7754 Don't try to read empty timelines. 2018-07-22 21:15:28 +02:00
Bartosz Taudul
25116a8059 Don't try to compress invalid thread. 2018-07-22 21:13:42 +02:00
Bartosz Taudul
010cf66e43 Call Vector destructors. 2018-07-22 21:01:45 +02:00
Bartosz Taudul
29159069ab Properly initialize child index. 2018-07-22 20:14:55 +02:00
Bartosz Taudul
7d7877517e Also remove child vectors from GPU events. 2018-07-22 19:47:01 +02:00
Bartosz Taudul
3a934b2ba3 Store children vectors in a separate data collection.
This reduces per-zone memory cost by 9 bytes if there are no children
and increases it by 4 bytes, if there are children. This is universally
a better solution, as the following data shows:

+++ /home/wolf/desktop/tracy-old/android.tracy +++
Vectors: 2794480
Size 0: 2373070 (84.92%)
Size 1: 70237 (2.51%)
Size 2+: 351173 (12.57%)
+++ /home/wolf/desktop/tracy-old/asset-new.tracy +++
Vectors: 1799227
Size 0: 1482691 (82.41%)
Size 1: 93272 (5.18%)
Size 2+: 223264 (12.41%)
+++ /home/wolf/desktop/tracy-old/asset-new-id.tracy +++
Vectors: 1977996
Size 0: 1640817 (82.95%)
Size 1: 97198 (4.91%)
Size 2+: 239981 (12.13%)
+++ /home/wolf/desktop/tracy-old/asset-old.tracy +++
Vectors: 1782395
Size 0: 1471437 (82.55%)
Size 1: 88813 (4.98%)
Size 2+: 222145 (12.46%)
+++ /home/wolf/desktop/tracy-old/big.tracy +++
Vectors: 180794047
Size 0: 172696094 (95.52%)
Size 1: 2799772 (1.55%)
Size 2+: 5298181 (2.93%)
+++ /home/wolf/desktop/tracy-old/darkrl.tracy +++
Vectors: 12014129
Size 0: 11611324 (96.65%)
Size 1: 134980 (1.12%)
Size 2+: 267825 (2.23%)
+++ /home/wolf/desktop/tracy-old/mem.tracy +++
Vectors: 383097
Size 0: 321932 (84.03%)
Size 1: 854 (0.22%)
Size 2+: 60311 (15.74%)
+++ /home/wolf/desktop/tracy-old/new.tracy +++
Vectors: 77536
Size 0: 63035 (81.30%)
Size 1: 8886 (11.46%)
Size 2+: 5615 (7.24%)
+++ /home/wolf/desktop/tracy-old/selfprofile.tracy +++
Vectors: 22940871
Size 0: 22704868 (98.97%)
Size 1: 73000 (0.32%)
Size 2+: 163003 (0.71%)
+++ /home/wolf/desktop/tracy-old/tbrowser.tracy +++
Vectors: 962682
Size 0: 695380 (72.23%)
Size 1: 43007 (4.47%)
Size 2+: 224295 (23.30%)
+++ /home/wolf/desktop/tracy-old/virtualfile_hc.tracy +++
Vectors: 529170
Size 0: 449386 (84.92%)
Size 1: 15694 (2.97%)
Size 2+: 64090 (12.11%)
+++ /home/wolf/desktop/tracy-old/zfile_hc.tracy +++
Vectors: 264849
Size 0: 220589 (83.29%)
Size 1: 9386 (3.54%)
Size 2+: 34874 (13.17%)
2018-07-22 16:05:50 +02:00
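A sketch of the layout change described above, under stated assumptions. Zone, ZoneStore and NetBytesSaved are hypothetical names; the 9-byte and 4-byte figures are taken from the commit message, and the side table here uses std::vector purely for illustration. Each zone keeps a 4-byte index into a separate collection, and only zones that actually have children get an entry there.

```cpp
#include <cstdint>
#include <vector>

struct Zone
{
    int64_t start = 0;
    int64_t end = 0;
    int32_t child = -1;   // index into ZoneStore::children, -1 if the zone has no children
};

struct ZoneStore
{
    std::vector<Zone> zones;
    std::vector<std::vector<uint32_t>> children;   // one entry per zone that has children

    // Record childZone as a child of parent, creating the side entry on first use.
    void AddChild( uint32_t parent, uint32_t childZone )
    {
        Zone& z = zones[parent];
        if( z.child < 0 )
        {
            z.child = int32_t( children.size() );
            children.emplace_back();
        }
        children[z.child].push_back( childZone );
    }
};

// Net bytes saved, using the per-zone figures quoted in the commit message:
// 9 bytes saved per zone without children, 4 bytes added per zone with children.
inline int64_t NetBytesSaved( int64_t emptyVectors, int64_t nonEmptyVectors )
{
    return emptyVectors * 9 - nonEmptyVectors * 4;
}
```

Applied to the big.tracy row above (172696094 empty vectors, 2799772 + 5298181 non-empty ones), the formula gives 1521873034 bytes saved.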
Bartosz Taudul
eb1475ebd4 Add single-value Vector constructor. 2018-07-22 16:01:58 +02:00
Bartosz Taudul
be40ee9dbc Fix crash when there's no callstack. 2018-07-22 00:13:23 +02:00
Bartosz Taudul
59c188a18d Invalid callstack is 0, not UINT64_MAX. 2018-07-22 00:13:11 +02:00
Bartosz Taudul
16833fb237 Mention TRACY_NO_EXIT environment variable in README. 2018-07-22 00:09:14 +02:00
Bartosz Taudul
f767a0c3fd Add Rokas Kupstys to AUTHORS. 2018-07-22 00:08:02 +02:00
Bartosz Taudul
ea9f599c88 Update NEWS. 2018-07-22 00:07:16 +02:00
Bartosz Taudul
fbfc0e151d Replace combo lists with radio buttons. 2018-07-22 00:04:41 +02:00
Bartosz Taudul
d1cef20c0b Allow sorting groups by time. 2018-07-21 23:58:50 +02:00
Bartosz Taudul
59e0f3d490 Use precalculated zone group time. 2018-07-21 23:54:35 +02:00
Bartosz Taudul
cacbac8915 Calculate and display group times. 2018-07-21 23:53:11 +02:00
Bartosz Taudul
d03356c1f5 Rename "threads" to "groups" in find zone data structs. 2018-07-21 23:41:50 +02:00
Bartosz Taudul
36c207fb51 Fix some unused variables. 2018-07-21 21:35:35 +02:00
Bartosz Taudul
c4d44ab36e Also need gtk. 2018-07-21 21:10:49 +02:00
Bartosz Taudul
9058b6ef78 Install GLFW. 2018-07-21 21:01:28 +02:00
Bartosz Taudul
3c1c7cb624 Setup linux CI build. 2018-07-21 20:47:05 +02:00
Bartosz Taudul
9291a88020 Zones can be now also grouped by call stack. 2018-07-21 20:26:13 +02:00
Bartosz Taudul
3c6baf53da Memory alloc range hover also works on allocation lists. 2018-07-19 15:55:15 +02:00
Bartosz Taudul
389e0facd3 Draw memory allocation range on mouse hover on mem event. 2018-07-19 15:43:45 +02:00
Bartosz Taudul
047d950936 Add AppVeyor build status badge. 2018-07-19 13:05:44 +02:00
Rokas Kupstys
2ffa1689b0 Merged in rokups/tracy/ci (pull request #18)
Appveyor CI script.
2018-07-19 10:07:58 +00:00
Rokas Kupstys
3a80c207e4 Appveyor CI script. 2018-07-19 12:13:45 +03:00
Rokas Kupstys
812c4d7085 Merged in rokups/tracy/fix-winver-targetting-2 (pull request #17)
Fix targeting lower windows versions when using W10 SDK that is older than redstone2.
2018-07-19 08:56:30 +00:00
Rokas Kupstys
01df5aa840 Fix targeting lower windows versions when using W10 SDK that is older than redstone2. 2018-07-19 11:08:41 +03:00
Bartosz Taudul
04c2a6c8ac Update NEWS. 2018-07-18 00:29:36 +02:00
Bartosz Taudul
108ba20af8 Fix closing memory allocation info window. 2018-07-18 00:25:02 +02:00
Bartosz Taudul
e8726c72b1 Display memory allocation range on memory plot. 2018-07-18 00:21:16 +02:00
Bartosz Taudul
9ab09d9867 Only show "same zone" if zones are valid. 2018-07-17 23:32:29 +02:00
Bartosz Taudul
84d0f1a3ea Indicate inspected memory address on alloc list. 2018-07-17 23:17:46 +02:00
Bartosz Taudul
cf3bf4378b No need to return MemEvent ptr from DrawAddress(). 2018-07-17 23:13:56 +02:00
Bartosz Taudul
18a460e782 Clicking on mem address in alloc list displays info window. 2018-07-17 23:08:10 +02:00
Bartosz Taudul
0889334462 Add memory allocation info window. 2018-07-17 23:03:03 +02:00
Bartosz Taudul
6485a090ed Separate small callstack button setup. 2018-07-17 22:53:38 +02:00
Bartosz Taudul
e7b71f29a5 Define WIN32_LEAN_AND_MEAN in TracyClient.cpp. 2018-07-17 21:26:31 +02:00
Rokas Kupstys
76ff094a05 Merged in rokups/tracy/fix-winver-targetting (pull request #16)
Fix build when targeting earlier windows versions by defining _WIN32_WINNT.
2018-07-17 17:27:42 +00:00
Rokas Kupstys
d290e04d45 Fix build when targeting earlier windows versions by defining _WIN32_WINNT. 2018-07-17 20:15:27 +03:00
Bartosz Taudul
fc310ce15a Fix check. 2018-07-17 18:29:07 +02:00
Rokas Kupstys
4eaf8b64d6 Merged in rokups/tracy/fix-msvc-cpp14-build (pull request #14)
Fix msvc builds when required c++ standard version is set to lower than c++17.
2018-07-17 16:26:35 +00:00
Rokas Kupstys
8a8faa3d6c Added __has_include(<execution>) back. 2018-07-17 19:25:26 +03:00
Rokas Kupstys
5c75fe292f Fix msvc builds when required c++ standard version is set to lower than c++17.
Also use latest available c++ standard which allows using older VS versions that only support c++14.
2018-07-17 18:29:48 +03:00
Rokas Kupstys
ab8d2c553a Merged in rokups/tracy/fix-unix-build (pull request #15)
Fix build errors with some compilers due to using std::max(float, double).
2018-07-17 14:21:20 +00:00
Rokas Kupstys
c2f52d9ef7 Merged in rokups/tracy/fix-imgui-build (pull request #13)
Define ImVec2 math operators only when IMGUI_DEFINE_MATH_OPERATORS is undefined.
2018-07-17 13:41:15 +00:00
Rokas Kupstys
064385fc62 Define ImVec2 math operators only when IMGUI_DEFINE_MATH_OPERATORS is undefined. 2018-07-17 16:37:45 +03:00
Rokas Kupstys
abfa90012f Fix build errors with some compilers due to using std::max(float, double). 2018-07-17 16:36:41 +03:00
Bartosz Taudul
3799e0da43 Infer socket readiness from select() return value. 2018-07-16 01:50:21 +02:00
Bartosz Taudul
807d2a02bc Display collapsed zones counts with separators. 2018-07-16 01:24:43 +02:00
Bartosz Taudul
acf3bc7d43 Show only contended locks by default. 2018-07-15 20:29:35 +02:00
Bartosz Taudul
24f7be3f51 Add homepage and tutorial video buttons. 2018-07-15 20:10:41 +02:00
Bartosz Taudul
2e39d18e94 Web page opening functionality. 2018-07-15 20:10:34 +02:00
Bartosz Taudul
efdb3791e9 Take recv buffer into account in HasData(). 2018-07-15 19:52:22 +02:00
Bartosz Taudul
ea4470b26e Buffer data from recv() calls.
This reduces cost of socket reads measured in a test run from 47ms to
8.7ms.
2018-07-15 19:34:47 +02:00
Bartosz Taudul
c6ea032de3 GPU source location may not yet be available. 2018-07-15 19:00:40 +02:00
Bartosz Taudul
df75b25a3f Add Arvid Gerstmann to AUTHORS. 2018-07-15 16:02:54 +02:00
Bartosz Taudul
cda9cbaf19 Update NEWS. 2018-07-15 16:02:25 +02:00
Bartosz Taudul
21da3bca63 Don't create lz4buf on stack. 2018-07-14 16:02:33 +02:00
Arvid Gerstmann
f04e67779c Fix some minor code style issues 2018-07-14 13:46:25 +02:00
Arvid Gerstmann
6fb73a3d97 Implement getname alternative if it's not available 2018-07-14 13:26:55 +02:00
Arvid Gerstmann
d461cbb6d2 Merged in Leandros99/tracy/original-master (pull request #10)
Support for callstacks on Linux without glibc
2018-07-14 11:24:03 +00:00
Arvid Gerstmann
b8db9df949 Detect glibc explicitly 2018-07-14 13:23:00 +02:00
Arvid Gerstmann
ad48c32e1e Support for callstacks on Linux without glibc 2018-07-14 11:08:17 +02:00
Bartosz Taudul
561d2dc360 Use the fastest mutex available.
The selection is based on the following test results:

MSVC:
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 11.641 ns/iter
     2 thread contention: 141.559 ns/iter
     3 thread contention: 242.733 ns/iter
     4 thread contention: 409.807 ns/iter
     5 thread contention: 561.544 ns/iter
     6 thread contention: 785.845 ns/iter
=> std::mutex
     No contention: 19.190 ns/iter
     2 thread contention: 39.305 ns/iter
     3 thread contention: 58.999 ns/iter
     4 thread contention: 59.532 ns/iter
     5 thread contention: 103.539 ns/iter
     6 thread contention: 110.314 ns/iter
=> std::shared_timed_mutex
     No contention: 45.487 ns/iter
     2 thread contention: 96.351 ns/iter
     3 thread contention: 142.871 ns/iter
     4 thread contention: 184.999 ns/iter
     5 thread contention: 336.608 ns/iter
     6 thread contention: 542.551 ns/iter
=> std::shared_mutex
     No contention: 10.861 ns/iter
     2 thread contention: 17.495 ns/iter
     3 thread contention: 31.126 ns/iter
     4 thread contention: 40.468 ns/iter
     5 thread contention: 15.677 ns/iter
     6 thread contention: 64.505 ns/iter

Cygwin (clang):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 11.536 ns/iter
     2 thread contention: 121.082 ns/iter
     3 thread contention: 396.430 ns/iter
     4 thread contention: 672.555 ns/iter
     5 thread contention: 1327.761 ns/iter
     6 thread contention: 14151.955 ns/iter
=> std::mutex
     No contention: 62.583 ns/iter
     2 thread contention: 3990.464 ns/iter
     3 thread contention: 7161.189 ns/iter
     4 thread contention: 9870.820 ns/iter
     5 thread contention: 12355.178 ns/iter
     6 thread contention: 14694.903 ns/iter
=> std::shared_timed_mutex
     No contention: 91.687 ns/iter
     2 thread contention: 1115.037 ns/iter
     3 thread contention: 4183.792 ns/iter
     4 thread contention: 15283.491 ns/iter
     5 thread contention: 27812.477 ns/iter
     6 thread contention: 35028.140 ns/iter
=> std::shared_mutex
     No contention: 91.764 ns/iter
     2 thread contention: 1051.826 ns/iter
     3 thread contention: 5574.720 ns/iter
     4 thread contention: 15721.416 ns/iter
     5 thread contention: 27721.487 ns/iter
     6 thread contention: 35420.404 ns/iter

Linux (x64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 13.487 ns/iter
     2 thread contention: 210.317 ns/iter
     3 thread contention: 430.855 ns/iter
     4 thread contention: 510.533 ns/iter
     5 thread contention: 1003.609 ns/iter
     6 thread contention: 1787.683 ns/iter
=> std::mutex
     No contention: 12.403 ns/iter
     2 thread contention: 157.122 ns/iter
     3 thread contention: 186.791 ns/iter
     4 thread contention: 265.073 ns/iter
     5 thread contention: 283.778 ns/iter
     6 thread contention: 270.687 ns/iter
=> std::shared_timed_mutex
     No contention: 21.509 ns/iter
     2 thread contention: 150.179 ns/iter
     3 thread contention: 256.574 ns/iter
     4 thread contention: 415.351 ns/iter
     5 thread contention: 611.532 ns/iter
     6 thread contention: 944.695 ns/iter
=> std::shared_mutex
     No contention: 20.805 ns/iter
     2 thread contention: 157.034 ns/iter
     3 thread contention: 244.025 ns/iter
     4 thread contention: 406.269 ns/iter
     5 thread contention: 387.985 ns/iter
     6 thread contention: 468.550 ns/iter

Linux (arm64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 20.891 ns/iter
     2 thread contention: 211.037 ns/iter
     3 thread contention: 409.962 ns/iter
     4 thread contention: 657.441 ns/iter
     5 thread contention: 828.405 ns/iter
     6 thread contention: 1131.827 ns/iter
=> std::mutex
     No contention: 50.884 ns/iter
     2 thread contention: 103.620 ns/iter
     3 thread contention: 332.429 ns/iter
     4 thread contention: 620.802 ns/iter
     5 thread contention: 783.943 ns/iter
     6 thread contention: 834.002 ns/iter
=> std::shared_timed_mutex
     No contention: 64.948 ns/iter
     2 thread contention: 173.191 ns/iter
     3 thread contention: 490.352 ns/iter
     4 thread contention: 660.668 ns/iter
     5 thread contention: 1014.546 ns/iter
     6 thread contention: 1451.553 ns/iter
=> std::shared_mutex
     No contention: 64.521 ns/iter
     2 thread contention: 195.222 ns/iter
     3 thread contention: 490.819 ns/iter
     4 thread contention: 654.786 ns/iter
     5 thread contention: 955.759 ns/iter
     6 thread contention: 1282.544 ns/iter
2018-07-14 00:39:01 +02:00
Bartosz Taudul
a26ab263dd Select/unselect all plot visibility. 2018-07-14 00:10:38 +02:00
Bartosz Taudul
f4f7e58e88 Add select/unselect all threads visibility option. 2018-07-14 00:08:37 +02:00
Arvid Gerstmann
9ac47eef0a Merged in Leandros99/tracy/dev (pull request #9)
Couple of minor compatibility fixes
2018-07-13 22:05:13 +00:00
Bartosz Taudul
e285c837a4 Support TRACY_NO_EXIT env variable in addition to define. 2018-07-13 23:55:40 +02:00
Arvid Gerstmann
34533ad4f1 Dynamically import GetDpiForSystem, to support older Windows versions 2018-07-13 23:41:38 +02:00
Arvid Gerstmann
0b1c2ebc8f Define M_PI_2 if not already done 2018-07-13 23:41:12 +02:00
Arvid Gerstmann
ebd1d00178 Correctly forward declare Win32 functions
_WINDOWS_ is the macro defined by the windows.h header guard;
check it to see whether the symbols have already been included before
forward declaring our own.
2018-07-13 23:39:58 +02:00
Arvid Gerstmann
32fc011f80 Silence unused parameter warning 2018-07-13 23:39:25 +02:00
Bartosz Taudul
c3ba0ef4eb Fix lua zone state init. 2018-07-13 20:21:50 +02:00
Bartosz Taudul
26f2cb336e Return value from non-void function. 2018-07-13 20:12:39 +02:00
Bartosz Taudul
a3c898f8b8 Rename FrameMark() to SendFrameMark().
This avoids conflict with FrameMark define.
2018-07-13 20:09:19 +02:00
Arvid Gerstmann
0e59c6e98d Merged in Leandros99/tracy/concurrentqueue-namespace (pull request #8)
Wrap concurrentqueue in tracy namespace
2018-07-13 18:06:42 +00:00
Arvid Gerstmann
6b87aecdce Wrap concurrentqueue in tracy namespace 2018-07-13 20:01:27 +02:00
Arvid Gerstmann
575b04076f Merged in Leandros99/tracy/fix-unix-semaphore (pull request #7)
Do not include the semaphore headers inside namespace tracy
2018-07-13 17:54:35 +00:00
Arvid Gerstmann
6fee78dfee Do not include the semaphore headers inside namespace tracy 2018-07-13 19:51:24 +02:00
Arvid Gerstmann
3e012cb762 Compile standalone on macOS 2018-07-13 19:22:09 +02:00
Bartosz Taudul
96042891f7 Reintroduce explicit template type for std::lock_guard.
Requested in issue #4 for support of older MSVC versions.
2018-07-13 12:30:29 +02:00
Bartosz Taudul
90a874f311 Require MSVC 15.7 for <execution> support. 2018-07-13 12:26:02 +02:00
Bartosz Taudul
e9a971bacf Mention on-demand mode in FAQ. 2018-07-12 13:32:49 +02:00
Bartosz Taudul
b85a8a62a2 Document TRACY_ON_DEMAND macro. 2018-07-12 13:27:02 +02:00
Bartosz Taudul
2806bfaac7 Update NEWS. 2018-07-12 13:24:51 +02:00
Bartosz Taudul
b11695111d Implement on-demand Lua zone capture. 2018-07-12 12:53:35 +02:00
Bartosz Taudul
fbc5556ddd Send memory events in on-demand mode. 2018-07-12 01:36:01 +02:00
Bartosz Taudul
c8b5b9447d Ignore dangling memory frees in on-demand mode. 2018-07-12 01:35:32 +02:00
Bartosz Taudul
e5064dec1e Store on-demand connection state. 2018-07-12 01:21:04 +02:00
Bartosz Taudul
dacbfbd031 Support on-demand OpenGL tracing. 2018-07-11 17:10:53 +02:00
Bartosz Taudul
d0868b5004 Early exit in GpuCtx::Collect(). 2018-07-11 17:10:34 +02:00
Bartosz Taudul
0cbeea97a2 Support on-demand Vulkan tracing. 2018-07-11 17:03:00 +02:00
Bartosz Taudul
1f7246f6b9 Defer OpenGL/Vulkan context announce. 2018-07-11 15:00:30 +02:00
Bartosz Taudul
26d5c4b302 Fix copy pasta. 2018-07-11 14:43:38 +02:00
Bartosz Taudul
96f39281a1 Implement on-demand locks. 2018-07-11 14:17:20 +02:00
Bartosz Taudul
d87508901f Send deferred data. 2018-07-11 12:28:40 +02:00
Bartosz Taudul
ad0a75da7d Defer lock announcements. 2018-07-11 12:24:58 +02:00
Bartosz Taudul
475d151b2d Implement deferring items. 2018-07-11 12:21:39 +02:00
Bartosz Taudul
a99d74966c Active status of scoped zone can't change. 2018-07-11 12:16:55 +02:00
Bartosz Taudul
52207f20b7 Add deferred events queue. 2018-07-11 12:14:28 +02:00
Bartosz Taudul
c2659473fd Free memory associated with cleared queue items. 2018-07-11 01:34:48 +02:00
Bartosz Taudul
ef73979fb9 MemRead() uses const pointer. 2018-07-11 01:33:27 +02:00
Bartosz Taudul
b1a71174db Messages are also safe. 2018-07-10 23:09:59 +02:00
Bartosz Taudul
e80c677fa0 Plots can be safely sent in on-demand mode. 2018-07-10 23:06:27 +02:00
Bartosz Taudul
d1ddaa8d59 Store frame offset in trace dumps. 2018-07-10 22:56:41 +02:00
Bartosz Taudul
fe449f366f Use frame offset for frame count and missed frames display. 2018-07-10 22:51:24 +02:00
Bartosz Taudul
a78981e040 Store on-demand frame offset. 2018-07-10 22:42:00 +02:00
Bartosz Taudul
6a9caabc63 Send on-demand initial payload message. 2018-07-10 22:37:39 +02:00
Bartosz Taudul
32ca54a523 Pack WelcomeMessage. 2018-07-10 22:29:31 +02:00
Bartosz Taudul
43d5ab4382 Count frames in on-demand mode. 2018-07-10 22:27:19 +02:00
Bartosz Taudul
03794a2957 Send frame marks in on-demand mode. 2018-07-10 22:27:19 +02:00
Bartosz Taudul
f8b2ffdc7e Clear queues before new on-demand connection is made. 2018-07-10 22:27:19 +02:00
Bartosz Taudul
a767c5ea08 Trace zones in on-demand mode. 2018-07-10 22:27:19 +02:00
Bartosz Taudul
c973735b49 Track connection status. 2018-07-10 22:27:19 +02:00
Bartosz Taudul
010b19946f Send on-demand status in welcome message. 2018-07-10 21:44:23 +02:00
Bartosz Taudul
c056f3be41 Send keep alive messages to determine if client disconnected. 2018-07-10 21:39:17 +02:00
Bartosz Taudul
a0188122a0 Add keep alive message. 2018-07-10 21:23:19 +02:00
Bartosz Taudul
e5b133073c Disable all tracing if TRACY_ON_DEMAND is defined. 2018-07-10 20:49:51 +02:00
Bartosz Taudul
a5381337f6 Don't use obsolete function. 2018-07-10 20:49:29 +02:00
Bartosz Taudul
045f792e84 Keep one profiler window size for all captures. 2018-07-10 20:47:09 +02:00
Bartosz Taudul
27a1eb94fe Update NEWS. 2018-07-08 16:56:09 +02:00
Bartosz Taudul
d68297ba45 Add trace update utility.
This tool will load a trace saved in previous version of tracy and save
an up-to-date version of the file.
2018-07-08 16:53:31 +02:00
Bartosz Taudul
e87abfa7bc X11 colors conversion program. 2018-07-04 18:26:57 +02:00
Bartosz Taudul
ca0053d4d4 Add memory decay color table creation program. 2018-07-04 18:24:56 +02:00
Bartosz Taudul
83310cd0e9 Release v0.3.3. 2018-07-03 21:53:54 +02:00
Tobias Widlund
29a2cc2f6c Merged in therocode/tracy (pull request #6)
Add size_t casts in asserts to get rid of sign-compare warnings on GCC
2018-07-01 19:10:18 +00:00
Tobias Widlund
626a995c63 Add size_t casts in asserts to get rid of sign-compare warnings on GCC 2018-07-01 20:02:53 +02:00
Tobias Widlund
51a68f3709 Merged in therocode/tracy (pull request #5)
Fix warning re shadowing, implicit conversion and added include <cstdio>
2018-06-30 14:26:44 +00:00
Tobias Widlund
f09623b6c9 Revert inappropriate fix 2018-06-30 16:23:16 +02:00
Tobias Widlund
273355b665 Change system include from using "" to <> 2018-06-30 16:00:51 +02:00
Tobias Widlund
b6cce4ddb6 Improve fixes for warnings as per request 2018-06-30 15:36:06 +02:00
Tobias Widlund
1c467a5847 Fix warning re shadowing, implicit conversion and added include <cstdio> 2018-06-30 11:47:27 +02:00
Bartosz Taudul
c7952e4d4f Move "without profiling" to tooltip in zone info window. 2018-06-29 19:02:44 +02:00
Bartosz Taudul
4ae317109d Improve compare menu histogram tooltip. 2018-06-29 18:57:49 +02:00
Bartosz Taudul
b190a15ef6 Display numerical thread id in memory plot tooltip. 2018-06-29 18:54:19 +02:00
Bartosz Taudul
9329c761f6 Improve plot tooltips. 2018-06-29 18:52:28 +02:00
Bartosz Taudul
bcd2fc027d Improve lock tooltips. 2018-06-29 18:49:47 +02:00
Bartosz Taudul
a918d9a401 Improve compressed zones tooltips. 2018-06-29 18:47:33 +02:00
Bartosz Taudul
c8361205da Improve timeline tooltips. 2018-06-29 18:46:05 +02:00
Bartosz Taudul
8228f4131b Improve frame header tooltip. 2018-06-29 18:44:07 +02:00
Bartosz Taudul
c92d8cf7a3 Improve frame list tooltips. 2018-06-29 18:43:23 +02:00
Bartosz Taudul
201a40fb04 Improve readability of callstack tooltips. 2018-06-29 18:41:06 +02:00
Bartosz Taudul
400ee1c752 Improve readability of zone tooltips. 2018-06-29 18:39:20 +02:00
Bartosz Taudul
fbe0ad437e Remove "without profiling" entry from zone tooltip. 2018-06-29 18:36:58 +02:00
Bartosz Taudul
d01c14c2f3 Improve readability of compare menu. 2018-06-29 18:35:40 +02:00
Bartosz Taudul
91dd8f5d52 Improve readability of find zone menu. 2018-06-29 18:33:01 +02:00
Bartosz Taudul
8f4b09edc3 Improve zone info windows readability. 2018-06-29 18:27:34 +02:00
Bartosz Taudul
41c06ea067 Update NEWS. 2018-06-29 16:28:00 +02:00
Bartosz Taudul
d3648cc8dd Document free-form zone naming. 2018-06-29 16:27:19 +02:00
Bartosz Taudul
275a79e1c9 Display custom zone name in find zone results list. 2018-06-29 16:20:24 +02:00
Bartosz Taudul
ab18869ce6 Display custom zone name in zone tooltip. 2018-06-29 16:15:59 +02:00
Bartosz Taudul
09c38f17e6 Display custom zone name in zone info window. 2018-06-29 16:14:31 +02:00
Bartosz Taudul
cb100e261c Return custom zone names. 2018-06-29 16:12:40 +02:00
Bartosz Taudul
053284b1c7 Process custom free-form zone names. 2018-06-29 16:12:17 +02:00
Bartosz Taudul
b29d60056a Custom per-zone name transfer. 2018-06-29 16:01:31 +02:00
Bartosz Taudul
ac89c6de33 Fix one more copy paste error. 2018-06-29 15:31:47 +02:00
Bartosz Taudul
865e8d8506 Extract zone name getting functionality. 2018-06-29 15:14:20 +02:00
Bartosz Taudul
102234321d Update NEWS. 2018-06-29 00:45:19 +02:00
Bartosz Taudul
2866cafda5 Adapt histogram height to font size. 2018-06-29 00:44:05 +02:00
Bartosz Taudul
f82e8aa98f Adapt plot height to font size. 2018-06-29 00:38:56 +02:00
Bartosz Taudul
cc196ff0a8 Adapt frames view to font size. 2018-06-29 00:35:44 +02:00
Bartosz Taudul
ba68df329f Viewer is now system DPI aware on windows. 2018-06-29 00:31:05 +02:00
Bartosz Taudul
4f6a3057c4 Update NEWS. 2018-06-28 01:17:28 +02:00
Bartosz Taudul
41e8648701 Adjust GPU zones to set time drift. 2018-06-28 01:12:25 +02:00
Bartosz Taudul
bab798c13c GPU context time drift storage. 2018-06-28 01:08:08 +02:00
Bartosz Taudul
0b9559c05b Retrieval of GPU context from GPU zone. 2018-06-28 01:07:21 +02:00
Bartosz Taudul
4a467b6d03 Remove GPU resync leftovers. 2018-06-28 00:48:23 +02:00
Bartosz Taudul
4c16aa9b96 Store build objects in separate directories. 2018-06-27 20:31:17 +02:00
Bartosz Taudul
242fc9bfb4 Mention required libraries. 2018-06-26 18:04:10 +02:00
Bartosz Taudul
84c34ad826 Handle unicode builds. 2018-06-25 10:55:07 +02:00
Bartosz Taudul
9308d7964c Callstack capture timing test. 2018-06-24 17:55:05 +02:00
Bartosz Taudul
c3238a496d No need to check for frame function name match.
Firstly, the match is not necessarily possible (e.g. on Linux the source
location function names and callstack frame names are two completely
different things).

Secondly, the first current zone callstack frame (which is matched to
some callstack frame of previous zone) is the frame in which a zone was
captured, and it will already be present in the zone trace. The
callstack frame omission should be therefore unconditional.
2018-06-24 17:38:32 +02:00
Bartosz Taudul
d7a85983a5 Make callstack hash less shitty. 2018-06-24 17:30:54 +02:00
Bartosz Taudul
ab2945b988 Slab allocator is not thread safe. 2018-06-24 17:10:46 +02:00
Bartosz Taudul
3fba965c3c Update NEWS. 2018-06-24 17:04:32 +02:00
Bartosz Taudul
77e139e900 Insert true call stack frames into zone trace. 2018-06-24 16:57:57 +02:00
Bartosz Taudul
b0aa13f4af Callstack getters are const. 2018-06-24 16:15:49 +02:00
Bartosz Taudul
fa62603c77 Move zone trace loop handler to a separate function. 2018-06-24 15:54:36 +02:00
Bartosz Taudul
858628918b Force inline AddCallstackPayload. 2018-06-24 15:28:09 +02:00
Bartosz Taudul
d78126e60f Improve callstack payload hashing speed. 2018-06-24 15:25:53 +02:00
Bartosz Taudul
64a38c591b Don't perform multiple NeedDataSize checks. 2018-06-23 02:19:23 +02:00
Bartosz Taudul
4d197ec7a2 Unsafe version of AppendData. 2018-06-23 02:16:58 +02:00
Bartosz Taudul
a2c6848433 Send callstack payload without iteration, if possible. 2018-06-23 02:13:52 +02:00
Bartosz Taudul
a7ace6ef9e Directly use RtlWalkFrameChain.
RtlCaptureStackBackTrace is just a wrapper for RtlWalkFrameChain.
2018-06-23 02:07:47 +02:00
Bartosz Taudul
19e83b434e Increase max length of symbol on windows. 2018-06-23 00:27:14 +02:00
Bartosz Taudul
f0ce7de193 Move callstack collection in mem events out of critical section. 2018-06-22 23:00:03 +02:00
Bartosz Taudul
4d60d3a20e Document callstack capture. 2018-06-22 20:53:27 +02:00
Bartosz Taudul
17194cb591 Allow copying callstack frames name/file to clipboard. 2018-06-22 20:44:57 +02:00
Bartosz Taudul
b8f7a4daac Mention purple line indicating middle of timeline. 2018-06-22 20:34:08 +02:00
Bartosz Taudul
9c2aab733d Allow centering timeline on memory alloc/free time. 2018-06-22 20:32:38 +02:00
Bartosz Taudul
5f5fe7c6aa Add tip about centering timeline on message. 2018-06-22 20:23:56 +02:00
Bartosz Taudul
39eccd5b08 Extract "center view at time" function. 2018-06-22 20:21:02 +02:00
Bartosz Taudul
a347ddd753 OpenGL needs query id translation. 2018-06-22 16:46:47 +02:00
Bartosz Taudul
11cf650be6 Fix GPU queries ordering.
With multithreaded Vulkan rendering it is possible that GPU time queries
will be sent in a different order than the originating CPU queries were
made. This commit changes the in-order queue to a map of queries,
waiting to be resolved.
2018-06-22 16:37:54 +02:00
Bartosz Taudul
af0c64c888 Remove GPU resync support.
The whole concept is not really reliable. And it forces CPU to GPU sync,
which is bad.
2018-06-22 16:34:51 +02:00
Bartosz Taudul
62267399bc Send query ids of GPU times. 2018-06-22 16:19:53 +02:00
Bartosz Taudul
69c461cda3 Results MUST be available here. 2018-06-22 16:09:35 +02:00
Bartosz Taudul
51c5f47ae2 Transfer query ids of GPU events. 2018-06-22 15:57:54 +02:00
Bartosz Taudul
cd5ca3e754 Don't use hash table to store 256 pointers. 2018-06-22 15:14:44 +02:00
Bartosz Taudul
55ddb64352 GPU context counter is now 8 bit. 2018-06-22 15:10:23 +02:00
Bartosz Taudul
d13fc2413f Highlight callstack button in zone info windows. 2018-06-22 02:24:36 +02:00
Bartosz Taudul
3a885bb8fd Support callstack collection for OpenGL GPU zones. 2018-06-22 02:13:35 +02:00
Bartosz Taudul
225ed4e037 Update NEWS. 2018-06-22 01:58:50 +02:00
Bartosz Taudul
e5f673eaa0 Allow viewing callstack from gpu zone info window. 2018-06-22 01:58:25 +02:00
Bartosz Taudul
35dc2f796e Process GpuZoneBeginCallstack queue event. 2018-06-22 01:56:32 +02:00
Bartosz Taudul
b213e5f415 Vulkan zone callstack collection. 2018-06-22 01:47:08 +02:00
Bartosz Taudul
a1424c4112 Vulkan tracing is not thread safe. 2018-06-22 01:41:28 +02:00
Bartosz Taudul
7e4f00fac0 Update NEWS. 2018-06-22 01:31:06 +02:00
Bartosz Taudul
4992ae6b39 Take callstack field in ZoneEvent into account in save/load. 2018-06-22 01:30:08 +02:00
Bartosz Taudul
e40c5068c9 Allow viewing callstack from zone info window. 2018-06-22 01:21:51 +02:00
Bartosz Taudul
5e01a8ead9 Process callstack queue event. 2018-06-22 01:15:49 +02:00
Bartosz Taudul
205a4e4ca2 Add callstack index to ZoneEvent. 2018-06-22 01:11:03 +02:00
Bartosz Taudul
978e168cbd Handle ZoneBeginCallstack queue event.
This is identical to ZoneBegin handling, but requires some additional
bookkeeping to account for the incoming callstack information.
2018-06-22 01:07:25 +02:00
Bartosz Taudul
b6088b908f Callstack capture for ZoneBegin. 2018-06-22 00:56:30 +02:00
Bartosz Taudul
c0b086240c Update NEWS. 2018-06-22 00:33:09 +02:00
Bartosz Taudul
bd041b6267 More accurate ARM timing information. 2018-06-22 00:29:01 +02:00
Bartosz Taudul
8de92a8c9e String pooling is meh. 2018-06-22 00:25:30 +02:00
Bartosz Taudul
7086320d64 Thread naming support has been greatly improved. 2018-06-22 00:25:13 +02:00
Bartosz Taudul
94c9c89ad0 Enable thread name collection on old windows SDKs. 2018-06-22 00:23:50 +02:00
Bartosz Taudul
ed40a3d989 Discourage embedding server into client application. 2018-06-22 00:16:53 +02:00
Bartosz Taudul
63611403ff Add memory profiling documentation. 2018-06-22 00:15:50 +02:00
Bartosz Taudul
3404d191f0 Fix non-unique child ids in memory window. 2018-06-22 00:10:00 +02:00
Bartosz Taudul
d716195afa Move server setup to the top of README. 2018-06-22 00:09:37 +02:00
Bartosz Taudul
bf7402e8b0 Android callstack collection using _Unwind_Backtrace(). 2018-06-21 17:07:21 +02:00
Bartosz Taudul
0c13fb818b Initialize rpmalloc in Mem{Alloc,Free}Callstack().
rpmalloc may still be uninitialized here (i.e. if memory allocation/free
is performed before any other tracy operation that would initialize
thread_local data). Since memory allocations are using serialized queue
(which is not held in thread_local section) and obtaining callstack
involves memory allocation, we need to initialize rpmalloc manually.

This won't be a problem when support for zone callstacks comes online,
because zones are stored in per-thread queues, which initialize
thread_local data before rpmalloc is needed in the Callstack() call.
2018-06-21 17:02:40 +02:00
Bartosz Taudul
3f7ab10323 Don't show line number if it's 0. 2018-06-21 13:26:04 +02:00
Bartosz Taudul
937141b7e3 Include symbol address in location field on linux. 2018-06-21 13:14:13 +02:00
Bartosz Taudul
b3ca36f3f4 Include symbol offset in symbol name on linux. 2018-06-21 13:10:48 +02:00
Bartosz Taudul
973eab2b4a Fix typo. 2018-06-20 23:42:00 +02:00
Bartosz Taudul
909166daf7 Hide SendCallstackMemory(). 2018-06-20 23:30:19 +02:00
Bartosz Taudul
8c46ad81d5 Extract common code. 2018-06-20 23:29:44 +02:00
Bartosz Taudul
2a618c90d5 Properly save compressed thread in GPU events. 2018-06-20 23:12:49 +02:00
Bartosz Taudul
1856d057c1 Update NEWS. 2018-06-20 23:02:52 +02:00
Bartosz Taudul
32278364cd Demangle symbol names. 2018-06-20 23:01:00 +02:00
Bartosz Taudul
c8f51d7f11 More involved callstack frame description on linux. 2018-06-20 22:54:42 +02:00
Bartosz Taudul
36d81412a0 Fix copy pasta. 2018-06-20 22:27:46 +02:00
Bartosz Taudul
601c80466c Fix use-after-free. 2018-06-20 22:18:12 +02:00
Bartosz Taudul
5541cd6c97 Linux callstack retrieval. 2018-06-20 21:54:11 +02:00
Bartosz Taudul
dc20742b5b Callstack support needs the -rdynamic flag. 2018-06-20 21:02:14 +02:00
Bartosz Taudul
b4b08a0b29 Windows header poisoning should be avoided only in headers.
This fixes cygwin.
2018-06-20 21:01:25 +02:00
Bartosz Taudul
45cec65eef Don't assign const char ptr to char ptr. 2018-06-20 20:35:57 +02:00
Bartosz Taudul
6c9add0f30 Track memory allocations in test application. 2018-06-20 19:48:14 +02:00
Bartosz Taudul
09304390dd Overload operator new and delete in test. 2018-06-20 19:45:20 +02:00
Bartosz Taudul
cef972fe25 Remove parenthesis from callstack location. 2018-06-20 17:07:48 +02:00
Bartosz Taudul
e495747b88 Fix off-by-one. 2018-06-20 17:02:05 +02:00
Bartosz Taudul
7912807133 Wait for transfer of pending callstack frames. 2018-06-20 14:57:48 +02:00
Bartosz Taudul
60395c85e0 Wait for pending callstacks. 2018-06-20 14:54:08 +02:00
Bartosz Taudul
e95ca3930d Make all allocation list alloc/free buttons clickable. 2018-06-20 14:50:07 +02:00
Bartosz Taudul
a9fa8f966b Fix "zone free" indentation in allocation list. 2018-06-20 14:44:24 +02:00
Bartosz Taudul
bc565e65d1 Better callstack info window layout. 2018-06-20 14:41:00 +02:00
Bartosz Taudul
0d509ea3a6 Add missing EndColumns() call. 2018-06-20 14:37:55 +02:00
Bartosz Taudul
be0a70a5c1 Highlight actively inspected callstack. 2018-06-20 13:49:23 +02:00
Bartosz Taudul
15ff98b64a Push detailed callstack to a separate window.
Only show function names (no source files or line numbers) in callstack
tooltip.
2018-06-20 13:23:08 +02:00
Bartosz Taudul
9a5329b97d Save and load callstack frames. 2018-06-20 01:59:25 +02:00
Bartosz Taudul
e56ee377f4 Fix off-by-one. 2018-06-20 01:54:27 +02:00
Bartosz Taudul
88b1955a5a Filename in callstack frame is not a persistent pointer. 2018-06-20 01:26:05 +02:00
Bartosz Taudul
56479b86fa Display frame details in callstack tooltip. 2018-06-20 01:19:10 +02:00
Bartosz Taudul
4000f27e15 Stack frame accessor. 2018-06-20 01:18:59 +02:00
Bartosz Taudul
0c0afa5ac7 Process callstack frames. 2018-06-20 01:07:09 +02:00
Bartosz Taudul
5177a7b960 Callstack frame transfer. 2018-06-20 01:06:31 +02:00
Bartosz Taudul
359feae7ef Symbol retrieval may fail. 2018-06-20 01:05:44 +02:00
Bartosz Taudul
203744cdd9 Callstack frame queries. 2018-06-20 00:25:26 +02:00
Bartosz Taudul
4ba95145da Display raw callstack payload. 2018-06-19 22:19:33 +02:00
Bartosz Taudul
4eea85fdad Callstack payload accessor. 2018-06-19 22:19:20 +02:00
Bartosz Taudul
06f34052a5 Have to track callstacks of both alloc and free. 2018-06-19 22:08:47 +02:00
Bartosz Taudul
0de279005b Load saved callstack payload. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
14b71e988b Properly skip memory event data. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
4033d74479 Callstack payload index 0 is invalid. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
b6e71dd909 Load memory event callstack index. 2018-06-19 21:51:06 +02:00
Bartosz Taudul
7c1333ce2f Save callstack payload. 2018-06-19 21:39:52 +02:00
Bartosz Taudul
2940230fcf Save callstack index in memory events. 2018-06-19 21:39:42 +02:00
Bartosz Taudul
e03493f082 Store callstack index as uint32_t. 2018-06-19 21:39:22 +02:00
Bartosz Taudul
77db91253b Assign callstack idx to memory event. 2018-06-19 21:34:36 +02:00
Bartosz Taudul
c28465aa7c Store unique callstack payloads. 2018-06-19 21:16:02 +02:00
Bartosz Taudul
87467a472c Add variable sized const array. 2018-06-19 21:16:02 +02:00
Bartosz Taudul
46cc92bd01 Link test executable with dbghelp under cygwin. 2018-06-19 19:51:29 +02:00
Bartosz Taudul
4be2543b2f Cygwin support for callstack tracing. 2018-06-19 19:49:21 +02:00
Bartosz Taudul
288744273b Fall back to callstack-less version of macros if no callstack support. 2018-06-19 19:38:56 +02:00
Bartosz Taudul
9b1fb01e16 Disable Callstack() call if there's no callstack support. 2018-06-19 19:38:30 +02:00
Bartosz Taudul
62ef4f225e Missing defines for disabled tracy. 2018-06-19 19:36:28 +02:00
Bartosz Taudul
cbc9ede3ca No-op callstack payload handling. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
6a63d09a49 Don't check for each type, if range check is possible. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
0a8cd73db7 Issue predictive callstack payload transfer. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
51043ebc47 Callstack payload transfer. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
55e6a4a484 No return status is needed here. 2018-06-19 19:00:57 +02:00
Bartosz Taudul
e51eef3dcd Process memory events with callstack. 2018-06-19 18:52:45 +02:00
Bartosz Taudul
59dc55002b Callstack ptr in server data structures.
Will be probably reduced to 32-bit index later on.
2018-06-19 18:52:10 +02:00
Bartosz Taudul
d0d3545988 Optional sending of callstack ptr in memory events. 2018-06-19 18:51:21 +02:00
Bartosz Taudul
8943e4681e Memory event callstack transfer. 2018-06-19 18:50:29 +02:00
Bartosz Taudul
d2a98c3090 Configurable callstack depth. 2018-06-19 18:49:13 +02:00
Bartosz Taudul
5368f386ce Make sure uintptr_t is really size of pointer. 2018-06-19 17:51:55 +02:00
Bartosz Taudul
ca499eefaf Return typeless pointer. 2018-06-19 17:27:03 +02:00
Bartosz Taudul
827900969f Make Callstack() static inline. 2018-06-19 17:23:50 +02:00
Bartosz Taudul
ca2cac9b99 Use proper type for pointer size. 2018-06-19 14:34:37 +02:00
Bartosz Taudul
4a01eb7fc4 Windows callstack inspection plumbing. 2018-06-19 01:17:19 +02:00
Bartosz Taudul
7a23f677dd Vulkan and OpenGL must share idx pool. 2018-06-18 01:10:43 +02:00
Bartosz Taudul
021dd853b9 Differentiate Vulkan/OpenGL in options menu. 2018-06-18 01:08:56 +02:00
Bartosz Taudul
53e3eee9ee Delay query until results are available.
This will query for the new values with a bit of lag, but it ignores
issues with broken drivers...
2018-06-17 20:56:46 +02:00
Bartosz Taudul
4767dbad5b Workaround Nvidia bugs.
This solution is still bad, as it really does wait for the query
results, which stalls the CPU.
2018-06-17 20:48:02 +02:00
Bartosz Taudul
1b64e84945 Update NEWS. 2018-06-17 19:37:02 +02:00
Bartosz Taudul
3205bfdae8 Add Vulkan documentation. 2018-06-17 19:33:45 +02:00
Bartosz Taudul
6e1ab9ae7a Display per-GPU-event threads. 2018-06-17 19:09:56 +02:00
Bartosz Taudul
bb0631585c Store thread id of GPU events. 2018-06-17 19:07:07 +02:00
Bartosz Taudul
cfd7ac3957 Map compressed thread id 0 to real thread id 0. 2018-06-17 19:03:06 +02:00
Bartosz Taudul
684ba455a2 Send GPU zone thread handle. 2018-06-17 18:55:38 +02:00
Bartosz Taudul
6102f17e29 Better way to write zero value. 2018-06-17 18:55:38 +02:00
Bartosz Taudul
d5a4c693d8 Take GPU timestamp period into account. 2018-06-17 18:49:56 +02:00
Bartosz Taudul
f33584516b Fix yet another regression. 2018-06-17 18:37:38 +02:00
Bartosz Taudul
cc973a5091 Differentiate Vulkan and OpenGL contexts. 2018-06-17 18:33:05 +02:00
Bartosz Taudul
cb77e8dc1c There's no notion of main thread in vulkan. 2018-06-17 18:29:12 +02:00
Bartosz Taudul
dcd6cac078 Save GPU timestamp period.
Bump file version to 0.3.2.
2018-06-17 18:27:42 +02:00
Bartosz Taudul
8495e5b094 Send timestamp period in GPU context announcement. 2018-06-17 18:21:15 +02:00
Bartosz Taudul
9c11e0fc5b Vulkan tracing. 2018-06-17 18:14:37 +02:00
Bartosz Taudul
2be1d1d2b2 Use proper type. 2018-06-07 13:30:46 +02:00
Bartosz Taudul
6956aed769 Fix selecting last bin with log time in find zone. 2018-06-06 23:36:21 +02:00
Bartosz Taudul
b4ce0c281b Total time is also already known in compare view. 2018-06-06 23:17:13 +02:00
Bartosz Taudul
d49be792ba Cache bin containers in compare view. 2018-06-06 23:09:46 +02:00
Bartosz Taudul
da5d35c364 Cache bin containers in find zone. 2018-06-06 23:06:00 +02:00
Bartosz Taudul
2950f3c70c Total time is already known. 2018-06-06 23:00:18 +02:00
Bartosz Taudul
8a4d88f2b3 tmin and tmax don't change. 2018-06-06 23:00:03 +02:00
Bartosz Taudul
be8d3f47cd Use fast log10. 2018-06-06 01:59:31 +02:00
Bartosz Taudul
8696c81e7d Implement fast frexpf. 2018-06-06 01:59:31 +02:00
Bartosz Taudul
26cc9d8547 Enable fast floating point model. 2018-06-06 01:53:25 +02:00
Bartosz Taudul
60b24249d3 Use explicit value for 1/log2(10). 2018-06-06 01:52:46 +02:00
Bartosz Taudul
39c1b20184 Don't care about previous values. 2018-06-06 01:23:49 +02:00
Bartosz Taudul
1c47e22eca Add log10f approximation.
Based on https://community.arm.com/tools/f/discussions/4292/cmsis-dsp-new-functionality-proposal/22621#22621
2018-06-06 01:23:29 +02:00
Bartosz Taudul
763db5e5cc Update NEWS. 2018-06-06 00:48:54 +02:00
Bartosz Taudul
859bf01992 Support displaying self times in statistics view. 2018-06-06 00:47:16 +02:00
Bartosz Taudul
e5d35d443d Missing initializer. 2018-06-06 00:47:11 +02:00
Bartosz Taudul
b7930f67da Calculate total self time of zones. 2018-06-06 00:39:22 +02:00
Bartosz Taudul
814cd1553d Update NEWS. 2018-06-02 22:28:27 +02:00
Bartosz Taudul
785a30a68b Implement going to next/previous frame. 2018-06-02 22:27:35 +02:00
Bartosz Taudul
1cddf8436c ZoomToRange() already enables pause. 2018-06-02 22:09:07 +02:00
Bartosz Taudul
a3834a75f7 Update NEWS. 2018-05-27 20:23:48 +02:00
Bartosz Taudul
5a7304171d Fix allocation times displayed in plot tooltip. 2018-05-27 20:22:58 +02:00
Bartosz Taudul
8ed59c261b Open memory address info after clicking on mem plot item. 2018-05-27 20:17:20 +02:00
Bartosz Taudul
9898066a7a Display additional memory event info in mem plot tooltips. 2018-05-27 20:11:33 +02:00
Bartosz Taudul
3ea5fd93ed Simple and not so simple draw plot point functions. 2018-05-27 19:51:45 +02:00
Bartosz Taudul
3236164116 v0.3 is no more. Enter v0.3.1. 2018-05-25 21:24:18 +02:00
Bartosz Taudul
5e03612d3f Update NEWS. 2018-05-25 21:18:51 +02:00
Bartosz Taudul
0a79243332 Display thread from which message originated on msg list. 2018-05-25 21:14:15 +02:00
Bartosz Taudul
53aea660c8 Store thread id in MessageData. 2018-05-25 21:10:38 +02:00
Bartosz Taudul
bb0246730f Don't save MessageData padding.
This requires file version bump to 0.3.1.
2018-05-25 21:10:38 +02:00
Bartosz Taudul
8118e41559 Use columns to display message list. 2018-05-25 21:10:38 +02:00
Bartosz Taudul
f7e2683cf1 Update imgui to 1.61. 2018-05-22 20:49:05 +02:00
Bartosz Taudul
312c20b0bc Fallback to pdqsort if parallel STL is not available. 2018-05-12 22:41:18 +02:00
Bartosz Taudul
3432c594a9 ImplicitProducer is private. 2018-05-08 16:27:52 +02:00
Bartosz Taudul
e2534e2bf6 Forward declare explicit and implicit producers. 2018-05-08 12:33:19 +02:00
Bartosz Taudul
e74108f175 Bump lz4 to 1.8.2. 2018-05-08 01:52:40 +02:00
Bartosz Taudul
920bfc8c82 Parallelize (big) sorts in worker. 2018-05-08 01:40:22 +02:00
Bartosz Taudul
dbc963d55c Drop template argument from std::lock_guard. 2018-05-08 01:25:16 +02:00
Bartosz Taudul
249cd4783c Use C++17 in server. 2018-05-08 01:23:24 +02:00
Bartosz Taudul
e565f5cb74 Enable modern diagnostics format. 2018-05-08 01:17:53 +02:00
Bartosz Taudul
3768ed5dd7 Don't reconstruct mem plot if there's no mem event data. 2018-05-04 16:08:16 +02:00
Bartosz Taudul
e7ffe288e6 One less FileWrite::Write() call. 2018-05-04 15:11:19 +02:00
Bartosz Taudul
e058bb34c1 CompressThread body must be available. 2018-05-03 18:43:51 +02:00
Bartosz Taudul
a46d27f312 Parallelize file reading.
Use spin-locks for synchronization.

IsEOF() is now buggy, but the bug chance is fairly low (1/65536) - it
can happen when the last compressed block has exactly max decompressed
block size. Don't care about it much, as it's only used to open old
archives.
2018-05-03 17:56:43 +02:00
Bartosz Taudul
3d13ea09e8 Move block decompression to a separate function. 2018-05-03 17:29:58 +02:00
Bartosz Taudul
7d32ef8c8b Restrict mem events list size. 2018-05-02 19:40:35 +02:00
Bartosz Taudul
867fc6a0cf Update NEWS. 2018-05-02 19:28:04 +02:00
Bartosz Taudul
f2cb04ea8d Allow going back to the previous zone info. 2018-05-02 19:25:52 +02:00
Bartosz Taudul
1cc798cea3 Construct zone info stack. 2018-05-02 19:23:46 +02:00
Bartosz Taudul
e28022f735 Don't display alloc, free threads on two lines. 2018-05-02 19:07:34 +02:00
Bartosz Taudul
f2f712b8db Optional display of each mem event in zone info window. 2018-05-02 19:03:34 +02:00
Bartosz Taudul
dac6a65156 Infer total mem usage change from alloc and free changes. 2018-05-02 18:41:11 +02:00
Bartosz Taudul
14ca2198dd Force inline simple Vector ops. 2018-05-02 18:27:37 +02:00
Bartosz Taudul
fc057401a4 Update NEWS. 2018-05-02 18:20:28 +02:00
Bartosz Taudul
bbf1e9f111 Only include memory events from zone thread. 2018-05-02 18:13:13 +02:00
Bartosz Taudul
4584ef9e88 Use memory events to calculate zone memory changes. 2018-05-02 18:06:27 +02:00
Bartosz Taudul
b18841aa75 Store ordered list of memory frees. 2018-05-02 17:59:50 +02:00
Bartosz Taudul
ce1f56ea0f Display zone memory statistics.
Note that this information is incorrect, as it accounts for memory
events in all threads.
2018-05-02 17:46:09 +02:00
Bartosz Taudul
fd59ac0125 Only calculate zone child data if child list is displayed. 2018-05-02 17:23:32 +02:00
Bartosz Taudul
754e79b443 Setup memory plot pointer on dump load. 2018-05-02 17:18:52 +02:00
Bartosz Taudul
1512f3584c Show appropriate message when there's no memory data collected. 2018-05-01 17:28:02 +02:00
Bartosz Taudul
e5934b409a Don't use Vector for memory pages storage.
Vector has a power-of-two (POT) data size and we know exactly how much is needed.
2018-05-01 17:26:34 +02:00
Bartosz Taudul
7266a979c3 Omit stack. 2018-05-01 02:13:49 +02:00
Bartosz Taudul
5deeb8426f Specialized Read function writing directly to registers. 2018-05-01 02:13:49 +02:00
Bartosz Taudul
8beb1c1a39 Add thread compression cache.
Observation: calls to CompressThread() are likely to be repeated with
the same value. Exploit that by storing last query and its result.
2018-05-01 01:29:25 +02:00
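The observation in 8beb1c1a39 can be illustrated with a small sketch. This is not Tracy's actual code: the class name `ThreadCompressor`, the 16-bit index width, and the use of `std::unordered_map` are all assumptions made for the example. It only shows the idea of remembering the last query and its result so that repeated calls with the same thread id skip the map lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: maps 64-bit thread ids to small indices and
// caches the most recent query, as described in the commit message.
class ThreadCompressor
{
public:
    std::uint16_t Compress( std::uint64_t thread )
    {
        // Fast path: same thread as in the previous call.
        if( m_hasLast && thread == m_lastThread ) return m_lastIdx;

        std::uint16_t idx;
        auto it = m_map.find( thread );
        if( it != m_map.end() )
        {
            idx = it->second;
        }
        else
        {
            // First time this thread is seen: assign the next free index.
            assert( m_expand.size() < 65535 );
            idx = static_cast<std::uint16_t>( m_expand.size() );
            m_expand.push_back( thread );
            m_map.emplace( thread, idx );
        }

        // Remember the query and its result for the next call.
        m_lastThread = thread;
        m_lastIdx = idx;
        m_hasLast = true;
        return idx;
    }

    std::uint64_t Decompress( std::uint16_t idx ) const
    {
        assert( idx < m_expand.size() );
        return m_expand[idx];
    }

private:
    std::unordered_map<std::uint64_t, std::uint16_t> m_map;
    std::vector<std::uint64_t> m_expand;
    std::uint64_t m_lastThread = 0;
    std::uint16_t m_lastIdx = 0;
    bool m_hasLast = false;
};
```

The cache pays off when events arrive in long runs from the same thread, which is the situation the commit message describes.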
Bartosz Taudul
ec58aa4ce1 Don't increase vector size in each iteration. 2018-04-30 13:57:12 +02:00
Bartosz Taudul
e41ce5523c Allow explicit setting of vector size. 2018-04-30 13:56:58 +02:00
Bartosz Taudul
553e3ca38b Optimize mem plot reconstruction loop. 2018-04-30 13:45:36 +02:00
Bartosz Taudul
76f0c8fafe Sort source location zones on a separate thread. 2018-04-30 03:54:09 +02:00
Bartosz Taudul
63e4f6fa04 Directly store values. 2018-04-30 03:30:19 +02:00
Bartosz Taudul
8d854b1c8f Force inline flat_hash_map find. 2018-04-30 03:09:50 +02:00
Bartosz Taudul
a2d3ad35f0 Force inline common slab allocation paths. 2018-04-30 02:47:16 +02:00
Bartosz Taudul
b598300186 Split FileRead::Skip into small and big part. 2018-04-30 02:31:03 +02:00
Bartosz Taudul
b1a440647d Remove one level of indirection in FileWrite. 2018-04-30 02:29:05 +02:00
Bartosz Taudul
fd46651c32 Remove one level of indirection in FileRead. 2018-04-30 02:26:15 +02:00
Bartosz Taudul
c3efe228ce Update NEWS. 2018-04-30 01:19:37 +02:00
Bartosz Taudul
4c521ce92a Loaded traces may be unloaded. 2018-04-30 01:16:08 +02:00
Bartosz Taudul
e5cb241c19 Optimize creation of vector of frees. 2018-04-29 13:40:47 +02:00
Bartosz Taudul
3eb73b8d43 Move memory plot reconstruction to a background thread. 2018-04-29 13:40:04 +02:00
Bartosz Taudul
a8ce01eeb1 Push next no space check variant. 2018-04-29 13:39:06 +02:00
Bartosz Taudul
bc84ebc338 Read/write LockEvent data in one go. 2018-04-29 03:41:58 +02:00
Bartosz Taudul
c5133e0b4e Walk lockmap timeline pointer. 2018-04-29 03:41:58 +02:00
Bartosz Taudul
9769cc4d7d Read/write most of MemEvent in one go. 2018-04-29 03:41:58 +02:00
Bartosz Taudul
d5f0f0939d No need to track min memory usage.
At least if client instrumentation was not broken and the data makes
sense.
2018-04-29 02:57:20 +02:00
Bartosz Taudul
7fdc6f5453 Zero as initial max value is fine too. 2018-04-29 02:56:23 +02:00
Bartosz Taudul
723f98d24b Overflow checks are not needed. 2018-04-29 02:47:25 +02:00
Bartosz Taudul
b06f445de9 Don't use stack to write two values... 2018-04-29 02:32:20 +02:00
Bartosz Taudul
333d3a92c8 Perform memory usage calculation on doubles. 2018-04-29 02:29:06 +02:00
Bartosz Taudul
aceaed25b9 Walk plot data pointer. 2018-04-29 02:11:47 +02:00
Bartosz Taudul
868fbace5a Don't compress thread twice, if it's the same. 2018-04-29 02:04:51 +02:00
Bartosz Taudul
fdaebc2bd8 No need to perform space check here. 2018-04-29 01:38:54 +02:00
Bartosz Taudul
dc1396012e Add assert checking that there's space. 2018-04-29 01:38:35 +02:00
Bartosz Taudul
d64f0390da Don't use std::sort. 2018-04-29 01:23:30 +02:00
Bartosz Taudul
4ed3fe8e7b Update NEWS. 2018-04-28 16:46:53 +02:00
Bartosz Taudul
925b6c2617 Display y-range of plots. 2018-04-28 16:44:36 +02:00
Bartosz Taudul
6d4b7c55a3 Update NEWS. 2018-04-28 16:26:45 +02:00
Bartosz Taudul
7df7bf1745 Begin memory plot with no memory usage. 2018-04-28 16:26:45 +02:00
Bartosz Taudul
a0b8ed2e50 Restore memory plot when loading data dump. 2018-04-28 16:26:45 +02:00
Bartosz Taudul
afa432a087 Non-user plots must have predefined names. 2018-04-28 16:26:45 +02:00
Bartosz Taudul
d8bfe7de2e Create memory plot based on memory alloc/free events. 2018-04-28 15:49:12 +02:00
Bartosz Taudul
cd34ed6968 Two plot types: user and memory.
Only user plots are saved in a dump file.
2018-04-28 15:48:05 +02:00
Bartosz Taudul
5b6d9769af Properly separate HW timer from MSVC rdtscp optimization. 2018-04-27 19:40:47 +02:00
Bartosz Taudul
488d05bc21 Update NEWS. 2018-04-27 19:27:45 +02:00
Bartosz Taudul
eeeff40a70 Prevent TIME-WAIT connections from blocking listen address.
Of course Windows has to be retarded in its own special way and implement
SO_REUSEADDR with a completely different meaning.

http://www.andy-pearce.com/blog/posts/2013/Feb/so_reuseaddr-on-windows/
2018-04-27 19:18:09 +02:00
Bartosz Taudul
237aee30a8 Test if HW timer can be used on arm. 2018-04-27 16:58:45 +02:00
Bartosz Taudul
6a2311a7b7 Arm64 also defines __ARM_ARCH. 2018-04-26 17:39:04 +02:00
Bartosz Taudul
a3f5003f88 Read time from timer register on armv6, armv7.
Same improvement as on aarch64.
2018-04-26 17:18:10 +02:00
Bartosz Taudul
69a50b04c1 Really don't care about cpu id. 2018-04-26 16:12:52 +02:00
Bartosz Taudul
1899066e36 Read time from timer register on arm64.
On ODROID C2 this change improves timer resolution from 250 ns to 41 ns.
2018-04-26 16:03:31 +02:00
Bartosz Taudul
3a20104882 No need for separate tracy_rdtscp() function. 2018-04-26 15:30:53 +02:00
Bartosz Taudul
8cc9464082 Use GetTime() in CalibrateTimer(). 2018-04-26 15:29:09 +02:00
Bartosz Taudul
48665cc09b s/TRACY_RDTSCP_SUPPORTED/TRACY_HW_TIMER/ 2018-04-26 15:25:54 +02:00
Bartosz Taudul
2a427ba87a Fix typo. 2018-04-22 02:24:34 +02:00
Bartosz Taudul
ecabf24c4e Optional normalization of compared data. 2018-04-22 02:19:22 +02:00
Bartosz Taudul
d06890b55d Add missing tree pop. 2018-04-22 01:28:55 +02:00
Bartosz Taudul
1fb47899b2 Fix skipping lock data with new dump version. 2018-04-22 01:26:51 +02:00
Bartosz Taudul
470bfb5c02 Don't load unneeded data. 2018-04-22 01:00:17 +02:00
Bartosz Taudul
0337569f95 Update NEWS. 2018-04-22 00:58:17 +02:00
Bartosz Taudul
41738469f1 Add trace compare window. 2018-04-22 00:52:33 +02:00
Bartosz Taudul
436cd2b6cf Drop '###Profiler' from capture name. 2018-04-21 23:29:28 +02:00
Bartosz Taudul
28380f2d25 Move bad version dialogs to a separate file. 2018-04-21 23:19:48 +02:00
Bartosz Taudul
ea2be1bce9 Fix custom ImVec2 operators. 2018-04-21 23:19:13 +02:00
Bartosz Taudul
1d044b494b Don't enforce main window buttons width. 2018-04-21 22:42:32 +02:00
Bartosz Taudul
880eb7cbdd Don't display zone names in find zone menu zones list. 2018-04-21 22:34:19 +02:00
Bartosz Taudul
2ef9fe0743 Enable log time in find zone menu by default. 2018-04-21 22:21:15 +02:00
Bartosz Taudul
d1e185e176 Cleanup message data. 2018-04-21 20:36:33 +02:00
Bartosz Taudul
4cd9cf5dd9 Cleanup zone data. 2018-04-21 20:34:29 +02:00
Bartosz Taudul
0de5bcacaf Free plot data. 2018-04-21 20:12:16 +02:00
Bartosz Taudul
dda25cf66a Cosmetics. 2018-04-21 20:11:59 +02:00
Bartosz Taudul
ac73b00540 Prevent nasal demons from appearing. 2018-04-21 19:37:55 +02:00
Bartosz Taudul
adf8a126c6 More space for text on main window buttons. 2018-04-21 19:30:49 +02:00
Bartosz Taudul
a4e1bb05f3 Use proper format strings. 2018-04-21 19:26:55 +02:00
Bartosz Taudul
d201be25ed Fix force_inline on gcc/clang. 2018-04-21 19:22:27 +02:00
Bartosz Taudul
ade97b7ab6 Add hours to time-to-string conversion. 2018-04-21 17:01:10 +02:00
Bartosz Taudul
7b07b67d89 Update NEWS. 2018-04-21 16:58:23 +02:00
Bartosz Taudul
ad91b9b002 Expand maximum view span from 1 minute to 1 hour. 2018-04-21 16:53:17 +02:00
Bartosz Taudul
cb298893e7 Fix skipping lock data. 2018-04-21 16:02:36 +02:00
Bartosz Taudul
121cced681 Don't save unneeded lock data.
Store only the minimal lock information required and calculate lock
counts, wait lists, etc. at load time.
2018-04-21 15:42:08 +02:00
Bartosz Taudul
539c034ec3 Update NEWS. 2018-04-21 15:21:50 +02:00
Bartosz Taudul
3b6b67b7ee Display a dialog when user tries to open invalid file. 2018-04-21 15:00:54 +02:00
Bartosz Taudul
764792d8db Try to not crash when opening invalid files.
Tracy will now perform a number of checks when trying to read a dump
file:
1. The file must have at least 4 bytes of data.
2. There should be a 4 byte header to indicate the file was saved by
   tracy. This is a breaking change in file format.
3. Old header-less files are still supported, but there's a new check
   for data validity. The first 4 bytes of the file (as a uint32) must be
   less than or equal to the max LZ4 data packet size. This requires the first
   two bytes to be 00 00 or 00 01, which should catch most invalid
   files.
2018-04-21 14:53:40 +02:00
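The three checks listed in 764792d8db could look roughly like the sketch below. It is illustrative only and not taken from Tracy: the magic bytes in `kMagic`, the 64 KiB block size, the `DumpKind` enum and the `ClassifyDump` function are assumptions. The packet limit uses the usual LZ4 worst-case bound for the assumed block size, and the leading uint32 is assumed to be stored little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical constants, for illustration only.
constexpr unsigned char kMagic[4] = { 't', 'r', 'c', 'y' };  // assumed header bytes
constexpr std::uint32_t kBlockSize = 64 * 1024;              // assumed decompressed block size
// LZ4 worst-case compressed size for kBlockSize bytes of input.
constexpr std::uint32_t kMaxLz4PacketSize = kBlockSize + kBlockSize / 255 + 16;

enum class DumpKind { Invalid, WithHeader, LegacyHeaderless };

inline DumpKind ClassifyDump( const unsigned char* data, std::size_t size )
{
    assert( data != nullptr || size == 0 );

    // 1. The file must have at least 4 bytes of data.
    if( size < 4 ) return DumpKind::Invalid;

    // 2. A 4 byte header marks files saved with the new format.
    if( std::memcmp( data, kMagic, 4 ) == 0 ) return DumpKind::WithHeader;

    // 3. Header-less files start with the size of the first LZ4 packet,
    //    which cannot exceed the worst-case bound.
    const std::uint32_t first = std::uint32_t( data[0] )
                              | std::uint32_t( data[1] ) << 8
                              | std::uint32_t( data[2] ) << 16
                              | std::uint32_t( data[3] ) << 24;
    return first <= kMaxLz4PacketSize ? DumpKind::LegacyHeaderless : DumpKind::Invalid;
}
```

With these assumed values the limit is 65809 (0x00010111), so the two most significant bytes of a valid packet size can only be 00 00 or 00 01, which matches the remark in the commit message.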
Bartosz Taudul
a63f214964 Use static assert where static assert is due. 2018-04-21 14:47:15 +02:00
Bartosz Taudul
fefcbc6c35 Handle opening unsupported files. 2018-04-21 14:31:33 +02:00
Bartosz Taudul
36efe96e9d Throw exception when trying to open unsupported dump version. 2018-04-21 14:18:42 +02:00
Bartosz Taudul
3793a37b2b Use small buttons in statistics window. 2018-04-21 14:17:42 +02:00
Bartosz Taudul
645f312166 Enable imgui frame rounding. 2018-04-21 14:16:49 +02:00
Bartosz Taudul
d9fd1ce74a Add dump file header. 2018-04-21 13:45:48 +02:00
Bartosz Taudul
6bb3b846f4 Update NEWS. 2018-04-21 12:49:35 +02:00
Bartosz Taudul
6c2d7628ee Don't draw off-screen gpu zones. 2018-04-20 23:28:19 +02:00
Bartosz Taudul
a2779eccaf Don't draw off-screen zones. 2018-04-20 23:19:04 +02:00
Bartosz Taudul
9fc14d2faf Don't draw off-screen plots. 2018-04-20 23:00:26 +02:00
Bartosz Taudul
cd7a1cffe8 Don't draw off-screen locks. 2018-04-20 22:53:31 +02:00
Bartosz Taudul
723fad84a7 Don't draw off-screen zone timeline labels. 2018-04-20 22:45:29 +02:00
Bartosz Taudul
84fd351fba Allow partial load of data from dump. 2018-04-20 16:03:09 +02:00
Bartosz Taudul
cc65e52663 Allow skipping data when reading file. 2018-04-20 14:27:20 +02:00
Bartosz Taudul
4eb205ad18 Optimize FastVector for fast push_next() operation. 2018-04-14 17:12:41 +02:00
Bartosz Taudul
6120b3e922 Change -1 comparisons to "0" comparisons. 2018-04-14 16:50:04 +02:00
Bartosz Taudul
15219b1481 Support 4-byte size_t. 2018-04-14 16:08:39 +02:00
Bartosz Taudul
14c77aba2f Cosmetics. 2018-04-14 15:47:09 +02:00
Bartosz Taudul
459890ef0e Don't hold lock on serial queue during dequeue. 2018-04-14 15:46:11 +02:00
Bartosz Taudul
e1dc62cabe Add fast vector swap. 2018-04-14 15:46:01 +02:00
Bartosz Taudul
d0d5528e99 Disable histogram highlight using right mouse button. 2018-04-14 15:21:22 +02:00
Bartosz Taudul
07201a19ad Update imgui to 1.60. 2018-04-14 15:12:16 +02:00
Bartosz Taudul
3df7c70f99 Optimize mem alloc processing. 2018-04-10 16:06:01 +02:00
Bartosz Taudul
be50fb26b5 Remove useless assert. 2018-04-10 14:37:17 +02:00
Bartosz Taudul
fd41b4927a Allow selecting/unselecting all locks for display. 2018-04-09 16:15:40 +02:00
Bartosz Taudul
0e6ce076f9 Update NEWS. 2018-04-09 14:29:22 +02:00
Bartosz Taudul
4e1dbb3973 Fix lock announce processing. 2018-04-09 14:28:40 +02:00
Bartosz Taudul
f5073ffd8d Update NEWS. 2018-04-05 19:31:46 +02:00
Bartosz Taudul
d4bfbc2797 Allow displaying global statistics of a zone. 2018-04-05 19:31:04 +02:00
Bartosz Taudul
093787b3e8 Move find zone setup to a dedicated function. 2018-04-05 19:30:32 +02:00
Bartosz Taudul
d1a0ae2564 Update NEWS. 2018-04-05 19:20:28 +02:00
Bartosz Taudul
a319ce13e9 Merge branch 'memory' 2018-04-05 18:57:55 +02:00
Bartosz Taudul
ac3b10e50f Release v0.2. 2018-04-05 18:57:32 +02:00
Bartosz Taudul
0f95d7fd21 Use lookup table to get memory decay color. 2018-04-05 12:14:26 +02:00
Bartosz Taudul
4c76a5d66b Add missing no-op macros for use if tracy is disabled. 2018-04-05 12:14:26 +02:00
Bartosz Taudul
c9d1f59c92 No need to pack WelcomeMessage struct. 2018-04-04 19:43:21 +02:00
Bartosz Taudul
d1429d086d No need to pack WelcomeMessage struct. 2018-04-04 18:53:41 +02:00
Bartosz Taudul
bb299a5074 Desaturate older allocations on memory map. 2018-04-03 20:38:50 +02:00
Bartosz Taudul
189a4a2e32 Page chunk mask is not needed anymore. 2018-04-03 19:41:11 +02:00
Bartosz Taudul
1182a3fcb8 Stop processing allocations if already at time end. 2018-04-03 19:40:06 +02:00
Bartosz Taudul
b78dc70b70 No need to split address into page and chunk. 2018-04-03 19:39:19 +02:00
Bartosz Taudul
22bd2923eb Keep mem.low in a register. 2018-04-03 19:35:43 +02:00
Bartosz Taudul
a3dd90529c Rearrange memory reads. 2018-04-03 19:35:28 +02:00
Bartosz Taudul
197e513727 Add a separate time restriction code path. 2018-04-03 19:34:48 +02:00
Bartosz Taudul
f0573d68bd Store memory pages in a contiguous memory area. 2018-04-03 19:17:32 +02:00
Bartosz Taudul
5ce3e44c77 Calculate chunks in one place in code. 2018-04-03 18:27:50 +02:00
Bartosz Taudul
7c4075c9ce Fix MemRead() call. 2018-04-03 17:57:12 +02:00
Bartosz Taudul
3ea5600900 Fix UB, lose type safety. 2018-04-03 17:51:53 +02:00
Bartosz Taudul
3e93c615f7 Fix UB, lose type safety. 2018-04-03 16:45:55 +02:00
Bartosz Taudul
bf99bff87d Store MemEvents directly in the vector. 2018-04-03 14:17:51 +02:00
Bartosz Taudul
bc27c99a1e Move page init to a non-inlined function. 2018-04-03 13:30:56 +02:00
Bartosz Taudul
6d40502068 Execute direct write to memory, if only one byte. 2018-04-03 13:23:53 +02:00
Bartosz Taudul
81c84025a2 Fix calculation of lines. 2018-04-02 20:11:55 +02:00
Bartosz Taudul
1bb1cf9e6c Display memory map information. 2018-04-02 20:00:05 +02:00
Bartosz Taudul
78ebf37039 Use proper values for page map calculation. 2018-04-02 19:57:46 +02:00
Bartosz Taudul
a2a6386491 Allow time restricting memory map. 2018-04-02 18:57:24 +02:00
Bartosz Taudul
1c441824fd Display memory map. 2018-04-02 18:51:32 +02:00
Bartosz Taudul
78cd86dd69 Memory pages bitmap calculation. 2018-04-02 18:51:32 +02:00
Bartosz Taudul
bf249de266 Display memory usage by active allocations. 2018-04-02 16:30:03 +02:00
Bartosz Taudul
670744f852 Move alloc cutoff to middle of timeline. 2018-04-02 16:21:24 +02:00
Bartosz Taudul
7b194d2349 Don't use std::sort. 2018-04-02 16:09:44 +02:00
Bartosz Taudul
e80891e36d Allow restricting displayed allocs by time. 2018-04-02 16:07:33 +02:00
Bartosz Taudul
c1aaec32d6 Sort active allocations by appearance time. 2018-04-02 15:45:11 +02:00
Bartosz Taudul
38edf308fa Display memory span. 2018-04-02 14:58:40 +02:00
Bartosz Taudul
821b08fbe4 Thread compression state is not preserved. 2018-04-02 14:52:36 +02:00
Bartosz Taudul
aa8980aacc Put memory allocations list into a child area. 2018-04-02 14:44:45 +02:00
Bartosz Taudul
8cc446b578 Highlight zones with opened zone info window. 2018-04-02 14:38:08 +02:00
Bartosz Taudul
50eb5c4b84 Highlight same zone alloc+free. 2018-04-02 14:36:07 +02:00
Bartosz Taudul
f7ce3e795f Display zone in which allocation was freed. 2018-04-02 14:29:56 +02:00
Bartosz Taudul
e1682c7675 Draw active allocations list. 2018-04-02 02:39:12 +02:00
Bartosz Taudul
c4a36398f6 Move memory allocations table drawing to a separate function. 2018-04-02 02:39:12 +02:00
Bartosz Taudul
1fa943d109 Save/load memory data. 2018-04-02 02:05:39 +02:00
Bartosz Taudul
68acc30bdd Add support for determining FileRead EOF. 2018-04-02 02:05:39 +02:00
Bartosz Taudul
5824b47a66 Display memory usage. 2018-04-02 00:02:45 +02:00
Bartosz Taudul
52f59c90bf Track memory usage. 2018-04-02 00:00:49 +02:00
Bartosz Taudul
e3509b6eee Display total number of allocations. 2018-04-01 23:57:18 +02:00
Bartosz Taudul
8efc0a0a71 Display proper hex value. 2018-04-01 22:00:57 +02:00
Bartosz Taudul
2b8ce8341e Missing initializer. 2018-04-01 21:54:12 +02:00
Bartosz Taudul
3f7abd478e Display zone in which memory allocation took place. 2018-04-01 21:50:35 +02:00
Bartosz Taudul
912cfdbc5e Search for zone present in given thread at given time. 2018-04-01 21:47:08 +02:00
Bartosz Taudul
20824a200c Implement search for memory address. 2018-04-01 21:24:30 +02:00
Bartosz Taudul
9c403d9cc2 GetTime() calls also must be serialized. 2018-04-01 21:07:33 +02:00
Bartosz Taudul
c686b86464 Add rudimentary memory information window. 2018-04-01 20:34:58 +02:00
Bartosz Taudul
2d00d95743 Missing initializer. 2018-04-01 20:34:58 +02:00
Bartosz Taudul
cd3bba8063 Memory data accessor. 2018-04-01 20:34:58 +02:00
Bartosz Taudul
a574f98f0c Memory events are now serialized. 2018-04-01 20:13:01 +02:00
Bartosz Taudul
794f199bdc Serial queue dequeuing. 2018-04-01 20:04:35 +02:00
Bartosz Taudul
860e0e1809 Store memory operations in the serial queue. 2018-04-01 19:53:24 +02:00
Bartosz Taudul
faeecdd773 Add serial queue to profiler. 2018-04-01 19:53:05 +02:00
Bartosz Taudul
0a3e9f85eb "Fast" vector implementation. 2018-04-01 19:52:29 +02:00
Bartosz Taudul
66ad415ce5 Remove windows.h dependency from tracy_sema.h. 2018-04-01 19:15:46 +02:00
Bartosz Taudul
16a98c8c17 Move benaphore to common directory. 2018-04-01 18:59:55 +02:00
Bartosz Taudul
b12375815c Broken memory events processing. 2018-04-01 02:03:34 +02:00
Bartosz Taudul
991fc6bd95 Memory allocations tracker. 2018-03-31 21:56:05 +02:00
Bartosz Taudul
7a35e8facc Fix typo. 2018-03-31 14:19:45 +02:00
Bartosz Taudul
e44cf98807 Update NEWS. 2018-03-31 14:15:04 +02:00
Bartosz Taudul
a677048d2b Fix try_lock(). 2018-03-31 14:15:04 +02:00
Bartosz Taudul
3b03e849f0 Harden client code against unaligned memory access.
There shouldn't be any changes in generated code on modern
architectures, as the memcpy will be reduced to a store/load operation
identical to the one generated with plain struct member access.

GetTime( cpu ) needs special handling, as the MSVC intrinsic for rdtscp
can't store the cpu identifier in a register. Using an intermediate
variable would cause a store to the stack, a read from the stack and a
store to the destination address. Since rdtscp is only available on x86,
which handles unaligned stores without any problems, we can have one
place with direct struct member access.
2018-03-31 14:15:04 +02:00
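The memcpy-based approach described in 3b03e849f0 can be sketched as follows. The helper names `MemWrite` and `MemRead` and their signatures are assumptions for this example and may not match the helpers added in 685432a85f. The point is only that copying through `memcpy` is valid for any alignment, and compilers typically reduce a fixed-size copy to a plain load or store on architectures that allow it.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: store a value at an address of any alignment.
template<typename T>
inline void MemWrite( void* ptr, T val )
{
    static_assert( std::is_trivially_copyable<T>::value, "memcpy needs a trivially copyable type" );
    assert( ptr != nullptr );
    std::memcpy( ptr, &val, sizeof( T ) );
}

// Hypothetical helper: load a value from an address of any alignment.
template<typename T>
inline T MemRead( const void* ptr )
{
    static_assert( std::is_trivially_copyable<T>::value, "memcpy needs a trivially copyable type" );
    assert( ptr != nullptr );
    T val;
    std::memcpy( &val, ptr, sizeof( T ) );
    return val;
}
```

A caller would use these in place of direct member access on packed queue items, for example `MemWrite( buf + 1, value )` followed later by `MemRead<T>( buf + 1 )` on an odd address.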
Bartosz Taudul
685432a85f Add unaligned read/write helpers. 2018-03-31 14:15:04 +02:00
Bartosz Taudul
56bd01dfd1 Don't copy thread name needlessly. 2018-03-31 01:38:57 +02:00
Bartosz Taudul
780e838785 Update NEWS. 2018-03-31 01:24:11 +02:00
Bartosz Taudul
48e82ea135 Workaround pthreads thread name limit. 2018-03-31 01:22:21 +02:00
Bartosz Taudul
347c74cec3 Cosmetics. 2018-03-31 01:06:35 +02:00
Bartosz Taudul
03830fe83a Update NEWS. 2018-03-30 23:45:48 +02:00
Bartosz Taudul
813e265bc3 Initialize rpmalloc in SetThreadName().
There's no guarantee that rpmalloc will be initialized when the thread
calls SetThreadName, due to thread_local storage initialization rules.
2018-03-30 14:39:25 +02:00
Bartosz Taudul
045870ad95 Doh! FileWrite destructor was never called. 2018-03-29 01:11:54 +02:00
Bartosz Taudul
c7a5e25c87 Display parent times. 2018-03-28 19:35:33 +02:00
Bartosz Taudul
c626bbd553 Update NEWS. 2018-03-28 02:00:29 +02:00
Bartosz Taudul
9d798789a9 Fix broken behavior on duplicate names in zone info window. 2018-03-28 01:57:53 +02:00
Bartosz Taudul
d6bf19a762 Standard zone list behavior in zone trace. 2018-03-28 01:53:59 +02:00
Bartosz Taudul
bf52b3bc98 Add zone trace. 2018-03-28 01:47:28 +02:00
Bartosz Taudul
4d0396fa06 No auto-expand of child zones. 2018-03-28 01:47:20 +02:00
Bartosz Taudul
aebbefde2a Rename exclusive time to self time. 2018-03-28 01:47:08 +02:00
Bartosz Taudul
f6d4728494 Move child zones into an expandable tree. 2018-03-28 01:34:12 +02:00
Bartosz Taudul
4d38ebe1b1 Add a link to tracy introduction video. 2018-03-28 01:14:58 +02:00
Bartosz Taudul
871633cbaf Adapt button size to font size. 2018-03-25 00:10:31 +01:00
Bartosz Taudul
83e8bb0b49 Update NEWS. 2018-03-24 22:21:34 +01:00
Bartosz Taudul
e69d71cd4d Open trace passed as standalone server argument. 2018-03-24 22:20:06 +01:00
Bartosz Taudul
9925e9ed81 Link to NEWS from README. 2018-03-24 22:10:54 +01:00
Bartosz Taudul
c123c2c755 Update NEWS. 2018-03-24 22:09:46 +01:00
Bartosz Taudul
3567089278 Provide default size for main profiler window.
This prevents a crash when loading saved trace with plot.
2018-03-24 22:07:41 +01:00
Bartosz Taudul
1219b72577 Check if match table has content. 2018-03-24 17:32:27 +01:00
Bartosz Taudul
d559da932f Highlight source location displayed in find zone window. 2018-03-24 17:29:25 +01:00
Bartosz Taudul
ae274d8e37 Different ways of sorting of statistics data. 2018-03-24 17:28:10 +01:00
Bartosz Taudul
0e1e1b4d06 Update NEWS. 2018-03-24 15:22:08 +01:00
Bartosz Taudul
b65824d116 Show source location details when one is selected. 2018-03-24 15:20:39 +01:00
Bartosz Taudul
3012817da4 Source location statistics. 2018-03-24 15:16:43 +01:00
Bartosz Taudul
27c66c3765 Remove unused variable. 2018-03-24 15:04:44 +01:00
Bartosz Taudul
aa9d9575e0 Allow raw access to source location zones data. 2018-03-24 14:48:52 +01:00
Bartosz Taudul
948c6f405c Update NEWS. 2018-03-24 14:46:44 +01:00
Bartosz Taudul
cb4c1dac24 Don't show pause/resume button if data is static. 2018-03-24 14:45:01 +01:00
Bartosz Taudul
d8ac7dee83 Expose worker data state (static/dynamic). 2018-03-24 14:43:57 +01:00
Bartosz Taudul
225423bd21 Cosmetics. 2018-03-24 14:42:48 +01:00
Bartosz Taudul
3a49e9a4be Statistics window shell. 2018-03-24 14:40:48 +01:00
Bartosz Taudul
a9e1a9bddb Calculate total time spent in source location.
This simple solution doesn't handle recursion at all.
2018-03-24 14:24:30 +01:00
Bartosz Taudul
40a14292b3 Matched source locations and histogram default to open. 2018-03-24 02:45:24 +01:00
Bartosz Taudul
fea0234a60 Change zone end "-1" comparisons to "0" comparisons. 2018-03-24 02:00:20 +01:00
Bartosz Taudul
6a4e58b545 Force inline compress/decompress thread id. 2018-03-24 01:31:58 +01:00
Bartosz Taudul
c0577fd5b2 Unordered map is no longer used. 2018-03-23 21:18:52 +01:00
Bartosz Taudul
f4b88b9c05 Use flat hash map for reverse plot lookup. 2018-03-23 21:18:00 +01:00
Bartosz Taudul
6cb2fec48e Use flat hash map for string map. 2018-03-23 21:12:29 +01:00
Bartosz Taudul
69b49f527d Inline GetZoneEndDirect(). 2018-03-23 02:06:44 +01:00
Bartosz Taudul
910ce8b8ef Display number of matched source locations. 2018-03-20 20:18:23 +01:00
Bartosz Taudul
6e6addfa81 Use pdqsort. 2018-03-20 19:19:07 +01:00
Bartosz Taudul
ae55360a6d Don't sort zones if statistics are disabled. 2018-03-20 19:12:42 +01:00
Bartosz Taudul
4837ce31ff Allow sorting zone groups by count. 2018-03-20 17:19:48 +01:00
Bartosz Taudul
64f3c55ba5 Display zone group time. 2018-03-20 16:56:11 +01:00
Bartosz Taudul
e6d5f3f5fc Store common variables in registers to prevent aliasing. 2018-03-20 16:49:29 +01:00
Bartosz Taudul
d8f7903a97 Use flat hash map for ptr mapping during data load. 2018-03-20 15:44:13 +01:00
Bartosz Taudul
720e5a0468 First check if valid, then search in map. 2018-03-20 15:41:06 +01:00
Bartosz Taudul
fe6c753f12 Store lock thread map in flat hash map. 2018-03-20 15:40:25 +01:00
Bartosz Taudul
765a1ececf Move nohash<> from TracyWorker to flat hash map. 2018-03-20 15:40:11 +01:00
Bartosz Taudul
37808ec4c7 Fix the horribly inefficient Visible() and ShowFull() methods. 2018-03-20 15:33:38 +01:00
Bartosz Taudul
ceeae3c2cf Restore ordering of source location zones after load. 2018-03-20 14:56:42 +01:00
Bartosz Taudul
f8f59bbd36 Update NEWS. 2018-03-20 14:39:10 +01:00
Bartosz Taudul
ad37f0857b Highlight selected zone group on histogram. 2018-03-20 14:37:58 +01:00
Bartosz Taudul
64e05e4726 Put found zones list into a subchild. 2018-03-20 12:56:26 +01:00
Bartosz Taudul
ce3f0bd596 Add separator to zone tooltips. 2018-03-19 16:14:01 +01:00
Bartosz Taudul
d5e0858982 Display thread in GPU zone tooltip. 2018-03-19 16:13:12 +01:00
Bartosz Taudul
4d34ccc30c Unify zone info window thread retrieval. 2018-03-19 16:11:44 +01:00
Bartosz Taudul
0f6ec65b65 GPU zone thread getter. 2018-03-19 16:11:37 +01:00
Bartosz Taudul
5a32cd7984 Show zone thread in zone info popup. 2018-03-19 16:08:50 +01:00
Bartosz Taudul
ad959549e4 Update NEWS. 2018-03-19 16:02:18 +01:00
Bartosz Taudul
0d831e452b Add ability to group zones by user text. 2018-03-19 16:01:36 +01:00
Bartosz Taudul
05eb4b7ebc Don't use memcpy to terminate string. 2018-03-19 15:41:28 +01:00
Bartosz Taudul
2b54bd1b15 Update NEWS. 2018-03-19 02:39:19 +01:00
Bartosz Taudul
1fbe1621e7 Display zone exclusive time as progress bar. 2018-03-19 02:30:40 +01:00
Bartosz Taudul
3b34ebf544 Unify GPU info window child selection with the rest of lists. 2018-03-19 02:25:24 +01:00
Bartosz Taudul
efe3eda845 Display thread in zone info windows. 2018-03-19 02:22:08 +01:00
Bartosz Taudul
2eece7c1f3 Reorder instructions. 2018-03-18 23:46:34 +01:00
Bartosz Taudul
ce2bf7c207 Use Vector instead of std::vector for thread zone list. 2018-03-18 21:15:31 +01:00
Bartosz Taudul
8dabe47602 Stop processing new zones on invalid time span.
When processing resumes in the next frame, the zone will hopefully
have a proper end time.
2018-03-18 21:06:26 +01:00
Bartosz Taudul
8b3e53bfad Don't ignore first thread. 2018-03-18 20:53:31 +01:00
Bartosz Taudul
d0519499f4 Store thread id next to zone ptr in source location zone list. 2018-03-18 20:45:49 +01:00
Bartosz Taudul
777d672e05 Thread id compression/decompression. 2018-03-18 20:45:22 +01:00
Bartosz Taudul
40c6f01a41 Perform search after condition was verified, not before. 2018-03-18 20:25:00 +01:00
Bartosz Taudul
3ac98beb5a Use precalculated min/max time spans. 2018-03-18 20:20:24 +01:00
Bartosz Taudul
0f1f7c6813 Calculate min/max time spans for source locations. 2018-03-18 20:15:45 +01:00
Bartosz Taudul
43c3fe25ba Put source location zone data into a struct. 2018-03-18 20:08:57 +01:00
Bartosz Taudul
f5b0f34827 Using std::vector instead of Vector is no longer possible. 2018-03-18 19:56:53 +01:00
Bartosz Taudul
77fa8f54a6 Restore per-thread zone list functionality. 2018-03-18 16:41:58 +01:00
Bartosz Taudul
d08c10c5b6 Add functionality for getting zone thread. 2018-03-18 16:38:42 +01:00
Bartosz Taudul
616269e849 Display zone counts in matched source locations. 2018-03-18 16:11:08 +01:00
Bartosz Taudul
af3559afed Only display results for a single source location match. 2018-03-18 16:07:07 +01:00
Bartosz Taudul
3207861869 Add changelog. 2018-03-18 13:55:44 +01:00
Bartosz Taudul
d5177e6946 Add a quick FAQ. 2018-03-18 13:30:09 +01:00
Bartosz Taudul
b7dfbed7e5 Document the capture tool. 2018-03-18 13:13:36 +01:00
Bartosz Taudul
8ce6634ebc Document the TRACY_NO_STATISTICS macro. 2018-03-18 13:09:59 +01:00
Bartosz Taudul
d747f2b74f Disable statistics collection in capture tool. 2018-03-18 13:00:11 +01:00
Bartosz Taudul
7a4e7cbf86 Reduce data collection if TRACY_NO_STATISTICS is defined.
Statistical data collection is only useful if it's meant to be used.
Otherwise it only incurs CPU and memory cost.
2018-03-18 12:55:54 +01:00
Bartosz Taudul
4baea4a74f Don't hash source location zones keys. 2018-03-18 03:25:14 +01:00
Bartosz Taudul
67774698af Only use direct zone end time for find zone data.
This prevents temporary timing artifacts from affecting histogram graph.
Previously the graph would flicker, because some shorter than usual
timing data was reported and the graph tried to compensate for a single
frame when such data was present.
2018-03-18 02:53:41 +01:00
Bartosz Taudul
e6b3f373c5 Add direct zone end getter. 2018-03-18 02:53:00 +01:00
Bartosz Taudul
746df21ad9 Live updates of find zone data.
TODO: found zones list. Currently only histogram view is available.
2018-03-18 02:43:17 +01:00
Bartosz Taudul
c807b3f7ef Getter for source location zones. 2018-03-18 02:35:39 +01:00
Bartosz Taudul
9830fa297e Store per-source-location zone lists. 2018-03-18 02:05:33 +01:00
Bartosz Taudul
c5c81a73bc Skip initialization of StringIdx.
That memory will be loaded from file.
2018-03-17 14:43:02 +01:00
Bartosz Taudul
a4d46219df File read buffer doesn't need to be preserved. 2018-03-17 14:22:36 +01:00
Bartosz Taudul
41d8ca0814 Split read/write functions into small and big variants. 2018-03-17 13:57:32 +01:00
Bartosz Taudul
79418d0c57 Move locks, zones, etc in options menu out of view. 2018-03-15 23:33:05 +01:00
Bartosz Taudul
81ff554c7d Don't call ReadTimeline() when there's nothing to read. 2018-03-15 22:54:10 +01:00
Bartosz Taudul
9dfa9c95cb Read and write whole ZoneEvent/GpuEvent data at once. 2018-03-15 21:59:16 +01:00
Bartosz Taudul
e5796af196 More efficient vector filling. 2018-03-15 21:42:00 +01:00
Bartosz Taudul
c510c9705b No need to check for reserved space. 2018-03-15 21:32:06 +01:00
Bartosz Taudul
b7ba64a223 Microoptimize ReadTimeline(). 2018-03-15 21:27:36 +01:00
Bartosz Szreder
124170b804 Fix compile warnings. 2018-03-14 00:30:57 +01:00
Bartosz Taudul
2f2dd2fc21 Display basic capture information. 2018-03-10 02:25:29 +01:00
Bartosz Taudul
3f2ba6797b Link with pthreads. 2018-03-10 01:32:39 +01:00
Bartosz Taudul
c673f70f90 Add command line trace capture utility. 2018-03-10 01:29:27 +01:00
Bartosz Taudul
a14ff62e64 Decrease minimum spacing between tick labels on linear histogram. 2018-03-05 20:33:04 +01:00
Bartosz Taudul
f361d7484d Put selection information next to each other. 2018-03-05 20:30:21 +01:00
Bartosz Taudul
f39d4c415d Count time spent in histogram selection. 2018-03-05 20:23:58 +01:00
Bartosz Taudul
e9e3e46ea2 Display time instead of counts in cumulate time mode. 2018-03-05 20:19:05 +01:00
Bartosz Taudul
f733758652 Time accumulation histogram mode. 2018-03-05 20:15:18 +01:00
Bartosz Taudul
4005f22ecf Clear selection only on right mouse click. Add tooltip. 2018-03-05 20:05:20 +01:00
Bartosz Taudul
3e931432cf Don't calculate logarithms more than once. 2018-03-05 13:20:24 +01:00
Bartosz Taudul
68f652c40f Put total time and max counts on the same line. 2018-03-04 23:25:33 +01:00
Bartosz Taudul
3a8c976285 Clear histogram range selection by right mouse click. 2018-03-04 23:20:35 +01:00
Bartosz Taudul
f510d8d2e7 Update item counts in thread list. 2018-03-04 23:17:36 +01:00
Bartosz Taudul
3dd14c9e01 Filter found zones according to selection. 2018-03-04 23:07:38 +01:00
Bartosz Taudul
f42d8cee38 Selection of time range on histogram. 2018-03-04 22:52:36 +01:00
Bartosz Taudul
dee7fd27be Move mouse highlight data to a separate struct. 2018-03-04 22:21:35 +01:00
Bartosz Taudul
f7829a7eae Store matches in a map. 2018-03-04 22:11:50 +01:00
Bartosz Taudul
754279d6f1 Allow narrowing down search results by source location. 2018-03-04 21:17:38 +01:00
Bartosz Taudul
2c508c1f79 Display list of matched source locations in search window. 2018-03-04 21:10:10 +01:00
Bartosz Taudul
5c1aec723d Fix thread name clashes in ImGui. 2018-03-04 18:52:32 +01:00
Bartosz Taudul
fa46445537 Add label to separate found zones from rest of find dialog. 2018-03-04 18:44:33 +01:00
Bartosz Taudul
a34bb97d78 Unify zone children and find zone list behavior. 2018-03-04 18:42:18 +01:00
Bartosz Taudul
f44e9bbd7b Make zone info child list "selectable". 2018-03-04 18:40:32 +01:00
Bartosz Taudul
a7e7f59f96 Zoom-to-zone on middle click on found item. 2018-03-04 18:35:40 +01:00
Bartosz Taudul
5cb917e868 No nonsense union. 2018-03-04 17:52:51 +01:00
Bartosz Taudul
5afdccfc46 Properly initialize data.
Unused bitbield bits and inactive string index/reference had thrash
values in release builds, which prevented de-duplication of source
location payloads.
2018-03-04 17:47:26 +01:00
Bartosz Taudul
e9395cd988 Reconstruct source location payload map on data load. 2018-03-04 17:22:34 +01:00
Bartosz Taudul
a374114358 Use proper encoding of source location. 2018-03-04 17:17:37 +01:00
Bartosz Taudul
9170cfd943 First entry in sourceLocationExpand is special. 2018-03-04 16:57:57 +01:00
Bartosz Taudul
80da271a2c Don't match source location on a per-zone basis. 2018-03-04 16:53:13 +01:00
Bartosz Taudul
b48602f5d1 Implement search for matching source locations. 2018-03-04 16:52:45 +01:00
Bartosz Taudul
f8c5f28372 Use Vector for source location expand storage. 2018-03-04 16:32:51 +01:00
Bartosz Taudul
f99c6eec78 Simplify code. 2018-03-04 16:23:28 +01:00
Bartosz Taudul
b09bae07c4 Change default find parameters to unlimited. 2018-03-04 16:07:10 +01:00
Bartosz Taudul
dca7338319 Update rpmalloc to 1.3.0. 2018-03-04 15:51:10 +01:00
Bartosz Taudul
0c1721144e Backport concurrent queue's fixes.
420509b6678263f0fa6c0ffba87a15319238a1f2
2018-03-04 15:32:42 +01:00
Bartosz Taudul
7d6f5b875d Bump lz4 to 1.8.1. 2018-03-04 15:23:46 +01:00
Bartosz Taudul
b057c631a6 Ignore tracy_test executable. 2018-03-04 15:12:05 +01:00
Bartosz Taudul
87cfd98b69 No need for fractional time part on graph ticks. 2018-02-28 15:38:32 +01:00
Bartosz Taudul
2891ecc526 Logarithmic scale histogram ticks. 2018-02-28 15:20:52 +01:00
Bartosz Taudul
e64e7ce3f1 Add TracyWorker.hpp to msvc project. 2018-02-23 15:13:30 +01:00
Bartosz Szreder
0fb35b42f8 Merged in bartosz_szreder/tracy (pull request #3)
Split data handling code from the view.
2018-02-23 14:12:01 +00:00
Bartosz Szreder
3b9639a9de Tweak included header files in View and Worker. 2018-02-23 15:08:20 +01:00
Bartosz Taudul
6406df6f45 Display total time. 2018-02-22 12:44:55 +01:00
Bartosz Taudul
ffb28a3d0d More concise time range display. 2018-02-22 12:38:43 +01:00
Bartosz Szreder
bae1c02ad0 Worker thread will take care of itself. 2018-02-21 16:41:37 +01:00
Bartosz Szreder
9e3f18a62a Split data handling code from the view. 2018-02-21 16:41:37 +01:00
Bartosz Taudul
fbaf59c9a6 Ignore zero-time zones in search. 2018-02-21 15:25:28 +01:00
Bartosz Taudul
785ab2927b Calculate proper label offsets. 2018-02-21 15:18:30 +01:00
Bartosz Taudul
d9988c8a06 Histogram time labels prototype. 2018-02-20 16:01:33 +01:00
Bartosz Taudul
118d4b497f Time Stamp Counter to time conversion function. 2018-02-20 12:40:12 +01:00
Bartosz Taudul
6a65ceb71a Display maximum number of counts in bins. 2018-02-16 16:19:31 +01:00
Bartosz Taudul
4611bc355f Optional log time scale in histogram. 2018-02-16 15:34:22 +01:00
Bartosz Taudul
6e8bb9e490 Display bin times. 2018-02-16 14:42:16 +01:00
Bartosz Taudul
f6cc360c69 Basic histogram introspection. 2018-02-16 14:31:57 +01:00
Bartosz Taudul
fbe1af80b5 Cosmetics. 2018-02-16 14:31:53 +01:00
Bartosz Taudul
9678cc8afc Support logarithmic scaling of values on search histogram. 2018-02-16 13:28:40 +01:00
Bartosz Taudul
508b699252 Fix crash. 2018-02-16 13:09:24 +01:00
Bartosz Taudul
5bc145f719 Search results histogram. 2018-02-15 17:25:16 +01:00
Bartosz Taudul
ea4863d4bd Fix help strings. 2018-02-15 16:32:36 +01:00
Bartosz Taudul
e20bb2fe66 Add separators to zone count. 2018-02-15 16:31:47 +01:00
Bartosz Taudul
cc38988045 Cleanup. 2018-02-15 16:24:01 +01:00
Bartosz Taudul
d1d54db7b6 Display number of found zones. 2018-02-15 16:17:16 +01:00
Bartosz Szreder
d5fe006e2d Add missing include charutil::hash() 2018-02-12 19:07:55 +01:00
Kamil Klimek
5c0038f3f3 Merged in kamilklimek/tracy/find-zone (pull request #2)
"Find Zone" feature
2018-01-18 11:48:00 +00:00
Kamil Klimek
66fd052344 Updated AUTHORS 2018-01-18 12:46:31 +01:00
Kamil Klimek
cb08990eff "Find Zone" feature
- Simple text search with some limiting options
 - Grouping by threads
 - Easy access to "Zone Info" from search results
2018-01-18 12:35:30 +01:00
Bartosz Taudul
142f94cc33 Small style adjustments. 2018-01-13 14:08:14 +01:00
Bartosz Taudul
e5317d9e40 Use dark style. 2018-01-13 13:59:16 +01:00
Bartosz Taudul
961a907e09 Remove obsolete window flag. 2018-01-13 13:56:02 +01:00
Bartosz Taudul
9330e950da Bump ImGui to 1.53. 2018-01-13 13:52:52 +01:00
Bartosz Taudul
7300c2e46e Fix TRACY_NO_EXIT behavior.
Terminate event could be the first event that was sent. In such case
server immediately closed the connection, as there was no outstanding
data to receive. Fix by sending all data in the queue before sending
terminate event.
2018-01-11 13:45:13 +01:00
231 changed files with 119513 additions and 27150 deletions

25
.appveyor.yml Normal file

@@ -0,0 +1,25 @@
version: '{build}'
platform:
- x64
image:
- Visual Studio 2019
- Ubuntu1804
install:
- cmd: cd c:\tools\vcpkg
- cmd: git pull
- cmd: bootstrap-vcpkg.bat
- cmd: vcpkg install freetype glfw3 --triplet x64-windows-static
- cmd: vcpkg integrate install
- cmd: cd %APPVEYOR_BUILD_FOLDER%
build_script:
- cmd: msbuild .\update\build\win32\update.vcxproj
- cmd: msbuild .\profiler\build\win32\Tracy.vcxproj
- cmd: msbuild .\capture\build\win32\capture.vcxproj
- sh: sudo apt-get update && sudo apt-get -y install libglfw3-dev libgtk2.0-dev
- sh: make -C update/build/unix debug release
- sh: make -C profiler/build/unix debug release
- sh: make -C capture/build/unix debug release
- sh: make -C test
- sh: make -C test clean
- sh: make -C test TRACYFLAGS=-DTRACY_ON_DEMAND
test: off

11
.gitignore vendored

@@ -9,3 +9,14 @@ Debug
*.o
*.swp
imgui.ini
test/tracy_test
test/tracy_test.exe
*/build/unix/*-*
manual/t*.aux
manual/t*.log
manual/t*.out
manual/t*.pdf
manual/t*.synctex.gz
manual/t*.toc
profiler/build/win32/packages
profiler/build/win32/Tracy.aps


@@ -1 +1,8 @@
Bartosz Taudul <wolf.pld@gmail.com>
Kamil Klimek <kamil.klimek@sharkbits.com> (initial find zone implementation)
Bartosz Szreder <zgredder@gmail.com> (view/worker split)
Arvid Gerstmann <dev@arvid-g.de> (compatibility fixes)
Rokas Kupstys <rokups@zoho.com> (compatibility fixes, initial CI work, MingW support)
Till Rathmann <till.rathmann@gmx.de> (DLL support)
Sherief Farouk <sherief.personal@gmail.com> (compatibility fixes)
Dedmen Miller <dedmen@dedmen.de> (find zone bug fixes, improvements)


@@ -1,7 +1,7 @@
tl;dr: Tracy is licensed under BSD 3-clause license.
Tracy Profiler (https://bitbucket.org/wolfpld/tracy) is licensed under the
3-clause BSD license.
Copyright (c) 2017, Bartosz Taudul <wolf.pld@gmail.com>
Copyright (c) 2017-2019, Bartosz Taudul <wolf.pld@gmail.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without

445
NEWS Normal file

@@ -0,0 +1,445 @@
Note: There is no guarantee that version mismatched client and server will
be able to talk with each other. Network protocol breakages won't be listed
here.
Note: Release numbers are nothing more than numbers. There are some
"missing" versions due to trace file changes during development. This is not
a mistake.
v0.6 (2019-11-17)
-----------------
This is the last release which will be able to load pre-v0.5 traces. Use the
update utility to convert your old traces now!
- Dropped support for pre-v0.4 traces.
- Major memory usage decrease.
- Significant network bandwidth decrease.
- Implemented context switch capture on selected platforms.
- Zone timings in various UI places can now take into account only the
time when the thread was executing.
- Zone information window can now display regions in which thread was
suspended by the operating system.
- CPUs on which the zone was running are enumerated.
- Thread activity regions can be graphed on the timeline.
- API breakage: SetThreadName() now only works on current thread.
- Fixed thread name retrieval after thread is destroyed.
- Added number of CPU cores to host info.
- Limited number of possible source locations to 64K.
- Limited supported capture length to 1.6 days.
- CPU cores are now displayed on the timeline.
- Thread execution workload is displayed, including threads from external
programs.
- Thread migrations across CPU cores can be graphed.
- System-wide workload distribution is now plotted on the timeline.
- Added "CPU data" window showing programs competing for CPU during the
capture.
- Switched to using native thread identifiers (relatively small numbers), as
opposed to pthreads identifiers, which in reality were pointers.
- Improved thread name discovery if context switch capture is enabled.
- Per-trace state is now preserved between profiling sessions:
- Timeline view position.
- Item categories draw/hide settings.
- Timeline zones will be highlighted using a different color, when a
matching time range is selected on histogram.
- Per-frame zone times are now displayed on the frames plot when a zone is
selected in the find zone menu.
- Zone color is now displayed in zone information window.
- Zone colors can now be determined based on depth and thread or source
location.
- Thread colors are displayed across the profiler application.
- Frame times can be now compared.
- Expose more lock handling functionality.
- Network port can be now specified by the user.
- Proper handling of multithreaded Vulkan code.
- Added extreme compression level in update utility.
- Added time distribution data in the zone information window.
- Trace file name is now displayed in trace information window.
- Annotations can be now added to the timeline.
- Server now performs network data retrieval and decompression on a dedicated
thread.
- Added examples of Tracy integration.
- Allow grouping of zones in the find zone menu by zone parent or with no
grouping.
- Zone list in the statistics window can be now filtered.
- Implemented configuration of plots.
- Messages can now collect call stacks.
v0.5 (2019-08-10)
-----------------
This is the last release which will be able to load pre-v0.4 traces. Use the
update utility to convert your old traces now!
- Major decrease of trace dump file size.
- Major optimizations across the board.
- Vcpkg is now used for library management on Windows.
- Display dump file size change in the update utility.
- Added notification area.
- Display trace loading time.
- Display background processing tasks after trace is loaded.
- Display trace save notification.
- Show crash icon, if there was a crash.
- Added C API.
- Profiling session may now gracefully terminate, due to incorrect
instrumentation. A popup with termination reason will be displayed.
- Call stack improvements.
- Call stack frames now have a proper source file and file line
information on Linux.
- Single call stack frame may now have multiple entries, representing
inlined function calls.
- Call stack grouping in the find zone menu now has a special display
mode.
- Call stack memory allocations tree improvements:
- Add top-down variant to complement the previously available bottom-up
one.
- Add ability to group tree nodes by function name.
- Allow restricting tree to display only active allocations.
- Added support for Lua call stack capture.
- Self time of zones may be now displayed in the find zone menu.
- Added ability to disconnect from a client.
- Find zone groups can now be sorted by mean time per call.
- Zones displayed in the find zone menu can be now grouped by order of
appearance, execution time or name.
- Time is now displayed without trailing fractional zeros (e.g. "2.5 ms"
instead of "2.50 ms").
- Child zones displayed in zone info window can be now grouped by source
location.
- Selected or hovered lock is now highlighted on the timeline.
- Locks are now grouped into single and multithreaded (contended and
uncontended) in the options menu locks list.
- On broken platforms the profiler can now be initialized as needed (and
possible), taking a performance and functionality hit.
- User experience improvements in the graphical profiler.
- Thread position and height is now animated, to eliminate flickering that
was happening when depth of displayed zones was changing.
- Zooming in/out using the mouse wheel is now animated.
- Plot range adjustment is now animated.
- Various other UI improvements.
- System CPU usage is now being monitored.
- Threads that have nothing to display in the current view are now hidden by
default.
- Dimmed-out the timeline outside the profiling area.
- Source file view can now be opened also from statistics menu.
- Display standard deviation in find zone and compare traces menus.
- Display zone messages in zone information window.
- Display order of threads can be changed in the options menu.
- Prevent deadlocks by querying socket send buffer size.
- Frame set statistics can be now limited to frames visible on the screen.
- Messages can be now colored.
- Zone selection in compare traces menu can be now linked to the other
trace.
- Added support for frame image (screen shot) storage.
- Implemented ability to cut off outliers on histograms.
- Zone or frame that is currently hovered by the mouse cursor will be
highlighted on the histogram.
- Server now displays available clients in the local network.
- Source code whitespace visibility can now be enabled or disabled.
- Profiler will now check if proper timer readings can be performed on
x86/x64.
- Application can now log app-specific information, similarly to how the
host info reports system information.
- Message list will automatically scroll down to the most recent message.
- Feature will disable when the list is scrolled by user.
- To re-enable, scroll to the bottom of the list.
- Message list can be now filtered.
- A notification popup will be displayed during trace cleanup.
- Source file view won't be available if a source file is newer than the
capture.
- Added ability to set custom trace descriptions.
- Added frame time target lines.
- FPS counts are now displayed next to frame times.
- GPU drift value can be now automatically measured.
- Connection window is now a popup hidden under a dedicated button.
v0.4.1 (2018-12-30)
-------------------
- Active frame set can be now switched by clicking on a frame set on the
timeline.
- Add ability to go to a specified frame.
- Most commonly used addresses can be now selected from the drop-down menu.
- Fixed corner case problem with profiler initialization on Windows.
- Added third state (stopped) to the pause/resume button. It will be used
after the connection to the client is terminated.
- Active trace can be discarded.
- Call stack capture may be forced through TRACY_CALLSTACK define.
- Lock info window has been added.
- Time of lock creation and termination is now being tracked.
- Menu bar buttons are now toggles that can also close their corresponding
windows.
- Find zone and compare menu improvements.
- Ability to ignore case during search.
- Pressing enter key will now start search, just like pressing the "find"
button.
- Using the ^F keyboard shortcut will open the find zone menu and focus
the input box.
- Added ability to automatically connect to an IP address in the graphical
profiler application (use "-a address" argument to enable).
- Pressing enter key after entering client address in the welcome dialog
will now automatically begin connection process.
v0.4 (2018-10-09)
-----------------
- Renamed "standalone" utility to "profiler".
- Added trace update utility, which will convert files saved in previous
versions of tracy to be up-to-date.
- Optional high compression (--hc) mode is available that will increase
the compression level, at the cost of considerably longer compression
time.
- Fix regression causing varying size of profiler window for different
captures.
- Added support for on-demand tracing.
- If a client application is compiled with the TRACY_ON_DEMAND macro
defined, tracing will not begin until a connection to server is
established.
- Since data is not fully captured in this mode, the resulting trace will
be less precise, until application state is appropriately reset. For
example, locks need to be fully released, zone stacks need to be
flushed. This is an automatic process.
- All tracing macros are able to work in the on-demand mode.
- Improved compatibility with various system setups.
- Aside from using TRACY_NO_EXIT define you can also set the same-named
environment variable to 1 to get the same effect.
- Added ability to show/hide all threads and plots.
- Performance improvements.
- Improvements to memory data presentation.
- Added memory allocation info window.
- Selecting memory allocation on a plot will draw time range of the
allocation.
- Middle clicking on a memory allocation address (or on a button in
memory allocation info window) will zoom the view to the allocation
range.
- Find zone menu improvements:
- Zones can be now also grouped by call stacks.
- Zone groups can be now also sorted by time spent in each zone.
- Zone groups list now displays group times.
- Average and median zone times are now displayed on the histogram.
- Selected zones will be highlighted on the timeline view.
- Added named versions of tracing macros that allow specifying scoped
variable name.
- The main profiler window is now kept at the bottom of windows stack.
- The "profiler" utility will now use a custom embedded font.
- Microseconds are now displayed using correct symbol ('μ' instead of 'u').
- Unix builds of the "profiler" utility will now ask for a file name when
saving a trace.
- Progress popup is now displayed when a trace file is loading.
- Zones that share source location with a zone that is hovered over are now
highlighted.
- Added ability to zoom-in to a selection range made using middle mouse
button.
- Holding the ctrl key will switch to zoom-out mode.
- The "profiler" utility will use less resources when its window is
out-of-focus or minimized.
- Added support for cross-DLL profiling.
- Items in options menu (locks, threads, etc.) are now described with number
of events.
- Source location of lock declaration is also provided.
- Created an extensive user manual for the profiler.
- Added ability to capture multiple frame sets.
- Viewer will display multiple frame ranges at once.
- Only one frame set can be active at once. The selected one is used for
the frame navigation graph, frame navigation buttons and drawing frame
separators.
- The active frame set will be highlighted, and the rest will be dimmed
out.
- Frames can now also be discontinuous.
- Frames and zones too small to be displayed will be marked with a zig-zag
pattern.
- General improvements to message list and message markers.
- Hovering over message on a list will highlight its marker (previously it
only worked the other way).
- Left clicking on a message marker will focus the message list on the
selected message.
- Middle clicking on a message marker will center it on screen.
- Added trace information window.
- This includes frame time statistics and histogram.
- Displayed memory sizes are now properly formatted.
- Added call stack tree for memory allocations.
- You can display allocations list for each call stack tree entry.
- The source code of the profiled application may now be viewed in the
profiler.
- BIG FAT WARNING: The actual profiled program source code is not known to
the profiler. It only checks if there is a file on your disk that
matches the file name of the captured source location. Even if the file
is displayed, it may be out of date.
- CPU and GPU zones will have "Source" button, if source file can be
opened.
- Source files for call stack traces can be opened by right-clicking on
the file name. Since in this case there is no button that can be hidden,
a small animation will be played to notify user if the source cannot be
opened.
- The main profiler view will now occupy the whole window. Previous behavior
is still available for embedded use cases.
- Many button labels are now accompanied by icons.
- Fonts should now be less blurry.
- "Go to parent" button in zone info window won't be displayed if there is
no parent to go to.
- Improvements to the compare traces menu.
- There are now colored markers to make it easier to distinguish "this" and
"external" traces.
- The amount of saved time is now displayed (a difference between total
run times of both traces).
- Tracy will now collect host information, like CPU name, amount of system
memory, etc.
- Windows builds of the "profiler" utility will perform a check of supported
CPU instruction set and match it against the one required by the binary
(by default AVX2 is used). If the program cannot be executed on the
processor, a message dialog with workaround instructions will be
displayed.
- Tracy can intercept crashes and finish sending data from a dying process.
- Currently this is only implemented on Windows, Linux and Android.
- Call stack window may now display addresses of the frames, instead of
source file locations.
- Memory events will now properly register their thread.
- Profiler settings are now stored in a persistent location.
- On Windows settings are stored in %APPDATA%/tracy.
- On other platforms settings are stored in $XDG_CONFIG_HOME/tracy or
$HOME/.config/tracy, if the variable is not set.
- The main profiler window position, size and maximized state are saved
and restored.
- The size and position of internal windows now doesn't depend on the
runtime directory of the profiler executable.
- Added connection handshake.
- Server won't be able to connect to client if there's a protocol version
mismatch.
- Client not in on-demand mode will refuse connections after the first
connection was made and the initial event buffers were cleared.
- A single server will no longer try to connect to multiple clients.
- The capture utility will now display time span of the ongoing capture.
v0.3.3 (2018-07-03)
-------------------
- Breaking change: the format of trace files has changed.
- Previous tracy version will crash when trying to open new traces.
- Loading of traces saved by previous version is supported.
- Tracy will no longer crash when trying to load traces saved by future
versions. Instead, a dialog advising to update will be displayed.
- Tracy will no longer crash in most cases when trying to open files that
are not traces. Some crashes are still possible, due to support of old,
header-less traces.
- Ability to track every memory allocation in profiled program.
- Allocation event queuing must be done in order, which requires exclusive
access to the serialized queue on the client side. This has no effect on
the rest of events, which are stored in a concurrent queue, as before.
- You can search for a memory address and see where it was allocated, for
how long, etc. This lists all matching allocations since the program was
started.
- All active (non-freed) allocations may be listed. This shows the current
memory state by default, but can go back to any point in time.
- Graphical representation of process memory map may be displayed. New
allocations/frees are displayed in a bright color and fade out with
time. This feature also can look back in time.
- Memory usage plot is automatically generated.
- Basic allocation information is displayed in memory plot tooltips.
- A summary of memory events within a zone (and its children) is now
printed in zone info window.
- Support loading profile dumps with no memory allocation data (generated by
v0.2).
- Added ability to display global statistics of a selected zone from the
zone info window.
- Fixed regression with lock announce processing that appeared during
worker/viewer split.
- Allow selecting/unselecting all locks for display.
- Performance improvements.
- Don't save unneeded lock information in trace file.
- Don't save trash in message list data.
- Allow expanding view span up to one hour, instead of one minute.
- Added trace comparison window.
- An external trace has to be loaded first.
- Zone query in both traces (current and external).
- Both results are overlaid on the same histogram.
- Graphs can be adjusted as if there were the same number of zones
collected.
- Read time directly from a hardware register on ARM/ARM64, if possible.
- User-space access to the timer needs to be enabled in the kernel, so
tracy will perform run-time checks and fall back to the old method if the
check fails.
- Prevent connections in a TIME-WAIT state from blocking new listen
connections.
- Display y-range of plots.
- Added ability to unload traces loaded from files. To do so close the main
profiler window. You will return to the connect/open selection dialog.
Live captures cannot be terminated this way.
- Zones previously displayed in zone info window are remembered and you can
go back to them. Closing the zone info window or switching between CPU and
GPU zones will clear the memory.
- Improved message list window.
- Messages are now displayed in columns.
- Originating thread of each message is now included in the list.
- Messages can be filtered by the originating thread.
- You can now navigate to next and previous frame.
- Zone statistics can be now displayed using only self times.
- Support for tracing GPU events using Vulkan.
- Timeline will now display "OpenGL context" or "Vulkan context" instead of
"GPU context".
- Fixed regression causing invalid display of GPU context appearance time.
- Fixed regression causing invalid reporting of an active CPU in zone end
events, if MSVC rdtscp optimization was not enabled.
- Ability to collect true call stacks.
- Supported on Windows, Linux, Android.
- The following events can collect call stacks:
- Memory alloc/free.
- Zone begin.
- GPU zone begin.
- Zone stack trace now also displays frames from a real call trace.
- On Linux call stack frame name resolution requires a call to dladdr,
which in turn requires linking with libdl.
- Allow manual entry of GPU time drift value.
- Unix build system no longer shares object files between different build
units.
- Fixes inability to build debug and release versions of a single utility
without "make clean".
- Fixes incompatibility between "standalone" and "capture" utilities due
to different set of used feature flags.
- On Windows "standalone" utility now adapts to system DPI setting.
- Optional per-call zone naming.
v0.2 (2018-04-05)
-----------------
- Fixed broken TRACY_NO_EXIT behavior.
- Visual refresh (new color scheme).
- Added optional support for live in-depth zone analysis.
- Ability to search for zones matching a query.
- Histogram of zone time spans.
- List occurrences of a zone, grouped by thread, or by user text.
- Zone groups can be selected and highlighted on histogram graph.
- Support for linear and logarithmic display of time and values.
- Histogram bins can show zone counts or total execution time.
- Listed zones can be narrowed down by data range selection on histogram.
- Separation of server data handling code from the visualisation.
- Implementation of a command line capture utility.
- Support libraries have been updated.
- Fixed an issue that prevented de-duplication of source location payloads.
- Fixed an issue that prevented the ability to disable threads in settings
menu, if two threads had the same name.
- Performance optimizations.
- Visual clean up of the settings menu.
- Zone info windows improvements.
- Visual improvements to zone info window child list.
- Zone info windows now show zone thread.
- Display zone stack trace.
- Hide pause/resume button if there's no data connection (i.e. trace was
loaded from file).
- Source location statistics view has been added.
- Fixed crash when a saved trace was opened, but no trace capture session
was performed before.
- Standalone server will now open trace files passed as an argument to the
executable.
- Fix possible crash in SetThreadName, that could happen if TLS init was
delayed until first use of thread local variable.
- Store full thread name if pthreads (with 15 character name limit) are
used.
- Properly handle unaligned memory access (no performance impact).
- Fixed broken lock identifiers in try_lock().
v0.1 (2017-12-18)
-----------------
- Initial release.

127
README.md

@@ -1,126 +1,19 @@
# Tracy Profiler
Tracy is a real time, nanosecond resolution frame profiler that can be used for remote or embedded telemetry of your application. It can profile both CPU (C++, Lua) and GPU (OpenGL). It also can display locks held by threads and their interactions with each other.
[![Build status](https://ci.appveyor.com/api/projects/status/968a88arq06gm3el/branch/master?svg=true)](https://ci.appveyor.com/project/wolfpld/tracy/branch/master)
![](doc/profiler.png)
Tracy requires compiler support for C++11, Thread Local Storage and a way to work around the static initialization order fiasco. There are no other requirements. The following platforms are confirmed to be working (this is not a complete list):
### A real time, nanosecond resolution, remote telemetry frame profiler for games and other applications.
- Windows (x86, x64)
- Linux (x86, x64, ARM, ARM64)
- Android (ARM, x86)
- FreeBSD (x64)
- Cygwin (x64)
- WSL (x64)
- OSX (x64)
Tracy supports profiling CPU (C, C++11, Lua), GPU (OpenGL, Vulkan), memory, locks, context switches, per-frame screenshots and more.
The following compilers are supported:
For usage instructions, consult the user manual [at the following address](https://bitbucket.org/wolfpld/tracy/downloads/tracy.pdf).
- MSVC
- gcc
- clang
[Changelog](NEWS)
### High-level overview
![](doc/design.svg)
Tracy is split into client and server side. The client side collects events using a high-efficiency queue and waits for an incoming connection. The server part connects to the client and receives the collected data, which is then reconstructed into a viewable timeline. The transfer is performed using a TCP connection.
### Performance impact
To check how much slowdown is introduced by using tracy, I have profiled [etcpak](https://bitbucket.org/wolfpld/etcpak), which is the fastest ETC texture compression utility there is. I used an 8192×8192 test image as input data and instrumented everything down to the 4×4 pixel block compression function (that's 4 million blocks to compress). It should be noted that tracy needs to calibrate its internal timers at each run. This introduces a delay of 115 ms (on my machine), which is negligible when doing lengthy profiling runs, but it skews the results of etcpak timing. The following times have this delay subtracted, to give focus on zone collection impact, which is the thing that really matters here.
| Scenario | Zones | Clean run | Profiling run | Difference |
|-------------------------------------------------------|---------|-----------|---------------|------------|
| Compression of an image to ETC1 format | 4194568 | 0.94 s | 1.003 s | +0.063 s |
| Compression of an image to ETC2 format, with mip-maps | 5592822 | 1.034 s | 1.119 s | +0.085 s |
In both scenarios the per-zone time cost is ~15 ns. This is in line with the measured 8 ns single event collection time (each zone has to report a start and an end event).
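The per-zone figure can be re-derived from the table: divide the extra run time by the number of recorded zones. The following is only an arithmetic sketch using the numbers quoted above; the function and constant names are hypothetical and are not part of Tracy.

```cpp
#include <cstdint>

// Per-zone overhead in nanoseconds: extra wall-clock time divided by the zone count.
constexpr double PerZoneCostNs( double cleanSeconds, double profiledSeconds, uint64_t zones )
{
    return ( profiledSeconds - cleanSeconds ) * 1e9 / double( zones );
}

// Inputs copied from the table above.
constexpr double kEtc1CostNs = PerZoneCostNs( 0.940, 1.003, 4194568 );   // about 15.0 ns
constexpr double kEtc2CostNs = PerZoneCostNs( 1.034, 1.119, 5592822 );   // about 15.2 ns
```

Both values sit just below 2 × 8 ns, which is what one would expect if each zone emits one start event and one end event.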
## Usage instructions
#### Initial client setup
Copy files from `tracy/client` and `tracy/common` to your project. Add `tracy/TracyClient.cpp` to source files list. That's all. Tracy is now integrated into your application.
In the default configuration tracy is disabled. To enable it, add a `TRACY_ENABLE` define.
If you want to profile a short-lived application, add a `TRACY_NO_EXIT` define. In this configuration tracy will not exit until an incoming connection is made, even if the application has already finished.
#### Marking zones
To begin data collection, tracy requires that you manually instrument your application (automatic tracing of every entered function is not feasible due to the amount of data that would generate). All the user-facing interface is contained in the `tracy/Tracy.hpp` header file.
To slice the program's execution recording into frame-sized chunks, put the `FrameMark` macro after you have completed rendering the frame. Ideally that would be right after the swap buffers command. Note that this step is optional, as some applications (for example: a compression utility) do not have the concept of a frame.
To record a zone's execution time add the `ZoneScoped` macro at the beginning of the scope you want to measure. This will automatically record function name, source file name and location. Optionally you may use the `ZoneScopedC( 0xRRGGBB )` macro to set a custom color for the zone. Note that the color value will be constant in the recording (don't try to parametrize it). You may also set a custom name for the zone, using the `ZoneScopedN( name )` macro, where name is a string literal. Color and name may be combined by using the `ZoneScopedNC( name, color )` macro.
Use the `ZoneText( const char* text, size_t size )` macro to add a custom text string that will be displayed along the zone information (for example, name of the file you are opening). Note that every time `ZoneText` is invoked, a memory allocation is performed to store an internal copy of the data. The provided string is not used by tracy after ZoneText returns.
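A minimal sketch of how the macros described above might be combined. The fallback macro definitions are an assumption made so the snippet builds without Tracy on the include path; they mirror the empty definitions `Tracy.hpp` provides when `TRACY_ENABLE` is not set. `LoadAsset`, `RunFrames` and the file name are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Use the real header when it is available, otherwise fall back to no-op stubs.
#if defined( __has_include )
#  if __has_include( "tracy/Tracy.hpp" )
#    include "tracy/Tracy.hpp"
#  endif
#endif
#ifndef ZoneScoped
#  define ZoneScoped
#  define ZoneScopedN( name )
#  define ZoneText( text, size )
#  define FrameMark
#endif

// Hypothetical workload: one zone per call, annotated with the file name.
int LoadAsset( const char* path )
{
    assert( path != nullptr );
    ZoneScoped;                                // records function name, file and line
    ZoneText( path, std::strlen( path ) );     // tracy stores its own copy of the text
    return int( std::strlen( path ) );
}

// Hypothetical main loop: a named zone per update, one frame mark per iteration.
int RunFrames( int frames )
{
    int total = 0;
    for( int i = 0; i < frames; i++ )
    {
        {
            ZoneScopedN( "Update" );           // the name must be a string literal
            total += LoadAsset( "level.bin" );
        }
        FrameMark;                             // ideally placed right after swap buffers
    }
    return total;
}
```

The inner braces around the update step limit the named zone to that step, so the frame mark itself is not counted as part of it.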
#### Marking locks
Tracy can collect and display lock interactions in threads.
![](doc/locks.png)
To mark a lock (mutex) for event reporting, use the `TracyLockable( type, varname )` macro. Note that the lock must implement the [Lockable concept](http://en.cppreference.com/w/cpp/concept/Lockable) (i.e. there's no support for timed mutexes). For a concrete example, you would replace the line `std::mutex m_lock` with `TracyLockable( std::mutex, m_lock )`. You may use `TracyLockableN( type, varname, description )` to provide a custom lock name.
The standard `std::lock_guard` and `std::unique_lock` wrappers should use the `LockableBase( type )` macro for their template parameter (unless you're using C++17, with improved template argument deduction). For example, `std::lock_guard<LockableBase( std::mutex )> lock( m_lock )`.
To mark the location of lock being held, use the `LockMark( varname )` macro, after you have obtained the lock. Note that the varname must be a lock variable (a reference is also valid). This step is optional.
Similarly, you can use `TracySharedLockable`, `TracySharedLockableN` and `SharedLockableBase` to mark locks implementing the [SharedMutex concept](http://en.cppreference.com/w/cpp/concept/SharedMutex). Note that while there's no support for timed mutexes in tracy, both `std::shared_mutex` and `std::shared_timed_mutex` may be used.
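The lock macros above could be applied to a small class as sketched below. The fallback definitions are an assumption that lets the snippet build without Tracy; they are meant to match what the header does when profiling is disabled (a plain `type varname;` member and the bare mutex type). The `JobStats` class is hypothetical.

```cpp
#include <cassert>
#include <mutex>

// Use the real header when it is available, otherwise fall back to no-op stubs.
#if defined( __has_include )
#  if __has_include( "tracy/Tracy.hpp" )
#    include "tracy/Tracy.hpp"
#  endif
#endif
#ifndef TracyLockable
#  define TracyLockable( type, varname ) type varname;
#  define LockableBase( type ) type
#  define LockMark( varname ) (void)varname;
#endif

// Hypothetical bookkeeping class guarded by an instrumented mutex.
class JobStats
{
public:
    void Push()
    {
        std::lock_guard<LockableBase( std::mutex )> lock( m_lock );
        LockMark( m_lock );     // optional: records where the lock is held
        ++m_pending;
    }

    void Pop()
    {
        std::lock_guard<LockableBase( std::mutex )> lock( m_lock );
        assert( m_pending > 0 );
        --m_pending;
    }

    int Pending()
    {
        std::lock_guard<LockableBase( std::mutex )> lock( m_lock );
        return m_pending;
    }

private:
    TracyLockable( std::mutex, m_lock )   // replaces `std::mutex m_lock;`, the macro supplies the semicolon
    int m_pending = 0;
};
```

Spelling the guard type through `LockableBase` keeps the same source compiling whether the member is a plain `std::mutex` or the instrumented wrapper.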
#### Plotting data
Tracy is able to capture and draw value changes over time. You may use it to analyse memory usage, draw call count, etc. To report data, use the `TracyPlot( name, value )` macro.
![](doc/plot.png)
#### Message log
Navigating quickly through a large data set and correlating zones with what was happening in the application may be difficult. To ease these issues tracy provides message log functionality. You can send messages (for example, your typical debug output) using the `TracyMessage( text, size )` macro (tracy will allocate memory for message storage). Alternatively, use `TracyMessageL( text )` for string literal messages. Messages are displayed on a chronological list and in the zone view.
![](doc/messages.png)
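Plots and messages might be emitted from the same per-frame function, as in the following sketch. The fallback macros are an assumption so the snippet builds without Tracy; `ReportFrame`, the plot name and the draw call formula are hypothetical. The value is cast to `double` explicitly, on the assumption that this avoids any overload ambiguity in the plotting call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Use the real header when it is available, otherwise fall back to no-op stubs.
#if defined( __has_include )
#  if __has_include( "tracy/Tracy.hpp" )
#    include "tracy/Tracy.hpp"
#  endif
#endif
#ifndef TracyPlot
#  define TracyPlot( name, value )
#  define TracyMessage( text, size )
#  define TracyMessageL( text )
#endif

// Hypothetical per-frame reporting: one plot sample and one log message.
int ReportFrame( int frameIndex )
{
    const int drawCalls = 100 + frameIndex % 3;     // stand-in for a real counter

    TracyPlot( "Draw calls", double( drawCalls ) );

    // Dynamic text goes through TracyMessage, which copies the buffer.
    char buf[64];
    const int len = std::snprintf( buf, sizeof( buf ), "Frame %d: %d draw calls", frameIndex, drawCalls );
    assert( len > 0 && size_t( len ) < sizeof( buf ) );
    TracyMessage( buf, size_t( len ) );

    // String literals can use the cheaper literal variant.
    if( frameIndex == 0 ) TracyMessageL( "First frame rendered" );

    return drawCalls;
}
```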
#### Running the server
The easiest way to get going is to build the standalone server, available in the `standalone` directory. You can connect to localhost or remote clients and view the collected data right away.
Alternatively, you may want to embed the server in the same application that is running the client part of tracy. Doing so requires that you also include the `server` and `imgui` directories. Include the `tracy/server/TracyView.hpp` header file, create an instance of the `tracy::View` class and call its `Draw()` method every frame. Unfortunately, there's also the hard part - you need to integrate the imgui library into the innards of your program. How to do so is outside the scope of this document.
#### Lua support
To profile Lua code using tracy, include the `tracy/TracyLua.hpp` header file in your Lua wrapper and execute `tracy::LuaRegister( lua_State* )` function to add instrumentation support. In your Lua code, add `tracy.ZoneBegin()` and `tracy.ZoneEnd()` calls to mark execution zones. Double check if you have included all return paths! Use `tracy.ZoneBeginN( name )` to set zone name. Use `tracy.ZoneText( text )` to set zone text. Use `tracy.Message( text )` to send messages.
Even if tracy is disabled, you still have to pay the no-op function call cost. To prevent that you may want to use the `tracy::LuaRemove( char* script )` function, which will replace instrumentation calls with whitespace. This function does nothing if profiler is enabled.
#### GPU profiling
Tracy provides bindings for profiling OpenGL execution time on GPU. To use it, you will need to include the `tracy/TracyOpenGL.hpp` header file and declare each of your rendering contexts using the `TracyGpuContext` macro (typically you will only have one context). Tracy expects no more than one context per thread and no context migration.
To mark a GPU zone use the `TracyGpuZone( name )` macro, where `name` is a string literal name of the zone. Alternatively you may use `TracyGpuZoneC( name, color )` to specify zone color.
You also need to periodically collect the GPU events using the `TracyGpuCollect` macro. A good place to do it is after swap buffers function call.
GPU profiling is not supported on OSX, iOS (because Apple is unable to implement standards properly). Android devices do work, if GPU drivers are not broken. Disjoint events are not currently handled, so some readings may be a bit spotty. NVIDIA drivers are unable to provide consistent timing results when two OpenGL contexts are used simultaneously.
## Good practices
- Remember to set thread names for proper identification of threads. You may use the functions exposed in the `tracy/common/TracySystem.hpp` header to do so. Note that the max thread name length in pthreads is limited to 15 characters. Proper thread naming support is available in MSVC only if you are using Windows SDK 10.0.15063 or newer.
- Enable the MSVC String Pooling option (`/GF`) or the gcc counterpart, `-fmerge-constants`. This will reduce the number of queries the server needs to perform to the client. Note that these options are enabled in optimized builds by default.
## Practical considerations
Tracy's time measurement precision is not infinite. It's only as good as the system-provided timers are.
- On embedded ARM-based systems you can expect to have 1 µs time resolution. Some hardware is able to provide a resolution of tens to hundreds of nanoseconds.
- On x86 the time resolution depends on the hardware implementation of the RDTSCP instruction and typically is a couple of nanoseconds. This may vary from one micro-architecture to another and requires a fairly modern (Sandy Bridge) processor for reliable results.
While the data collection is very lightweight, it is not completely free. Each recorded zone event has a cost, which tracy tries to calculate and display on the timeline view, as a red zone. Note that this is an *approximation* of the real cost, which ignores many important factors. For example, you can't determine the impact of cache effects. The CPU frequency may be reduced in some situations, which will increase the recorded time, but the displayed profiler cost will not compensate for that.
![](doc/cost.png)
Lua instrumentation needs to perform additional work (including memory allocation) to store source location. This approximately doubles the data collection cost.
You may use named colors predefined in `common/TracyColor.hpp` (included by `Tracy.hpp`). Visual reference: [wikipedia](https://en.wikipedia.org/wiki/X11_color_names).
[Introduction to Tracy Profiler v0.2](https://www.youtube.com/watch?v=fB5B46lbapc)
[New features in Tracy Profiler v0.3](https://www.youtube.com/watch?v=3SXpDpDh2Uo)
[New features in Tracy Profiler v0.4](https://www.youtube.com/watch?v=eAkgkaO8B9o)
[New features in Tracy Profiler v0.5](https://www.youtube.com/watch?v=P6E7qLMmzTQ)
[New features in Tracy Profiler v0.6](https://www.youtube.com/watch?v=uJkrFgriuOo)

135
Tracy.hpp

@@ -6,15 +6,25 @@
#ifndef TRACY_ENABLE
#define ZoneNamed(x,y)
#define ZoneNamedN(x,y,z)
#define ZoneNamedC(x,y,z)
#define ZoneNamedNC(x,y,z,w)
#define ZoneScoped
#define ZoneScopedN(x)
#define ZoneScopedC(x)
#define ZoneScopedNC(x,y)
#define ZoneText(x,y)
#define ZoneName(x)
#define ZoneName(x,y)
#define FrameMark
#define FrameMarkNamed(x)
#define FrameMarkStart(x)
#define FrameMarkEnd(x)
#define FrameImage(x,y,z,w,a)
#define TracyLockable( type, varname ) type varname;
#define TracyLockableN( type, varname, desc ) type varname;
@@ -25,9 +35,34 @@
#define LockMark(x) (void)x;
#define TracyPlot(x,y)
#define TracyPlotConfig(x,y)
#define TracyMessage(x,y)
#define TracyMessageL(x)
#define TracyMessageC(x,y,z)
#define TracyMessageLC(x,y)
#define TracyAppInfo(x,y)
#define TracyAlloc(x,y)
#define TracyFree(x)
#define ZoneNamedS(x,y,z)
#define ZoneNamedNS(x,y,z,w)
#define ZoneNamedCS(x,y,z,w)
#define ZoneNamedNCS(x,y,z,w,a)
#define ZoneScopedS(x)
#define ZoneScopedNS(x,y)
#define ZoneScopedCS(x,y)
#define ZoneScopedNCS(x,y,z)
#define TracyAllocS(x,y,z)
#define TracyFreeS(x,y)
#define TracyMessageS(x,y,z)
#define TracyMessageLS(x,y)
#define TracyMessageCS(x,y,z,w)
#define TracyMessageLCS(x,y,z)
#else
@@ -35,27 +70,101 @@
#include "client/TracyProfiler.hpp"
#include "client/TracyScoped.hpp"
#define ZoneScoped static const tracy::SourceLocation __tracy_source_location { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::ScopedZone ___tracy_scoped_zone( &__tracy_source_location );
#define ZoneScopedN( name ) static const tracy::SourceLocation __tracy_source_location { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::ScopedZone ___tracy_scoped_zone( &__tracy_source_location );
#define ZoneScopedC( color ) static const tracy::SourceLocation __tracy_source_location { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::ScopedZone ___tracy_scoped_zone( &__tracy_source_location );
#define ZoneScopedNC( name, color ) static const tracy::SourceLocation __tracy_source_location { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::ScopedZone ___tracy_scoped_zone( &__tracy_source_location );
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define ZoneNamed( varname, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
# define ZoneNamedN( varname, name, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
# define ZoneNamedC( varname, color, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
# define ZoneNamedNC( varname, name, color, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
#else
# define ZoneNamed( varname, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), active );
# define ZoneNamedN( varname, name, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), active );
# define ZoneNamedC( varname, color, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), active );
# define ZoneNamedNC( varname, name, color, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), active );
#endif
#define ZoneScoped ZoneNamed( ___tracy_scoped_zone, true )
#define ZoneScopedN( name ) ZoneNamedN( ___tracy_scoped_zone, name, true )
#define ZoneScopedC( color ) ZoneNamedC( ___tracy_scoped_zone, color, true )
#define ZoneScopedNC( name, color ) ZoneNamedNC( ___tracy_scoped_zone, name, color, true )
#define ZoneText( txt, size ) ___tracy_scoped_zone.Text( txt, size );
#define ZoneName( txt, size ) ___tracy_scoped_zone.Name( txt, size );
#define FrameMark tracy::Profiler::FrameMark();
#define FrameMark tracy::Profiler::SendFrameMark( nullptr );
#define FrameMarkNamed( name ) tracy::Profiler::SendFrameMark( name );
#define FrameMarkStart( name ) tracy::Profiler::SendFrameMark( name, tracy::QueueType::FrameMarkMsgStart );
#define FrameMarkEnd( name ) tracy::Profiler::SendFrameMark( name, tracy::QueueType::FrameMarkMsgEnd );
#define TracyLockable( type, varname ) tracy::Lockable<type> varname { [] () -> const tracy::SourceLocation* { static const tracy::SourceLocation srcloc { nullptr, #type " " #varname, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define TracyLockableN( type, varname, desc ) tracy::Lockable<type> varname { [] () -> const tracy::SourceLocation* { static const tracy::SourceLocation srcloc { nullptr, desc, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define TracySharedLockable( type, varname ) tracy::SharedLockable<type> varname { [] () -> const tracy::SourceLocation* { static const tracy::SourceLocation srcloc { nullptr, #type " " #varname, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define TracySharedLockableN( type, varname, desc ) tracy::SharedLockable<type> varname { [] () -> const tracy::SourceLocation* { static const tracy::SourceLocation srcloc { nullptr, desc, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define FrameImage( image, width, height, offset, flip ) tracy::Profiler::SendFrameImage( image, width, height, offset, flip );
#define TracyLockable( type, varname ) tracy::Lockable<type> varname { [] () -> const tracy::SourceLocationData* { static const tracy::SourceLocationData srcloc { nullptr, #type " " #varname, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define TracyLockableN( type, varname, desc ) tracy::Lockable<type> varname { [] () -> const tracy::SourceLocationData* { static const tracy::SourceLocationData srcloc { nullptr, desc, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define TracySharedLockable( type, varname ) tracy::SharedLockable<type> varname { [] () -> const tracy::SourceLocationData* { static const tracy::SourceLocationData srcloc { nullptr, #type " " #varname, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define TracySharedLockableN( type, varname, desc ) tracy::SharedLockable<type> varname { [] () -> const tracy::SourceLocationData* { static const tracy::SourceLocationData srcloc { nullptr, desc, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define LockableBase( type ) tracy::Lockable<type>
#define SharedLockableBase( type ) tracy::SharedLockable<type>
#define LockMark( varname ) static const tracy::SourceLocation __tracy_lock_location_##varname { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; varname.Mark( &__tracy_lock_location_##varname );
#define LockMark( varname ) static const tracy::SourceLocationData __tracy_lock_location_##varname { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; varname.Mark( &__tracy_lock_location_##varname );
#define TracyPlot( name, val ) tracy::Profiler::PlotData( name, val );
#define TracyPlotConfig( name, type ) tracy::Profiler::ConfigurePlot( name, type );
#define TracyMessage( txt, size ) tracy::Profiler::Message( txt, size );
#define TracyMessageL( txt ) tracy::Profiler::Message( txt );
#define TracyAppInfo( txt, size ) tracy::Profiler::MessageAppInfo( txt, size );
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyMessage( txt, size ) tracy::Profiler::Message( txt, size, TRACY_CALLSTACK );
# define TracyMessageL( txt ) tracy::Profiler::Message( txt, TRACY_CALLSTACK );
# define TracyMessageC( txt, size, color ) tracy::Profiler::MessageColor( txt, size, color, TRACY_CALLSTACK );
# define TracyMessageLC( txt, color ) tracy::Profiler::MessageColor( txt, color, TRACY_CALLSTACK );
# define TracyAlloc( ptr, size ) tracy::Profiler::MemAllocCallstack( ptr, size, TRACY_CALLSTACK );
# define TracyFree( ptr ) tracy::Profiler::MemFreeCallstack( ptr, TRACY_CALLSTACK );
#else
# define TracyMessage( txt, size ) tracy::Profiler::Message( txt, size, 0 );
# define TracyMessageL( txt ) tracy::Profiler::Message( txt, 0 );
# define TracyMessageC( txt, size, color ) tracy::Profiler::MessageColor( txt, size, color, 0 );
# define TracyMessageLC( txt, color ) tracy::Profiler::MessageColor( txt, color, 0 );
# define TracyAlloc( ptr, size ) tracy::Profiler::MemAlloc( ptr, size );
# define TracyFree( ptr ) tracy::Profiler::MemFree( ptr );
#endif
#ifdef TRACY_HAS_CALLSTACK
# define ZoneNamedS( varname, depth, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define ZoneNamedNS( varname, name, depth, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define ZoneNamedCS( varname, color, depth, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define ZoneNamedNCS( varname, name, color, depth, active ) static const tracy::SourceLocationData TracyConcat(__tracy_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define ZoneScopedS( depth ) ZoneNamedS( ___tracy_scoped_zone, depth, true )
# define ZoneScopedNS( name, depth ) ZoneNamedNS( ___tracy_scoped_zone, name, depth, true )
# define ZoneScopedCS( color, depth ) ZoneNamedCS( ___tracy_scoped_zone, color, depth, true )
# define ZoneScopedNCS( name, color, depth ) ZoneNamedNCS( ___tracy_scoped_zone, name, color, depth, true )
# define TracyAllocS( ptr, size, depth ) tracy::Profiler::MemAllocCallstack( ptr, size, depth );
# define TracyFreeS( ptr, depth ) tracy::Profiler::MemFreeCallstack( ptr, depth );
# define TracyMessageS( txt, size, depth ) tracy::Profiler::Message( txt, size, depth );
# define TracyMessageLS( txt, depth ) tracy::Profiler::Message( txt, depth );
# define TracyMessageCS( txt, size, color, depth ) tracy::Profiler::MessageColor( txt, size, color, depth );
# define TracyMessageLCS( txt, color, depth ) tracy::Profiler::MessageColor( txt, color, depth );
#else
# define ZoneNamedS( varname, depth, active ) ZoneNamed( varname, active )
# define ZoneNamedNS( varname, name, depth, active ) ZoneNamedN( varname, name, active )
# define ZoneNamedCS( varname, color, depth, active ) ZoneNamedC( varname, color, active )
# define ZoneNamedNCS( varname, name, color, depth, active ) ZoneNamedNC( varname, name, color, active )
# define ZoneScopedS( depth ) ZoneScoped
# define ZoneScopedNS( name, depth ) ZoneScopedN( name )
# define ZoneScopedCS( color, depth ) ZoneScopedC( color )
# define ZoneScopedNCS( name, color, depth ) ZoneScopedNC( name, color )
# define TracyAllocS( ptr, size, depth ) TracyAlloc( ptr, size )
# define TracyFreeS( ptr, depth ) TracyFree( ptr )
# define TracyMessageS( txt, size, depth ) TracyMessage( txt, size )
# define TracyMessageLS( txt, depth ) TracyMessageL( txt )
# define TracyMessageCS( txt, size, color, depth ) TracyMessageC( txt, size, color )
# define TracyMessageLCS( txt, color, depth ) TracyMessageLC( txt, color )
#endif
#endif

188
TracyC.h Normal file

@@ -0,0 +1,188 @@
#ifndef __TRACYC_HPP__
#define __TRACYC_HPP__
#include <stddef.h>
#include <stdint.h>
#include "client/TracyCallstack.h"
#ifdef __cplusplus
extern "C" {
#endif
#ifndef TRACY_ENABLE
typedef const void* TracyCZoneCtx;
#define TracyCZone(c,x)
#define TracyCZoneN(c,x,y)
#define TracyCZoneC(c,x,y)
#define TracyCZoneNC(c,x,y,z)
#define TracyCZoneEnd(c)
#define TracyCZoneText(c,x,y)
#define TracyCZoneName(c,x,y)
#define TracyCAlloc(x,y)
#define TracyCFree(x)
#define TracyCFrameMark
#define TracyCFrameMarkNamed(x)
#define TracyCFrameMarkStart(x)
#define TracyCFrameMarkEnd(x)
#define TracyCFrameImage(x,y,z,w,a)
#define TracyCPlot(x,y)
#define TracyCMessage(x,y)
#define TracyCMessageL(x)
#define TracyCMessageC(x,y,z)
#define TracyCMessageLC(x,y)
#define TracyCAppInfo(x,y)
#define TracyCZoneS(x,y,z)
#define TracyCZoneNS(x,y,z,w)
#define TracyCZoneCS(x,y,z,w)
#define TracyCZoneNCS(x,y,z,w,a)
#define TracyCAllocS(x,y,z)
#define TracyCFreeS(x,y)
#define TracyCMessageS(x,y,z)
#define TracyCMessageLS(x,y)
#define TracyCMessageCS(x,y,z,w)
#define TracyCMessageLCS(x,y,z)
#else
#ifndef TracyConcat
# define TracyConcat(x,y) TracyConcatIndirect(x,y)
#endif
#ifndef TracyConcatIndirect
# define TracyConcatIndirect(x,y) x##y
#endif
struct ___tracy_source_location_data
{
const char* name;
const char* function;
const char* file;
uint32_t line;
uint32_t color;
};
struct ___tracy_c_zone_context
{
uint32_t id;
int active;
};
// Some containers don't support storing const types.
// This struct, as visible to user, is immutable, so treat it as if const was declared here.
typedef /*const*/ struct ___tracy_c_zone_context TracyCZoneCtx;
TRACY_API TracyCZoneCtx ___tracy_emit_zone_begin( const struct ___tracy_source_location_data* srcloc, int active );
TRACY_API TracyCZoneCtx ___tracy_emit_zone_begin_callstack( const struct ___tracy_source_location_data* srcloc, int depth, int active );
TRACY_API void ___tracy_emit_zone_end( TracyCZoneCtx ctx );
TRACY_API void ___tracy_emit_zone_text( TracyCZoneCtx ctx, const char* txt, size_t size );
TRACY_API void ___tracy_emit_zone_name( TracyCZoneCtx ctx, const char* txt, size_t size );
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyCZone( ctx, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyCZoneN( ctx, name, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyCZoneC( ctx, color, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyCZoneNC( ctx, name, color, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
#else
# define TracyCZone( ctx, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin( &TracyConcat(__tracy_source_location,__LINE__), active );
# define TracyCZoneN( ctx, name, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin( &TracyConcat(__tracy_source_location,__LINE__), active );
# define TracyCZoneC( ctx, color, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin( &TracyConcat(__tracy_source_location,__LINE__), active );
# define TracyCZoneNC( ctx, name, color, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin( &TracyConcat(__tracy_source_location,__LINE__), active );
#endif
#define TracyCZoneEnd( ctx ) ___tracy_emit_zone_end( ctx );
#define TracyCZoneText( ctx, txt, size ) ___tracy_emit_zone_text( ctx, txt, size );
#define TracyCZoneName( ctx, txt, size ) ___tracy_emit_zone_name( ctx, txt, size );
TRACY_API void ___tracy_emit_memory_alloc( const void* ptr, size_t size );
TRACY_API void ___tracy_emit_memory_alloc_callstack( const void* ptr, size_t size, int depth );
TRACY_API void ___tracy_emit_memory_free( const void* ptr );
TRACY_API void ___tracy_emit_memory_free_callstack( const void* ptr, int depth );
TRACY_API void ___tracy_emit_message( const char* txt, size_t size, int callstack );
TRACY_API void ___tracy_emit_messageL( const char* txt, int callstack );
TRACY_API void ___tracy_emit_messageC( const char* txt, size_t size, uint32_t color, int callstack );
TRACY_API void ___tracy_emit_messageLC( const char* txt, uint32_t color, int callstack );
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyCAlloc( ptr, size ) ___tracy_emit_memory_alloc_callstack( ptr, size, TRACY_CALLSTACK )
# define TracyCFree( ptr ) ___tracy_emit_memory_free_callstack( ptr, TRACY_CALLSTACK )
# define TracyCMessage( txt, size ) ___tracy_emit_message( txt, size, TRACY_CALLSTACK );
# define TracyCMessageL( txt ) ___tracy_emit_messageL( txt, TRACY_CALLSTACK );
# define TracyCMessageC( txt, size, color ) ___tracy_emit_messageC( txt, size, color, TRACY_CALLSTACK );
# define TracyCMessageLC( txt, color ) ___tracy_emit_messageLC( txt, color, TRACY_CALLSTACK );
#else
# define TracyCAlloc( ptr, size ) ___tracy_emit_memory_alloc( ptr, size );
# define TracyCFree( ptr ) ___tracy_emit_memory_free( ptr );
# define TracyCMessage( txt, size ) ___tracy_emit_message( txt, size, 0 );
# define TracyCMessageL( txt ) ___tracy_emit_messageL( txt, 0 );
# define TracyCMessageC( txt, size, color ) ___tracy_emit_messageC( txt, size, color, 0 );
# define TracyCMessageLC( txt, color ) ___tracy_emit_messageLC( txt, color, 0 );
#endif
TRACY_API void ___tracy_emit_frame_mark( const char* name );
TRACY_API void ___tracy_emit_frame_mark_start( const char* name );
TRACY_API void ___tracy_emit_frame_mark_end( const char* name );
TRACY_API void ___tracy_emit_frame_image( const void* image, uint16_t w, uint16_t h, uint8_t offset, int flip );
#define TracyCFrameMark ___tracy_emit_frame_mark( 0 );
#define TracyCFrameMarkNamed( name ) ___tracy_emit_frame_mark( name );
#define TracyCFrameMarkStart( name ) ___tracy_emit_frame_mark_start( name );
#define TracyCFrameMarkEnd( name ) ___tracy_emit_frame_mark_end( name );
#define TracyCFrameImage( image, width, height, offset, flip ) ___tracy_emit_frame_image( image, width, height, offset, flip );
TRACY_API void ___tracy_emit_plot( const char* name, double val );
TRACY_API void ___tracy_emit_message_appinfo( const char* txt, size_t size );
#define TracyCPlot( name, val ) ___tracy_emit_plot( name, val );
#define TracyCAppInfo( txt, size ) ___tracy_emit_message_appinfo( txt, size );
#ifdef TRACY_HAS_CALLSTACK
# define TracyCZoneS( ctx, depth, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define TracyCZoneNS( ctx, name, depth, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define TracyCZoneCS( ctx, color, depth, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define TracyCZoneNCS( ctx, name, color, depth, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define TracyCAllocS( ptr, size, depth ) ___tracy_emit_memory_alloc_callstack( ptr, size, depth )
# define TracyCFreeS( ptr, depth ) ___tracy_emit_memory_free_callstack( ptr, depth )
# define TracyCMessageS( txt, size, depth ) ___tracy_emit_message( txt, size, depth );
# define TracyCMessageLS( txt, depth ) ___tracy_emit_messageL( txt, depth );
# define TracyCMessageCS( txt, size, color, depth ) ___tracy_emit_messageC( txt, size, color, depth );
# define TracyCMessageLCS( txt, color, depth ) ___tracy_emit_messageLC( txt, color, depth );
#else
# define TracyCZoneS( ctx, depth, active ) TracyCZone( ctx, active )
# define TracyCZoneNS( ctx, name, depth, active ) TracyCZoneN( ctx, name, active )
# define TracyCZoneCS( ctx, color, depth, active ) TracyCZoneC( ctx, color, active )
# define TracyCZoneNCS( ctx, name, color, depth, active ) TracyCZoneNC( ctx, name, color, active )
# define TracyCAllocS( ptr, size, depth ) TracyCAlloc( ptr, size )
# define TracyCFreeS( ptr, depth ) TracyCFree( ptr )
# define TracyCMessageS( txt, size, depth ) TracyCMessage( txt, size )
# define TracyCMessageLS( txt, depth ) TracyCMessageL( txt )
# define TracyCMessageCS( txt, size, color, depth ) TracyCMessageC( txt, size, color )
# define TracyCMessageLCS( txt, color, depth ) TracyCMessageLC( txt, color )
#endif
#endif
#ifdef __cplusplus
}
#endif
#endif
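
Note on the depth-taking macros above: the `S` variants forward a callstack depth when TRACY_HAS_CALLSTACK is defined and otherwise collapse to the plain variants, dropping the depth argument. A minimal sketch of that pattern follows. All `Demo*` / `demo_*` names and the DEMO_HAS_CALLSTACK switch are hypothetical stand-ins, not part of the Tracy C API, and the emit function only records its arguments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ___tracy_emit_messageL: records what would be sent.
struct DemoEvent { std::string text; int callstack; };

inline std::vector<DemoEvent>& DemoLog()
{
    static std::vector<DemoEvent> log;
    return log;
}

inline void demo_emit_messageL( const char* txt, int callstack )
{
    assert( txt != nullptr );
    DemoLog().push_back( { txt, callstack } );
}

// Plain variant: no callstack requested (depth 0).
#define DemoMessageL( txt ) demo_emit_messageL( txt, 0 )

// Depth-taking variant: forwards the depth only when callstack support exists,
// otherwise falls back to the plain variant and ignores the depth.
#ifdef DEMO_HAS_CALLSTACK
#  define DemoMessageLS( txt, depth ) demo_emit_messageL( txt, depth )
#else
#  define DemoMessageLS( txt, depth ) DemoMessageL( txt )
#endif

// Emits one message through the depth-taking macro and reports the depth
// that actually reached the emit function.
inline int DemoLastDepth()
{
    DemoMessageLS( "frame start", 8 );
    return DemoLog().back().callstack;
}
```

Built without DEMO_HAS_CALLSTACK, the call site still compiles unchanged and the recorded depth is 0, so instrumented code does not need its own feature checks.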


@@ -4,7 +4,9 @@
//
// For fast integration, compile and
// link with this source file (and none
// other).
// other) in your executable (or in the
// main DLL / shared object on multi-DLL
// projects).
//
// Define TRACY_ENABLE to enable profiler.
@@ -13,13 +15,29 @@
#ifdef TRACY_ENABLE
#include "client/TracyProfiler.cpp"
#include "common/tracy_lz4.cpp"
#include "client/TracyProfiler.cpp"
#include "client/TracyCallstack.cpp"
#include "client/TracySysTime.cpp"
#include "client/TracySysTrace.cpp"
#include "common/TracySocket.cpp"
#include "client/tracy_rpmalloc.cpp"
#include "client/TracyDxt1.cpp"
#if TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 3
# include "libbacktrace/alloc.cpp"
# include "libbacktrace/dwarf.cpp"
# include "libbacktrace/elf.cpp"
# include "libbacktrace/fileline.cpp"
# include "libbacktrace/mmapio.cpp"
# include "libbacktrace/posix.cpp"
# include "libbacktrace/sort.cpp"
# include "libbacktrace/state.cpp"
#endif
#ifdef _MSC_VER
# pragma comment(lib, "ws2_32.lib")
# pragma comment(lib, "dbghelp.lib")
#endif
#endif

TracyClientDLL.cpp Normal file

@@ -0,0 +1,19 @@
//
// Tracy profiler
// ----------------
//
// On multi-DLL projects compile and
// link with this source file (and none
// other) in the executable and in
// DLLs / shared objects that link to
// the main DLL.
//
// Define TRACY_ENABLE to enable profiler.
#ifdef TRACY_ENABLE
# ifndef TRACY_IMPORTS
# define TRACY_IMPORTS 1
# endif
#endif
#include "common/TracySystem.cpp"


@@ -23,10 +23,16 @@ static inline void LuaRegister( lua_State* L )
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneBeginN" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneBeginS" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneBeginNS" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneEnd" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneText" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneName" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "Message" );
lua_setglobal( L, "tracy" );
}
@@ -67,12 +73,30 @@ static inline void LuaRemove( char* script )
memset( script, ' ', end - script );
script = end;
}
else if( strncmp( script + 10, "Name(", 5 ) == 0 )
{
auto end = FindEnd( script + 15 );
memset( script, ' ', end - script );
script = end;
}
else if( strncmp( script + 10, "BeginN(", 7 ) == 0 )
{
auto end = FindEnd( script + 17 );
memset( script, ' ', end - script );
script = end;
}
else if( strncmp( script + 10, "BeginS(", 7 ) == 0 )
{
auto end = FindEnd( script + 17 );
memset( script, ' ', end - script );
script = end;
}
else if( strncmp( script + 10, "BeginNS(", 8 ) == 0 )
{
auto end = FindEnd( script + 18 );
memset( script, ' ', end - script );
script = end;
}
else
{
script += 10;
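
Note on the LuaRemove hunk above: each recognized `tracy.Zone...(` call is overwritten with spaces up to the position returned by FindEnd, so the script buffer keeps its length. FindEnd itself is not shown in this diff. The sketch below assumes it locates the closing parenthesis of the call; `DemoFindCallEnd`, `DemoBlankCalls` and `DemoStrip` are hypothetical names, and parentheses inside Lua string literals are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index one past the ')' that closes the
// '(' at position `open`, or npos if the call is unterminated.
// Simplification: parentheses inside Lua string literals are not handled.
inline std::size_t DemoFindCallEnd( const std::string& s, std::size_t open )
{
    assert( open < s.size() && s[open] == '(' );
    int depth = 0;
    for( std::size_t i = open; i < s.size(); i++ )
    {
        if( s[i] == '(' ) depth++;
        else if( s[i] == ')' && --depth == 0 ) return i + 1;
    }
    return std::string::npos;
}

// Overwrites every `name(...)` call with spaces, in place.
// The string length never changes, so byte offsets of the remaining code stay valid.
inline void DemoBlankCalls( std::string& script, const std::string& name )
{
    const std::string needle = name + "(";
    std::size_t pos = 0;
    while( ( pos = script.find( needle, pos ) ) != std::string::npos )
    {
        const std::size_t end = DemoFindCallEnd( script, pos + name.size() );
        if( end == std::string::npos ) break;
        script.replace( pos, end - pos, end - pos, ' ' );
        pos = end;
    }
}

// Blanks two of the call names handled in the hunk above.
inline std::string DemoStrip( std::string script )
{
    DemoBlankCalls( script, "tracy.ZoneBeginN" );
    DemoBlankCalls( script, "tracy.ZoneEnd" );
    return script;
}
```

Because the needle includes the opening parenthesis, `tracy.ZoneBeginN(` does not accidentally match `tracy.ZoneBeginNS(`, which is why the hunk needs a separate branch per call name.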
@@ -100,19 +124,79 @@ static inline void LuaRemove( char* script )
#else
#include <assert.h>
#include "common/TracyColor.hpp"
#include "common/TracyAlign.hpp"
#include "common/TracyForceInline.hpp"
#include "common/TracySystem.hpp"
#include "client/TracyProfiler.hpp"
namespace tracy
{
#ifdef TRACY_ON_DEMAND
TRACY_API LuaZoneState& GetLuaZoneState();
#endif
namespace detail
{
static inline int LuaZoneBegin( lua_State* L )
#ifdef TRACY_HAS_CALLSTACK
static tracy_force_inline void SendLuaCallstack( lua_State* L, uint32_t depth )
{
const uint32_t color = Color::DeepSkyBlue3;
assert( depth <= 64 );
lua_Debug dbg[64];
const char* func[64];
uint32_t fsz[64];
uint32_t ssz[64];
uint32_t spaceNeeded = 4; // cnt
uint32_t cnt;
for( cnt=0; cnt<depth; cnt++ )
{
if( lua_getstack( L, cnt+1, dbg+cnt ) == 0 ) break;
lua_getinfo( L, "Snl", dbg+cnt );
func[cnt] = dbg[cnt].name ? dbg[cnt].name : dbg[cnt].short_src;
fsz[cnt] = uint32_t( strlen( func[cnt] ) );
ssz[cnt] = uint32_t( strlen( dbg[cnt].source ) );
spaceNeeded += fsz[cnt] + ssz[cnt];
}
spaceNeeded += cnt * ( 4 + 4 + 4 ); // source line, function string length, source string length
auto ptr = (char*)tracy_malloc( spaceNeeded + 4 );
auto dst = ptr;
memcpy( dst, &spaceNeeded, 4 ); dst += 4;
memcpy( dst, &cnt, 4 ); dst += 4;
for( uint32_t i=0; i<cnt; i++ )
{
const uint32_t line = dbg[i].currentline;
memcpy( dst, &line, 4 ); dst += 4;
memcpy( dst, fsz+i, 4 ); dst += 4;
memcpy( dst, func[i], fsz[i] ); dst += fsz[i];
memcpy( dst, ssz+i, 4 ); dst += 4;
memcpy( dst, dbg[i].source, ssz[i] ); dst += ssz[i];
}
assert( dst - ptr == spaceNeeded + 4 );
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::CallstackAlloc );
MemWrite( &item->callstackAlloc.ptr, (uint64_t)ptr );
MemWrite( &item->callstackAlloc.nativePtr, (uint64_t)Callstack( depth ) );
tail.store( magic + 1, std::memory_order_release );
}
static inline int LuaZoneBeginS( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
const auto zoneCnt = GetLuaZoneState().counter++;
if( zoneCnt != 0 && !GetLuaZoneState().active ) return 0;
GetLuaZoneState().active = GetProfiler().IsConnected();
if( !GetLuaZoneState().active ) return 0;
#endif
lua_Debug dbg;
lua_getstack( L, 1, &dbg );
@@ -131,29 +215,41 @@ static inline int LuaZoneBegin( lua_State* L )
// 1b null terminator
// ssz source file name
// 1b null terminator
const uint32_t sz = 4 + 4 + 4 + fsz + 1 + ssz + 1;
const uint32_t sz = uint32_t( 4 + 4 + 4 + fsz + 1 + ssz + 1 );
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, &sz, 4 );
memcpy( ptr + 4, &color, 4 );
memset( ptr + 4, 0, 4 );
memcpy( ptr + 8, &line, 4 );
memcpy( ptr + 12, func, fsz+1 );
memcpy( ptr + 12 + fsz + 1, dbg.source, ssz + 1 );
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::ZoneBeginAllocSrcLoc;
item->zoneBegin.time = Profiler::GetTime( item->zoneBegin.cpu );
item->zoneBegin.thread = GetThreadHandle();
item->zoneBegin.srcloc = (uint64_t)ptr;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBeginAllocSrcLocCallstack );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
#ifdef TRACY_CALLSTACK
const uint32_t depth = TRACY_CALLSTACK;
#else
const auto depth = uint32_t( lua_tointeger( L, 1 ) );
#endif
SendLuaCallstack( L, depth );
return 0;
}
static inline int LuaZoneBeginN( lua_State* L )
static inline int LuaZoneBeginNS( lua_State* L )
{
const uint32_t color = Color::DeepSkyBlue3;
#ifdef TRACY_ON_DEMAND
const auto zoneCnt = GetLuaZoneState().counter++;
if( zoneCnt != 0 && !GetLuaZoneState().active ) return 0;
GetLuaZoneState().active = GetProfiler().IsConnected();
if( !GetLuaZoneState().active ) return 0;
#endif
lua_Debug dbg;
lua_getstack( L, 1, &dbg );
@@ -175,75 +271,233 @@ static inline int LuaZoneBeginN( lua_State* L )
// ssz source file name
// 1b null terminator
// nsz zone name
const uint32_t sz = 4 + 4 + 4 + fsz + 1 + ssz + 1 + nsz;
const uint32_t sz = uint32_t( 4 + 4 + 4 + fsz + 1 + ssz + 1 + nsz );
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, &sz, 4 );
memcpy( ptr + 4, &color, 4 );
memset( ptr + 4, 0, 4 );
memcpy( ptr + 8, &line, 4 );
memcpy( ptr + 12, func, fsz+1 );
memcpy( ptr + 12 + fsz + 1, dbg.source, ssz + 1 );
memcpy( ptr + 12 + fsz + 1 + ssz + 1, name, nsz );
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::ZoneBeginAllocSrcLoc;
item->zoneBegin.time = Profiler::GetTime( item->zoneBegin.cpu );
item->zoneBegin.thread = GetThreadHandle();
item->zoneBegin.srcloc = (uint64_t)ptr;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBeginAllocSrcLocCallstack );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
#ifdef TRACY_CALLSTACK
const uint32_t depth = TRACY_CALLSTACK;
#else
const auto depth = uint32_t( lua_tointeger( L, 2 ) );
#endif
SendLuaCallstack( L, depth );
return 0;
}
#endif
static inline int LuaZoneBegin( lua_State* L )
{
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
return LuaZoneBeginS( L );
#else
#ifdef TRACY_ON_DEMAND
const auto zoneCnt = GetLuaZoneState().counter++;
if( zoneCnt != 0 && !GetLuaZoneState().active ) return 0;
GetLuaZoneState().active = GetProfiler().IsConnected();
if( !GetLuaZoneState().active ) return 0;
#endif
lua_Debug dbg;
lua_getstack( L, 1, &dbg );
lua_getinfo( L, "Snl", &dbg );
const uint32_t line = dbg.currentline;
const auto func = dbg.name ? dbg.name : dbg.short_src;
const auto fsz = strlen( func );
const auto ssz = strlen( dbg.source );
// Data layout:
// 4b payload size
// 4b color
// 4b source line
// fsz function name
// 1b null terminator
// ssz source file name
// 1b null terminator
const uint32_t sz = uint32_t( 4 + 4 + 4 + fsz + 1 + ssz + 1 );
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, &sz, 4 );
memset( ptr + 4, 0, 4 );
memcpy( ptr + 8, &line, 4 );
memcpy( ptr + 12, func, fsz+1 );
memcpy( ptr + 12 + fsz + 1, dbg.source, ssz + 1 );
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBeginAllocSrcLoc );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
return 0;
#endif
}
static inline int LuaZoneBeginN( lua_State* L )
{
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
return LuaZoneBeginNS( L );
#else
#ifdef TRACY_ON_DEMAND
const auto zoneCnt = GetLuaZoneState().counter++;
if( zoneCnt != 0 && !GetLuaZoneState().active ) return 0;
GetLuaZoneState().active = GetProfiler().IsConnected();
if( !GetLuaZoneState().active ) return 0;
#endif
lua_Debug dbg;
lua_getstack( L, 1, &dbg );
lua_getinfo( L, "Snl", &dbg );
const uint32_t line = dbg.currentline;
const auto func = dbg.name ? dbg.name : dbg.short_src;
size_t nsz;
const auto name = lua_tolstring( L, 1, &nsz );
const auto fsz = strlen( func );
const auto ssz = strlen( dbg.source );
// Data layout:
// 4b payload size
// 4b color
// 4b source line
// fsz function name
// 1b null terminator
// ssz source file name
// 1b null terminator
// nsz zone name
const uint32_t sz = uint32_t( 4 + 4 + 4 + fsz + 1 + ssz + 1 + nsz );
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, &sz, 4 );
memset( ptr + 4, 0, 4 );
memcpy( ptr + 8, &line, 4 );
memcpy( ptr + 12, func, fsz+1 );
memcpy( ptr + 12 + fsz + 1, dbg.source, ssz + 1 );
memcpy( ptr + 12 + fsz + 1 + ssz + 1, name, nsz );
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBeginAllocSrcLoc );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
return 0;
#endif
}
static inline int LuaZoneEnd( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
assert( GetLuaZoneState().counter != 0 );
GetLuaZoneState().counter--;
if( !GetLuaZoneState().active ) return 0;
if( !GetProfiler().IsConnected() )
{
GetLuaZoneState().active = false;
return 0;
}
#endif
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::ZoneEnd;
item->zoneEnd.time = Profiler::GetTime( item->zoneEnd.cpu );
item->zoneEnd.thread = GetThreadHandle();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneEnd );
MemWrite( &item->zoneEnd.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
return 0;
}
static inline int LuaZoneText( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
if( !GetLuaZoneState().active ) return 0;
if( !GetProfiler().IsConnected() )
{
GetLuaZoneState().active = false;
return 0;
}
#endif
auto txt = lua_tostring( L, 1 );
const auto size = strlen( txt );
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::ZoneText;
item->zoneText.thread = GetThreadHandle();
item->zoneText.text = (uint64_t)ptr;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneText );
MemWrite( &item->zoneText.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
return 0;
}
static inline int LuaZoneName( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
if( !GetLuaZoneState().active ) return 0;
if( !GetProfiler().IsConnected() )
{
GetLuaZoneState().active = false;
return 0;
}
#endif
auto txt = lua_tostring( L, 1 );
const auto size = strlen( txt );
Magic magic;
auto token = GetToken();
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneName );
MemWrite( &item->zoneText.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
return 0;
}
static inline int LuaMessage( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return 0;
#endif
auto txt = lua_tostring( L, 1 );
const auto size = strlen( txt );
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::Message;
item->message.time = Profiler::GetTime();
item->message.thread = GetThreadHandle();
item->message.text = (uint64_t)ptr;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::Message );
MemWrite( &item->message.time, Profiler::GetTime() );
MemWrite( &item->message.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
return 0;
}
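
Note on the "Data layout" comments in the zone-begin functions above: the source location is serialized as a 4-byte payload size, 4-byte color, 4-byte source line, then the function name and source file name, each followed by a null terminator. The sketch below packs and reads back that layout. It uses std::vector instead of tracy_malloc, and the `Demo*` functions are hypothetical helpers for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Packs: 4b payload size, 4b color, 4b source line,
// function name + NUL, source file name + NUL.
inline std::vector<char> DemoPackSrcLoc( uint32_t color, uint32_t line,
                                         const std::string& func, const std::string& source )
{
    const auto sz = uint32_t( 4 + 4 + 4 + func.size() + 1 + source.size() + 1 );
    std::vector<char> buf( sz );
    char* dst = buf.data();
    std::memcpy( dst, &sz, 4 ); dst += 4;
    std::memcpy( dst, &color, 4 ); dst += 4;
    std::memcpy( dst, &line, 4 ); dst += 4;
    std::memcpy( dst, func.c_str(), func.size() + 1 ); dst += func.size() + 1;
    std::memcpy( dst, source.c_str(), source.size() + 1 ); dst += source.size() + 1;
    assert( dst == buf.data() + sz );   // every byte of the payload was written
    return buf;
}

// Reads the source line back from its fixed offset (8 bytes in).
inline uint32_t DemoReadLine( const std::vector<char>& buf )
{
    assert( buf.size() >= 12 );
    uint32_t line;
    std::memcpy( &line, buf.data() + 8, 4 );
    return line;
}

// Reads the source file name, which starts right after the function name's NUL.
inline std::string DemoReadSource( const std::vector<char>& buf )
{
    assert( buf.size() > 12 );
    const char* func = buf.data() + 12;
    return std::string( func + std::strlen( func ) + 1 );
}
```

The null terminators are what let a reader find the second string without a stored length; the named-zone variant in the diff appends the zone name last, and its length follows from the total payload size.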
@@ -257,10 +511,23 @@ static inline void LuaRegister( lua_State* L )
lua_setfield( L, -2, "ZoneBegin" );
lua_pushcfunction( L, detail::LuaZoneBeginN );
lua_setfield( L, -2, "ZoneBeginN" );
#ifdef TRACY_HAS_CALLSTACK
lua_pushcfunction( L, detail::LuaZoneBeginS );
lua_setfield( L, -2, "ZoneBeginS" );
lua_pushcfunction( L, detail::LuaZoneBeginNS );
lua_setfield( L, -2, "ZoneBeginNS" );
#else
lua_pushcfunction( L, detail::LuaZoneBegin );
lua_setfield( L, -2, "ZoneBeginS" );
lua_pushcfunction( L, detail::LuaZoneBeginN );
lua_setfield( L, -2, "ZoneBeginNS" );
#endif
lua_pushcfunction( L, detail::LuaZoneEnd );
lua_setfield( L, -2, "ZoneEnd" );
lua_pushcfunction( L, detail::LuaZoneText );
lua_setfield( L, -2, "ZoneText" );
lua_pushcfunction( L, detail::LuaZoneName );
lua_setfield( L, -2, "ZoneName" );
lua_pushcfunction( L, detail::LuaMessage );
lua_setfield( L, -2, "Message" );
lua_setglobal( L, "tracy" );


@@ -6,27 +6,76 @@
#if !defined TRACY_ENABLE || defined __APPLE__
#define TracyGpuContext
#define TracyGpuNamedZone(x,y)
#define TracyGpuNamedZoneC(x,y,z)
#define TracyGpuZone(x)
#define TracyGpuZoneC(x,y)
#define TracyGpuCollect
#else
#include <atomic>
#include "Tracy.hpp"
#include "client/TracyProfiler.hpp"
#include "common/TracyAlloc.hpp"
#define TracyGpuContext tracy::s_gpuCtx.ptr = (tracy::GpuCtx*)tracy::tracy_malloc( sizeof( tracy::GpuCtx ) ); new(tracy::s_gpuCtx.ptr) tracy::GpuCtx;
#define TracyGpuZone( name ) static const tracy::SourceLocation __tracy_gpu_source_location { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope ___tracy_gpu_zone( &__tracy_gpu_source_location );
#define TracyGpuZoneC( name, color ) static const tracy::SourceLocation __tracy_gpu_source_location { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope ___tracy_gpu_zone( &__tracy_gpu_source_location );
#define TracyGpuCollect tracy::s_gpuCtx.ptr->Collect();
#define TracyGpuNamedZoneS(x,y,z)
#define TracyGpuNamedZoneCS(x,y,z,w)
#define TracyGpuZoneS(x,y)
#define TracyGpuZoneCS(x,y,z)
namespace tracy
{
struct SourceLocationData;
class GpuCtxScope
{
public:
GpuCtxScope( const SourceLocationData* ) {}
GpuCtxScope( const SourceLocationData*, int depth ) {}
};
}
extern std::atomic<uint16_t> s_gpuCtxCounter;
#else
#include <atomic>
#include <assert.h>
#include <stdlib.h>
#include "Tracy.hpp"
#include "client/TracyProfiler.hpp"
#include "client/TracyCallstack.hpp"
#include "common/TracyAlign.hpp"
#include "common/TracyAlloc.hpp"
#if !defined GL_TIMESTAMP && defined GL_TIMESTAMP_EXT
# define GL_TIMESTAMP GL_TIMESTAMP_EXT
# define GL_QUERY_COUNTER_BITS GL_QUERY_COUNTER_BITS_EXT
# define glGetQueryObjectiv glGetQueryObjectivEXT
# define glGetQueryObjectui64v glGetQueryObjectui64vEXT
# define glQueryCounter glQueryCounterEXT
#endif
#define TracyGpuContext tracy::GetGpuCtx().ptr = (tracy::GpuCtx*)tracy::tracy_malloc( sizeof( tracy::GpuCtx ) ); new(tracy::GetGpuCtx().ptr) tracy::GpuCtx;
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyGpuNamedZone( varname, name ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), TRACY_CALLSTACK );
# define TracyGpuNamedZoneC( varname, name, color ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), TRACY_CALLSTACK );
# define TracyGpuZone( name ) TracyGpuNamedZoneS( ___tracy_gpu_zone, name, TRACY_CALLSTACK )
# define TracyGpuZoneC( name, color ) TracyGpuNamedZoneCS( ___tracy_gpu_zone, name, color, TRACY_CALLSTACK )
#else
# define TracyGpuNamedZone( varname, name ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__) );
# define TracyGpuNamedZoneC( varname, name, color ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__) );
# define TracyGpuZone( name ) TracyGpuNamedZone( ___tracy_gpu_zone, name )
# define TracyGpuZoneC( name, color ) TracyGpuNamedZoneC( ___tracy_gpu_zone, name, color )
#endif
#define TracyGpuCollect tracy::GetGpuCtx().ptr->Collect();
#ifdef TRACY_HAS_CALLSTACK
# define TracyGpuNamedZoneS( varname, name, depth ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), depth );
# define TracyGpuNamedZoneCS( varname, name, color, depth ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), depth );
# define TracyGpuZoneS( name, depth ) TracyGpuNamedZoneS( ___tracy_gpu_zone, name, depth )
# define TracyGpuZoneCS( name, color, depth ) TracyGpuNamedZoneCS( ___tracy_gpu_zone, name, color, depth )
#else
# define TracyGpuNamedZoneS( varname, name, depth ) TracyGpuNamedZone( varname, name )
# define TracyGpuNamedZoneCS( varname, name, color, depth ) TracyGpuNamedZoneC( varname, name, color )
# define TracyGpuZoneS( name, depth ) TracyGpuZone( name )
# define TracyGpuZoneCS( name, color, depth ) TracyGpuZoneC( name, color )
#endif
namespace tracy
{
class GpuCtx
{
@@ -36,10 +85,12 @@ class GpuCtx
public:
GpuCtx()
: m_context( s_gpuCtxCounter.fetch_add( 1, std::memory_order_relaxed ) )
: m_context( GetGpuCtxCounter().fetch_add( 1, std::memory_order_relaxed ) )
, m_head( 0 )
, m_tail( 0 )
{
assert( m_context != 255 );
glGenQueries( QueryCount, m_query );
int64_t tgpu;
@@ -49,16 +100,24 @@ public:
GLint bits;
glGetQueryiv( GL_TIMESTAMP, GL_QUERY_COUNTER_BITS, &bits );
const float period = 1.f;
Magic magic;
auto& token = s_token.ptr;
const auto thread = GetThreadHandle();
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::GpuNewContext;
item->gpuNewContext.cpuTime = tcpu;
item->gpuNewContext.gpuTime = tgpu;
item->gpuNewContext.thread = GetThreadHandle();
item->gpuNewContext.context = m_context;
item->gpuNewContext.accuracyBits = bits;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::GpuNewContext );
MemWrite( &item->gpuNewContext.cpuTime, tcpu );
MemWrite( &item->gpuNewContext.gpuTime, tgpu );
MemWrite( &item->gpuNewContext.thread, thread );
MemWrite( &item->gpuNewContext.period, period );
MemWrite( &item->gpuNewContext.context, m_context );
MemWrite( &item->gpuNewContext.accuracyBits, (uint8_t)bits );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
}
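
Note on the switch from direct field assignment to MemWrite in the hunks above: MemWrite is not defined in this diff. The sketch below assumes it is a memcpy-based store, which stays well defined when the destination field is not naturally aligned (for example in a tightly packed queue item, also an assumption here). `DemoMemWrite`, `DemoMemRead` and `DemoRoundTrip` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for MemWrite: copies the value byte-wise,
// so the destination address does not need to be aligned for T.
template<typename T>
inline void DemoMemWrite( void* ptr, T val )
{
    std::memcpy( ptr, &val, sizeof( T ) );
}

// Matching byte-wise load.
template<typename T>
inline T DemoMemRead( const void* ptr )
{
    T val;
    std::memcpy( &val, ptr, sizeof( T ) );
    return val;
}

// Writes a 1-byte type tag followed immediately by an 8-byte timestamp,
// mimicking a packed item whose second field sits at an odd offset.
inline int64_t DemoRoundTrip( int64_t time )
{
    unsigned char item[1 + sizeof( int64_t )] = {};
    DemoMemWrite( item, uint8_t( 7 ) );
    DemoMemWrite( item + 1, time );   // offset 1: not 8-byte aligned
    assert( DemoMemRead<uint8_t>( item ) == 7 );
    return DemoMemRead<int64_t>( item + 1 );
}
```

Compilers typically lower a fixed-size memcpy like this to a plain store on targets that allow unaligned access, so the change should cost little where the old assignment already worked.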
@@ -66,56 +125,38 @@ public:
{
ZoneScopedC( Color::Red4 );
auto start = m_tail;
auto end = m_head + QueryCount;
auto cnt = ( end - start ) % QueryCount;
while( cnt > 1 )
{
auto mid = start + cnt / 2;
GLint available;
glGetQueryObjectiv( m_query[mid % QueryCount], GL_QUERY_RESULT_AVAILABLE, &available );
if( available )
{
start = mid;
}
else
{
end = mid;
}
cnt = ( end - start ) % QueryCount;
}
if( m_tail == m_head ) return;
start %= QueryCount;
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() )
{
m_head = m_tail = 0;
return;
}
#endif
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
while( m_tail != start )
while( m_tail != m_head )
{
GLint available;
glGetQueryObjectiv( m_query[m_tail], GL_QUERY_RESULT_AVAILABLE, &available );
if( !available ) return;
uint64_t time;
glGetQueryObjectui64v( m_query[m_tail], GL_QUERY_RESULT, &time );
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::GpuTime;
item->gpuTime.gpuTime = (int64_t)time;
item->gpuTime.context = m_context;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::GpuTime );
MemWrite( &item->gpuTime.gpuTime, (int64_t)time );
MemWrite( &item->gpuTime.queryId, (uint16_t)m_tail );
MemWrite( &item->gpuTime.context, m_context );
tail.store( magic + 1, std::memory_order_release );
m_tail = ( m_tail + 1 ) % QueryCount;
}
{
int64_t tgpu;
glGetInteger64v( GL_TIMESTAMP, &tgpu );
int64_t tcpu = Profiler::GetTime();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::GpuResync;
item->gpuResync.cpuTime = tcpu;
item->gpuResync.gpuTime = tgpu;
item->gpuResync.context = m_context;
tail.store( magic + 1, std::memory_order_release );
}
}
private:
@@ -124,54 +165,104 @@ private:
const auto id = m_head;
m_head = ( m_head + 1 ) % QueryCount;
assert( m_head != m_tail );
return id;
}
tracy_force_inline unsigned int TranslateOpenGlQueryId( unsigned int id )
{
return m_query[id];
}
tracy_force_inline uint16_t GetId() const
tracy_force_inline uint8_t GetId() const
{
return m_context;
}
unsigned int m_query[QueryCount];
uint16_t m_context;
uint8_t m_context;
unsigned int m_head;
unsigned int m_tail;
};
extern thread_local GpuCtxWrapper s_gpuCtx;
class GpuCtxScope
{
public:
tracy_force_inline GpuCtxScope( const SourceLocation* srcloc )
tracy_force_inline GpuCtxScope( const SourceLocationData* srcloc )
#ifdef TRACY_ON_DEMAND
: m_active( GetProfiler().IsConnected() )
#endif
{
glQueryCounter( s_gpuCtx.ptr->NextQueryId(), GL_TIMESTAMP );
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto queryId = GetGpuCtx().ptr->NextQueryId();
glQueryCounter( GetGpuCtx().ptr->TranslateOpenGlQueryId( queryId ), GL_TIMESTAMP );
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::GpuZoneBegin;
item->gpuZoneBegin.cpuTime = Profiler::GetTime();
item->gpuZoneBegin.srcloc = (uint64_t)srcloc;
item->gpuZoneBegin.context = s_gpuCtx.ptr->GetId();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::GpuZoneBegin );
MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
memset( &item->gpuZoneBegin.thread, 0, sizeof( item->gpuZoneBegin.thread ) );
MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneBegin.context, GetGpuCtx().ptr->GetId() );
tail.store( magic + 1, std::memory_order_release );
}
tracy_force_inline GpuCtxScope( const SourceLocationData* srcloc, int depth )
#ifdef TRACY_ON_DEMAND
: m_active( GetProfiler().IsConnected() )
#endif
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto queryId = GetGpuCtx().ptr->NextQueryId();
glQueryCounter( GetGpuCtx().ptr->TranslateOpenGlQueryId( queryId ), GL_TIMESTAMP );
Magic magic;
const auto thread = GetThreadHandle();
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::GpuZoneBeginCallstack );
MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
MemWrite( &item->gpuZoneBegin.thread, thread );
MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneBegin.context, GetGpuCtx().ptr->GetId() );
tail.store( magic + 1, std::memory_order_release );
GetProfiler().SendCallstack( depth );
}
tracy_force_inline ~GpuCtxScope()
{
glQueryCounter( s_gpuCtx.ptr->NextQueryId(), GL_TIMESTAMP );
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto queryId = GetGpuCtx().ptr->NextQueryId();
glQueryCounter( GetGpuCtx().ptr->TranslateOpenGlQueryId( queryId ), GL_TIMESTAMP );
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::GpuZoneEnd;
item->gpuZoneEnd.cpuTime = Profiler::GetTime();
item->gpuZoneEnd.context = s_gpuCtx.ptr->GetId();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::GpuZoneEnd );
MemWrite( &item->gpuZoneEnd.cpuTime, Profiler::GetTime() );
memset( &item->gpuZoneEnd.thread, 0, sizeof( item->gpuZoneEnd.thread ) );
MemWrite( &item->gpuZoneEnd.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneEnd.context, GetGpuCtx().ptr->GetId() );
tail.store( magic + 1, std::memory_order_release );
}
private:
#ifdef TRACY_ON_DEMAND
const bool m_active;
#endif
};
}
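
Note on the query bookkeeping in the OpenGL diff above: NextQueryId hands out slots from a fixed ring by advancing m_head, and the rewritten Collect drains from m_tail in order, returning at the first query whose result is not yet available. The sketch below models only that bookkeeping. `DemoQueryRing`, `MarkReady`, the ring size of 8 and the ready flags standing in for GL_QUERY_RESULT_AVAILABLE are all hypothetical.

```cpp
#include <cassert>
#include <vector>

class DemoQueryRing
{
    enum { QueryCount = 8 };   // small hypothetical size for illustration

public:
    // Hands out the next slot and advances the head, as NextQueryId does.
    unsigned int NextQueryId()
    {
        const auto id = m_head;
        m_head = ( m_head + 1 ) % QueryCount;
        assert( m_head != m_tail );   // ring overflow: too many queries in flight
        m_ready[id] = false;
        return id;
    }

    // Stand-in for the GPU finishing a query.
    void MarkReady( unsigned int id ) { m_ready[id] = true; }

    // Drains finished queries in submission order and stops at the first
    // one that is not ready, leaving it (and everything after it) queued.
    std::vector<unsigned int> Collect()
    {
        std::vector<unsigned int> out;
        while( m_tail != m_head )
        {
            if( !m_ready[m_tail] ) break;
            out.push_back( m_tail );
            m_tail = ( m_tail + 1 ) % QueryCount;
        }
        return out;
    }

private:
    bool m_ready[QueryCount] = {};
    unsigned int m_head = 0;
    unsigned int m_tail = 0;
};
```

If three slots are taken and only the first and third are marked ready, Collect returns just the first; the third is reported on a later call, once the second has completed. The linear scan replaces the binary search over the ring that the old Collect performed.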

TracyVulkan.hpp Normal file

@@ -0,0 +1,320 @@
#ifndef __TRACYVULKAN_HPP__
#define __TRACYVULKAN_HPP__
#if !defined TRACY_ENABLE
#define TracyVkContext(x,y,z,w) nullptr
#define TracyVkDestroy(x)
#define TracyVkNamedZone(c,x,y,z)
#define TracyVkNamedZoneC(c,x,y,z,w)
#define TracyVkZone(c,x,y)
#define TracyVkZoneC(c,x,y,z)
#define TracyVkCollect(c,x)
#define TracyVkNamedZoneS(c,x,y,z,w)
#define TracyVkNamedZoneCS(c,x,y,z,w,v)
#define TracyVkZoneS(c,x,y,z)
#define TracyVkZoneCS(c,x,y,z,w)
namespace tracy
{
class VkCtxScope {};
}
using TracyVkCtx = void*;
#else
#include <assert.h>
#include <stdlib.h>
#include <vulkan/vulkan.h>
#include "Tracy.hpp"
#include "client/TracyProfiler.hpp"
#include "client/TracyCallstack.hpp"
namespace tracy
{
class VkCtx
{
friend class VkCtxScope;
enum { QueryCount = 64 * 1024 };
public:
VkCtx( VkPhysicalDevice physdev, VkDevice device, VkQueue queue, VkCommandBuffer cmdbuf )
: m_device( device )
, m_context( GetGpuCtxCounter().fetch_add( 1, std::memory_order_relaxed ) )
, m_head( 0 )
, m_tail( 0 )
, m_oldCnt( 0 )
, m_queryCount( QueryCount )
{
assert( m_context != 255 );
VkPhysicalDeviceProperties prop;
vkGetPhysicalDeviceProperties( physdev, &prop );
const float period = prop.limits.timestampPeriod;
VkQueryPoolCreateInfo poolInfo = {};
poolInfo.sType = VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO;
poolInfo.queryCount = m_queryCount;
poolInfo.queryType = VK_QUERY_TYPE_TIMESTAMP;
while( vkCreateQueryPool( device, &poolInfo, nullptr, &m_query ) != VK_SUCCESS )
{
m_queryCount /= 2;
poolInfo.queryCount = m_queryCount;
}
VkCommandBufferBeginInfo beginInfo = {};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
VkSubmitInfo submitInfo = {};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &cmdbuf;
vkBeginCommandBuffer( cmdbuf, &beginInfo );
vkCmdResetQueryPool( cmdbuf, m_query, 0, m_queryCount );
vkEndCommandBuffer( cmdbuf );
vkQueueSubmit( queue, 1, &submitInfo, VK_NULL_HANDLE );
vkQueueWaitIdle( queue );
vkBeginCommandBuffer( cmdbuf, &beginInfo );
vkCmdWriteTimestamp( cmdbuf, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, m_query, 0 );
vkEndCommandBuffer( cmdbuf );
vkQueueSubmit( queue, 1, &submitInfo, VK_NULL_HANDLE );
vkQueueWaitIdle( queue );
int64_t tcpu = Profiler::GetTime();
int64_t tgpu;
vkGetQueryPoolResults( device, m_query, 0, 1, sizeof( tgpu ), &tgpu, sizeof( tgpu ), VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT );
vkBeginCommandBuffer( cmdbuf, &beginInfo );
vkCmdResetQueryPool( cmdbuf, m_query, 0, 1 );
vkEndCommandBuffer( cmdbuf );
vkQueueSubmit( queue, 1, &submitInfo, VK_NULL_HANDLE );
vkQueueWaitIdle( queue );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuNewContext );
MemWrite( &item->gpuNewContext.cpuTime, tcpu );
MemWrite( &item->gpuNewContext.gpuTime, tgpu );
memset( &item->gpuNewContext.thread, 0, sizeof( item->gpuNewContext.thread ) );
MemWrite( &item->gpuNewContext.period, period );
MemWrite( &item->gpuNewContext.context, m_context );
MemWrite( &item->gpuNewContext.accuracyBits, uint8_t( 0 ) );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
Profiler::QueueSerialFinish();
m_res = (int64_t*)tracy_malloc( sizeof( int64_t ) * m_queryCount );
}
~VkCtx()
{
tracy_free( m_res );
vkDestroyQueryPool( m_device, m_query, nullptr );
}
void Collect( VkCommandBuffer cmdbuf )
{
ZoneScopedC( Color::Red4 );
if( m_tail == m_head ) return;
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() )
{
vkCmdResetQueryPool( cmdbuf, m_query, 0, m_queryCount );
m_head = m_tail = 0;
return;
}
#endif
unsigned int cnt;
if( m_oldCnt != 0 )
{
cnt = m_oldCnt;
m_oldCnt = 0;
}
else
{
cnt = m_head < m_tail ? m_queryCount - m_tail : m_head - m_tail;
}
if( vkGetQueryPoolResults( m_device, m_query, m_tail, cnt, sizeof( int64_t ) * m_queryCount, m_res, sizeof( int64_t ), VK_QUERY_RESULT_64_BIT ) == VK_NOT_READY )
{
m_oldCnt = cnt;
return;
}
for( unsigned int idx=0; idx<cnt; idx++ )
{
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuTime );
MemWrite( &item->gpuTime.gpuTime, m_res[idx] );
MemWrite( &item->gpuTime.queryId, uint16_t( m_tail + idx ) );
MemWrite( &item->gpuTime.context, m_context );
Profiler::QueueSerialFinish();
}
vkCmdResetQueryPool( cmdbuf, m_query, m_tail, cnt );
m_tail += cnt;
if( m_tail == m_queryCount ) m_tail = 0;
}
private:
tracy_force_inline unsigned int NextQueryId()
{
const auto id = m_head;
m_head = ( m_head + 1 ) % m_queryCount;
assert( m_head != m_tail );
return id;
}
tracy_force_inline uint8_t GetId() const
{
return m_context;
}
VkDevice m_device;
VkQueryPool m_query;
uint8_t m_context;
unsigned int m_head;
unsigned int m_tail;
unsigned int m_oldCnt;
unsigned int m_queryCount;
int64_t* m_res;
};
class VkCtxScope
{
public:
tracy_force_inline VkCtxScope( VkCtx* ctx, const SourceLocationData* srcloc, VkCommandBuffer cmdbuf )
: m_cmdbuf( cmdbuf )
, m_ctx( ctx )
#ifdef TRACY_ON_DEMAND
, m_active( GetProfiler().IsConnected() )
#endif
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto queryId = ctx->NextQueryId();
vkCmdWriteTimestamp( cmdbuf, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, ctx->m_query, queryId );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuZoneBeginSerial );
MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
MemWrite( &item->gpuZoneBegin.thread, GetThreadHandle() );
MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneBegin.context, ctx->GetId() );
Profiler::QueueSerialFinish();
}
tracy_force_inline VkCtxScope( VkCtx* ctx, const SourceLocationData* srcloc, VkCommandBuffer cmdbuf, int depth )
: m_cmdbuf( cmdbuf )
, m_ctx( ctx )
#ifdef TRACY_ON_DEMAND
, m_active( GetProfiler().IsConnected() )
#endif
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto queryId = ctx->NextQueryId();
vkCmdWriteTimestamp( cmdbuf, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, ctx->m_query, queryId );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuZoneBeginCallstackSerial );
MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
MemWrite( &item->gpuZoneBegin.thread, GetThreadHandle() );
MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneBegin.context, ctx->GetId() );
Profiler::QueueSerialFinish();
GetProfiler().SendCallstack( depth );
}
tracy_force_inline ~VkCtxScope()
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto queryId = m_ctx->NextQueryId();
vkCmdWriteTimestamp( m_cmdbuf, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, m_ctx->m_query, queryId );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuZoneEndSerial );
MemWrite( &item->gpuZoneEnd.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneEnd.thread, GetThreadHandle() );
MemWrite( &item->gpuZoneEnd.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneEnd.context, m_ctx->GetId() );
Profiler::QueueSerialFinish();
}
private:
VkCommandBuffer m_cmdbuf;
VkCtx* m_ctx;
#ifdef TRACY_ON_DEMAND
const bool m_active;
#endif
};
static inline VkCtx* CreateVkContext( VkPhysicalDevice physdev, VkDevice device, VkQueue queue, VkCommandBuffer cmdbuf )
{
auto ctx = (VkCtx*)tracy_malloc( sizeof( VkCtx ) );
new(ctx) VkCtx( physdev, device, queue, cmdbuf );
return ctx;
}
static inline void DestroyVkContext( VkCtx* ctx )
{
ctx->~VkCtx();
tracy_free( ctx );
}
}
using TracyVkCtx = tracy::VkCtx*;
#define TracyVkContext( physdev, device, queue, cmdbuf ) tracy::CreateVkContext( physdev, device, queue, cmdbuf );
#define TracyVkDestroy( ctx ) tracy::DestroyVkContext( ctx );
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyVkNamedZone( ctx, varname, cmdbuf, name ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, TRACY_CALLSTACK );
# define TracyVkNamedZoneC( ctx, varname, cmdbuf, name, color ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, TRACY_CALLSTACK );
# define TracyVkZone( ctx, cmdbuf, name ) TracyVkNamedZoneS( ctx, ___tracy_gpu_zone, cmdbuf, name, TRACY_CALLSTACK )
# define TracyVkZoneC( ctx, cmdbuf, name, color ) TracyVkNamedZoneCS( ctx, ___tracy_gpu_zone, cmdbuf, name, color, TRACY_CALLSTACK )
#else
# define TracyVkNamedZone( ctx, varname, cmdbuf, name ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf );
# define TracyVkNamedZoneC( ctx, varname, cmdbuf, name, color ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf );
# define TracyVkZone( ctx, cmdbuf, name ) TracyVkNamedZone( ctx, ___tracy_gpu_zone, cmdbuf, name )
# define TracyVkZoneC( ctx, cmdbuf, name, color ) TracyVkNamedZoneC( ctx, ___tracy_gpu_zone, cmdbuf, name, color )
#endif
#define TracyVkCollect( ctx, cmdbuf ) ctx->Collect( cmdbuf );
#ifdef TRACY_HAS_CALLSTACK
# define TracyVkNamedZoneS( ctx, varname, cmdbuf, name, depth ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, depth );
# define TracyVkNamedZoneCS( ctx, varname, cmdbuf, name, color, depth ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, depth );
# define TracyVkZoneS( ctx, cmdbuf, name, depth ) TracyVkNamedZoneS( ctx, ___tracy_gpu_zone, cmdbuf, name, depth )
# define TracyVkZoneCS( ctx, cmdbuf, name, color, depth ) TracyVkNamedZoneCS( ctx, ___tracy_gpu_zone, cmdbuf, name, color, depth )
#else
# define TracyVkNamedZoneS( ctx, varname, cmdbuf, name, depth ) TracyVkNamedZone( ctx, varname, cmdbuf, name )
# define TracyVkNamedZoneCS( ctx, varname, cmdbuf, name, color, depth ) TracyVkNamedZoneC( ctx, varname, cmdbuf, name, color )
# define TracyVkZoneS( ctx, cmdbuf, name, depth ) TracyVkZone( ctx, cmdbuf, name )
# define TracyVkZoneCS( ctx, cmdbuf, name, color, depth ) TracyVkZoneC( ctx, cmdbuf, name, color )
#endif
#endif
#endif
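
A note on the query-slot bookkeeping in `VkCtx` above: `NextQueryId()` and `Collect()` treat the timestamp query pool as a ring buffer, and `Collect()` only reads one contiguous run of slots per call. The following is a minimal standalone sketch of just that index arithmetic, with no Vulkan calls. `QueryRing`, `Next` and `Drain` are hypothetical names used for illustration and are not part of Tracy; the `m_oldCnt` retry path for results that are not ready yet is left out.

```cpp
#include <cassert>

// Hypothetical stand-in for the head/tail bookkeeping in tracy::VkCtx.
// It models only the index arithmetic; no queries are written or read.
struct QueryRing
{
    unsigned int head = 0;   // next slot to hand out
    unsigned int tail = 0;   // oldest slot not yet collected
    unsigned int capacity;   // plays the role of m_queryCount

    explicit QueryRing( unsigned int cap ) : capacity( cap ) {}

    // Mirrors NextQueryId(): hand out the current head, then advance with wrap-around.
    unsigned int Next()
    {
        const unsigned int id = head;
        head = ( head + 1 ) % capacity;
        assert( head != tail );   // fires when the ring has been overrun
        return id;
    }

    // Mirrors the range selection in Collect(): returns how many contiguous
    // slots starting at the old tail were drained, and moves tail past them.
    // If head has wrapped around, only the run up to the end of the pool is
    // taken; the remainder is picked up by the next call.
    unsigned int Drain()
    {
        if( tail == head ) return 0;
        const unsigned int cnt = head < tail ? capacity - tail : head - tail;
        tail += cnt;
        if( tail == capacity ) tail = 0;
        return cnt;
    }
};
```

With a capacity of 8, handing out six slots and draining gives a run of 6. After three more slots (ids 6, 7 and 0) the head has wrapped, so the next two drains return 2 and then 1. This is why a single `TracyVkCollect` call may leave some finished queries for the following call.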

@@ -0,0 +1,60 @@
CFLAGS +=
CXXFLAGS := $(CFLAGS) -std=gnu++17
DEFINES += -DTRACY_NO_STATISTICS
INCLUDES :=
LIBS := -lpthread
PROJECT := capture
IMAGE := $(PROJECT)-$(BUILD)
FILTER :=
BASE := $(shell egrep 'ClCompile.*cpp"' ../win32/$(PROJECT).vcxproj | sed -e 's/.*\"\(.*\)\".*/\1/' | sed -e 's@\\@/@g')
BASE2 := $(shell egrep 'ClCompile.*c"' ../win32/$(PROJECT).vcxproj | sed -e 's/.*\"\(.*\)\".*/\1/' | sed -e 's@\\@/@g')
SRC := $(filter-out $(FILTER),$(BASE))
SRC2 := $(filter-out $(FILTER),$(BASE2))
TBB := $(shell ld -ltbb -o /dev/null 2>/dev/null; echo $$?)
ifeq ($(TBB),0)
LIBS += -ltbb
endif
OBJDIRBASE := obj/$(BUILD)
OBJDIR := $(OBJDIRBASE)/o/o/o
OBJ := $(addprefix $(OBJDIR)/,$(SRC:%.cpp=%.o))
OBJ2 := $(addprefix $(OBJDIR)/,$(SRC2:%.c=%.o))
all: $(IMAGE)
$(OBJDIR)/%.o: %.cpp
$(CXX) -c $(INCLUDES) $(CXXFLAGS) $(DEFINES) $< -o $@
$(OBJDIR)/%.d : %.cpp
@echo Resolving dependencies of $<
@mkdir -p $(@D)
@$(CXX) -MM $(INCLUDES) $(CXXFLAGS) $(DEFINES) $< > $@.$$$$; \
sed 's,.*\.o[ :]*,$(OBJDIR)/$(<:.cpp=.o) $@ : ,g' < $@.$$$$ > $@; \
rm -f $@.$$$$
$(OBJDIR)/%.o: %.c
$(CC) -c $(INCLUDES) $(CFLAGS) $(DEFINES) $< -o $@
$(OBJDIR)/%.d : %.c
@echo Resolving dependencies of $<
@mkdir -p $(@D)
@$(CC) -MM $(INCLUDES) $(CFLAGS) $(DEFINES) $< > $@.$$$$; \
sed 's,.*\.o[ :]*,$(OBJDIR)/$(<:.c=.o) $@ : ,g' < $@.$$$$ > $@; \
rm -f $@.$$$$
$(IMAGE): $(OBJ) $(OBJ2)
$(CXX) $(CXXFLAGS) $(DEFINES) $(OBJ) $(OBJ2) $(LIBS) -o $@
ifneq "$(MAKECMDGOALS)" "clean"
-include $(addprefix $(OBJDIR)/,$(SRC:.cpp=.d)) $(addprefix $(OBJDIR)/,$(SRC2:.c=.d))
endif
clean:
rm -rf $(OBJDIRBASE) $(IMAGE)*
.PHONY: clean all

@@ -2,6 +2,7 @@ ARCH := $(shell uname -m)
CFLAGS := -g3 -Wall
DEFINES := -DDEBUG
BUILD := debug
ifeq ($(ARCH),x86_64)
CFLAGS += -msse4.1

@@ -2,6 +2,7 @@ ARCH := $(shell uname -m)
CFLAGS := -O3 -s -fomit-frame-pointer
DEFINES := -DNDEBUG
BUILD := release
ifeq ($(ARCH),x86_64)
CFLAGS += -msse4.1

@@ -0,0 +1,25 @@
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 15
VisualStudioVersion = 15.0.27428.2002
MinimumVisualStudioVersion = 10.0.40219.1
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "capture", "capture.vcxproj", "{447D58BF-94CD-4469-BB90-549C05D03E00}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|x64 = Debug|x64
Release|x64 = Release|x64
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{447D58BF-94CD-4469-BB90-549C05D03E00}.Debug|x64.ActiveCfg = Debug|x64
{447D58BF-94CD-4469-BB90-549C05D03E00}.Debug|x64.Build.0 = Debug|x64
{447D58BF-94CD-4469-BB90-549C05D03E00}.Release|x64.ActiveCfg = Release|x64
{447D58BF-94CD-4469-BB90-549C05D03E00}.Release|x64.Build.0 = Release|x64
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {3E51386C-43EA-44AC-9F24-AFAFE4D63ADE}
EndGlobalSection
EndGlobal

@@ -0,0 +1,174 @@
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Label="ProjectConfigurations">
<ProjectConfiguration Include="Debug|Win32">
<Configuration>Debug</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|Win32">
<Configuration>Release</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Debug|x64">
<Configuration>Debug</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|x64">
<Configuration>Release</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<PropertyGroup Label="Globals">
<VCProjectVersion>15.0</VCProjectVersion>
<ProjectGuid>{447D58BF-94CD-4469-BB90-549C05D03E00}</ProjectGuid>
<RootNamespace>capture</RootNamespace>
<WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
</ImportGroup>
<ImportGroup Label="Shared">
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup />
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<SDLCheck>true</SDLCheck>
<ConformanceMode>true</ConformanceMode>
</ClCompile>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<SDLCheck>true</SDLCheck>
<ConformanceMode>true</ConformanceMode>
<MultiProcessorCompilation>true</MultiProcessorCompilation>
<PreprocessorDefinitions>TRACY_NO_STATISTICS;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;WIN32_LEAN_AND_MEAN;NOMINMAX;_USE_MATH_DEFINES;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<EnableEnhancedInstructionSet>AdvancedVectorExtensions2</EnableEnhancedInstructionSet>
<LanguageStandard>stdcpplatest</LanguageStandard>
</ClCompile>
<Link>
<AdditionalDependencies>ws2_32.lib;%(AdditionalDependencies)</AdditionalDependencies>
<SubSystem>Console</SubSystem>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<SDLCheck>true</SDLCheck>
<ConformanceMode>true</ConformanceMode>
</ClCompile>
<Link>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<SDLCheck>true</SDLCheck>
<ConformanceMode>true</ConformanceMode>
<MultiProcessorCompilation>true</MultiProcessorCompilation>
<PreprocessorDefinitions>TRACY_NO_STATISTICS;NDEBUG;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;WIN32_LEAN_AND_MEAN;NOMINMAX;_USE_MATH_DEFINES;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<EnableEnhancedInstructionSet>AdvancedVectorExtensions2</EnableEnhancedInstructionSet>
<LanguageStandard>stdcpplatest</LanguageStandard>
</ClCompile>
<Link>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<AdditionalDependencies>ws2_32.lib;%(AdditionalDependencies)</AdditionalDependencies>
<SubSystem>Console</SubSystem>
</Link>
</ItemDefinitionGroup>
<ItemGroup>
<ClCompile Include="..\..\..\common\TracySocket.cpp" />
<ClCompile Include="..\..\..\common\TracySystem.cpp" />
<ClCompile Include="..\..\..\common\tracy_lz4.cpp" />
<ClCompile Include="..\..\..\common\tracy_lz4hc.cpp" />
<ClCompile Include="..\..\..\server\TracyMemory.cpp" />
<ClCompile Include="..\..\..\server\TracyPrint.cpp" />
<ClCompile Include="..\..\..\server\TracyTaskDispatch.cpp" />
<ClCompile Include="..\..\..\server\TracyThreadCompress.cpp" />
<ClCompile Include="..\..\..\server\TracyWorker.cpp" />
<ClCompile Include="..\..\src\capture.cpp" />
<ClCompile Include="..\..\src\getopt.c" />
</ItemGroup>
<ItemGroup>
<ClInclude Include="..\..\..\common\TracyAlign.hpp" />
<ClInclude Include="..\..\..\common\TracyAlloc.hpp" />
<ClInclude Include="..\..\..\common\TracyColor.hpp" />
<ClInclude Include="..\..\..\common\TracyForceInline.hpp" />
<ClInclude Include="..\..\..\common\TracyProtocol.hpp" />
<ClInclude Include="..\..\..\common\TracyQueue.hpp" />
<ClInclude Include="..\..\..\common\TracySocket.hpp" />
<ClInclude Include="..\..\..\common\TracySystem.hpp" />
<ClInclude Include="..\..\..\common\tracy_benaphore.h" />
<ClInclude Include="..\..\..\common\tracy_lz4.hpp" />
<ClInclude Include="..\..\..\common\tracy_lz4hc.hpp" />
<ClInclude Include="..\..\..\common\tracy_sema.h" />
<ClInclude Include="..\..\..\server\TracyCharUtil.hpp" />
<ClInclude Include="..\..\..\server\TracyEvent.hpp" />
<ClInclude Include="..\..\..\server\TracyFileWrite.hpp" />
<ClInclude Include="..\..\..\server\TracyMemory.hpp" />
<ClInclude Include="..\..\..\server\TracyPopcnt.hpp" />
<ClInclude Include="..\..\..\server\TracyPrint.hpp" />
<ClInclude Include="..\..\..\server\TracySlab.hpp" />
<ClInclude Include="..\..\..\server\TracyTaskDispatch.hpp" />
<ClInclude Include="..\..\..\server\TracyThreadCompress.hpp" />
<ClInclude Include="..\..\..\server\TracyVector.hpp" />
<ClInclude Include="..\..\..\server\TracyWorker.hpp" />
<ClInclude Include="..\..\..\server\tracy_flat_hash_map.hpp" />
<ClInclude Include="..\..\src\getopt.h" />
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
</Project>

@@ -1,23 +1,14 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup>
<Filter Include="common">
<UniqueIdentifier>{8037a17e-7618-45b1-9aac-07468070b713}</UniqueIdentifier>
<Filter Include="src">
<UniqueIdentifier>{729c80ee-4d26-4a5e-8f1f-6c075783eb56}</UniqueIdentifier>
</Filter>
<Filter Include="server">
<UniqueIdentifier>{396d39d8-ca94-4e03-b965-701fa882efcb}</UniqueIdentifier>
<UniqueIdentifier>{cf23ef7b-7694-4154-830b-00cf053350ea}</UniqueIdentifier>
</Filter>
<Filter Include="src">
<UniqueIdentifier>{478ff7b3-4f0f-4121-8b3c-e48896be7606}</UniqueIdentifier>
</Filter>
<Filter Include="imgui">
<UniqueIdentifier>{474aec57-4ecd-467e-aecb-cdc4f41254ff}</UniqueIdentifier>
</Filter>
<Filter Include="gl3w">
<UniqueIdentifier>{ec4b32ba-a8c9-49e8-9625-8cb15feccd8a}</UniqueIdentifier>
</Filter>
<Filter Include="nfd">
<UniqueIdentifier>{46eb6aa0-de1c-447a-a6dd-aee2a06f85ef}</UniqueIdentifier>
<Filter Include="common">
<UniqueIdentifier>{e39d3623-47cd-4752-8da9-3ea324f964c1}</UniqueIdentifier>
</Filter>
</ItemGroup>
<ItemGroup>
@@ -30,41 +21,44 @@
<ClCompile Include="..\..\..\common\TracySystem.cpp">
<Filter>common</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyView.cpp">
<Filter>server</Filter>
</ClCompile>
<ClCompile Include="..\..\src\imgui_impl_glfw_gl3.cpp">
<Filter>src</Filter>
</ClCompile>
<ClCompile Include="..\..\src\main.cpp">
<Filter>src</Filter>
</ClCompile>
<ClCompile Include="..\..\..\imgui\imgui.cpp">
<Filter>imgui</Filter>
</ClCompile>
<ClCompile Include="..\..\..\imgui\imgui_demo.cpp">
<Filter>imgui</Filter>
</ClCompile>
<ClCompile Include="..\..\..\imgui\imgui_draw.cpp">
<Filter>imgui</Filter>
</ClCompile>
<ClCompile Include="..\..\libs\gl3w\GL\gl3w.c">
<Filter>gl3w</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyMemory.cpp">
<Filter>server</Filter>
</ClCompile>
<ClCompile Include="..\..\..\nfd\nfd_common.c">
<Filter>nfd</Filter>
<ClCompile Include="..\..\..\server\TracyWorker.cpp">
<Filter>server</Filter>
</ClCompile>
<ClCompile Include="..\..\..\nfd\nfd_win.cpp">
<Filter>nfd</Filter>
<ClCompile Include="..\..\src\capture.cpp">
<Filter>src</Filter>
</ClCompile>
<ClCompile Include="..\..\src\getopt.c">
<Filter>src</Filter>
</ClCompile>
<ClCompile Include="..\..\..\common\tracy_lz4hc.cpp">
<Filter>common</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyPrint.cpp">
<Filter>server</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyThreadCompress.cpp">
<Filter>server</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyTaskDispatch.cpp">
<Filter>server</Filter>
</ClCompile>
</ItemGroup>
<ItemGroup>
<ClInclude Include="..\..\..\common\tracy_lz4.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\TracyAlloc.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\TracyColor.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\TracyForceInline.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\TracyProtocol.hpp">
<Filter>common</Filter>
</ClInclude>
@@ -77,89 +71,56 @@
<ClInclude Include="..\..\..\common\TracySystem.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\tracy_flat_hash_map.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyCharUtil.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyEvent.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyFileWrite.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyMemory.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyPopcnt.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracySlab.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyVector.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyView.hpp">
<ClInclude Include="..\..\..\server\TracyWorker.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\src\imgui_impl_glfw_gl3.h">
<ClInclude Include="..\..\src\getopt.h">
<Filter>src</Filter>
</ClInclude>
<ClInclude Include="..\..\..\imgui\imconfig.h">
<Filter>imgui</Filter>
</ClInclude>
<ClInclude Include="..\..\..\imgui\imgui.h">
<Filter>imgui</Filter>
</ClInclude>
<ClInclude Include="..\..\..\imgui\imgui_internal.h">
<Filter>imgui</Filter>
</ClInclude>
<ClInclude Include="..\..\..\imgui\stb_rect_pack.h">
<Filter>imgui</Filter>
</ClInclude>
<ClInclude Include="..\..\..\imgui\stb_textedit.h">
<Filter>imgui</Filter>
</ClInclude>
<ClInclude Include="..\..\..\imgui\stb_truetype.h">
<Filter>imgui</Filter>
</ClInclude>
<ClInclude Include="..\..\libs\gl3w\GL\gl3w.h">
<Filter>gl3w</Filter>
</ClInclude>
<ClInclude Include="..\..\libs\gl3w\GL\glcorearb.h">
<Filter>gl3w</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyImGui.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyMemory.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\nfd\common.h">
<Filter>nfd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\nfd\nfd.h">
<Filter>nfd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\nfd\nfd_common.h">
<Filter>nfd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyFileRead.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyFileWrite.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyCharUtil.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyPopcnt.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\tracy_benaphore.h">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\tracy_sema.h">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\tracy_flat_hash_map.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\TracyForceInline.hpp">
<ClInclude Include="..\..\..\common\TracyAlign.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\tracy_pdqsort.h">
<ClInclude Include="..\..\..\common\tracy_benaphore.h">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\tracy_sema.h">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\tracy_lz4hc.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyPrint.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyThreadCompress.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyTaskDispatch.hpp">
<Filter>server</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<Natvis Include="DebugVis.natvis" />
</ItemGroup>
</Project>

166
capture/src/capture.cpp Normal file

@@ -0,0 +1,166 @@
#ifdef _WIN32
# include <windows.h>
#endif
#include <chrono>
#include <inttypes.h>
#include <mutex>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "../../common/TracyProtocol.hpp"
#include "../../server/TracyFileWrite.hpp"
#include "../../server/TracyMemory.hpp"
#include "../../server/TracyPrint.hpp"
#include "../../server/TracyWorker.hpp"
#include "getopt.h"
#ifndef _MSC_VER
struct sigaction oldsigint;
bool disconnect = false;
void SigInt( int )
{
disconnect = true;
}
#endif
void Usage()
{
printf( "Usage: capture -a address -o output.tracy [-p port]\n" );
exit( 1 );
}
int main( int argc, char** argv )
{
#ifdef _WIN32
if( !AttachConsole( ATTACH_PARENT_PROCESS ) )
{
AllocConsole();
SetConsoleMode( GetStdHandle( STD_OUTPUT_HANDLE ), 0x07 );
}
#endif
const char* address = nullptr;
const char* output = nullptr;
int port = 8086;
int c;
while( ( c = getopt( argc, argv, "a:o:p:" ) ) != -1 )
{
switch( c )
{
case 'a':
address = optarg;
break;
case 'o':
output = optarg;
break;
case 'p':
port = atoi( optarg );
break;
default:
Usage();
break;
}
}
if( !address || !output ) Usage();
printf( "Connecting to %s:%i...", address, port );
fflush( stdout );
tracy::Worker worker( address, port );
while( !worker.IsConnected() )
{
const auto handshake = worker.GetHandshakeStatus();
if( handshake == tracy::HandshakeProtocolMismatch )
{
printf( "\nThe client you are trying to connect to uses an incompatible protocol version.\nMake sure you are using the same Tracy version on both client and server.\n" );
return 1;
}
if( handshake == tracy::HandshakeNotAvailable )
{
printf( "\nThe client you are trying to connect to is no longer able to send profiling data,\nbecause another server was already connected to it.\nYou can do the following:\n\n 1. Restart the client application.\n 2. Rebuild the client application with on-demand mode enabled.\n" );
return 2;
}
if( handshake == tracy::HandshakeDropped )
{
printf( "\nThe client you are trying to connect to has disconnected during the initial\nconnection handshake. Please check your network configuration.\n" );
return 3;
}
}
while( !worker.HasData() ) std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
printf( "\nQueue delay: %s\nTimer resolution: %s\n", tracy::TimeToString( worker.GetDelay() ), tracy::TimeToString( worker.GetResolution() ) );
#ifndef _MSC_VER
struct sigaction sigint;
memset( &sigint, 0, sizeof( sigint ) );
sigint.sa_handler = SigInt;
sigaction( SIGINT, &sigint, &oldsigint );
#endif
auto& lock = worker.GetMbpsDataLock();
const auto t0 = std::chrono::high_resolution_clock::now();
while( worker.IsConnected() )
{
#ifndef _MSC_VER
if( disconnect )
{
worker.Disconnect();
disconnect = false;
}
#endif
lock.lock();
const auto mbps = worker.GetMbpsData().back();
const auto compRatio = worker.GetCompRatio();
const auto netTotal = worker.GetDataTransferred();
lock.unlock();
if( mbps < 0.1f )
{
printf( "\33[2K\r\033[36;1m%7.2f Kbps", mbps * 1000.f );
}
else
{
printf( "\33[2K\r\033[36;1m%7.2f Mbps", mbps );
}
printf( " \033[0m /\033[36;1m%5.1f%% \033[0m=\033[33;1m%7.2f Mbps \033[0m| \033[33mNet: \033[32m%s \033[0m| \033[33mMem: \033[31;1m%s\033[0m | \033[33mTime: %s\033[0m",
compRatio * 100.f,
mbps / compRatio,
tracy::MemSizeToString( netTotal ),
tracy::MemSizeToString( tracy::memUsage ),
tracy::TimeToString( worker.GetLastTime() ) );
fflush( stdout );
std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
}
const auto t1 = std::chrono::high_resolution_clock::now();
const auto& failure = worker.GetFailureType();
if( failure != tracy::Worker::Failure::None )
{
printf( "\n\033[31;1mInstrumentation failure: %s\033[0m", tracy::Worker::GetFailureString( failure ) );
}
printf( "\nFrames: %" PRIu64 "\nTime span: %s\nZones: %s\nElapsed time: %s\nSaving trace...",
worker.GetFrameCount( *worker.GetFramesBase() ), tracy::TimeToString( worker.GetLastTime() ), tracy::RealToString( worker.GetZoneCount(), true ),
tracy::TimeToString( std::chrono::duration_cast<std::chrono::nanoseconds>( t1 - t0 ).count() ) );
fflush( stdout );
auto f = std::unique_ptr<tracy::FileWrite>( tracy::FileWrite::Open( output ) );
if( f )
{
worker.Write( *f );
printf( " \033[32;1mdone!\033[0m\n" );
}
else
{
printf( " \033[31;1mfailed!\033[0m\n" );
}
return 0;
}

228
capture/src/getopt.c Normal file

@@ -0,0 +1,228 @@
/*******************************************************************************
* Copyright (c) 2012-2017, Kim Grasman <kim.grasman@gmail.com>
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Kim Grasman nor the
* names of contributors may be used to endorse or promote products
* derived from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL KIM GRASMAN BE LIABLE FOR ANY
* DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
******************************************************************************/
#include "getopt.h"
#include <stddef.h>
#include <string.h>
char* optarg;
int optopt;
/* The variable optind [...] shall be initialized to 1 by the system. */
int optind = 1;
int opterr;
static char* optcursor = NULL;
/* Implemented based on [1] and [2] for optional arguments.
optopt is handled FreeBSD-style, per [3].
Other GNU and FreeBSD extensions are purely accidental.
[1] http://pubs.opengroup.org/onlinepubs/000095399/functions/getopt.html
[2] http://www.kernel.org/doc/man-pages/online/pages/man3/getopt.3.html
[3] http://www.freebsd.org/cgi/man.cgi?query=getopt&sektion=3&manpath=FreeBSD+9.0-RELEASE
*/
int getopt(int argc, char* const argv[], const char* optstring) {
int optchar = -1;
const char* optdecl = NULL;
optarg = NULL;
opterr = 0;
optopt = 0;
/* Unspecified, but we need it to avoid overrunning the argv bounds. */
if (optind >= argc)
goto no_more_optchars;
/* If, when getopt() is called argv[optind] is a null pointer, getopt()
shall return -1 without changing optind. */
if (argv[optind] == NULL)
goto no_more_optchars;
/* If, when getopt() is called *argv[optind] is not the character '-',
getopt() shall return -1 without changing optind. */
if (*argv[optind] != '-')
goto no_more_optchars;
/* If, when getopt() is called argv[optind] points to the string "-",
getopt() shall return -1 without changing optind. */
if (strcmp(argv[optind], "-") == 0)
goto no_more_optchars;
/* If, when getopt() is called argv[optind] points to the string "--",
getopt() shall return -1 after incrementing optind. */
if (strcmp(argv[optind], "--") == 0) {
++optind;
goto no_more_optchars;
}
if (optcursor == NULL || *optcursor == '\0')
optcursor = argv[optind] + 1;
optchar = *optcursor;
/* FreeBSD: The variable optopt saves the last known option character
returned by getopt(). */
optopt = optchar;
/* The getopt() function shall return the next option character (if one is
found) from argv that matches a character in optstring, if there is
one that matches. */
optdecl = strchr(optstring, optchar);
if (optdecl) {
/* [I]f a character is followed by a colon, the option takes an
argument. */
if (optdecl[1] == ':') {
optarg = ++optcursor;
if (*optarg == '\0') {
/* GNU extension: Two colons mean an option takes an
optional arg; if there is text in the current argv-element
(i.e., in the same word as the option name itself, for example,
"-oarg"), then it is returned in optarg, otherwise optarg is set
to zero. */
if (optdecl[2] != ':') {
/* If the option was the last character in the string pointed to by
an element of argv, then optarg shall contain the next element
of argv, and optind shall be incremented by 2. If the resulting
value of optind is greater than argc, this indicates a missing
option-argument, and getopt() shall return an error indication.
Otherwise, optarg shall point to the string following the
option character in that element of argv, and optind shall be
incremented by 1.
*/
if (++optind < argc) {
optarg = argv[optind];
} else {
/* If it detects a missing option-argument, it shall return the
colon character ( ':' ) if the first character of optstring
was a colon, or a question-mark character ( '?' ) otherwise.
*/
optarg = NULL;
optchar = (optstring[0] == ':') ? ':' : '?';
}
} else {
optarg = NULL;
}
}
optcursor = NULL;
}
} else {
/* If getopt() encounters an option character that is not contained in
optstring, it shall return the question-mark ( '?' ) character. */
optchar = '?';
}
if (optcursor == NULL || *++optcursor == '\0')
++optind;
return optchar;
no_more_optchars:
optcursor = NULL;
return -1;
}
/* Implementation based on [1].
[1] http://www.kernel.org/doc/man-pages/online/pages/man3/getopt.3.html
*/
int getopt_long(int argc, char* const argv[], const char* optstring,
const struct option* longopts, int* longindex) {
const struct option* o = longopts;
const struct option* match = NULL;
int num_matches = 0;
size_t argument_name_length = 0;
const char* current_argument = NULL;
int retval = -1;
optarg = NULL;
optopt = 0;
if (optind >= argc)
return -1;
if (strlen(argv[optind]) < 3 || strncmp(argv[optind], "--", 2) != 0)
return getopt(argc, argv, optstring);
/* It's an option; starts with -- and is longer than two chars. */
current_argument = argv[optind] + 2;
argument_name_length = strcspn(current_argument, "=");
for (; o->name; ++o) {
if (strncmp(o->name, current_argument, argument_name_length) == 0) {
match = o;
++num_matches;
}
}
if (num_matches == 1) {
/* If longindex is not NULL, it points to a variable which is set to the
index of the long option relative to longopts. */
if (longindex)
*longindex = (match - longopts);
/* If flag is NULL, then getopt_long() shall return val.
Otherwise, getopt_long() returns 0, and flag shall point to a variable
which shall be set to val if the option is found, but left unchanged if
the option is not found. */
if (match->flag)
*(match->flag) = match->val;
retval = match->flag ? 0 : match->val;
if (match->has_arg != no_argument) {
optarg = strchr(argv[optind], '=');
if (optarg != NULL)
++optarg;
if (match->has_arg == required_argument) {
/* Only scan the next argv for required arguments. Behavior is not
specified, but has been observed with Ubuntu and Mac OSX. */
if (optarg == NULL && ++optind < argc) {
optarg = argv[optind];
}
if (optarg == NULL)
retval = ':';
}
} else if (strchr(argv[optind], '=')) {
/* An argument was provided to a non-argument option.
I haven't seen this specified explicitly, but both GNU and BSD-based
implementations show this behavior.
*/
retval = '?';
}
} else {
/* Unknown option or ambiguous match. */
retval = '?';
}
++optind;
return retval;
}
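
A minimal usage sketch for the `getopt_long` interface implemented above. It assumes a platform `<getopt.h>` (on Windows the bundled port would be included instead), and the option names `--address` / `-a` and `--output` / `-o` are illustrative assumptions, not taken from this listing. Error return codes for a missing argument can differ between implementations, so the sketch treats anything other than a known option as a failure:

```cpp
#include <getopt.h>
#include <string>

// Hypothetical result type for the sketch.
struct CaptureArgs
{
    std::string address;
    std::string output;
    bool ok;
};

// Parses "-a ADDR", "--address=ADDR", "-o FILE" and "--output=FILE".
// Both long options map onto their short option character through 'val'.
CaptureArgs ParseCaptureArgs( int argc, char* const argv[] )
{
    static const struct option longopts[] = {
        { "address", required_argument, nullptr, 'a' },
        { "output", required_argument, nullptr, 'o' },
        { nullptr, 0, nullptr, 0 }
    };
    CaptureArgs result = { "", "", true };
    optind = 1;  // start scanning at argv[1]
    int c;
    while( ( c = getopt_long( argc, argv, "a:o:", longopts, nullptr ) ) != -1 )
    {
        switch( c )
        {
        case 'a': result.address = optarg; break;
        case 'o': result.output = optarg; break;
        default: result.ok = false; break;  // '?' or ':' signal an error
        }
    }
    return result;
}
```

Because `flag` is null in both table entries, `getopt_long` returns `val` directly, which lets one `switch` handle the short and long spellings.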

59
capture/src/getopt.h Normal file

@@ -0,0 +1,59 @@
/*******************************************************************************
* Copyright (c) 2012-2017, Kim Grasman <kim.grasman@gmail.com>
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Kim Grasman nor the
* names of contributors may be used to endorse or promote products
* derived from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL KIM GRASMAN BE LIABLE FOR ANY
* DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
******************************************************************************/
#ifndef INCLUDED_GETOPT_PORT_H
#define INCLUDED_GETOPT_PORT_H
#if defined(__cplusplus)
extern "C" {
#endif
#define no_argument 1
#define required_argument 2
#define optional_argument 3
extern char* optarg;
extern int optind, opterr, optopt;
struct option {
const char* name;
int has_arg;
int* flag;
int val;
};
int getopt(int argc, char* const argv[], const char* optstring);
int getopt_long(int argc, char* const argv[],
const char* optstring, const struct option* longopts, int* longindex);
#if defined(__cplusplus)
}
#endif
#endif // INCLUDED_GETOPT_PORT_H

319
client/TracyArmCpuTable.hpp Normal file

@@ -0,0 +1,319 @@
namespace tracy
{
static const char* DecodeArmImplementer( uint32_t v )
{
static char buf[16];
switch( v )
{
case 0x41: return "ARM";
case 0x42: return "Broadcom";
case 0x43: return "Cavium";
case 0x44: return "DEC";
case 0x46: return "Fujitsu";
case 0x48: return "HiSilicon";
case 0x4d: return "Motorola";
case 0x4e: return "Nvidia";
case 0x50: return "Applied Micro";
case 0x51: return "Qualcomm";
case 0x53: return "Samsung";
case 0x54: return "Texas Instruments";
case 0x56: return "Marvell";
case 0x61: return "Apple";
case 0x66: return "Faraday";
case 0x68: return "HXT";
case 0x69: return "Intel";
default: break;
}
sprintf( buf, "0x%x", v );
return buf;
}
static const char* DecodeArmPart( uint32_t impl, uint32_t part )
{
static char buf[16];
switch( impl )
{
case 0x41:
switch( part )
{
case 0x810: return "810";
case 0x920: return "920";
case 0x922: return "922";
case 0x926: return "926";
case 0x940: return "940";
case 0x946: return "946";
case 0x966: return "966";
case 0xa20: return "1020";
case 0xa22: return "1022";
case 0xa26: return "1026";
case 0xb02: return "11 MPCore";
case 0xb36: return "1136";
case 0xb56: return "1156";
case 0xb76: return "1176";
case 0xc05: return " Cortex-A5";
case 0xc07: return " Cortex-A7";
case 0xc08: return " Cortex-A8";
case 0xc09: return " Cortex-A9";
case 0xc0c: return " Cortex-A12";
case 0xc0d: return " Rockchip RK3288";
case 0xc0f: return " Cortex-A15";
case 0xc0e: return " Cortex-A17";
case 0xc14: return " Cortex-R4";
case 0xc15: return " Cortex-R5";
case 0xc17: return " Cortex-R7";
case 0xc18: return " Cortex-R8";
case 0xc20: return " Cortex-M0";
case 0xc21: return " Cortex-M1";
case 0xc23: return " Cortex-M3";
case 0xc24: return " Cortex-M4";
case 0xc27: return " Cortex-M7";
case 0xc60: return " Cortex-M0+";
case 0xd00: return " AArch64 simulator";
case 0xd01: return " Cortex-A32";
case 0xd03: return " Cortex-A53";
case 0xd04: return " Cortex-A35";
case 0xd05: return " Cortex-A55";
case 0xd06: return " Cortex-A65";
case 0xd07: return " Cortex-A57";
case 0xd08: return " Cortex-A72";
case 0xd09: return " Cortex-A73";
case 0xd0a: return " Cortex-A75";
case 0xd0b: return " Cortex-A76";
case 0xd0c: return " Neoverse N1";
case 0xd0d: return " Cortex-A77";
case 0xd0e: return " Cortex-A76AE";
case 0xd0f: return " AEMv8";
case 0xd13: return " Cortex-R52";
case 0xd20: return " Cortex-M23";
case 0xd21: return " Cortex-M33";
case 0xd4a: return " Neoverse E1";
default: break;
}
break;
case 0x42:
switch( part )
{
case 0xf: return " Brahma B15";
case 0x100: return " Brahma B53";
case 0x516: return " ThunderX2";
default: break;
}
break;
case 0x43:
switch( part )
{
case 0xa0: return " ThunderX";
case 0xa1: return " ThunderX 88XX";
case 0xa2: return " ThunderX 81XX";
case 0xa3: return " ThunderX 83XX";
case 0xaf: return " ThunderX2 99xx";
default: break;
}
break;
case 0x44:
switch( part )
{
case 0xa10: return " SA110";
case 0xa11: return " SA1100";
default: break;
}
break;
case 0x46:
switch( part )
{
case 0x1: return " A64FX";
default: break;
}
break;
case 0x48:
switch( part )
{
case 0xd01: return " TSV100";
case 0xd40: return " Kirin 980";
default: break;
}
break;
case 0x4e:
switch( part )
{
case 0x0: return " Denver";
case 0x3: return " Denver 2";
case 0x4: return " Carmel";
default: break;
}
break;
case 0x50:
switch( part )
{
case 0x0: return " X-Gene";
default: break;
}
break;
case 0x51:
switch( part )
{
case 0xf: return " Scorpion";
case 0x2d: return " Scorpion";
case 0x4d: return " Krait";
case 0x6f: return " Krait";
case 0x200: return " Kryo";
case 0x201: return " Kryo Silver (Snapdragon 821)";
case 0x205: return " Kryo Gold";
case 0x211: return " Kryo Silver (Snapdragon 820)";
case 0x800: return " Kryo 260 / 280 Gold";
case 0x801: return " Kryo 260 / 280 Silver";
case 0x802: return " Kryo 385 Gold";
case 0x803: return " Kryo 385 Silver";
case 0x804: return " Kryo 485 Gold";
case 0xc00: return " Falkor";
case 0xc01: return " Saphira";
default: break;
}
break;
case 0x53:
switch( part )
{
case 0x1: return " Exynos M1/M2";
case 0x2: return " Exynos M3";
default: break;
}
break;
case 0x56:
switch( part )
{
case 0x131: return " Feroceon 88FR131";
case 0x581: return " PJ4 / PJ4B";
case 0x584: return " PJ4B-MP / PJ4C";
default: break;
}
break;
case 0x61:
switch( part )
{
case 0x1: return " Cyclone";
case 0x2: return " Typhoon";
case 0x3: return " Typhoon/Capri";
case 0x4: return " Twister";
case 0x5: return " Twister/Elba/Malta";
case 0x6: return " Hurricane";
case 0x7: return " Hurricane/Myst";
default: break;
}
break;
case 0x66:
switch( part )
{
case 0x526: return " FA526";
case 0x626: return " FA626";
default: break;
}
break;
case 0x68:
switch( part )
{
case 0x0: return " Phecda";
default: break;
}
default: break;
}
sprintf( buf, " 0x%x", part );
return buf;
}
static const char* DecodeIosDevice( const char* id )
{
static const char* DeviceTable[] = {
"i386", "32-bit simulator",
"x86_64", "64-bit simulator",
"iPhone1,1", "iPhone",
"iPhone1,2", "iPhone 3G",
"iPhone2,1", "iPhone 3GS",
"iPhone3,1", "iPhone 4 (GSM)",
"iPhone3,2", "iPhone 4 (GSM)",
"iPhone3,3", "iPhone 4 (CDMA)",
"iPhone4,1", "iPhone 4S",
"iPhone5,1", "iPhone 5 (A1428)",
"iPhone5,2", "iPhone 5 (A1429)",
"iPhone5,3", "iPhone 5c (A1456/A1532)",
"iPhone5,4", "iPhone 5c (A1507/A1516/1526/A1529)",
"iPhone6,1", "iPhone 5s (A1433/A1533)",
"iPhone6,2", "iPhone 5s (A1457/A1518/A1528/A1530)",
"iPhone7,1", "iPhone 6 Plus",
"iPhone7,2", "iPhone 6",
"iPhone8,1", "iPhone 6S",
"iPhone8,2", "iPhone 6S Plus",
"iPhone8,4", "iPhone SE",
"iPhone9,1", "iPhone 7 (CDMA)",
"iPhone9,2", "iPhone 7 Plus (CDMA)",
"iPhone9,3", "iPhone 7 (GSM)",
"iPhone9,4", "iPhone 7 Plus (GSM)",
"iPhone10,1", "iPhone 8 (CDMA)",
"iPhone10,2", "iPhone 8 Plus (CDMA)",
"iPhone10,3", "iPhone X (CDMA)",
"iPhone10,4", "iPhone 8 (GSM)",
"iPhone10,5", "iPhone 8 Plus (GSM)",
"iPhone10,6", "iPhone X (GSM)",
"iPhone11,2", "iPhone XS",
"iPhone11,4", "iPhone XS Max",
"iPhone11,6", "iPhone XS Max China",
"iPhone11,8", "iPhone XR",
"iPad1,1", "iPad (A1219/A1337)",
"iPad2,1", "iPad 2 (A1395)",
"iPad2,2", "iPad 2 (A1396)",
"iPad2,3", "iPad 2 (A1397)",
"iPad2,4", "iPad 2 (A1395)",
"iPad2,5", "iPad Mini (A1432)",
"iPad2,6", "iPad Mini (A1454)",
"iPad2,7", "iPad Mini (A1455)",
"iPad3,1", "iPad 3 (A1416)",
"iPad3,2", "iPad 3 (A1403)",
"iPad3,3", "iPad 3 (A1430)",
"iPad3,4", "iPad 4 (A1458)",
"iPad3,5", "iPad 4 (A1459)",
"iPad3,6", "iPad 4 (A1460)",
"iPad4,1", "iPad Air (A1474)",
"iPad4,2", "iPad Air (A1475)",
"iPad4,3", "iPad Air (A1476)",
"iPad4,4", "iPad Mini 2 (A1489)",
"iPad4,5", "iPad Mini 2 (A1490)",
"iPad4,6", "iPad Mini 2 (A1491)",
"iPad4,7", "iPad Mini 3 (A1599)",
"iPad4,8", "iPad Mini 3 (A1600)",
"iPad4,9", "iPad Mini 3 (A1601)",
"iPad5,1", "iPad Mini 4 (A1538)",
"iPad5,2", "iPad Mini 4 (A1550)",
"iPad5,3", "iPad Air 2 (A1566)",
"iPad5,4", "iPad Air 2 (A1567)",
"iPad6,3", "iPad Pro 9.7\" (A1673)",
"iPad6,4", "iPad Pro 9.7\" (A1674)",
"iPad6,5", "iPad Pro 9.7\" (A1675)",
"iPad6,7", "iPad Pro 12.9\" (A1584)",
"iPad6,8", "iPad Pro 12.9\" (A1652)",
"iPad6,11", "iPad 5th gen (A1822)",
"iPad6,12", "iPad 5th gen (A1823)",
"iPad7,1", "iPad Pro 12.9\" 2nd gen (A1670)",
"iPad7,2", "iPad Pro 12.9\" 2nd gen (A1671/A1821)",
"iPad7,3", "iPad Pro 10.5\" (A1701)",
"iPad7,4", "iPad Pro 10.5\" (A1709)",
"iPad7,5", "iPad 6th gen (A1893)",
"iPad7,6", "iPad 6th gen (A1954)",
"iPad8,1", "iPad Pro 11\" (A1980)",
"iPad8,2", "iPad Pro 11\" (A1980)",
"iPad8,3", "iPad Pro 11\" (A1934/A1979/A2013)",
"iPad8,4", "iPad Pro 11\" (A1934/A1979/A2013)",
"iPad8,5", "iPad Pro 12.9\" 3rd gen (A1876)",
"iPad8,6", "iPad Pro 12.9\" 3rd gen (A1876)",
"iPad8,7", "iPad Pro 12.9\" 3rd gen (A1895/A1983/A2014)",
"iPad8,8", "iPad Pro 12.9\" 3rd gen (A1895/A1983/A2014)",
"iPad11,1", "iPad Mini 5th gen (A2133)",
"iPad11,2", "iPad Mini 5th gen (A2124/A2125/A2126)",
"iPad11,3", "iPad Air 3rd gen (A2152)",
"iPad11,4", "iPad Air 3rd gen (A2123/A2153/A2154)",
"iPod1,1", "iPod Touch",
"iPod2,1", "iPod Touch 2nd gen",
"iPod3,1", "iPod Touch 3rd gen",
"iPod4,1", "iPod Touch 4th gen",
"iPod5,1", "iPod Touch 5th gen",
"iPod7,1", "iPod Touch 6th gen",
"iPod9,1", "iPod Touch 7th gen",
nullptr
};
auto ptr = DeviceTable;
while( *ptr )
{
if( strcmp( ptr[0], id ) == 0 ) return ptr[1];
ptr += 2;
}
return id;
}
}
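
A flat table keyed on the (implementer, part) pair is one alternative to nested `switch` statements, where each inner `switch` needs its own `break` afterwards so that an unmatched part does not continue into the next implementer's case. The sketch below is illustrative only: the table holds a few entries copied from the listing, the helper names are hypothetical, and the field extraction assumes the identifiers come from an ARM MIDR-style register value (implementer in bits 31:24, part number in bits 15:4), which this listing does not state.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// One row per known (implementer, part) pair.
struct ArmPartEntry
{
    uint32_t impl;
    uint32_t part;
    const char* name;
};

// A small excerpt of the names used in the listing above.
static const ArmPartEntry kArmParts[] = {
    { 0x41, 0xd03, "Cortex-A53" },
    { 0x41, 0xd08, "Cortex-A72" },
    { 0x42, 0x00f, "Brahma B15" },
    { 0x51, 0x800, "Kryo 260 / 280 Gold" },
};

// Assumed MIDR layout: implementer in bits 31:24, part number in bits 15:4.
inline uint32_t MidrImplementer( uint32_t midr ) { return ( midr >> 24 ) & 0xff; }
inline uint32_t MidrPart( uint32_t midr ) { return ( midr >> 4 ) & 0xfff; }

// Returns the table name, or the part number as hex text written into the
// caller's buffer when the pair is unknown.
const char* LookupArmPart( uint32_t impl, uint32_t part, char* buf, size_t bufSize )
{
    for( const auto& e : kArmParts )
    {
        if( e.impl == impl && e.part == part ) return e.name;
    }
    snprintf( buf, bufSize, "0x%x", part );
    return buf;
}
```

Passing the buffer in, rather than returning a pointer to a function-local `static` array, keeps the fallback path safe to call from more than one thread.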

603
client/TracyCallstack.cpp Normal file

@@ -0,0 +1,603 @@
#include <stdio.h>
#include <string.h>
#include "TracyCallstack.hpp"
#ifdef TRACY_HAS_CALLSTACK
#if TRACY_HAS_CALLSTACK == 1
# ifndef NOMINMAX
# define NOMINMAX
# endif
# include <windows.h>
# ifdef _MSC_VER
# pragma warning( push )
# pragma warning( disable : 4091 )
# endif
# include <dbghelp.h>
# ifdef _MSC_VER
# pragma warning( pop )
# endif
#elif TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 3
# include "../libbacktrace/backtrace.hpp"
# include <dlfcn.h>
# include <cxxabi.h>
#elif TRACY_HAS_CALLSTACK == 4 || TRACY_HAS_CALLSTACK == 5
# include <dlfcn.h>
# include <cxxabi.h>
#endif
namespace tracy
{
#if TRACY_HAS_CALLSTACK == 1
enum { MaxCbTrace = 16 };
int cb_num;
CallstackEntry cb_data[MaxCbTrace];
extern "C" { t_RtlWalkFrameChain RtlWalkFrameChain = 0; }
#if defined __MINGW32__ && API_VERSION_NUMBER < 12
extern "C" {
// Actual required API_VERSION_NUMBER is unknown because it is undocumented. These functions are not present in at least v11.
DWORD IMAGEAPI SymAddrIncludeInlineTrace(HANDLE hProcess, DWORD64 Address);
BOOL IMAGEAPI SymQueryInlineTrace(HANDLE hProcess, DWORD64 StartAddress, DWORD StartContext, DWORD64 StartRetAddress,
DWORD64 CurAddress, LPDWORD CurContext, LPDWORD CurFrameIndex);
BOOL IMAGEAPI SymFromInlineContext(HANDLE hProcess, DWORD64 Address, ULONG InlineContext, PDWORD64 Displacement,
PSYMBOL_INFO Symbol);
BOOL IMAGEAPI SymGetLineFromInlineContext(HANDLE hProcess, DWORD64 qwAddr, ULONG InlineContext,
DWORD64 qwModuleBaseAddress, PDWORD pdwDisplacement, PIMAGEHLP_LINE64 Line64);
};
#endif
void InitCallstack()
{
#ifdef UNICODE
RtlWalkFrameChain = (t_RtlWalkFrameChain)GetProcAddress( GetModuleHandle( L"ntdll.dll" ), "RtlWalkFrameChain" );
#else
RtlWalkFrameChain = (t_RtlWalkFrameChain)GetProcAddress( GetModuleHandle( "ntdll.dll" ), "RtlWalkFrameChain" );
#endif
SymInitialize( GetCurrentProcess(), nullptr, true );
SymSetOptions( SYMOPT_LOAD_LINES );
}
const char* DecodeCallstackPtrFast( uint64_t ptr )
{
static char ret[1024];
const auto proc = GetCurrentProcess();
char buf[sizeof( SYMBOL_INFO ) + 1024];
auto si = (SYMBOL_INFO*)buf;
si->SizeOfStruct = sizeof( SYMBOL_INFO );
si->MaxNameLen = 1024;
if( SymFromAddr( proc, ptr, nullptr, si ) == 0 )
{
*ret = '\0';
}
else
{
memcpy( ret, si->Name, si->NameLen );
ret[si->NameLen] = '\0';
}
return ret;
}
CallstackEntryData DecodeCallstackPtr( uint64_t ptr )
{
int write;
const auto proc = GetCurrentProcess();
#ifndef __CYGWIN__
DWORD inlineNum = SymAddrIncludeInlineTrace( proc, ptr );
if( inlineNum > MaxCbTrace - 1 ) inlineNum = MaxCbTrace - 1;
DWORD ctx = 0;
DWORD idx;
BOOL doInline = FALSE;
if( inlineNum != 0 ) doInline = SymQueryInlineTrace( proc, ptr, 0, ptr, ptr, &ctx, &idx );
if( doInline )
{
write = inlineNum;
cb_num = 1 + inlineNum;
}
else
#endif
{
write = 0;
cb_num = 1;
}
char buf[sizeof( SYMBOL_INFO ) + 1024];
auto si = (SYMBOL_INFO*)buf;
si->SizeOfStruct = sizeof( SYMBOL_INFO );
si->MaxNameLen = 1024;
if( SymFromAddr( proc, ptr, nullptr, si ) == 0 )
{
memcpy( si->Name, "[unknown]", 10 );
si->NameLen = 9;
}
IMAGEHLP_LINE64 line;
DWORD displacement = 0;
line.SizeOfStruct = sizeof(IMAGEHLP_LINE64);
{
auto name = (char*)tracy_malloc(si->NameLen + 1);
memcpy(name, si->Name, si->NameLen);
name[si->NameLen] = '\0';
cb_data[write].name = name;
const char* filename;
if (SymGetLineFromAddr64(proc, ptr, &displacement, &line) == 0)
{
filename = "[unknown]";
cb_data[write].line = 0;
}
else
{
filename = line.FileName;
cb_data[write].line = line.LineNumber;
}
const auto fsz = strlen(filename);
auto file = (char*)tracy_malloc(fsz + 1);
memcpy(file, filename, fsz);
file[fsz] = '\0';
cb_data[write].file = file;
}
#ifndef __CYGWIN__
if( doInline )
{
for( DWORD i=0; i<inlineNum; i++ )
{
auto& cb = cb_data[i];
if( SymFromInlineContext( proc, ptr, ctx, nullptr, si ) == 0 )
{
memcpy( si->Name, "[unknown]", 10 );
si->NameLen = 9;
}
auto name = (char*)tracy_malloc( si->NameLen + 1 );
memcpy( name, si->Name, si->NameLen );
name[si->NameLen] = '\0';
cb.name = name;
const char* filename;
if( SymGetLineFromInlineContext( proc, ptr, ctx, 0, &displacement, &line ) == 0 )
{
filename = "[unknown]";
cb.line = 0;
}
else
{
filename = line.FileName;
cb.line = line.LineNumber;
}
const auto fsz = strlen( filename );
auto file = (char*)tracy_malloc( fsz + 1 );
memcpy( file, filename, fsz );
file[fsz] = '\0';
cb.file = file;
ctx++;
}
}
#endif
return { cb_data, uint8_t( cb_num ) };
}
#elif TRACY_HAS_CALLSTACK == 4
void InitCallstack()
{
}
const char* DecodeCallstackPtrFast( uint64_t ptr )
{
static char ret[1024];
auto vptr = (void*)ptr;
char** sym = nullptr;
const char* symname = nullptr;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) && dlinfo.dli_sname )
{
symname = dlinfo.dli_sname;
}
else
{
sym = backtrace_symbols( &vptr, 1 );
if( sym )
{
symname = *sym;
}
}
if( symname )
{
strcpy( ret, symname );
}
else
{
*ret = '\0';
}
if( sym ) free( sym );
return ret;
}
CallstackEntryData DecodeCallstackPtr( uint64_t ptr )
{
static CallstackEntry cb;
cb.line = 0;
char* demangled = nullptr;
const char* symname = nullptr;
const char* symloc = nullptr;
auto vptr = (void*)ptr;
char** sym = nullptr;
ptrdiff_t symoff = 0;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) )
{
symloc = dlinfo.dli_fname;
symname = dlinfo.dli_sname;
symoff = (char*)ptr - (char*)dlinfo.dli_saddr;
if( symname && symname[0] == '_' )
{
size_t len = 0;
int status;
demangled = abi::__cxa_demangle( symname, nullptr, &len, &status );
if( status == 0 )
{
symname = demangled;
}
}
}
if( !symname )
{
sym = backtrace_symbols( &vptr, 1 );
if( !sym )
{
symname = "[unknown]";
}
else
{
symname = *sym;
}
}
if( !symloc )
{
symloc = "[unknown]";
}
if( symoff == 0 )
{
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + 1 );
memcpy( name, symname, namelen );
name[namelen] = '\0';
cb.name = name;
}
else
{
char buf[32];
const auto offlen = sprintf( buf, " + %td", symoff );
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + offlen + 1 );
memcpy( name, symname, namelen );
memcpy( name + namelen, buf, offlen );
name[namelen + offlen] = '\0';
cb.name = name;
}
char buf[32];
const auto addrlen = sprintf( buf, " [%p]", (void*)ptr );
const auto loclen = strlen( symloc );
auto loc = (char*)tracy_malloc( loclen + addrlen + 1 );
memcpy( loc, symloc, loclen );
memcpy( loc + loclen, buf, addrlen );
loc[loclen + addrlen] = '\0';
cb.file = loc;
if( sym ) free( sym );
if( demangled ) free( demangled );
return { &cb, 1 };
}
#elif TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 3
enum { MaxCbTrace = 16 };
struct backtrace_state* cb_bts;
int cb_num;
CallstackEntry cb_data[MaxCbTrace];
void InitCallstack()
{
cb_bts = backtrace_create_state( nullptr, 0, nullptr, nullptr );
}
static inline char* CopyString( const char* src )
{
const auto sz = strlen( src );
auto dst = (char*)tracy_malloc( sz + 1 );
memcpy( dst, src, sz );
dst[sz] = '\0';
return dst;
}
static int FastCallstackDataCb( void* data, uintptr_t pc, const char* fn, int lineno, const char* function )
{
if( function )
{
strcpy( (char*)data, function );
}
else
{
const char* symname = nullptr;
auto vptr = (void*)pc;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) )
{
symname = dlinfo.dli_sname;
}
if( symname )
{
strcpy( (char*)data, symname );
}
else
{
*(char*)data = '\0';
}
}
return 1;
}
static void FastCallstackErrorCb( void* data, const char* /*msg*/, int /*errnum*/ )
{
*(char*)data = '\0';
}
const char* DecodeCallstackPtrFast( uint64_t ptr )
{
static char ret[1024];
backtrace_pcinfo( cb_bts, ptr, FastCallstackDataCb, FastCallstackErrorCb, ret );
return ret;
}
static int CallstackDataCb( void* /*data*/, uintptr_t pc, const char* fn, int lineno, const char* function )
{
enum { DemangleBufLen = 64*1024 };
char demangled[DemangleBufLen];
if( !fn && !function )
{
const char* symname = nullptr;
const char* symloc = nullptr;
auto vptr = (void*)pc;
ptrdiff_t symoff = 0;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) )
{
symloc = dlinfo.dli_fname;
symname = dlinfo.dli_sname;
symoff = (char*)pc - (char*)dlinfo.dli_saddr;
if( symname && symname[0] == '_' )
{
size_t len = DemangleBufLen;
int status;
abi::__cxa_demangle( symname, demangled, &len, &status );
if( status == 0 )
{
symname = demangled;
}
}
}
if( !symname ) symname = "[unknown]";
if( !symloc ) symloc = "[unknown]";
if( symoff == 0 )
{
cb_data[cb_num].name = CopyString( symname );
}
else
{
char buf[32];
const auto offlen = sprintf( buf, " + %td", symoff );
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + offlen + 1 );
memcpy( name, symname, namelen );
memcpy( name + namelen, buf, offlen );
name[namelen + offlen] = '\0';
cb_data[cb_num].name = name;
}
char buf[32];
const auto addrlen = sprintf( buf, " [%p]", (void*)pc );
const auto loclen = strlen( symloc );
auto loc = (char*)tracy_malloc( loclen + addrlen + 1 );
memcpy( loc, symloc, loclen );
memcpy( loc + loclen, buf, addrlen );
loc[loclen + addrlen] = '\0';
cb_data[cb_num].file = loc;
cb_data[cb_num].line = 0;
}
else
{
if( !fn ) fn = "[unknown]";
if( !function )
{
function = "[unknown]";
}
else
{
if( function[0] == '_' )
{
size_t len = DemangleBufLen;
int status;
abi::__cxa_demangle( function, demangled, &len, &status );
if( status == 0 )
{
function = demangled;
}
}
}
cb_data[cb_num].name = CopyString( function );
cb_data[cb_num].file = CopyString( fn );
cb_data[cb_num].line = lineno;
}
if( ++cb_num >= MaxCbTrace )
{
return 1;
}
else
{
return 0;
}
}
static void CallstackErrorCb( void* /*data*/, const char* /*msg*/, int /*errnum*/ )
{
for( int i=0; i<cb_num; i++ )
{
tracy_free( (void*)cb_data[i].name );
tracy_free( (void*)cb_data[i].file );
}
cb_data[0].name = CopyString( "[error]" );
cb_data[0].file = CopyString( "[error]" );
cb_data[0].line = 0;
cb_num = 1;
}
CallstackEntryData DecodeCallstackPtr( uint64_t ptr )
{
cb_num = 0;
backtrace_pcinfo( cb_bts, ptr, CallstackDataCb, CallstackErrorCb, nullptr );
assert( cb_num > 0 );
return { cb_data, uint8_t( cb_num ) };
}
#elif TRACY_HAS_CALLSTACK == 5
void InitCallstack()
{
}
const char* DecodeCallstackPtrFast( uint64_t ptr )
{
static char ret[1024];
auto vptr = (void*)ptr;
char** sym = nullptr;
const char* symname = nullptr;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) && dlinfo.dli_sname )
{
symname = dlinfo.dli_sname;
}
if( symname )
{
strcpy( ret, symname );
}
else
{
*ret = '\0';
}
return ret;
}
CallstackEntryData DecodeCallstackPtr( uint64_t ptr )
{
static CallstackEntry cb;
cb.line = 0;
char* demangled = nullptr;
const char* symname = nullptr;
const char* symloc = nullptr;
auto vptr = (void*)ptr;
char** sym = nullptr;
ptrdiff_t symoff = 0;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) )
{
symloc = dlinfo.dli_fname;
symname = dlinfo.dli_sname;
symoff = (char*)ptr - (char*)dlinfo.dli_saddr;
if( symname && symname[0] == '_' )
{
size_t len = 0;
int status;
demangled = abi::__cxa_demangle( symname, nullptr, &len, &status );
if( status == 0 )
{
symname = demangled;
}
}
}
if( !symname )
{
symname = "[unknown]";
}
if( !symloc )
{
symloc = "[unknown]";
}
if( symoff == 0 )
{
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + 1 );
memcpy( name, symname, namelen );
name[namelen] = '\0';
cb.name = name;
}
else
{
char buf[32];
const auto offlen = sprintf( buf, " + %td", symoff );
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + offlen + 1 );
memcpy( name, symname, namelen );
memcpy( name + namelen, buf, offlen );
name[namelen + offlen] = '\0';
cb.name = name;
}
char buf[32];
const auto addrlen = sprintf( buf, " [%p]", (void*)ptr );
const auto loclen = strlen( symloc );
auto loc = (char*)tracy_malloc( loclen + addrlen + 1 );
memcpy( loc, symloc, loclen );
memcpy( loc + loclen, buf, addrlen );
loc[loclen + addrlen] = '\0';
cb.file = loc;
if( sym ) free( sym );
if( demangled ) free( demangled );
return { &cb, 1 };
}
#endif
}
#endif
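
The `dladdr`-based paths above only attempt demangling when the symbol name starts with an underscore, and fall back to the raw name when `abi::__cxa_demangle` reports a non-zero status. A minimal sketch of that pattern, assuming a GCC- or Clang-compatible toolchain that provides `<cxxabi.h>`; `DemangleOrKeep` is a hypothetical helper name, and it lets the runtime allocate the output buffer instead of passing one in:

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI mangled
// name, or the input unchanged when it is not a valid mangled name.
std::string DemangleOrKeep( const char* symname )
{
    if( !symname ) return "[unknown]";
    if( symname[0] != '_' ) return symname;  // cannot be a mangled C++ name

    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so it has to be released with free.
    char* demangled = abi::__cxa_demangle( symname, nullptr, nullptr, &status );
    std::string result = ( status == 0 && demangled ) ? demangled : symname;
    std::free( demangled );
    return result;
}
```

For instance, `DemangleOrKeep( "_ZN5tracy13InitCallstackEv" )` yields `tracy::InitCallstack()`, while a plain C symbol such as `main` is returned unchanged.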

22
client/TracyCallstack.h Normal file

@@ -0,0 +1,22 @@
#ifndef __TRACYCALLSTACK_H__
#define __TRACYCALLSTACK_H__
#if defined _WIN32 || defined __CYGWIN__
# define TRACY_HAS_CALLSTACK 1
#elif defined __ANDROID__
# if !defined __arm__ || __ANDROID_API__ >= 21
# define TRACY_HAS_CALLSTACK 2
# else
# define TRACY_HAS_CALLSTACK 5
# endif
#elif defined __linux
# if defined _GNU_SOURCE && defined __GLIBC__
# define TRACY_HAS_CALLSTACK 3
# else
# define TRACY_HAS_CALLSTACK 2
# endif
#elif defined __APPLE__
# define TRACY_HAS_CALLSTACK 4
#endif
#endif
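
The numeric value of `TRACY_HAS_CALLSTACK` selects both how frames are captured and how addresses are symbolized. The sketch below summarizes the pairing as it appears in the conditional includes and implementations of `TracyCallstack.hpp` and `TracyCallstack.cpp` in this diff; `CallstackBackend` and `DescribeCallstackBackend` are hypothetical names introduced only for this illustration.

```cpp
#include <cstring>

// Hypothetical descriptor for one TRACY_HAS_CALLSTACK value.
struct CallstackBackend
{
    const char* capture;  // how return addresses are collected
    const char* symbols;  // how addresses are turned into names
};

// Maps the macro value to the mechanisms used in this diff.
inline CallstackBackend DescribeCallstackBackend( int value )
{
    switch( value )
    {
    case 1: return { "RtlWalkFrameChain", "dbghelp" };
    case 2: return { "_Unwind_Backtrace", "libbacktrace" };
    case 3: return { "backtrace", "libbacktrace" };
    case 4: return { "backtrace", "dladdr + backtrace_symbols" };
    case 5: return { "_Unwind_Backtrace", "dladdr" };
    default: return { "none", "none" };
    }
}
```

Read this way, values 2 and 5 share the unwinder-based capture, and values 2 and 3 share libbacktrace for symbol lookup.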

112
client/TracyCallstack.hpp Normal file

@@ -0,0 +1,112 @@
#ifndef __TRACYCALLSTACK_HPP__
#define __TRACYCALLSTACK_HPP__
#include "TracyCallstack.h"
#if TRACY_HAS_CALLSTACK == 1
extern "C"
{
typedef unsigned long (__stdcall *t_RtlWalkFrameChain)( void**, unsigned long, unsigned long );
extern t_RtlWalkFrameChain RtlWalkFrameChain;
}
#elif TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 5
# include <unwind.h>
#elif TRACY_HAS_CALLSTACK >= 3
# include <execinfo.h>
#endif
#ifdef TRACY_HAS_CALLSTACK
#include <assert.h>
#include <stdint.h>
#include "../common/TracyAlloc.hpp"
#include "../common/TracyForceInline.hpp"
namespace tracy
{
struct CallstackEntry
{
const char* name;
const char* file;
uint32_t line;
};
struct CallstackEntryData
{
const CallstackEntry* data;
uint8_t size;
};
const char* DecodeCallstackPtrFast( uint64_t ptr );
CallstackEntryData DecodeCallstackPtr( uint64_t ptr );
void InitCallstack();
#if TRACY_HAS_CALLSTACK == 1
static tracy_force_inline void* Callstack( int depth )
{
assert( depth >= 1 && depth < 63 );
auto trace = (uintptr_t*)tracy_malloc( ( 1 + depth ) * sizeof( uintptr_t ) );
const auto num = RtlWalkFrameChain( (void**)( trace + 1 ), depth, 0 );
*trace = num;
return trace;
}
#elif TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 5
struct BacktraceState
{
void** current;
void** end;
};
static _Unwind_Reason_Code tracy_unwind_callback( struct _Unwind_Context* ctx, void* arg )
{
auto state = (BacktraceState*)arg;
uintptr_t pc = _Unwind_GetIP( ctx );
if( pc )
{
if( state->current == state->end ) return _URC_END_OF_STACK;
*state->current++ = (void*)pc;
}
return _URC_NO_REASON;
}
static tracy_force_inline void* Callstack( int depth )
{
assert( depth >= 1 && depth < 63 );
auto trace = (uintptr_t*)tracy_malloc( ( 1 + depth ) * sizeof( uintptr_t ) );
BacktraceState state = { (void**)(trace+1), (void**)(trace+1+depth) };
_Unwind_Backtrace( tracy_unwind_callback, &state );
*trace = (uintptr_t*)state.current - trace + 1;
return trace;
}
#elif TRACY_HAS_CALLSTACK == 3 || TRACY_HAS_CALLSTACK == 4
static tracy_force_inline void* Callstack( int depth )
{
assert( depth >= 1 );
auto trace = (uintptr_t*)tracy_malloc( ( 1 + depth ) * sizeof( uintptr_t ) );
const auto num = backtrace( (void**)(trace+1), depth );
*trace = num;
return trace;
}
#endif
}
#endif
#endif
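
Note on the buffer returned by `Callstack()` in the header above: in the `RtlWalkFrameChain` and `backtrace` variants, slot 0 of the allocated array stores the number of captured frames (`*trace = num`) and the return addresses follow from slot 1. The sketch below only illustrates that length-prefixed layout. `ReadTrace` and `MakeFakeTrace` are hypothetical helpers that are not part of Tracy, and the addresses are made up rather than captured from a real stack.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of Tracy): copies the frames out of a
// length-prefixed buffer. Slot 0 holds the frame count, slots 1..count
// hold the return addresses.
static std::vector<uintptr_t> ReadTrace( const void* ptr )
{
    auto trace = static_cast<const uintptr_t*>( ptr );
    const uintptr_t count = trace[0];
    return std::vector<uintptr_t>( trace + 1, trace + 1 + count );
}

// Hypothetical stand-in for Callstack(): allocates 1 + depth slots the same
// way, but fills them with made-up addresses instead of walking the stack.
static void* MakeFakeTrace( int depth )
{
    assert( depth >= 1 );
    auto trace = static_cast<uintptr_t*>( std::malloc( ( 1 + depth ) * sizeof( uintptr_t ) ) );
    trace[0] = uintptr_t( depth );
    for( int i=0; i<depth; i++ ) trace[1+i] = uintptr_t( 0x1000 + i );
    return trace;
}
```

A consumer would call `ReadTrace` on the pointer and then release the buffer with whichever allocator produced it (here `std::free`; the real header uses `tracy_malloc`).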

646
client/TracyDxt1.cpp Normal file

@@ -0,0 +1,646 @@
#include "TracyDxt1.hpp"
#include "../common/TracyForceInline.hpp"
#include <assert.h>
#include <stdint.h>
#include <string.h>
#ifdef __ARM_NEON
# include <arm_neon.h>
#endif
#if defined __AVX__ && !defined __SSE4_1__
# define __SSE4_1__
#endif
#if defined __SSE4_1__ || defined __AVX2__
# ifdef _MSC_VER
# include <intrin.h>
# else
# include <x86intrin.h>
# ifdef __CYGWIN__
# ifndef _mm256_cvtsi256_si32
# define _mm256_cvtsi256_si32( v ) ( _mm_cvtsi128_si32( _mm256_castsi256_si128( v ) ) )
# endif
# endif
# endif
#endif
namespace tracy
{
static inline uint16_t to565( uint8_t r, uint8_t g, uint8_t b )
{
return ( ( r & 0xF8 ) << 8 ) | ( ( g & 0xFC ) << 3 ) | ( b >> 3 );
}
static inline uint16_t to565( uint32_t c )
{
return
( ( c & 0xF80000 ) >> 19 ) |
( ( c & 0x00FC00 ) >> 5 ) |
( ( c & 0x0000F8 ) << 8 );
}
static const uint16_t DivTable[255*3+1] = {
0xffff, 0xffff, 0xffff, 0xffff, 0xcccc, 0xaaaa, 0x9249, 0x8000, 0x71c7, 0x6666, 0x5d17, 0x5555, 0x4ec4, 0x4924, 0x4444, 0x4000,
0x3c3c, 0x38e3, 0x35e5, 0x3333, 0x30c3, 0x2e8b, 0x2c85, 0x2aaa, 0x28f5, 0x2762, 0x25ed, 0x2492, 0x234f, 0x2222, 0x2108, 0x2000,
0x1f07, 0x1e1e, 0x1d41, 0x1c71, 0x1bac, 0x1af2, 0x1a41, 0x1999, 0x18f9, 0x1861, 0x17d0, 0x1745, 0x16c1, 0x1642, 0x15c9, 0x1555,
0x14e5, 0x147a, 0x1414, 0x13b1, 0x1352, 0x12f6, 0x129e, 0x1249, 0x11f7, 0x11a7, 0x115b, 0x1111, 0x10c9, 0x1084, 0x1041, 0x1000,
0x0fc0, 0x0f83, 0x0f48, 0x0f0f, 0x0ed7, 0x0ea0, 0x0e6c, 0x0e38, 0x0e07, 0x0dd6, 0x0da7, 0x0d79, 0x0d4c, 0x0d20, 0x0cf6, 0x0ccc,
0x0ca4, 0x0c7c, 0x0c56, 0x0c30, 0x0c0c, 0x0be8, 0x0bc5, 0x0ba2, 0x0b81, 0x0b60, 0x0b40, 0x0b21, 0x0b02, 0x0ae4, 0x0ac7, 0x0aaa,
0x0a8e, 0x0a72, 0x0a57, 0x0a3d, 0x0a23, 0x0a0a, 0x09f1, 0x09d8, 0x09c0, 0x09a9, 0x0991, 0x097b, 0x0964, 0x094f, 0x0939, 0x0924,
0x090f, 0x08fb, 0x08e7, 0x08d3, 0x08c0, 0x08ad, 0x089a, 0x0888, 0x0876, 0x0864, 0x0853, 0x0842, 0x0831, 0x0820, 0x0810, 0x0800,
0x07f0, 0x07e0, 0x07d1, 0x07c1, 0x07b3, 0x07a4, 0x0795, 0x0787, 0x0779, 0x076b, 0x075d, 0x0750, 0x0743, 0x0736, 0x0729, 0x071c,
0x070f, 0x0703, 0x06f7, 0x06eb, 0x06df, 0x06d3, 0x06c8, 0x06bc, 0x06b1, 0x06a6, 0x069b, 0x0690, 0x0685, 0x067b, 0x0670, 0x0666,
0x065c, 0x0652, 0x0648, 0x063e, 0x0634, 0x062b, 0x0621, 0x0618, 0x060f, 0x0606, 0x05fd, 0x05f4, 0x05eb, 0x05e2, 0x05d9, 0x05d1,
0x05c9, 0x05c0, 0x05b8, 0x05b0, 0x05a8, 0x05a0, 0x0598, 0x0590, 0x0588, 0x0581, 0x0579, 0x0572, 0x056b, 0x0563, 0x055c, 0x0555,
0x054e, 0x0547, 0x0540, 0x0539, 0x0532, 0x052b, 0x0525, 0x051e, 0x0518, 0x0511, 0x050b, 0x0505, 0x04fe, 0x04f8, 0x04f2, 0x04ec,
0x04e6, 0x04e0, 0x04da, 0x04d4, 0x04ce, 0x04c8, 0x04c3, 0x04bd, 0x04b8, 0x04b2, 0x04ad, 0x04a7, 0x04a2, 0x049c, 0x0497, 0x0492,
0x048d, 0x0487, 0x0482, 0x047d, 0x0478, 0x0473, 0x046e, 0x0469, 0x0465, 0x0460, 0x045b, 0x0456, 0x0452, 0x044d, 0x0448, 0x0444,
0x043f, 0x043b, 0x0436, 0x0432, 0x042d, 0x0429, 0x0425, 0x0421, 0x041c, 0x0418, 0x0414, 0x0410, 0x040c, 0x0408, 0x0404, 0x0400,
0x03fc, 0x03f8, 0x03f4, 0x03f0, 0x03ec, 0x03e8, 0x03e4, 0x03e0, 0x03dd, 0x03d9, 0x03d5, 0x03d2, 0x03ce, 0x03ca, 0x03c7, 0x03c3,
0x03c0, 0x03bc, 0x03b9, 0x03b5, 0x03b2, 0x03ae, 0x03ab, 0x03a8, 0x03a4, 0x03a1, 0x039e, 0x039b, 0x0397, 0x0394, 0x0391, 0x038e,
0x038b, 0x0387, 0x0384, 0x0381, 0x037e, 0x037b, 0x0378, 0x0375, 0x0372, 0x036f, 0x036c, 0x0369, 0x0366, 0x0364, 0x0361, 0x035e,
0x035b, 0x0358, 0x0355, 0x0353, 0x0350, 0x034d, 0x034a, 0x0348, 0x0345, 0x0342, 0x0340, 0x033d, 0x033a, 0x0338, 0x0335, 0x0333,
0x0330, 0x032e, 0x032b, 0x0329, 0x0326, 0x0324, 0x0321, 0x031f, 0x031c, 0x031a, 0x0317, 0x0315, 0x0313, 0x0310, 0x030e, 0x030c,
0x0309, 0x0307, 0x0305, 0x0303, 0x0300, 0x02fe, 0x02fc, 0x02fa, 0x02f7, 0x02f5, 0x02f3, 0x02f1, 0x02ef, 0x02ec, 0x02ea, 0x02e8,
0x02e6, 0x02e4, 0x02e2, 0x02e0, 0x02de, 0x02dc, 0x02da, 0x02d8, 0x02d6, 0x02d4, 0x02d2, 0x02d0, 0x02ce, 0x02cc, 0x02ca, 0x02c8,
0x02c6, 0x02c4, 0x02c2, 0x02c0, 0x02be, 0x02bc, 0x02bb, 0x02b9, 0x02b7, 0x02b5, 0x02b3, 0x02b1, 0x02b0, 0x02ae, 0x02ac, 0x02aa,
0x02a8, 0x02a7, 0x02a5, 0x02a3, 0x02a1, 0x02a0, 0x029e, 0x029c, 0x029b, 0x0299, 0x0297, 0x0295, 0x0294, 0x0292, 0x0291, 0x028f,
0x028d, 0x028c, 0x028a, 0x0288, 0x0287, 0x0285, 0x0284, 0x0282, 0x0280, 0x027f, 0x027d, 0x027c, 0x027a, 0x0279, 0x0277, 0x0276,
0x0274, 0x0273, 0x0271, 0x0270, 0x026e, 0x026d, 0x026b, 0x026a, 0x0268, 0x0267, 0x0265, 0x0264, 0x0263, 0x0261, 0x0260, 0x025e,
0x025d, 0x025c, 0x025a, 0x0259, 0x0257, 0x0256, 0x0255, 0x0253, 0x0252, 0x0251, 0x024f, 0x024e, 0x024d, 0x024b, 0x024a, 0x0249,
0x0247, 0x0246, 0x0245, 0x0243, 0x0242, 0x0241, 0x0240, 0x023e, 0x023d, 0x023c, 0x023b, 0x0239, 0x0238, 0x0237, 0x0236, 0x0234,
0x0233, 0x0232, 0x0231, 0x0230, 0x022e, 0x022d, 0x022c, 0x022b, 0x022a, 0x0229, 0x0227, 0x0226, 0x0225, 0x0224, 0x0223, 0x0222,
0x0220, 0x021f, 0x021e, 0x021d, 0x021c, 0x021b, 0x021a, 0x0219, 0x0218, 0x0216, 0x0215, 0x0214, 0x0213, 0x0212, 0x0211, 0x0210,
0x020f, 0x020e, 0x020d, 0x020c, 0x020b, 0x020a, 0x0209, 0x0208, 0x0207, 0x0206, 0x0205, 0x0204, 0x0203, 0x0202, 0x0201, 0x0200,
0x01ff, 0x01fe, 0x01fd, 0x01fc, 0x01fb, 0x01fa, 0x01f9, 0x01f8, 0x01f7, 0x01f6, 0x01f5, 0x01f4, 0x01f3, 0x01f2, 0x01f1, 0x01f0,
0x01ef, 0x01ee, 0x01ed, 0x01ec, 0x01eb, 0x01ea, 0x01e9, 0x01e9, 0x01e8, 0x01e7, 0x01e6, 0x01e5, 0x01e4, 0x01e3, 0x01e2, 0x01e1,
0x01e0, 0x01e0, 0x01df, 0x01de, 0x01dd, 0x01dc, 0x01db, 0x01da, 0x01da, 0x01d9, 0x01d8, 0x01d7, 0x01d6, 0x01d5, 0x01d4, 0x01d4,
0x01d3, 0x01d2, 0x01d1, 0x01d0, 0x01cf, 0x01cf, 0x01ce, 0x01cd, 0x01cc, 0x01cb, 0x01cb, 0x01ca, 0x01c9, 0x01c8, 0x01c7, 0x01c7,
0x01c6, 0x01c5, 0x01c4, 0x01c3, 0x01c3, 0x01c2, 0x01c1, 0x01c0, 0x01c0, 0x01bf, 0x01be, 0x01bd, 0x01bd, 0x01bc, 0x01bb, 0x01ba,
0x01ba, 0x01b9, 0x01b8, 0x01b7, 0x01b7, 0x01b6, 0x01b5, 0x01b4, 0x01b4, 0x01b3, 0x01b2, 0x01b2, 0x01b1, 0x01b0, 0x01af, 0x01af,
0x01ae, 0x01ad, 0x01ad, 0x01ac, 0x01ab, 0x01aa, 0x01aa, 0x01a9, 0x01a8, 0x01a8, 0x01a7, 0x01a6, 0x01a6, 0x01a5, 0x01a4, 0x01a4,
0x01a3, 0x01a2, 0x01a2, 0x01a1, 0x01a0, 0x01a0, 0x019f, 0x019e, 0x019e, 0x019d, 0x019c, 0x019c, 0x019b, 0x019a, 0x019a, 0x0199,
0x0198, 0x0198, 0x0197, 0x0197, 0x0196, 0x0195, 0x0195, 0x0194, 0x0193, 0x0193, 0x0192, 0x0192, 0x0191, 0x0190, 0x0190, 0x018f,
0x018f, 0x018e, 0x018d, 0x018d, 0x018c, 0x018b, 0x018b, 0x018a, 0x018a, 0x0189, 0x0189, 0x0188, 0x0187, 0x0187, 0x0186, 0x0186,
0x0185, 0x0184, 0x0184, 0x0183, 0x0183, 0x0182, 0x0182, 0x0181, 0x0180, 0x0180, 0x017f, 0x017f, 0x017e, 0x017e, 0x017d, 0x017d,
0x017c, 0x017b, 0x017b, 0x017a, 0x017a, 0x0179, 0x0179, 0x0178, 0x0178, 0x0177, 0x0177, 0x0176, 0x0175, 0x0175, 0x0174, 0x0174,
0x0173, 0x0173, 0x0172, 0x0172, 0x0171, 0x0171, 0x0170, 0x0170, 0x016f, 0x016f, 0x016e, 0x016e, 0x016d, 0x016d, 0x016c, 0x016c,
0x016b, 0x016b, 0x016a, 0x016a, 0x0169, 0x0169, 0x0168, 0x0168, 0x0167, 0x0167, 0x0166, 0x0166, 0x0165, 0x0165, 0x0164, 0x0164,
0x0163, 0x0163, 0x0162, 0x0162, 0x0161, 0x0161, 0x0160, 0x0160, 0x015f, 0x015f, 0x015e, 0x015e, 0x015d, 0x015d, 0x015d, 0x015c,
0x015c, 0x015b, 0x015b, 0x015a, 0x015a, 0x0159, 0x0159, 0x0158, 0x0158, 0x0158, 0x0157, 0x0157, 0x0156, 0x0156
};
static const uint16_t DivTableAVX[255*3+1] = {
0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x0000, 0x38e3, 0x35e5, 0x3333, 0x30c3, 0x2e8b, 0x2c85, 0x2aaa, 0x28f5, 0x2762, 0x25ed, 0x2492, 0x234f, 0x2222, 0x2108, 0x2000,
0x1f07, 0x1e1e, 0x1d41, 0x1c71, 0x1bac, 0x1af2, 0x1a41, 0x1999, 0x18f9, 0x1861, 0x17d0, 0x1745, 0x16c1, 0x1642, 0x15c9, 0x1555,
0x14e5, 0x147a, 0x1414, 0x13b1, 0x1352, 0x12f6, 0x129e, 0x1249, 0x11f7, 0x11a7, 0x115b, 0x1111, 0x10c9, 0x1084, 0x1041, 0x1000,
0x0fc0, 0x0f83, 0x0f48, 0x0f0f, 0x0ed7, 0x0ea0, 0x0e6c, 0x0e38, 0x0e07, 0x0dd6, 0x0da7, 0x0d79, 0x0d4c, 0x0d20, 0x0cf6, 0x0ccc,
0x0ca4, 0x0c7c, 0x0c56, 0x0c30, 0x0c0c, 0x0be8, 0x0bc5, 0x0ba2, 0x0b81, 0x0b60, 0x0b40, 0x0b21, 0x0b02, 0x0ae4, 0x0ac7, 0x0aaa,
0x0a8e, 0x0a72, 0x0a57, 0x0a3d, 0x0a23, 0x0a0a, 0x09f1, 0x09d8, 0x09c0, 0x09a9, 0x0991, 0x097b, 0x0964, 0x094f, 0x0939, 0x0924,
0x090f, 0x08fb, 0x08e7, 0x08d3, 0x08c0, 0x08ad, 0x089a, 0x0888, 0x0876, 0x0864, 0x0853, 0x0842, 0x0831, 0x0820, 0x0810, 0x0800,
0x07f0, 0x07e0, 0x07d1, 0x07c1, 0x07b3, 0x07a4, 0x0795, 0x0787, 0x0779, 0x076b, 0x075d, 0x0750, 0x0743, 0x0736, 0x0729, 0x071c,
0x070f, 0x0703, 0x06f7, 0x06eb, 0x06df, 0x06d3, 0x06c8, 0x06bc, 0x06b1, 0x06a6, 0x069b, 0x0690, 0x0685, 0x067b, 0x0670, 0x0666,
0x065c, 0x0652, 0x0648, 0x063e, 0x0634, 0x062b, 0x0621, 0x0618, 0x060f, 0x0606, 0x05fd, 0x05f4, 0x05eb, 0x05e2, 0x05d9, 0x05d1,
0x05c9, 0x05c0, 0x05b8, 0x05b0, 0x05a8, 0x05a0, 0x0598, 0x0590, 0x0588, 0x0581, 0x0579, 0x0572, 0x056b, 0x0563, 0x055c, 0x0555,
0x054e, 0x0547, 0x0540, 0x0539, 0x0532, 0x052b, 0x0525, 0x051e, 0x0518, 0x0511, 0x050b, 0x0505, 0x04fe, 0x04f8, 0x04f2, 0x04ec,
0x04e6, 0x04e0, 0x04da, 0x04d4, 0x04ce, 0x04c8, 0x04c3, 0x04bd, 0x04b8, 0x04b2, 0x04ad, 0x04a7, 0x04a2, 0x049c, 0x0497, 0x0492,
0x048d, 0x0487, 0x0482, 0x047d, 0x0478, 0x0473, 0x046e, 0x0469, 0x0465, 0x0460, 0x045b, 0x0456, 0x0452, 0x044d, 0x0448, 0x0444,
0x043f, 0x043b, 0x0436, 0x0432, 0x042d, 0x0429, 0x0425, 0x0421, 0x041c, 0x0418, 0x0414, 0x0410, 0x040c, 0x0408, 0x0404, 0x0400,
0x03fc, 0x03f8, 0x03f4, 0x03f0, 0x03ec, 0x03e8, 0x03e4, 0x03e0, 0x03dd, 0x03d9, 0x03d5, 0x03d2, 0x03ce, 0x03ca, 0x03c7, 0x03c3,
0x03c0, 0x03bc, 0x03b9, 0x03b5, 0x03b2, 0x03ae, 0x03ab, 0x03a8, 0x03a4, 0x03a1, 0x039e, 0x039b, 0x0397, 0x0394, 0x0391, 0x038e,
0x038b, 0x0387, 0x0384, 0x0381, 0x037e, 0x037b, 0x0378, 0x0375, 0x0372, 0x036f, 0x036c, 0x0369, 0x0366, 0x0364, 0x0361, 0x035e,
0x035b, 0x0358, 0x0355, 0x0353, 0x0350, 0x034d, 0x034a, 0x0348, 0x0345, 0x0342, 0x0340, 0x033d, 0x033a, 0x0338, 0x0335, 0x0333,
0x0330, 0x032e, 0x032b, 0x0329, 0x0326, 0x0324, 0x0321, 0x031f, 0x031c, 0x031a, 0x0317, 0x0315, 0x0313, 0x0310, 0x030e, 0x030c,
0x0309, 0x0307, 0x0305, 0x0303, 0x0300, 0x02fe, 0x02fc, 0x02fa, 0x02f7, 0x02f5, 0x02f3, 0x02f1, 0x02ef, 0x02ec, 0x02ea, 0x02e8,
0x02e6, 0x02e4, 0x02e2, 0x02e0, 0x02de, 0x02dc, 0x02da, 0x02d8, 0x02d6, 0x02d4, 0x02d2, 0x02d0, 0x02ce, 0x02cc, 0x02ca, 0x02c8,
0x02c6, 0x02c4, 0x02c2, 0x02c0, 0x02be, 0x02bc, 0x02bb, 0x02b9, 0x02b7, 0x02b5, 0x02b3, 0x02b1, 0x02b0, 0x02ae, 0x02ac, 0x02aa,
0x02a8, 0x02a7, 0x02a5, 0x02a3, 0x02a1, 0x02a0, 0x029e, 0x029c, 0x029b, 0x0299, 0x0297, 0x0295, 0x0294, 0x0292, 0x0291, 0x028f,
0x028d, 0x028c, 0x028a, 0x0288, 0x0287, 0x0285, 0x0284, 0x0282, 0x0280, 0x027f, 0x027d, 0x027c, 0x027a, 0x0279, 0x0277, 0x0276,
0x0274, 0x0273, 0x0271, 0x0270, 0x026e, 0x026d, 0x026b, 0x026a, 0x0268, 0x0267, 0x0265, 0x0264, 0x0263, 0x0261, 0x0260, 0x025e,
0x025d, 0x025c, 0x025a, 0x0259, 0x0257, 0x0256, 0x0255, 0x0253, 0x0252, 0x0251, 0x024f, 0x024e, 0x024d, 0x024b, 0x024a, 0x0249,
0x0247, 0x0246, 0x0245, 0x0243, 0x0242, 0x0241, 0x0240, 0x023e, 0x023d, 0x023c, 0x023b, 0x0239, 0x0238, 0x0237, 0x0236, 0x0234,
0x0233, 0x0232, 0x0231, 0x0230, 0x022e, 0x022d, 0x022c, 0x022b, 0x022a, 0x0229, 0x0227, 0x0226, 0x0225, 0x0224, 0x0223, 0x0222,
0x0220, 0x021f, 0x021e, 0x021d, 0x021c, 0x021b, 0x021a, 0x0219, 0x0218, 0x0216, 0x0215, 0x0214, 0x0213, 0x0212, 0x0211, 0x0210,
0x020f, 0x020e, 0x020d, 0x020c, 0x020b, 0x020a, 0x0209, 0x0208, 0x0207, 0x0206, 0x0205, 0x0204, 0x0203, 0x0202, 0x0201, 0x0200,
0x01ff, 0x01fe, 0x01fd, 0x01fc, 0x01fb, 0x01fa, 0x01f9, 0x01f8, 0x01f7, 0x01f6, 0x01f5, 0x01f4, 0x01f3, 0x01f2, 0x01f1, 0x01f0,
0x01ef, 0x01ee, 0x01ed, 0x01ec, 0x01eb, 0x01ea, 0x01e9, 0x01e9, 0x01e8, 0x01e7, 0x01e6, 0x01e5, 0x01e4, 0x01e3, 0x01e2, 0x01e1,
0x01e0, 0x01e0, 0x01df, 0x01de, 0x01dd, 0x01dc, 0x01db, 0x01da, 0x01da, 0x01d9, 0x01d8, 0x01d7, 0x01d6, 0x01d5, 0x01d4, 0x01d4,
0x01d3, 0x01d2, 0x01d1, 0x01d0, 0x01cf, 0x01cf, 0x01ce, 0x01cd, 0x01cc, 0x01cb, 0x01cb, 0x01ca, 0x01c9, 0x01c8, 0x01c7, 0x01c7,
0x01c6, 0x01c5, 0x01c4, 0x01c3, 0x01c3, 0x01c2, 0x01c1, 0x01c0, 0x01c0, 0x01bf, 0x01be, 0x01bd, 0x01bd, 0x01bc, 0x01bb, 0x01ba,
0x01ba, 0x01b9, 0x01b8, 0x01b7, 0x01b7, 0x01b6, 0x01b5, 0x01b4, 0x01b4, 0x01b3, 0x01b2, 0x01b2, 0x01b1, 0x01b0, 0x01af, 0x01af,
0x01ae, 0x01ad, 0x01ad, 0x01ac, 0x01ab, 0x01aa, 0x01aa, 0x01a9, 0x01a8, 0x01a8, 0x01a7, 0x01a6, 0x01a6, 0x01a5, 0x01a4, 0x01a4,
0x01a3, 0x01a2, 0x01a2, 0x01a1, 0x01a0, 0x01a0, 0x019f, 0x019e, 0x019e, 0x019d, 0x019c, 0x019c, 0x019b, 0x019a, 0x019a, 0x0199,
0x0198, 0x0198, 0x0197, 0x0197, 0x0196, 0x0195, 0x0195, 0x0194, 0x0193, 0x0193, 0x0192, 0x0192, 0x0191, 0x0190, 0x0190, 0x018f,
0x018f, 0x018e, 0x018d, 0x018d, 0x018c, 0x018b, 0x018b, 0x018a, 0x018a, 0x0189, 0x0189, 0x0188, 0x0187, 0x0187, 0x0186, 0x0186,
0x0185, 0x0184, 0x0184, 0x0183, 0x0183, 0x0182, 0x0182, 0x0181, 0x0180, 0x0180, 0x017f, 0x017f, 0x017e, 0x017e, 0x017d, 0x017d,
0x017c, 0x017b, 0x017b, 0x017a, 0x017a, 0x0179, 0x0179, 0x0178, 0x0178, 0x0177, 0x0177, 0x0176, 0x0175, 0x0175, 0x0174, 0x0174,
0x0173, 0x0173, 0x0172, 0x0172, 0x0171, 0x0171, 0x0170, 0x0170, 0x016f, 0x016f, 0x016e, 0x016e, 0x016d, 0x016d, 0x016c, 0x016c,
0x016b, 0x016b, 0x016a, 0x016a, 0x0169, 0x0169, 0x0168, 0x0168, 0x0167, 0x0167, 0x0166, 0x0166, 0x0165, 0x0165, 0x0164, 0x0164,
0x0163, 0x0163, 0x0162, 0x0162, 0x0161, 0x0161, 0x0160, 0x0160, 0x015f, 0x015f, 0x015e, 0x015e, 0x015d, 0x015d, 0x015d, 0x015c,
0x015c, 0x015b, 0x015b, 0x015a, 0x015a, 0x0159, 0x0159, 0x0158, 0x0158, 0x0158, 0x0157, 0x0157, 0x0156, 0x0156
};
static const uint16_t DivTableNEON[255*3+1] = {
0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x0000, 0x1c71, 0x1af2, 0x1999, 0x1861, 0x1745, 0x1642, 0x1555, 0x147a, 0x13b1, 0x12f6, 0x1249, 0x11a7, 0x1111, 0x1084, 0x1000,
0x0f83, 0x0f0f, 0x0ea0, 0x0e38, 0x0dd6, 0x0d79, 0x0d20, 0x0ccc, 0x0c7c, 0x0c30, 0x0be8, 0x0ba2, 0x0b60, 0x0b21, 0x0ae4, 0x0aaa,
0x0a72, 0x0a3d, 0x0a0a, 0x09d8, 0x09a9, 0x097b, 0x094f, 0x0924, 0x08fb, 0x08d3, 0x08ad, 0x0888, 0x0864, 0x0842, 0x0820, 0x0800,
0x07e0, 0x07c1, 0x07a4, 0x0787, 0x076b, 0x0750, 0x0736, 0x071c, 0x0703, 0x06eb, 0x06d3, 0x06bc, 0x06a6, 0x0690, 0x067b, 0x0666,
0x0652, 0x063e, 0x062b, 0x0618, 0x0606, 0x05f4, 0x05e2, 0x05d1, 0x05c0, 0x05b0, 0x05a0, 0x0590, 0x0581, 0x0572, 0x0563, 0x0555,
0x0547, 0x0539, 0x052b, 0x051e, 0x0511, 0x0505, 0x04f8, 0x04ec, 0x04e0, 0x04d4, 0x04c8, 0x04bd, 0x04b2, 0x04a7, 0x049c, 0x0492,
0x0487, 0x047d, 0x0473, 0x0469, 0x0460, 0x0456, 0x044d, 0x0444, 0x043b, 0x0432, 0x0429, 0x0421, 0x0418, 0x0410, 0x0408, 0x0400,
0x03f8, 0x03f0, 0x03e8, 0x03e0, 0x03d9, 0x03d2, 0x03ca, 0x03c3, 0x03bc, 0x03b5, 0x03ae, 0x03a8, 0x03a1, 0x039b, 0x0394, 0x038e,
0x0387, 0x0381, 0x037b, 0x0375, 0x036f, 0x0369, 0x0364, 0x035e, 0x0358, 0x0353, 0x034d, 0x0348, 0x0342, 0x033d, 0x0338, 0x0333,
0x032e, 0x0329, 0x0324, 0x031f, 0x031a, 0x0315, 0x0310, 0x030c, 0x0307, 0x0303, 0x02fe, 0x02fa, 0x02f5, 0x02f1, 0x02ec, 0x02e8,
0x02e4, 0x02e0, 0x02dc, 0x02d8, 0x02d4, 0x02d0, 0x02cc, 0x02c8, 0x02c4, 0x02c0, 0x02bc, 0x02b9, 0x02b5, 0x02b1, 0x02ae, 0x02aa,
0x02a7, 0x02a3, 0x02a0, 0x029c, 0x0299, 0x0295, 0x0292, 0x028f, 0x028c, 0x0288, 0x0285, 0x0282, 0x027f, 0x027c, 0x0279, 0x0276,
0x0273, 0x0270, 0x026d, 0x026a, 0x0267, 0x0264, 0x0261, 0x025e, 0x025c, 0x0259, 0x0256, 0x0253, 0x0251, 0x024e, 0x024b, 0x0249,
0x0246, 0x0243, 0x0241, 0x023e, 0x023c, 0x0239, 0x0237, 0x0234, 0x0232, 0x0230, 0x022d, 0x022b, 0x0229, 0x0226, 0x0224, 0x0222,
0x021f, 0x021d, 0x021b, 0x0219, 0x0216, 0x0214, 0x0212, 0x0210, 0x020e, 0x020c, 0x020a, 0x0208, 0x0206, 0x0204, 0x0202, 0x0200,
0x01fe, 0x01fc, 0x01fa, 0x01f8, 0x01f6, 0x01f4, 0x01f2, 0x01f0, 0x01ee, 0x01ec, 0x01ea, 0x01e9, 0x01e7, 0x01e5, 0x01e3, 0x01e1,
0x01e0, 0x01de, 0x01dc, 0x01da, 0x01d9, 0x01d7, 0x01d5, 0x01d4, 0x01d2, 0x01d0, 0x01cf, 0x01cd, 0x01cb, 0x01ca, 0x01c8, 0x01c7,
0x01c5, 0x01c3, 0x01c2, 0x01c0, 0x01bf, 0x01bd, 0x01bc, 0x01ba, 0x01b9, 0x01b7, 0x01b6, 0x01b4, 0x01b3, 0x01b2, 0x01b0, 0x01af,
0x01ad, 0x01ac, 0x01aa, 0x01a9, 0x01a8, 0x01a6, 0x01a5, 0x01a4, 0x01a2, 0x01a1, 0x01a0, 0x019e, 0x019d, 0x019c, 0x019a, 0x0199,
0x0198, 0x0197, 0x0195, 0x0194, 0x0193, 0x0192, 0x0190, 0x018f, 0x018e, 0x018d, 0x018b, 0x018a, 0x0189, 0x0188, 0x0187, 0x0186,
0x0184, 0x0183, 0x0182, 0x0181, 0x0180, 0x017f, 0x017e, 0x017d, 0x017b, 0x017a, 0x0179, 0x0178, 0x0177, 0x0176, 0x0175, 0x0174,
0x0173, 0x0172, 0x0171, 0x0170, 0x016f, 0x016e, 0x016d, 0x016c, 0x016b, 0x016a, 0x0169, 0x0168, 0x0167, 0x0166, 0x0165, 0x0164,
0x0163, 0x0162, 0x0161, 0x0160, 0x015f, 0x015e, 0x015d, 0x015c, 0x015b, 0x015a, 0x0159, 0x0158, 0x0158, 0x0157, 0x0156, 0x0155,
0x0154, 0x0153, 0x0152, 0x0151, 0x0150, 0x0150, 0x014f, 0x014e, 0x014d, 0x014c, 0x014b, 0x014a, 0x014a, 0x0149, 0x0148, 0x0147,
0x0146, 0x0146, 0x0145, 0x0144, 0x0143, 0x0142, 0x0142, 0x0141, 0x0140, 0x013f, 0x013e, 0x013e, 0x013d, 0x013c, 0x013b, 0x013b,
0x013a, 0x0139, 0x0138, 0x0138, 0x0137, 0x0136, 0x0135, 0x0135, 0x0134, 0x0133, 0x0132, 0x0132, 0x0131, 0x0130, 0x0130, 0x012f,
0x012e, 0x012e, 0x012d, 0x012c, 0x012b, 0x012b, 0x012a, 0x0129, 0x0129, 0x0128, 0x0127, 0x0127, 0x0126, 0x0125, 0x0125, 0x0124,
0x0123, 0x0123, 0x0122, 0x0121, 0x0121, 0x0120, 0x0120, 0x011f, 0x011e, 0x011e, 0x011d, 0x011c, 0x011c, 0x011b, 0x011b, 0x011a,
0x0119, 0x0119, 0x0118, 0x0118, 0x0117, 0x0116, 0x0116, 0x0115, 0x0115, 0x0114, 0x0113, 0x0113, 0x0112, 0x0112, 0x0111, 0x0111,
0x0110, 0x010f, 0x010f, 0x010e, 0x010e, 0x010d, 0x010d, 0x010c, 0x010c, 0x010b, 0x010a, 0x010a, 0x0109, 0x0109, 0x0108, 0x0108,
0x0107, 0x0107, 0x0106, 0x0106, 0x0105, 0x0105, 0x0104, 0x0104, 0x0103, 0x0103, 0x0102, 0x0102, 0x0101, 0x0101, 0x0100, 0x0100,
0x00ff, 0x00ff, 0x00fe, 0x00fe, 0x00fd, 0x00fd, 0x00fc, 0x00fc, 0x00fb, 0x00fb, 0x00fa, 0x00fa, 0x00f9, 0x00f9, 0x00f8, 0x00f8,
0x00f7, 0x00f7, 0x00f6, 0x00f6, 0x00f5, 0x00f5, 0x00f4, 0x00f4, 0x00f4, 0x00f3, 0x00f3, 0x00f2, 0x00f2, 0x00f1, 0x00f1, 0x00f0,
0x00f0, 0x00f0, 0x00ef, 0x00ef, 0x00ee, 0x00ee, 0x00ed, 0x00ed, 0x00ed, 0x00ec, 0x00ec, 0x00eb, 0x00eb, 0x00ea, 0x00ea, 0x00ea,
0x00e9, 0x00e9, 0x00e8, 0x00e8, 0x00e7, 0x00e7, 0x00e7, 0x00e6, 0x00e6, 0x00e5, 0x00e5, 0x00e5, 0x00e4, 0x00e4, 0x00e3, 0x00e3,
0x00e3, 0x00e2, 0x00e2, 0x00e1, 0x00e1, 0x00e1, 0x00e0, 0x00e0, 0x00e0, 0x00df, 0x00df, 0x00de, 0x00de, 0x00de, 0x00dd, 0x00dd,
0x00dd, 0x00dc, 0x00dc, 0x00db, 0x00db, 0x00db, 0x00da, 0x00da, 0x00da, 0x00d9, 0x00d9, 0x00d9, 0x00d8, 0x00d8, 0x00d7, 0x00d7,
0x00d7, 0x00d6, 0x00d6, 0x00d6, 0x00d5, 0x00d5, 0x00d5, 0x00d4, 0x00d4, 0x00d4, 0x00d3, 0x00d3, 0x00d3, 0x00d2, 0x00d2, 0x00d2,
0x00d1, 0x00d1, 0x00d1, 0x00d0, 0x00d0, 0x00d0, 0x00cf, 0x00cf, 0x00cf, 0x00ce, 0x00ce, 0x00ce, 0x00cd, 0x00cd, 0x00cd, 0x00cc,
0x00cc, 0x00cc, 0x00cb, 0x00cb, 0x00cb, 0x00ca, 0x00ca, 0x00ca, 0x00c9, 0x00c9, 0x00c9, 0x00c9, 0x00c8, 0x00c8, 0x00c8, 0x00c7,
0x00c7, 0x00c7, 0x00c6, 0x00c6, 0x00c6, 0x00c5, 0x00c5, 0x00c5, 0x00c5, 0x00c4, 0x00c4, 0x00c4, 0x00c3, 0x00c3, 0x00c3, 0x00c3,
0x00c2, 0x00c2, 0x00c2, 0x00c1, 0x00c1, 0x00c1, 0x00c1, 0x00c0, 0x00c0, 0x00c0, 0x00bf, 0x00bf, 0x00bf, 0x00bf, 0x00be, 0x00be,
0x00be, 0x00bd, 0x00bd, 0x00bd, 0x00bd, 0x00bc, 0x00bc, 0x00bc, 0x00bc, 0x00bb, 0x00bb, 0x00bb, 0x00ba, 0x00ba, 0x00ba, 0x00ba,
0x00b9, 0x00b9, 0x00b9, 0x00b9, 0x00b8, 0x00b8, 0x00b8, 0x00b8, 0x00b7, 0x00b7, 0x00b7, 0x00b7, 0x00b6, 0x00b6, 0x00b6, 0x00b6,
0x00b5, 0x00b5, 0x00b5, 0x00b5, 0x00b4, 0x00b4, 0x00b4, 0x00b4, 0x00b3, 0x00b3, 0x00b3, 0x00b3, 0x00b2, 0x00b2, 0x00b2, 0x00b2,
0x00b1, 0x00b1, 0x00b1, 0x00b1, 0x00b0, 0x00b0, 0x00b0, 0x00b0, 0x00af, 0x00af, 0x00af, 0x00af, 0x00ae, 0x00ae, 0x00ae, 0x00ae,
0x00ae, 0x00ad, 0x00ad, 0x00ad, 0x00ad, 0x00ac, 0x00ac, 0x00ac, 0x00ac, 0x00ac, 0x00ab, 0x00ab, 0x00ab, 0x00ab,
};
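
Note on the three tables above: the listed values appear consistent with a fixed-point reciprocal, `floor( ( 4 << 16 ) / ( i + 1 ) )` clamped to 16 bits for `DivTable` and `DivTableAVX`, and `floor( ( 2 << 16 ) / ( i + 1 ) )` for `DivTableNEON` (the same expression the 32-bit NEON path computes inline), with entries 0 to 16 of the AVX and NEON tables set to zero. This is an inference from spot-checking entries, not something the source states. `DivEntry` and `SpotCheck` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: fixed-point reciprocal of (i + 1), saturated to the
// 16-bit range of the table entries.
static uint16_t DivEntry( uint32_t numerator, uint32_t i )
{
    const uint32_t q = numerator / ( i + 1 );
    return q > 0xFFFF ? uint16_t( 0xFFFF ) : uint16_t( q );
}

// Compares a few computed values against entries copied from the tables.
static void SpotCheck()
{
    assert( DivEntry( 4u << 16, 0 ) == 0xffff );    // DivTable[0], saturated
    assert( DivEntry( 4u << 16, 4 ) == 0xcccc );    // DivTable[4]
    assert( DivEntry( 4u << 16, 17 ) == 0x38e3 );   // DivTableAVX[17]
    assert( DivEntry( 2u << 16, 17 ) == 0x1c71 );   // DivTableNEON[17]
}
```

Multiplying a 16-bit sum by such an entry and keeping the high 16 bits (as `_mm_mulhi_epu16` does in the SSE path) then approximates a division by the colour range without a divide instruction.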
static tracy_force_inline uint64_t ProcessRGB( const uint8_t* src )
{
#ifdef __SSE4_1__
__m128i px0 = _mm_loadu_si128(((__m128i*)src) + 0);
__m128i px1 = _mm_loadu_si128(((__m128i*)src) + 1);
__m128i px2 = _mm_loadu_si128(((__m128i*)src) + 2);
__m128i px3 = _mm_loadu_si128(((__m128i*)src) + 3);
__m128i smask = _mm_set1_epi32( 0xF8FCF8 );
__m128i sd0 = _mm_and_si128( px0, smask );
__m128i sd1 = _mm_and_si128( px1, smask );
__m128i sd2 = _mm_and_si128( px2, smask );
__m128i sd3 = _mm_and_si128( px3, smask );
__m128i sc = _mm_shuffle_epi32(sd0, _MM_SHUFFLE(0, 0, 0, 0));
__m128i sc0 = _mm_cmpeq_epi8(sd0, sc);
__m128i sc1 = _mm_cmpeq_epi8(sd1, sc);
__m128i sc2 = _mm_cmpeq_epi8(sd2, sc);
__m128i sc3 = _mm_cmpeq_epi8(sd3, sc);
__m128i sm0 = _mm_and_si128(sc0, sc1);
__m128i sm1 = _mm_and_si128(sc2, sc3);
__m128i sm = _mm_and_si128(sm0, sm1);
if( _mm_testc_si128(sm, _mm_set1_epi32(-1)) )
{
return uint64_t( to565( src[0], src[1], src[2] ) ) << 16;
}
__m128i min0 = _mm_min_epu8( px0, px1 );
__m128i min1 = _mm_min_epu8( px2, px3 );
__m128i min2 = _mm_min_epu8( min0, min1 );
__m128i max0 = _mm_max_epu8( px0, px1 );
__m128i max1 = _mm_max_epu8( px2, px3 );
__m128i max2 = _mm_max_epu8( max0, max1 );
__m128i min3 = _mm_shuffle_epi32( min2, _MM_SHUFFLE( 2, 3, 0, 1 ) );
__m128i max3 = _mm_shuffle_epi32( max2, _MM_SHUFFLE( 2, 3, 0, 1 ) );
__m128i min4 = _mm_min_epu8( min2, min3 );
__m128i max4 = _mm_max_epu8( max2, max3 );
__m128i min5 = _mm_shuffle_epi32( min4, _MM_SHUFFLE( 0, 0, 2, 2 ) );
__m128i max5 = _mm_shuffle_epi32( max4, _MM_SHUFFLE( 0, 0, 2, 2 ) );
__m128i rmin = _mm_min_epu8( min4, min5 );
__m128i rmax = _mm_max_epu8( max4, max5 );
__m128i range1 = _mm_subs_epu8( rmax, rmin );
__m128i range2 = _mm_sad_epu8( rmax, rmin );
uint32_t vrange = _mm_cvtsi128_si32( range2 ) >> 1;
__m128i range = _mm_set1_epi16( DivTable[vrange] );
__m128i inset1 = _mm_srli_epi16( range1, 4 );
__m128i inset = _mm_and_si128( inset1, _mm_set1_epi8( 0xF ) );
__m128i min = _mm_adds_epu8( rmin, inset );
__m128i max = _mm_subs_epu8( rmax, inset );
__m128i c0 = _mm_subs_epu8( px0, rmin );
__m128i c1 = _mm_subs_epu8( px1, rmin );
__m128i c2 = _mm_subs_epu8( px2, rmin );
__m128i c3 = _mm_subs_epu8( px3, rmin );
__m128i is0 = _mm_maddubs_epi16( c0, _mm_set1_epi8( 1 ) );
__m128i is1 = _mm_maddubs_epi16( c1, _mm_set1_epi8( 1 ) );
__m128i is2 = _mm_maddubs_epi16( c2, _mm_set1_epi8( 1 ) );
__m128i is3 = _mm_maddubs_epi16( c3, _mm_set1_epi8( 1 ) );
__m128i s0 = _mm_hadd_epi16( is0, is1 );
__m128i s1 = _mm_hadd_epi16( is2, is3 );
__m128i m0 = _mm_mulhi_epu16( s0, range );
__m128i m1 = _mm_mulhi_epu16( s1, range );
__m128i p0 = _mm_packus_epi16( m0, m1 );
__m128i p1 = _mm_or_si128( _mm_srai_epi32( p0, 6 ), _mm_srai_epi32( p0, 12 ) );
__m128i p2 = _mm_or_si128( _mm_srai_epi32( p0, 18 ), p0 );
__m128i p3 = _mm_or_si128( p1, p2 );
__m128i p =_mm_shuffle_epi8( p3, _mm_set1_epi32( 0x0C080400 ) );
uint32_t vmin = _mm_cvtsi128_si32( min );
uint32_t vmax = _mm_cvtsi128_si32( max );
uint32_t vp = _mm_cvtsi128_si32( p );
return uint64_t( ( uint64_t( to565( vmin ) ) << 16 ) | to565( vmax ) | ( uint64_t( vp ) << 32 ) );
#elif defined __ARM_NEON
# ifdef __aarch64__
uint8x16x4_t px = vld4q_u8( src );
uint8x16_t lr = px.val[0];
uint8x16_t lg = px.val[1];
uint8x16_t lb = px.val[2];
uint8_t rmaxr = vmaxvq_u8( lr );
uint8_t rmaxg = vmaxvq_u8( lg );
uint8_t rmaxb = vmaxvq_u8( lb );
uint8_t rminr = vminvq_u8( lr );
uint8_t rming = vminvq_u8( lg );
uint8_t rminb = vminvq_u8( lb );
int rr = rmaxr - rminr;
int rg = rmaxg - rming;
int rb = rmaxb - rminb;
int vrange1 = rr + rg + rb;
uint16_t vrange2 = DivTableNEON[vrange1];
uint8_t insetr = rr >> 4;
uint8_t insetg = rg >> 4;
uint8_t insetb = rb >> 4;
uint8_t minr = rminr + insetr;
uint8_t ming = rming + insetg;
uint8_t minb = rminb + insetb;
uint8_t maxr = rmaxr - insetr;
uint8_t maxg = rmaxg - insetg;
uint8_t maxb = rmaxb - insetb;
uint8x16_t cr = vsubq_u8( lr, vdupq_n_u8( rminr ) );
uint8x16_t cg = vsubq_u8( lg, vdupq_n_u8( rming ) );
uint8x16_t cb = vsubq_u8( lb, vdupq_n_u8( rminb ) );
uint16x8_t is0l = vaddl_u8( vget_low_u8( cr ), vget_low_u8( cg ) );
uint16x8_t is0h = vaddl_u8( vget_high_u8( cr ), vget_high_u8( cg ) );
uint16x8_t is1l = vaddw_u8( is0l, vget_low_u8( cb ) );
uint16x8_t is1h = vaddw_u8( is0h, vget_high_u8( cb ) );
int16x8_t range = vdupq_n_s16( vrange2 );
uint16x8_t m0 = vreinterpretq_u16_s16( vqdmulhq_s16( vreinterpretq_s16_u16( is1l ), range ) );
uint16x8_t m1 = vreinterpretq_u16_s16( vqdmulhq_s16( vreinterpretq_s16_u16( is1h ), range ) );
uint8x8_t p00 = vmovn_u16( m0 );
uint8x8_t p01 = vmovn_u16( m1 );
uint8x16_t p0 = vcombine_u8( p00, p01 );
uint32x4_t p1 = vaddq_u32( vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 6 ), vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 12 ) );
uint32x4_t p2 = vaddq_u32( vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 18 ), vreinterpretq_u32_u8( p0 ) );
uint32x4_t p3 = vaddq_u32( p1, p2 );
uint16x4x2_t p4 = vuzp_u16( vget_low_u16( vreinterpretq_u16_u32( p3 ) ), vget_high_u16( vreinterpretq_u16_u32( p3 ) ) );
uint8x8x2_t p = vuzp_u8( vreinterpret_u8_u16( p4.val[0] ), vreinterpret_u8_u16( p4.val[0] ) );
uint32_t vp;
vst1_lane_u32( &vp, vreinterpret_u32_u8( p.val[0] ), 0 );
return uint64_t( ( uint64_t( to565( minr, ming, minb ) ) << 16 ) | to565( maxr, maxg, maxb ) | ( uint64_t( vp ) << 32 ) );
# else
uint32x4_t px0 = vld1q_u32( (uint32_t*)src );
uint32x4_t px1 = vld1q_u32( (uint32_t*)src + 4 );
uint32x4_t px2 = vld1q_u32( (uint32_t*)src + 8 );
uint32x4_t px3 = vld1q_u32( (uint32_t*)src + 12 );
uint32x4_t smask = vdupq_n_u32( 0xF8FCF8 );
uint32x4_t sd0 = vandq_u32( smask, px0 );
uint32x4_t sd1 = vandq_u32( smask, px1 );
uint32x4_t sd2 = vandq_u32( smask, px2 );
uint32x4_t sd3 = vandq_u32( smask, px3 );
uint32x4_t sc = vdupq_n_u32( sd0[0] );
uint32x4_t sc0 = vceqq_u32( sd0, sc );
uint32x4_t sc1 = vceqq_u32( sd1, sc );
uint32x4_t sc2 = vceqq_u32( sd2, sc );
uint32x4_t sc3 = vceqq_u32( sd3, sc );
uint32x4_t sm0 = vandq_u32( sc0, sc1 );
uint32x4_t sm1 = vandq_u32( sc2, sc3 );
int64x2_t sm = vreinterpretq_s64_u32( vandq_u32( sm0, sm1 ) );
if( sm[0] == -1 && sm[1] == -1 )
{
return uint64_t( to565( src[0], src[1], src[2] ) ) << 16;
}
uint32x4_t mask = vdupq_n_u32( 0xFFFFFF );
uint8x16_t l0 = vreinterpretq_u8_u32( vandq_u32( mask, px0 ) );
uint8x16_t l1 = vreinterpretq_u8_u32( vandq_u32( mask, px1 ) );
uint8x16_t l2 = vreinterpretq_u8_u32( vandq_u32( mask, px2 ) );
uint8x16_t l3 = vreinterpretq_u8_u32( vandq_u32( mask, px3 ) );
uint8x16_t min0 = vminq_u8( l0, l1 );
uint8x16_t min1 = vminq_u8( l2, l3 );
uint8x16_t min2 = vminq_u8( min0, min1 );
uint8x16_t max0 = vmaxq_u8( l0, l1 );
uint8x16_t max1 = vmaxq_u8( l2, l3 );
uint8x16_t max2 = vmaxq_u8( max0, max1 );
uint8x16_t min3 = vreinterpretq_u8_u32( vrev64q_u32( vreinterpretq_u32_u8( min2 ) ) );
uint8x16_t max3 = vreinterpretq_u8_u32( vrev64q_u32( vreinterpretq_u32_u8( max2 ) ) );
uint8x16_t min4 = vminq_u8( min2, min3 );
uint8x16_t max4 = vmaxq_u8( max2, max3 );
uint8x16_t min5 = vcombine_u8( vget_high_u8( min4 ), vget_low_u8( min4 ) );
uint8x16_t max5 = vcombine_u8( vget_high_u8( max4 ), vget_low_u8( max4 ) );
uint8x16_t rmin = vminq_u8( min4, min5 );
uint8x16_t rmax = vmaxq_u8( max4, max5 );
uint8x16_t range1 = vsubq_u8( rmax, rmin );
uint8x8_t range2 = vget_low_u8( range1 );
uint8x8x2_t range3 = vzip_u8( range2, vdup_n_u8( 0 ) );
uint16x4_t range4 = vreinterpret_u16_u8( range3.val[0] );
uint16_t vrange1;
uint16x4_t range5 = vpadd_u16( range4, range4 );
uint16x4_t range6 = vpadd_u16( range5, range5 );
vst1_lane_u16( &vrange1, range6, 0 );
uint32_t vrange2 = ( 2 << 16 ) / uint32_t( vrange1 + 1 );
uint16x8_t range = vdupq_n_u16( vrange2 );
uint8x16_t inset = vshrq_n_u8( range1, 4 );
uint8x16_t min = vaddq_u8( rmin, inset );
uint8x16_t max = vsubq_u8( rmax, inset );
uint8x16_t c0 = vsubq_u8( l0, rmin );
uint8x16_t c1 = vsubq_u8( l1, rmin );
uint8x16_t c2 = vsubq_u8( l2, rmin );
uint8x16_t c3 = vsubq_u8( l3, rmin );
uint16x8_t is0 = vpaddlq_u8( c0 );
uint16x8_t is1 = vpaddlq_u8( c1 );
uint16x8_t is2 = vpaddlq_u8( c2 );
uint16x8_t is3 = vpaddlq_u8( c3 );
uint16x4_t is4 = vpadd_u16( vget_low_u16( is0 ), vget_high_u16( is0 ) );
uint16x4_t is5 = vpadd_u16( vget_low_u16( is1 ), vget_high_u16( is1 ) );
uint16x4_t is6 = vpadd_u16( vget_low_u16( is2 ), vget_high_u16( is2 ) );
uint16x4_t is7 = vpadd_u16( vget_low_u16( is3 ), vget_high_u16( is3 ) );
uint16x8_t s0 = vcombine_u16( is4, is5 );
uint16x8_t s1 = vcombine_u16( is6, is7 );
uint16x8_t m0 = vreinterpretq_u16_s16( vqdmulhq_s16( vreinterpretq_s16_u16( s0 ), vreinterpretq_s16_u16( range ) ) );
uint16x8_t m1 = vreinterpretq_u16_s16( vqdmulhq_s16( vreinterpretq_s16_u16( s1 ), vreinterpretq_s16_u16( range ) ) );
uint8x8_t p00 = vmovn_u16( m0 );
uint8x8_t p01 = vmovn_u16( m1 );
uint8x16_t p0 = vcombine_u8( p00, p01 );
uint32x4_t p1 = vaddq_u32( vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 6 ), vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 12 ) );
uint32x4_t p2 = vaddq_u32( vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 18 ), vreinterpretq_u32_u8( p0 ) );
uint32x4_t p3 = vaddq_u32( p1, p2 );
uint16x4x2_t p4 = vuzp_u16( vget_low_u16( vreinterpretq_u16_u32( p3 ) ), vget_high_u16( vreinterpretq_u16_u32( p3 ) ) );
uint8x8x2_t p = vuzp_u8( vreinterpret_u8_u16( p4.val[0] ), vreinterpret_u8_u16( p4.val[0] ) );
uint32_t vmin, vmax, vp;
vst1q_lane_u32( &vmin, vreinterpretq_u32_u8( min ), 0 );
vst1q_lane_u32( &vmax, vreinterpretq_u32_u8( max ), 0 );
vst1_lane_u32( &vp, vreinterpret_u32_u8( p.val[0] ), 0 );
return uint64_t( ( uint64_t( to565( vmin ) ) << 16 ) | to565( vmax ) | ( uint64_t( vp ) << 32 ) );
# endif
#else
const auto ref = to565( src[0], src[1], src[2] );
auto stmp = src + 4;
for( int i=1; i<16; i++ )
{
if( to565( stmp[0], stmp[1], stmp[2] ) != ref )
{
break;
}
stmp += 4;
}
if( stmp == src + 64 )
{
return uint64_t( ref ) << 16;
}
uint8_t min[3] = { src[0], src[1], src[2] };
uint8_t max[3] = { src[0], src[1], src[2] };
auto tmp = src + 4;
for( int i=1; i<16; i++ )
{
for( int j=0; j<3; j++ )
{
if( tmp[j] < min[j] ) min[j] = tmp[j];
else if( tmp[j] > max[j] ) max[j] = tmp[j];
}
tmp += 4;
}
const uint32_t range = DivTable[max[0] - min[0] + max[1] - min[1] + max[2] - min[2]];
const uint32_t rmin = min[0] + min[1] + min[2];
for( int i=0; i<3; i++ )
{
const uint8_t inset = ( max[i] - min[i] ) >> 4;
min[i] += inset;
max[i] -= inset;
}
uint32_t data = 0;
for( int i=0; i<16; i++ )
{
const uint32_t c = src[0] + src[1] + src[2] - rmin;
const uint8_t idx = ( c * range ) >> 16;
data |= idx << (i*2);
src += 4;
}
return uint64_t( ( uint64_t( to565( min[0], min[1], min[2] ) ) << 16 ) | to565( max[0], max[1], max[2] ) | ( uint64_t( data ) << 32 ) );
#endif
}
#ifdef __AVX2__
static tracy_force_inline void ProcessRGB_AVX( const uint8_t* src, char*& dst )
{
__m256i px0 = _mm256_loadu_si256(((__m256i*)src) + 0);
__m256i px1 = _mm256_loadu_si256(((__m256i*)src) + 1);
__m256i px2 = _mm256_loadu_si256(((__m256i*)src) + 2);
__m256i px3 = _mm256_loadu_si256(((__m256i*)src) + 3);
__m256i min0 = _mm256_min_epu8( px0, px1 );
__m256i min1 = _mm256_min_epu8( px2, px3 );
__m256i min2 = _mm256_min_epu8( min0, min1 );
__m256i max0 = _mm256_max_epu8( px0, px1 );
__m256i max1 = _mm256_max_epu8( px2, px3 );
__m256i max2 = _mm256_max_epu8( max0, max1 );
__m256i min3 = _mm256_shuffle_epi32( min2, _MM_SHUFFLE( 2, 3, 0, 1 ) );
__m256i max3 = _mm256_shuffle_epi32( max2, _MM_SHUFFLE( 2, 3, 0, 1 ) );
__m256i min4 = _mm256_min_epu8( min2, min3 );
__m256i max4 = _mm256_max_epu8( max2, max3 );
__m256i min5 = _mm256_shuffle_epi32( min4, _MM_SHUFFLE( 0, 0, 2, 2 ) );
__m256i max5 = _mm256_shuffle_epi32( max4, _MM_SHUFFLE( 0, 0, 2, 2 ) );
__m256i rmin = _mm256_min_epu8( min4, min5 );
__m256i rmax = _mm256_max_epu8( max4, max5 );
__m256i range1 = _mm256_subs_epu8( rmax, rmin );
__m256i range2 = _mm256_sad_epu8( rmax, rmin );
uint16_t vrange0 = DivTableAVX[_mm256_cvtsi256_si32( range2 ) >> 1];
uint16_t vrange1 = DivTableAVX[_mm256_extract_epi16( range2, 8 ) >> 1];
__m256i range00 = _mm256_set1_epi16( vrange0 );
__m256i range = _mm256_inserti128_si256( range00, _mm_set1_epi16( vrange1 ), 1 );
__m256i inset1 = _mm256_srli_epi16( range1, 4 );
__m256i inset = _mm256_and_si256( inset1, _mm256_set1_epi8( 0xF ) );
__m256i min = _mm256_adds_epu8( rmin, inset );
__m256i max = _mm256_subs_epu8( rmax, inset );
__m256i c0 = _mm256_subs_epu8( px0, rmin );
__m256i c1 = _mm256_subs_epu8( px1, rmin );
__m256i c2 = _mm256_subs_epu8( px2, rmin );
__m256i c3 = _mm256_subs_epu8( px3, rmin );
__m256i is0 = _mm256_maddubs_epi16( c0, _mm256_set1_epi8( 1 ) );
__m256i is1 = _mm256_maddubs_epi16( c1, _mm256_set1_epi8( 1 ) );
__m256i is2 = _mm256_maddubs_epi16( c2, _mm256_set1_epi8( 1 ) );
__m256i is3 = _mm256_maddubs_epi16( c3, _mm256_set1_epi8( 1 ) );
__m256i s0 = _mm256_hadd_epi16( is0, is1 );
__m256i s1 = _mm256_hadd_epi16( is2, is3 );
__m256i m0 = _mm256_mulhi_epu16( s0, range );
__m256i m1 = _mm256_mulhi_epu16( s1, range );
__m256i p0 = _mm256_packus_epi16( m0, m1 );
__m256i p1 = _mm256_or_si256( _mm256_srai_epi32( p0, 6 ), _mm256_srai_epi32( p0, 12 ) );
__m256i p2 = _mm256_or_si256( _mm256_srai_epi32( p0, 18 ), p0 );
__m256i p3 = _mm256_or_si256( p1, p2 );
__m256i p =_mm256_shuffle_epi8( p3, _mm256_set1_epi32( 0x0C080400 ) );
__m256i mm0 = _mm256_unpacklo_epi8( _mm256_setzero_si256(), min );
__m256i mm1 = _mm256_unpacklo_epi8( _mm256_setzero_si256(), max );
__m256i mm2 = _mm256_unpacklo_epi64( mm1, mm0 );
__m256i mmr = _mm256_slli_epi64( _mm256_srli_epi64( mm2, 11 ), 11 );
__m256i mmg = _mm256_slli_epi64( _mm256_srli_epi64( mm2, 26 ), 5 );
__m256i mmb = _mm256_srli_epi64( _mm256_slli_epi64( mm2, 16 ), 59 );
__m256i mm3 = _mm256_or_si256( mmr, mmg );
__m256i mm4 = _mm256_or_si256( mm3, mmb );
__m256i mm5 = _mm256_shuffle_epi8( mm4, _mm256_set1_epi32( 0x09080100 ) );
__m256i d0 = _mm256_unpacklo_epi32( mm5, p );
__m256i d1 = _mm256_permute4x64_epi64( d0, _MM_SHUFFLE( 3, 2, 2, 0 ) );
_mm_storeu_si128( (__m128i*)dst, _mm256_castsi256_si128( d1 ) );
dst += 16;
}
#endif
void CompressImageDxt1( const char* src, char* dst, int w, int h )
{
assert( (w % 4) == 0 && (h % 4) == 0 );
#ifdef __AVX2__
if( w%8 == 0 )
{
uint32_t buf[8*4];
int i = 0;
auto blocks = w * h / 32;
do
{
auto tmp = (char*)buf;
memcpy( tmp, src, 8*4 );
memcpy( tmp + 8*4, src + w * 4, 8*4 );
memcpy( tmp + 16*4, src + w * 8, 8*4 );
memcpy( tmp + 24*4, src + w * 12, 8*4 );
src += 8*4;
if( ++i == w/8 )
{
src += w * 3 * 4;
i = 0;
}
ProcessRGB_AVX( (uint8_t*)buf, dst );
}
while( --blocks );
}
else
#endif
{
uint32_t buf[4*4];
int i = 0;
auto ptr = dst;
auto blocks = w * h / 16;
do
{
auto tmp = (char*)buf;
memcpy( tmp, src, 4*4 );
memcpy( tmp + 4*4, src + w * 4, 4*4 );
memcpy( tmp + 8*4, src + w * 8, 4*4 );
memcpy( tmp + 12*4, src + w * 12, 4*4 );
src += 4*4;
if( ++i == w/4 )
{
src += w * 3 * 4;
i = 0;
}
const auto c = ProcessRGB( (uint8_t*)buf );
memcpy( ptr, &c, sizeof( uint64_t ) );
ptr += sizeof( uint64_t );
}
while( --blocks );
}
}
}
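
Review note: the scalar path of CompressImageDxt1 above walks a row-major RGBA image one 4x4 tile at a time, copying four 16-byte row segments into a small buffer before compressing it. The traversal can be sketched on its own as below. This is a minimal illustration assuming tightly packed 4-byte pixels; `Block` and `GatherBlocks` are hypothetical names that do not appear in the commit, and the compression step (ProcessRGB) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One 4x4 tile of RGBA pixels, stored row by row (16 pixels, 64 bytes).
struct Block
{
    uint32_t px[16];
};

// Hypothetical helper: copies every 4x4 tile of a w*h RGBA image into its own
// contiguous buffer, in the same order the scalar loop in the diff visits them.
std::vector<Block> GatherBlocks( const char* src, int w, int h )
{
    assert( w > 0 && h > 0 && ( w % 4 ) == 0 && ( h % 4 ) == 0 );
    std::vector<Block> out;
    out.reserve( size_t( w ) * size_t( h ) / 16 );
    const size_t rowBytes = size_t( w ) * 4;
    int i = 0;
    auto blocks = w * h / 16;
    do
    {
        Block b;
        auto tmp = (char*)b.px;
        // Four rows of four pixels; consecutive image rows are rowBytes apart.
        memcpy( tmp,        src,                4*4 );
        memcpy( tmp + 4*4,  src + rowBytes,     4*4 );
        memcpy( tmp + 8*4,  src + rowBytes * 2, 4*4 );
        memcpy( tmp + 12*4, src + rowBytes * 3, 4*4 );
        out.push_back( b );
        src += 4*4;              // next tile in the current row of tiles
        if( ++i == w/4 )
        {
            src += rowBytes * 3; // skip the three image rows already consumed
            i = 0;
        }
    }
    while( --blocks );
    return out;
}
```

The AVX2 path follows the same pattern with 8-pixel-wide segments, so each buffer holds two adjacent tiles.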

11
client/TracyDxt1.hpp Normal file

@@ -0,0 +1,11 @@
#ifndef __TRACYDXT1_HPP__
#define __TRACYDXT1_HPP__
namespace tracy
{
void CompressImageDxt1( const char* src, char* dst, int w, int h );
}
#endif
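
Review note: the header does not say how large `dst` must be. DXT1 stores each 4x4 tile in 8 bytes (two 16-bit endpoint colors plus sixteen 2-bit indices), which matches the 64-bit value the scalar path writes per tile. A caller could size the output as sketched below; `Dxt1Size` is a hypothetical helper, not part of the commit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed for the DXT1 output of a w*h image.
// Each 4x4 tile (16 pixels) compresses to 8 bytes, so the total is w*h/2.
constexpr size_t Dxt1Size( int w, int h )
{
    return size_t( w ) * size_t( h ) / 2;
}

// Mirrors the precondition asserted inside CompressImageDxt1.
constexpr bool Dxt1DimensionsOk( int w, int h )
{
    return w > 0 && h > 0 && ( w % 4 ) == 0 && ( h % 4 ) == 0;
}
```

A caller would allocate `Dxt1Size( w, h )` bytes for `dst` and `w * h * 4` bytes of RGBA input for `src` before calling `tracy::CompressImageDxt1( src, dst, w, h )`.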

116
client/TracyFastVector.hpp Normal file

@@ -0,0 +1,116 @@
#ifndef __TRACYFASTVECTOR_HPP__
#define __TRACYFASTVECTOR_HPP__
#include <stddef.h>
#include "../common/TracyAlloc.hpp"
#include "../common/TracyForceInline.hpp"
namespace tracy
{
template<typename T>
class FastVector
{
public:
using iterator = T*;
using const_iterator = const T*;
FastVector( size_t capacity )
: m_ptr( (T*)tracy_malloc( sizeof( T ) * capacity ) )
, m_write( m_ptr )
, m_end( m_ptr + capacity )
{
}
FastVector( const FastVector& ) = delete;
FastVector( FastVector&& ) = delete;
~FastVector()
{
tracy_free( m_ptr );
}
FastVector& operator=( const FastVector& ) = delete;
FastVector& operator=( FastVector&& ) = delete;
bool empty() const { return m_ptr == m_write; }
size_t size() const { return m_write - m_ptr; }
T* data() { return m_ptr; }
const T* data() const { return m_ptr; };
T* begin() { return m_ptr; }
const T* begin() const { return m_ptr; }
T* end() { return m_write; }
const T* end() const { return m_write; }
T& front() { assert( !empty() ); return m_ptr[0]; }
const T& front() const { assert( !empty() ); return m_ptr[0]; }
T& back() { assert( !empty() ); return m_write[-1]; }
const T& back() const { assert( !empty() ); return m_write[-1]; }
T& operator[]( size_t idx ) { return m_ptr[idx]; }
const T& operator[]( size_t idx ) const { return m_ptr[idx]; }
T* push_next()
{
if( m_write == m_end ) AllocMore();
return m_write++;
}
T* prepare_next()
{
if( m_write == m_end ) AllocMore();
return m_write;
}
void commit_next()
{
m_write++;
}
void clear()
{
m_write = m_ptr;
}
void swap( FastVector& vec )
{
const auto ptr1 = m_ptr;
const auto ptr2 = vec.m_ptr;
const auto write1 = m_write;
const auto write2 = vec.m_write;
const auto end1 = m_end;
const auto end2 = vec.m_end;
m_ptr = ptr2;
vec.m_ptr = ptr1;
m_write = write2;
vec.m_write = write1;
m_end = end2;
vec.m_end = end1;
}
private:
tracy_no_inline void AllocMore()
{
const auto cap = ( m_end - m_ptr ) * 2;
const auto size = m_write - m_ptr;
T* ptr = (T*)tracy_malloc( sizeof( T ) * cap );
memcpy( ptr, m_ptr, size * sizeof( T ) );
tracy_free( m_ptr );
m_ptr = ptr;
m_write = m_ptr + size;
m_end = m_ptr + cap;
}
T* m_ptr;
T* m_write;
T* m_end;
};
}
#endif
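
Review note: FastVector splits an append into prepare_next() (reserve a slot, growing if needed) and commit_next() (publish it), so a caller can fill the slot in place. The sketch below reproduces that pattern in a standalone form. It is an illustration under assumptions: `MiniVector` is a hypothetical name, plain malloc/free stand in for tracy_malloc/tracy_free, and the static_assert makes explicit that memcpy-based growth is only valid for trivially copyable types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

template<typename T>
class MiniVector
{
    static_assert( std::is_trivially_copyable<T>::value, "growth uses memcpy" );
public:
    explicit MiniVector( size_t capacity )
        : m_ptr( (T*)malloc( sizeof( T ) * capacity ) )
        , m_write( m_ptr )
        , m_end( m_ptr + capacity )
    {
        // Doubling a zero capacity would never grow, so require at least one slot.
        assert( capacity > 0 );
    }
    MiniVector( const MiniVector& ) = delete;
    MiniVector& operator=( const MiniVector& ) = delete;
    ~MiniVector() { free( m_ptr ); }

    size_t size() const { return m_write - m_ptr; }
    size_t capacity() const { return m_end - m_ptr; }
    T& operator[]( size_t idx ) { return m_ptr[idx]; }

    // Returns the next free slot without changing size().
    T* prepare_next()
    {
        if( m_write == m_end ) AllocMore();
        return m_write;
    }
    // Publishes the slot handed out by the last prepare_next().
    void commit_next() { m_write++; }
    // Forgets the contents but keeps the allocation.
    void clear() { m_write = m_ptr; }

private:
    void AllocMore()
    {
        const size_t cap = capacity() * 2;
        const size_t sz = size();
        T* ptr = (T*)malloc( sizeof( T ) * cap );
        memcpy( ptr, m_ptr, sz * sizeof( T ) );
        free( m_ptr );
        m_ptr = ptr;
        m_write = m_ptr + sz;
        m_end = m_ptr + cap;
    }

    T* m_ptr;
    T* m_write;
    T* m_end;
};
```

Because clear() only resets the write pointer, a queue that is filled and drained repeatedly settles at its high-water capacity and stops allocating.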


@@ -5,31 +5,178 @@
#include <limits>
#include "../common/TracySystem.hpp"
#include "../common/TracyAlign.hpp"
#include "TracyProfiler.hpp"
namespace tracy
{
extern std::atomic<uint32_t> s_lockCounter;
class LockableCtx
{
public:
tracy_force_inline LockableCtx( const SourceLocationData* srcloc )
: m_id( GetLockCounter().fetch_add( 1, std::memory_order_relaxed ) )
#ifdef TRACY_ON_DEMAND
, m_lockCount( 0 )
, m_active( false )
#endif
{
assert( m_id != std::numeric_limits<uint32_t>::max() );
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::LockAnnounce );
MemWrite( &item->lockAnnounce.id, m_id );
MemWrite( &item->lockAnnounce.time, Profiler::GetTime() );
MemWrite( &item->lockAnnounce.lckloc, (uint64_t)srcloc );
MemWrite( &item->lockAnnounce.type, LockType::Lockable );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
}
LockableCtx( const LockableCtx& ) = delete;
LockableCtx& operator=( const LockableCtx& ) = delete;
tracy_force_inline ~LockableCtx()
{
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::LockTerminate );
MemWrite( &item->lockTerminate.id, m_id );
MemWrite( &item->lockTerminate.time, Profiler::GetTime() );
MemWrite( &item->lockTerminate.type, LockType::Lockable );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
}
tracy_force_inline bool BeforeLock()
{
#ifdef TRACY_ON_DEMAND
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return false;
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockWait );
MemWrite( &item->lockWait.thread, GetThreadHandle() );
MemWrite( &item->lockWait.id, m_id );
MemWrite( &item->lockWait.time, Profiler::GetTime() );
MemWrite( &item->lockWait.type, LockType::Lockable );
Profiler::QueueSerialFinish();
return true;
}
tracy_force_inline void AfterLock()
{
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterUnlock()
{
#ifdef TRACY_ON_DEMAND
m_lockCount.fetch_sub( 1, std::memory_order_relaxed );
if( !m_active.load( std::memory_order_relaxed ) ) return;
if( !GetProfiler().IsConnected() )
{
m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockRelease );
MemWrite( &item->lockRelease.thread, GetThreadHandle() );
MemWrite( &item->lockRelease.id, m_id );
MemWrite( &item->lockRelease.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterTryLock( bool acquired )
{
#ifdef TRACY_ON_DEMAND
if( !acquired ) return;
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return;
#endif
if( acquired )
{
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
}
tracy_force_inline void Mark( const SourceLocationData* srcloc )
{
#ifdef TRACY_ON_DEMAND
const auto active = m_active.load( std::memory_order_relaxed );
if( !active ) return;
const auto connected = GetProfiler().IsConnected();
if( !connected )
{
if( active ) m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockMark );
MemWrite( &item->lockMark.thread, GetThreadHandle() );
MemWrite( &item->lockMark.id, m_id );
MemWrite( &item->lockMark.srcloc, (uint64_t)srcloc );
Profiler::QueueSerialFinish();
}
private:
uint32_t m_id;
#ifdef TRACY_ON_DEMAND
std::atomic<uint32_t> m_lockCount;
std::atomic<bool> m_active;
#endif
};
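
Review note: under TRACY_ON_DEMAND, LockableCtx only starts emitting events for a lock when it observes a connection at a moment the lock count is zero, and it stops as soon as a disconnect is noticed. That appears intended to keep the server from seeing a release without the matching wait/obtain. The gating alone can be sketched as below; `OnDemandGate` and `g_connected` are hypothetical stand-ins (the latter for GetProfiler().IsConnected()), and the return values replace the actual queue writes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for GetProfiler().IsConnected().
static std::atomic<bool> g_connected{ false };

class OnDemandGate
{
public:
    // Mirrors the TRACY_ON_DEMAND part of BeforeLock(): returns true when the
    // wait/obtain events for this acquisition should be queued.
    bool BeforeLock()
    {
        const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
        const auto active = m_active.load( std::memory_order_relaxed );
        if( locks == 0 || active )
        {
            const bool connected = g_connected.load( std::memory_order_relaxed );
            if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
            if( connected ) return true;
        }
        return false;
    }

    // Mirrors AfterUnlock(): returns true when the release event should be queued.
    bool AfterUnlock()
    {
        m_lockCount.fetch_sub( 1, std::memory_order_relaxed );
        if( !m_active.load( std::memory_order_relaxed ) ) return false;
        if( !g_connected.load( std::memory_order_relaxed ) )
        {
            m_active.store( false, std::memory_order_relaxed );
            return false;
        }
        return true;
    }

private:
    std::atomic<uint32_t> m_lockCount{ 0 };
    std::atomic<bool> m_active{ false };
};
```

A lock that is already held or waited on when the server connects stays silent until its count drops back to zero; only then does the next BeforeLock() activate it.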
template<class T>
class Lockable
{
public:
tracy_force_inline Lockable( const SourceLocation* srcloc )
: m_id( s_lockCounter.fetch_add( 1, std::memory_order_relaxed ) )
tracy_force_inline Lockable( const SourceLocationData* srcloc )
: m_ctx( srcloc )
{
assert( m_id != std::numeric_limits<uint32_t>::max() );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockAnnounce;
item->lockAnnounce.id = m_id;
item->lockAnnounce.lckloc = (uint64_t)srcloc;
item->lockAnnounce.type = LockType::Lockable;
tail.store( magic + 1, std::memory_order_release );
}
Lockable( const Lockable& ) = delete;
@@ -37,105 +184,288 @@ public:
tracy_force_inline void lock()
{
const auto thread = GetThreadHandle();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockWait;
item->lockWait.id = m_id;
item->lockWait.thread = thread;
item->lockWait.time = Profiler::GetTime();
item->lockWait.type = LockType::Lockable;
tail.store( magic + 1, std::memory_order_release );
}
const auto runAfter = m_ctx.BeforeLock();
m_lockable.lock();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockObtain;
item->lockObtain.id = m_id;
item->lockObtain.thread = thread;
item->lockObtain.time = Profiler::GetTime();
tail.store( magic + 1, std::memory_order_release );
}
if( runAfter ) m_ctx.AfterLock();
}
tracy_force_inline void unlock()
{
m_lockable.unlock();
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockRelease;
item->lockRelease.id = m_id;
item->lockRelease.thread = GetThreadHandle();
item->lockRelease.time = Profiler::GetTime();
tail.store( magic + 1, std::memory_order_release );
m_ctx.AfterUnlock();
}
tracy_force_inline bool try_lock()
{
const auto ret = m_lockable.try_lock();
if( ret )
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockObtain;
item->lockObtain.id = (uint64_t)&m_lockable;
item->lockObtain.thread = GetThreadHandle();
item->lockObtain.time = Profiler::GetTime();
tail.store( magic + 1, std::memory_order_release );
}
return ret;
const auto acquired = m_lockable.try_lock();
m_ctx.AfterTryLock( acquired );
return acquired;
}
tracy_force_inline void Mark( const SourceLocation* srcloc ) const
tracy_force_inline void Mark( const SourceLocationData* srcloc )
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockMark;
item->lockMark.id = m_id;
item->lockMark.thread = GetThreadHandle();
item->lockMark.srcloc = (uint64_t)srcloc;
tail.store( magic + 1, std::memory_order_release );
m_ctx.Mark( srcloc );
}
private:
T m_lockable;
uint32_t m_id;
LockableCtx m_ctx;
};
class SharedLockableCtx
{
public:
tracy_force_inline SharedLockableCtx( const SourceLocationData* srcloc )
: m_id( GetLockCounter().fetch_add( 1, std::memory_order_relaxed ) )
#ifdef TRACY_ON_DEMAND
, m_lockCount( 0 )
, m_active( false )
#endif
{
assert( m_id != std::numeric_limits<uint32_t>::max() );
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::LockAnnounce );
MemWrite( &item->lockAnnounce.id, m_id );
MemWrite( &item->lockAnnounce.time, Profiler::GetTime() );
MemWrite( &item->lockAnnounce.lckloc, (uint64_t)srcloc );
MemWrite( &item->lockAnnounce.type, LockType::SharedLockable );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
}
SharedLockableCtx( const SharedLockableCtx& ) = delete;
SharedLockableCtx& operator=( const SharedLockableCtx& ) = delete;
tracy_force_inline ~SharedLockableCtx()
{
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::LockTerminate );
MemWrite( &item->lockTerminate.id, m_id );
MemWrite( &item->lockTerminate.time, Profiler::GetTime() );
MemWrite( &item->lockTerminate.type, LockType::SharedLockable );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
}
tracy_force_inline bool BeforeLock()
{
#ifdef TRACY_ON_DEMAND
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return false;
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockWait );
MemWrite( &item->lockWait.thread, GetThreadHandle() );
MemWrite( &item->lockWait.id, m_id );
MemWrite( &item->lockWait.time, Profiler::GetTime() );
MemWrite( &item->lockWait.type, LockType::SharedLockable );
Profiler::QueueSerialFinish();
return true;
}
tracy_force_inline void AfterLock()
{
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterUnlock()
{
#ifdef TRACY_ON_DEMAND
m_lockCount.fetch_sub( 1, std::memory_order_relaxed );
if( !m_active.load( std::memory_order_relaxed ) ) return;
if( !GetProfiler().IsConnected() )
{
m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockRelease );
MemWrite( &item->lockRelease.thread, GetThreadHandle() );
MemWrite( &item->lockRelease.id, m_id );
MemWrite( &item->lockRelease.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterTryLock( bool acquired )
{
#ifdef TRACY_ON_DEMAND
if( !acquired ) return;
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return;
#endif
if( acquired )
{
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
}
tracy_force_inline bool BeforeLockShared()
{
#ifdef TRACY_ON_DEMAND
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return false;
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockSharedWait );
MemWrite( &item->lockWait.thread, GetThreadHandle() );
MemWrite( &item->lockWait.id, m_id );
MemWrite( &item->lockWait.time, Profiler::GetTime() );
MemWrite( &item->lockWait.type, LockType::SharedLockable );
Profiler::QueueSerialFinish();
return true;
}
tracy_force_inline void AfterLockShared()
{
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockSharedObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterUnlockShared()
{
#ifdef TRACY_ON_DEMAND
m_lockCount.fetch_sub( 1, std::memory_order_relaxed );
if( !m_active.load( std::memory_order_relaxed ) ) return;
if( !GetProfiler().IsConnected() )
{
m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockSharedRelease );
MemWrite( &item->lockRelease.thread, GetThreadHandle() );
MemWrite( &item->lockRelease.id, m_id );
MemWrite( &item->lockRelease.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterTryLockShared( bool acquired )
{
#ifdef TRACY_ON_DEMAND
if( !acquired ) return;
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return;
#endif
if( acquired )
{
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockSharedObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
}
tracy_force_inline void Mark( const SourceLocationData* srcloc )
{
#ifdef TRACY_ON_DEMAND
const auto active = m_active.load( std::memory_order_relaxed );
if( !active ) return;
const auto connected = GetProfiler().IsConnected();
if( !connected )
{
if( active ) m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockMark );
MemWrite( &item->lockMark.thread, GetThreadHandle() );
MemWrite( &item->lockMark.id, m_id );
MemWrite( &item->lockMark.srcloc, (uint64_t)srcloc );
Profiler::QueueSerialFinish();
}
private:
uint32_t m_id;
#ifdef TRACY_ON_DEMAND
std::atomic<uint32_t> m_lockCount;
std::atomic<bool> m_active;
#endif
};
template<class T>
class SharedLockable
{
public:
tracy_force_inline SharedLockable( const SourceLocation* srcloc )
: m_id( s_lockCounter.fetch_add( 1, std::memory_order_relaxed ) )
tracy_force_inline SharedLockable( const SourceLocationData* srcloc )
: m_ctx( srcloc )
{
assert( m_id != std::numeric_limits<uint32_t>::max() );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockAnnounce;
item->lockAnnounce.id = m_id;
item->lockAnnounce.lckloc = (uint64_t)srcloc;
item->lockAnnounce.type = LockType::SharedLockable;
tail.store( magic + 1, std::memory_order_release );
}
SharedLockable( const SharedLockable& ) = delete;
@@ -143,148 +473,52 @@ public:
tracy_force_inline void lock()
{
const auto thread = GetThreadHandle();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockWait;
item->lockWait.id = m_id;
item->lockWait.thread = thread;
item->lockWait.time = Profiler::GetTime();
item->lockWait.type = LockType::SharedLockable;
tail.store( magic + 1, std::memory_order_release );
}
const auto runAfter = m_ctx.BeforeLock();
m_lockable.lock();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockObtain;
item->lockObtain.id = m_id;
item->lockObtain.thread = thread;
item->lockObtain.time = Profiler::GetTime();
tail.store( magic + 1, std::memory_order_release );
}
if( runAfter ) m_ctx.AfterLock();
}
tracy_force_inline void unlock()
{
m_lockable.unlock();
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockRelease;
item->lockRelease.id = m_id;
item->lockRelease.thread = GetThreadHandle();
item->lockRelease.time = Profiler::GetTime();
tail.store( magic + 1, std::memory_order_release );
m_ctx.AfterUnlock();
}
tracy_force_inline bool try_lock()
{
const auto ret = m_lockable.try_lock();
if( ret )
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockObtain;
item->lockObtain.id = (uint64_t)&m_lockable;
item->lockObtain.thread = GetThreadHandle();
item->lockObtain.time = Profiler::GetTime();
tail.store( magic + 1, std::memory_order_release );
}
return ret;
const auto acquired = m_lockable.try_lock();
m_ctx.AfterTryLock( acquired );
return acquired;
}
tracy_force_inline void lock_shared()
{
const auto thread = GetThreadHandle();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockSharedWait;
item->lockWait.id = m_id;
item->lockWait.thread = thread;
item->lockWait.time = Profiler::GetTime();
item->lockWait.type = LockType::SharedLockable;
tail.store( magic + 1, std::memory_order_release );
}
const auto runAfter = m_ctx.BeforeLockShared();
m_lockable.lock_shared();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockSharedObtain;
item->lockObtain.id = m_id;
item->lockObtain.thread = thread;
item->lockObtain.time = Profiler::GetTime();
tail.store( magic + 1, std::memory_order_release );
}
if( runAfter ) m_ctx.AfterLockShared();
}
tracy_force_inline void unlock_shared()
{
m_lockable.unlock_shared();
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockSharedRelease;
item->lockRelease.id = m_id;
item->lockRelease.thread = GetThreadHandle();
item->lockRelease.time = Profiler::GetTime();
tail.store( magic + 1, std::memory_order_release );
m_ctx.AfterUnlockShared();
}
tracy_force_inline bool try_lock_shared()
{
const auto ret = m_lockable.try_lock_shared();
if( ret )
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockSharedObtain;
item->lockObtain.id = (uint64_t)&m_lockable;
item->lockObtain.thread = GetThreadHandle();
item->lockObtain.time = Profiler::GetTime();
tail.store( magic + 1, std::memory_order_release );
}
return ret;
const auto acquired = m_lockable.try_lock_shared();
m_ctx.AfterTryLockShared( acquired );
return acquired;
}
tracy_force_inline void Mark( const SourceLocation* srcloc ) const
tracy_force_inline void Mark( const SourceLocationData* srcloc )
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::LockMark;
item->lockMark.id = m_id;
item->lockMark.thread = GetThreadHandle();
item->lockMark.srcloc = (uint64_t)srcloc;
tail.store( magic + 1, std::memory_order_release );
m_ctx.Mark( srcloc );
}
private:
T m_lockable;
uint32_t m_id;
SharedLockableCtx m_ctx;
};
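
Review note: after this change Lockable and SharedLockable are thin wrappers that forward to the wrapped mutex and delegate all bookkeeping to a context object. Since they expose lock(), unlock() and try_lock(), they remain usable with the standard lock guards. The sketch below shows the shape of that delegation. It is a single-threaded illustration only: `TracedLockable` and `EventLog` are hypothetical names, and the log is an unsynchronized vector, whereas the real context writes QueueItem records through Profiler::QueueSerial().

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical event sink standing in for LockableCtx. Not thread safe:
// BeforeLock() runs before the mutex is held, so concurrent use would race.
struct EventLog
{
    std::vector<std::string> events;
    bool BeforeLock() { events.push_back( "wait" ); return true; }
    void AfterLock() { events.push_back( "obtain" ); }
    void AfterUnlock() { events.push_back( "release" ); }
    void AfterTryLock( bool acquired ) { if( acquired ) events.push_back( "obtain" ); }
};

template<class T>
class TracedLockable
{
public:
    TracedLockable() = default;
    TracedLockable( const TracedLockable& ) = delete;
    TracedLockable& operator=( const TracedLockable& ) = delete;

    void lock()
    {
        // The hook decides whether the matching "obtain" should be recorded.
        const auto runAfter = m_ctx.BeforeLock();
        m_lockable.lock();
        if( runAfter ) m_ctx.AfterLock();
    }
    void unlock()
    {
        m_lockable.unlock();
        m_ctx.AfterUnlock();
    }
    bool try_lock()
    {
        const auto acquired = m_lockable.try_lock();
        m_ctx.AfterTryLock( acquired );
        return acquired;
    }
    const std::vector<std::string>& Events() const { return m_ctx.events; }

private:
    T m_lockable;
    EventLog m_ctx;
};
```

With this shape, `std::lock_guard<TracedLockable<std::mutex>>` works unchanged, and a failed try_lock() records nothing, matching the `if( acquired )` branch in the diff.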

File diff suppressed because it is too large.


@@ -1,31 +1,66 @@
#ifndef __TRACYPROFILER_HPP__
#define __TRACYPROFILER_HPP__
#include <assert.h>
#include <atomic>
#include <chrono>
#include <stdint.h>
#include <string.h>
#include "concurrentqueue.h"
#include "../common/tracy_lz4.hpp"
#include "tracy_concurrentqueue.h"
#include "TracyCallstack.hpp"
#include "TracySysTime.hpp"
#include "TracyFastVector.hpp"
#include "../common/TracyQueue.hpp"
#include "../common/TracyAlign.hpp"
#include "../common/TracyAlloc.hpp"
#include "../common/TracySystem.hpp"
#include "../common/TracyMutex.hpp"
#if defined _MSC_VER || defined __CYGWIN__
#if defined _WIN32 || defined __CYGWIN__
# include <intrin.h>
#endif
#ifdef __APPLE__
# include <TargetConditionals.h>
# include <mach/mach_time.h>
#endif
#if defined _MSC_VER || defined __CYGWIN__ || ( ( defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64 ) && !defined __ANDROID__ )
# define TRACY_RDTSCP_SUPPORTED
#if defined _WIN32 || defined __CYGWIN__ || ( ( defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64 ) && !defined __ANDROID__ ) || __ARM_ARCH >= 6
# define TRACY_HW_TIMER
#endif
#if !defined TRACY_HW_TIMER || ( __ARM_ARCH >= 6 && !defined CLOCK_MONOTONIC_RAW )
#include <chrono>
#endif
#ifndef TracyConcat
# define TracyConcat(x,y) TracyConcatIndirect(x,y)
#endif
#ifndef TracyConcatIndirect
# define TracyConcatIndirect(x,y) x##y
#endif
namespace tracy
{
class GpuCtx;
class Profiler;
class Socket;
class UdpBroadcast;
struct SourceLocation
struct GpuCtxWrapper
{
GpuCtx* ptr;
};
TRACY_API moodycamel::ConcurrentQueue<QueueItem>::ExplicitProducer* GetToken();
TRACY_API Profiler& GetProfiler();
TRACY_API std::atomic<uint32_t>& GetLockCounter();
TRACY_API std::atomic<uint8_t>& GetGpuCtxCounter();
TRACY_API GpuCtxWrapper& GetGpuCtx();
TRACY_API uint64_t GetThreadHandle();
TRACY_API void InitRPMallocThread();
struct SourceLocationData
{
const char* name;
const char* function;
@@ -34,180 +69,491 @@ struct SourceLocation
uint32_t color;
};
struct ProducerWrapper
#ifdef TRACY_ON_DEMAND
struct LuaZoneState
{
moodycamel::ConcurrentQueue<QueueItem>::ExplicitProducer* ptr;
};
extern thread_local ProducerWrapper s_token;
class GpuCtx;
struct GpuCtxWrapper
{
GpuCtx* ptr;
uint32_t counter;
bool active;
};
#endif
using Magic = moodycamel::ConcurrentQueueDefaultTraits::index_t;
class Profiler
{
struct FrameImageQueueItem
{
void* image;
uint64_t frame;
uint16_t w;
uint16_t h;
uint8_t offset;
bool flip;
};
public:
Profiler();
~Profiler();
#ifdef TRACY_RDTSCP_SUPPORTED
static tracy_force_inline int64_t tracy_rdtscp( uint32_t& cpu )
{
#if defined _MSC_VER || defined __CYGWIN__
const auto t = int64_t( __rdtscp( &cpu ) );
return t;
#elif defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64
uint32_t eax, edx;
asm volatile ( "rdtscp" : "=a" (eax), "=d" (edx), "=c" (cpu) :: );
return ( uint64_t( edx ) << 32 ) + uint64_t( eax );
#endif
}
#endif
#ifdef TRACY_RDTSCP_SUPPORTED
static tracy_force_inline int64_t tracy_rdtscp()
{
#if defined _MSC_VER || defined __CYGWIN__
static unsigned int dontcare;
const auto t = int64_t( __rdtscp( &dontcare ) );
return t;
#elif defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64
uint32_t eax, edx;
asm volatile ( "rdtscp" : "=a" (eax), "=d" (edx) :: "%ecx" );
return ( uint64_t( edx ) << 32 ) + uint64_t( eax );
#endif
}
#endif
static tracy_force_inline int64_t GetTime( uint32_t& cpu )
{
#ifdef TRACY_RDTSCP_SUPPORTED
return tracy_rdtscp( cpu );
#else
cpu = 0xFFFFFFFF;
return std::chrono::duration_cast<std::chrono::nanoseconds>( std::chrono::high_resolution_clock::now().time_since_epoch() ).count();
#endif
}
static tracy_force_inline int64_t GetTime()
{
#ifdef TRACY_RDTSCP_SUPPORTED
return tracy_rdtscp();
#ifdef TRACY_HW_TIMER
# if TARGET_OS_IOS == 1
return mach_absolute_time();
# elif __ARM_ARCH >= 6
# ifdef CLOCK_MONOTONIC_RAW
struct timespec ts;
clock_gettime( CLOCK_MONOTONIC_RAW, &ts );
return int64_t( ts.tv_sec ) * 1000000000ll + int64_t( ts.tv_nsec );
# else
return std::chrono::duration_cast<std::chrono::nanoseconds>( std::chrono::high_resolution_clock::now().time_since_epoch() ).count();
# endif
# elif defined _WIN32 || defined __CYGWIN__
return int64_t( __rdtsc() );
# elif defined __i386 || defined _M_IX86
uint32_t eax, edx;
asm volatile ( "rdtsc" : "=a" (eax), "=d" (edx) );
return ( uint64_t( edx ) << 32 ) + uint64_t( eax );
# elif defined __x86_64__ || defined _M_X64
uint64_t rax, rdx;
asm volatile ( "rdtsc" : "=a" (rax), "=d" (rdx) );
return ( rdx << 32 ) + rax;
# endif
#else
return std::chrono::duration_cast<std::chrono::nanoseconds>( std::chrono::high_resolution_clock::now().time_since_epoch() ).count();
#endif
}
static tracy_force_inline void FrameMark()
tracy_force_inline uint32_t GetNextZoneId()
{
return m_zoneId.fetch_add( 1, std::memory_order_relaxed );
}
static tracy_force_inline QueueItem* QueueSerial()
{
auto& p = GetProfiler();
p.m_serialLock.lock();
return p.m_serialQueue.prepare_next();
}
static tracy_force_inline void QueueSerialFinish()
{
auto& p = GetProfiler();
p.m_serialQueue.commit_next();
p.m_serialLock.unlock();
}
static tracy_force_inline void SendFrameMark( const char* name )
{
if( !name ) GetProfiler().m_frameCount.fetch_add( 1, std::memory_order_relaxed );
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::FrameMarkMsg;
item->frameMark.time = GetTime();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::FrameMarkMsg );
MemWrite( &item->frameMark.time, GetTime() );
MemWrite( &item->frameMark.name, uint64_t( name ) );
tail.store( magic + 1, std::memory_order_release );
}
static tracy_force_inline void SendFrameMark( const char* name, QueueType type )
{
assert( type == QueueType::FrameMarkMsgStart || type == QueueType::FrameMarkMsgEnd );
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
auto item = QueueSerial();
MemWrite( &item->hdr.type, type );
MemWrite( &item->frameMark.time, GetTime() );
MemWrite( &item->frameMark.name, uint64_t( name ) );
QueueSerialFinish();
}
static tracy_force_inline void SendFrameImage( const void* image, uint16_t w, uint16_t h, uint8_t offset, bool flip )
{
auto& profiler = GetProfiler();
#ifdef TRACY_ON_DEMAND
if( !profiler.IsConnected() ) return;
#endif
const auto sz = size_t( w ) * size_t( h ) * 4;
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, image, sz );
profiler.m_fiLock.lock();
auto fi = profiler.m_fiQueue.prepare_next();
fi->image = ptr;
fi->frame = profiler.m_frameCount.load( std::memory_order_relaxed ) - offset;
fi->w = w;
fi->h = h;
fi->flip = flip;
profiler.m_fiQueue.commit_next();
profiler.m_fiLock.unlock();
}
static tracy_force_inline void PlotData( const char* name, int64_t val )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::PlotData;
item->plotData.name = (uint64_t)name;
item->plotData.time = GetTime();
item->plotData.type = PlotDataType::Int;
item->plotData.data.i = val;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::PlotData );
MemWrite( &item->plotData.name, (uint64_t)name );
MemWrite( &item->plotData.time, GetTime() );
MemWrite( &item->plotData.type, PlotDataType::Int );
MemWrite( &item->plotData.data.i, val );
tail.store( magic + 1, std::memory_order_release );
}
static tracy_force_inline void PlotData( const char* name, float val )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::PlotData;
item->plotData.name = (uint64_t)name;
item->plotData.time = GetTime();
item->plotData.type = PlotDataType::Float;
item->plotData.data.f = val;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::PlotData );
MemWrite( &item->plotData.name, (uint64_t)name );
MemWrite( &item->plotData.time, GetTime() );
MemWrite( &item->plotData.type, PlotDataType::Float );
MemWrite( &item->plotData.data.f, val );
tail.store( magic + 1, std::memory_order_release );
}
static tracy_force_inline void PlotData( const char* name, double val )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::PlotData;
item->plotData.name = (uint64_t)name;
item->plotData.time = GetTime();
item->plotData.type = PlotDataType::Double;
item->plotData.data.d = val;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::PlotData );
MemWrite( &item->plotData.name, (uint64_t)name );
MemWrite( &item->plotData.time, GetTime() );
MemWrite( &item->plotData.type, PlotDataType::Double );
MemWrite( &item->plotData.data.d, val );
tail.store( magic + 1, std::memory_order_release );
}
static tracy_force_inline void Message( const char* txt, size_t size )
static tracy_force_inline void ConfigurePlot( const char* name, PlotFormatType type )
{
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::PlotConfig );
MemWrite( &item->plotConfig.name, (uint64_t)name );
MemWrite( &item->plotConfig.type, (uint8_t)type );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
}
static tracy_force_inline void Message( const char* txt, size_t size, int callstack )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto token = GetToken();
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::Message;
item->message.time = GetTime();
item->message.thread = GetThreadHandle();
item->message.text = (uint64_t)ptr;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, callstack == 0 ? QueueType::Message : QueueType::MessageCallstack );
MemWrite( &item->message.time, GetTime() );
MemWrite( &item->message.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
if( callstack != 0 ) tracy::GetProfiler().SendCallstack( callstack );
}
static tracy_force_inline void Message( const char* txt, int callstack )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, callstack == 0 ? QueueType::MessageLiteral : QueueType::MessageLiteralCallstack );
MemWrite( &item->message.time, GetTime() );
MemWrite( &item->message.text, (uint64_t)txt );
tail.store( magic + 1, std::memory_order_release );
if( callstack != 0 ) tracy::GetProfiler().SendCallstack( callstack );
}
static tracy_force_inline void MessageColor( const char* txt, size_t size, uint32_t color, int callstack )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto token = GetToken();
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, callstack == 0 ? QueueType::MessageColor : QueueType::MessageColorCallstack );
MemWrite( &item->messageColor.time, GetTime() );
MemWrite( &item->messageColor.text, (uint64_t)ptr );
MemWrite( &item->messageColor.r, uint8_t( ( color ) & 0xFF ) );
MemWrite( &item->messageColor.g, uint8_t( ( color >> 8 ) & 0xFF ) );
MemWrite( &item->messageColor.b, uint8_t( ( color >> 16 ) & 0xFF ) );
tail.store( magic + 1, std::memory_order_release );
if( callstack != 0 ) tracy::GetProfiler().SendCallstack( callstack );
}
static tracy_force_inline void MessageColor( const char* txt, uint32_t color, int callstack )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, callstack == 0 ? QueueType::MessageLiteralColor : QueueType::MessageLiteralColorCallstack );
MemWrite( &item->messageColor.time, GetTime() );
MemWrite( &item->messageColor.text, (uint64_t)txt );
MemWrite( &item->messageColor.r, uint8_t( ( color ) & 0xFF ) );
MemWrite( &item->messageColor.g, uint8_t( ( color >> 8 ) & 0xFF ) );
MemWrite( &item->messageColor.b, uint8_t( ( color >> 16 ) & 0xFF ) );
tail.store( magic + 1, std::memory_order_release );
if( callstack != 0 ) tracy::GetProfiler().SendCallstack( callstack );
}
static tracy_force_inline void MessageAppInfo( const char* txt, size_t size )
{
Magic magic;
auto token = GetToken();
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::MessageAppInfo );
MemWrite( &item->message.time, GetTime() );
MemWrite( &item->message.text, (uint64_t)ptr );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
}
static tracy_force_inline void Message( const char* txt )
static tracy_force_inline void MemAlloc( const void* ptr, size_t size )
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::MessageLiteral;
item->message.time = GetTime();
item->message.thread = GetThreadHandle();
item->message.text = (uint64_t)txt;
tail.store( magic + 1, std::memory_order_release );
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
const auto thread = GetThreadHandle();
GetProfiler().m_serialLock.lock();
SendMemAlloc( QueueType::MemAlloc, thread, ptr, size );
GetProfiler().m_serialLock.unlock();
}
static tracy_force_inline void MemFree( const void* ptr )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
const auto thread = GetThreadHandle();
GetProfiler().m_serialLock.lock();
SendMemFree( QueueType::MemFree, thread, ptr );
GetProfiler().m_serialLock.unlock();
}
static tracy_force_inline void MemAllocCallstack( const void* ptr, size_t size, int depth )
{
#ifdef TRACY_HAS_CALLSTACK
auto& profiler = GetProfiler();
# ifdef TRACY_ON_DEMAND
if( !profiler.IsConnected() ) return;
# endif
const auto thread = GetThreadHandle();
rpmalloc_thread_initialize();
auto callstack = Callstack( depth );
profiler.m_serialLock.lock();
SendMemAlloc( QueueType::MemAllocCallstack, thread, ptr, size );
SendCallstackMemory( callstack );
profiler.m_serialLock.unlock();
#else
MemAlloc( ptr, size );
#endif
}
static tracy_force_inline void MemFreeCallstack( const void* ptr, int depth )
{
#ifdef TRACY_HAS_CALLSTACK
auto& profiler = GetProfiler();
# ifdef TRACY_ON_DEMAND
if( !profiler.IsConnected() ) return;
# endif
const auto thread = GetThreadHandle();
rpmalloc_thread_initialize();
auto callstack = Callstack( depth );
profiler.m_serialLock.lock();
SendMemFree( QueueType::MemFreeCallstack, thread, ptr );
SendCallstackMemory( callstack );
profiler.m_serialLock.unlock();
#else
MemFree( ptr );
#endif
}
static tracy_force_inline void SendCallstack( int depth )
{
#ifdef TRACY_HAS_CALLSTACK
auto ptr = Callstack( depth );
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::Callstack );
MemWrite( &item->callstack.ptr, ptr );
tail.store( magic + 1, std::memory_order_release );
#endif
}
void SendCallstack( int depth, const char* skipBefore );
static void CutCallstack( void* callstack, const char* skipBefore );
static bool ShouldExit();
#ifdef TRACY_ON_DEMAND
tracy_force_inline bool IsConnected() const
{
return m_isConnected.load( std::memory_order_acquire );
}
tracy_force_inline uint64_t ConnectionId() const
{
return m_connectionId.load( std::memory_order_acquire );
}
tracy_force_inline void DeferItem( const QueueItem& item )
{
m_deferredLock.lock();
auto dst = m_deferredQueue.push_next();
memcpy( dst, &item, sizeof( item ) );
m_deferredLock.unlock();
}
#endif
void RequestShutdown() { m_shutdown.store( true, std::memory_order_relaxed ); m_shutdownManual.store( true, std::memory_order_relaxed ); }
bool HasShutdownFinished() const { return m_shutdownFinished.load( std::memory_order_relaxed ); }
void SendString( uint64_t ptr, const char* str, QueueType type );
private:
enum DequeueStatus { Success, ConnectionLost, QueueEmpty };
enum class DequeueStatus { DataDequeued, ConnectionLost, QueueEmpty };
static void LaunchWorker( void* ptr ) { ((Profiler*)ptr)->Worker(); }
void Worker();
DequeueStatus Dequeue( moodycamel::ConsumerToken& token );
static void LaunchCompressWorker( void* ptr ) { ((Profiler*)ptr)->CompressWorker(); }
void CompressWorker();
void ClearQueues( tracy::moodycamel::ConsumerToken& token );
void ClearSerial();
DequeueStatus Dequeue( tracy::moodycamel::ConsumerToken& token );
DequeueStatus DequeueContextSwitches( tracy::moodycamel::ConsumerToken& token, int64_t& timeStop );
DequeueStatus DequeueSerial();
bool AppendData( const void* data, size_t len );
bool CommitData();
bool NeedDataSize( size_t len );
tracy_force_inline void AppendDataUnsafe( const void* data, size_t len )
{
memcpy( m_buffer + m_bufferOffset, data, len );
m_bufferOffset += int( len );
}
bool SendData( const char* data, size_t len );
bool SendString( uint64_t ptr, const char* str, QueueType type );
void SendLongString( uint64_t ptr, const char* str, size_t len, QueueType type );
void SendSourceLocation( uint64_t ptr );
bool SendSourceLocationPayload( uint64_t ptr );
void SendSourceLocationPayload( uint64_t ptr );
void SendCallstackPayload( uint64_t ptr );
void SendCallstackAlloc( uint64_t ptr );
void SendCallstackFrame( uint64_t ptr );
bool HandleServerQuery();
void HandleDisconnect();
void CalibrateTimer();
void CalibrateDelay();
static tracy_force_inline void SendCallstackMemory( void* ptr )
{
#ifdef TRACY_HAS_CALLSTACK
auto item = GetProfiler().m_serialQueue.prepare_next();
MemWrite( &item->hdr.type, QueueType::CallstackMemory );
MemWrite( &item->callstackMemory.ptr, (uint64_t)ptr );
GetProfiler().m_serialQueue.commit_next();
#endif
}
static tracy_force_inline void SendMemAlloc( QueueType type, const uint64_t thread, const void* ptr, size_t size )
{
assert( type == QueueType::MemAlloc || type == QueueType::MemAllocCallstack );
auto item = GetProfiler().m_serialQueue.prepare_next();
MemWrite( &item->hdr.type, type );
MemWrite( &item->memAlloc.time, GetTime() );
MemWrite( &item->memAlloc.thread, thread );
MemWrite( &item->memAlloc.ptr, (uint64_t)ptr );
if( compile_time_condition<sizeof( size ) == 4>::value )
{
memcpy( &item->memAlloc.size, &size, 4 );
memset( &item->memAlloc.size + 4, 0, 2 );
}
else
{
assert( sizeof( size ) == 8 );
memcpy( &item->memAlloc.size, &size, 6 );
}
GetProfiler().m_serialQueue.commit_next();
}
static tracy_force_inline void SendMemFree( QueueType type, const uint64_t thread, const void* ptr )
{
assert( type == QueueType::MemFree || type == QueueType::MemFreeCallstack );
auto item = GetProfiler().m_serialQueue.prepare_next();
MemWrite( &item->hdr.type, type );
MemWrite( &item->memFree.time, GetTime() );
MemWrite( &item->memFree.thread, thread );
MemWrite( &item->memFree.ptr, (uint64_t)ptr );
GetProfiler().m_serialQueue.commit_next();
}
double m_timerMul;
uint64_t m_resolution;
uint64_t m_delay;
@@ -215,15 +561,50 @@ private:
uint64_t m_mainThread;
uint64_t m_epoch;
std::atomic<bool> m_shutdown;
std::atomic<bool> m_shutdownManual;
std::atomic<bool> m_shutdownFinished;
Socket* m_sock;
UdpBroadcast* m_broadcast;
bool m_noExit;
std::atomic<uint32_t> m_zoneId;
LZ4_stream_t* m_stream;
uint64_t m_threadCtx;
int64_t m_refTimeThread;
int64_t m_refTimeSerial;
int64_t m_refTimeCtx;
int64_t m_refTimeGpu;
void* m_stream; // LZ4_stream_t*
char* m_buffer;
int m_bufferOffset;
int m_bufferStart;
QueueItem* m_itemBuf;
char* m_lz4Buf;
FastVector<QueueItem> m_serialQueue, m_serialDequeue;
TracyMutex m_serialLock;
FastVector<FrameImageQueueItem> m_fiQueue, m_fiDequeue;
TracyMutex m_fiLock;
std::atomic<uint64_t> m_frameCount;
#ifdef TRACY_ON_DEMAND
std::atomic<bool> m_isConnected;
std::atomic<uint64_t> m_connectionId;
TracyMutex m_deferredLock;
FastVector<QueueItem> m_deferredQueue;
#endif
#ifdef TRACY_HAS_SYSTIME
void ProcessSysTime();
SysTime m_sysTime;
uint64_t m_sysTimeLast = 0;
#else
void ProcessSysTime() {}
#endif
};
};
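
Note on the publish pattern used by the lock-free call sites above (SendFrameMark, PlotData, Message and the like): each one obtains a slot with enqueue_begin( magic ), fills it in, and only then stores magic + 1 to the tail index with release ordering. The sketch below illustrates that idea with a single-producer, single-consumer ring. SpscRing, Item, EnqueueBegin, EnqueueEnd and Dequeue are hypothetical names made up for this illustration; they are not part of Tracy or moodycamel, whose real queue is considerably more involved.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a queue slot; not Tracy's real QueueItem.
struct Item
{
    uint8_t type;
    int64_t time;
};

// Hypothetical single-producer, single-consumer ring buffer.
template<size_t N>
class SpscRing
{
public:
    // Producer: hand out the next free slot and report its index.
    // Nothing is visible to the consumer yet.
    Item* EnqueueBegin( uint64_t& idx )
    {
        idx = m_tail.load( std::memory_order_relaxed );
        if( idx - m_head.load( std::memory_order_acquire ) == N ) return nullptr;  // ring is full
        return &m_slots[idx % N];
    }

    // Producer: publish the slot. The release store orders the slot writes
    // before the new tail value becomes visible.
    void EnqueueEnd( uint64_t idx )
    {
        m_tail.store( idx + 1, std::memory_order_release );
    }

    // Consumer: the acquire load pairs with the release store above, so a
    // published slot is always read fully written.
    bool Dequeue( Item& out )
    {
        const auto head = m_head.load( std::memory_order_relaxed );
        if( head == m_tail.load( std::memory_order_acquire ) ) return false;  // nothing published
        out = m_slots[head % N];
        m_head.store( head + 1, std::memory_order_release );
        return true;
    }

private:
    Item m_slots[N] = {};
    std::atomic<uint64_t> m_head{ 0 };
    std::atomic<uint64_t> m_tail{ 0 };
};
```

A producer would call EnqueueBegin, write the fields, then call EnqueueEnd. Until EnqueueEnd runs, Dequeue reports the ring as empty, which is the property the tail.store( magic + 1, std::memory_order_release ) lines appear to rely on.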


@@ -5,6 +5,7 @@
#include <string.h>
#include "../common/TracySystem.hpp"
#include "../common/TracyAlign.hpp"
#include "../common/TracyAlloc.hpp"
#include "TracyProfiler.hpp"
@@ -14,50 +15,103 @@ namespace tracy
class ScopedZone
{
public:
tracy_force_inline ScopedZone( const SourceLocation* srcloc )
tracy_force_inline ScopedZone( const SourceLocationData* srcloc, bool is_active = true )
#ifdef TRACY_ON_DEMAND
: m_active( is_active && GetProfiler().IsConnected() )
, m_connectionId( GetProfiler().ConnectionId() )
#else
: m_active( is_active )
#endif
{
const auto thread = GetThreadHandle();
m_thread = thread;
if( !m_active ) return;
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::ZoneBegin;
item->zoneBegin.time = Profiler::GetTime( item->zoneBegin.cpu );
item->zoneBegin.thread = thread;
item->zoneBegin.srcloc = (uint64_t)srcloc;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBegin );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)srcloc );
tail.store( magic + 1, std::memory_order_release );
}
tracy_force_inline ScopedZone( const SourceLocationData* srcloc, int depth, bool is_active = true )
#ifdef TRACY_ON_DEMAND
: m_active( is_active && GetProfiler().IsConnected() )
, m_connectionId( GetProfiler().ConnectionId() )
#else
: m_active( is_active )
#endif
{
if( !m_active ) return;
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBeginCallstack );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)srcloc );
tail.store( magic + 1, std::memory_order_release );
GetProfiler().SendCallstack( depth );
}
tracy_force_inline ~ScopedZone()
{
if( !m_active ) return;
#ifdef TRACY_ON_DEMAND
if( GetProfiler().ConnectionId() != m_connectionId ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::ZoneEnd;
item->zoneEnd.time = Profiler::GetTime( item->zoneEnd.cpu );
item->zoneEnd.thread = m_thread;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneEnd );
MemWrite( &item->zoneEnd.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
}
tracy_force_inline void Text( const char* txt, size_t size )
{
if( !m_active ) return;
#ifdef TRACY_ON_DEMAND
if( GetProfiler().ConnectionId() != m_connectionId ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto token = GetToken();
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<moodycamel::CanAlloc>( magic );
item->hdr.type = QueueType::ZoneText;
item->zoneText.thread = m_thread;
item->zoneText.text = (uint64_t)ptr;
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneText );
MemWrite( &item->zoneText.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
}
tracy_force_inline void Name( const char* txt, size_t size )
{
if( !m_active ) return;
#ifdef TRACY_ON_DEMAND
if( GetProfiler().ConnectionId() != m_connectionId ) return;
#endif
Magic magic;
auto token = GetToken();
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ZoneName );
MemWrite( &item->zoneText.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
}
private:
uint64_t m_thread;
const bool m_active;
#ifdef TRACY_ON_DEMAND
uint64_t m_connectionId;
#endif
};
}
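
Note on the ScopedZone lifetime logic above: the constructor latches whether the zone is active and, in on-demand mode, which connection it belongs to; the destructor emits the end event only if both still hold. The sketch below reproduces that behavior against an in-memory log. FakeProfiler, Event, EventType and ScopedZoneSketch are hypothetical names for illustration only, and the vector stands in for the real lock-free queue.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical event record; Tracy's real items are packed queue entries.
enum class EventType : uint8_t { ZoneBegin, ZoneEnd };
struct Event { EventType type; uint64_t srcloc; };

// Hypothetical stand-in for the profiler state the zone consults.
struct FakeProfiler
{
    bool connected = true;
    uint64_t connectionId = 1;
    std::vector<Event> events;
};

class ScopedZoneSketch
{
public:
    ScopedZoneSketch( FakeProfiler& p, uint64_t srcloc, bool isActive = true )
        : m_profiler( p )
        , m_srcloc( srcloc )
        , m_active( isActive && p.connected )
        , m_connectionId( p.connectionId )
    {
        if( !m_active ) return;
        m_profiler.events.push_back( { EventType::ZoneBegin, m_srcloc } );
    }

    ~ScopedZoneSketch()
    {
        if( !m_active ) return;
        // A reconnect since construction means the begin event went to a
        // different session, so the matching end event is dropped.
        if( m_profiler.connectionId != m_connectionId ) return;
        m_profiler.events.push_back( { EventType::ZoneEnd, m_srcloc } );
    }

    ScopedZoneSketch( const ScopedZoneSketch& ) = delete;
    ScopedZoneSketch& operator=( const ScopedZoneSketch& ) = delete;

private:
    FakeProfiler& m_profiler;
    const uint64_t m_srcloc;
    const bool m_active;
    const uint64_t m_connectionId;
};
```

Declaring a ScopedZoneSketch at the top of a block records a begin event immediately and an end event when the block exits, unless the zone was inactive or the connection id changed in between.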

client/TracySysTime.cpp (new file, 91 lines)

@@ -0,0 +1,91 @@
#include "TracySysTime.hpp"
#ifdef TRACY_HAS_SYSTIME
# if defined _WIN32 || defined __CYGWIN__
# include <windows.h>
# elif defined __linux__
# include <stdio.h>
# include <inttypes.h>
# elif defined __APPLE__
# include <mach/mach_host.h>
# include <mach/host_info.h>
# endif
namespace tracy
{
# if defined _WIN32 || defined __CYGWIN__
static inline uint64_t ConvertTime( const FILETIME& t )
{
return ( uint64_t( t.dwHighDateTime ) << 32 ) | uint64_t( t.dwLowDateTime );
}
void SysTime::ReadTimes()
{
FILETIME idleTime;
FILETIME kernelTime;
FILETIME userTime;
GetSystemTimes( &idleTime, &kernelTime, &userTime );
idle = ConvertTime( idleTime );
const auto kernel = ConvertTime( kernelTime );
const auto user = ConvertTime( userTime );
used = kernel + user;
}
# elif defined __linux__
void SysTime::ReadTimes()
{
uint64_t user, nice, system;
FILE* f = fopen( "/proc/stat", "r" );
if( f )
{
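// Only trust the values if all four fields were parsed; otherwise user, nice and system are uninitialized.
const auto cnt = fscanf( f, "cpu %" PRIu64 " %" PRIu64 " %" PRIu64 " %" PRIu64, &user, &nice, &system, &idle );
fclose( f );
if( cnt == 4 ) used = user + nice + system;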
}
}
# elif defined __APPLE__
void SysTime::ReadTimes()
{
host_cpu_load_info_data_t info;
mach_msg_type_number_t cnt = HOST_CPU_LOAD_INFO_COUNT;
host_statistics( mach_host_self(), HOST_CPU_LOAD_INFO, reinterpret_cast<host_info_t>( &info ), &cnt );
used = info.cpu_ticks[CPU_STATE_USER] + info.cpu_ticks[CPU_STATE_NICE] + info.cpu_ticks[CPU_STATE_SYSTEM];
idle = info.cpu_ticks[CPU_STATE_IDLE];
}
#endif
SysTime::SysTime()
{
ReadTimes();
}
float SysTime::Get()
{
const auto oldUsed = used;
const auto oldIdle = idle;
ReadTimes();
const auto diffIdle = idle - oldIdle;
const auto diffUsed = used - oldUsed;
#if defined _WIN32 || defined __CYGWIN__
return diffUsed == 0 ? -1 : ( diffUsed - diffIdle ) * 100.f / diffUsed;
#elif defined __linux__ || defined __APPLE__
const auto total = diffUsed + diffIdle;
return total == 0 ? -1 : diffUsed * 100.f / total;
#endif
}
}
#endif
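
Note on the two formulas in SysTime::Get() above: on Windows the kernel time returned by GetSystemTimes already includes idle time, so idle is subtracted from the used total; on Linux and macOS the used counter excludes idle, so the two are added to form the denominator. The sketch below isolates both computations as pure functions. CpuSample, UsagePercentExclusive and UsagePercentInclusive are hypothetical names for illustration and do not appear in the source.

```cpp
#include <cstdint>

// Hypothetical snapshot of the two counters SysTime keeps.
struct CpuSample
{
    uint64_t idle;
    uint64_t used;
};

// Linux/macOS convention: 'used' excludes idle time.
// Returns -1 when no time has elapsed between the samples.
inline float UsagePercentExclusive( const CpuSample& prev, const CpuSample& cur )
{
    const auto diffIdle = cur.idle - prev.idle;
    const auto diffUsed = cur.used - prev.used;
    const auto total = diffUsed + diffIdle;
    return total == 0 ? -1.f : diffUsed * 100.f / total;
}

// Windows convention: 'used' is kernel + user, and kernel includes idle time.
// Returns -1 when no time has elapsed between the samples.
inline float UsagePercentInclusive( const CpuSample& prev, const CpuSample& cur )
{
    const auto diffIdle = cur.idle - prev.idle;
    const auto diffUsed = cur.used - prev.used;
    return diffUsed == 0 ? -1.f : ( diffUsed - diffIdle ) * 100.f / diffUsed;
}
```

With 70 idle ticks and 30 busy ticks between samples, the exclusive form sees used = 30 and the inclusive form sees used = 100; both yield 30 percent.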

client/TracySysTime.hpp (new file, 30 lines)

@@ -0,0 +1,30 @@
#ifndef __TRACYSYSTIME_HPP__
#define __TRACYSYSTIME_HPP__
#if defined _WIN32 || defined __CYGWIN__ || defined __linux__ || defined __APPLE__
# define TRACY_HAS_SYSTIME
#endif
#ifdef TRACY_HAS_SYSTIME
#include <stdint.h>
namespace tracy
{
class SysTime
{
public:
SysTime();
float Get();
void ReadTimes();
private:
uint64_t idle, used;
};
}
#endif
#endif
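
Note on the Linux implementation of the ReadTimes() method declared above: it reads the aggregate "cpu" line of /proc/stat and sums the user, nice and system fields. The sketch below shows the same parsing on a string, with the field count checked before the values are used. CpuTimes and ParseProcStatLine are hypothetical names for illustration, not part of Tracy.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Hypothetical result type mirroring the idle/used pair kept by SysTime.
struct CpuTimes
{
    uint64_t idle;
    uint64_t used;
};

// Parses a line of the form "cpu <user> <nice> <system> <idle> ...".
// Returns false, leaving 'out' untouched, unless all four fields convert.
inline bool ParseProcStatLine( const char* line, CpuTimes& out )
{
    uint64_t user, nice, system, idle;
    const int cnt = std::sscanf( line, "cpu %" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64,
                                 &user, &nice, &system, &idle );
    if( cnt != 4 ) return false;
    out.used = user + nice + system;
    out.idle = idle;
    return true;
}
```

The sketch assumes it is handed the first line of /proc/stat; it makes no attempt to reject the per-core "cpuN" lines that follow in that file.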

client/TracySysTrace.cpp (new file, 862 lines)

@@ -0,0 +1,862 @@
#include "TracySysTrace.hpp"
#ifdef TRACY_HAS_SYSTEM_TRACING
# if defined _WIN32 || defined __CYGWIN__
# ifndef NOMINMAX
# define NOMINMAX
# endif
# define INITGUID
# include <assert.h>
# include <string.h>
# include <windows.h>
# include <dbghelp.h>
# include <evntrace.h>
# include <evntcons.h>
# include <psapi.h>
# include <winternl.h>
# include "../common/TracyAlloc.hpp"
# include "../common/TracySystem.hpp"
# include "TracyProfiler.hpp"
namespace tracy
{
TRACEHANDLE s_traceHandle;
TRACEHANDLE s_traceHandle2;
EVENT_TRACE_PROPERTIES* s_prop;
struct CSwitch
{
uint32_t newThreadId;
uint32_t oldThreadId;
int8_t newThreadPriority;
int8_t oldThreadPriority;
uint8_t previousCState;
int8_t spareByte;
int8_t oldThreadWaitReason;
int8_t oldThreadWaitMode;
int8_t oldThreadState;
int8_t oldThreadWaitIdealProcessor;
uint32_t newThreadWaitTime;
uint32_t reserved;
};
struct ReadyThread
{
uint32_t threadId;
int8_t adjustReason;
int8_t adjustIncrement;
int8_t flag;
int8_t reserved;
};
void WINAPI EventRecordCallback( PEVENT_RECORD record )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
const auto& hdr = record->EventHeader;
if( hdr.EventDescriptor.Opcode == 36 )
{
const auto cswitch = (const CSwitch*)record->UserData;
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ContextSwitch );
MemWrite( &item->contextSwitch.time, hdr.TimeStamp.QuadPart );
memcpy( &item->contextSwitch.oldThread, &cswitch->oldThreadId, sizeof( cswitch->oldThreadId ) );
memcpy( &item->contextSwitch.newThread, &cswitch->newThreadId, sizeof( cswitch->newThreadId ) );
memset( ((char*)&item->contextSwitch.oldThread)+4, 0, 4 );
memset( ((char*)&item->contextSwitch.newThread)+4, 0, 4 );
MemWrite( &item->contextSwitch.cpu, record->BufferContext.ProcessorNumber );
MemWrite( &item->contextSwitch.reason, cswitch->oldThreadWaitReason );
MemWrite( &item->contextSwitch.state, cswitch->oldThreadState );
tail.store( magic + 1, std::memory_order_release );
}
else if( hdr.EventDescriptor.Opcode == 50 )
{
const auto rt = (const ReadyThread*)record->UserData;
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ThreadWakeup );
MemWrite( &item->threadWakeup.time, hdr.TimeStamp.QuadPart );
memcpy( &item->threadWakeup.thread, &rt->threadId, sizeof( rt->threadId ) );
memset( ((char*)&item->threadWakeup.thread)+4, 0, 4 );
tail.store( magic + 1, std::memory_order_release );
}
}
bool SysTraceStart()
{
TOKEN_PRIVILEGES priv = {};
priv.PrivilegeCount = 1;
priv.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
if( LookupPrivilegeValue( nullptr, SE_SYSTEM_PROFILE_NAME, &priv.Privileges[0].Luid ) == 0 ) return false;
HANDLE pt;
if( OpenProcessToken( GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES, &pt ) == 0 ) return false;
const auto adjust = AdjustTokenPrivileges( pt, FALSE, &priv, 0, nullptr, nullptr );
CloseHandle( pt );
if( adjust == 0 ) return false;
const auto status = GetLastError();
if( status != ERROR_SUCCESS ) return false;
const auto psz = sizeof( EVENT_TRACE_PROPERTIES ) + sizeof( KERNEL_LOGGER_NAME );
s_prop = (EVENT_TRACE_PROPERTIES*)tracy_malloc( psz );
memset( s_prop, 0, sizeof( EVENT_TRACE_PROPERTIES ) );
s_prop->EnableFlags = EVENT_TRACE_FLAG_CSWITCH | EVENT_TRACE_FLAG_DISPATCHER;
s_prop->LogFileMode = EVENT_TRACE_REAL_TIME_MODE;
s_prop->Wnode.BufferSize = psz;
s_prop->Wnode.Flags = WNODE_FLAG_TRACED_GUID;
s_prop->Wnode.ClientContext = 3;
s_prop->Wnode.Guid = SystemTraceControlGuid;
s_prop->LoggerNameOffset = sizeof( EVENT_TRACE_PROPERTIES );
memcpy( ((char*)s_prop) + sizeof( EVENT_TRACE_PROPERTIES ), KERNEL_LOGGER_NAME, sizeof( KERNEL_LOGGER_NAME ) );
auto backup = tracy_malloc( psz );
memcpy( backup, s_prop, psz );
const auto controlStatus = ControlTrace( 0, KERNEL_LOGGER_NAME, s_prop, EVENT_TRACE_CONTROL_STOP );
if( controlStatus != ERROR_SUCCESS && controlStatus != ERROR_WMI_INSTANCE_NOT_FOUND )
{
tracy_free( backup ); // release the backup copy too, otherwise it leaks on this error path
tracy_free( s_prop );
return false;
}
memcpy( s_prop, backup, psz );
tracy_free( backup );
const auto startStatus = StartTrace( &s_traceHandle, KERNEL_LOGGER_NAME, s_prop );
if( startStatus != ERROR_SUCCESS )
{
tracy_free( s_prop );
return false;
}
#ifdef UNICODE
WCHAR KernelLoggerName[sizeof( KERNEL_LOGGER_NAME )];
#else
char KernelLoggerName[sizeof( KERNEL_LOGGER_NAME )];
#endif
memcpy( KernelLoggerName, KERNEL_LOGGER_NAME, sizeof( KERNEL_LOGGER_NAME ) );
EVENT_TRACE_LOGFILE log = {};
log.LoggerName = KernelLoggerName;
log.ProcessTraceMode = PROCESS_TRACE_MODE_REAL_TIME | PROCESS_TRACE_MODE_EVENT_RECORD | PROCESS_TRACE_MODE_RAW_TIMESTAMP;
log.EventRecordCallback = EventRecordCallback;
s_traceHandle2 = OpenTrace( &log );
if( s_traceHandle2 == (TRACEHANDLE)INVALID_HANDLE_VALUE )
{
CloseTrace( s_traceHandle );
tracy_free( s_prop );
return false;
}
return true;
}
void SysTraceStop()
{
CloseTrace( s_traceHandle2 );
CloseTrace( s_traceHandle );
}
void SysTraceWorker( void* ptr )
{
SetThreadName( "Tracy SysTrace" );
ProcessTrace( &s_traceHandle2, 1, 0, 0 );
ControlTrace( 0, KERNEL_LOGGER_NAME, s_prop, EVENT_TRACE_CONTROL_STOP );
tracy_free( s_prop );
}
#ifdef __CYGWIN__
extern "C" typedef DWORD (WINAPI *t_GetProcessIdOfThread)( HANDLE );
extern "C" typedef DWORD (WINAPI *t_GetProcessImageFileNameA)( HANDLE, LPSTR, DWORD );
# ifdef UNICODE
t_GetProcessIdOfThread GetProcessIdOfThread = (t_GetProcessIdOfThread)GetProcAddress( GetModuleHandle( L"kernel32.dll" ), "GetProcessIdOfThread" );
t_GetProcessImageFileNameA GetProcessImageFileNameA = (t_GetProcessImageFileNameA)GetProcAddress( GetModuleHandle( L"kernel32.dll" ), "K32GetProcessImageFileNameA" );
# else
t_GetProcessIdOfThread GetProcessIdOfThread = (t_GetProcessIdOfThread)GetProcAddress( GetModuleHandle( "kernel32.dll" ), "GetProcessIdOfThread" );
t_GetProcessImageFileNameA GetProcessImageFileNameA = (t_GetProcessImageFileNameA)GetProcAddress( GetModuleHandle( "kernel32.dll" ), "K32GetProcessImageFileNameA" );
# endif
#endif
extern "C" typedef NTSTATUS (WINAPI *t_NtQueryInformationThread)( HANDLE, THREADINFOCLASS, PVOID, ULONG, PULONG );
extern "C" typedef BOOL (WINAPI *t_EnumProcessModules)( HANDLE, HMODULE*, DWORD, LPDWORD );
extern "C" typedef BOOL (WINAPI *t_GetModuleInformation)( HANDLE, HMODULE, LPMODULEINFO, DWORD );
extern "C" typedef DWORD (WINAPI *t_GetModuleBaseNameA)( HANDLE, HMODULE, LPSTR, DWORD );
#ifdef UNICODE
t_NtQueryInformationThread NtQueryInformationThread = (t_NtQueryInformationThread)GetProcAddress( GetModuleHandle( L"ntdll.dll" ), "NtQueryInformationThread" );
t_EnumProcessModules _EnumProcessModules = (t_EnumProcessModules)GetProcAddress( GetModuleHandle( L"kernel32.dll" ), "K32EnumProcessModules" );
t_GetModuleInformation _GetModuleInformation = (t_GetModuleInformation)GetProcAddress( GetModuleHandle( L"kernel32.dll" ), "K32GetModuleInformation" );
t_GetModuleBaseNameA _GetModuleBaseNameA = (t_GetModuleBaseNameA)GetProcAddress( GetModuleHandle( L"kernel32.dll" ), "K32GetModuleBaseNameA" );
#else
t_NtQueryInformationThread NtQueryInformationThread = (t_NtQueryInformationThread)GetProcAddress( GetModuleHandle( "ntdll.dll" ), "NtQueryInformationThread" );
t_EnumProcessModules _EnumProcessModules = (t_EnumProcessModules)GetProcAddress( GetModuleHandle( "kernel32.dll" ), "K32EnumProcessModules" );
t_GetModuleInformation _GetModuleInformation = (t_GetModuleInformation)GetProcAddress( GetModuleHandle( "kernel32.dll" ), "K32GetModuleInformation" );
t_GetModuleBaseNameA _GetModuleBaseNameA = (t_GetModuleBaseNameA)GetProcAddress( GetModuleHandle( "kernel32.dll" ), "K32GetModuleBaseNameA" );
#endif
void SysTraceSendExternalName( uint64_t thread )
{
bool threadSent = false;
auto hnd = OpenThread( THREAD_QUERY_INFORMATION, FALSE, DWORD( thread ) );
if( hnd == 0 )
{
hnd = OpenThread( THREAD_QUERY_LIMITED_INFORMATION, FALSE, DWORD( thread ) );
}
if( hnd != 0 )
{
#if defined NTDDI_WIN10_RS2 && NTDDI_VERSION >= NTDDI_WIN10_RS2
PWSTR tmp;
GetThreadDescription( hnd, &tmp );
char buf[256];
if( tmp )
{
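auto ret = wcstombs( buf, tmp, 255 );
buf[255] = '\0'; // wcstombs does not terminate the output when the limit is reached
if( ret != 0 && ret != (size_t)-1 ) // (size_t)-1 signals a conversion error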
{
GetProfiler().SendString( thread, buf, QueueType::ExternalThreadName );
threadSent = true;
}
}
#endif
const auto pid = GetProcessIdOfThread( hnd );
if( !threadSent && NtQueryInformationThread && _EnumProcessModules && _GetModuleInformation && _GetModuleBaseNameA )
{
void* ptr;
ULONG retlen;
auto status = NtQueryInformationThread( hnd, (THREADINFOCLASS)9 /*ThreadQuerySetWin32StartAddress*/, &ptr, sizeof( &ptr ), &retlen );
if( status == 0 )
{
const auto phnd = OpenProcess( PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pid );
if( phnd != nullptr ) // OpenProcess returns NULL on failure, not INVALID_HANDLE_VALUE
{
HMODULE modules[1024];
DWORD needed;
if( _EnumProcessModules( phnd, modules, 1024 * sizeof( HMODULE ), &needed ) != 0 )
{
const auto sz = std::min( DWORD( needed / sizeof( HMODULE ) ), DWORD( 1024 ) );
for( DWORD i=0; i<sz; i++ )
{
MODULEINFO info;
if( _GetModuleInformation( phnd, modules[i], &info, sizeof( info ) ) != 0 )
{
if( (uint64_t)ptr >= (uint64_t)info.lpBaseOfDll && (uint64_t)ptr <= (uint64_t)info.lpBaseOfDll + (uint64_t)info.SizeOfImage )
{
char buf[1024];
if( _GetModuleBaseNameA( phnd, modules[i], buf, 1024 ) != 0 )
{
GetProfiler().SendString( thread, buf, QueueType::ExternalThreadName );
threadSent = true;
}
}
}
}
}
CloseHandle( phnd );
}
}
}
CloseHandle( hnd );
if( !threadSent )
{
GetProfiler().SendString( thread, "???", QueueType::ExternalThreadName );
threadSent = true;
}
if( pid != 0 )
{
{
uint64_t _pid = pid;
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::TidToPid );
MemWrite( &item->tidToPid.tid, thread );
MemWrite( &item->tidToPid.pid, _pid );
tail.store( magic + 1, std::memory_order_release );
}
if( pid == 4 )
{
GetProfiler().SendString( thread, "System", QueueType::ExternalName );
return;
}
else
{
const auto phnd = OpenProcess( PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid );
if( phnd != nullptr ) // OpenProcess returns NULL on failure, not INVALID_HANDLE_VALUE
{
char buf[1024];
const auto sz = GetProcessImageFileNameA( phnd, buf, 1024 );
CloseHandle( phnd );
if( sz != 0 )
{
auto ptr = buf + sz - 1;
while( ptr > buf && *ptr != '\\' ) ptr--;
if( *ptr == '\\' ) ptr++;
GetProfiler().SendString( thread, ptr, QueueType::ExternalName );
return;
}
}
}
}
}
if( !threadSent )
{
GetProfiler().SendString( thread, "???", QueueType::ExternalThreadName );
}
GetProfiler().SendString( thread, "???", QueueType::ExternalName );
}
}
# elif defined __linux__
# include <sys/types.h>
# include <sys/stat.h>
# include <sys/wait.h>
# include <fcntl.h>
# include <inttypes.h>
# include <limits>
# include <poll.h>
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
# include <unistd.h>
# include "TracyProfiler.hpp"
# ifdef __ANDROID__
# include "TracySysTracePayload.hpp"
# endif
namespace tracy
{
static const char BasePath[] = "/sys/kernel/debug/tracing/";
static const char TracingOn[] = "tracing_on";
static const char CurrentTracer[] = "current_tracer";
static const char TraceOptions[] = "trace_options";
static const char TraceClock[] = "trace_clock";
static const char SchedSwitch[] = "events/sched/sched_switch/enable";
static const char SchedWakeup[] = "events/sched/sched_wakeup/enable";
static const char BufferSizeKb[] = "buffer_size_kb";
static const char TracePipe[] = "trace_pipe";
#ifdef __ANDROID__
static bool TraceWrite( const char* path, size_t psz, const char* val, size_t vsz )
{
char tmp[256];
sprintf( tmp, "su -c 'echo \"%s\" > %s%s'", val, BasePath, path );
return system( tmp ) == 0;
}
#else
static bool TraceWrite( const char* path, size_t psz, const char* val, size_t vsz )
{
char tmp[256];
memcpy( tmp, BasePath, sizeof( BasePath ) - 1 );
memcpy( tmp + sizeof( BasePath ) - 1, path, psz );
int fd = open( tmp, O_WRONLY );
if( fd < 0 ) return false;
for(;;)
{
ssize_t cnt = write( fd, val, vsz );
if( cnt == (ssize_t)vsz )
{
close( fd );
return true;
}
if( cnt < 0 )
{
close( fd );
return false;
}
vsz -= cnt;
val += cnt;
}
}
#endif
#ifdef __ANDROID__
void SysTraceInjectPayload()
{
int pipefd[2];
if( pipe( pipefd ) == 0 )
{
const auto pid = fork();
if( pid == 0 )
{
// child
close( pipefd[1] );
if( dup2( pipefd[0], STDIN_FILENO ) >= 0 )
{
close( pipefd[0] );
execlp( "su", "su", "-c", "cat > /data/tracy_systrace", (char*)nullptr );
exit( 1 );
}
}
else if( pid > 0 )
{
// parent
close( pipefd[0] );
#ifdef __aarch64__
write( pipefd[1], tracy_systrace_aarch64_data, tracy_systrace_aarch64_size );
#else
write( pipefd[1], tracy_systrace_armv7_data, tracy_systrace_armv7_size );
#endif
close( pipefd[1] );
waitpid( pid, nullptr, 0 );
system( "su -c 'chmod 700 /data/tracy_systrace'" );
}
}
}
#endif
bool SysTraceStart()
{
if( !TraceWrite( TracingOn, sizeof( TracingOn ), "0", 2 ) ) return false;
if( !TraceWrite( CurrentTracer, sizeof( CurrentTracer ), "nop", 4 ) ) return false;
TraceWrite( TraceOptions, sizeof( TraceOptions ), "norecord-cmd", 13 );
TraceWrite( TraceOptions, sizeof( TraceOptions ), "norecord-tgid", 14 );
TraceWrite( TraceOptions, sizeof( TraceOptions ), "noirq-info", 11 );
#if defined TRACY_HW_TIMER && ( defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64 )
if( !TraceWrite( TraceClock, sizeof( TraceClock ), "x86-tsc", 8 ) ) return false;
#elif __ARM_ARCH >= 6
if( !TraceWrite( TraceClock, sizeof( TraceClock ), "mono_raw", 9 ) ) return false;
#endif
if( !TraceWrite( SchedSwitch, sizeof( SchedSwitch ), "1", 2 ) ) return false;
if( !TraceWrite( SchedWakeup, sizeof( SchedWakeup ), "1", 2 ) ) return false;
if( !TraceWrite( BufferSizeKb, sizeof( BufferSizeKb ), "512", 4 ) ) return false;
#if defined __ANDROID__ && ( defined __aarch64__ || defined __ARM_ARCH )
SysTraceInjectPayload();
#endif
if( !TraceWrite( TracingOn, sizeof( TracingOn ), "1", 2 ) ) return false;
return true;
}
void SysTraceStop()
{
TraceWrite( TracingOn, sizeof( TracingOn ), "0", 2 );
}
static uint64_t ReadNumber( const char*& ptr )
{
uint64_t val = 0;
for(;;)
{
if( *ptr >= '0' && *ptr <= '9' )
{
val = val * 10 + ( *ptr - '0' );
ptr++;
}
else
{
return val;
}
}
}
static uint8_t ReadState( char state )
{
switch( state )
{
case 'D': return 101;
case 'I': return 102;
case 'R': return 103;
case 'S': return 104;
case 'T': return 105;
case 't': return 106;
case 'W': return 107;
case 'X': return 108;
case 'Z': return 109;
default: return 100;
}
}
#if defined __ANDROID__ && defined __ANDROID_API__ && __ANDROID_API__ < 18
/*-
* Copyright (c) 2011 The NetBSD Foundation, Inc.
* All rights reserved.
*
* This code is derived from software contributed to The NetBSD Foundation
* by Christos Zoulas.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
* TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
ssize_t getdelim(char **buf, size_t *bufsiz, int delimiter, FILE *fp)
{
char *ptr, *eptr;
if (*buf == NULL || *bufsiz == 0) {
*bufsiz = BUFSIZ;
if ((*buf = (char*)malloc(*bufsiz)) == NULL)
return -1;
}
for (ptr = *buf, eptr = *buf + *bufsiz;;) {
int c = fgetc(fp);
if (c == -1) {
if (feof(fp))
return ptr == *buf ? -1 : ptr - *buf;
else
return -1;
}
*ptr++ = c;
if (c == delimiter) {
*ptr = '\0';
return ptr - *buf;
}
if (ptr + 2 >= eptr) {
char *nbuf;
size_t nbufsiz = *bufsiz * 2;
ssize_t d = ptr - *buf;
if ((nbuf = (char*)realloc(*buf, nbufsiz)) == NULL)
return -1;
*buf = nbuf;
*bufsiz = nbufsiz;
eptr = nbuf + nbufsiz;
ptr = nbuf + d;
}
}
}
ssize_t getline(char **buf, size_t *bufsiz, FILE *fp)
{
return getdelim(buf, bufsiz, '\n', fp);
}
#endif
static void HandleTraceLine( const char* line )
{
line += 24;
const auto cpu = (uint8_t)ReadNumber( line );
line++; // ']'
while( *line == ' ' ) line++;
#if defined TRACY_HW_TIMER && ( defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64 )
const auto time = ReadNumber( line );
#elif __ARM_ARCH >= 6
const auto ts = ReadNumber( line );
line++; // '.'
const auto tus = ReadNumber( line );
const auto time = ts * 1000000000ll + tus * 1000ll;
#endif
line += 2; // ': '
if( memcmp( line, "sched_switch", 12 ) == 0 )
{
line += 14;
while( memcmp( line, "prev_pid", 8 ) != 0 ) line++;
line += 9;
const auto oldPid = ReadNumber( line );
line++;
while( memcmp( line, "prev_state", 10 ) != 0 ) line++;
line += 11;
const auto oldState = (uint8_t)ReadState( *line );
line += 5;
while( memcmp( line, "next_pid", 8 ) != 0 ) line++;
line += 9;
const auto newPid = ReadNumber( line );
uint8_t reason = 100;
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ContextSwitch );
MemWrite( &item->contextSwitch.time, time );
MemWrite( &item->contextSwitch.oldThread, oldPid );
MemWrite( &item->contextSwitch.newThread, newPid );
MemWrite( &item->contextSwitch.cpu, cpu );
MemWrite( &item->contextSwitch.reason, reason );
MemWrite( &item->contextSwitch.state, oldState );
tail.store( magic + 1, std::memory_order_release );
}
else if( memcmp( line, "sched_wakeup", 12 ) == 0 )
{
line += 14;
while( memcmp( line, "pid", 3 ) != 0 ) line++;
line += 4;
const auto pid = ReadNumber( line );
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::ThreadWakeup );
MemWrite( &item->threadWakeup.time, time );
MemWrite( &item->threadWakeup.thread, pid );
tail.store( magic + 1, std::memory_order_release );
}
}
#ifdef __ANDROID__
static void ProcessTraceLines( int fd )
{
// Linux pipe buffer is 64KB, additional 1KB is for unfinished lines
char* buf = (char*)tracy_malloc( (64+1)*1024 );
char* line = buf;
for(;;)
{
const auto rd = read( fd, line, 64*1024 );
if( rd <= 0 ) break;
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() )
{
if( rd < 64*1024 )
{
assert( line[rd-1] == '\n' );
line = buf;
std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) );
}
else
{
const auto end = line + rd;
line = end - 1;
while( line > buf && *line != '\n' ) line--;
if( line > buf )
{
line++;
const auto lsz = end - line;
memmove( buf, line, lsz );
line = buf + lsz;
}
}
continue;
}
#endif
const auto end = line + rd;
line = buf;
for(;;)
{
auto next = line;
while( next < end && *next != '\n' ) next++;
next++;
if( next >= end )
{
const auto lsz = end - line;
memmove( buf, line, lsz );
line = buf + lsz;
break;
}
HandleTraceLine( line );
line = next;
}
if( rd < 64*1024 )
{
std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) );
}
}
tracy_free( buf );
}
void SysTraceWorker( void* ptr )
{
SetThreadName( "Tracy SysTrace" );
int pipefd[2];
if( pipe( pipefd ) == 0 )
{
const auto pid = fork();
if( pid == 0 )
{
// child
close( pipefd[0] );
dup2( pipefd[1], STDERR_FILENO );
if( dup2( pipefd[1], STDOUT_FILENO ) >= 0 )
{
close( pipefd[1] );
#if defined __ANDROID__ && ( defined __aarch64__ || defined __ARM_ARCH )
execlp( "su", "su", "-c", "/data/tracy_systrace", (char*)nullptr );
#endif
execlp( "su", "su", "-c", "cat /sys/kernel/debug/tracing/trace_pipe", (char*)nullptr );
exit( 1 );
}
}
else if( pid > 0 )
{
// parent
close( pipefd[1] );
ProcessTraceLines( pipefd[0] );
close( pipefd[0] );
}
}
}
#else
static void ProcessTraceLines( int fd )
{
char* buf = (char*)tracy_malloc( 64*1024 );
struct pollfd pfd;
pfd.fd = fd;
pfd.events = POLLIN | POLLERR;
for(;;)
{
while( poll( &pfd, 1, 0 ) <= 0 ) std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) );
const auto rd = read( fd, buf, 64*1024 );
if( rd <= 0 ) break;
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) continue;
#endif
auto line = buf;
const auto end = buf + rd;
for(;;)
{
auto next = line;
while( next < end && *next != '\n' ) next++;
if( next == end ) break;
assert( *next == '\n' );
next++;
HandleTraceLine( line );
line = next;
}
}
tracy_free( buf );
}
void SysTraceWorker( void* ptr )
{
SetThreadName( "Tracy SysTrace" );
char tmp[256];
memcpy( tmp, BasePath, sizeof( BasePath ) - 1 );
memcpy( tmp + sizeof( BasePath ) - 1, TracePipe, sizeof( TracePipe ) );
int fd = open( tmp, O_RDONLY );
if( fd < 0 ) return;
ProcessTraceLines( fd );
close( fd );
}
#endif
void SysTraceSendExternalName( uint64_t thread )
{
FILE* f;
char fn[256];
sprintf( fn, "/proc/%" PRIu64 "/comm", thread );
f = fopen( fn, "rb" );
if( f )
{
char buf[256];
const auto sz = fread( buf, 1, 256, f );
if( sz > 0 && buf[sz-1] == '\n' ) buf[sz-1] = '\0';
GetProfiler().SendString( thread, buf, QueueType::ExternalThreadName );
fclose( f );
}
else
{
GetProfiler().SendString( thread, "???", QueueType::ExternalThreadName );
}
sprintf( fn, "/proc/%" PRIu64 "/status", thread );
f = fopen( fn, "rb" );
if( f )
{
int pid = -1;
size_t lsz = 1024;
auto line = (char*)malloc( lsz );
for(;;)
{
auto rd = getline( &line, &lsz, f );
if( rd <= 0 ) break;
if( memcmp( "Tgid:\t", line, 6 ) == 0 )
{
pid = atoi( line + 6 );
break;
}
}
free( line );
fclose( f );
if( pid >= 0 )
{
{
uint64_t _pid = pid;
Magic magic;
auto token = GetToken();
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin( magic );
MemWrite( &item->hdr.type, QueueType::TidToPid );
MemWrite( &item->tidToPid.tid, thread );
MemWrite( &item->tidToPid.pid, _pid );
tail.store( magic + 1, std::memory_order_release );
}
sprintf( fn, "/proc/%i/comm", pid );
f = fopen( fn, "rb" );
if( f )
{
char buf[256];
const auto sz = fread( buf, 1, 256, f );
if( sz > 0 && buf[sz-1] == '\n' ) buf[sz-1] = '\0';
GetProfiler().SendString( thread, buf, QueueType::ExternalName );
fclose( f );
return;
}
}
}
GetProfiler().SendString( thread, "???", QueueType::ExternalName );
}
}
# endif
#endif
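
Note on HandleTraceLine above: it skips a fixed 24-character prefix and then walks the line with memcmp until it finds each key. The sketch below shows the same field extraction for a sched_switch line, written defensively with strchr/strstr instead of fixed offsets. It is an illustration only, not the code from this commit. SwitchEvent, ReadDigits, SkipPast and ParseSchedSwitch are hypothetical names, and the line layout (`[cpu]`, then `prev_pid=`, `prev_state=`, `next_pid=`) is assumed from the keys the original code searches for.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical result type for this sketch; the commit writes these
// fields into a queue item with MemWrite instead.
struct SwitchEvent
{
    uint8_t cpu;
    uint64_t oldPid;
    uint64_t newPid;
    char oldState;
};

// Consumes decimal digits at ptr and advances it, like ReadNumber above.
static uint64_t ReadDigits( const char*& ptr )
{
    uint64_t val = 0;
    while( *ptr >= '0' && *ptr <= '9' )
    {
        val = val * 10 + uint64_t( *ptr - '0' );
        ptr++;
    }
    return val;
}

// Returns a pointer just past the first occurrence of key, or nullptr if absent.
static const char* SkipPast( const char* line, const char* key )
{
    const char* p = strstr( line, key );
    return p ? p + strlen( key ) : nullptr;
}

// Parses one NUL-terminated sched_switch line; returns false on anything unexpected.
static bool ParseSchedSwitch( const char* line, SwitchEvent& out )
{
    assert( line != nullptr );
    const char* p = strchr( line, '[' );   // assumed: CPU number is the first bracketed field
    if( !p ) return false;
    p++;
    out.cpu = (uint8_t)ReadDigits( p );
    if( !strstr( p, "sched_switch:" ) ) return false;
    const char* prev = SkipPast( p, "prev_pid=" );
    const char* state = SkipPast( p, "prev_state=" );
    const char* next = SkipPast( p, "next_pid=" );
    if( !prev || !state || !next ) return false;
    out.oldPid = ReadDigits( prev );
    out.oldState = *state;
    out.newPid = ReadDigits( next );
    return true;
}
```

Returning false on a missing key trades a little speed for safety: the memcmp loops in the original keep advancing until the key is found, so they rely on every sched_switch line being well formed.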

client/TracySysTrace.hpp (new file, 25 lines)

@@ -0,0 +1,25 @@
#ifndef __TRACYSYSTRACE_HPP__
#define __TRACYSYSTRACE_HPP__
#if !defined TRACY_NO_SYSTEM_TRACING && ( defined _WIN32 || defined __CYGWIN__ || defined __linux__ )
# define TRACY_HAS_SYSTEM_TRACING
#endif
#ifdef TRACY_HAS_SYSTEM_TRACING
#include <stdint.h>
namespace tracy
{
bool SysTraceStart();
void SysTraceStop();
void SysTraceWorker( void* ptr );
void SysTraceSendExternalName( uint64_t thread );
}
#endif
#endif

View File

@@ -0,0 +1,80 @@
// File: '/home/wolf/desktop/tracy_systrace.armv7' (1210 bytes)
// File: '/home/wolf/desktop/tracy_systrace.aarch64' (1650 bytes)
// Exported using binary_to_compressed_c.cpp
namespace tracy
{
static const unsigned int tracy_systrace_armv7_size = 1210;
static const unsigned int tracy_systrace_armv7_data[1212/4] =
{
0x464c457f, 0x00010101, 0x00000000, 0x00000000, 0x00280003, 0x00000001, 0x00000208, 0x00000034, 0x00000000, 0x05000200, 0x00200034, 0x00280007,
0x00000000, 0x00000006, 0x00000034, 0x00000034, 0x00000034, 0x000000e0, 0x000000e0, 0x00000004, 0x00000004, 0x00000003, 0x00000114, 0x00000114,
0x00000114, 0x00000013, 0x00000013, 0x00000004, 0x00000001, 0x00000001, 0x00000000, 0x00000000, 0x00000000, 0x000003ed, 0x000003ed, 0x00000005,
0x00001000, 0x00000001, 0x000003ed, 0x000013ed, 0x000013ed, 0x000000cd, 0x000000cf, 0x00000006, 0x00001000, 0x00000002, 0x000003f0, 0x000013f0,
0x000013f0, 0x000000b8, 0x000000b8, 0x00000006, 0x00000004, 0x6474e551, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000006,
0x00000010, 0x70000001, 0x00000394, 0x00000394, 0x00000394, 0x00000008, 0x00000008, 0x00000004, 0x00000004, 0x7379732f, 0x2f6d6574, 0x2f6e6962,
0x6b6e696c, 0x00007265, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000001, 0x00000000, 0x00000000, 0x00000012, 0x00000016, 0x00000000,
0x00000000, 0x00000012, 0x6f6c6400, 0x006e6570, 0x4342494c, 0x62696c00, 0x732e6c64, 0x6c64006f, 0x006d7973, 0x00000001, 0x00000003, 0x00000001,
0x00000000, 0x00000000, 0x00000000, 0x00000001, 0x00000003, 0x00000002, 0x00000000, 0x00000000, 0x00000001, 0x00020000, 0x00000002, 0x00010001,
0x0000000d, 0x00000010, 0x00000000, 0x00050d63, 0x00020000, 0x00000008, 0x00000000, 0x000014b4, 0x00000116, 0x000014b8, 0x00000216, 0xe52de004,
0xe59fe004, 0xe08fe00e, 0xe5bef008, 0x000012bc, 0xe28fc600, 0xe28cca01, 0xe5bcf2bc, 0xe28fc600, 0xe28cca01, 0xe5bcf2b4, 0xe92d4ff0, 0xe28db01c,
0xe24dd01c, 0xe24dd801, 0xe59f0154, 0xe3a01001, 0xe08f0000, 0xebfffff1, 0xe59f1148, 0xe1a07000, 0xe08f1001, 0xebfffff0, 0xe59f113c, 0xe1a09000,
0xe1a00007, 0xe08f1001, 0xebffffeb, 0xe59f112c, 0xe1a04000, 0xe1a00007, 0xe08f1001, 0xebffffe6, 0xe59f111c, 0xe1a05000, 0xe1a00007, 0xe08f1001,
0xebffffe1, 0xe59f110c, 0xe1a06000, 0xe1a00007, 0xe08f1001, 0xebffffdc, 0xe58d0004, 0xe1a00007, 0xe59f10f4, 0xe08f1001, 0xebffffd7, 0xe1a0a000,
0xe59f00e8, 0xe3a01000, 0xe3a08000, 0xe08f0000, 0xe12fff39, 0xe1a07000, 0xe3700001, 0xca000001, 0xe3a00000, 0xe12fff34, 0xe3a00009, 0xe58d4000,
0xe1cd01b4, 0xe3090680, 0xe3400098, 0xe28d4010, 0xe58d000c, 0xe28d9018, 0xe58d8008, 0xe28d8008, 0xe58d7010, 0xea000003, 0xe1a02000, 0xe3a00001,
0xe1a01009, 0xe12fff3a, 0xe1a00004, 0xe3a01001, 0xe3a02000, 0xe12fff35, 0xe3500000, 0xca000008, 0xe1a00008, 0xe3a01000, 0xe12fff36, 0xe1a00004,
0xe3a01001, 0xe3a02000, 0xe12fff35, 0xe3500001, 0xbafffff6, 0xe59d3004, 0xe1a00007, 0xe1a01009, 0xe3a02801, 0xe12fff33, 0xe3500001, 0xaaffffe5,
0xe59d1000, 0xe3a00000, 0xe12fff31, 0xe24bd01c, 0xe8bd8ff0, 0x00000174, 0x0000016c, 0x0000015d, 0x0000014e, 0x0000013f, 0x00000135, 0x00000126,
0x00000114, 0x7ffffe74, 0x00000001, 0x6362696c, 0x006f732e, 0x6e65706f, 0x69786500, 0x6f700074, 0x6e006c6c, 0x736f6e61, 0x7065656c, 0x61657200,
0x72770064, 0x00657469, 0x7379732f, 0x72656b2f, 0x2f6c656e, 0x75626564, 0x72742f67, 0x6e696361, 0x72742f67, 0x5f656361, 0x65706970, 0x00000000,
0x00000003, 0x000014a8, 0x00000002, 0x00000010, 0x00000017, 0x000001cc, 0x00000014, 0x00000011, 0x00000015, 0x00000000, 0x00000006, 0x00000128,
0x0000000b, 0x00000010, 0x00000005, 0x00000158, 0x0000000a, 0x0000001c, 0x6ffffef5, 0x00000174, 0x00000004, 0x0000018c, 0x00000001, 0x0000000d,
0x0000001e, 0x00000008, 0x6ffffffb, 0x00000001, 0x6ffffff0, 0x000001a4, 0x6ffffffe, 0x000001ac, 0x6fffffff, 0x00000001, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x000001dc, 0x000001dc,
};
static const unsigned int tracy_systrace_aarch64_size = 1650;
static const unsigned int tracy_systrace_aarch64_data[1652/4] =
{
0x464c457f, 0x00010102, 0x00000000, 0x00000000, 0x00b70003, 0x00000001, 0x00000300, 0x00000000, 0x00000040, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00380040, 0x00400006, 0x00000000, 0x00000006, 0x00000005, 0x00000040, 0x00000000, 0x00000040, 0x00000000, 0x00000040, 0x00000000,
0x00000150, 0x00000000, 0x00000150, 0x00000000, 0x00000008, 0x00000000, 0x00000003, 0x00000004, 0x00000190, 0x00000000, 0x00000190, 0x00000000,
0x00000190, 0x00000000, 0x00000015, 0x00000000, 0x00000015, 0x00000000, 0x00000001, 0x00000000, 0x00000001, 0x00000005, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x000004d1, 0x00000000, 0x000004d1, 0x00000000, 0x00010000, 0x00000000, 0x00000001, 0x00000006,
0x000004d8, 0x00000000, 0x000104d8, 0x00000000, 0x000104d8, 0x00000000, 0x0000019a, 0x00000000, 0x000001a0, 0x00000000, 0x00010000, 0x00000000,
0x00000002, 0x00000006, 0x000004d8, 0x00000000, 0x000104d8, 0x00000000, 0x000104d8, 0x00000000, 0x00000170, 0x00000000, 0x00000170, 0x00000000,
0x00000008, 0x00000000, 0x6474e551, 0x00000006, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000010, 0x00000000, 0x7379732f, 0x2f6d6574, 0x2f6e6962, 0x6b6e696c, 0x34367265, 0x00000000, 0x00000001, 0x00000004,
0x00000003, 0x00000000, 0x00000000, 0x00000000, 0x00000002, 0x00000000, 0x00000001, 0x00000001, 0x00000001, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x000a0003, 0x00000300, 0x00000000,
0x00000000, 0x00000000, 0x0000000a, 0x00000012, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000011, 0x00000012, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x62696c00, 0x732e6c64, 0x6c64006f, 0x6e65706f, 0x736c6400, 0x4c006d79, 0x00434249, 0x00000000, 0x00020002, 0x00000000,
0x00010001, 0x00000001, 0x00000010, 0x00000000, 0x00050d63, 0x00020000, 0x00000017, 0x00000000, 0x00010668, 0x00000000, 0x00000402, 0x00000002,
0x00000000, 0x00000000, 0x00010670, 0x00000000, 0x00000402, 0x00000003, 0x00000000, 0x00000000, 0xa9bf7bf0, 0x90000090, 0xf9433211, 0x91198210,
0xd61f0220, 0xd503201f, 0xd503201f, 0xd503201f, 0x90000090, 0xf9433611, 0x9119a210, 0xd61f0220, 0x90000090, 0xf9433a11, 0x9119c210, 0xd61f0220,
0xf81b0ffc, 0xa9015ff8, 0xa90257f6, 0xa9034ff4, 0xa9047bfd, 0x910103fd, 0xd14043ff, 0xd10043ff, 0x90000000, 0x91120000, 0x320003e1, 0x97ffffed,
0x90000001, 0x91122021, 0xaa0003f7, 0x97ffffed, 0x90000001, 0xaa0003f8, 0x91123421, 0xaa1703e0, 0x97ffffe8, 0x90000001, 0xaa0003f3, 0x91124821,
0xaa1703e0, 0x97ffffe3, 0x90000001, 0xaa0003f4, 0x91125c21, 0xaa1703e0, 0x97ffffde, 0x90000001, 0xaa0003f5, 0x91128421, 0xaa1703e0, 0x97ffffd9,
0x90000001, 0xaa0003f6, 0x91129821, 0xaa1703e0, 0x97ffffd4, 0xaa0003f7, 0x90000000, 0x9112b000, 0x2a1f03e1, 0xd63f0300, 0x2a0003f8, 0x36f80060,
0x2a1f03e0, 0xd63f0260, 0x90000008, 0x3dc11d00, 0x52800128, 0xb81c83b8, 0x781cc3a8, 0x3d8003e0, 0x14000005, 0x93407c02, 0x320003e0, 0x910043e1,
0xd63f02e0, 0xd100e3a0, 0x320003e1, 0x2a1f03e2, 0xd63f0280, 0x7100001f, 0x5400014c, 0x910003e0, 0xaa1f03e1, 0xd63f02a0, 0xd100e3a0, 0x320003e1,
0x2a1f03e2, 0xd63f0280, 0x7100041f, 0x54ffff0b, 0x910043e1, 0x321003e2, 0x2a1803e0, 0xd63f02c0, 0x7100041f, 0x54fffd0a, 0x2a1f03e0, 0xd63f0260,
0x914043ff, 0x910043ff, 0xa9447bfd, 0xa9434ff4, 0xa94257f6, 0xa9415ff8, 0xf84507fc, 0xd65f03c0, 0x00000000, 0x00000000, 0x00989680, 0x00000000,
0x6362696c, 0x006f732e, 0x6e65706f, 0x69786500, 0x6f700074, 0x6e006c6c, 0x736f6e61, 0x7065656c, 0x61657200, 0x72770064, 0x00657469, 0x7379732f,
0x72656b2f, 0x2f6c656e, 0x75626564, 0x72742f67, 0x6e696361, 0x72742f67, 0x5f656361, 0x65706970, 0x00000000, 0x00000000, 0x00000001, 0x00000000,
0x00000001, 0x00000000, 0x00000004, 0x00000000, 0x000001a8, 0x00000000, 0x6ffffef5, 0x00000000, 0x000001c8, 0x00000000, 0x00000005, 0x00000000,
0x00000248, 0x00000000, 0x00000006, 0x00000000, 0x000001e8, 0x00000000, 0x0000000a, 0x00000000, 0x0000001c, 0x00000000, 0x0000000b, 0x00000000,
0x00000018, 0x00000000, 0x00000015, 0x00000000, 0x00000000, 0x00000000, 0x00000003, 0x00000000, 0x00010650, 0x00000000, 0x00000002, 0x00000000,
0x00000030, 0x00000000, 0x00000014, 0x00000000, 0x00000007, 0x00000000, 0x00000017, 0x00000000, 0x00000290, 0x00000000, 0x0000001e, 0x00000000,
0x00000008, 0x00000000, 0x6ffffffb, 0x00000000, 0x00000001, 0x00000000, 0x6ffffffe, 0x00000000, 0x00000270, 0x00000000, 0x6fffffff, 0x00000000,
0x00000001, 0x00000000, 0x6ffffff0, 0x00000000, 0x00000264, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x000104d8, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x000002c0, 0x00000000, 0x000002c0,
};
}
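
Note on the payload arrays above: each blob is stored as 32-bit words, so the array length is the byte size rounded up to a whole word (1210 bytes in a `[1212/4]` array, 1650 bytes in a `[1652/4]` array), and the first word 0x464c457f is the ELF magic as read on a little-endian host. The following is a small sketch of both facts. WordsForBytes and StartsWithElfMagic are hypothetical helpers, not part of this commit.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Number of 32-bit words needed to hold `bytes` bytes, rounding up.
constexpr size_t WordsForBytes( size_t bytes ) { return ( bytes + 3 ) / 4; }

static_assert( WordsForBytes( 1210 ) == 1212 / 4, "armv7 payload occupies 303 words" );
static_assert( WordsForBytes( 1650 ) == 1652 / 4, "aarch64 payload occupies 413 words" );

// Checks whether a word array starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
// The words are compared as raw bytes, so the result assumes the host stores
// the first word in little-endian order, as the ARM targets of the payload do.
static bool StartsWithElfMagic( const unsigned int* words, size_t sizeInBytes )
{
    assert( words != nullptr );
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    if( sizeInBytes < sizeof( magic ) ) return false;
    return memcmp( words, magic, sizeof( magic ) ) == 0;
}
```

The up-to-three padding bytes at the end of each array are never written by SysTraceInjectPayload, because the write calls pass the exact byte size rather than the array size.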

View File

@@ -1,7 +1,7 @@
#ifndef __TRACYTHREAD_HPP__
#define __TRACYTHREAD_HPP__
#ifdef _MSC_VER
#if defined _WIN32 || defined __CYGWIN__
# include <windows.h>
#else
# include <pthread.h>
@@ -10,7 +10,7 @@
namespace tracy
{
#ifdef _MSC_VER
#if defined _WIN32 || defined __CYGWIN__
class Thread
{

File diff suppressed because it is too large (three files)

View File

@@ -13,18 +13,23 @@
#include <stddef.h>
#include "../common/TracyApi.h"
namespace tracy
{
#if defined(__clang__) || defined(__GNUC__)
# define RPMALLOC_ATTRIBUTE __attribute__((__malloc__))
# define RPMALLOC_CALL
# define RPMALLOC_RESTRICT
# define RPMALLOC_CDECL
#elif defined(_MSC_VER)
# define RPMALLOC_ATTRIBUTE
# define RPMALLOC_CALL __declspec(restrict)
# define RPMALLOC_RESTRICT __declspec(restrict)
# define RPMALLOC_CDECL __cdecl
#else
# define RPMALLOC_ATTRIBUTE
# define RPMALLOC_CALL
# define RPMALLOC_RESTRICT
# define RPMALLOC_CDECL
#endif
//! Flag to rpaligned_realloc to not preserve content in reallocation
@@ -35,8 +40,6 @@ typedef struct rpmalloc_global_statistics_t {
size_t mapped;
//! Current amount of memory in global caches for small and medium sizes (<64KiB)
size_t cached;
//! Current amount of memory in global caches for large sizes (>=64KiB)
size_t cached_large;
//! Total amount of memory mapped (only if ENABLE_STATISTICS=1)
size_t mapped_total;
//! Total amount of memory unmapped (only if ENABLE_STATISTICS=1)
@@ -44,10 +47,6 @@ typedef struct rpmalloc_global_statistics_t {
} rpmalloc_global_statistics_t;
typedef struct rpmalloc_thread_statistics_t {
//! Amount of memory currently requested in allocations (only if ENABLE_STATISTICS=1)
size_t requested;
//! Amount of memory actually allocated in memory blocks (only if ENABLE_STATISTICS=1)
size_t allocated;
//! Current number of bytes available for allocation from active spans
size_t active;
//! Current number of bytes available in thread size class caches
@@ -62,13 +61,51 @@ typedef struct rpmalloc_thread_statistics_t {
size_t global_to_thread;
} rpmalloc_thread_statistics_t;
typedef struct rpmalloc_config_t {
//! Map memory pages for the given number of bytes. The returned address MUST be
// aligned to the rpmalloc span size, which will always be a power of two.
// Optionally the function can store an alignment offset in the offset variable
// in case it performs alignment and the returned pointer is offset from the
// actual start of the memory region due to this alignment. The alignment offset
// will be passed to the memory unmap function. The alignment offset MUST NOT be
// larger than 65535 (storable in an uint16_t), if it is you must use natural
// alignment to shift it into 16 bits.
void* (*memory_map)(size_t size, size_t* offset);
//! Unmap the memory pages starting at address and spanning the given number of bytes.
// If release is set to 1, the unmap is for an entire span range as returned by
// a previous call to memory_map and that the entire range should be released.
// If release is set to 0, the unmap is a partial decommit of a subset of the mapped
// memory range.
void (*memory_unmap)(void* address, size_t size, size_t offset, int release);
//! Size of memory pages. The page size MUST be a power of two in [512,16384] range
// (2^9 to 2^14) unless 0 - set to 0 to use system page size. All memory mapping
// requests to memory_map will be made with size set to a multiple of the page size.
size_t page_size;
//! Size of a span of memory pages. MUST be a multiple of page size, and in [4096,262144]
// range (unless 0 - set to 0 to use the default span size).
size_t span_size;
//! Number of spans to map at each request to map new virtual memory blocks. This can
// be used to minimize the system call overhead at the cost of virtual memory address
// space. The extra mapped pages will not be written until actually used, so physical
// committed memory should not be affected in the default implementation.
size_t span_map_count;
//! Debug callback if memory guards are enabled. Called if a memory overwrite is detected
void (*memory_overwrite)(void* address);
} rpmalloc_config_t;
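
Note on the size constraints documented in rpmalloc_config_t above: the comments require page_size to be a power of two in [512,16384] and span_size to be a multiple of the page size in [4096,262144], with 0 meaning "use the default" in both cases. A caller could check its own values before passing them in, roughly as sketched here. IsPowerOfTwo, IsValidPageSize and IsValidSpanSize are hypothetical helpers and do not exist in rpmalloc.

```cpp
#include <assert.h>
#include <stddef.h>

// True for 1, 2, 4, 8, ...; false for 0 and for values with more than one bit set.
static bool IsPowerOfTwo( size_t v ) { return v != 0 && ( v & ( v - 1 ) ) == 0; }

// page_size: 0 selects the system page size, otherwise a power of two in [512,16384].
static bool IsValidPageSize( size_t page_size )
{
    if( page_size == 0 ) return true;
    return IsPowerOfTwo( page_size ) && page_size >= 512 && page_size <= 16384;
}

// span_size: 0 selects the default span size, otherwise a multiple of the
// page size in [4096,262144].
static bool IsValidSpanSize( size_t span_size, size_t page_size )
{
    if( span_size == 0 ) return true;
    if( span_size < 4096 || span_size > 262144 ) return false;
    // With page_size == 0 the system page size is not known to this sketch,
    // so only the range is checked in that case.
    return page_size == 0 || span_size % page_size == 0;
}
```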
extern int
rpmalloc_initialize(void);
extern int
rpmalloc_initialize_config(const rpmalloc_config_t* config);
extern const rpmalloc_config_t*
rpmalloc_config(void);
extern void
rpmalloc_finalize(void);
extern void
void
rpmalloc_thread_initialize(void);
extern void
@@ -86,13 +123,13 @@ rpmalloc_thread_statistics(rpmalloc_thread_statistics_t* stats);
extern void
rpmalloc_global_statistics(rpmalloc_global_statistics_t* stats);
extern RPMALLOC_CALL void*
TRACY_API RPMALLOC_RESTRICT void*
rpmalloc(size_t size) RPMALLOC_ATTRIBUTE;
extern void
TRACY_API void
rpfree(void* ptr);
extern RPMALLOC_CALL void*
extern RPMALLOC_RESTRICT void*
rpcalloc(size_t num, size_t size) RPMALLOC_ATTRIBUTE;
extern void*
@@ -101,10 +138,10 @@ rprealloc(void* ptr, size_t size);
extern void*
rpaligned_realloc(void* ptr, size_t alignment, size_t size, size_t oldsize, unsigned int flags);
extern RPMALLOC_CALL void*
extern RPMALLOC_RESTRICT void*
rpaligned_alloc(size_t alignment, size_t size) RPMALLOC_ATTRIBUTE;
extern RPMALLOC_CALL void*
extern RPMALLOC_RESTRICT void*
rpmemalign(size_t alignment, size_t size) RPMALLOC_ATTRIBUTE;
extern int

common/TracyAlign.hpp (new file, 27 lines)

@@ -0,0 +1,27 @@
#ifndef __TRACYALIGN_HPP__
#define __TRACYALIGN_HPP__
#include <string.h>
#include "TracyForceInline.hpp"
namespace tracy
{
template<typename T>
tracy_force_inline T MemRead( const void* ptr )
{
T val;
memcpy( &val, ptr, sizeof( T ) );
return val;
}
template<typename T>
tracy_force_inline void MemWrite( void* ptr, T val )
{
memcpy( ptr, &val, sizeof( T ) );
}
}
#endif
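
Note on MemRead/MemWrite above: both go through memcpy, so they can access a value at any byte offset, which matters for the `#pragma pack( 1 )` queue structures where 64-bit fields may follow a one-byte header. The sketch below repeats the two bodies under different names (MemReadSketch and MemWriteSketch, without tracy_force_inline) and round-trips a 64-bit value stored at an odd offset. RoundTripUnaligned is a hypothetical helper added for illustration.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Same body as MemRead above, minus the force-inline attribute.
template<typename T>
static T MemReadSketch( const void* ptr )
{
    T val;
    memcpy( &val, ptr, sizeof( T ) );
    return val;
}

// Same body as MemWrite above, minus the force-inline attribute.
template<typename T>
static void MemWriteSketch( void* ptr, T val )
{
    memcpy( ptr, &val, sizeof( T ) );
}

// Writes a 1-byte tag followed by a 64-bit value, so the value starts at
// offset 1, then reads both back. Dereferencing (uint64_t*)( buf + 1 )
// directly is not guaranteed to work on every target; memcpy is.
static uint64_t RoundTripUnaligned( uint64_t value )
{
    unsigned char buf[1 + sizeof( uint64_t )] = {};
    MemWriteSketch<uint8_t>( buf, 7 );
    MemWriteSketch( buf + 1, value );
    assert( MemReadSketch<uint8_t>( buf ) == 7 );
    return MemReadSketch<uint64_t>( buf + 1 );
}
```

Compilers typically lower a fixed-size memcpy like this to a plain load or store on targets that allow unaligned access, so the helper should not cost more than a direct dereference there.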

View File

@@ -1,6 +1,8 @@
#ifndef __TRACYALLOC_HPP__
#define __TRACYALLOC_HPP__
#include <stdlib.h>
#ifdef TRACY_ENABLE
# include "../client/tracy_rpmalloc.hpp"
#endif

common/TracyApi.h (new file, 14 lines)

@@ -0,0 +1,14 @@
#ifndef __TRACYAPI_H__
#define __TRACYAPI_H__
#ifdef _WIN32
# if defined TRACY_IMPORTS
# define TRACY_API __declspec(dllimport)
# else
# define TRACY_API __declspec(dllexport)
# endif
#else
# define TRACY_API __attribute__((visibility("default")))
#endif
#endif // __TRACYAPI_H__

View File

@@ -2,7 +2,7 @@
#define __TRACYFORCEINLINE_HPP__
#if defined(__GNUC__)
# define tracy_force_inline __attribute__((always_inline))
# define tracy_force_inline __attribute__((always_inline)) inline
#elif defined(_MSC_VER)
# define tracy_force_inline __forceinline
#else

common/TracyMutex.hpp (new file, 33 lines)

@@ -0,0 +1,33 @@
#ifndef __TRACYMUTEX_HPP__
#define __TRACYMUTEX_HPP__
#if defined _MSC_VER
# include <shared_mutex>
namespace tracy
{
using TracyMutex = std::shared_mutex;
}
#elif defined __CYGWIN__
#include "tracy_benaphore.h"
namespace tracy
{
using TracyMutex = NonRecursiveBenaphore;
}
#else
#include <mutex>
namespace tracy
{
using TracyMutex = std::mutex;
}
#endif
#endif

View File

@@ -4,18 +4,38 @@
#include <limits>
#include <stdint.h>
#include "../common/tracy_lz4.hpp"
namespace tracy
{
constexpr unsigned Lz4CompressBound( unsigned isize ) { return isize + ( isize / 255 ) + 16; }
enum : uint32_t { ProtocolVersion = 23 };
enum : uint32_t { BroadcastVersion = 0 };
using lz4sz_t = uint32_t;
enum { TargetFrameSize = 256 * 1024 };
enum { LZ4Size = LZ4_COMPRESSBOUND( TargetFrameSize ) };
enum { LZ4Size = Lz4CompressBound( TargetFrameSize ) };
static_assert( LZ4Size <= std::numeric_limits<lz4sz_t>::max(), "LZ4Size greater than lz4sz_t" );
static_assert( TargetFrameSize * 2 >= 64 * 1024, "Not enough space for LZ4 stream buffer" );
enum { HandshakeShibbolethSize = 8 };
static const char HandshakeShibboleth[HandshakeShibbolethSize] = { 'T', 'r', 'a', 'c', 'y', 'P', 'r', 'f' };
enum HandshakeStatus : uint8_t
{
HandshakePending,
HandshakeWelcome,
HandshakeProtocolMismatch,
HandshakeNotAvailable,
HandshakeDropped
};
enum { WelcomeMessageProgramNameSize = 64 };
enum { WelcomeMessageHostInfoSize = 1024 };
#pragma pack( 1 )
enum ServerQuery : uint8_t
{
ServerQueryTerminate,
@@ -23,25 +43,60 @@ enum ServerQuery : uint8_t
ServerQueryThreadString,
ServerQuerySourceLocation,
ServerQueryPlotName,
ServerQueryCallstackFrame,
ServerQueryFrameName,
ServerQueryDisconnect,
ServerQueryExternalName
};
enum { WelcomeMessageProgramNameSize = 64 };
struct ServerQueryPacket
{
ServerQuery type;
uint64_t ptr;
};
enum { ServerQueryPacketSize = sizeof( ServerQueryPacket ) };
#pragma pack( 1 )
struct WelcomeMessage
{
double timerMul;
uint64_t initBegin;
uint64_t initEnd;
int64_t initBegin;
int64_t initEnd;
uint64_t delay;
uint64_t resolution;
uint64_t epoch;
uint64_t pid;
uint8_t onDemand;
uint8_t isApple;
char programName[WelcomeMessageProgramNameSize];
char hostInfo[WelcomeMessageHostInfoSize];
};
#pragma pack()
enum { WelcomeMessageSize = sizeof( WelcomeMessage ) };
struct OnDemandPayloadMessage
{
uint64_t frames;
uint64_t currentTime;
};
enum { OnDemandPayloadMessageSize = sizeof( OnDemandPayloadMessage ) };
struct BroadcastMessage
{
uint32_t broadcastVersion;
uint32_t protocolVersion;
uint32_t activeTime; // in seconds
char programName[WelcomeMessageProgramNameSize];
};
enum { BroadcastMessageSize = sizeof( BroadcastMessage ) };
#pragma pack()
}
#endif
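
Note on Lz4CompressBound above: the hunk replaces the LZ4_COMPRESSBOUND macro with a constexpr function, which lets the protocol header compute LZ4Size at compile time without including the LZ4 header. A worked instance of the arithmetic for the 256 KiB target frame follows. The names carry a Sketch suffix so the block stands alone; they are not identifiers from this commit.

```cpp
#include <assert.h>
#include <limits>
#include <stdint.h>

// Same formula as Lz4CompressBound above.
constexpr unsigned CompressBoundSketch( unsigned isize ) { return isize + ( isize / 255 ) + 16; }

enum { TargetFrameSizeSketch = 256 * 1024 };
enum { LZ4SizeSketch = CompressBoundSketch( TargetFrameSizeSketch ) };

// 262144 + 262144 / 255 + 16 = 262144 + 1028 + 16 = 263188
static_assert( LZ4SizeSketch == 263188, "worst-case compressed size of one 256 KiB frame" );

// Mirrors the static_assert in the header: the bound must fit in the 32-bit
// size prefix (lz4sz_t) sent ahead of each compressed frame.
static_assert( LZ4SizeSketch <= std::numeric_limits<uint32_t>::max(), "bound fits in uint32_t" );
```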

View File

@@ -9,51 +9,99 @@ namespace tracy
enum class QueueType : uint8_t
{
ZoneText,
ZoneName,
Message,
MessageColor,
MessageCallstack,
MessageColorCallstack,
MessageAppInfo,
ZoneBeginAllocSrcLoc,
Terminate,
ZoneBeginAllocSrcLocCallstack,
CallstackMemory,
Callstack,
CallstackAlloc,
FrameImage,
ZoneBegin,
ZoneBeginCallstack,
ZoneEnd,
FrameMarkMsg,
SourceLocation,
LockAnnounce,
LockWait,
LockObtain,
LockRelease,
LockSharedWait,
LockSharedObtain,
LockSharedRelease,
LockMark,
PlotData,
MessageLiteral,
GpuNewContext,
MemAlloc,
MemFree,
MemAllocCallstack,
MemFreeCallstack,
GpuZoneBegin,
GpuZoneBeginCallstack,
GpuZoneEnd,
GpuZoneBeginSerial,
GpuZoneBeginCallstackSerial,
GpuZoneEndSerial,
PlotData,
ContextSwitch,
ThreadWakeup,
GpuTime,
GpuResync,
Terminate,
KeepAlive,
ThreadContext,
Crash,
CrashReport,
ZoneValidation,
FrameMarkMsg,
FrameMarkMsgStart,
FrameMarkMsgEnd,
SourceLocation,
LockAnnounce,
LockTerminate,
LockMark,
MessageLiteral,
MessageLiteralColor,
MessageLiteralCallstack,
MessageLiteralColorCallstack,
GpuNewContext,
CallstackFrameSize,
CallstackFrame,
SysTimeReport,
TidToPid,
PlotConfig,
StringData,
ThreadName,
CustomStringData,
PlotName,
SourceLocationPayload,
CallstackPayload,
CallstackAllocPayload,
FrameName,
FrameImageData,
ExternalName,
ExternalThreadName,
NUM_TYPES
};
#pragma pack( 1 )
struct QueueThreadContext
{
uint64_t thread;
};
struct QueueZoneBegin
{
int64_t time;
uint64_t thread;
uint64_t srcloc; // ptr
uint32_t cpu;
};
struct QueueZoneEnd
{
int64_t time;
uint64_t thread;
uint32_t cpu;
};
struct QueueZoneValidation
{
uint32_t id;
};
struct QueueStringTransfer
@@ -64,6 +112,16 @@ struct QueueStringTransfer
struct QueueFrameMark
{
int64_t time;
uint64_t name; // ptr
};
struct QueueFrameImage
{
uint64_t image; // ptr
uint64_t frame;
uint16_t w;
uint16_t h;
uint8_t flip;
};
struct QueueSourceLocation
@@ -79,7 +137,6 @@ struct QueueSourceLocation
struct QueueZoneText
{
uint64_t thread;
uint64_t text; // ptr
};
@@ -92,36 +149,44 @@ enum class LockType : uint8_t
struct QueueLockAnnounce
{
uint32_t id;
int64_t time;
uint64_t lckloc; // ptr
LockType type;
};
struct QueueLockTerminate
{
uint32_t id;
int64_t time;
LockType type;
};
struct QueueLockWait
{
uint64_t thread;
uint32_t id;
int64_t time;
uint64_t thread;
LockType type;
};
struct QueueLockObtain
{
uint64_t thread;
uint32_t id;
int64_t time;
uint64_t thread;
};
struct QueueLockRelease
{
uint64_t thread;
uint32_t id;
int64_t time;
uint64_t thread;
};
struct QueueLockMark
{
uint32_t id;
uint64_t thread;
uint32_t id;
uint64_t srcloc; // ptr
};
@@ -148,16 +213,23 @@ struct QueuePlotData
struct QueueMessage
{
int64_t time;
uint64_t thread;
uint64_t text; // ptr
};
struct QueueMessageColor : public QueueMessage
{
uint8_t r;
uint8_t g;
uint8_t b;
};
struct QueueGpuNewContext
{
int64_t cpuTime;
int64_t gpuTime;
uint64_t thread;
uint16_t context;
float period;
uint8_t context;
uint8_t accuracyBits;
};
@@ -165,26 +237,115 @@ struct QueueGpuZoneBegin
{
int64_t cpuTime;
uint64_t srcloc;
uint16_t context;
uint64_t thread;
uint16_t queryId;
uint8_t context;
};
struct QueueGpuZoneEnd
{
int64_t cpuTime;
uint16_t context;
uint64_t thread;
uint16_t queryId;
uint8_t context;
};
struct QueueGpuTime
{
int64_t gpuTime;
uint16_t context;
uint16_t queryId;
uint8_t context;
};
struct QueueGpuResync
struct QueueMemAlloc
{
int64_t cpuTime;
int64_t gpuTime;
uint16_t context;
int64_t time;
uint64_t thread;
uint64_t ptr;
char size[6];
};
struct QueueMemFree
{
int64_t time;
uint64_t thread;
uint64_t ptr;
};
struct QueueCallstackMemory
{
uint64_t ptr;
};
struct QueueCallstack
{
uint64_t ptr;
};
struct QueueCallstackAlloc
{
uint64_t ptr;
uint64_t nativePtr;
};
struct QueueCallstackFrameSize
{
uint64_t ptr;
uint8_t size;
};
struct QueueCallstackFrame
{
uint64_t name;
uint64_t file;
uint32_t line;
};
struct QueueCrashReport
{
int64_t time;
uint64_t text; // ptr
};
struct QueueSysTime
{
int64_t time;
float sysTime;
};
struct QueueContextSwitch
{
int64_t time;
uint64_t oldThread;
uint64_t newThread;
uint8_t cpu;
uint8_t reason;
uint8_t state;
};
struct QueueThreadWakeup
{
int64_t time;
uint64_t thread;
};
struct QueueTidToPid
{
uint64_t tid;
uint64_t pid;
};
enum class PlotFormatType : uint8_t
{
Number,
Memory,
Percentage
};
struct QueuePlotConfig
{
uint64_t name; // ptr
uint8_t type;
};
struct QueueHeader
@@ -201,67 +362,127 @@ struct QueueItem
QueueHeader hdr;
union
{
QueueThreadContext threadCtx;
QueueZoneBegin zoneBegin;
QueueZoneEnd zoneEnd;
QueueZoneValidation zoneValidation;
QueueStringTransfer stringTransfer;
QueueFrameMark frameMark;
QueueFrameImage frameImage;
QueueSourceLocation srcloc;
QueueZoneText zoneText;
QueueLockAnnounce lockAnnounce;
QueueLockTerminate lockTerminate;
QueueLockWait lockWait;
QueueLockObtain lockObtain;
QueueLockRelease lockRelease;
QueueLockMark lockMark;
QueuePlotData plotData;
QueueMessage message;
QueueMessageColor messageColor;
QueueGpuNewContext gpuNewContext;
QueueGpuZoneBegin gpuZoneBegin;
QueueGpuZoneEnd gpuZoneEnd;
QueueGpuTime gpuTime;
QueueGpuResync gpuResync;
QueueMemAlloc memAlloc;
QueueMemFree memFree;
QueueCallstackMemory callstackMemory;
QueueCallstack callstack;
QueueCallstackAlloc callstackAlloc;
QueueCallstackFrameSize callstackFrameSize;
QueueCallstackFrame callstackFrame;
QueueCrashReport crashReport;
QueueSysTime sysTime;
QueueContextSwitch contextSwitch;
QueueThreadWakeup threadWakeup;
QueueTidToPid tidToPid;
QueuePlotConfig plotConfig;
};
};
#pragma pack()
enum { QueueItemSize = sizeof( QueueItem ) };
static const size_t QueueDataSize[] = {
sizeof( QueueHeader ) + sizeof( QueueZoneText ),
sizeof( QueueHeader ) + sizeof( QueueZoneText ), // zone name
sizeof( QueueHeader ) + sizeof( QueueMessage ),
sizeof( QueueHeader ) + sizeof( QueueMessageColor ),
sizeof( QueueHeader ) + sizeof( QueueMessage ), // callstack
sizeof( QueueHeader ) + sizeof( QueueMessageColor ), // callstack
sizeof( QueueHeader ) + sizeof( QueueMessage ), // app info
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ), // allocated source location
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ), // allocated source location, callstack
sizeof( QueueHeader ) + sizeof( QueueCallstackMemory ),
sizeof( QueueHeader ) + sizeof( QueueCallstack ),
sizeof( QueueHeader ) + sizeof( QueueCallstackAlloc ),
sizeof( QueueHeader ) + sizeof( QueueFrameImage ),
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ),
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ), // callstack
sizeof( QueueHeader ) + sizeof( QueueZoneEnd ),
sizeof( QueueHeader ) + sizeof( QueueLockWait ),
sizeof( QueueHeader ) + sizeof( QueueLockObtain ),
sizeof( QueueHeader ) + sizeof( QueueLockRelease ),
sizeof( QueueHeader ) + sizeof( QueueLockWait ), // shared
sizeof( QueueHeader ) + sizeof( QueueLockObtain ), // shared
sizeof( QueueHeader ) + sizeof( QueueLockRelease ), // shared
sizeof( QueueHeader ) + sizeof( QueueMemAlloc ),
sizeof( QueueHeader ) + sizeof( QueueMemFree ),
sizeof( QueueHeader ) + sizeof( QueueMemAlloc ), // callstack
sizeof( QueueHeader ) + sizeof( QueueMemFree ), // callstack
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ),
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ), // callstack
sizeof( QueueHeader ) + sizeof( QueueGpuZoneEnd ),
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ), // serial
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ), // serial, callstack
sizeof( QueueHeader ) + sizeof( QueueGpuZoneEnd ), // serial
sizeof( QueueHeader ) + sizeof( QueuePlotData ),
sizeof( QueueHeader ) + sizeof( QueueContextSwitch ),
sizeof( QueueHeader ) + sizeof( QueueThreadWakeup ),
sizeof( QueueHeader ) + sizeof( QueueGpuTime ),
// above items must be first
sizeof( QueueHeader ), // terminate
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ),
sizeof( QueueHeader ) + sizeof( QueueZoneEnd ),
sizeof( QueueHeader ) + sizeof( QueueFrameMark ),
sizeof( QueueHeader ), // keep alive
sizeof( QueueHeader ) + sizeof( QueueThreadContext ),
sizeof( QueueHeader ), // crash
sizeof( QueueHeader ) + sizeof( QueueCrashReport ),
sizeof( QueueHeader ) + sizeof( QueueZoneValidation ),
sizeof( QueueHeader ) + sizeof( QueueFrameMark ), // continuous frames
sizeof( QueueHeader ) + sizeof( QueueFrameMark ), // start
sizeof( QueueHeader ) + sizeof( QueueFrameMark ), // end
sizeof( QueueHeader ) + sizeof( QueueSourceLocation ),
sizeof( QueueHeader ) + sizeof( QueueLockAnnounce ),
sizeof( QueueHeader ) + sizeof( QueueLockWait ),
sizeof( QueueHeader ) + sizeof( QueueLockObtain ),
sizeof( QueueHeader ) + sizeof( QueueLockRelease ),
sizeof( QueueHeader ) + sizeof( QueueLockWait ),
sizeof( QueueHeader ) + sizeof( QueueLockObtain ),
sizeof( QueueHeader ) + sizeof( QueueLockRelease ),
sizeof( QueueHeader ) + sizeof( QueueLockTerminate ),
sizeof( QueueHeader ) + sizeof( QueueLockMark ),
sizeof( QueueHeader ) + sizeof( QueuePlotData ),
sizeof( QueueHeader ) + sizeof( QueueMessage ), // literal
sizeof( QueueHeader ) + sizeof( QueueMessageColor ), // literal
sizeof( QueueHeader ) + sizeof( QueueMessage ), // literal, callstack
sizeof( QueueHeader ) + sizeof( QueueMessageColor ), // literal, callstack
sizeof( QueueHeader ) + sizeof( QueueGpuNewContext ),
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ),
sizeof( QueueHeader ) + sizeof( QueueGpuZoneEnd ),
sizeof( QueueHeader ) + sizeof( QueueGpuTime ),
sizeof( QueueHeader ) + sizeof( QueueGpuResync ),
sizeof( QueueHeader ) + sizeof( QueueCallstackFrameSize ),
sizeof( QueueHeader ) + sizeof( QueueCallstackFrame ),
sizeof( QueueHeader ) + sizeof( QueueSysTime ),
sizeof( QueueHeader ) + sizeof( QueueTidToPid ),
sizeof( QueueHeader ) + sizeof( QueuePlotConfig ),
// keep all QueueStringTransfer below
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // string data
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // thread name
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // custom string data
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // plot name
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // allocated source location payload
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // callstack payload
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // callstack alloc payload
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // frame name
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // frame image data
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // external name
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // external thread name
};
static_assert( QueueItemSize == 32, "Queue item size not 32 bytes" );
static_assert( sizeof( QueueDataSize ) / sizeof( size_t ) == (uint8_t)QueueType::NUM_TYPES, "QueueDataSize mismatch" );
static_assert( sizeof( void* ) <= sizeof( uint64_t ), "Pointer size > 8 bytes" );
static_assert( sizeof( void* ) == sizeof( uintptr_t ), "Pointer size != uintptr_t" );
};
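
The QueueDataSize table above is kept in step with the QueueType enum by a static_assert on the element count. A minimal sketch of that pattern follows; the names MsgType, MsgHeader, MsgBegin, MsgEnd, MsgDataSize and WireSize are hypothetical and are not part of Tracy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical message kinds; NUM_TYPES must stay last so it equals the count.
enum class MsgType : std::uint8_t
{
    Begin,
    End,
    KeepAlive,
    NUM_TYPES
};

// Packed to 1 byte so sizeof() matches the bytes that would go on the wire.
#pragma pack( 1 )
struct MsgHeader { std::uint8_t type; };
struct MsgBegin  { std::int64_t time; std::uint64_t srcloc; };
struct MsgEnd    { std::int64_t time; };
#pragma pack()

// One entry per MsgType value, in declaration order.
static const std::size_t MsgDataSize[] = {
    sizeof( MsgHeader ) + sizeof( MsgBegin ),
    sizeof( MsgHeader ) + sizeof( MsgEnd ),
    sizeof( MsgHeader ),    // keep alive carries no payload
};

// Fails to compile if an enumerator is added without a matching table entry.
static_assert( sizeof( MsgDataSize ) / sizeof( std::size_t ) == (std::uint8_t)MsgType::NUM_TYPES, "MsgDataSize mismatch" );

// Number of bytes to transmit for a message of the given type.
inline std::size_t WireSize( MsgType t )
{
    assert( t < MsgType::NUM_TYPES );
    return MsgDataSize[(std::uint8_t)t];
}
```

The assertion only checks the number of entries, not their order, so the table still has to be edited by hand in the same order as the enum.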


@@ -8,13 +8,23 @@
#include "TracyAlloc.hpp"
#include "TracySocket.hpp"
#ifdef _MSC_VER
#ifdef _WIN32
# ifndef NOMINMAX
# define NOMINMAX
# endif
# include <winsock2.h>
# include <ws2tcpip.h>
# ifdef _MSC_VER
# pragma warning(disable:4244)
# pragma warning(disable:4267)
# endif
# define poll WSAPoll
#else
# include <arpa/inet.h>
# include <sys/socket.h>
# include <netdb.h>
# include <unistd.h>
# include <poll.h>
#endif
#ifndef MSG_NOSIGNAL
@@ -24,7 +34,13 @@
namespace tracy
{
#ifdef _MSC_VER
#ifdef _WIN32
typedef SOCKET socket_t;
#else
typedef int socket_t;
#endif
#ifdef _WIN32
struct __wsinit
{
__wsinit()
@@ -38,35 +54,41 @@ struct __wsinit
}
};
static __wsinit InitWinSock()
void InitWinSock()
{
static __wsinit init;
return init;
}
#endif
Socket::Socket()
: m_sock( -1 )
: m_buf( (char*)tracy_malloc( BufSize ) )
, m_bufPtr( nullptr )
, m_sock( -1 )
, m_bufLeft( 0 )
{
#ifdef _MSC_VER
#ifdef _WIN32
InitWinSock();
#endif
}
Socket::Socket( int sock )
: m_sock( sock )
: m_buf( (char*)tracy_malloc( BufSize ) )
, m_bufPtr( nullptr )
, m_sock( sock )
, m_bufLeft( 0 )
{
}
Socket::~Socket()
{
tracy_free( m_buf );
if( m_sock != -1 )
{
Close();
}
}
bool Socket::Connect( const char* addr, const char* port )
bool Socket::Connect( const char* addr, int port )
{
assert( m_sock == -1 );
@@ -77,18 +99,21 @@ bool Socket::Connect( const char* addr, const char* port )
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
if( getaddrinfo( addr, port, &hints, &res ) != 0 ) return false;
int sock;
char portbuf[32];
sprintf( portbuf, "%i", port );
if( getaddrinfo( addr, portbuf, &hints, &res ) != 0 ) return false;
int sock = 0;
for( ptr = res; ptr; ptr = ptr->ai_next )
{
if( ( sock = socket( ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol ) ) == -1 ) continue;
#if defined __APPLE__
int val = 1;
setsockopt( m_sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
setsockopt( sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
#endif
if( connect( sock, ptr->ai_addr, ptr->ai_addrlen ) == -1 )
{
#ifdef _MSC_VER
#ifdef _WIN32
closesocket( sock );
#else
close( sock );
@@ -107,7 +132,7 @@ bool Socket::Connect( const char* addr, const char* port )
void Socket::Close()
{
assert( m_sock != -1 );
#ifdef _MSC_VER
#ifdef _WIN32
closesocket( m_sock );
#else
close( m_sock );
@@ -130,21 +155,58 @@ int Socket::Send( const void* _buf, int len )
return int( buf - start );
}
int Socket::Recv( void* _buf, int len, const timeval* tv )
int Socket::GetSendBufSize()
{
int bufSize;
#if defined _WIN32 || defined __CYGWIN__
int sz = sizeof( bufSize );
getsockopt( m_sock, SOL_SOCKET, SO_SNDBUF, (char*)&bufSize, &sz );
#else
socklen_t sz = sizeof( bufSize );
getsockopt( m_sock, SOL_SOCKET, SO_SNDBUF, &bufSize, &sz );
#endif
return bufSize;
}
int Socket::RecvBuffered( void* buf, int len, int timeout )
{
if( len <= m_bufLeft )
{
memcpy( buf, m_bufPtr, len );
m_bufPtr += len;
m_bufLeft -= len;
return len;
}
if( m_bufLeft > 0 )
{
memcpy( buf, m_bufPtr, m_bufLeft );
const auto ret = m_bufLeft;
m_bufLeft = 0;
return ret;
}
if( len >= BufSize ) return Recv( buf, len, timeout );
m_bufLeft = Recv( m_buf, BufSize, timeout );
if( m_bufLeft <= 0 ) return m_bufLeft;
const auto sz = len < m_bufLeft ? len : m_bufLeft;
memcpy( buf, m_buf, sz );
m_bufPtr = m_buf + sz;
m_bufLeft -= sz;
return sz;
}
int Socket::Recv( void* _buf, int len, int timeout )
{
auto buf = (char*)_buf;
fd_set fds;
FD_ZERO( &fds );
FD_SET( m_sock, &fds );
struct pollfd fd;
fd.fd = (socket_t)m_sock;
fd.events = POLLIN;
#ifndef _WIN32
timeval _tv = *tv;
select( m_sock+1, &fds, nullptr, nullptr, &_tv );
#else
select( m_sock+1, &fds, nullptr, nullptr, tv );
#endif
if( FD_ISSET( m_sock, &fds ) )
if( poll( &fd, 1, timeout ) > 0 )
{
return recv( m_sock, buf, len, 0 );
}
@@ -154,14 +216,14 @@ int Socket::Recv( void* _buf, int len, const timeval* tv )
}
}
bool Socket::Read( void* _buf, int len, const timeval* tv, bool(*exitCb)() )
bool Socket::Read( void* _buf, int len, int timeout, std::function<bool()> exitCb )
{
auto buf = (char*)_buf;
while( len > 0 )
{
if( exitCb() ) return false;
const auto sz = Recv( buf, len, tv );
const auto sz = RecvBuffered( buf, len, timeout );
switch( sz )
{
case 0:
@@ -184,33 +246,45 @@ bool Socket::Read( void* _buf, int len, const timeval* tv, bool(*exitCb)() )
return true;
}
bool Socket::ReadRaw( void* _buf, int len, int timeout )
{
auto buf = (char*)_buf;
while( len > 0 )
{
const auto sz = Recv( buf, len, timeout );
if( sz <= 0 ) return false;
len -= sz;
buf += sz;
}
return true;
}
bool Socket::HasData()
{
struct timeval tv;
memset( &tv, 0, sizeof( tv ) );
if( m_bufLeft > 0 ) return true;
fd_set fds;
FD_ZERO( &fds );
FD_SET( m_sock, &fds );
struct pollfd fd;
fd.fd = (socket_t)m_sock;
fd.events = POLLIN;
select( m_sock+1, &fds, nullptr, nullptr, &tv );
return FD_ISSET( m_sock, &fds );
return poll( &fd, 1, 0 ) > 0;
}
ListenSocket::ListenSocket()
: m_sock( -1 )
{
#ifdef _MSC_VER
#ifdef _WIN32
InitWinSock();
#endif
}
ListenSocket::~ListenSocket()
{
if( m_sock != -1 ) Close();
}
bool ListenSocket::Listen( const char* port, int backlog )
bool ListenSocket::Listen( int port, int backlog )
{
assert( m_sock == -1 );
@@ -222,15 +296,22 @@ bool ListenSocket::Listen( const char* port, int backlog )
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags = AI_PASSIVE;
if( getaddrinfo( nullptr, port, &hints, &res ) != 0 ) return false;
char portbuf[32];
sprintf( portbuf, "%i", port );
if( getaddrinfo( nullptr, portbuf, &hints, &res ) != 0 ) return false;
m_sock = socket( res->ai_family, res->ai_socktype, res->ai_protocol );
#if defined _MSC_VER || defined __CYGWIN__
#if defined _WIN32 || defined __CYGWIN__
unsigned long val = 0;
setsockopt( m_sock, IPPROTO_IPV6, IPV6_V6ONLY, (const char*)&val, sizeof( val ) );
#else
int val = 1;
setsockopt( m_sock, SOL_SOCKET, SO_REUSEADDR, &val, sizeof( val ) );
#endif
if( bind( m_sock, res->ai_addr, res->ai_addrlen ) == -1 ) return false;
if( listen( m_sock, backlog ) == -1 ) return false;
if( bind( m_sock, res->ai_addr, res->ai_addrlen ) == -1 ) { freeaddrinfo( res ); return false; }
if( listen( m_sock, backlog ) == -1 ) { freeaddrinfo( res ); return false; }
freeaddrinfo( res );
return true;
}
@@ -239,32 +320,23 @@ Socket* ListenSocket::Accept()
struct sockaddr_storage remote;
socklen_t sz = sizeof( remote );
struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = 10000;
struct pollfd fd;
fd.fd = (socket_t)m_sock;
fd.events = POLLIN;
fd_set fds;
FD_ZERO( &fds );
FD_SET( m_sock, &fds );
select( m_sock+1, &fds, nullptr, nullptr, &tv );
if( FD_ISSET( m_sock, &fds ) )
if( poll( &fd, 1, 10 ) > 0 )
{
int sock = accept( m_sock, (sockaddr*)&remote, &sz);
if( sock == -1 ) return nullptr;
#if defined __APPLE__
int val = 1;
setsockopt( sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
#endif
if( sock == -1 )
{
return nullptr;
}
else
{
auto ptr = (Socket*)tracy_malloc( sizeof( Socket ) );
new(ptr) Socket( sock );
return ptr;
}
auto ptr = (Socket*)tracy_malloc( sizeof( Socket ) );
new(ptr) Socket( sock );
return ptr;
}
else
{
@@ -275,7 +347,7 @@ Socket* ListenSocket::Accept()
void ListenSocket::Close()
{
assert( m_sock != -1 );
#ifdef _MSC_VER
#ifdef _WIN32
closesocket( m_sock );
#else
close( m_sock );
@@ -283,4 +355,200 @@ void ListenSocket::Close()
m_sock = -1;
}
UdpBroadcast::UdpBroadcast()
: m_sock( -1 )
{
#ifdef _WIN32
InitWinSock();
#endif
}
UdpBroadcast::~UdpBroadcast()
{
if( m_sock != -1 ) Close();
}
bool UdpBroadcast::Open( const char* addr, int port )
{
assert( m_sock == -1 );
struct addrinfo hints;
struct addrinfo *res, *ptr;
memset( &hints, 0, sizeof( hints ) );
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_DGRAM;
char portbuf[32];
sprintf( portbuf, "%i", port );
if( getaddrinfo( addr, portbuf, &hints, &res ) != 0 ) return false;
int sock = 0;
for( ptr = res; ptr; ptr = ptr->ai_next )
{
if( ( sock = socket( ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol ) ) == -1 ) continue;
#if defined __APPLE__
int val = 1;
setsockopt( sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
#endif
#if defined _WIN32 || defined __CYGWIN__
unsigned long broadcast = 1;
if( setsockopt( sock, SOL_SOCKET, SO_BROADCAST, (const char*)&broadcast, sizeof( broadcast ) ) == -1 )
#else
int broadcast = 1;
if( setsockopt( sock, SOL_SOCKET, SO_BROADCAST, &broadcast, sizeof( broadcast ) ) == -1 )
#endif
{
#ifdef _WIN32
closesocket( sock );
#else
close( sock );
#endif
continue;
}
break;
}
freeaddrinfo( res );
if( !ptr ) return false;
m_sock = sock;
return true;
}
void UdpBroadcast::Close()
{
assert( m_sock != -1 );
#ifdef _WIN32
closesocket( m_sock );
#else
close( m_sock );
#endif
m_sock = -1;
}
int UdpBroadcast::Send( int port, const void* data, int len )
{
assert( m_sock != -1 );
struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_port = htons( port );
addr.sin_addr.s_addr = INADDR_BROADCAST;
return sendto( m_sock, (const char*)data, len, MSG_NOSIGNAL, (sockaddr*)&addr, sizeof( addr ) );
}
IpAddress::IpAddress()
: m_number( 0 )
{
*m_text = '\0';
}
IpAddress::~IpAddress()
{
}
void IpAddress::Set( const struct sockaddr& addr )
{
#if __MINGW32__
auto ai = (struct sockaddr_in*)&addr;
#else
auto ai = (const struct sockaddr_in*)&addr;
#endif
inet_ntop( AF_INET, &ai->sin_addr, m_text, 17 );
m_number = ai->sin_addr.s_addr;
}
UdpListen::UdpListen()
: m_sock( -1 )
{
#ifdef _WIN32
InitWinSock();
#endif
}
UdpListen::~UdpListen()
{
if( m_sock != -1 ) Close();
}
bool UdpListen::Listen( int port )
{
assert( m_sock == -1 );
int sock;
if( ( sock = socket( AF_INET, SOCK_DGRAM, 0 ) ) == -1 ) return false;
#if defined __APPLE__
int val = 1;
setsockopt( sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
#endif
#if defined _WIN32 || defined __CYGWIN__
unsigned long reuse = 1;
setsockopt( sock, SOL_SOCKET, SO_REUSEADDR, (const char*)&reuse, sizeof( reuse ) );
#else
int reuse = 1;
setsockopt( sock, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof( reuse ) );
#endif
#if defined _WIN32 || defined __CYGWIN__
unsigned long broadcast = 1;
if( setsockopt( sock, SOL_SOCKET, SO_BROADCAST, (const char*)&broadcast, sizeof( broadcast ) ) == -1 )
#else
int broadcast = 1;
if( setsockopt( sock, SOL_SOCKET, SO_BROADCAST, &broadcast, sizeof( broadcast ) ) == -1 )
#endif
{
#ifdef _WIN32
closesocket( sock );
#else
close( sock );
#endif
return false;
}
struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_port = htons( port );
addr.sin_addr.s_addr = INADDR_ANY;
if( bind( sock, (sockaddr*)&addr, sizeof( addr ) ) == -1 )
{
#ifdef _WIN32
closesocket( sock );
#else
close( sock );
#endif
return false;
}
m_sock = sock;
return true;
}
void UdpListen::Close()
{
assert( m_sock != -1 );
#ifdef _WIN32
closesocket( m_sock );
#else
close( m_sock );
#endif
m_sock = -1;
}
const char* UdpListen::Read( size_t& len, IpAddress& addr )
{
static char buf[2048];
struct pollfd fd;
fd.fd = (socket_t)m_sock;
fd.events = POLLIN;
if( poll( &fd, 1, 10 ) <= 0 ) return nullptr;
sockaddr sa;
socklen_t salen = sizeof( struct sockaddr );
len = (size_t)recvfrom( m_sock, buf, 2048, 0, &sa, &salen );
addr.Set( sa );
return buf;
}
}
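
Socket::RecvBuffered above serves small reads from an internal buffer and only goes to the socket when that buffer is empty. The sketch below reproduces the same control flow against an in-memory source so it can be run without a network. FakeSource and BufferedReader are hypothetical names, and the 16-byte buffer is chosen only to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a socket: returns up to `len` bytes per call.
struct FakeSource
{
    std::string data;
    std::size_t pos = 0;
    int calls = 0;      // how many times the underlying source was hit

    int Recv( char* dst, int len )
    {
        ++calls;
        const int left = int( data.size() - pos );
        const int n = len < left ? len : left;
        std::memcpy( dst, data.data() + pos, n );
        pos += n;
        return n;       // 0 signals end of stream
    }
};

class BufferedReader
{
    enum { BufSize = 16 };

public:
    explicit BufferedReader( FakeSource& src )
        : m_src( src ), m_bufPtr( m_buf ), m_bufLeft( 0 ) {}

    int Read( char* dst, int len )
    {
        // The whole request fits in what is already buffered.
        if( len <= m_bufLeft )
        {
            std::memcpy( dst, m_bufPtr, len );
            m_bufPtr += len;
            m_bufLeft -= len;
            return len;
        }
        // Hand out the buffered remainder; the caller loops for the rest.
        if( m_bufLeft > 0 )
        {
            std::memcpy( dst, m_bufPtr, m_bufLeft );
            const int ret = m_bufLeft;
            m_bufLeft = 0;
            return ret;
        }
        // Large reads bypass the buffer entirely.
        if( len >= BufSize ) return m_src.Recv( dst, len );
        // Refill the buffer, then serve the request from it.
        m_bufLeft = m_src.Recv( m_buf, BufSize );
        if( m_bufLeft <= 0 ) return m_bufLeft;
        const int sz = len < m_bufLeft ? len : m_bufLeft;
        std::memcpy( dst, m_buf, sz );
        m_bufPtr = m_buf + sz;
        m_bufLeft -= sz;
        return sz;
    }

private:
    FakeSource& m_src;
    char m_buf[BufSize];
    char* m_bufPtr;
    int m_bufLeft;
};
```

As in the original, a call may return fewer bytes than requested, so a caller that needs an exact length has to loop the way Socket::Read does.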


@@ -1,25 +1,34 @@
#ifndef __TRACYSOCKET_HPP__
#define __TRACYSOCKET_HPP__
struct timeval;
#include <functional>
struct sockaddr;
namespace tracy
{
#ifdef _WIN32
void InitWinSock();
#endif
class Socket
{
enum { BufSize = 128 * 1024 };
public:
Socket();
Socket( int sock );
~Socket();
bool Connect( const char* addr, const char* port );
bool Connect( const char* addr, int port );
void Close();
int Send( const void* buf, int len );
int Recv( void* buf, int len, const timeval* tv );
int GetSendBufSize();
bool Read( void* buf, int len, const timeval* tv, bool(*exitCb)() );
bool Read( void* buf, int len, int timeout, std::function<bool()> exitCb );
bool ReadRaw( void* buf, int len, int timeout );
bool HasData();
Socket( const Socket& ) = delete;
@@ -28,7 +37,13 @@ public:
Socket& operator=( Socket&& ) = delete;
private:
int RecvBuffered( void* buf, int len, int timeout );
int Recv( void* buf, int len, int timeout );
char* m_buf;
char* m_bufPtr;
int m_sock;
int m_bufLeft;
};
class ListenSocket
@@ -37,7 +52,7 @@ public:
ListenSocket();
~ListenSocket();
bool Listen( const char* port, int backlog );
bool Listen( int port, int backlog );
Socket* Accept();
void Close();
@@ -50,6 +65,67 @@ private:
int m_sock;
};
class UdpBroadcast
{
public:
UdpBroadcast();
~UdpBroadcast();
bool Open( const char* addr, int port );
void Close();
int Send( int port, const void* data, int len );
UdpBroadcast( const UdpBroadcast& ) = delete;
UdpBroadcast( UdpBroadcast&& ) = delete;
UdpBroadcast& operator=( const UdpBroadcast& ) = delete;
UdpBroadcast& operator=( UdpBroadcast&& ) = delete;
private:
int m_sock;
};
class IpAddress
{
public:
IpAddress();
~IpAddress();
void Set( const struct sockaddr& addr );
uint32_t GetNumber() const { return m_number; }
const char* GetText() const { return m_text; }
IpAddress( const IpAddress& ) = delete;
IpAddress( IpAddress&& ) = delete;
IpAddress& operator=( const IpAddress& ) = delete;
IpAddress& operator=( IpAddress&& ) = delete;
private:
uint32_t m_number;
char m_text[17];
};
class UdpListen
{
public:
UdpListen();
~UdpListen();
bool Listen( int port );
void Close();
const char* Read( size_t& len, IpAddress& addr );
UdpListen( const UdpListen& ) = delete;
UdpListen( UdpListen&& ) = delete;
UdpListen& operator=( const UdpListen& ) = delete;
UdpListen& operator=( UdpListen&& ) = delete;
private:
int m_sock;
};
}
#endif


@@ -1,4 +1,12 @@
#ifdef _WIN32
#if defined _MSC_VER || defined __CYGWIN__ || defined _WIN32
# ifndef WIN32_LEAN_AND_MEAN
# define WIN32_LEAN_AND_MEAN
# endif
# ifndef NOMINMAX
# define NOMINMAX
# endif
#endif
#if defined _WIN32 || defined __CYGWIN__
# include <windows.h>
#else
# include <pthread.h>
@@ -6,12 +14,23 @@
# include <unistd.h>
#endif
#ifdef __linux__
# ifndef __ANDROID__
# include <syscall.h>
# endif
# include <fcntl.h>
#endif
#ifdef __MINGW32__
# define __STDC_FORMAT_MACROS
#endif
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include "TracySystem.hpp"
#ifdef TRACY_COLLECT_THREAD_NAMES
#ifdef TRACY_ENABLE
# include <atomic>
# include "TracyAlloc.hpp"
#endif
@@ -19,29 +38,25 @@
namespace tracy
{
#ifdef TRACY_COLLECT_THREAD_NAMES
#ifdef TRACY_ENABLE
struct ThreadNameData
{
uint64_t id;
const char* name;
ThreadNameData* next;
};
extern std::atomic<ThreadNameData*> s_threadNameData;
TRACY_API std::atomic<ThreadNameData*>& GetThreadNameData();
TRACY_API void InitRPMallocThread();
#endif
void SetThreadName( std::thread& thread, const char* name )
void SetThreadName( const char* name )
{
SetThreadName( thread.native_handle(), name );
}
void SetThreadName( std::thread::native_handle_type handle, const char* name )
{
#ifdef _WIN32
# ifdef NTDDI_WIN10_RS2
#if defined _WIN32 || defined __CYGWIN__
# if defined NTDDI_WIN10_RS2 && NTDDI_VERSION >= NTDDI_WIN10_RS2
wchar_t buf[256];
mbstowcs( buf, name, 256 );
SetThreadDescription( static_cast<HANDLE>( handle ), buf );
# else
SetThreadDescription( GetCurrentThread(), buf );
# elif defined _MSC_VER
const DWORD MS_VC_EXCEPTION=0x406D1388;
# pragma pack( push, 8 )
struct THREADNAME_INFO
@@ -53,7 +68,7 @@ void SetThreadName( std::thread::native_handle_type handle, const char* name )
};
# pragma pack(pop)
DWORD ThreadId = GetThreadId( static_cast<HANDLE>( handle ) );
DWORD ThreadId = GetCurrentThreadId();
THREADNAME_INFO info;
info.dwType = 0x1000;
info.szName = name;
@@ -68,37 +83,34 @@ void SetThreadName( std::thread::native_handle_type handle, const char* name )
{
}
# endif
#elif defined _GNU_SOURCE && !defined __EMSCRIPTEN__
const auto sz = strlen( name );
if( sz <= 15 )
#elif defined _GNU_SOURCE && !defined __EMSCRIPTEN__ && !defined __CYGWIN__
{
pthread_setname_np( handle, name );
}
else
{
char buf[16];
memcpy( buf, name, 15 );
buf[15] = '\0';
pthread_setname_np( handle, buf );
const auto sz = strlen( name );
if( sz <= 15 )
{
pthread_setname_np( pthread_self(), name );
}
else
{
char buf[16];
memcpy( buf, name, 15 );
buf[15] = '\0';
pthread_setname_np( pthread_self(), buf );
}
}
#endif
#ifdef TRACY_COLLECT_THREAD_NAMES
#ifdef TRACY_ENABLE
{
InitRPMallocThread();
const auto sz = strlen( name );
char* buf = (char*)tracy_malloc( sz+1 );
memcpy( buf, name, sz );
buf[sz] = '\0';
auto data = (ThreadNameData*)tracy_malloc( sizeof( ThreadNameData ) );
# ifdef _WIN32
data->id = GetThreadId( static_cast<HANDLE>( handle ) );
# elif defined __APPLE__
pthread_threadid_np( handle, &data->id );
# else
data->id = (uint64_t)handle;
# endif
data->id = detail::GetThreadHandleImpl();
data->name = buf;
data->next = s_threadNameData.load( std::memory_order_relaxed );
while( !s_threadNameData.compare_exchange_weak( data->next, data, std::memory_order_release, std::memory_order_relaxed ) ) {}
data->next = GetThreadNameData().load( std::memory_order_relaxed );
while( !GetThreadNameData().compare_exchange_weak( data->next, data, std::memory_order_release, std::memory_order_relaxed ) ) {}
}
#endif
}
@@ -106,20 +118,19 @@ void SetThreadName( std::thread::native_handle_type handle, const char* name )
const char* GetThreadName( uint64_t id )
{
static char buf[256];
#ifdef TRACY_COLLECT_THREAD_NAMES
auto ptr = s_threadNameData.load( std::memory_order_relaxed );
#ifdef TRACY_ENABLE
auto ptr = GetThreadNameData().load( std::memory_order_relaxed );
while( ptr )
{
if( ptr->id == id )
{
strcpy( buf, ptr->name );
return buf;
return ptr->name;
}
ptr = ptr->next;
}
#else
# ifdef _WIN32
# ifdef NTDDI_WIN10_RS2
# if defined _WIN32 || defined __CYGWIN__
# if defined NTDDI_WIN10_RS2 && NTDDI_VERSION >= NTDDI_WIN10_RS2
auto hnd = OpenThread( THREAD_QUERY_LIMITED_INFORMATION, FALSE, (DWORD)id );
if( hnd != 0 )
{
@@ -133,11 +144,40 @@ const char* GetThreadName( uint64_t id )
}
}
# endif
# elif defined _GNU_SOURCE && !defined __ANDROID__ && !defined __EMSCRIPTEN__
# elif defined __GLIBC__ && !defined __ANDROID__ && !defined __EMSCRIPTEN__ && !defined __CYGWIN__
if( pthread_getname_np( (pthread_t)id, buf, 256 ) == 0 )
{
return buf;
}
# elif defined __linux__
int cs, fd;
char path[32];
# ifdef __ANDROID__
int tid = gettid();
# else
int tid = (int) syscall( SYS_gettid );
# endif
snprintf( path, sizeof( path ), "/proc/self/task/%d/comm", tid );
sprintf( buf, "%" PRIu64, id );
# ifndef __ANDROID__
pthread_setcancelstate( PTHREAD_CANCEL_DISABLE, &cs );
# endif
if( ( fd = open( path, O_RDONLY ) ) >= 0 ) {
int len = read( fd, buf, 255 );
if( len > 0 )
{
buf[len] = 0;
if( len > 1 && buf[len-1] == '\n' )
{
buf[len-1] = 0;
}
}
close( fd );
}
# ifndef __ANDROID__
pthread_setcancelstate( cs, 0 );
# endif
return buf;
# endif
#endif
sprintf( buf, "%" PRIu64, id );
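
SetThreadName above publishes each name by prepending a node to a shared singly linked list with a compare-and-swap loop. A minimal sketch of that technique follows. NameNode, g_head, PushName and FindName are hypothetical names, and the lookup uses an acquire load where the code above uses a relaxed one; that is a choice made for this sketch, not a statement about the original.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical list node, mirroring the id/name/next layout shown above.
struct NameNode
{
    std::uint64_t id;
    const char* name;
    NameNode* next;
};

static std::atomic<NameNode*> g_head { nullptr };

// Prepend a node without taking a lock.
void PushName( NameNode* node )
{
    node->next = g_head.load( std::memory_order_relaxed );
    // On failure compare_exchange_weak stores the current head into
    // node->next, so the loop simply retries with the fresh value.
    while( !g_head.compare_exchange_weak( node->next, node,
                                          std::memory_order_release,
                                          std::memory_order_relaxed ) ) {}
}

// Walk the list; returns nullptr when the id was never registered.
const char* FindName( std::uint64_t id )
{
    auto ptr = g_head.load( std::memory_order_acquire );
    while( ptr )
    {
        if( ptr->id == id ) return ptr->name;
        ptr = ptr->next;
    }
    return nullptr;
}
```

Nodes are never removed in this sketch, which is what keeps the traversal safe without further synchronization.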


@@ -1,41 +1,62 @@
#ifndef __TRACYSYSTEM_HPP__
#define __TRACYSYSTEM_HPP__
#ifdef TRACY_ENABLE
# if defined __ANDROID__ || defined __CYGWIN__ || defined __APPLE__
# define TRACY_COLLECT_THREAD_NAMES
# endif
#endif
#ifdef _WIN32
#if defined _WIN32 || defined __CYGWIN__
# ifndef _WINDOWS_
extern "C" __declspec(dllimport) unsigned long __stdcall GetCurrentThreadId(void);
#else
# endif
#elif defined __APPLE__ || ( !defined __ANDROID__ && !defined __linux__ )
# include <pthread.h>
#endif
#ifdef __linux__
# include <unistd.h>
# ifdef __ANDROID__
# include <sys/types.h>
# else
# include <sys/syscall.h>
# endif
#endif
#include <stdint.h>
#include <thread>
#include "TracyApi.h"
namespace tracy
{
static inline uint64_t GetThreadHandle()
namespace detail
{
#ifdef _WIN32
static inline uint64_t GetThreadHandleImpl()
{
#if defined _WIN32 || defined __CYGWIN__
static_assert( sizeof( decltype( GetCurrentThreadId() ) ) <= sizeof( uint64_t ), "Thread handle too big to fit in protocol" );
return uint64_t( GetCurrentThreadId() );
#elif defined __APPLE__
uint64_t id;
pthread_threadid_np( pthread_self(), &id );
return id;
#elif defined __ANDROID__
return (uint64_t)gettid();
#elif defined __linux__
return (uint64_t)syscall( SYS_gettid );
#else
static_assert( sizeof( decltype( pthread_self() ) ) <= sizeof( uint64_t ), "Thread handle too big to fit in protocol" );
return uint64_t( pthread_self() );
#endif
}
}
void SetThreadName( std::thread& thread, const char* name );
void SetThreadName( std::thread::native_handle_type handle, const char* name );
#ifdef TRACY_ENABLE
TRACY_API uint64_t GetThreadHandle();
#else
static inline uint64_t GetThreadHandle()
{
return detail::GetThreadHandleImpl();
}
#endif
void SetThreadName( const char* name );
const char* GetThreadName( uint64_t id );
}


@@ -44,7 +44,7 @@ public:
}
}
bool tryLock()
bool try_lock()
{
if (m_contentionCount.load(std::memory_order_relaxed) != 0)
return false;

File diff suppressed because it is too large


@@ -1,7 +1,7 @@
/*
* LZ4 - Fast LZ compression algorithm
* Header File
* Copyright (C) 2011-2017, Yann Collet.
* Copyright (C) 2011-present, Yann Collet.
BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
@@ -47,24 +47,28 @@ namespace tracy
/**
Introduction
LZ4 is a lossless compression algorithm, providing compression speed at 400 MB/s per core,
LZ4 is a lossless compression algorithm, providing compression speed at 500 MB/s per core,
scalable with multi-cores CPU. It features an extremely fast decoder, with speed in
multiple GB/s per core, typically reaching RAM speed limits on multi-core systems.
The LZ4 compression library provides in-memory compression and decompression functions.
It gives full buffer control to user.
Compression can be done in:
- a single step (described as Simple Functions)
- a single step, reusing a context (described in Advanced Functions)
- unbounded multiple steps (described as Streaming compression)
lz4.h provides block compression functions. It gives full buffer control to user.
Decompressing an lz4-compressed block also requires metadata (such as compressed size).
Each application is free to encode such metadata in whichever way it wants.
lz4.h generates and decodes LZ4-compressed blocks (doc/lz4_Block_format.md).
Decompressing a block requires additional metadata, such as its compressed size.
Each application is free to encode and pass such metadata in whichever way it wants.
An additional format, called LZ4 frame specification (doc/lz4_Frame_format.md),
take care of encoding standard metadata alongside LZ4-compressed blocks.
If your application requires interoperability, it's recommended to use it.
A library is provided to take care of it, see lz4frame.h.
lz4.h only handles blocks; it cannot generate Frames.
Blocks are different from Frames (doc/lz4_Frame_format.md).
Frames bundle both blocks and metadata in a specified manner.
This is required for compressed data to be self-contained and portable.
Frame format is delivered through a companion API, declared in lz4frame.h.
Note that the `lz4` CLI can only manage frames.
*/
/*^***************************************************************
@@ -73,24 +77,28 @@ namespace tracy
/*
* LZ4_DLL_EXPORT :
* Enable exporting of functions when building a Windows DLL
* LZ4LIB_API :
* LZ4LIB_VISIBILITY :
* Control library symbols visibility.
*/
#if defined(LZ4_DLL_EXPORT) && (LZ4_DLL_EXPORT==1)
# define LZ4LIB_API __declspec(dllexport)
#elif defined(LZ4_DLL_IMPORT) && (LZ4_DLL_IMPORT==1)
# define LZ4LIB_API __declspec(dllimport) /* It isn't required but allows to generate better code, saving a function pointer load from the IAT and an indirect jump.*/
#elif defined(__GNUC__) && (__GNUC__ >= 4)
# define LZ4LIB_API __attribute__ ((__visibility__ ("default")))
#else
# define LZ4LIB_API
#ifndef LZ4LIB_VISIBILITY
# if defined(__GNUC__) && (__GNUC__ >= 4)
# define LZ4LIB_VISIBILITY __attribute__ ((visibility ("default")))
# else
# define LZ4LIB_VISIBILITY
# endif
#endif
#if defined(LZ4_DLL_EXPORT) && (LZ4_DLL_EXPORT==1)
# define LZ4LIB_API __declspec(dllexport) LZ4LIB_VISIBILITY
#elif defined(LZ4_DLL_IMPORT) && (LZ4_DLL_IMPORT==1)
# define LZ4LIB_API __declspec(dllimport) LZ4LIB_VISIBILITY /* It isn't required but allows to generate better code, saving a function pointer load from the IAT and an indirect jump.*/
#else
# define LZ4LIB_API LZ4LIB_VISIBILITY
#endif
/*------ Version ------*/
#define LZ4_VERSION_MAJOR 1 /* for breaking interface changes */
#define LZ4_VERSION_MINOR 8 /* for new (non-breaking) interface capabilities */
#define LZ4_VERSION_RELEASE 0 /* for tweaks, bug-fixes, or development */
#define LZ4_VERSION_MINOR 9 /* for new (non-breaking) interface capabilities */
#define LZ4_VERSION_RELEASE 1 /* for tweaks, bug-fixes, or development */
#define LZ4_VERSION_NUMBER (LZ4_VERSION_MAJOR *100*100 + LZ4_VERSION_MINOR *100 + LZ4_VERSION_RELEASE)
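The version macros fold into a single integer, so a version comparison becomes plain integer arithmetic. A minimal sketch of that encoding follows, using the same formula as LZ4_VERSION_NUMBER; `version_number()` and `version_at_least()` are hypothetical helper names introduced only for illustration and are not part of lz4.h.

```c
#include <assert.h>

/* Same formula as LZ4_VERSION_NUMBER: major*100*100 + minor*100 + release.
 * Hypothetical helper, not part of lz4.h. */
int version_number(int major, int minor, int release)
{
    /* The encoding only stays unambiguous while minor and release are below 100. */
    assert(minor >= 0 && minor < 100 && release >= 0 && release < 100);
    return major * 100 * 100 + minor * 100 + release;
}

/* Hypothetical helper: returns 1 when `have` is at least major.minor.release. */
int version_at_least(int have, int major, int minor, int release)
{
    return have >= version_number(major, minor, release);
}
```

With the values in this diff, 1.8.0 encodes to 10800 and 1.9.1 to 10901, so a check such as `version_at_least(LZ4_versionNumber(), 1, 9, 0)` could gate the use of functions marked "v1.9.0+".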
@@ -99,8 +107,8 @@ namespace tracy
#define LZ4_EXPAND_AND_QUOTE(str) LZ4_QUOTE(str)
#define LZ4_VERSION_STRING LZ4_EXPAND_AND_QUOTE(LZ4_LIB_VERSION)
LZ4LIB_API int LZ4_versionNumber (void); /**< library version number; to be used when checking dll version */
LZ4LIB_API const char* LZ4_versionString (void); /**< library version string; to be used when checking dll version */
LZ4LIB_API int LZ4_versionNumber (void); /**< library version number; useful to check dll version */
LZ4LIB_API const char* LZ4_versionString (void); /**< library version string; useful to check dll version */
/*-************************************
@@ -109,42 +117,43 @@ LZ4LIB_API const char* LZ4_versionString (void); /**< library version string;
/*!
* LZ4_MEMORY_USAGE :
* Memory usage formula : N->2^N Bytes (examples : 10 -> 1KB; 12 -> 4KB ; 16 -> 64KB; 20 -> 1MB; etc.)
* Increasing memory usage improves compression ratio
* Reduced memory usage can improve speed, due to cache effect
* Increasing memory usage improves compression ratio.
* Reduced memory usage may improve speed, thanks to better cache locality.
* Default value is 14, for 16KB, which nicely fits into Intel x86 L1 cache
*/
#ifndef LZ4_MEMORY_USAGE
# define LZ4_MEMORY_USAGE 12
#endif
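The "N->2^N Bytes" rule is simple to check by hand. The sketch below reproduces it; `lz4_memory_usage_bytes()` is a hypothetical name used only for this illustration. Note that the header as shown defines LZ4_MEMORY_USAGE as 12 (4KB), while the comment mentions 14 (16KB) as the default.

```c
#include <assert.h>
#include <stddef.h>

/* N -> 2^N bytes, as described in the LZ4_MEMORY_USAGE comment.
 * Hypothetical helper, not part of lz4.h. */
size_t lz4_memory_usage_bytes(int n)
{
    assert(n >= 0 && n < 31);   /* keep the shift well defined */
    return (size_t)1 << n;
}
```

For example, 12 gives 4096 bytes and 14 gives 16384 bytes.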
/*-************************************
* Simple Functions
**************************************/
/*! LZ4_compress_default() :
Compresses 'sourceSize' bytes from buffer 'source'
into already allocated 'dest' buffer of size 'maxDestSize'.
Compression is guaranteed to succeed if 'maxDestSize' >= LZ4_compressBound(sourceSize).
Compresses 'srcSize' bytes from buffer 'src'
into already allocated 'dst' buffer of size 'dstCapacity'.
Compression is guaranteed to succeed if 'dstCapacity' >= LZ4_compressBound(srcSize).
It also runs faster, so it's a recommended setting.
If the function cannot compress 'source' into a more limited 'dest' budget,
If the function cannot compress 'src' into a more limited 'dst' budget,
compression stops *immediately*, and the function result is zero.
As a consequence, 'dest' content is not valid.
This function never writes outside 'dest' buffer, nor read outside 'source' buffer.
sourceSize : Max supported value is LZ4_MAX_INPUT_VALUE
maxDestSize : full or partial size of buffer 'dest' (which must be already allocated)
return : the number of bytes written into buffer 'dest' (necessarily <= maxOutputSize)
or 0 if compression fails */
LZ4LIB_API int LZ4_compress_default(const char* source, char* dest, int sourceSize, int maxDestSize);
In which case, 'dst' content is undefined (invalid).
srcSize : max supported value is LZ4_MAX_INPUT_SIZE.
dstCapacity : size of buffer 'dst' (which must be already allocated)
@return : the number of bytes written into buffer 'dst' (necessarily <= dstCapacity)
or 0 if compression fails
Note : This function is protected against buffer overflow scenarios (never writes outside 'dst' buffer, nor reads outside 'src' buffer).
*/
LZ4LIB_API int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity);
/*! LZ4_decompress_safe() :
compressedSize : is the precise full size of the compressed block.
maxDecompressedSize : is the size of destination buffer, which must be already allocated.
return : the number of bytes decompressed into destination buffer (necessarily <= maxDecompressedSize)
If destination buffer is not large enough, decoding will stop and output an error code (<0).
compressedSize : is the exact complete size of the compressed block.
dstCapacity : is the size of destination buffer, which must be already allocated.
@return : the number of bytes decompressed into destination buffer (necessarily <= dstCapacity)
If destination buffer is not large enough, decoding will stop and output an error code (negative value).
If the source stream is detected malformed, the function will stop decoding and return a negative result.
This function is protected against buffer overflow exploits, including malicious data packets.
It never writes outside output buffer, nor reads outside input buffer.
Note : This function is protected against malicious data packets (never writes outside 'dst' buffer, nor reads outside 'src' buffer).
*/
LZ4LIB_API int LZ4_decompress_safe (const char* source, char* dest, int compressedSize, int maxDecompressedSize);
LZ4LIB_API int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);
/*-************************************
@@ -153,208 +162,334 @@ LZ4LIB_API int LZ4_decompress_safe (const char* source, char* dest, int compress
#define LZ4_MAX_INPUT_SIZE 0x7E000000 /* 2 113 929 216 bytes */
#define LZ4_COMPRESSBOUND(isize) ((unsigned)(isize) > (unsigned)LZ4_MAX_INPUT_SIZE ? 0 : (isize) + ((isize)/255) + 16)
/*!
LZ4_compressBound() :
/*! LZ4_compressBound() :
Provides the maximum size that LZ4 compression may output in a "worst case" scenario (input data not compressible)
This function is primarily useful for memory allocation purposes (destination buffer size).
Macro LZ4_COMPRESSBOUND() is also provided for compilation-time evaluation (stack memory allocation for example).
Note that LZ4_compress_default() compress faster when dest buffer size is >= LZ4_compressBound(srcSize)
Note that LZ4_compress_default() compresses faster when dstCapacity is >= LZ4_compressBound(srcSize)
inputSize : max supported value is LZ4_MAX_INPUT_SIZE
return : maximum output size in a "worst case" scenario
or 0, if input size is too large ( > LZ4_MAX_INPUT_SIZE)
or 0, if input size is incorrect (too large or negative)
*/
LZ4LIB_API int LZ4_compressBound(int inputSize);
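The worst-case bound is plain integer arithmetic, so it can be worked through without calling the library. The sketch below copies the two macros from the header and wraps them in `worst_case_size()`, a hypothetical helper used only for this illustration.

```c
#include <assert.h>

/* Copied from the header above. */
#define LZ4_MAX_INPUT_SIZE        0x7E000000   /* 2 113 929 216 bytes */
#define LZ4_COMPRESSBOUND(isize)  ((unsigned)(isize) > (unsigned)LZ4_MAX_INPUT_SIZE ? 0 : (isize) + ((isize)/255) + 16)

/* Hypothetical wrapper: worst-case compressed size for srcSize input bytes,
 * or 0 when srcSize is negative or larger than LZ4_MAX_INPUT_SIZE.
 * A negative value converts to a large unsigned value, so the macro rejects it. */
int worst_case_size(int srcSize)
{
    return LZ4_COMPRESSBOUND(srcSize);
}
```

For a 1000-byte input the bound is 1000 + 1000/255 + 16 = 1019 bytes, so a `dst` buffer of that size is enough for LZ4_compress_default() to be guaranteed to succeed.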
/*!
LZ4_compress_fast() :
Same as LZ4_compress_default(), but allows to select an "acceleration" factor.
/*! LZ4_compress_fast() :
Same as LZ4_compress_default(), but allows selection of "acceleration" factor.
The larger the acceleration value, the faster the algorithm, but also the lesser the compression.
It's a trade-off. It can be fine tuned, with each successive value providing roughly +~3% to speed.
An acceleration value of "1" is the same as regular LZ4_compress_default()
Values <= 0 will be replaced by ACCELERATION_DEFAULT (see lz4.c), which is 1.
Values <= 0 will be replaced by ACCELERATION_DEFAULT (currently == 1, see lz4.c).
*/
LZ4LIB_API int LZ4_compress_fast (const char* source, char* dest, int sourceSize, int maxDestSize, int acceleration);
LZ4LIB_API int LZ4_compress_fast (const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
/*!
LZ4_compress_fast_extState() :
Same compression function, just using an externally allocated memory space to store compression state.
Use LZ4_sizeofState() to know how much memory must be allocated,
and allocate it on 8-bytes boundaries (using malloc() typically).
Then, provide it as 'void* state' to compression function.
*/
/*! LZ4_compress_fast_extState() :
* Same as LZ4_compress_fast(), using an externally allocated memory space for its state.
* Use LZ4_sizeofState() to know how much memory must be allocated,
* and allocate it on 8-bytes boundaries (using `malloc()` typically).
* Then, provide this buffer as `void* state` to compression function.
*/
LZ4LIB_API int LZ4_sizeofState(void);
LZ4LIB_API int LZ4_compress_fast_extState (void* state, const char* source, char* dest, int inputSize, int maxDestSize, int acceleration);
LZ4LIB_API int LZ4_compress_fast_extState (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
/*!
LZ4_compress_destSize() :
Reverse the logic, by compressing as much data as possible from 'source' buffer
into already allocated buffer 'dest' of size 'targetDestSize'.
This function either compresses the entire 'source' content into 'dest' if it's large enough,
or fill 'dest' buffer completely with as much data as possible from 'source'.
*sourceSizePtr : will be modified to indicate how many bytes where read from 'source' to fill 'dest'.
New value is necessarily <= old value.
return : Nb bytes written into 'dest' (necessarily <= targetDestSize)
or 0 if compression fails
/*! LZ4_compress_destSize() :
* Reverse the logic : compresses as much data as possible from 'src' buffer
* into already allocated buffer 'dst', of size >= 'targetDstSize'.
* This function either compresses the entire 'src' content into 'dst' if it's large enough,
* or fills 'dst' buffer completely with as much data as possible from 'src'.
* note: acceleration parameter is fixed to "default".
*
* *srcSizePtr : will be modified to indicate how many bytes were read from 'src' to fill 'dst'.
* New value is necessarily <= input value.
* @return : Nb bytes written into 'dst' (necessarily <= targetDstSize)
* or 0 if compression fails.
*/
LZ4LIB_API int LZ4_compress_destSize (const char* source, char* dest, int* sourceSizePtr, int targetDestSize);
LZ4LIB_API int LZ4_compress_destSize (const char* src, char* dst, int* srcSizePtr, int targetDstSize);
/*!
LZ4_decompress_fast() :
originalSize : is the original and therefore uncompressed size
return : the number of bytes read from the source buffer (in other words, the compressed size)
If the source stream is detected malformed, the function will stop decoding and return a negative result.
Destination buffer must be already allocated. Its size must be a minimum of 'originalSize' bytes.
note : This function fully respect memory boundaries for properly formed compressed data.
It is a bit faster than LZ4_decompress_safe().
However, it does not provide any protection against intentionally modified data stream (malicious input).
Use this function in trusted environment only (data to decode comes from a trusted source).
*/
LZ4LIB_API int LZ4_decompress_fast (const char* source, char* dest, int originalSize);
/*!
LZ4_decompress_safe_partial() :
This function decompress a compressed block of size 'compressedSize' at position 'source'
into destination buffer 'dest' of size 'maxDecompressedSize'.
The function tries to stop decompressing operation as soon as 'targetOutputSize' has been reached,
reducing decompression time.
return : the number of bytes decoded in the destination buffer (necessarily <= maxDecompressedSize)
Note : this number can be < 'targetOutputSize' should the compressed block to decode be smaller.
Always control how many bytes were decoded.
If the source stream is detected malformed, the function will stop decoding and return a negative result.
This function never writes outside of output buffer, and never reads outside of input buffer. It is therefore protected against malicious data packets
*/
LZ4LIB_API int LZ4_decompress_safe_partial (const char* source, char* dest, int compressedSize, int targetOutputSize, int maxDecompressedSize);
/*! LZ4_decompress_safe_partial() :
* Decompress an LZ4 compressed block, of size 'srcSize' at position 'src',
* into destination buffer 'dst' of size 'dstCapacity'.
* Up to 'targetOutputSize' bytes will be decoded.
* The function stops decoding on reaching this objective,
* which can boost performance when only the beginning of a block is required.
*
* @return : the number of bytes decoded in `dst` (necessarily <= dstCapacity)
* If source stream is detected malformed, function returns a negative result.
*
* Note : @return can be < targetOutputSize, if compressed block contains less data.
*
* Note 2 : this function features 2 parameters, targetOutputSize and dstCapacity,
* and expects targetOutputSize <= dstCapacity.
* It effectively stops decoding on reaching targetOutputSize,
* so dstCapacity is kind of redundant.
* This is because in a previous version of this function,
* decoding operation would not "break" a sequence in the middle.
* As a consequence, there was no guarantee that decoding would stop at exactly targetOutputSize,
* it could write more bytes, though only up to dstCapacity.
* Some "margin" used to be required for this operation to work properly.
* This is no longer necessary.
* The function nonetheless keeps its signature, in an effort to not break API.
*/
LZ4LIB_API int LZ4_decompress_safe_partial (const char* src, char* dst, int srcSize, int targetOutputSize, int dstCapacity);
/*-*********************************************
* Streaming Compression Functions
***********************************************/
typedef union LZ4_stream_u LZ4_stream_t; /* incomplete type (defined later) */
typedef union LZ4_stream_u LZ4_stream_t; /* incomplete type (defined later) */
/*! LZ4_createStream() and LZ4_freeStream() :
* LZ4_createStream() will allocate and initialize an `LZ4_stream_t` structure.
* LZ4_freeStream() releases its memory.
*/
LZ4LIB_API LZ4_stream_t* LZ4_createStream(void);
LZ4LIB_API int LZ4_freeStream (LZ4_stream_t* streamPtr);
/*! LZ4_resetStream() :
* An LZ4_stream_t structure can be allocated once and re-used multiple times.
* Use this function to start compressing a new stream.
/*! LZ4_resetStream_fast() : v1.9.0+
* Use this to prepare an LZ4_stream_t for a new chain of dependent blocks
* (e.g., LZ4_compress_fast_continue()).
*
* An LZ4_stream_t must be initialized once before usage.
* This is automatically done when created by LZ4_createStream().
* However, should the LZ4_stream_t be simply declared on stack (for example),
* it's necessary to initialize it first, using LZ4_initStream().
*
* After init, start any new stream with LZ4_resetStream_fast().
* A same LZ4_stream_t can be re-used multiple times consecutively
* and compress multiple streams,
* provided that it starts each new stream with LZ4_resetStream_fast().
*
* LZ4_resetStream_fast() is much faster than LZ4_initStream(),
* but is not compatible with memory regions containing garbage data.
*
* Note: it's only useful to call LZ4_resetStream_fast()
* in the context of streaming compression.
* The *extState* functions perform their own resets.
* Invoking LZ4_resetStream_fast() before is redundant, and even counterproductive.
*/
LZ4LIB_API void LZ4_resetStream (LZ4_stream_t* streamPtr);
LZ4LIB_API void LZ4_resetStream_fast (LZ4_stream_t* streamPtr);
/*! LZ4_loadDict() :
* Use this function to load a static dictionary into LZ4_stream_t.
* Any previous data will be forgotten, only 'dictionary' will remain in memory.
* Use this function to reference a static dictionary into LZ4_stream_t.
* The dictionary must remain available during compression.
* LZ4_loadDict() triggers a reset, so any previous data will be forgotten.
* The same dictionary will have to be loaded on decompression side for successful decoding.
* Dictionaries are useful for better compression of small data (KB range).
* While LZ4 accepts any input as dictionary,
* results are generally better when using Zstandard's Dictionary Builder.
* Loading a size of 0 is allowed, and is the same as reset.
* @return : dictionary size, in bytes (necessarily <= 64 KB)
* @return : loaded dictionary size, in bytes (necessarily <= 64 KB)
*/
LZ4LIB_API int LZ4_loadDict (LZ4_stream_t* streamPtr, const char* dictionary, int dictSize);
/*! LZ4_compress_fast_continue() :
* Compress content into 'src' using data from previously compressed blocks, improving compression ratio.
* 'dst' buffer must be already allocated.
* Compress 'src' content using data from previously compressed blocks, for better compression ratio.
* 'dst' buffer must be already allocated.
* If dstCapacity >= LZ4_compressBound(srcSize), compression is guaranteed to succeed, and runs faster.
*
* Important : Up to 64KB of previously compressed data is assumed to remain present and unmodified in memory !
* Special 1 : If input buffer is a double-buffer, it can have any size, including < 64 KB.
* Special 2 : If input buffer is a ring-buffer, it can have any size, including < 64 KB.
*
* @return : size of compressed block
* or 0 if there is an error (typically, compressed data cannot fit into 'dst')
* After an error, the stream status is invalid, it can only be reset or freed.
* or 0 if there is an error (typically, cannot fit into 'dst').
*
* Note 1 : Each invocation to LZ4_compress_fast_continue() generates a new block.
* Each block has precise boundaries.
* Each block must be decompressed separately, calling LZ4_decompress_*() with relevant metadata.
* It's not possible to append blocks together and expect a single invocation of LZ4_decompress_*() to decompress them together.
*
* Note 2 : The previous 64KB of source data is __assumed__ to remain present, unmodified, at same address in memory !
*
* Note 3 : When input is structured as a double-buffer, each buffer can have any size, including < 64 KB.
* Make sure that buffers are separated, by at least one byte.
* This construction ensures that each block only depends on previous block.
*
* Note 4 : If input buffer is a ring-buffer, it can have any size, including < 64 KB.
*
* Note 5 : After an error, the stream status is undefined (invalid), it can only be reset or freed.
*/
LZ4LIB_API int LZ4_compress_fast_continue (LZ4_stream_t* streamPtr, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
/*! LZ4_saveDict() :
* If previously compressed data block is not guaranteed to remain available at its current memory location,
* If last 64KB data cannot be guaranteed to remain available at its current memory location,
* save it into a safer place (char* safeBuffer).
* Note : it's not necessary to call LZ4_loadDict() after LZ4_saveDict(), dictionary is immediately usable.
* @return : saved dictionary size in bytes (necessarily <= dictSize), or 0 if error.
* This is schematically equivalent to a memcpy() followed by LZ4_loadDict(),
* but is much faster, because LZ4_saveDict() doesn't need to rebuild tables.
* @return : saved dictionary size in bytes (necessarily <= maxDictSize), or 0 if error.
*/
LZ4LIB_API int LZ4_saveDict (LZ4_stream_t* streamPtr, char* safeBuffer, int dictSize);
LZ4LIB_API int LZ4_saveDict (LZ4_stream_t* streamPtr, char* safeBuffer, int maxDictSize);
/*-**********************************************
* Streaming Decompression Functions
* Bufferless synchronous API
************************************************/
typedef union LZ4_streamDecode_u LZ4_streamDecode_t; /* incomplete type (defined later) */
typedef union LZ4_streamDecode_u LZ4_streamDecode_t; /* tracking context */
/*! LZ4_createStreamDecode() and LZ4_freeStreamDecode() :
* creation / destruction of streaming decompression tracking structure.
* A tracking structure can be re-used multiple times sequentially. */
* creation / destruction of streaming decompression tracking context.
* A tracking context can be re-used multiple times.
*/
LZ4LIB_API LZ4_streamDecode_t* LZ4_createStreamDecode(void);
LZ4LIB_API int LZ4_freeStreamDecode (LZ4_streamDecode_t* LZ4_stream);
/*! LZ4_setStreamDecode() :
* An LZ4_streamDecode_t structure can be allocated once and re-used multiple times.
* An LZ4_streamDecode_t context can be allocated once and re-used multiple times.
* Use this function to start decompression of a new stream of blocks.
* A dictionary can optionnally be set. Use NULL or size 0 for a simple reset order.
* A dictionary can optionally be set. Use NULL or size 0 for a reset order.
* Dictionary is presumed stable : it must remain accessible and unmodified during next decompression.
* @return : 1 if OK, 0 if error
*/
LZ4LIB_API int LZ4_setStreamDecode (LZ4_streamDecode_t* LZ4_streamDecode, const char* dictionary, int dictSize);
/*! LZ4_decoderRingBufferSize() : v1.8.2+
* Note : in a ring buffer scenario (optional),
* blocks are presumed decompressed next to each other
* up to the moment there is not enough remaining space for next block (remainingSize < maxBlockSize),
* at which stage it resumes from beginning of ring buffer.
* When setting such a ring buffer for streaming decompression,
* provides the minimum size of this ring buffer
* to be compatible with any source respecting maxBlockSize condition.
* @return : minimum ring buffer size,
* or 0 if there is an error (invalid maxBlockSize).
*/
LZ4LIB_API int LZ4_decoderRingBufferSize(int maxBlockSize);
#define LZ4_DECODER_RING_BUFFER_SIZE(maxBlockSize) (65536 + 14 + (maxBlockSize)) /* for static allocation; maxBlockSize presumed valid */
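The static-allocation macro is a fixed overhead (64 KB of history plus 14 bytes) added to the largest block. The sketch below copies the macro and adds `ring_buffer_min_size()`, a hypothetical helper used only for this illustration. Its rejection of negative sizes is an assumption about what "invalid maxBlockSize" means, not the library's actual validation.

```c
#include <assert.h>

/* Copied from the header above; maxBlockSize is presumed valid. */
#define LZ4_DECODER_RING_BUFFER_SIZE(maxBlockSize) (65536 + 14 + (maxBlockSize))

/* Hypothetical helper: minimum decoder ring buffer size, or 0 on error.
 * Treating only negative values as invalid is an assumption made here. */
int ring_buffer_min_size(int maxBlockSize)
{
    if (maxBlockSize < 0) return 0;
    return LZ4_DECODER_RING_BUFFER_SIZE(maxBlockSize);
}
```

With 4 KB blocks this gives 65536 + 14 + 4096 = 69646 bytes, and with 64 KB blocks it gives 131086 bytes.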
/*! LZ4_decompress_*_continue() :
* These decoding functions allow decompression of consecutive blocks in "streaming" mode.
* A block is an unsplittable entity, it must be presented entirely to a decompression function.
* Decompression functions only accept one block at a time.
* Previously decoded blocks *must* remain available at the memory position where they were decoded (up to 64 KB).
* Decompression functions only accept one block at a time.
* The last 64KB of previously decoded data *must* remain available and unmodified at the memory position where they were decoded.
* If less than 64KB of data has been decoded, all the data must be present.
*
* Special : if application sets a ring buffer for decompression, it must respect one of the following conditions :
* - Exactly same size as encoding buffer, with same update rule (block boundaries at same positions)
* In which case, the decoding & encoding ring buffer can have any size, including very small ones ( < 64 KB).
* - Larger than encoding buffer, by a minimum of maxBlockSize more bytes.
* maxBlockSize is implementation dependent. It's the maximum size of any single block.
* Special : if decompression side sets a ring buffer, it must respect one of the following conditions :
* - Decompression buffer size is _at least_ LZ4_decoderRingBufferSize(maxBlockSize).
* maxBlockSize is the maximum size of any single block. It can have any value > 16 bytes.
* In which case, encoding and decoding buffers do not need to be synchronized.
* Actually, data can be produced by any source compliant with LZ4 format specification, and respecting maxBlockSize.
* - Synchronized mode :
* Decompression buffer size is _exactly_ the same as compression buffer size,
* and follows exactly same update rule (block boundaries at same positions),
* and decoding function is provided with exact decompressed size of each block (exception for last block of the stream),
* _then_ decoding & encoding ring buffer can have any size, including small ones ( < 64 KB).
* - Decompression buffer is larger than encoding buffer, by a minimum of maxBlockSize more bytes.
* In which case, encoding and decoding buffers do not need to be synchronized,
* and encoding ring buffer can have any size, including small ones ( < 64 KB).
* - _At least_ 64 KB + 8 bytes + maxBlockSize.
* In which case, encoding and decoding buffers do not need to be synchronized,
* and encoding ring buffer can have any size, including larger than decoding buffer.
* Whenever these conditions are not possible, save the last 64KB of decoded data into a safe buffer,
* and indicate where it is saved using LZ4_setStreamDecode() before decompressing next block.
*
* Whenever these conditions are not possible,
* save the last 64KB of decoded data into a safe buffer where it can't be modified during decompression,
* then indicate where this data is saved using LZ4_setStreamDecode(), before decompressing next block.
*/
LZ4LIB_API int LZ4_decompress_safe_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* source, char* dest, int compressedSize, int maxDecompressedSize);
LZ4LIB_API int LZ4_decompress_fast_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* source, char* dest, int originalSize);
LZ4LIB_API int LZ4_decompress_safe_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* src, char* dst, int srcSize, int dstCapacity);
/*! LZ4_decompress_*_usingDict() :
* These decoding functions work the same as
* a combination of LZ4_setStreamDecode() followed by LZ4_decompress_*_continue()
* They are stand-alone, and don't need an LZ4_streamDecode_t structure.
* Dictionary is presumed stable : it must remain accessible and unmodified during decompression.
* Performance tip : Decompression speed can be substantially increased
* when dst == dictStart + dictSize.
*/
LZ4LIB_API int LZ4_decompress_safe_usingDict (const char* source, char* dest, int compressedSize, int maxDecompressedSize, const char* dictStart, int dictSize);
LZ4LIB_API int LZ4_decompress_fast_usingDict (const char* source, char* dest, int originalSize, const char* dictStart, int dictSize);
LZ4LIB_API int LZ4_decompress_safe_usingDict (const char* src, char* dst, int srcSize, int dstCapacity, const char* dictStart, int dictSize);
/*^**********************************************
/*^*************************************
* !!!!!! STATIC LINKING ONLY !!!!!!
***********************************************/
/*-************************************
* Private definitions
**************************************
* Do not use these definitions.
* They are exposed to allow static allocation of `LZ4_stream_t` and `LZ4_streamDecode_t`.
* Using these definitions will expose code to API and/or ABI break in future versions of the library.
**************************************/
***************************************/
/*-****************************************************************************
* Experimental section
*
* Symbols declared in this section must be considered unstable. Their
* signatures or semantics may change, or they may be removed altogether in the
* future. They are therefore only safe to depend on when the caller is
* statically linked against the library.
*
* To protect against unsafe usage, not only are the declarations guarded,
* the definitions are hidden by default
* when building LZ4 as a shared/dynamic library.
*
* In order to access these declarations,
* define LZ4_STATIC_LINKING_ONLY in your application
* before including LZ4's headers.
*
* In order to make their implementations accessible dynamically, you must
* define LZ4_PUBLISH_STATIC_FUNCTIONS when building the LZ4 library.
******************************************************************************/
#ifdef LZ4_PUBLISH_STATIC_FUNCTIONS
#define LZ4LIB_STATIC_API LZ4LIB_API
#else
#define LZ4LIB_STATIC_API
#endif
#ifdef LZ4_STATIC_LINKING_ONLY
/*! LZ4_compress_fast_extState_fastReset() :
* A variant of LZ4_compress_fast_extState().
*
* Using this variant avoids an expensive initialization step.
* It is only safe to call if the state buffer is known to be correctly initialized already
* (see above comment on LZ4_resetStream_fast() for a definition of "correctly initialized").
* From a high level, the difference is that
* this function initializes the provided state with a call to something like LZ4_resetStream_fast()
* while LZ4_compress_fast_extState() starts with a call to LZ4_resetStream().
*/
LZ4LIB_STATIC_API int LZ4_compress_fast_extState_fastReset (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
/*! LZ4_attach_dictionary() :
* This is an experimental API that allows
* efficient use of a static dictionary many times.
*
* Rather than re-loading the dictionary buffer into a working context before
* each compression, or copying a pre-loaded dictionary's LZ4_stream_t into a
* working LZ4_stream_t, this function introduces a no-copy setup mechanism,
* in which the working stream references the dictionary stream in-place.
*
* Several assumptions are made about the state of the dictionary stream.
* Currently, only streams which have been prepared by LZ4_loadDict() should
* be expected to work.
*
* Alternatively, the provided dictionaryStream may be NULL,
* in which case any existing dictionary stream is unset.
*
* If a dictionary is provided, it replaces any pre-existing stream history.
* The dictionary contents are the only history that can be referenced and
* logically immediately precede the data compressed in the first subsequent
* compression call.
*
* The dictionary will only remain attached to the working stream through the
* first compression call, at the end of which it is cleared. The dictionary
* stream (and source buffer) must remain in-place / accessible / unchanged
* through the completion of the first compression call on the stream.
*/
LZ4LIB_STATIC_API void LZ4_attach_dictionary(LZ4_stream_t* workingStream, const LZ4_stream_t* dictionaryStream);
#endif
/*-************************************************************
* PRIVATE DEFINITIONS
**************************************************************
* Do not use these definitions directly.
* They are only exposed to allow static allocation of `LZ4_stream_t` and `LZ4_streamDecode_t`.
* Accessing members will expose code to API and/or ABI break in future versions of the library.
**************************************************************/
#define LZ4_HASHLOG (LZ4_MEMORY_USAGE-2)
#define LZ4_HASHTABLESIZE (1 << LZ4_MEMORY_USAGE)
#define LZ4_HASH_SIZE_U32 (1 << LZ4_HASHLOG) /* required as macro for static allocation */
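These three macros tie the hash table dimensions to LZ4_MEMORY_USAGE. The sketch below redoes the arithmetic with the memory usage as a parameter; all three function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors LZ4_HASHLOG: log2 of the number of table entries. */
int hash_log(int memoryUsage)
{
    assert(memoryUsage >= 2 && memoryUsage < 31);
    return memoryUsage - 2;
}

/* Mirrors LZ4_HASH_SIZE_U32: number of 32-bit entries in the table. */
unsigned hash_entries(int memoryUsage)
{
    return 1u << hash_log(memoryUsage);
}

/* Entries times 4 bytes each, which matches LZ4_HASHTABLESIZE (1 << memoryUsage). */
unsigned hash_table_bytes(int memoryUsage)
{
    return hash_entries(memoryUsage) * (unsigned)sizeof(uint32_t);
}
```

With the value 12 defined in this header, the table holds 1024 entries of 4 bytes each, for 4096 bytes in total.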
#if defined(__cplusplus) || (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */)
#include <stdint.h>
typedef struct {
typedef struct LZ4_stream_t_internal LZ4_stream_t_internal;
struct LZ4_stream_t_internal {
uint32_t hashTable[LZ4_HASH_SIZE_U32];
uint32_t currentOffset;
uint32_t initCheck;
uint16_t dirty;
uint16_t tableType;
const uint8_t* dictionary;
uint8_t* bufferStart; /* obsolete, used for slideInputBuffer */
const LZ4_stream_t_internal* dictCtx;
uint32_t dictSize;
} LZ4_stream_t_internal;
};
typedef struct {
const uint8_t* externalDict;
@@ -365,49 +500,67 @@ typedef struct {
#else
typedef struct {
typedef struct LZ4_stream_t_internal LZ4_stream_t_internal;
struct LZ4_stream_t_internal {
unsigned int hashTable[LZ4_HASH_SIZE_U32];
unsigned int currentOffset;
unsigned int initCheck;
unsigned short dirty;
unsigned short tableType;
const unsigned char* dictionary;
unsigned char* bufferStart; /* obsolete, used for slideInputBuffer */
const LZ4_stream_t_internal* dictCtx;
unsigned int dictSize;
} LZ4_stream_t_internal;
};
typedef struct {
const unsigned char* externalDict;
size_t extDictSize;
const unsigned char* prefixEnd;
size_t extDictSize;
size_t prefixSize;
} LZ4_streamDecode_t_internal;
#endif
/*!
* LZ4_stream_t :
* information structure to track an LZ4 stream.
* init this structure before first use.
* note : only use in association with static linking !
* this definition is not API/ABI safe,
* it may change in a future version !
/*! LZ4_stream_t :
* information structure to track an LZ4 stream.
* LZ4_stream_t can also be created using LZ4_createStream(), which is recommended.
* The structure definition can be convenient for static allocation
* (on stack, or as part of larger structure).
* Init this structure with LZ4_initStream() before first use.
* note : only use this definition in association with static linking !
* this definition is not API/ABI safe, and may change in a future version.
*/
#define LZ4_STREAMSIZE_U64 ((1 << (LZ4_MEMORY_USAGE-3)) + 4)
#define LZ4_STREAMSIZE_U64 ((1 << (LZ4_MEMORY_USAGE-3)) + 4 + ((sizeof(void*)==16) ? 4 : 0) /*AS-400*/ )
#define LZ4_STREAMSIZE (LZ4_STREAMSIZE_U64 * sizeof(unsigned long long))
union LZ4_stream_u {
unsigned long long table[LZ4_STREAMSIZE_U64];
LZ4_stream_t_internal internal_donotuse;
} ; /* previously typedef'd to LZ4_stream_t */
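Note on the size macros above: a minimal sketch of the arithmetic behind the new LZ4_STREAMSIZE_U64 definition. It assumes LZ4_MEMORY_USAGE is 14, a value that is not shown in this hunk, so treat it as an assumption. The helper names streamSizeU64 and streamSizeBytes are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers mirroring the macro arithmetic; not part of LZ4 or Tracy.
// memoryUsage stands in for LZ4_MEMORY_USAGE (assumed to be 14 below).
// pointerSize stands in for sizeof(void*), passed explicitly so both cases can be checked.
constexpr std::size_t streamSizeU64(int memoryUsage, std::size_t pointerSize)
{
    // (1 << (memoryUsage - 3)) 64-bit words cover the hash table,
    // 4 more words cover the remaining fields,
    // and 4 extra words are added when pointers are 16 bytes wide (AS-400).
    return (std::size_t(1) << (memoryUsage - 3)) + 4 + (pointerSize == 16 ? 4 : 0);
}

constexpr std::size_t streamSizeBytes(int memoryUsage, std::size_t pointerSize)
{
    return streamSizeU64(memoryUsage, pointerSize) * sizeof(unsigned long long);
}

static_assert(streamSizeU64(14, 8) == 2052, "2048 hash-table words + 4 words of bookkeeping");
static_assert(streamSizeU64(14, 16) == 2056, "4 extra words with 16-byte pointers");
```

Under these assumptions the table member holds 2052 entries with 8-byte pointers and 2056 entries when sizeof(void*) is 16, which is the case the added AS-400 term accounts for.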
/*!
* LZ4_streamDecode_t :
* information structure to track an LZ4 stream during decompression.
* init this structure using LZ4_setStreamDecode (or memset()) before first use
* note : only use in association with static linking !
* this definition is not API/ABI safe,
* and may change in a future version !
/*! LZ4_initStream() : v1.9.0+
* An LZ4_stream_t structure must be initialized at least once.
* This is automatically done when invoking LZ4_createStream(),
* but it's not when the structure is simply declared on stack (for example).
*
* Use LZ4_initStream() to properly initialize a newly declared LZ4_stream_t.
* It can also initialize any arbitrary buffer of sufficient size,
* and will @return a pointer of proper type upon initialization.
*
* Note : initialization fails if size and alignment conditions are not respected.
* In which case, the function will @return NULL.
* Note2: An LZ4_stream_t structure guarantees correct alignment and size.
* Note3: Before v1.9.0, use LZ4_resetStream() instead
*/
#define LZ4_STREAMDECODESIZE_U64 4
LZ4LIB_API LZ4_stream_t* LZ4_initStream (void* buffer, size_t size);
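Note on LZ4_initStream(): the comment above says initialization fails and returns NULL when size and alignment conditions are not respected. The sketch below illustrates that kind of contract with a stand-in type. SketchStream and sketchInitStream are hypothetical names, and this is not the library's implementation, only one way such checks could look.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a stream state; NOT the real LZ4_stream_t.
union SketchStream {
    unsigned long long table[8];
    char bytes[64];
};

// Returns a typed pointer when `buffer` can hold a SketchStream, nullptr otherwise.
inline SketchStream* sketchInitStream(void* buffer, std::size_t size)
{
    if (buffer == nullptr) return nullptr;
    // Size condition: the buffer must be large enough for the state.
    if (size < sizeof(SketchStream)) return nullptr;
    // Alignment condition: the address must suit the state's alignment.
    if (reinterpret_cast<std::uintptr_t>(buffer) % alignof(SketchStream) != 0) return nullptr;
    // Only now is it safe to clear the memory and hand back a typed pointer.
    std::memset(buffer, 0, sizeof(SketchStream));
    return static_cast<SketchStream*>(buffer);
}
```

A caller that declares the state type directly gets the size and alignment right automatically, which matches Note2 in the comment. The checks matter mostly when an arbitrary byte buffer is passed in.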
/*! LZ4_streamDecode_t :
* information structure to track an LZ4 stream during decompression.
* init this structure using LZ4_setStreamDecode() before first use.
* note : only use in association with static linking !
* this definition is not API/ABI safe,
* and may change in a future version !
*/
#define LZ4_STREAMDECODESIZE_U64 (4 + ((sizeof(void*)==16) ? 2 : 0) /*AS-400*/ )
#define LZ4_STREAMDECODESIZE (LZ4_STREAMDECODESIZE_U64 * sizeof(unsigned long long))
union LZ4_streamDecode_u {
unsigned long long table[LZ4_STREAMDECODESIZE_U64];
@@ -420,20 +573,23 @@ union LZ4_streamDecode_u {
**************************************/
/*! Deprecation warnings
Should deprecation warnings be a problem,
it is generally possible to disable them,
typically with -Wno-deprecated-declarations for gcc
or _CRT_SECURE_NO_WARNINGS in Visual.
Otherwise, it's also possible to define LZ4_DISABLE_DEPRECATE_WARNINGS */
*
* Deprecated functions make the compiler generate a warning when invoked.
* This is meant to invite users to update their source code.
* Should deprecation warnings be a problem, it is generally possible to disable them,
* typically with -Wno-deprecated-declarations for gcc
* or _CRT_SECURE_NO_WARNINGS in Visual.
*
* Another method is to define LZ4_DISABLE_DEPRECATE_WARNINGS
* before including the header file.
*/
#ifdef LZ4_DISABLE_DEPRECATE_WARNINGS
# define LZ4_DEPRECATED(message) /* disable deprecation warnings */
#else
# define LZ4_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
# if defined(__clang__) /* clang doesn't handle mixed C++11 and GNU attributes */
# define LZ4_DEPRECATED(message) __attribute__((deprecated(message)))
# elif defined (__cplusplus) && (__cplusplus >= 201402) /* C++14 or greater */
# if defined (__cplusplus) && (__cplusplus >= 201402) /* C++14 or greater */
# define LZ4_DEPRECATED(message) [[deprecated(message)]]
# elif (LZ4_GCC_VERSION >= 405)
# elif (LZ4_GCC_VERSION >= 405) || defined(__clang__)
# define LZ4_DEPRECATED(message) __attribute__((deprecated(message)))
# elif (LZ4_GCC_VERSION >= 301)
# define LZ4_DEPRECATED(message) __attribute__((deprecated))
@@ -446,26 +602,77 @@ union LZ4_streamDecode_u {
#endif /* LZ4_DISABLE_DEPRECATE_WARNINGS */
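Note on the version thresholds above: LZ4_GCC_VERSION packs the compiler version as major * 100 + minor, so 405 means GCC 4.5 and 301 means GCC 3.1. The sketch below reproduces only the two GCC branches visible in this hunk. The names gccVersion, DeprecatedForm and pickGccForm are hypothetical, and the branches hidden by the hunk boundary are deliberately lumped into NotShown rather than guessed.

```cpp
#include <cassert>

// Hypothetical helper mirroring LZ4_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__).
constexpr int gccVersion(int major, int minor)
{
    return major * 100 + minor;
}

enum class DeprecatedForm {
    WithMessage,    // __attribute__((deprecated(message)))
    WithoutMessage, // __attribute__((deprecated))
    NotShown        // branches outside the visible hunk
};

// Selects between the two GCC-specific branches shown in the diff.
constexpr DeprecatedForm pickGccForm(int major, int minor)
{
    return gccVersion(major, minor) >= 405 ? DeprecatedForm::WithMessage
         : gccVersion(major, minor) >= 301 ? DeprecatedForm::WithoutMessage
                                           : DeprecatedForm::NotShown;
}

static_assert(gccVersion(4, 5) == 405, "GCC 4.5 encodes as 405");
static_assert(gccVersion(3, 1) == 301, "GCC 3.1 encodes as 301");
```

For example, GCC 4.4 encodes as 404 and therefore falls into the message-less attribute branch, while anything from 4.5 onward gets the form that carries the message.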
/* Obsolete compression functions */
LZ4LIB_API LZ4_DEPRECATED("use LZ4_compress_default() instead") int LZ4_compress (const char* source, char* dest, int sourceSize);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_compress_default() instead") int LZ4_compress_limitedOutput (const char* source, char* dest, int sourceSize, int maxOutputSize);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_compress_fast_extState() instead") int LZ4_compress_withState (void* state, const char* source, char* dest, int inputSize);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_compress_fast_extState() instead") int LZ4_compress_limitedOutput_withState (void* state, const char* source, char* dest, int inputSize, int maxOutputSize);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_compress_fast_continue() instead") int LZ4_compress_continue (LZ4_stream_t* LZ4_streamPtr, const char* source, char* dest, int inputSize);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_compress_fast_continue() instead") int LZ4_compress_limitedOutput_continue (LZ4_stream_t* LZ4_streamPtr, const char* source, char* dest, int inputSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_default() instead") LZ4LIB_API int LZ4_compress (const char* source, char* dest, int sourceSize);
LZ4_DEPRECATED("use LZ4_compress_default() instead") LZ4LIB_API int LZ4_compress_limitedOutput (const char* source, char* dest, int sourceSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_fast_extState() instead") LZ4LIB_API int LZ4_compress_withState (void* state, const char* source, char* dest, int inputSize);
LZ4_DEPRECATED("use LZ4_compress_fast_extState() instead") LZ4LIB_API int LZ4_compress_limitedOutput_withState (void* state, const char* source, char* dest, int inputSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_fast_continue() instead") LZ4LIB_API int LZ4_compress_continue (LZ4_stream_t* LZ4_streamPtr, const char* source, char* dest, int inputSize);
LZ4_DEPRECATED("use LZ4_compress_fast_continue() instead") LZ4LIB_API int LZ4_compress_limitedOutput_continue (LZ4_stream_t* LZ4_streamPtr, const char* source, char* dest, int inputSize, int maxOutputSize);
/* Obsolete decompression functions */
LZ4LIB_API LZ4_DEPRECATED("use LZ4_decompress_fast() instead") int LZ4_uncompress (const char* source, char* dest, int outputSize);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_decompress_safe() instead") int LZ4_uncompress_unknownOutputSize (const char* source, char* dest, int isize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_decompress_fast() instead") LZ4LIB_API int LZ4_uncompress (const char* source, char* dest, int outputSize);
LZ4_DEPRECATED("use LZ4_decompress_safe() instead") LZ4LIB_API int LZ4_uncompress_unknownOutputSize (const char* source, char* dest, int isize, int maxOutputSize);
/* Obsolete streaming functions; use new streaming interface whenever possible */
LZ4LIB_API LZ4_DEPRECATED("use LZ4_createStream() instead") void* LZ4_create (char* inputBuffer);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_createStream() instead") int LZ4_sizeofStreamState(void);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_resetStream() instead") int LZ4_resetStreamState(void* state, char* inputBuffer);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_saveDict() instead") char* LZ4_slideInputBuffer (void* state);
/* Obsolete streaming functions; degraded functionality; do not use!
*
* In order to perform streaming compression, these functions depended on data
* that is no longer tracked in the state. They have been preserved as well as
* possible: using them will still produce a correct output. However, they don't
* actually retain any history between compression calls. The compression ratio
* achieved will therefore be no better than compressing each chunk
* independently.
*/
LZ4_DEPRECATED("Use LZ4_createStream() instead") LZ4LIB_API void* LZ4_create (char* inputBuffer);
LZ4_DEPRECATED("Use LZ4_createStream() instead") LZ4LIB_API int LZ4_sizeofStreamState(void);
LZ4_DEPRECATED("Use LZ4_resetStream() instead") LZ4LIB_API int LZ4_resetStreamState(void* state, char* inputBuffer);
LZ4_DEPRECATED("Use LZ4_saveDict() instead") LZ4LIB_API char* LZ4_slideInputBuffer (void* state);
/* Obsolete streaming decoding functions */
LZ4LIB_API LZ4_DEPRECATED("use LZ4_decompress_safe_usingDict() instead") int LZ4_decompress_safe_withPrefix64k (const char* src, char* dst, int compressedSize, int maxDstSize);
LZ4LIB_API LZ4_DEPRECATED("use LZ4_decompress_fast_usingDict() instead") int LZ4_decompress_fast_withPrefix64k (const char* src, char* dst, int originalSize);
LZ4_DEPRECATED("use LZ4_decompress_safe_usingDict() instead") LZ4LIB_API int LZ4_decompress_safe_withPrefix64k (const char* src, char* dst, int compressedSize, int maxDstSize);
LZ4_DEPRECATED("use LZ4_decompress_fast_usingDict() instead") LZ4LIB_API int LZ4_decompress_fast_withPrefix64k (const char* src, char* dst, int originalSize);
/*! LZ4_decompress_fast() : **unsafe!**
* These functions used to be faster than LZ4_decompress_safe(),
* but it has changed, and they are now slower than LZ4_decompress_safe().
* This is because LZ4_decompress_fast() doesn't know the input size,
* and therefore must progress more cautiously in the input buffer to not read beyond the end of block.
* On top of that `LZ4_decompress_fast()` is not protected vs malformed or malicious inputs, making it a security liability.
* As a consequence, LZ4_decompress_fast() is strongly discouraged, and deprecated.
*
* The last remaining LZ4_decompress_fast() specificity is that
* it can decompress a block without knowing its compressed size.
* Such functionality could be achieved in a more secure manner,
* by also providing the maximum size of input buffer,
* but it would require new prototypes, and adaptation of the implementation to this new use case.
*
* Parameters:
* originalSize : is the uncompressed size to regenerate.
* `dst` must be already allocated, its size must be >= 'originalSize' bytes.
* @return : number of bytes read from source buffer (== compressed size).
* The function expects to finish at block's end exactly.
* If the source stream is detected malformed, the function stops decoding and returns a negative result.
* note : LZ4_decompress_fast*() requires originalSize. Thanks to this information, it never writes past the output buffer.
* However, since it doesn't know its 'src' size, it may read an unknown amount of input, past input buffer bounds.
* Also, since match offsets are not validated, match reads from 'src' may underflow too.
* These issues never happen if input (compressed) data is correct.
* But they may happen if input data is invalid (error or intentional tampering).
* As a consequence, use these functions in trusted environments with trusted data **only**.
*/
LZ4_DEPRECATED("This function is deprecated and unsafe. Consider using LZ4_decompress_safe() instead")
LZ4LIB_API int LZ4_decompress_fast (const char* src, char* dst, int originalSize);
LZ4_DEPRECATED("This function is deprecated and unsafe. Consider using LZ4_decompress_safe_continue() instead")
LZ4LIB_API int LZ4_decompress_fast_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* src, char* dst, int originalSize);
LZ4_DEPRECATED("This function is deprecated and unsafe. Consider using LZ4_decompress_safe_usingDict() instead")
LZ4LIB_API int LZ4_decompress_fast_usingDict (const char* src, char* dst, int originalSize, const char* dictStart, int dictSize);
/*! LZ4_resetStream() :
* An LZ4_stream_t structure must be initialized at least once.
* This is done with LZ4_initStream(), or LZ4_resetStream().
* Consider switching to LZ4_initStream(),
* invoking LZ4_resetStream() will trigger deprecation warnings in the future.
*/
LZ4LIB_API void LZ4_resetStream (LZ4_stream_t* streamPtr);
}

1485
common/tracy_lz4hc.cpp Normal file

File diff suppressed because it is too large

428
common/tracy_lz4hc.hpp Normal file

@@ -0,0 +1,428 @@
/*
LZ4 HC - High Compression Mode of LZ4
Header File
Copyright (C) 2011-2017, Yann Collet.
BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
You can contact the author at :
- LZ4 source repository : https://github.com/lz4/lz4
- LZ4 public forum : https://groups.google.com/forum/#!forum/lz4c
*/
#ifndef TRACY_LZ4_HC_H_19834876238432
#define TRACY_LZ4_HC_H_19834876238432
/* --- Dependency --- */
/* note : lz4hc requires lz4.h/lz4.c for compilation */
#include "tracy_lz4.hpp" /* stddef, LZ4LIB_API, LZ4_DEPRECATED */
namespace tracy
{
/* --- Useful constants --- */
#define LZ4HC_CLEVEL_MIN 3
#define LZ4HC_CLEVEL_DEFAULT 9
#define LZ4HC_CLEVEL_OPT_MIN 10
#define LZ4HC_CLEVEL_MAX 12
/*-************************************
* Block Compression
**************************************/
/*! LZ4_compress_HC() :
* Compress data from `src` into `dst`, using the powerful but slower "HC" algorithm.
* `dst` must be already allocated.
* Compression is guaranteed to succeed if `dstCapacity >= LZ4_compressBound(srcSize)` (see "lz4.h")
* Max supported `srcSize` value is LZ4_MAX_INPUT_SIZE (see "lz4.h")
* `compressionLevel` : any value between 1 and LZ4HC_CLEVEL_MAX will work.
* Values > LZ4HC_CLEVEL_MAX behave the same as LZ4HC_CLEVEL_MAX.
* @return : the number of bytes written into 'dst'
* or 0 if compression fails.
*/
LZ4LIB_API int LZ4_compress_HC (const char* src, char* dst, int srcSize, int dstCapacity, int compressionLevel);
/* Note :
* Decompression functions are provided within "lz4.h" (BSD license)
*/
/*! LZ4_compress_HC_extStateHC() :
* Same as LZ4_compress_HC(), but using an externally allocated memory segment for `state`.
* `state` size is provided by LZ4_sizeofStateHC().
* Memory segment must be aligned on 8-byte boundaries (which a normal malloc() should do properly).
*/
LZ4LIB_API int LZ4_sizeofStateHC(void);
LZ4LIB_API int LZ4_compress_HC_extStateHC(void* stateHC, const char* src, char* dst, int srcSize, int maxDstSize, int compressionLevel);
/*! LZ4_compress_HC_destSize() : v1.9.0+
* Will compress as much data as possible from `src`
* to fit into `targetDstSize` budget.
* Result is provided in 2 parts :
* @return : the number of bytes written into 'dst' (necessarily <= targetDstSize)
* or 0 if compression fails.
* `srcSizePtr` : on success, *srcSizePtr is updated to indicate how many bytes were read from `src`
*/
LZ4LIB_API int LZ4_compress_HC_destSize(void* stateHC,
const char* src, char* dst,
int* srcSizePtr, int targetDstSize,
int compressionLevel);
/*-************************************
* Streaming Compression
* Bufferless synchronous API
**************************************/
typedef union LZ4_streamHC_u LZ4_streamHC_t; /* incomplete type (defined later) */
/*! LZ4_createStreamHC() and LZ4_freeStreamHC() :
* These functions create and release memory for LZ4 HC streaming state.
* Newly created states are automatically initialized.
* The same state can be used multiple times consecutively,
* starting with LZ4_resetStreamHC_fast() to start a new stream of blocks.
*/
LZ4LIB_API LZ4_streamHC_t* LZ4_createStreamHC(void);
LZ4LIB_API int LZ4_freeStreamHC (LZ4_streamHC_t* streamHCPtr);
/*
These functions compress data in successive blocks of any size,
using previous blocks as dictionary, to improve compression ratio.
One key assumption is that previous blocks (up to 64 KB) remain read-accessible while compressing next blocks.
There is an exception for ring buffers, which can be smaller than 64 KB.
Ring-buffer scenario is automatically detected and handled within LZ4_compress_HC_continue().
Before starting compression, state must be allocated and properly initialized.
LZ4_createStreamHC() does both, though compression level is set to LZ4HC_CLEVEL_DEFAULT.
Selecting the compression level can be done with LZ4_resetStreamHC_fast() (starts a new stream)
or LZ4_setCompressionLevel() (anytime, between blocks in the same stream) (experimental).
LZ4_resetStreamHC_fast() only works on states which have been properly initialized at least once,
which is automatically the case when state is created using LZ4_createStreamHC().
After reset, a first "fictional block" can be designated as initial dictionary,
using LZ4_loadDictHC() (Optional).
Invoke LZ4_compress_HC_continue() to compress each successive block.
The number of blocks is unlimited.
Previous input blocks, including initial dictionary when present,
must remain accessible and unmodified during compression.
It's allowed to update compression level anytime between blocks,
using LZ4_setCompressionLevel() (experimental).
'dst' buffer should be sized to handle worst case scenarios
(see LZ4_compressBound(), it ensures compression success).
In case of failure, the API does not guarantee recovery,
so the state _must_ be reset.
To ensure compression success
whenever `dst` buffer size cannot be made >= LZ4_compressBound(),
consider using LZ4_compress_HC_continue_destSize().
Whenever previous input blocks can't be preserved unmodified in-place during compression of next blocks,
it's possible to copy the last blocks into a more stable memory space, using LZ4_saveDictHC().
Return value of LZ4_saveDictHC() is the size of dictionary effectively saved into 'safeBuffer' (<= 64 KB)
After completing a streaming compression,
it's possible to start a new stream of blocks, using the same LZ4_streamHC_t state,
just by resetting it, using LZ4_resetStreamHC_fast().
*/
LZ4LIB_API void LZ4_resetStreamHC_fast(LZ4_streamHC_t* streamHCPtr, int compressionLevel); /* v1.9.0+ */
LZ4LIB_API int LZ4_loadDictHC (LZ4_streamHC_t* streamHCPtr, const char* dictionary, int dictSize);
LZ4LIB_API int LZ4_compress_HC_continue (LZ4_streamHC_t* streamHCPtr,
const char* src, char* dst,
int srcSize, int maxDstSize);
/*! LZ4_compress_HC_continue_destSize() : v1.9.0+
* Similar to LZ4_compress_HC_continue(),
* but will read as much data as possible from `src`
* to fit into `targetDstSize` budget.
* Result is provided in 2 parts :
* @return : the number of bytes written into 'dst' (necessarily <= targetDstSize)
* or 0 if compression fails.
* `srcSizePtr` : on success, *srcSizePtr will be updated to indicate how many bytes were read from `src`.
* Note that this function may not consume the entire input.
*/
LZ4LIB_API int LZ4_compress_HC_continue_destSize(LZ4_streamHC_t* LZ4_streamHCPtr,
const char* src, char* dst,
int* srcSizePtr, int targetDstSize);
LZ4LIB_API int LZ4_saveDictHC (LZ4_streamHC_t* streamHCPtr, char* safeBuffer, int maxDictSize);
/*^**********************************************
* !!!!!! STATIC LINKING ONLY !!!!!!
***********************************************/
/*-******************************************************************
* PRIVATE DEFINITIONS :
* Do not use these definitions directly.
* They are merely exposed to allow static allocation of `LZ4_streamHC_t`.
* Declare an `LZ4_streamHC_t` directly, rather than any type below.
* Even then, only do so in the context of static linking, as definitions may change between versions.
********************************************************************/
#define LZ4HC_DICTIONARY_LOGSIZE 16
#define LZ4HC_MAXD (1<<LZ4HC_DICTIONARY_LOGSIZE)
#define LZ4HC_MAXD_MASK (LZ4HC_MAXD - 1)
#define LZ4HC_HASH_LOG 15
#define LZ4HC_HASHTABLESIZE (1 << LZ4HC_HASH_LOG)
#define LZ4HC_HASH_MASK (LZ4HC_HASHTABLESIZE - 1)
#if defined(__cplusplus) || (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */)
#include <stdint.h>
typedef struct LZ4HC_CCtx_internal LZ4HC_CCtx_internal;
struct LZ4HC_CCtx_internal
{
uint32_t hashTable[LZ4HC_HASHTABLESIZE];
uint16_t chainTable[LZ4HC_MAXD];
const uint8_t* end; /* next block here to continue on current prefix */
const uint8_t* base; /* All index relative to this position */
const uint8_t* dictBase; /* alternate base for extDict */
uint32_t dictLimit; /* below that point, need extDict */
uint32_t lowLimit; /* below that point, no more dict */
uint32_t nextToUpdate; /* index from which to continue dictionary update */
short compressionLevel;
int8_t favorDecSpeed; /* favor decompression speed if this flag set,
otherwise, favor compression ratio */
int8_t dirty; /* stream has to be fully reset if this flag is set */
const LZ4HC_CCtx_internal* dictCtx;
};
#else
typedef struct LZ4HC_CCtx_internal LZ4HC_CCtx_internal;
struct LZ4HC_CCtx_internal
{
unsigned int hashTable[LZ4HC_HASHTABLESIZE];
unsigned short chainTable[LZ4HC_MAXD];
const unsigned char* end; /* next block here to continue on current prefix */
const unsigned char* base; /* All index relative to this position */
const unsigned char* dictBase; /* alternate base for extDict */
unsigned int dictLimit; /* below that point, need extDict */
unsigned int lowLimit; /* below that point, no more dict */
unsigned int nextToUpdate; /* index from which to continue dictionary update */
short compressionLevel;
char favorDecSpeed; /* favor decompression speed if this flag set,
otherwise, favor compression ratio */
char dirty; /* stream has to be fully reset if this flag is set */
const LZ4HC_CCtx_internal* dictCtx;
};
#endif
/* Do not use these definitions directly !
* Declare or allocate an LZ4_streamHC_t instead.
*/
#define LZ4_STREAMHCSIZE (4*LZ4HC_HASHTABLESIZE + 2*LZ4HC_MAXD + 56 + ((sizeof(void*)==16) ? 56 : 0) /* AS400*/ ) /* 262200 or 262256*/
#define LZ4_STREAMHCSIZE_SIZET (LZ4_STREAMHCSIZE / sizeof(size_t))
union LZ4_streamHC_u {
size_t table[LZ4_STREAMHCSIZE_SIZET];
LZ4HC_CCtx_internal internal_donotuse;
}; /* previously typedef'd to LZ4_streamHC_t */
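Note on LZ4_STREAMHCSIZE: the trailing comment on the macro gives 262200 or 262256 bytes. The sketch below reproduces that arithmetic from the constants defined in this header (hash log 15, dictionary log size 16). The names kHashTableSize, kMaxD and streamHcSize are hypothetical and used only for this check.

```cpp
#include <cassert>
#include <cstddef>

// Values taken from the header: LZ4HC_HASH_LOG = 15, LZ4HC_DICTIONARY_LOGSIZE = 16.
constexpr std::size_t kHashTableSize = std::size_t(1) << 15; // 32768 entries
constexpr std::size_t kMaxD          = std::size_t(1) << 16; // 65536 entries

// Hypothetical helper mirroring LZ4_STREAMHCSIZE; pointerSize stands in for sizeof(void*).
constexpr std::size_t streamHcSize(std::size_t pointerSize)
{
    // 4 bytes per hashTable entry, 2 bytes per chainTable entry,
    // 56 bytes for the remaining fields, doubled when pointers are 16 bytes wide.
    return 4 * kHashTableSize + 2 * kMaxD + 56 + (pointerSize == 16 ? 56 : 0);
}

static_assert(streamHcSize(8) == 262200, "131072 + 131072 + 56");
static_assert(streamHcSize(16) == 262256, "56 more bytes with 16-byte pointers");
```

Both tables contribute 131072 bytes each, so nearly all of the state is table storage. The two totals agree with the figures in the macro's comment.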
/* LZ4_streamHC_t :
* This structure allows static allocation of LZ4 HC streaming state.
* This can be used to allocate statically, on stack, or as part of a larger structure.
*
* Such state **must** be initialized using LZ4_initStreamHC() before first use.
*
* Note that invoking LZ4_initStreamHC() is not required when
* the state was created using LZ4_createStreamHC() (which is recommended).
* Using the normal builder, a newly created state is automatically initialized.
*
* Static allocation shall only be used in combination with static linking.
*/
/* LZ4_initStreamHC() : v1.9.0+
* Required before first use of a statically allocated LZ4_streamHC_t.
* Before v1.9.0 : use LZ4_resetStreamHC() instead
*/
LZ4LIB_API LZ4_streamHC_t* LZ4_initStreamHC (void* buffer, size_t size);
/*-************************************
* Deprecated Functions
**************************************/
/* see lz4.h LZ4_DISABLE_DEPRECATE_WARNINGS to turn off deprecation warnings */
/* deprecated compression functions */
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC (const char* source, char* dest, int inputSize);
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC_limitedOutput (const char* source, char* dest, int inputSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC2 (const char* source, char* dest, int inputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC2_limitedOutput(const char* source, char* dest, int inputSize, int maxOutputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_extStateHC() instead") LZ4LIB_API int LZ4_compressHC_withStateHC (void* state, const char* source, char* dest, int inputSize);
LZ4_DEPRECATED("use LZ4_compress_HC_extStateHC() instead") LZ4LIB_API int LZ4_compressHC_limitedOutput_withStateHC (void* state, const char* source, char* dest, int inputSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_HC_extStateHC() instead") LZ4LIB_API int LZ4_compressHC2_withStateHC (void* state, const char* source, char* dest, int inputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_extStateHC() instead") LZ4LIB_API int LZ4_compressHC2_limitedOutput_withStateHC(void* state, const char* source, char* dest, int inputSize, int maxOutputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_compressHC_continue (LZ4_streamHC_t* LZ4_streamHCPtr, const char* source, char* dest, int inputSize);
LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_compressHC_limitedOutput_continue (LZ4_streamHC_t* LZ4_streamHCPtr, const char* source, char* dest, int inputSize, int maxOutputSize);
/* Obsolete streaming functions; degraded functionality; do not use!
*
* In order to perform streaming compression, these functions depended on data
* that is no longer tracked in the state. They have been preserved as well as
* possible: using them will still produce a correct output. However, use of
* LZ4_slideInputBufferHC() will truncate the history of the stream, rather
* than preserve a window-sized chunk of history.
*/
LZ4_DEPRECATED("use LZ4_createStreamHC() instead") LZ4LIB_API void* LZ4_createHC (const char* inputBuffer);
LZ4_DEPRECATED("use LZ4_saveDictHC() instead") LZ4LIB_API char* LZ4_slideInputBufferHC (void* LZ4HC_Data);
LZ4_DEPRECATED("use LZ4_freeStreamHC() instead") LZ4LIB_API int LZ4_freeHC (void* LZ4HC_Data);
LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_compressHC2_continue (void* LZ4HC_Data, const char* source, char* dest, int inputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_compressHC2_limitedOutput_continue (void* LZ4HC_Data, const char* source, char* dest, int inputSize, int maxOutputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_createStreamHC() instead") LZ4LIB_API int LZ4_sizeofStreamStateHC(void);
LZ4_DEPRECATED("use LZ4_initStreamHC() instead") LZ4LIB_API int LZ4_resetStreamStateHC(void* state, char* inputBuffer);
/* LZ4_resetStreamHC() is now replaced by LZ4_initStreamHC().
* The intention is to emphasize the difference with LZ4_resetStreamHC_fast(),
* which is now the recommended function to start a new stream of blocks,
* but cannot be used to initialize a memory segment containing arbitrary garbage data.
*
* It is recommended to switch to LZ4_initStreamHC().
* LZ4_resetStreamHC() will generate deprecation warnings in a future version.
*/
LZ4LIB_API void LZ4_resetStreamHC (LZ4_streamHC_t* streamHCPtr, int compressionLevel);
}
#endif /* TRACY_LZ4_HC_H_19834876238432 */
/*-**************************************************
* !!!!! STATIC LINKING ONLY !!!!!
* Following definitions are considered experimental.
* They should not be linked from DLL,
* as there is no guarantee of API stability yet.
* Prototypes will be promoted to "stable" status
* after successful usage in real-life scenarios.
***************************************************/
#ifdef LZ4_HC_STATIC_LINKING_ONLY /* protection macro */
#ifndef LZ4_HC_SLO_098092834
#define LZ4_HC_SLO_098092834
namespace tracy
{
/*! LZ4_setCompressionLevel() : v1.8.0+ (experimental)
* It's possible to change compression level
* between successive invocations of LZ4_compress_HC_continue*()
* for dynamic adaptation.
*/
LZ4LIB_STATIC_API void LZ4_setCompressionLevel(
LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel);
/*! LZ4_favorDecompressionSpeed() : v1.8.2+ (experimental)
* Opt. Parser will favor decompression speed over compression ratio.
* Only applicable to levels >= LZ4HC_CLEVEL_OPT_MIN.
*/
LZ4LIB_STATIC_API void LZ4_favorDecompressionSpeed(
LZ4_streamHC_t* LZ4_streamHCPtr, int favor);
/*! LZ4_resetStreamHC_fast() : v1.9.0+
* When an LZ4_streamHC_t is known to be in an internally coherent state,
* it can often be prepared for a new compression with almost no work, only
* sometimes falling back to the full, expensive reset that is always required
* when the stream is in an indeterminate state (i.e., the reset performed by
* LZ4_resetStreamHC()).
*
* LZ4_streamHCs are guaranteed to be in a valid state when:
* - returned from LZ4_createStreamHC()
* - reset by LZ4_resetStreamHC()
* - memset(stream, 0, sizeof(LZ4_streamHC_t))
* - the stream was in a valid state and was reset by LZ4_resetStreamHC_fast()
* - the stream was in a valid state and was then used in any compression call
* that returned success
* - the stream was in an indeterminate state and was used in a compression
* call that fully reset the state (LZ4_compress_HC_extStateHC()) and that
* returned success
*
* Note:
* A stream that was last used in a compression call that returned an error
* may be passed to this function. However, it will be fully reset, which will
* clear any existing history and settings from the context.
*/
LZ4LIB_STATIC_API void LZ4_resetStreamHC_fast(
LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel);
/*! LZ4_compress_HC_extStateHC_fastReset() :
* A variant of LZ4_compress_HC_extStateHC().
*
* Using this variant avoids an expensive initialization step. It is only safe
* to call if the state buffer is known to be correctly initialized already
* (see above comment on LZ4_resetStreamHC_fast() for a definition of
* "correctly initialized"). From a high level, the difference is that this
* function initializes the provided state with a call to
* LZ4_resetStreamHC_fast() while LZ4_compress_HC_extStateHC() starts with a
* call to LZ4_resetStreamHC().
*/
LZ4LIB_STATIC_API int LZ4_compress_HC_extStateHC_fastReset (
void* state,
const char* src, char* dst,
int srcSize, int dstCapacity,
int compressionLevel);
/*! LZ4_attach_HC_dictionary() :
* This is an experimental API that allows for the efficient use of a
* static dictionary many times.
*
* Rather than re-loading the dictionary buffer into a working context before
* each compression, or copying a pre-loaded dictionary's LZ4_streamHC_t into a
* working LZ4_streamHC_t, this function introduces a no-copy setup mechanism,
* in which the working stream references the dictionary stream in-place.
*
* Several assumptions are made about the state of the dictionary stream.
* Currently, only streams which have been prepared by LZ4_loadDictHC() should
* be expected to work.
*
* Alternatively, the provided dictionary stream pointer may be NULL, in which
* case any existing dictionary stream is unset.
*
* A dictionary should only be attached to a stream without any history (i.e.,
* a stream that has just been reset).
*
* The dictionary will remain attached to the working stream only for the
* current stream session. Calls to LZ4_resetStreamHC(_fast) will remove the
* dictionary context association from the working stream. The dictionary
* stream (and source buffer) must remain in-place / accessible / unchanged
* through the lifetime of the stream session.
*/
LZ4LIB_STATIC_API void LZ4_attach_HC_dictionary(
LZ4_streamHC_t *working_stream,
const LZ4_streamHC_t *dictionary_stream);
}
#endif /* LZ4_HC_SLO_098092834 */
#endif /* LZ4_HC_STATIC_LINKING_ONLY */


@@ -22,6 +22,12 @@
#include <atomic>
#include <cassert>
#if defined(__MACH__)
#include <mach/mach.h>
#elif defined(__unix__)
#include <semaphore.h>
#endif
namespace tracy
{
@@ -29,10 +35,22 @@ namespace tracy
//---------------------------------------------------------
// Semaphore (Windows)
//---------------------------------------------------------
#ifndef MAXLONG
enum { MAXLONG = 0x7fffffff };
#endif
#include <windows.h>
#undef min
#undef max
#ifndef INFINITE
enum { INFINITE = 0xFFFFFFFF };
#endif
#ifndef _WINDOWS_
typedef void* HANDLE;
extern "C" __declspec(dllimport) HANDLE __stdcall CreateSemaphoreA( void*, long, long, const char* );
extern "C" __declspec(dllimport) int __stdcall CloseHandle( HANDLE );
extern "C" __declspec(dllimport) unsigned long __stdcall WaitForSingleObject( HANDLE, unsigned long );
extern "C" __declspec(dllimport) int __stdcall ReleaseSemaphore( HANDLE, long, long* );
#endif
class Semaphore
{
@@ -46,7 +64,7 @@ public:
Semaphore(int initialCount = 0)
{
assert(initialCount >= 0);
m_hSema = CreateSemaphore(NULL, initialCount, MAXLONG, NULL);
m_hSema = CreateSemaphoreA(NULL, initialCount, MAXLONG, NULL);
}
~Semaphore()
@@ -72,8 +90,6 @@ public:
// Can't use POSIX semaphores due to http://lists.apple.com/archives/darwin-kernel/2009/Apr/msg00010.html
//---------------------------------------------------------
#include <mach/mach.h>
class Semaphore
{
private:
@@ -119,8 +135,6 @@ public:
// Semaphore (POSIX, Linux)
//---------------------------------------------------------
#include <semaphore.h>
class Semaphore
{
private:
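The hunks above show only fragments of the three platform-specific Semaphore classes (Windows, Mach, POSIX). As a rough point of comparison, the following sketch implements the same counting-semaphore idea portably with std::mutex and std::condition_variable instead of the native primitives used in the diff. The class name PortableSemaphore and the method names wait, try_wait and signal are assumptions made for illustration; they are not taken from the source.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical portable stand-in. NOT the class from the diff, which wraps
// native OS semaphores (CreateSemaphoreA, Mach semaphores, POSIX sem_t).
class PortableSemaphore
{
public:
    explicit PortableSemaphore(int initialCount = 0) : m_count(initialCount)
    {
        assert(initialCount >= 0);
    }

    // Blocks until at least one unit is available, then consumes it.
    void wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_count > 0; });
        --m_count;
    }

    // Consumes one unit if available; never blocks.
    bool try_wait()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_count == 0) return false;
        --m_count;
        return true;
    }

    // Releases `count` units and wakes any waiting threads.
    void signal(int count = 1)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_count += count;
        }
        m_cv.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    int m_count;
};
```

A mutex plus condition variable is usually slower than a native semaphore, which is presumably why the code in the diff goes to the OS directly; the sketch is only meant to make the counting behaviour explicit.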

Binary file not shown (before: 2.4 KiB).

BIN
doc/issues/dxt1+alpha.png Normal file

Binary file not shown (after: 17 KiB).

Binary file not shown (before: 8.3 KiB).

Binary file not shown (before: 4.6 KiB).

Binary file not shown (before: 9.3 KiB).

Binary file not shown (before: 28 KiB, after: 284 KiB).

1
examples/ToyPathTracer/.gitignore vendored Normal file

@@ -0,0 +1 @@
Windows/Compiled*Shader.h


@@ -0,0 +1,4 @@
https://github.com/aras-p/ToyPathTracer
Modified to render only 10 frames. Client part requires 12 GB, server part
requires 7 GB.


@@ -0,0 +1,33 @@
#if defined(__APPLE__) && !defined(__METAL_VERSION__)
#include <TargetConditionals.h>
#endif
#define kBackbufferWidth 1280
#define kBackbufferHeight 720
#if defined(__EMSCRIPTEN__)
#define CPU_CAN_DO_SIMD 0
#define CPU_CAN_DO_THREADS 0
#else
#define CPU_CAN_DO_SIMD 1
#define CPU_CAN_DO_THREADS 1
#endif
#define DO_SAMPLES_PER_PIXEL 4
#define DO_ANIMATE_SMOOTHING 0.9f
#define DO_LIGHT_SAMPLING 1
#define DO_MITSUBA_COMPARE 0
// Should path tracing be done on the GPU with a compute shader?
#define DO_COMPUTE_GPU 0
#define kCSGroupSizeX 8
#define kCSGroupSizeY 8
#define kCSMaxObjects 64
// Should float3 struct use SSE/NEON?
#define DO_FLOAT3_WITH_SIMD (!(DO_COMPUTE_GPU) && CPU_CAN_DO_SIMD && 1)
// Should HitSpheres function use SSE/NEON?
#define DO_HIT_SPHERES_SIMD (CPU_CAN_DO_SIMD && 1)


@@ -0,0 +1,192 @@
#pragma once
#if defined(_MSC_VER)
#define VM_INLINE __forceinline
#else
#define VM_INLINE __attribute__((unused, always_inline, nodebug)) inline
#endif
#define kSimdWidth 4
#if !defined(__arm__) && !defined(__arm64__) && !defined(__EMSCRIPTEN__)
// ---- SSE implementation
#include <xmmintrin.h>
#include <emmintrin.h>
#include <smmintrin.h>
#define SHUFFLE4(V, X,Y,Z,W) float4(_mm_shuffle_ps((V).m, (V).m, _MM_SHUFFLE(W,Z,Y,X)))
struct float4
{
VM_INLINE float4() {}
VM_INLINE explicit float4(const float *p) { m = _mm_loadu_ps(p); }
VM_INLINE explicit float4(float x, float y, float z, float w) { m = _mm_set_ps(w, z, y, x); }
VM_INLINE explicit float4(float v) { m = _mm_set_ps1(v); }
VM_INLINE explicit float4(__m128 v) { m = v; }
VM_INLINE float getX() const { return _mm_cvtss_f32(m); }
VM_INLINE float getY() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 1, 1, 1))); }
VM_INLINE float getZ() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 2, 2, 2))); }
VM_INLINE float getW() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(3, 3, 3, 3))); }
__m128 m;
};
typedef float4 bool4;
VM_INLINE float4 operator+ (float4 a, float4 b) { a.m = _mm_add_ps(a.m, b.m); return a; }
VM_INLINE float4 operator- (float4 a, float4 b) { a.m = _mm_sub_ps(a.m, b.m); return a; }
VM_INLINE float4 operator* (float4 a, float4 b) { a.m = _mm_mul_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator==(float4 a, float4 b) { a.m = _mm_cmpeq_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator!=(float4 a, float4 b) { a.m = _mm_cmpneq_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator< (float4 a, float4 b) { a.m = _mm_cmplt_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator> (float4 a, float4 b) { a.m = _mm_cmpgt_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator<=(float4 a, float4 b) { a.m = _mm_cmple_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator>=(float4 a, float4 b) { a.m = _mm_cmpge_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator&(bool4 a, bool4 b) { a.m = _mm_and_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator|(bool4 a, bool4 b) { a.m = _mm_or_ps(a.m, b.m); return a; }
VM_INLINE float4 operator- (float4 a) { a.m = _mm_xor_ps(a.m, _mm_set1_ps(-0.0f)); return a; }
VM_INLINE float4 min(float4 a, float4 b) { a.m = _mm_min_ps(a.m, b.m); return a; }
VM_INLINE float4 max(float4 a, float4 b) { a.m = _mm_max_ps(a.m, b.m); return a; }
VM_INLINE float hmin(float4 v)
{
v = min(v, SHUFFLE4(v, 2, 3, 0, 0));
v = min(v, SHUFFLE4(v, 1, 0, 0, 0));
return v.getX();
}
// Returns a 4-bit code where bit0..bit3 is X..W
VM_INLINE unsigned mask(float4 v) { return _mm_movemask_ps(v.m); }
// Once we have a comparison, we can branch based on its results:
VM_INLINE bool any(bool4 v) { return mask(v) != 0; }
VM_INLINE bool all(bool4 v) { return mask(v) == 15; }
// "select", i.e. hibit(cond) ? b : a
// On SSE4.1 and up this can be done easily via the "blend" instruction;
// older SSE versions have to jump through a bunch of hoops, see
// https://fgiesen.wordpress.com/2016/04/03/sse-mind-the-gap/
VM_INLINE float4 select(float4 a, float4 b, bool4 cond)
{
#if defined(__SSE4_1__) || defined(_MSC_VER) // on windows assume we always have SSE4.1
a.m = _mm_blendv_ps(a.m, b.m, cond.m);
#else
__m128 d = _mm_castsi128_ps(_mm_srai_epi32(_mm_castps_si128(cond.m), 31));
a.m = _mm_or_ps(_mm_and_ps(d, b.m), _mm_andnot_ps(d, a.m));
#endif
return a;
}
VM_INLINE __m128i select(__m128i a, __m128i b, bool4 cond)
{
#if defined(__SSE4_1__) || defined(_MSC_VER) // on windows assume we always have SSE4.1
return _mm_blendv_epi8(a, b, _mm_castps_si128(cond.m));
#else
__m128i d = _mm_srai_epi32(_mm_castps_si128(cond.m), 31);
return _mm_or_si128(_mm_and_si128(d, b), _mm_andnot_si128(d, a));
#endif
}
VM_INLINE float4 sqrtf(float4 v) { return float4(_mm_sqrt_ps(v.m)); }
#elif !defined(__EMSCRIPTEN__)
// ---- NEON implementation
#define USE_NEON 1
#include <arm_neon.h>
struct float4
{
VM_INLINE float4() {}
VM_INLINE explicit float4(const float *p) { m = vld1q_f32(p); }
VM_INLINE explicit float4(float x, float y, float z, float w) { float v[4] = {x, y, z, w}; m = vld1q_f32(v); }
VM_INLINE explicit float4(float v) { m = vdupq_n_f32(v); }
VM_INLINE explicit float4(float32x4_t v) { m = v; }
VM_INLINE float getX() const { return vgetq_lane_f32(m, 0); }
VM_INLINE float getY() const { return vgetq_lane_f32(m, 1); }
VM_INLINE float getZ() const { return vgetq_lane_f32(m, 2); }
VM_INLINE float getW() const { return vgetq_lane_f32(m, 3); }
float32x4_t m;
};
typedef float4 bool4;
VM_INLINE float4 operator+ (float4 a, float4 b) { a.m = vaddq_f32(a.m, b.m); return a; }
VM_INLINE float4 operator- (float4 a, float4 b) { a.m = vsubq_f32(a.m, b.m); return a; }
VM_INLINE float4 operator* (float4 a, float4 b) { a.m = vmulq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator==(float4 a, float4 b) { a.m = vceqq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator!=(float4 a, float4 b) { a.m = vmvnq_u32(vceqq_f32(a.m, b.m)); return a; }
VM_INLINE bool4 operator< (float4 a, float4 b) { a.m = vcltq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator> (float4 a, float4 b) { a.m = vcgtq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator<=(float4 a, float4 b) { a.m = vcleq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator>=(float4 a, float4 b) { a.m = vcgeq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator&(bool4 a, bool4 b) { a.m = vandq_u32(a.m, b.m); return a; }
VM_INLINE bool4 operator|(bool4 a, bool4 b) { a.m = vorrq_u32(a.m, b.m); return a; }
VM_INLINE float4 operator- (float4 a) { a.m = vnegq_f32(a.m); return a; }
VM_INLINE float4 min(float4 a, float4 b) { a.m = vminq_f32(a.m, b.m); return a; }
VM_INLINE float4 max(float4 a, float4 b) { a.m = vmaxq_f32(a.m, b.m); return a; }
VM_INLINE float hmin(float4 v)
{
float32x2_t minOfHalfs = vpmin_f32(vget_low_f32(v.m), vget_high_f32(v.m));
float32x2_t minOfMinOfHalfs = vpmin_f32(minOfHalfs, minOfHalfs);
return vget_lane_f32(minOfMinOfHalfs, 0);
}
// Returns a 4-bit code where bit0..bit3 is X..W
VM_INLINE unsigned mask(float4 v)
{
static const uint32x4_t movemask = { 1, 2, 4, 8 };
static const uint32x4_t highbit = { 0x80000000, 0x80000000, 0x80000000, 0x80000000 };
uint32x4_t t0 = vreinterpretq_u32_f32(v.m);
uint32x4_t t1 = vtstq_u32(t0, highbit);
uint32x4_t t2 = vandq_u32(t1, movemask);
uint32x2_t t3 = vorr_u32(vget_low_u32(t2), vget_high_u32(t2));
return vget_lane_u32(t3, 0) | vget_lane_u32(t3, 1);
}
// Once we have a comparison, we can branch based on its results:
VM_INLINE bool any(bool4 v) { return mask(v) != 0; }
VM_INLINE bool all(bool4 v) { return mask(v) == 15; }
// "select", i.e. hibit(cond) ? b : a
// NEON's vbsl selects bit by bit, so cond is expected to be a full per-lane
// all-ones/all-zeros mask, as produced by the comparison operators above.
VM_INLINE float4 select(float4 a, float4 b, bool4 cond)
{
a.m = vbslq_f32(cond.m, b.m, a.m);
return a;
}
VM_INLINE int32x4_t select(int32x4_t a, int32x4_t b, bool4 cond)
{
return vbslq_s32(vreinterpretq_u32_f32(cond.m), b, a);
}
VM_INLINE float4 sqrtf(float4 v)
{
float32x4_t V = v.m;
float32x4_t S0 = vrsqrteq_f32(V);
float32x4_t P0 = vmulq_f32( V, S0 );
float32x4_t R0 = vrsqrtsq_f32( P0, S0 );
float32x4_t S1 = vmulq_f32( S0, R0 );
float32x4_t P1 = vmulq_f32( V, S1 );
float32x4_t R1 = vrsqrtsq_f32( P1, S1 );
float32x4_t S2 = vmulq_f32( S1, R1 );
float32x4_t P2 = vmulq_f32( V, S2 );
float32x4_t R2 = vrsqrtsq_f32( P2, S2 );
float32x4_t S3 = vmulq_f32( S2, R2 );
return float4(vmulq_f32(V, S3));
}
VM_INLINE float4 splatX(float32x4_t v) { return float4(vdupq_lane_f32(vget_low_f32(v), 0)); }
VM_INLINE float4 splatY(float32x4_t v) { return float4(vdupq_lane_f32(vget_low_f32(v), 1)); }
VM_INLINE float4 splatZ(float32x4_t v) { return float4(vdupq_lane_f32(vget_high_f32(v), 0)); }
VM_INLINE float4 splatW(float32x4_t v) { return float4(vdupq_lane_f32(vget_high_f32(v), 1)); }
#endif
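The SSE and NEON halves of the header above implement the same lane-wise contract for comparisons, mask, any, all and select. The sketch below is a plain scalar model of that contract, intended only as a reference for what each lane is expected to produce. The namespace sketch, the types Lanes and LaneMask, and the helper greater are hypothetical names introduced here; the real header represents masks as float4 values rather than integer arrays.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

namespace sketch
{
// Hypothetical scalar stand-ins for the SIMD types in the header above.
using Lanes = std::array<float, 4>;
using LaneMask = std::array<uint32_t, 4>;   // all-ones or all-zeros per lane

// Per-lane a > b, yielding all-ones for true (the behaviour of
// _mm_cmpgt_ps on SSE and vcgtq_f32 on NEON).
inline LaneMask greater(const Lanes& a, const Lanes& b)
{
    LaneMask m{};
    for (int i = 0; i < 4; ++i) m[i] = (a[i] > b[i]) ? 0xFFFFFFFFu : 0u;
    return m;
}

// Packs the top bit of each lane into bits 0..3 (X..W).
inline unsigned mask(const LaneMask& m)
{
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i) bits |= (m[i] >> 31) << i;
    return bits;
}

inline bool any(const LaneMask& m) { return mask(m) != 0; }
inline bool all(const LaneMask& m) { return mask(m) == 15; }

// Per lane: high bit of cond set ? b : a.
inline Lanes select(const Lanes& a, const Lanes& b, const LaneMask& cond)
{
    Lanes r{};
    for (int i = 0; i < 4; ++i) r[i] = (cond[i] >> 31) ? b[i] : a[i];
    return r;
}
} // namespace sketch
```

For example, comparing {1, 5, 3, 0} > {2, 4, 3, -1} sets lanes Y and W, so mask returns binary 1010, and select then takes the second operand in exactly those two lanes.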


@@ -0,0 +1,203 @@
#include "Maths.h"
#include <stdlib.h>
#include <stdint.h>
static uint32_t XorShift32(uint32_t& state)
{
uint32_t x = state;
x ^= x << 13;
x ^= x >> 17;
x ^= x << 15;
state = x;
return x;
}
float RandomFloat01(uint32_t& state)
{
return (XorShift32(state) & 0xFFFFFF) / 16777216.0f;
}
float3 RandomInUnitDisk(uint32_t& state)
{
float3 p;
do
{
p = 2.0 * float3(RandomFloat01(state),RandomFloat01(state),0) - float3(1,1,0);
} while (dot(p,p) >= 1.0);
return p;
}
float3 RandomInUnitSphere(uint32_t& state)
{
float3 p;
do {
p = 2.0*float3(RandomFloat01(state),RandomFloat01(state),RandomFloat01(state)) - float3(1,1,1);
} while (sqLength(p) >= 1.0);
return p;
}
float3 RandomUnitVector(uint32_t& state)
{
float z = RandomFloat01(state) * 2.0f - 1.0f;
float a = RandomFloat01(state) * 2.0f * kPI;
float r = sqrtf(1.0f - z * z);
float x = r * cosf(a);
float y = r * sinf(a);
return float3(x, y, z);
}
int HitSpheres(const Ray& r, const SpheresSoA& spheres, float tMin, float tMax, Hit& outHit)
{
#if DO_HIT_SPHERES_SIMD
float4 hitT = float4(tMax);
#if USE_NEON
int32x4_t id = vdupq_n_s32(-1);
#else
__m128i id = _mm_set1_epi32(-1);
#endif
#if DO_FLOAT3_WITH_SIMD && !USE_NEON
float4 rOrigX = SHUFFLE4(r.orig, 0, 0, 0, 0);
float4 rOrigY = SHUFFLE4(r.orig, 1, 1, 1, 1);
float4 rOrigZ = SHUFFLE4(r.orig, 2, 2, 2, 2);
float4 rDirX = SHUFFLE4(r.dir, 0, 0, 0, 0);
float4 rDirY = SHUFFLE4(r.dir, 1, 1, 1, 1);
float4 rDirZ = SHUFFLE4(r.dir, 2, 2, 2, 2);
#elif DO_FLOAT3_WITH_SIMD
float4 rOrigX = splatX(r.orig.m);
float4 rOrigY = splatY(r.orig.m);
float4 rOrigZ = splatZ(r.orig.m);
float4 rDirX = splatX(r.dir.m);
float4 rDirY = splatY(r.dir.m);
float4 rDirZ = splatZ(r.dir.m);
#else
float4 rOrigX = float4(r.orig.x);
float4 rOrigY = float4(r.orig.y);
float4 rOrigZ = float4(r.orig.z);
float4 rDirX = float4(r.dir.x);
float4 rDirY = float4(r.dir.y);
float4 rDirZ = float4(r.dir.z);
#endif
float4 tMin4 = float4(tMin);
#if USE_NEON
int32x4_t curId = vcombine_u32(vcreate_u32(0ULL | (1ULL<<32)), vcreate_u32(2ULL | (3ULL<<32)));
#else
__m128i curId = _mm_set_epi32(3, 2, 1, 0);
#endif
// process 4 spheres at once
for (int i = 0; i < spheres.simdCount; i += kSimdWidth)
{
// load data for 4 spheres
float4 sCenterX = float4(spheres.centerX + i);
float4 sCenterY = float4(spheres.centerY + i);
float4 sCenterZ = float4(spheres.centerZ + i);
float4 sSqRadius = float4(spheres.sqRadius + i);
// note: we flip this vector and calculate -b (nb) since that happens to be slightly preferable computationally
float4 coX = sCenterX - rOrigX;
float4 coY = sCenterY - rOrigY;
float4 coZ = sCenterZ - rOrigZ;
float4 nb = coX * rDirX + coY * rDirY + coZ * rDirZ;
float4 c = coX * coX + coY * coY + coZ * coZ - sSqRadius;
float4 discr = nb * nb - c;
bool4 discrPos = discr > float4(0.0f);
// if ray hits any of the 4 spheres
if (any(discrPos))
{
float4 discrSq = sqrtf(discr);
// ray could hit spheres at t0 & t1
float4 t0 = nb - discrSq;
float4 t1 = nb + discrSq;
float4 t = select(t1, t0, t0 > tMin4); // if t0 is above min, take it (since it's the earlier hit); else try t1.
bool4 msk = discrPos & (t > tMin4) & (t < hitT);
// if hit, take it
id = select(id, curId, msk);
hitT = select(hitT, t, msk);
}
#if USE_NEON
curId = vaddq_s32(curId, vdupq_n_s32(kSimdWidth));
#else
curId = _mm_add_epi32(curId, _mm_set1_epi32(kSimdWidth));
#endif
}
// now we have up to 4 hits, find and return closest one
float minT = hmin(hitT);
if (minT < tMax) // any actual hits?
{
int minMask = mask(hitT == float4(minT));
if (minMask != 0)
{
int id_scalar[4];
float hitT_scalar[4];
#if USE_NEON
vst1q_s32(id_scalar, id);
vst1q_f32(hitT_scalar, hitT.m);
#else
_mm_storeu_si128((__m128i *)id_scalar, id);
_mm_storeu_ps(hitT_scalar, hitT.m);
#endif
// In general, you would do this with a bit scan (first set/trailing zero count).
// But who cares, it's only 16 options.
static const int laneId[16] =
{
0, 0, 1, 0, // 00xx
2, 0, 1, 0, // 01xx
3, 0, 1, 0, // 10xx
2, 0, 1, 0, // 11xx
};
int lane = laneId[minMask];
int hitId = id_scalar[lane];
float finalHitT = hitT_scalar[lane];
outHit.pos = r.pointAt(finalHitT);
outHit.normal = (outHit.pos - float3(spheres.centerX[hitId], spheres.centerY[hitId], spheres.centerZ[hitId])) * spheres.invRadius[hitId];
outHit.t = finalHitT;
return hitId;
}
}
return -1;
#else // #if DO_HIT_SPHERES_SIMD
float hitT = tMax;
int id = -1;
for (int i = 0; i < spheres.count; ++i)
{
float coX = spheres.centerX[i] - r.orig.getX();
float coY = spheres.centerY[i] - r.orig.getY();
float coZ = spheres.centerZ[i] - r.orig.getZ();
float nb = coX * r.dir.getX() + coY * r.dir.getY() + coZ * r.dir.getZ();
float c = coX * coX + coY * coY + coZ * coZ - spheres.sqRadius[i];
float discr = nb * nb - c;
if (discr > 0)
{
float discrSq = sqrtf(discr);
// Try earlier t
float t = nb - discrSq;
if (t <= tMin) // before min, try later t!
t = nb + discrSq;
if (t > tMin && t < hitT)
{
id = i;
hitT = t;
}
}
}
if (id != -1)
{
outHit.pos = r.pointAt(hitT);
outHit.normal = (outHit.pos - float3(spheres.centerX[id], spheres.centerY[id], spheres.centerZ[id])) * spheres.invRadius[id];
outHit.t = hitT;
return id;
}
else
return -1;
#endif // #else of #if DO_HIT_SPHERES_SIMD
}
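Both branches of HitSpheres above solve the same quadratic. For a unit-length direction d, origin o, sphere centre s and radius r, substituting o + t*d into |p - s|^2 = r^2 gives t^2 - 2*nb*t + c = 0 with nb = dot(s - o, d) and c = dot(s - o, s - o) - r^2, so t = nb ± sqrt(nb^2 - c). The following single-sphere sketch isolates that step. Vec3, dot3 and HitSphereScalar are hypothetical names, and the negative return value used for "no hit" is a convention chosen here (it assumes tMin is not negative), not something the source does.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical plain vector type

inline float dot3(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns the nearest hit distance t with tMin < t < tMax, or -1 if there is none.
// Assumes `dir` is unit length and tMin >= 0.
inline float HitSphereScalar(Vec3 orig, Vec3 dir, Vec3 center, float sqRadius,
                             float tMin, float tMax)
{
    // Vector from ray origin to sphere centre ("flipped", so nb is -b).
    Vec3 co = { center.x - orig.x, center.y - orig.y, center.z - orig.z };
    float nb = dot3(co, dir);
    float c = dot3(co, co) - sqRadius;
    float discr = nb * nb - c;
    if (discr <= 0.0f) return -1.0f;      // ray misses (or only grazes) the sphere

    float root = std::sqrt(discr);
    float t = nb - root;                  // earlier intersection
    if (t <= tMin) t = nb + root;         // before min, try the later one
    return (t > tMin && t < tMax) ? t : -1.0f;
}
```

With the origin at (0,0,0), direction (0,0,-1) and a unit sphere centred at (0,0,-5), nb is 5, c is 24 and the discriminant is 1, so the nearest hit is at t = 4. A ray starting inside the sphere falls through to the later root.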


@@ -0,0 +1,436 @@
#pragma once
#include <math.h>
#include <assert.h>
#include <stdint.h>
#include "Config.h"
#include "MathSimd.h"
#define kPI 3.1415926f
// SSE/SIMD vector largely based on http://www.codersnotes.com/notes/maths-lib-2016/
#if DO_FLOAT3_WITH_SIMD
#if !defined(__arm__) && !defined(__arm64__)
// ---- SSE implementation
// SHUFFLE3(v, 0,1,2) leaves the vector unchanged (v.xyz).
// SHUFFLE3(v, 0,0,0) splats the X (v.xxx).
#define SHUFFLE3(V, X,Y,Z) float3(_mm_shuffle_ps((V).m, (V).m, _MM_SHUFFLE(Z,Z,Y,X)))
struct float3
{
VM_INLINE float3() {}
VM_INLINE explicit float3(const float *p) { m = _mm_set_ps(p[2], p[2], p[1], p[0]); }
VM_INLINE explicit float3(float x, float y, float z) { m = _mm_set_ps(z, z, y, x); }
VM_INLINE explicit float3(float v) { m = _mm_set1_ps(v); }
VM_INLINE explicit float3(__m128 v) { m = v; }
VM_INLINE float getX() const { return _mm_cvtss_f32(m); }
VM_INLINE float getY() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 1, 1, 1))); }
VM_INLINE float getZ() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 2, 2, 2))); }
VM_INLINE float3 yzx() const { return SHUFFLE3(*this, 1, 2, 0); }
VM_INLINE float3 zxy() const { return SHUFFLE3(*this, 2, 0, 1); }
VM_INLINE void store(float *p) const { p[0] = getX(); p[1] = getY(); p[2] = getZ(); }
void setX(float x)
{
m = _mm_move_ss(m, _mm_set_ss(x));
}
void setY(float y)
{
__m128 t = _mm_move_ss(m, _mm_set_ss(y));
t = _mm_shuffle_ps(t, t, _MM_SHUFFLE(3, 2, 0, 0));
m = _mm_move_ss(t, m);
}
void setZ(float z)
{
__m128 t = _mm_move_ss(m, _mm_set_ss(z));
t = _mm_shuffle_ps(t, t, _MM_SHUFFLE(3, 0, 1, 0));
m = _mm_move_ss(t, m);
}
__m128 m;
};
typedef float3 bool3;
VM_INLINE float3 operator+ (float3 a, float3 b) { a.m = _mm_add_ps(a.m, b.m); return a; }
VM_INLINE float3 operator- (float3 a, float3 b) { a.m = _mm_sub_ps(a.m, b.m); return a; }
VM_INLINE float3 operator* (float3 a, float3 b) { a.m = _mm_mul_ps(a.m, b.m); return a; }
VM_INLINE float3 operator/ (float3 a, float3 b) { a.m = _mm_div_ps(a.m, b.m); return a; }
VM_INLINE float3 operator* (float3 a, float b) { a.m = _mm_mul_ps(a.m, _mm_set1_ps(b)); return a; }
VM_INLINE float3 operator/ (float3 a, float b) { a.m = _mm_div_ps(a.m, _mm_set1_ps(b)); return a; }
VM_INLINE float3 operator* (float a, float3 b) { b.m = _mm_mul_ps(_mm_set1_ps(a), b.m); return b; }
VM_INLINE float3 operator/ (float a, float3 b) { b.m = _mm_div_ps(_mm_set1_ps(a), b.m); return b; }
VM_INLINE float3& operator+= (float3 &a, float3 b) { a = a + b; return a; }
VM_INLINE float3& operator-= (float3 &a, float3 b) { a = a - b; return a; }
VM_INLINE float3& operator*= (float3 &a, float3 b) { a = a * b; return a; }
VM_INLINE float3& operator/= (float3 &a, float3 b) { a = a / b; return a; }
VM_INLINE float3& operator*= (float3 &a, float b) { a = a * b; return a; }
VM_INLINE float3& operator/= (float3 &a, float b) { a = a / b; return a; }
VM_INLINE bool3 operator==(float3 a, float3 b) { a.m = _mm_cmpeq_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator!=(float3 a, float3 b) { a.m = _mm_cmpneq_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator< (float3 a, float3 b) { a.m = _mm_cmplt_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator> (float3 a, float3 b) { a.m = _mm_cmpgt_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator<=(float3 a, float3 b) { a.m = _mm_cmple_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator>=(float3 a, float3 b) { a.m = _mm_cmpge_ps(a.m, b.m); return a; }
VM_INLINE float3 min(float3 a, float3 b) { a.m = _mm_min_ps(a.m, b.m); return a; }
VM_INLINE float3 max(float3 a, float3 b) { a.m = _mm_max_ps(a.m, b.m); return a; }
VM_INLINE float3 operator- (float3 a) { return float3(_mm_setzero_ps()) - a; }
VM_INLINE float hmin(float3 v)
{
v = min(v, SHUFFLE3(v, 1, 0, 2));
return min(v, SHUFFLE3(v, 2, 0, 1)).getX();
}
VM_INLINE float hmax(float3 v)
{
v = max(v, SHUFFLE3(v, 1, 0, 2));
return max(v, SHUFFLE3(v, 2, 0, 1)).getX();
}
VM_INLINE float3 cross(float3 a, float3 b)
{
// x <- a.y*b.z - a.z*b.y
// y <- a.z*b.x - a.x*b.z
// z <- a.x*b.y - a.y*b.x
// We can save a shuffle by grouping it in this wacky order:
return (a.zxy()*b - a*b.zxy()).zxy();
}
// Returns a 3-bit code where bit0..bit2 is X..Z
VM_INLINE unsigned mask(float3 v) { return _mm_movemask_ps(v.m) & 7; }
// Once we have a comparison, we can branch based on its results:
VM_INLINE bool any(bool3 v) { return mask(v) != 0; }
VM_INLINE bool all(bool3 v) { return mask(v) == 7; }
VM_INLINE float3 clamp(float3 t, float3 a, float3 b) { return min(max(t, a), b); }
VM_INLINE float sum(float3 v) { return v.getX() + v.getY() + v.getZ(); }
VM_INLINE float dot(float3 a, float3 b) { return sum(a*b); }
#else // #if !defined(__arm__) && !defined(__arm64__)
// ---- NEON implementation
#include <arm_neon.h>
struct float3
{
VM_INLINE float3() {}
VM_INLINE explicit float3(const float *p) { float v[4] = {p[0], p[1], p[2], 0}; m = vld1q_f32(v); }
VM_INLINE explicit float3(float x, float y, float z) { float v[4] = {x, y, z, 0}; m = vld1q_f32(v); }
VM_INLINE explicit float3(float v) { m = vdupq_n_f32(v); }
VM_INLINE explicit float3(float32x4_t v) { m = v; }
VM_INLINE float getX() const { return vgetq_lane_f32(m, 0); }
VM_INLINE float getY() const { return vgetq_lane_f32(m, 1); }
VM_INLINE float getZ() const { return vgetq_lane_f32(m, 2); }
VM_INLINE float3 yzx() const
{
float32x2_t low = vget_low_f32(m);
float32x4_t yzx = vcombine_f32(vext_f32(low, vget_high_f32(m), 1), low);
return float3(yzx);
}
VM_INLINE float3 zxy() const
{
float32x4_t p = m;
p = vuzpq_f32(vreinterpretq_f32_s32(vextq_s32(vreinterpretq_s32_f32(p), vreinterpretq_s32_f32(p), 1)), p).val[1];
return float3(p);
}
VM_INLINE void store(float *p) const { p[0] = getX(); p[1] = getY(); p[2] = getZ(); }
void setX(float x)
{
m = vsetq_lane_f32(x, m, 0);
}
void setY(float y)
{
m = vsetq_lane_f32(y, m, 1);
}
void setZ(float z)
{
m = vsetq_lane_f32(z, m, 2);
}
float32x4_t m;
};
typedef float3 bool3;
VM_INLINE float32x4_t rcp_2(float32x4_t v)
{
float32x4_t e = vrecpeq_f32(v);
e = vmulq_f32(vrecpsq_f32(e, v), e);
e = vmulq_f32(vrecpsq_f32(e, v), e);
return e;
}
VM_INLINE float3 operator+ (float3 a, float3 b) { a.m = vaddq_f32(a.m, b.m); return a; }
VM_INLINE float3 operator- (float3 a, float3 b) { a.m = vsubq_f32(a.m, b.m); return a; }
VM_INLINE float3 operator* (float3 a, float3 b) { a.m = vmulq_f32(a.m, b.m); return a; }
VM_INLINE float3 operator/ (float3 a, float3 b) { float32x4_t recip = rcp_2(b.m); a.m = vmulq_f32(a.m, recip); return a; }
VM_INLINE float3 operator* (float3 a, float b) { a.m = vmulq_f32(a.m, vdupq_n_f32(b)); return a; }
VM_INLINE float3 operator/ (float3 a, float b) { float32x4_t recip = rcp_2(vdupq_n_f32(b)); a.m = vmulq_f32(a.m, recip); return a; }
VM_INLINE float3 operator* (float a, float3 b) { b.m = vmulq_f32(vdupq_n_f32(a), b.m); return b; }
VM_INLINE float3 operator/ (float a, float3 b) { float32x4_t recip = rcp_2(b.m); b.m = vmulq_f32(vdupq_n_f32(a), recip); return b; }
VM_INLINE float3& operator+= (float3 &a, float3 b) { a = a + b; return a; }
VM_INLINE float3& operator-= (float3 &a, float3 b) { a = a - b; return a; }
VM_INLINE float3& operator*= (float3 &a, float3 b) { a = a * b; return a; }
VM_INLINE float3& operator/= (float3 &a, float3 b) { a = a / b; return a; }
VM_INLINE float3& operator*= (float3 &a, float b) { a = a * b; return a; }
VM_INLINE float3& operator/= (float3 &a, float b) { a = a / b; return a; }
VM_INLINE bool3 operator==(float3 a, float3 b) { a.m = vceqq_f32(a.m, b.m); return a; }
VM_INLINE bool3 operator!=(float3 a, float3 b) { a.m = vmvnq_u32(vceqq_f32(a.m, b.m)); return a; }
VM_INLINE bool3 operator< (float3 a, float3 b) { a.m = vcltq_f32(a.m, b.m); return a; }
VM_INLINE bool3 operator> (float3 a, float3 b) { a.m = vcgtq_f32(a.m, b.m); return a; }
VM_INLINE bool3 operator<=(float3 a, float3 b) { a.m = vcleq_f32(a.m, b.m); return a; }
VM_INLINE bool3 operator>=(float3 a, float3 b) { a.m = vcgeq_f32(a.m, b.m); return a; }
VM_INLINE float3 min(float3 a, float3 b) { a.m = vminq_f32(a.m, b.m); return a; }
VM_INLINE float3 max(float3 a, float3 b) { a.m = vmaxq_f32(a.m, b.m); return a; }
VM_INLINE float3 operator- (float3 a) { a.m = vnegq_f32(a.m); return a; }
VM_INLINE float hmin(float3 v)
{
float32x2_t minOfHalfs = vpmin_f32(vget_low_f32(v.m), vget_high_f32(v.m));
float32x2_t minOfMinOfHalfs = vpmin_f32(minOfHalfs, minOfHalfs);
return vget_lane_f32(minOfMinOfHalfs, 0);
}
VM_INLINE float hmax(float3 v)
{
float32x2_t maxOfHalfs = vpmax_f32(vget_low_f32(v.m), vget_high_f32(v.m));
float32x2_t maxOfMaxOfHalfs = vpmax_f32(maxOfHalfs, maxOfHalfs);
return vget_lane_f32(maxOfMaxOfHalfs, 0);
}
VM_INLINE float3 cross(float3 a, float3 b)
{
// x <- a.y*b.z - a.z*b.y
// y <- a.z*b.x - a.x*b.z
// z <- a.x*b.y - a.y*b.x
// We can save a shuffle by grouping it in this wacky order:
return (a.zxy()*b - a*b.zxy()).zxy();
}
// Returns a 3-bit code where bit0..bit2 is X..Z
VM_INLINE unsigned mask(float3 v)
{
static const uint32x4_t movemask = { 1, 2, 4, 8 };
static const uint32x4_t highbit = { 0x80000000, 0x80000000, 0x80000000, 0x80000000 };
uint32x4_t t0 = vreinterpretq_u32_f32(v.m);
uint32x4_t t1 = vtstq_u32(t0, highbit);
uint32x4_t t2 = vandq_u32(t1, movemask);
uint32x2_t t3 = vorr_u32(vget_low_u32(t2), vget_high_u32(t2));
return vget_lane_u32(t3, 0) | vget_lane_u32(t3, 1);
}
// Once we have a comparison, we can branch based on its results:
VM_INLINE bool any(bool3 v) { return mask(v) != 0; }
VM_INLINE bool all(bool3 v) { return mask(v) == 7; }
VM_INLINE float3 clamp(float3 t, float3 a, float3 b) { return min(max(t, a), b); }
VM_INLINE float sum(float3 v) { return v.getX() + v.getY() + v.getZ(); }
VM_INLINE float dot(float3 a, float3 b) { return sum(a*b); }
#endif // #else of #if !defined(__arm__) && !defined(__arm64__)
#else // #if DO_FLOAT3_WITH_SIMD
// ---- Simple scalar C implementation
struct float3
{
float3() : x(0), y(0), z(0) {}
float3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
float3 operator-() const { return float3(-x, -y, -z); }
float3& operator+=(const float3& o) { x+=o.x; y+=o.y; z+=o.z; return *this; }
float3& operator-=(const float3& o) { x-=o.x; y-=o.y; z-=o.z; return *this; }
float3& operator*=(const float3& o) { x*=o.x; y*=o.y; z*=o.z; return *this; }
float3& operator*=(float o) { x*=o; y*=o; z*=o; return *this; }
VM_INLINE float getX() const { return x; }
VM_INLINE float getY() const { return y; }
VM_INLINE float getZ() const { return z; }
VM_INLINE void setX(float x_) { x = x_; }
VM_INLINE void setY(float y_) { y = y_; }
VM_INLINE void setZ(float z_) { z = z_; }
VM_INLINE void store(float *p) const { p[0] = getX(); p[1] = getY(); p[2] = getZ(); }
float x, y, z;
};
VM_INLINE float3 operator+(const float3& a, const float3& b) { return float3(a.x+b.x,a.y+b.y,a.z+b.z); }
VM_INLINE float3 operator-(const float3& a, const float3& b) { return float3(a.x-b.x,a.y-b.y,a.z-b.z); }
VM_INLINE float3 operator*(const float3& a, const float3& b) { return float3(a.x*b.x,a.y*b.y,a.z*b.z); }
VM_INLINE float3 operator*(const float3& a, float b) { return float3(a.x*b,a.y*b,a.z*b); }
VM_INLINE float3 operator*(float a, const float3& b) { return float3(a*b.x,a*b.y,a*b.z); }
VM_INLINE float dot(const float3& a, const float3& b) { return a.x*b.x+a.y*b.y+a.z*b.z; }
VM_INLINE float3 cross(const float3& a, const float3& b)
{
return float3(
a.y*b.z - a.z*b.y,
-(a.x*b.z - a.z*b.x),
a.x*b.y - a.y*b.x
);
}
#endif // #else of #if DO_FLOAT3_WITH_SIMD
VM_INLINE float length(float3 v) { return sqrtf(dot(v, v)); }
VM_INLINE float sqLength(float3 v) { return dot(v, v); }
VM_INLINE float3 normalize(float3 v) { return v * (1.0f / length(v)); }
VM_INLINE float3 lerp(float3 a, float3 b, float t) { return a + (b-a)*t; }
inline void AssertUnit(float3 v)
{
assert(fabsf(sqLength(v) - 1.0f) < 0.01f);
}
inline float3 reflect(float3 v, float3 n)
{
return v - 2*dot(v,n)*n;
}
inline bool refract(float3 v, float3 n, float nint, float3& outRefracted)
{
AssertUnit(v);
float dt = dot(v, n);
float discr = 1.0f - nint*nint*(1-dt*dt);
if (discr > 0)
{
outRefracted = nint * (v - n*dt) - n*sqrtf(discr);
return true;
}
return false;
}
inline float schlick(float cosine, float ri)
{
float r0 = (1-ri) / (1+ri);
r0 = r0*r0;
return r0 + (1-r0)*powf(1-cosine, 5);
}
struct Ray
{
Ray() {}
Ray(float3 orig_, float3 dir_) : orig(orig_), dir(dir_) { AssertUnit(dir); }
float3 pointAt(float t) const { return orig + dir * t; }
float3 orig;
float3 dir;
};
struct Hit
{
float3 pos;
float3 normal;
float t;
};
struct Sphere
{
Sphere() : radius(1.0f), invRadius(0.0f) {}
Sphere(float3 center_, float radius_) : center(center_), radius(radius_), invRadius(0.0f) {}
void UpdateDerivedData() { invRadius = 1.0f/radius; }
float3 center;
float radius;
float invRadius;
};
// data for all spheres in a "structure of arrays" layout
struct SpheresSoA
{
SpheresSoA(int c)
{
count = c;
// we'll be processing spheres in kSimdWidth chunks, so make sure to allocate
// enough space
simdCount = (c + (kSimdWidth - 1)) / kSimdWidth * kSimdWidth;
centerX = new float[simdCount];
centerY = new float[simdCount];
centerZ = new float[simdCount];
sqRadius = new float[simdCount];
invRadius = new float[simdCount];
// set the padding entries (count..simdCount) to an "impossible sphere" state
// so they are not expected to register hits
for (int i = count; i < simdCount; ++i)
{
centerX[i] = centerY[i] = centerZ[i] = 10000.0f;
sqRadius[i] = 0.0f;
invRadius[i] = 0.0f;
}
}
~SpheresSoA()
{
delete[] centerX;
delete[] centerY;
delete[] centerZ;
delete[] sqRadius;
delete[] invRadius;
}
float* centerX;
float* centerY;
float* centerZ;
float* sqRadius;
float* invRadius;
int simdCount;
int count;
};
int HitSpheres(const Ray& r, const SpheresSoA& spheres, float tMin, float tMax, Hit& outHit);
float RandomFloat01(uint32_t& state);
float3 RandomInUnitDisk(uint32_t& state);
float3 RandomInUnitSphere(uint32_t& state);
float3 RandomUnitVector(uint32_t& state);
struct Camera
{
Camera() {}
// vfov is top to bottom in degrees
Camera(const float3& lookFrom, const float3& lookAt, const float3& vup, float vfov, float aspect, float aperture, float focusDist)
{
lensRadius = aperture / 2;
float theta = vfov*kPI/180;
float halfHeight = tanf(theta/2);
float halfWidth = aspect * halfHeight;
origin = lookFrom;
w = normalize(lookFrom - lookAt);
u = normalize(cross(vup, w));
v = cross(w, u);
lowerLeftCorner = origin - halfWidth*focusDist*u - halfHeight*focusDist*v - focusDist*w;
horizontal = 2*halfWidth*focusDist*u;
vertical = 2*halfHeight*focusDist*v;
}
Ray GetRay(float s, float t, uint32_t& state) const
{
float3 rd = lensRadius * RandomInUnitDisk(state);
float3 offset = u * rd.getX() + v * rd.getY();
return Ray(origin + offset, normalize(lowerLeftCorner + s*horizontal + t*vertical - origin - offset));
}
float3 origin;
float3 lowerLeftCorner;
float3 horizontal;
float3 vertical;
float3 u, v, w;
float lensRadius;
};
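The SpheresSoA constructor above pads its arrays so the SIMD loop can always load whole groups of kSimdWidth spheres. The sketch below isolates that round-up arithmetic. RoundUpToMultiple and kLaneWidth are hypothetical names used only for this illustration; the source computes the expression inline.

```cpp
#include <cassert>

constexpr int kLaneWidth = 4;  // mirrors kSimdWidth from MathSimd.h

// Rounds `count` up to the next multiple of `width`.
// Assumes count >= 0 and width > 0; relies on integer division truncating.
constexpr int RoundUpToMultiple(int count, int width)
{
    return (count + (width - 1)) / width * width;
}

// The big scene has 46 spheres, so two padding slots are allocated.
static_assert(RoundUpToMultiple(46, kLaneWidth) == 48, "46 pads to 48");
// The small scene has 9 spheres, so three padding slots are allocated.
static_assert(RoundUpToMultiple(9, kLaneWidth) == 12, "9 pads to 12");
```

Counts that are already a multiple of the width are left unchanged, so no memory is wasted in that case.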


@@ -0,0 +1,392 @@
#include "Config.h"
#include "Test.h"
#include "Maths.h"
#include <algorithm>
#if CPU_CAN_DO_THREADS
#include "enkiTS/TaskScheduler_c.h"
#include <thread>
#endif
#include <atomic>
#include "../../../Tracy.hpp"
// 46 spheres (2 emissive) when enabled; 9 spheres (1 emissive) when disabled
#define DO_BIG_SCENE 1
static Sphere s_Spheres[] =
{
{float3(0,-100.5,-1), 100},
{float3(2,0,-1), 0.5f},
{float3(0,0,-1), 0.5f},
{float3(-2,0,-1), 0.5f},
{float3(2,0,1), 0.5f},
{float3(0,0,1), 0.5f},
{float3(-2,0,1), 0.5f},
{float3(0.5f,1,0.5f), 0.5f},
{float3(-1.5f,1.5f,0.f), 0.3f},
#if DO_BIG_SCENE
{float3(4,0,-3), 0.5f}, {float3(3,0,-3), 0.5f}, {float3(2,0,-3), 0.5f}, {float3(1,0,-3), 0.5f}, {float3(0,0,-3), 0.5f}, {float3(-1,0,-3), 0.5f}, {float3(-2,0,-3), 0.5f}, {float3(-3,0,-3), 0.5f}, {float3(-4,0,-3), 0.5f},
{float3(4,0,-4), 0.5f}, {float3(3,0,-4), 0.5f}, {float3(2,0,-4), 0.5f}, {float3(1,0,-4), 0.5f}, {float3(0,0,-4), 0.5f}, {float3(-1,0,-4), 0.5f}, {float3(-2,0,-4), 0.5f}, {float3(-3,0,-4), 0.5f}, {float3(-4,0,-4), 0.5f},
{float3(4,0,-5), 0.5f}, {float3(3,0,-5), 0.5f}, {float3(2,0,-5), 0.5f}, {float3(1,0,-5), 0.5f}, {float3(0,0,-5), 0.5f}, {float3(-1,0,-5), 0.5f}, {float3(-2,0,-5), 0.5f}, {float3(-3,0,-5), 0.5f}, {float3(-4,0,-5), 0.5f},
{float3(4,0,-6), 0.5f}, {float3(3,0,-6), 0.5f}, {float3(2,0,-6), 0.5f}, {float3(1,0,-6), 0.5f}, {float3(0,0,-6), 0.5f}, {float3(-1,0,-6), 0.5f}, {float3(-2,0,-6), 0.5f}, {float3(-3,0,-6), 0.5f}, {float3(-4,0,-6), 0.5f},
{float3(1.5f,1.5f,-2), 0.3f},
#endif // #if DO_BIG_SCENE
};
const int kSphereCount = sizeof(s_Spheres) / sizeof(s_Spheres[0]);
static SpheresSoA s_SpheresSoA(kSphereCount);
struct Material
{
enum Type { Lambert, Metal, Dielectric };
Type type;
float3 albedo;
float3 emissive;
float roughness;
float ri;
};
static Material s_SphereMats[kSphereCount] =
{
{ Material::Lambert, float3(0.8f, 0.8f, 0.8f), float3(0,0,0), 0, 0, },
{ Material::Lambert, float3(0.8f, 0.4f, 0.4f), float3(0,0,0), 0, 0, },
{ Material::Lambert, float3(0.4f, 0.8f, 0.4f), float3(0,0,0), 0, 0, },
{ Material::Metal, float3(0.4f, 0.4f, 0.8f), float3(0,0,0), 0, 0 },
{ Material::Metal, float3(0.4f, 0.8f, 0.4f), float3(0,0,0), 0, 0 },
{ Material::Metal, float3(0.4f, 0.8f, 0.4f), float3(0,0,0), 0.2f, 0 },
{ Material::Metal, float3(0.4f, 0.8f, 0.4f), float3(0,0,0), 0.6f, 0 },
{ Material::Dielectric, float3(0.4f, 0.4f, 0.4f), float3(0,0,0), 0, 1.5f },
{ Material::Lambert, float3(0.8f, 0.6f, 0.2f), float3(30,25,15), 0, 0 },
#if DO_BIG_SCENE
{ Material::Lambert, float3(0.1f, 0.1f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.2f, 0.2f, 0.2f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.3f, 0.3f, 0.3f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.4f, 0.4f, 0.4f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.5f, 0.5f, 0.5f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.6f, 0.6f, 0.6f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.7f, 0.7f, 0.7f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.8f, 0.8f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.9f, 0.9f, 0.9f), float3(0,0,0), 0, 0, },
{ Material::Metal, float3(0.1f, 0.1f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.2f, 0.2f, 0.2f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.3f, 0.3f, 0.3f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.4f, 0.4f, 0.4f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.5f, 0.5f, 0.5f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.6f, 0.6f, 0.6f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.7f, 0.7f, 0.7f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.8f, 0.8f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.9f, 0.9f, 0.9f), float3(0,0,0), 0, 0, },
{ Material::Metal, float3(0.8f, 0.1f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.8f, 0.5f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.8f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.4f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.1f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.1f, 0.8f, 0.5f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.1f, 0.8f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.1f, 0.1f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.5f, 0.1f, 0.8f), float3(0,0,0), 0, 0, },
{ Material::Lambert, float3(0.8f, 0.1f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.8f, 0.5f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.8f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.4f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.1f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.1f, 0.8f, 0.5f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.1f, 0.8f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.1f, 0.1f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.5f, 0.1f, 0.8f), float3(0,0,0), 0, 0, },
{ Material::Lambert, float3(0.1f, 0.2f, 0.5f), float3(3,10,20), 0, 0 },
#endif
};
static int s_EmissiveSpheres[kSphereCount];
static int s_EmissiveSphereCount;
static Camera s_Cam;
const float kMinT = 0.001f;
const float kMaxT = 1.0e7f;
const int kMaxDepth = 10;
bool HitWorld(const Ray& r, float tMin, float tMax, Hit& outHit, int& outID)
{
outID = HitSpheres(r, s_SpheresSoA, tMin, tMax, outHit);
return outID != -1;
}
static bool Scatter(const Material& mat, const Ray& r_in, const Hit& rec, float3& attenuation, Ray& scattered, float3& outLightE, int& inoutRayCount, uint32_t& state)
{
ZoneScoped;
outLightE = float3(0,0,0);
if (mat.type == Material::Lambert)
{
// random point on unit sphere that is tangent to the hit point
float3 target = rec.pos + rec.normal + RandomUnitVector(state);
scattered = Ray(rec.pos, normalize(target - rec.pos));
attenuation = mat.albedo;
// sample lights
#if DO_LIGHT_SAMPLING
for (int j = 0; j < s_EmissiveSphereCount; ++j)
{
int i = s_EmissiveSpheres[j];
const Material& smat = s_SphereMats[i];
if (&mat == &smat)
continue; // skip self
const Sphere& s = s_Spheres[i];
// create a random direction towards sphere
// coord system for sampling: sw, su, sv
float3 sw = normalize(s.center - rec.pos);
float3 su = normalize(cross(fabs(sw.getX())>0.01f ? float3(0,1,0):float3(1,0,0), sw));
float3 sv = cross(sw, su);
// sample sphere by solid angle
float cosAMax = sqrtf(1.0f - s.radius*s.radius / sqLength(rec.pos-s.center));
float eps1 = RandomFloat01(state), eps2 = RandomFloat01(state);
float cosA = 1.0f - eps1 + eps1 * cosAMax;
float sinA = sqrtf(1.0f - cosA*cosA);
float phi = 2 * kPI * eps2;
float3 l = su * (cosf(phi) * sinA) + sv * (sinf(phi) * sinA) + sw * cosA;
//l = normalize(l); // NOTE(fg): This is already normalized, by construction.
// shoot shadow ray
Hit lightHit;
int hitID;
++inoutRayCount;
if (HitWorld(Ray(rec.pos, l), kMinT, kMaxT, lightHit, hitID) && hitID == i)
{
float omega = 2 * kPI * (1-cosAMax);
float3 rdir = r_in.dir;
AssertUnit(rdir);
float3 nl = dot(rec.normal, rdir) < 0 ? rec.normal : -rec.normal;
outLightE += (mat.albedo * smat.emissive) * (std::max(0.0f, dot(l, nl)) * omega / kPI);
}
}
#endif
return true;
}
else if (mat.type == Material::Metal)
{
AssertUnit(r_in.dir); AssertUnit(rec.normal);
float3 refl = reflect(r_in.dir, rec.normal);
// reflected ray, and random inside of sphere based on roughness
float roughness = mat.roughness;
#if DO_MITSUBA_COMPARE
roughness = 0; // until we get better BRDF for metals
#endif
scattered = Ray(rec.pos, normalize(refl + roughness*RandomInUnitSphere(state)));
attenuation = mat.albedo;
return dot(scattered.dir, rec.normal) > 0;
}
else if (mat.type == Material::Dielectric)
{
AssertUnit(r_in.dir); AssertUnit(rec.normal);
float3 outwardN;
float3 rdir = r_in.dir;
float3 refl = reflect(rdir, rec.normal);
float nint;
attenuation = float3(1,1,1);
float3 refr;
float reflProb;
float cosine;
if (dot(rdir, rec.normal) > 0)
{
outwardN = -rec.normal;
nint = mat.ri;
cosine = mat.ri * dot(rdir, rec.normal);
}
else
{
outwardN = rec.normal;
nint = 1.0f / mat.ri;
cosine = -dot(rdir, rec.normal);
}
if (refract(rdir, outwardN, nint, refr))
{
reflProb = schlick(cosine, mat.ri);
}
else
{
reflProb = 1;
}
if (RandomFloat01(state) < reflProb)
scattered = Ray(rec.pos, normalize(refl));
else
scattered = Ray(rec.pos, normalize(refr));
}
else
{
attenuation = float3(1,0,1);
return false;
}
return true;
}
static float3 Trace(const Ray& r, int depth, int& inoutRayCount, uint32_t& state, bool doMaterialE = true)
{
ZoneScoped;
Hit rec;
int id = 0;
++inoutRayCount;
if (HitWorld(r, kMinT, kMaxT, rec, id))
{
Ray scattered;
float3 attenuation;
float3 lightE;
const Material& mat = s_SphereMats[id];
float3 matE = mat.emissive;
if (depth < kMaxDepth && Scatter(mat, r, rec, attenuation, scattered, lightE, inoutRayCount, state))
{
#if DO_LIGHT_SAMPLING
if (!doMaterialE) matE = float3(0,0,0); // don't add material emission if told so
// for Lambert materials, we just did explicit light (emissive) sampling and already
// accounted for their contribution, so if the next ray bounce hits the light again,
// don't add emission
doMaterialE = (mat.type != Material::Lambert);
#endif
return matE + lightE + attenuation * Trace(scattered, depth+1, inoutRayCount, state, doMaterialE);
}
else
{
return matE;
}
}
else
{
// sky
#if DO_MITSUBA_COMPARE
return float3(0.15f,0.21f,0.3f); // easier compare with Mitsuba's constant environment light
#else
float3 unitDir = r.dir;
float t = 0.5f*(unitDir.getY() + 1.0f);
return ((1.0f-t)*float3(1.0f, 1.0f, 1.0f) + t*float3(0.5f, 0.7f, 1.0f)) * 0.3f;
#endif
}
}
#if CPU_CAN_DO_THREADS
static enkiTaskScheduler* g_TS;
#endif
void InitializeTest()
{
ZoneScoped;
#if CPU_CAN_DO_THREADS
g_TS = enkiNewTaskScheduler();
enkiInitTaskSchedulerNumThreads(g_TS, std::max<int>( 2, std::thread::hardware_concurrency() - 2));
#endif
}
void ShutdownTest()
{
ZoneScoped;
#if CPU_CAN_DO_THREADS
enkiDeleteTaskScheduler(g_TS);
#endif
}
struct JobData
{
float time;
int frameCount;
int screenWidth, screenHeight;
float* backbuffer;
Camera* cam;
std::atomic<int> rayCount;
unsigned testFlags;
};
static void TraceRowJob(uint32_t start, uint32_t end, uint32_t threadnum, void* data_)
{
ZoneScoped;
JobData& data = *(JobData*)data_;
float* backbuffer = data.backbuffer + start * data.screenWidth * 4;
float invWidth = 1.0f / data.screenWidth;
float invHeight = 1.0f / data.screenHeight;
float lerpFac = float(data.frameCount) / float(data.frameCount+1);
if (data.testFlags & kFlagAnimate)
lerpFac *= DO_ANIMATE_SMOOTHING;
if (!(data.testFlags & kFlagProgressive))
lerpFac = 0;
int rayCount = 0;
for (uint32_t y = start; y < end; ++y)
{
uint32_t state = (y * 9781 + data.frameCount * 6271) | 1;
for (int x = 0; x < data.screenWidth; ++x)
{
float3 col(0, 0, 0);
for (int s = 0; s < DO_SAMPLES_PER_PIXEL; s++)
{
float u = float(x + RandomFloat01(state)) * invWidth;
float v = float(y + RandomFloat01(state)) * invHeight;
Ray r = data.cam->GetRay(u, v, state);
col += Trace(r, 0, rayCount, state);
}
col *= 1.0f / float(DO_SAMPLES_PER_PIXEL);
float3 prev(backbuffer[0], backbuffer[1], backbuffer[2]);
col = prev * lerpFac + col * (1-lerpFac);
col.store(backbuffer);
backbuffer += 4;
}
}
data.rayCount += rayCount;
}
void UpdateTest(float time, int frameCount, int screenWidth, int screenHeight, unsigned testFlags)
{
ZoneScoped;
if (testFlags & kFlagAnimate)
{
s_Spheres[1].center.setY(cosf(time) + 1.0f);
s_Spheres[8].center.setZ(sinf(time)*0.3f);
}
float3 lookfrom(0, 2, 3);
float3 lookat(0, 0, 0);
float distToFocus = 3;
#if DO_MITSUBA_COMPARE
float aperture = 0.0f;
#else
float aperture = 0.1f;
#endif
#if DO_BIG_SCENE
aperture *= 0.2f;
#endif
s_EmissiveSphereCount = 0;
for (int i = 0; i < kSphereCount; ++i)
{
Sphere& s = s_Spheres[i];
s.UpdateDerivedData();
s_SpheresSoA.centerX[i] = s.center.getX();
s_SpheresSoA.centerY[i] = s.center.getY();
s_SpheresSoA.centerZ[i] = s.center.getZ();
s_SpheresSoA.sqRadius[i] = s.radius * s.radius;
s_SpheresSoA.invRadius[i] = s.invRadius;
// Remember IDs of emissive spheres (light sources)
const Material& smat = s_SphereMats[i];
if (smat.emissive.getX() > 0 || smat.emissive.getY() > 0 || smat.emissive.getZ() > 0)
{
s_EmissiveSpheres[s_EmissiveSphereCount] = i;
s_EmissiveSphereCount++;
}
}
s_Cam = Camera(lookfrom, lookat, float3(0, 1, 0), 60, float(screenWidth) / float(screenHeight), aperture, distToFocus);
}
void DrawTest(float time, int frameCount, int screenWidth, int screenHeight, float* backbuffer, int& outRayCount, unsigned testFlags)
{
ZoneScoped;
JobData args;
args.time = time;
args.frameCount = frameCount;
args.screenWidth = screenWidth;
args.screenHeight = screenHeight;
args.backbuffer = backbuffer;
args.cam = &s_Cam;
args.testFlags = testFlags;
args.rayCount = 0;
#if CPU_CAN_DO_THREADS
enkiTaskSet* task = enkiCreateTaskSet(g_TS, TraceRowJob);
bool threaded = true;
enkiAddTaskSetToPipeMinRange(g_TS, task, &args, screenHeight, threaded ? 4 : screenHeight);
enkiWaitForTaskSet(g_TS, task);
enkiDeleteTaskSet(task);
#else
TraceRowJob(0, screenHeight, 0, &args);
#endif
outRayCount = args.rayCount;
}
void GetObjectCount(int& outCount, int& outObjectSize, int& outMaterialSize, int& outCamSize)
{
ZoneScoped;
outCount = kSphereCount;
outObjectSize = sizeof(Sphere);
outMaterialSize = sizeof(Material);
outCamSize = sizeof(Camera);
}
void GetSceneDesc(void* outObjects, void* outMaterials, void* outCam, void* outEmissives, int* outEmissiveCount)
{
ZoneScoped;
memcpy(outObjects, s_Spheres, kSphereCount * sizeof(s_Spheres[0]));
memcpy(outMaterials, s_SphereMats, kSphereCount * sizeof(s_SphereMats[0]));
memcpy(outCam, &s_Cam, sizeof(s_Cam));
memcpy(outEmissives, s_EmissiveSpheres, s_EmissiveSphereCount * sizeof(s_EmissiveSpheres[0]));
*outEmissiveCount = s_EmissiveSphereCount;
}
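The progressive accumulation in TraceRowJob blends each new frame into the backbuffer with lerpFac = frameCount / (frameCount + 1), which amounts to a running mean over frames. A minimal scalar sketch of that blend follows, assuming kFlagProgressive is set and kFlagAnimate is not; BlendProgressive and AccumulateFrames are hypothetical names used only for illustration and are not part of this file.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar stand-in for the per-pixel blend done in TraceRowJob.
// frameCount is the number of frames already accumulated in 'prev'.
inline float BlendProgressive(float prev, float sample, int frameCount)
{
    float lerpFac = float(frameCount) / float(frameCount + 1);
    return prev * lerpFac + sample * (1.0f - lerpFac);
}

// Feeds 'count' samples through the blend, one per frame, starting at frame 0.
// The result approximates the arithmetic mean of the samples.
inline float AccumulateFrames(const float* samples, int count)
{
    float accum = 0.0f;
    for (int frame = 0; frame < count; ++frame)
        accum = BlendProgressive(accum, samples[frame], frame);
    return accum;
}
```

On frame 0 the factor is zero, so whatever was in the buffer is discarded and the first sample is stored unchanged.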

View File

@@ -0,0 +1,17 @@
#pragma once
#include <stdint.h>
enum TestFlags
{
kFlagAnimate = (1 << 0),
kFlagProgressive = (1 << 1),
};
void InitializeTest();
void ShutdownTest();
void UpdateTest(float time, int frameCount, int screenWidth, int screenHeight, unsigned testFlags);
void DrawTest(float time, int frameCount, int screenWidth, int screenHeight, float* backbuffer, int& outRayCount, unsigned testFlags);
void GetObjectCount(int& outCount, int& outObjectSize, int& outMaterialSize, int& outCamSize);
void GetSceneDesc(void* outObjects, void* outMaterials, void* outCam, void* outEmissives, int* outEmissiveCount);

View File

@@ -0,0 +1,79 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#include <stdint.h>
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#undef GetObject
#include <intrin.h>
extern "C" void _ReadWriteBarrier();
#pragma intrinsic(_ReadWriteBarrier)
#pragma intrinsic(_InterlockedCompareExchange)
#pragma intrinsic(_InterlockedExchangeAdd)
// Compiler barriers: these prevent compiler re-ordering only and emit no CPU fence instruction
#define BASE_MEMORYBARRIER_ACQUIRE() _ReadWriteBarrier()
#define BASE_MEMORYBARRIER_RELEASE() _ReadWriteBarrier()
#define BASE_ALIGN(x) __declspec( align( x ) )
#else
#define BASE_MEMORYBARRIER_ACQUIRE() __asm__ __volatile__("": : :"memory")
#define BASE_MEMORYBARRIER_RELEASE() __asm__ __volatile__("": : :"memory")
#define BASE_ALIGN(x) __attribute__ ((aligned( x )))
#endif
namespace enki
{
// Atomically performs: if( *pDest == compareWith ) { *pDest = swapTo; }
// returns old *pDest (so if successful, returns compareWith)
inline uint32_t AtomicCompareAndSwap( volatile uint32_t* pDest, uint32_t swapTo, uint32_t compareWith )
{
#ifdef _WIN32
// assumes two's complement - unsigned / signed conversion leads to same bit pattern
return _InterlockedCompareExchange( (volatile long*)pDest,swapTo, compareWith );
#else
return __sync_val_compare_and_swap( pDest, compareWith, swapTo );
#endif
}
inline uint64_t AtomicCompareAndSwap( volatile uint64_t* pDest, uint64_t swapTo, uint64_t compareWith )
{
#ifdef _WIN32
// assumes two's complement - unsigned / signed conversion leads to same bit pattern
return _InterlockedCompareExchange64( (__int64 volatile*)pDest, swapTo, compareWith );
#else
return __sync_val_compare_and_swap( pDest, compareWith, swapTo );
#endif
}
// Atomically performs: tmp = *pDest; *pDest += value; return tmp;
inline int32_t AtomicAdd( volatile int32_t* pDest, int32_t value )
{
#ifdef _WIN32
return _InterlockedExchangeAdd( (long*)pDest, value );
#else
return __sync_fetch_and_add( pDest, value );
#endif
}
}
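Both wrappers above return the value held before the operation. The same contract can be sketched with C++11 std::atomic; this is an illustration under that assumption, not how the header is implemented (it uses compiler intrinsics). CompareAndSwapSketch and FetchAddSketch are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of enkiTS. Same contract as AtomicCompareAndSwap:
// if( dest == compareWith ) { dest = swapTo; } and the old value is returned.
inline uint32_t CompareAndSwapSketch(std::atomic<uint32_t>& dest, uint32_t swapTo, uint32_t compareWith)
{
    uint32_t expected = compareWith;
    // On failure compare_exchange_strong stores the observed value in 'expected',
    // so 'expected' holds the previous value whether or not the swap happened.
    dest.compare_exchange_strong(expected, swapTo);
    return expected;
}

// Hypothetical helper with the same contract as AtomicAdd:
// tmp = dest; dest += value; return tmp;
inline int32_t FetchAddSketch(std::atomic<int32_t>& dest, int32_t value)
{
    return dest.fetch_add(value);
}
```

A caller detects a successful swap by comparing the return value with compareWith, which is how the pipe code checks for FLAG_CAN_READ.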

View File

@@ -0,0 +1,240 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#include <stdint.h>
#include <assert.h>
#include "Atomics.h"
#include <string.h>
namespace enki
{
// LockLessMultiReadPipe - Single writer, multiple reader thread safe pipe using (semi) lockless programming
// Readers can only read from the back of the pipe
// The single writer can write to the front of the pipe, and read from both ends (a writer can be a reader)
// for many of the principles used here, see http://msdn.microsoft.com/en-us/library/windows/desktop/ee418650(v=vs.85).aspx
// Note: using log2 sizes so we do not need to clamp (multi-operation)
// T is the contained type
// Note this is not truly lockless, as the flags act as a form of lock state.
template<uint8_t cSizeLog2, typename T> class LockLessMultiReadPipe
{
public:
LockLessMultiReadPipe();
~LockLessMultiReadPipe() {}
// ReaderTryReadBack returns false if we were unable to read
// This is thread safe for both multiple readers and the writer
bool ReaderTryReadBack( T* pOut );
// WriterTryReadFront returns false if we were unable to read
// This is thread safe for the single writer, but should not be called by readers
bool WriterTryReadFront( T* pOut );
// WriterTryWriteFront returns false if we were unable to write
// This is thread safe for the single writer, but should not be called by readers
bool WriterTryWriteFront( const T& in );
// IsPipeEmpty() is a utility function, not intended for general use
// Should only be used very prudently.
bool IsPipeEmpty() const
{
return 0 == m_WriteIndex - m_ReadCount;
}
void Clear()
{
m_WriteIndex = 0;
m_ReadIndex = 0;
m_ReadCount = 0;
memset( (void*)m_Flags, 0, sizeof( m_Flags ) );
}
private:
const static uint32_t ms_cSize = ( 1 << cSizeLog2 );
const static uint32_t ms_cIndexMask = ms_cSize - 1;
const static uint32_t FLAG_INVALID = 0xFFFFFFFF; // 32bit for CAS
const static uint32_t FLAG_CAN_WRITE = 0x00000000; // 32bit for CAS
const static uint32_t FLAG_CAN_READ = 0x11111111; // 32bit for CAS
T m_Buffer[ ms_cSize ];
// read and write indexes allow fast access to the pipe, but actual access
// controlled by the access flags.
volatile uint32_t BASE_ALIGN(4) m_WriteIndex;
volatile uint32_t BASE_ALIGN(4) m_ReadCount;
volatile uint32_t m_Flags[ ms_cSize ];
volatile uint32_t BASE_ALIGN(4) m_ReadIndex;
};
template<uint8_t cSizeLog2, typename T> inline
LockLessMultiReadPipe<cSizeLog2,T>::LockLessMultiReadPipe()
: m_WriteIndex(0)
, m_ReadIndex(0)
, m_ReadCount(0)
{
assert( cSizeLog2 < 32 );
memset( (void*)m_Flags, 0, sizeof( m_Flags ) );
}
template<uint8_t cSizeLog2, typename T> inline
bool LockLessMultiReadPipe<cSizeLog2,T>::ReaderTryReadBack( T* pOut )
{
uint32_t actualReadIndex;
uint32_t readCount = m_ReadCount;
// We get hold of read index for consistency,
// and do first pass starting at read count
uint32_t readIndexToUse = readCount;
while(true)
{
uint32_t writeIndex = m_WriteIndex;
// power of two sizes ensures we can use a simple calc without modulus
uint32_t numInPipe = writeIndex - readCount;
if( 0 == numInPipe )
{
return false;
}
if( readIndexToUse >= writeIndex )
{
// move back to start
readIndexToUse = m_ReadIndex;
}
// power of two sizes ensures we can perform AND for a modulus
actualReadIndex = readIndexToUse & ms_cIndexMask;
// Multiple potential readers mean we should check if the data is valid,
// using an atomic compare exchange
uint32_t previous = AtomicCompareAndSwap( &m_Flags[ actualReadIndex ], FLAG_INVALID, FLAG_CAN_READ );
if( FLAG_CAN_READ == previous )
{
break;
}
++readIndexToUse;
//update known readcount
readCount = m_ReadCount;
}
// we update the read count using an atomic add, as we've only read one piece of data.
// this ensures consistency of the read count, and the above loop ensures readers
// only read from unread data
AtomicAdd( (volatile int32_t*)&m_ReadCount, 1 );
BASE_MEMORYBARRIER_ACQUIRE();
// now read data, ensuring we do so after above reads & CAS
*pOut = m_Buffer[ actualReadIndex ];
m_Flags[ actualReadIndex ] = FLAG_CAN_WRITE;
return true;
}
template<uint8_t cSizeLog2, typename T> inline
bool LockLessMultiReadPipe<cSizeLog2,T>::WriterTryReadFront( T* pOut )
{
uint32_t writeIndex = m_WriteIndex;
uint32_t frontReadIndex = writeIndex;
// Multiple potential readers mean we should check if the data is valid,
// using an atomic compare exchange - which acts as a form of lock (so not quite lockless really).
uint32_t previous = FLAG_INVALID;
uint32_t actualReadIndex = 0;
while( true )
{
// power of two sizes ensures we can use a simple calc without modulus
uint32_t readCount = m_ReadCount;
uint32_t numInPipe = writeIndex - readCount;
if( 0 == numInPipe || 0 == frontReadIndex )
{
// frontReadIndex can get to 0 here if that item was just being read by another thread.
m_ReadIndex = readCount;
return false;
}
--frontReadIndex;
actualReadIndex = frontReadIndex & ms_cIndexMask;
previous = AtomicCompareAndSwap( &m_Flags[ actualReadIndex ], FLAG_INVALID, FLAG_CAN_READ );
if( FLAG_CAN_READ == previous )
{
break;
}
else if( m_ReadIndex >= frontReadIndex )
{
return false;
}
}
// now read data, ensuring we do so after above reads & CAS
*pOut = m_Buffer[ actualReadIndex ];
m_Flags[ actualReadIndex ] = FLAG_CAN_WRITE;
BASE_MEMORYBARRIER_RELEASE();
// 32-bit aligned stores are atomic, and writer owns the write index
// we only move one back as this is as many as we have read, not where we have read from.
--m_WriteIndex;
return true;
}
template<uint8_t cSizeLog2, typename T> inline
bool LockLessMultiReadPipe<cSizeLog2,T>::WriterTryWriteFront( const T& in )
{
// The writer 'owns' the write index, and readers can only reduce
// the amount of data in the pipe.
// We get hold of both values for consistency and to reduce false sharing
// impacting more than one access
uint32_t writeIndex = m_WriteIndex;
// power of two sizes ensures we can perform AND for a modulus
uint32_t actualWriteIndex = writeIndex & ms_cIndexMask;
// a reader may still be reading this item, as there are multiple readers
if( m_Flags[ actualWriteIndex ] != FLAG_CAN_WRITE )
{
return false; // still being read, so have caught up with tail.
}
// as we are the only writer we can update the data without atomics
// whilst the write index has not been updated
m_Buffer[ actualWriteIndex ] = in;
m_Flags[ actualWriteIndex ] = FLAG_CAN_READ;
// We need to ensure the above writes occur prior to updating the write index,
// otherwise another thread might read before it's finished
BASE_MEMORYBARRIER_RELEASE();
// 32-bit aligned stores are atomic, and the writer controls the write index
++writeIndex;
m_WriteIndex = writeIndex;
return true;
}
}
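The pipe relies on two properties of power-of-two sizes and unsigned arithmetic: a running index maps to a slot with a single AND, and the item count is a plain subtraction that remains correct when a 32-bit index wraps. A minimal sketch follows, assuming cSizeLog2 = 3; kSizeLog2, kSize, kIndexMask, SlotFor and NumInPipe are hypothetical names that mirror ms_cSize and ms_cIndexMask and are not part of the header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring ms_cSize / ms_cIndexMask for cSizeLog2 = 3.
constexpr uint32_t kSizeLog2  = 3;
constexpr uint32_t kSize      = 1u << kSizeLog2; // 8 slots
constexpr uint32_t kIndexMask = kSize - 1;       // binary 111

// Maps a running index onto a buffer slot; equivalent to index % kSize.
inline uint32_t SlotFor(uint32_t index)
{
    return index & kIndexMask;
}

// Number of items between the read count and the write index.
// Unsigned subtraction is modulo 2^32, so the result is still right
// after writeIndex has wrapped around and readCount has not.
inline uint32_t NumInPipe(uint32_t writeIndex, uint32_t readCount)
{
    return writeIndex - readCount;
}
```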

View File

@@ -0,0 +1,437 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#include <assert.h>
#include "TaskScheduler.h"
#include "LockLessMultiReadPipe.h"
using namespace enki;
static const uint32_t PIPESIZE_LOG2 = 8;
static const uint32_t SPIN_COUNT = 100;
static const uint32_t SPIN_BACKOFF_MULTIPLIER = 10;
static const uint32_t MAX_NUM_INITIAL_PARTITIONS = 8;
// each software thread gets its own copy of gtl_threadNum, so this is safe to use as a static variable
static THREAD_LOCAL uint32_t gtl_threadNum = 0;
namespace enki
{
struct SubTaskSet
{
ITaskSet* pTask;
TaskSetPartition partition;
};
// we derive class TaskPipe rather than typedef to get forward declaration working easily
class TaskPipe : public LockLessMultiReadPipe<PIPESIZE_LOG2,enki::SubTaskSet> {};
struct ThreadArgs
{
uint32_t threadNum;
TaskScheduler* pTaskScheduler;
};
}
namespace
{
SubTaskSet SplitTask( SubTaskSet& subTask_, uint32_t rangeToSplit_ )
{
SubTaskSet splitTask = subTask_;
uint32_t rangeLeft = subTask_.partition.end - subTask_.partition.start;
if( rangeToSplit_ > rangeLeft )
{
rangeToSplit_ = rangeLeft;
}
splitTask.partition.end = subTask_.partition.start + rangeToSplit_;
subTask_.partition.start = splitTask.partition.end;
return splitTask;
}
#if defined _WIN32
#if defined _M_IX86 || defined _M_X64
#pragma intrinsic(_mm_pause)
inline void Pause() { _mm_pause(); }
#endif
#elif defined __i386__ || defined __x86_64__
inline void Pause() { __asm__ __volatile__("pause;"); }
#else
inline void Pause() { ;} // may have NOP or yield equiv
#endif
}
static void SafeCallback(ProfilerCallbackFunc func_, uint32_t threadnum_)
{
if( func_ )
{
func_(threadnum_);
}
}
ProfilerCallbacks* TaskScheduler::GetProfilerCallbacks()
{
return &m_ProfilerCallbacks;
}
THREADFUNC_DECL TaskScheduler::TaskingThreadFunction( void* pArgs )
{
ThreadArgs args = *(ThreadArgs*)pArgs;
uint32_t threadNum = args.threadNum;
TaskScheduler* pTS = args.pTaskScheduler;
gtl_threadNum = threadNum;
SafeCallback( pTS->m_ProfilerCallbacks.threadStart, threadNum );
uint32_t spinCount = 0;
uint32_t hintPipeToCheck_io = threadNum + 1; // does not need to be clamped.
while( pTS->m_bRunning )
{
if(!pTS->TryRunTask( threadNum, hintPipeToCheck_io ) )
{
// no tasks, will spin then wait
++spinCount;
if( spinCount > SPIN_COUNT )
{
pTS->WaitForTasks( threadNum );
spinCount = 0;
}
else
{
uint32_t spinBackoffCount = spinCount * SPIN_BACKOFF_MULTIPLIER;
while( spinBackoffCount )
{
Pause();
--spinBackoffCount;
}
}
}
else
{
spinCount = 0;
}
}
AtomicAdd( &pTS->m_NumThreadsRunning, -1 );
SafeCallback( pTS->m_ProfilerCallbacks.threadStop, threadNum );
return 0;
}
void TaskScheduler::StartThreads()
{
if( m_bHaveThreads )
{
return;
}
m_bRunning = true;
SemaphoreCreate( m_NewTaskSemaphore );
// we create one less thread than m_NumThreads as the main thread counts as one
m_pThreadNumStore = new ThreadArgs[m_NumThreads];
m_pThreadIDs = new threadid_t[m_NumThreads];
m_pThreadNumStore[0].threadNum = 0;
m_pThreadNumStore[0].pTaskScheduler = this;
m_pThreadIDs[0] = 0;
m_NumThreadsWaiting = 0;
m_NumThreadsRunning = 1;// account for main thread
for( uint32_t thread = 1; thread < m_NumThreads; ++thread )
{
m_pThreadNumStore[thread].threadNum = thread;
m_pThreadNumStore[thread].pTaskScheduler = this;
ThreadCreate( &m_pThreadIDs[thread], TaskingThreadFunction, &m_pThreadNumStore[thread] );
++m_NumThreadsRunning;
}
// ensure we have sufficient tasks to equally fill either all threads including main
// or just the threads we've launched; this is outside the first-time init as we want
// to be able to change it at runtime
if( 1 == m_NumThreads )
{
m_NumPartitions = 1;
m_NumInitialPartitions = 1;
}
else
{
m_NumPartitions = m_NumThreads * (m_NumThreads - 1);
m_NumInitialPartitions = m_NumThreads - 1;
if( m_NumInitialPartitions > MAX_NUM_INITIAL_PARTITIONS )
{
m_NumInitialPartitions = MAX_NUM_INITIAL_PARTITIONS;
}
}
m_bHaveThreads = true;
}
void TaskScheduler::StopThreads( bool bWait_ )
{
if( m_bHaveThreads )
{
// wait for the threads to quit before deleting data
m_bRunning = false;
while( bWait_ && m_NumThreadsRunning > 1 )
{
// keep firing event to ensure all threads pick up state of m_bRunning
SemaphoreSignal( m_NewTaskSemaphore, m_NumThreadsRunning );
}
for( uint32_t thread = 1; thread < m_NumThreads; ++thread )
{
ThreadTerminate( m_pThreadIDs[thread] );
}
m_NumThreads = 0;
delete[] m_pThreadNumStore;
delete[] m_pThreadIDs;
m_pThreadNumStore = 0;
m_pThreadIDs = 0;
SemaphoreClose( m_NewTaskSemaphore );
m_bHaveThreads = false;
m_NumThreadsWaiting = 0;
m_NumThreadsRunning = 0;
}
}
bool TaskScheduler::TryRunTask( uint32_t threadNum, uint32_t& hintPipeToCheck_io_ )
{
// check for tasks
SubTaskSet subTask;
bool bHaveTask = m_pPipesPerThread[ threadNum ].WriterTryReadFront( &subTask );
uint32_t threadToCheck = hintPipeToCheck_io_;
uint32_t checkCount = 0;
while( !bHaveTask && checkCount < m_NumThreads )
{
threadToCheck = ( hintPipeToCheck_io_ + checkCount ) % m_NumThreads;
if( threadToCheck != threadNum )
{
bHaveTask = m_pPipesPerThread[ threadToCheck ].ReaderTryReadBack( &subTask );
}
++checkCount;
}
if( bHaveTask )
{
// update hint, will preserve value unless actually got task from another thread.
hintPipeToCheck_io_ = threadToCheck;
uint32_t partitionSize = subTask.partition.end - subTask.partition.start;
if( subTask.pTask->m_RangeToRun < partitionSize )
{
SubTaskSet taskToRun = SplitTask( subTask, subTask.pTask->m_RangeToRun );
SplitAndAddTask( gtl_threadNum, subTask, subTask.pTask->m_RangeToRun, 0 );
taskToRun.pTask->ExecuteRange( taskToRun.partition, threadNum );
AtomicAdd( &taskToRun.pTask->m_RunningCount, -1 );
}
else
{
// the task has already been divided up by AddTaskSetToPipe, so just run it
subTask.pTask->ExecuteRange( subTask.partition, threadNum );
AtomicAdd( &subTask.pTask->m_RunningCount, -1 );
}
}
return bHaveTask;
}
void TaskScheduler::WaitForTasks( uint32_t threadNum )
{
// We increment the number of threads waiting here in order
// to ensure that the check for tasks occurs after the increment
// to prevent a task being added after a check, then the thread waiting.
// This will occasionally result in threads being mistakenly awoken,
// but they will then go back to sleep.
AtomicAdd( &m_NumThreadsWaiting, 1 );
bool bHaveTasks = false;
for( uint32_t thread = 0; thread < m_NumThreads; ++thread )
{
if( !m_pPipesPerThread[ thread ].IsPipeEmpty() )
{
bHaveTasks = true;
break;
}
}
if( !bHaveTasks )
{
SafeCallback( m_ProfilerCallbacks.waitStart, threadNum );
SemaphoreWait( m_NewTaskSemaphore );
SafeCallback( m_ProfilerCallbacks.waitStop, threadNum );
}
int32_t prev = AtomicAdd( &m_NumThreadsWaiting, -1 );
assert( prev != 0 );
}
void TaskScheduler::WakeThreads()
{
SemaphoreSignal( m_NewTaskSemaphore, m_NumThreadsWaiting );
}
void TaskScheduler::SplitAndAddTask( uint32_t threadNum_, SubTaskSet subTask_,
uint32_t rangeToSplit_, int32_t runningCountOffset_ )
{
int32_t numAdded = 0;
while( subTask_.partition.start != subTask_.partition.end )
{
SubTaskSet taskToAdd = SplitTask( subTask_, rangeToSplit_ );
// add the partition to the pipe
++numAdded;
if( !m_pPipesPerThread[ gtl_threadNum ].WriterTryWriteFront( taskToAdd ) )
{
if( numAdded > 1 )
{
WakeThreads();
}
// alter range to run the appropriate fraction
if( taskToAdd.pTask->m_RangeToRun < rangeToSplit_ )
{
taskToAdd.partition.end = taskToAdd.partition.start + taskToAdd.pTask->m_RangeToRun;
subTask_.partition.start = taskToAdd.partition.end;
}
taskToAdd.pTask->ExecuteRange( taskToAdd.partition, threadNum_ );
--numAdded;
}
}
// increment running count by number added
AtomicAdd( &subTask_.pTask->m_RunningCount, numAdded + runningCountOffset_ );
WakeThreads();
}
void TaskScheduler::AddTaskSetToPipe( ITaskSet* pTaskSet )
{
// set running count to -1 to guarantee it won't be found complete until all subtasks added
pTaskSet->m_RunningCount = -1;
// divide task up and add to pipe
pTaskSet->m_RangeToRun = pTaskSet->m_SetSize / m_NumPartitions;
if( pTaskSet->m_RangeToRun < pTaskSet->m_MinRange ) { pTaskSet->m_RangeToRun = pTaskSet->m_MinRange; }
uint32_t rangeToSplit = pTaskSet->m_SetSize / m_NumInitialPartitions;
if( rangeToSplit < pTaskSet->m_MinRange ) { rangeToSplit = pTaskSet->m_MinRange; }
SubTaskSet subTask;
subTask.pTask = pTaskSet;
subTask.partition.start = 0;
subTask.partition.end = pTaskSet->m_SetSize;
SplitAndAddTask( gtl_threadNum, subTask, rangeToSplit, 1 );
}
void TaskScheduler::WaitforTaskSet( const ITaskSet* pTaskSet )
{
uint32_t hintPipeToCheck_io = gtl_threadNum + 1; // does not need to be clamped.
if( pTaskSet )
{
while( pTaskSet->m_RunningCount )
{
TryRunTask( gtl_threadNum, hintPipeToCheck_io );
// TODO: spin briefly, then wait on a task completion event instead of polling.
}
}
else
{
TryRunTask( gtl_threadNum, hintPipeToCheck_io );
}
}
void TaskScheduler::WaitforAll()
{
bool bHaveTasks = true;
uint32_t hintPipeToCheck_io = gtl_threadNum + 1; // does not need to be clamped.
int32_t threadsRunning = m_NumThreadsRunning - 1;
while( bHaveTasks || m_NumThreadsWaiting < threadsRunning )
{
TryRunTask( gtl_threadNum, hintPipeToCheck_io );
bHaveTasks = false;
for( uint32_t thread = 0; thread < m_NumThreads; ++thread )
{
if( !m_pPipesPerThread[ thread ].IsPipeEmpty() )
{
bHaveTasks = true;
break;
}
}
}
}
void TaskScheduler::WaitforAllAndShutdown()
{
WaitforAll();
StopThreads(true);
delete[] m_pPipesPerThread;
m_pPipesPerThread = 0;
}
uint32_t TaskScheduler::GetNumTaskThreads() const
{
return m_NumThreads;
}
TaskScheduler::TaskScheduler()
: m_pPipesPerThread(NULL)
, m_NumThreads(0)
, m_pThreadNumStore(NULL)
, m_pThreadIDs(NULL)
, m_bRunning(false)
, m_NumThreadsRunning(0)
, m_NumThreadsWaiting(0)
, m_NumPartitions(0)
, m_NumInitialPartitions(0)
, m_bHaveThreads(false)
{
memset(&m_ProfilerCallbacks, 0, sizeof(m_ProfilerCallbacks));
}
TaskScheduler::~TaskScheduler()
{
StopThreads( true ); // Stops threads, waiting for them.
delete[] m_pPipesPerThread;
m_pPipesPerThread = 0;
}
void TaskScheduler::Initialize( uint32_t numThreads_ )
{
assert( numThreads_ );
StopThreads( true ); // Stops threads, waiting for them.
delete[] m_pPipesPerThread;
m_NumThreads = numThreads_;
m_pPipesPerThread = new TaskPipe[ m_NumThreads ];
StartThreads();
}
void TaskScheduler::Initialize()
{
Initialize( GetNumHardwareThreads() );
}
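The range arithmetic in AddTaskSetToPipe and SplitAndAddTask above can be illustrated in isolation. What follows is a minimal standalone sketch, not the enkiTS implementation: `Partition` and `SplitIntoPartitions` are hypothetical names, and SplitTask (which is not shown in this diff) is assumed to peel a chunk of at most `rangeToSplit` items off the front of the remaining range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for enki::TaskSetPartition: the half-open range [start, end).
struct Partition
{
    uint32_t start;
    uint32_t end;
};

// Splits [0, setSize) into chunks of at most max( setSize / numPartitions, minRange ) items.
// Assumption: chunks are taken from the front until the set is exhausted.
std::vector<Partition> SplitIntoPartitions( uint32_t setSize, uint32_t numPartitions, uint32_t minRange )
{
    assert( numPartitions > 0 && minRange > 0 );
    uint32_t rangeToSplit = std::max( setSize / numPartitions, minRange );
    std::vector<Partition> out;
    uint32_t start = 0;
    while( start != setSize )
    {
        // clamp so that the last partition ends exactly at setSize
        uint32_t end = start + std::min( rangeToSplit, setSize - start );
        out.push_back( Partition{ start, end } );
        start = end;
    }
    return out;
}
```

Under these assumptions, a set of 10 items with 4 partitions and a minimum range of 4 yields [0,4), [4,8) and [8,10), so only the last partition is shorter than the minimum range.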

@@ -0,0 +1,177 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#include <stdint.h>
#include "Threads.h"
namespace enki
{
struct TaskSetPartition
{
uint32_t start;
uint32_t end;
};
class TaskScheduler;
class TaskPipe;
struct ThreadArgs;
struct SubTaskSet;
// Subclass ITaskSet to create tasks.
// TaskSets can be re-used, but check GetIsComplete() first.
class ITaskSet
{
public:
ITaskSet()
: m_SetSize(1)
, m_MinRange(1)
, m_RunningCount(0)
, m_RangeToRun(1)
{}
ITaskSet( uint32_t setSize_ )
: m_SetSize( setSize_ )
, m_MinRange(1)
, m_RunningCount(0)
, m_RangeToRun(1)
{}
ITaskSet( uint32_t setSize_, uint32_t minRange_ )
: m_SetSize( setSize_ )
, m_MinRange( minRange_ )
, m_RunningCount(0)
, m_RangeToRun(minRange_)
{}
// ExecuteRange should be overridden to process tasks. It will be called with a
// range where range.start >= 0; range.start < range.end; and range.end <= m_SetSize.
// The range values should be mapped so that linearly processing them in order is cache friendly,
// i.e. neighbouring values should be close together.
// threadnum should not be used for changing processing of data; its intended purpose
// is to allow per-thread data buckets for output.
virtual void ExecuteRange( TaskSetPartition range, uint32_t threadnum ) = 0;
// Size of set - usually the number of data items to be processed, see ExecuteRange. Defaults to 1
uint32_t m_SetSize;
// Minimum size of a TaskSetPartition range when splitting a task set into partitions.
// This should be set to a value which results in computation effort of at least 10k
// clock cycles to minimize task scheduler overhead.
// NOTE: The last partition will be smaller than m_MinRange if m_SetSize is not a multiple
// of m_MinRange.
// Also known as grain size in literature.
uint32_t m_MinRange;
bool GetIsComplete()
{
return 0 == m_RunningCount;
}
private:
friend class TaskScheduler;
volatile int32_t m_RunningCount;
uint32_t m_RangeToRun;
};
// TaskScheduler implements several callbacks intended for profilers
typedef void (*ProfilerCallbackFunc)( uint32_t threadnum_ );
struct ProfilerCallbacks
{
ProfilerCallbackFunc threadStart;
ProfilerCallbackFunc threadStop;
ProfilerCallbackFunc waitStart;
ProfilerCallbackFunc waitStop;
};
class TaskScheduler
{
public:
TaskScheduler();
~TaskScheduler();
// Call either Initialize() or Initialize( numThreads_ ) before adding tasks.
// Initialize() will create GetNumHardwareThreads()-1 threads, which is
// sufficient to fill the system when including the main thread.
// Initialize can be called multiple times - it will wait for completion
// before re-initializing.
void Initialize();
// Initialize( numThreads_ ) - numThreads_ (must be > 0)
// will create numThreads_-1 threads, as thread 0 is
// the thread on which the initialize was called.
void Initialize( uint32_t numThreads_ );
// Adds the TaskSet to pipe and returns if the pipe is not full.
// If the pipe is full, pTaskSet is run.
// Should only be called from the main thread, or within a task.
void AddTaskSetToPipe( ITaskSet* pTaskSet );
// Runs the TaskSets in pipe until true == pTaskSet->GetIsComplete().
// Should only be called from the thread which created the task scheduler, or within a task.
// If called with 0 it will try to run tasks, and return if none are available.
void WaitforTaskSet( const ITaskSet* pTaskSet );
// Waits for all task sets to complete - not guaranteed to work unless we know we
// are in a situation where tasks aren't being continuously added.
void WaitforAll();
// Waits for all task sets to complete and shuts down threads - not guaranteed to work unless we know we
// are in a situation where tasks aren't being continuously added.
void WaitforAllAndShutdown();
// Returns the number of threads created for running tasks + 1
// to account for the main thread.
uint32_t GetNumTaskThreads() const;
// Returns the ProfilerCallbacks structure so that it can be modified to
// set the callbacks.
ProfilerCallbacks* GetProfilerCallbacks();
private:
static THREADFUNC_DECL TaskingThreadFunction( void* pArgs );
void WaitForTasks( uint32_t threadNum );
bool TryRunTask( uint32_t threadNum, uint32_t& hintPipeToCheck_io_ );
void StartThreads();
void StopThreads( bool bWait_ );
void SplitAndAddTask( uint32_t threadNum_, SubTaskSet subTask_,
uint32_t rangeToSplit_, int32_t runningCountOffset_ );
void WakeThreads();
TaskPipe* m_pPipesPerThread;
uint32_t m_NumThreads;
ThreadArgs* m_pThreadNumStore;
threadid_t* m_pThreadIDs;
volatile bool m_bRunning;
volatile int32_t m_NumThreadsRunning;
volatile int32_t m_NumThreadsWaiting;
uint32_t m_NumPartitions;
uint32_t m_NumInitialPartitions;
semaphoreid_t m_NewTaskSemaphore;
bool m_bHaveThreads;
ProfilerCallbacks m_ProfilerCallbacks;
TaskScheduler( const TaskScheduler& nocopy );
TaskScheduler& operator=( const TaskScheduler& nocopy );
};
}

@@ -0,0 +1,122 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#include "TaskScheduler_c.h"
#include "TaskScheduler.h"
#include <assert.h>
using namespace enki;
struct enkiTaskScheduler : TaskScheduler
{
};
struct enkiTaskSet : ITaskSet
{
enkiTaskSet( enkiTaskExecuteRange taskFun_ ) : taskFun(taskFun_), pArgs(NULL) {}
virtual void ExecuteRange( TaskSetPartition range, uint32_t threadnum )
{
taskFun( range.start, range.end, threadnum, pArgs );
}
enkiTaskExecuteRange taskFun;
void* pArgs;
};
enkiTaskScheduler* enkiNewTaskScheduler()
{
enkiTaskScheduler* pETS = new enkiTaskScheduler();
return pETS;
}
void enkiInitTaskScheduler( enkiTaskScheduler* pETS_ )
{
pETS_->Initialize();
}
void enkiInitTaskSchedulerNumThreads( enkiTaskScheduler* pETS_, uint32_t numThreads_ )
{
pETS_->Initialize( numThreads_ );
}
void enkiDeleteTaskScheduler( enkiTaskScheduler* pETS_ )
{
delete pETS_;
}
enkiTaskSet* enkiCreateTaskSet( enkiTaskScheduler* pETS_, enkiTaskExecuteRange taskFunc_ )
{
return new enkiTaskSet( taskFunc_ );
}
void enkiDeleteTaskSet( enkiTaskSet* pTaskSet_ )
{
delete pTaskSet_;
}
void enkiAddTaskSetToPipe( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_, void* pArgs_, uint32_t setSize_ )
{
assert( pTaskSet_ );
assert( pTaskSet_->taskFun );
pTaskSet_->m_SetSize = setSize_;
pTaskSet_->pArgs = pArgs_;
pETS_->AddTaskSetToPipe( pTaskSet_ );
}
void enkiAddTaskSetToPipeMinRange(enkiTaskScheduler * pETS_, enkiTaskSet * pTaskSet_, void * pArgs_, uint32_t setSize_, uint32_t minRange_)
{
assert( pTaskSet_ );
assert( pTaskSet_->taskFun );
pTaskSet_->m_SetSize = setSize_;
pTaskSet_->m_MinRange = minRange_;
pTaskSet_->pArgs = pArgs_;
pETS_->AddTaskSetToPipe( pTaskSet_ );
}
int enkiIsTaskSetComplete( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_ )
{
assert( pTaskSet_ );
return ( pTaskSet_->GetIsComplete() ) ? 1 : 0;
}
void enkiWaitForTaskSet( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_ )
{
pETS_->WaitforTaskSet( pTaskSet_ );
}
void enkiWaitForAll( enkiTaskScheduler* pETS_ )
{
pETS_->WaitforAll();
}
uint32_t enkiGetNumTaskThreads( enkiTaskScheduler* pETS_ )
{
return pETS_->GetNumTaskThreads();
}
enkiProfilerCallbacks* enkiGetProfilerCallbacks( enkiTaskScheduler* pETS_ )
{
assert( sizeof(enkiProfilerCallbacks) == sizeof(enki::ProfilerCallbacks) );
return (enkiProfilerCallbacks*)pETS_->GetProfilerCallbacks();
}

@@ -0,0 +1,104 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#ifdef __cplusplus
extern "C" {
#endif
#include <stdint.h>
typedef struct enkiTaskScheduler enkiTaskScheduler;
typedef struct enkiTaskSet enkiTaskSet;
typedef void (* enkiTaskExecuteRange)( uint32_t start_, uint32_t end_, uint32_t threadnum_, void* pArgs_ );
// Create a new task scheduler
enkiTaskScheduler* enkiNewTaskScheduler();
// Initialize task scheduler - will create GetNumHardwareThreads()-1 threads, which is
// sufficient to fill the system when including the main thread.
// Initialize can be called multiple times - it will wait for completion
// before re-initializing.
void enkiInitTaskScheduler( enkiTaskScheduler* pETS_ );
// Initialize a task scheduler with numThreads_ (must be > 0).
// This will create numThreads_-1 threads, as thread 0 is
// the thread on which the initialize was called.
void enkiInitTaskSchedulerNumThreads( enkiTaskScheduler* pETS_, uint32_t numThreads_ );
// Delete a task scheduler
void enkiDeleteTaskScheduler( enkiTaskScheduler* pETS_ );
// Create a task set.
enkiTaskSet* enkiCreateTaskSet( enkiTaskScheduler* pETS_, enkiTaskExecuteRange taskFunc_ );
// Delete a task set.
void enkiDeleteTaskSet( enkiTaskSet* pTaskSet_ );
// Schedule the task
void enkiAddTaskSetToPipe( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_,
void* pArgs_, uint32_t setSize_ );
// Schedule the task with a minimum range.
// This should be set to a value which results in computation effort of at least 10k
// clock cycles to minimize task scheduler overhead.
// NOTE: The last partition will be smaller than minRange_ if setSize_ is not a multiple
// of minRange_.
// Also known as grain size in literature.
void enkiAddTaskSetToPipeMinRange( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_,
void* pArgs_, uint32_t setSize_, uint32_t minRange_ );
// Check if TaskSet is complete. Doesn't wait. Returns 1 if complete, 0 if not.
int enkiIsTaskSetComplete( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_ );
// Wait for a given task.
// Should only be called from the thread which created the task scheduler, or within a task.
// If called with 0 it will try to run tasks, and return if none are available.
void enkiWaitForTaskSet( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_ );
// Waits for all task sets to complete - not guaranteed to work unless we know we
// are in a situation where tasks aren't being continuously added.
void enkiWaitForAll( enkiTaskScheduler* pETS_ );
// Returns the number of threads created for running tasks + 1
// to account for the main thread.
uint32_t enkiGetNumTaskThreads( enkiTaskScheduler* pETS_ );
// TaskScheduler implements several callbacks intended for profilers
typedef void (*enkiProfilerCallbackFunc)( uint32_t threadnum_ );
struct enkiProfilerCallbacks
{
enkiProfilerCallbackFunc threadStart;
enkiProfilerCallbackFunc threadStop;
enkiProfilerCallbackFunc waitStart;
enkiProfilerCallbackFunc waitStop;
};
// Get the callback structure so it can be set
struct enkiProfilerCallbacks* enkiGetProfilerCallbacks( enkiTaskScheduler* pETS_ );
#ifdef __cplusplus
}
#endif

@@ -0,0 +1,210 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#include <stdint.h>
#include <assert.h>
#ifdef _WIN32
#include "Atomics.h"
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#define THREADFUNC_DECL DWORD WINAPI
#define THREAD_LOCAL __declspec( thread )
namespace enki
{
typedef HANDLE threadid_t;
// declare the thread start function as:
// THREADFUNC_DECL MyThreadStart( void* pArg );
inline bool ThreadCreate( threadid_t* returnid, DWORD ( WINAPI *StartFunc) (void* ), void* pArg )
{
// posix equiv pthread_create
DWORD threadid;
*returnid = CreateThread( 0, 0, StartFunc, pArg, 0, &threadid );
return *returnid != NULL;
}
inline bool ThreadTerminate( threadid_t threadid )
{
// Closes the thread handle; unlike pthread_cancel this does not stop the thread.
// CloseHandle returns non-zero on success.
return CloseHandle( threadid ) != 0;
}
inline uint32_t GetNumHardwareThreads()
{
SYSTEM_INFO sysInfo;
GetSystemInfo(&sysInfo);
return sysInfo.dwNumberOfProcessors;
}
}
#else // posix
#include <pthread.h>
#include <unistd.h>
#define THREADFUNC_DECL void*
#define THREAD_LOCAL __thread
namespace enki
{
typedef pthread_t threadid_t;
// declare the thread start function as:
// THREADFUNC_DECL MyThreadStart( void* pArg );
inline bool ThreadCreate( threadid_t* returnid, void* ( *StartFunc) (void* ), void* pArg )
{
// posix equiv pthread_create
int32_t retval = pthread_create( returnid, NULL, StartFunc, pArg );
return retval == 0;
}
inline bool ThreadTerminate( threadid_t threadid )
{
// posix equiv pthread_cancel
return pthread_cancel( threadid ) == 0;
}
inline uint32_t GetNumHardwareThreads()
{
return (uint32_t)sysconf( _SC_NPROCESSORS_ONLN );
}
}
#endif // posix
// Semaphore implementation
#ifdef _WIN32
namespace enki
{
struct semaphoreid_t
{
HANDLE sem;
};
inline void SemaphoreCreate( semaphoreid_t& semaphoreid )
{
semaphoreid.sem = CreateSemaphore(NULL, 0, MAXLONG, NULL );
}
inline void SemaphoreClose( semaphoreid_t& semaphoreid )
{
CloseHandle( semaphoreid.sem );
}
inline void SemaphoreWait( semaphoreid_t& semaphoreid )
{
DWORD retval = WaitForSingleObject( semaphoreid.sem, INFINITE );
assert( retval != WAIT_FAILED );
}
inline void SemaphoreSignal( semaphoreid_t& semaphoreid, int32_t countWaiting )
{
if( countWaiting )
{
ReleaseSemaphore( semaphoreid.sem, countWaiting, NULL );
}
}
}
#elif defined(__MACH__)
// OS X does not support unnamed POSIX semaphores (sem_init), so Mach semaphores are used instead
// see https://developer.apple.com/library/content/documentation/Darwin/Conceptual/KernelProgramming/synchronization/synchronization.html
#include <mach/mach.h>
namespace enki
{
struct semaphoreid_t
{
semaphore_t sem;
};
inline void SemaphoreCreate( semaphoreid_t& semaphoreid )
{
semaphore_create( mach_task_self(), &semaphoreid.sem, SYNC_POLICY_FIFO, 0 );
}
inline void SemaphoreClose( semaphoreid_t& semaphoreid )
{
semaphore_destroy( mach_task_self(), semaphoreid.sem );
}
inline void SemaphoreWait( semaphoreid_t& semaphoreid )
{
semaphore_wait( semaphoreid.sem );
}
inline void SemaphoreSignal( semaphoreid_t& semaphoreid, int32_t countWaiting )
{
while( countWaiting-- > 0 )
{
semaphore_signal( semaphoreid.sem );
}
}
}
#else // POSIX
#include <semaphore.h>
namespace enki
{
struct semaphoreid_t
{
sem_t sem;
};
inline void SemaphoreCreate( semaphoreid_t& semaphoreid )
{
int err = sem_init( &semaphoreid.sem, 0, 0 );
assert( err == 0 );
}
inline void SemaphoreClose( semaphoreid_t& semaphoreid )
{
sem_destroy( &semaphoreid.sem );
}
inline void SemaphoreWait( semaphoreid_t& semaphoreid )
{
int err = sem_wait( &semaphoreid.sem );
assert( err == 0 );
}
inline void SemaphoreSignal( semaphoreid_t& semaphoreid, int32_t countWaiting )
{
while( countWaiting-- > 0 )
{
sem_post( &semaphoreid.sem );
}
}
}
#endif
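The three platform branches above expose the same small interface: create, close, wait, and signal with a count. As a rough illustration of the semantics they share, here is a portable sketch built on the C++11 standard library. `PortableSemaphore` is a hypothetical name and is not part of enkiTS; the `Count()` accessor exists only so the behaviour can be inspected.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical counting semaphore with the same Wait / Signal( count ) shape as above.
class PortableSemaphore
{
public:
    // Blocks until the count is positive, then consumes one unit.
    void Wait()
    {
        std::unique_lock<std::mutex> lock( m_Mutex );
        m_Cond.wait( lock, [this] { return m_Count > 0; } );
        --m_Count;
    }
    // Adds countWaiting units and wakes waiters; a count of 0 is a no-op.
    void Signal( int32_t countWaiting )
    {
        assert( countWaiting >= 0 );
        if( countWaiting == 0 ) { return; }
        {
            std::lock_guard<std::mutex> lock( m_Mutex );
            m_Count += countWaiting;
        }
        m_Cond.notify_all();
    }
    // Inspection helper for this sketch only.
    int32_t Count()
    {
        std::lock_guard<std::mutex> lock( m_Mutex );
        return m_Count;
    }
private:
    std::mutex m_Mutex;
    std::condition_variable m_Cond;
    int32_t m_Count = 0;
};
```

Signalling more units than there are waiters leaves the surplus in the count, so a later Wait returns immediately. This matches the note in WaitForTasks that threads may occasionally be woken by mistake and then go back to sleep.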

@@ -0,0 +1,395 @@
#include "../Source/Config.h"
inline uint RNG(inout uint state)
{
uint x = state;
x ^= x << 13;
x ^= x >> 17;
x ^= x << 15;
state = x;
return x;
}
float RandomFloat01(inout uint state)
{
return (RNG(state) & 0xFFFFFF) / 16777216.0f;
}
float3 RandomInUnitDisk(inout uint state)
{
float a = RandomFloat01(state) * 2.0f * 3.1415926f;
float2 xy = float2(cos(a), sin(a));
xy *= sqrt(RandomFloat01(state));
return float3(xy, 0);
}
float3 RandomInUnitSphere(inout uint state)
{
float z = RandomFloat01(state) * 2.0f - 1.0f;
float t = RandomFloat01(state) * 2.0f * 3.1415926f;
float r = sqrt(max(0.0, 1.0f - z * z));
float x = r * cos(t);
float y = r * sin(t);
float3 res = float3(x, y, z);
res *= pow(RandomFloat01(state), 1.0 / 3.0);
return res;
}
float3 RandomUnitVector(inout uint state)
{
float z = RandomFloat01(state) * 2.0f - 1.0f;
float a = RandomFloat01(state) * 2.0f * 3.1415926f;
float r = sqrt(1.0f - z * z);
float x = r * cos(a);
float y = r * sin(a);
return float3(x, y, z);
}
struct Ray
{
float3 orig;
float3 dir;
};
Ray MakeRay(float3 orig_, float3 dir_) { Ray r; r.orig = orig_; r.dir = dir_; return r; }
float3 RayPointAt(Ray r, float t) { return r.orig + r.dir * t; }
inline bool refract(float3 v, float3 n, float nint, out float3 outRefracted)
{
float dt = dot(v, n);
float discr = 1.0f - nint * nint*(1 - dt * dt);
if (discr > 0)
{
outRefracted = nint * (v - n * dt) - n * sqrt(discr);
return true;
}
return false;
}
inline float schlick(float cosine, float ri)
{
float r0 = (1 - ri) / (1 + ri);
r0 = r0 * r0;
// note: saturate to guard against possible tiny negative numbers
return r0 + (1 - r0)*pow(saturate(1 - cosine), 5);
}
struct Hit
{
float3 pos;
float3 normal;
float t;
};
struct Sphere
{
float3 center;
float radius;
float invRadius;
};
#define MatLambert 0
#define MatMetal 1
#define MatDielectric 2
struct Material
{
int type;
float3 albedo;
float3 emissive;
float roughness;
float ri;
};
groupshared Sphere s_GroupSpheres[kCSMaxObjects];
groupshared Material s_GroupMaterials[kCSMaxObjects];
groupshared int s_GroupEmissives[kCSMaxObjects];
struct Camera
{
float3 origin;
float3 lowerLeftCorner;
float3 horizontal;
float3 vertical;
float3 u, v, w;
float lensRadius;
};
Ray CameraGetRay(Camera cam, float s, float t, inout uint state)
{
float3 rd = cam.lensRadius * RandomInUnitDisk(state);
float3 offset = cam.u * rd.x + cam.v * rd.y;
return MakeRay(cam.origin + offset, normalize(cam.lowerLeftCorner + s * cam.horizontal + t * cam.vertical - cam.origin - offset));
}
int HitSpheres(Ray r, int sphereCount, float tMin, float tMax, inout Hit outHit)
{
float hitT = tMax;
int id = -1;
for (int i = 0; i < sphereCount; ++i)
{
Sphere s = s_GroupSpheres[i];
float3 co = s.center - r.orig;
float nb = dot(co, r.dir);
float c = dot(co, co) - s.radius*s.radius;
float discr = nb * nb - c;
if (discr > 0)
{
float discrSq = sqrt(discr);
// Try earlier t
float t = nb - discrSq;
if (t <= tMin) // before min, try later t!
t = nb + discrSq;
if (t > tMin && t < hitT)
{
id = i;
hitT = t;
}
}
}
if (id != -1)
{
outHit.pos = RayPointAt(r, hitT);
outHit.normal = (outHit.pos - s_GroupSpheres[id].center) * s_GroupSpheres[id].invRadius;
outHit.t = hitT;
}
return id;
}
struct Params
{
Camera cam;
int sphereCount;
int screenWidth;
int screenHeight;
int frames;
float invWidth;
float invHeight;
float lerpFac;
int emissiveCount;
};
#define kMinT 0.001f
#define kMaxT 1.0e7f
#define kMaxDepth 10
static int HitWorld(int sphereCount, Ray r, float tMin, float tMax, inout Hit outHit)
{
return HitSpheres(r, sphereCount, tMin, tMax, outHit);
}
static bool Scatter(int sphereCount, int emissiveCount, int matID, Ray r_in, Hit rec, out float3 attenuation, out Ray scattered, out float3 outLightE, inout int inoutRayCount, inout uint state)
{
outLightE = float3(0, 0, 0);
Material mat = s_GroupMaterials[matID];
if (mat.type == MatLambert)
{
// random point on unit sphere that is tangent to the hit point
float3 target = rec.pos + rec.normal + RandomUnitVector(state);
scattered = MakeRay(rec.pos, normalize(target - rec.pos));
attenuation = mat.albedo;
// sample lights
#if DO_LIGHT_SAMPLING
for (int j = 0; j < emissiveCount; ++j)
{
int i = s_GroupEmissives[j];
if (matID == i)
continue; // skip self
Material smat = s_GroupMaterials[i];
Sphere s = s_GroupSpheres[i];
// create a random direction towards sphere
// coord system for sampling: sw, su, sv
float3 sw = normalize(s.center - rec.pos);
float3 su = normalize(cross(abs(sw.x)>0.01f ? float3(0, 1, 0) : float3(1, 0, 0), sw));
float3 sv = cross(sw, su);
// sample sphere by solid angle
float cosAMax = sqrt(1.0f - s.radius*s.radius / dot(rec.pos - s.center, rec.pos - s.center));
float eps1 = RandomFloat01(state), eps2 = RandomFloat01(state);
float cosA = 1.0f - eps1 + eps1 * cosAMax;
float sinA = sqrt(1.0f - cosA * cosA);
float phi = 2 * 3.1415926 * eps2;
float3 l = su * cos(phi) * sinA + sv * sin(phi) * sinA + sw * cosA;
// shoot shadow ray
Hit lightHit;
++inoutRayCount;
int hitID = HitWorld(sphereCount, MakeRay(rec.pos, l), kMinT, kMaxT, lightHit);
if (hitID == i)
{
float omega = 2 * 3.1415926 * (1 - cosAMax);
float3 rdir = r_in.dir;
float3 nl = dot(rec.normal, rdir) < 0 ? rec.normal : -rec.normal;
outLightE += (mat.albedo * smat.emissive) * (max(0.0f, dot(l, nl)) * omega / 3.1415926);
}
}
#endif
return true;
}
else if (mat.type == MatMetal)
{
float3 refl = reflect(r_in.dir, rec.normal);
// reflected ray, and random inside of sphere based on roughness
float roughness = mat.roughness;
#if DO_MITSUBA_COMPARE
roughness = 0; // until we get better BRDF for metals
#endif
scattered = MakeRay(rec.pos, normalize(refl + roughness*RandomInUnitSphere(state)));
attenuation = mat.albedo;
return dot(scattered.dir, rec.normal) > 0;
}
else if (mat.type == MatDielectric)
{
float3 outwardN;
float3 rdir = r_in.dir;
float3 refl = reflect(rdir, rec.normal);
float nint;
attenuation = float3(1, 1, 1);
float3 refr;
float reflProb;
float cosine;
if (dot(rdir, rec.normal) > 0)
{
outwardN = -rec.normal;
nint = mat.ri;
cosine = mat.ri * dot(rdir, rec.normal);
}
else
{
outwardN = rec.normal;
nint = 1.0f / mat.ri;
cosine = -dot(rdir, rec.normal);
}
if (refract(rdir, outwardN, nint, refr))
{
reflProb = schlick(cosine, mat.ri);
}
else
{
reflProb = 1;
}
if (RandomFloat01(state) < reflProb)
scattered = MakeRay(rec.pos, normalize(refl));
else
scattered = MakeRay(rec.pos, normalize(refr));
}
else
{
attenuation = float3(1, 0, 1);
scattered = MakeRay(float3(0,0,0), float3(0, 0, 1));
return false;
}
return true;
}
static float3 Trace(int sphereCount, int emissiveCount, Ray r, inout int inoutRayCount, inout uint state)
{
float3 col = 0;
float3 curAtten = 1;
bool doMaterialE = true;
// GPUs don't support recursion, so do tracing iterations in a loop up to max depth
for (int depth = 0; depth < kMaxDepth; ++depth)
{
Hit rec;
++inoutRayCount;
int id = HitWorld(sphereCount, r, kMinT, kMaxT, rec);
if (id >= 0)
{
Ray scattered;
float3 attenuation;
float3 lightE;
Material mat = s_GroupMaterials[id];
float3 matE = mat.emissive;
if (Scatter(sphereCount, emissiveCount, id, r, rec, attenuation, scattered, lightE, inoutRayCount, state))
{
#if DO_LIGHT_SAMPLING
if (!doMaterialE) matE = 0;
doMaterialE = (mat.type != MatLambert);
#endif
col += curAtten * (matE + lightE);
curAtten *= attenuation;
r = scattered;
}
else
{
col += curAtten * matE;
break;
}
}
else
{
// sky
#if DO_MITSUBA_COMPARE
col += curAtten * float3(0.15f, 0.21f, 0.3f); // easier compare with Mitsuba's constant environment light
#else
float3 unitDir = r.dir;
float t = 0.5f*(unitDir.y + 1.0f);
float3 skyCol = ((1.0f - t)*float3(1.0f, 1.0f, 1.0f) + t * float3(0.5f, 0.7f, 1.0f)) * 0.3f;
col += curAtten * skyCol;
#endif
break;
}
}
return col;
}
Texture2D srcImage : register(t0);
RWTexture2D<float4> dstImage : register(u0);
StructuredBuffer<Sphere> g_Spheres : register(t1);
StructuredBuffer<Material> g_Materials : register(t2);
StructuredBuffer<Params> g_Params : register(t3);
StructuredBuffer<int> g_Emissives : register(t4);
RWByteAddressBuffer g_OutRayCount : register(u1);
[numthreads(kCSGroupSizeX, kCSGroupSizeY, 1)]
void main(uint3 gid : SV_DispatchThreadID, uint3 tid : SV_GroupThreadID)
{
// First, move scene data (spheres, materials, emissive indices) into group shared
// memory. Do this in parallel; each thread in group copies its own chunk of data.
uint threadID = tid.y * kCSGroupSizeX + tid.x;
uint groupSize = kCSGroupSizeX * kCSGroupSizeY;
uint objCount = g_Params[0].sphereCount;
uint myObjCount = (objCount + groupSize - 1) / groupSize;
uint myObjStart = threadID * myObjCount;
for (uint io = myObjStart; io < myObjStart + myObjCount; ++io)
{
if (io < objCount)
{
s_GroupSpheres[io] = g_Spheres[io];
s_GroupMaterials[io] = g_Materials[io];
}
if (io < g_Params[0].emissiveCount)
{
s_GroupEmissives[io] = g_Emissives[io];
}
}
GroupMemoryBarrierWithGroupSync();
int rayCount = 0;
float3 col = 0;
Params params = g_Params[0];
uint rngState = (gid.x * 1973 + gid.y * 9277 + params.frames * 26699) | 1;
for (int s = 0; s < DO_SAMPLES_PER_PIXEL; s++)
{
float u = float(gid.x + RandomFloat01(rngState)) * params.invWidth;
float v = float(gid.y + RandomFloat01(rngState)) * params.invHeight;
Ray r = CameraGetRay(params.cam, u, v, rngState);
col += Trace(params.sphereCount, params.emissiveCount, r, rayCount, rngState);
}
col *= 1.0f / float(DO_SAMPLES_PER_PIXEL);
float3 prev = srcImage.Load(int3(gid.xy,0)).rgb;
col = lerp(col, prev, params.lerpFac);
dstImage[gid.xy] = float4(col, 1);
g_OutRayCount.InterlockedAdd(0, rayCount);
}
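The RNG and RandomFloat01 helpers at the top of this shader, together with the per-pixel seeding in main, can be checked on the CPU. The following is a small C++ port for illustration only; `XorShift32` and `SeedForPixel` are hypothetical names chosen for this sketch, and the shift amounts and constants are copied from the shader as written.

```cpp
#include <cassert>
#include <cstdint>

// CPU port of the shader's RNG(): three xorshift steps with the same shift amounts.
// Each step is invertible, so a non-zero state never becomes zero.
inline uint32_t XorShift32( uint32_t& state )
{
    assert( state != 0 );
    uint32_t x = state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 15;
    state = x;
    return x;
}

// Keeps the low 24 bits and divides by 2^24, so the result lies in [0, 1).
inline float RandomFloat01( uint32_t& state )
{
    return float( XorShift32( state ) & 0xFFFFFF ) / 16777216.0f;
}

// Mirrors the seeding in main(); the "| 1" guarantees a non-zero starting state.
inline uint32_t SeedForPixel( uint32_t x, uint32_t y, uint32_t frame )
{
    return ( x * 1973u + y * 9277u + frame * 26699u ) | 1u;
}
```

Only 24 bits are kept because a 32-bit float has a 24-bit significand, so every value produced this way is exactly representable and strictly below 1.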

@@ -0,0 +1,15 @@
float3 LinearToSRGB(float3 rgb)
{
rgb = max(rgb, float3(0, 0, 0));
return max(1.055 * pow(rgb, 0.416666667) - 0.055, 0.0);
}
Texture2D tex : register(t0);
SamplerState smp : register(s0);
float4 main(float2 uv : TEXCOORD0) : SV_Target
{
float3 col = tex.Sample(smp, uv).rgb;
col = LinearToSRGB(col);
return float4(col, 1.0f);
}
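LinearToSRGB above applies only the power segment of the sRGB transfer curve and clamps the result at zero, leaving out the linear segment near black. A per-channel C++ sketch comparing it with the piecewise reference encoding is given below; both function names are hypothetical and exist only for this comparison.

```cpp
#include <algorithm>
#include <cmath>

// Same formula as the shader, per channel: power segment only, clamped at zero.
inline float LinearToSRGBApprox( float c )
{
    c = std::max( c, 0.0f );
    return std::max( 1.055f * std::pow( c, 0.416666667f ) - 0.055f, 0.0f );
}

// Reference piecewise sRGB encoding, with the linear segment below 0.0031308.
inline float LinearToSRGBExact( float c )
{
    c = std::max( c, 0.0f );
    return c <= 0.0031308f ? 12.92f * c
                           : 1.055f * std::pow( c, 1.0f / 2.4f ) - 0.055f;
}
```

The two functions agree above the 0.0031308 threshold and differ only for very dark values, where the approximation comes out slightly darker than the reference.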

@@ -0,0 +1,31 @@
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 15
VisualStudioVersion = 15.0.27130.2036
MinimumVisualStudioVersion = 10.0.40219.1
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "TestCpu", "TestCpu.vcxproj", "{4F84B756-87F5-4B92-827B-DA087DAE1900}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|x64 = Debug|x64
Debug|x86 = Debug|x86
Release|x64 = Release|x64
Release|x86 = Release|x86
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Debug|x64.ActiveCfg = Debug|x64
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Debug|x64.Build.0 = Debug|x64
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Debug|x86.ActiveCfg = Debug|Win32
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Debug|x86.Build.0 = Debug|Win32
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Release|x64.ActiveCfg = Release|x64
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Release|x64.Build.0 = Release|x64
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Release|x86.ActiveCfg = Release|Win32
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Release|x86.Build.0 = Release|Win32
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {067FB780-37B8-465E-AD7E-E7B238B9C04F}
EndGlobalSection
EndGlobal

@@ -0,0 +1,243 @@
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Label="ProjectConfigurations">
<ProjectConfiguration Include="Debug|Win32">
<Configuration>Debug</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|Win32">
<Configuration>Release</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Debug|x64">
<Configuration>Debug</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|x64">
<Configuration>Release</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<PropertyGroup Label="Globals">
<VCProjectVersion>15.0</VCProjectVersion>
<ProjectGuid>{4F84B756-87F5-4B92-827B-DA087DAE1900}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<RootNamespace>TestCpu</RootNamespace>
<WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
</ImportGroup>
<ImportGroup Label="Shared">
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<LinkIncremental>true</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<LinkIncremental>true</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<LinkIncremental>false</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<LinkIncremental>false</LinkIncremental>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<SDLCheck>true</SDLCheck>
<PreprocessorDefinitions>WIN32;_DEBUG;_WINDOWS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<CallingConvention>VectorCall</CallingConvention>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>d3d11.lib;kernel32.lib;user32.lib;gdi32.lib;%(AdditionalDependencies)</AdditionalDependencies>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<SDLCheck>true</SDLCheck>
<PreprocessorDefinitions>TRACY_ENABLE;_DEBUG;_WINDOWS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<CallingConvention>VectorCall</CallingConvention>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>d3d11.lib;kernel32.lib;user32.lib;gdi32.lib;%(AdditionalDependencies)</AdditionalDependencies>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_WINDOWS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<ExceptionHandling>false</ExceptionHandling>
<RuntimeLibrary>MultiThreaded</RuntimeLibrary>
<BufferSecurityCheck>false</BufferSecurityCheck>
<CallingConvention>VectorCall</CallingConvention>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>d3d11.lib;kernel32.lib;user32.lib;gdi32.lib;%(AdditionalDependencies)</AdditionalDependencies>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>TRACY_ENABLE;NDEBUG;_WINDOWS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<ExceptionHandling>false</ExceptionHandling>
<RuntimeLibrary>MultiThreaded</RuntimeLibrary>
<BufferSecurityCheck>false</BufferSecurityCheck>
<CallingConvention>VectorCall</CallingConvention>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>d3d11.lib;kernel32.lib;user32.lib;gdi32.lib;%(AdditionalDependencies)</AdditionalDependencies>
</Link>
</ItemDefinitionGroup>
<ItemGroup>
<ClCompile Include="..\..\..\TracyClient.cpp" />
<ClCompile Include="..\Source\enkiTS\TaskScheduler.cpp" />
<ClCompile Include="..\Source\enkiTS\TaskScheduler_c.cpp" />
<ClCompile Include="..\Source\Maths.cpp" />
<ClCompile Include="..\Source\Test.cpp" />
<ClCompile Include="TestWin.cpp" />
</ItemGroup>
<ItemGroup>
<ClInclude Include="..\Source\Config.h" />
<ClInclude Include="..\Source\enkiTS\Atomics.h" />
<ClInclude Include="..\Source\enkiTS\LockLessMultiReadPipe.h" />
<ClInclude Include="..\Source\enkiTS\TaskScheduler.h" />
<ClInclude Include="..\Source\enkiTS\TaskScheduler_c.h" />
<ClInclude Include="..\Source\enkiTS\Threads.h" />
<ClInclude Include="..\Source\Maths.h" />
<ClInclude Include="..\Source\MathSimd.h" />
<ClInclude Include="..\Source\Test.h" />
<ClInclude Include="..\Source\stb_image.h" />
</ItemGroup>
<ItemGroup>
<None Include="..\.editorconfig" />
</ItemGroup>
<ItemGroup>
<FxCompile Include="ComputeShader.hlsl">
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|x64'">5.0</ShaderModel>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">5.0</ShaderModel>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">5.0</ShaderModel>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">5.0</ShaderModel>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">g_CSBytecode</VariableName>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">CompiledComputeShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">g_CSBytecode</VariableName>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">CompiledComputeShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">g_CSBytecode</VariableName>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">CompiledComputeShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|x64'">g_CSBytecode</VariableName>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|x64'">CompiledComputeShader.h</HeaderFileOutput>
</FxCompile>
<FxCompile Include="PixelShader.hlsl">
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Pixel</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Pixel</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Pixel</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Pixel</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|x64'">5.0</ShaderModel>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">CompiledPixelShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">CompiledPixelShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">CompiledPixelShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|x64'">CompiledPixelShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">g_PSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">g_PSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">g_PSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|x64'">g_PSBytecode</VariableName>
</FxCompile>
<FxCompile Include="VertexShader.hlsl">
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Vertex</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Vertex</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Vertex</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Vertex</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|x64'">5.0</ShaderModel>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">CompiledVertexShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">CompiledVertexShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">CompiledVertexShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|x64'">CompiledVertexShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">g_VSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">g_VSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">g_VSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|x64'">g_VSBytecode</VariableName>
</FxCompile>
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
</Project>


@@ -0,0 +1,67 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup>
<ClCompile Include="TestWin.cpp" />
<ClCompile Include="..\Source\Test.cpp">
<Filter>Source</Filter>
</ClCompile>
<ClCompile Include="..\Source\enkiTS\TaskScheduler.cpp">
<Filter>Source\enkiTS</Filter>
</ClCompile>
<ClCompile Include="..\Source\enkiTS\TaskScheduler_c.cpp">
<Filter>Source\enkiTS</Filter>
</ClCompile>
<ClCompile Include="..\Source\Maths.cpp">
<Filter>Source</Filter>
</ClCompile>
<ClCompile Include="..\..\..\TracyClient.cpp" />
</ItemGroup>
<ItemGroup>
<Filter Include="Source">
<UniqueIdentifier>{5f19f217-c1c7-4eeb-be61-8b986fee9375}</UniqueIdentifier>
</Filter>
<Filter Include="Source\enkiTS">
<UniqueIdentifier>{38c448a8-1dcc-4116-9410-a9f8d068caff}</UniqueIdentifier>
</Filter>
</ItemGroup>
<ItemGroup>
<ClInclude Include="..\Source\Test.h">
<Filter>Source</Filter>
</ClInclude>
<ClInclude Include="..\Source\stb_image.h">
<Filter>Source</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\Atomics.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\LockLessMultiReadPipe.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\TaskScheduler.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\TaskScheduler_c.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\Threads.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\Maths.h">
<Filter>Source</Filter>
</ClInclude>
<ClInclude Include="..\Source\Config.h">
<Filter>Source</Filter>
</ClInclude>
<ClInclude Include="..\Source\MathSimd.h">
<Filter>Source</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<None Include="..\.editorconfig" />
</ItemGroup>
<ItemGroup>
<FxCompile Include="VertexShader.hlsl" />
<FxCompile Include="PixelShader.hlsl" />
<FxCompile Include="ComputeShader.hlsl" />
</ItemGroup>
</Project>


@@ -0,0 +1,557 @@
#include <stdint.h>
#define WIN32_LEAN_AND_MEAN
#define NOMINMAX
#include <windows.h>
#include <d3d11_1.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <algorithm>
#include "../Source/Config.h"
#include "../Source/Maths.h"
#include "../Source/Test.h"
#include "CompiledVertexShader.h"
#include "CompiledPixelShader.h"
#include "../../../Tracy.hpp"
static HINSTANCE g_HInstance;
static HWND g_Wnd;
ATOM MyRegisterClass(HINSTANCE hInstance);
BOOL InitInstance(HINSTANCE, int);
LRESULT CALLBACK WndProc(HWND, UINT, WPARAM, LPARAM);
INT_PTR CALLBACK About(HWND, UINT, WPARAM, LPARAM);
static HRESULT InitD3DDevice();
static void ShutdownD3DDevice();
static void RenderFrame();
static float* g_Backbuffer;
static D3D_FEATURE_LEVEL g_D3D11FeatureLevel = D3D_FEATURE_LEVEL_11_0;
static ID3D11Device* g_D3D11Device = nullptr;
static ID3D11DeviceContext* g_D3D11Ctx = nullptr;
static IDXGISwapChain* g_D3D11SwapChain = nullptr;
static ID3D11RenderTargetView* g_D3D11RenderTarget = nullptr;
static ID3D11VertexShader* g_VertexShader;
static ID3D11PixelShader* g_PixelShader;
static ID3D11Texture2D *g_BackbufferTexture, *g_BackbufferTexture2;
static ID3D11ShaderResourceView *g_BackbufferSRV, *g_BackbufferSRV2;
static ID3D11UnorderedAccessView *g_BackbufferUAV, *g_BackbufferUAV2;
static ID3D11SamplerState* g_SamplerLinear;
static ID3D11RasterizerState* g_RasterState;
static int g_BackbufferIndex;
#if DO_COMPUTE_GPU
#include "CompiledComputeShader.h"
struct ComputeParams
{
Camera cam;
int sphereCount;
int screenWidth;
int screenHeight;
int frames;
float invWidth;
float invHeight;
float lerpFac;
int emissiveCount;
};
static ID3D11ComputeShader* g_ComputeShader;
static ID3D11Buffer* g_DataSpheres; static ID3D11ShaderResourceView* g_SRVSpheres;
static ID3D11Buffer* g_DataMaterials; static ID3D11ShaderResourceView* g_SRVMaterials;
static ID3D11Buffer* g_DataParams; static ID3D11ShaderResourceView* g_SRVParams;
static ID3D11Buffer* g_DataEmissives; static ID3D11ShaderResourceView* g_SRVEmissives;
static ID3D11Buffer* g_DataCounter; static ID3D11UnorderedAccessView* g_UAVCounter;
static int g_SphereCount, g_ObjSize, g_MatSize;
static ID3D11Query *g_QueryBegin, *g_QueryEnd, *g_QueryDisjoint;
#endif // #if DO_COMPUTE_GPU
int APIENTRY wWinMain(_In_ HINSTANCE hInstance, _In_opt_ HINSTANCE, _In_ LPWSTR, _In_ int nCmdShow)
{
g_Backbuffer = new float[kBackbufferWidth * kBackbufferHeight * 4];
memset(g_Backbuffer, 0, kBackbufferWidth * kBackbufferHeight * 4 * sizeof(g_Backbuffer[0]));
InitializeTest();
MyRegisterClass(hInstance);
if (!InitInstance (hInstance, nCmdShow))
{
return FALSE;
}
if (FAILED(InitD3DDevice()))
{
ShutdownD3DDevice();
return 0;
}
g_D3D11Device->CreateVertexShader(g_VSBytecode, ARRAYSIZE(g_VSBytecode), NULL, &g_VertexShader);
g_D3D11Device->CreatePixelShader(g_PSBytecode, ARRAYSIZE(g_PSBytecode), NULL, &g_PixelShader);
#if DO_COMPUTE_GPU
g_D3D11Device->CreateComputeShader(g_CSBytecode, ARRAYSIZE(g_CSBytecode), NULL, &g_ComputeShader);
#endif
D3D11_TEXTURE2D_DESC texDesc = {};
texDesc.Width = kBackbufferWidth;
texDesc.Height = kBackbufferHeight;
texDesc.MipLevels = 1;
texDesc.ArraySize = 1;
texDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
texDesc.SampleDesc.Count = 1;
texDesc.SampleDesc.Quality = 0;
#if DO_COMPUTE_GPU
texDesc.Usage = D3D11_USAGE_DEFAULT;
texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;
texDesc.CPUAccessFlags = 0;
#else
texDesc.Usage = D3D11_USAGE_DYNAMIC;
texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
texDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
#endif
texDesc.MiscFlags = 0;
g_D3D11Device->CreateTexture2D(&texDesc, NULL, &g_BackbufferTexture);
g_D3D11Device->CreateTexture2D(&texDesc, NULL, &g_BackbufferTexture2);
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = texDesc.Format;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
srvDesc.Texture2D.MipLevels = 1;
srvDesc.Texture2D.MostDetailedMip = 0;
g_D3D11Device->CreateShaderResourceView(g_BackbufferTexture, &srvDesc, &g_BackbufferSRV);
g_D3D11Device->CreateShaderResourceView(g_BackbufferTexture2, &srvDesc, &g_BackbufferSRV2);
D3D11_SAMPLER_DESC smpDesc = {};
smpDesc.Filter = D3D11_FILTER_MIN_MAG_LINEAR_MIP_POINT;
smpDesc.AddressU = smpDesc.AddressV = smpDesc.AddressW = D3D11_TEXTURE_ADDRESS_CLAMP;
g_D3D11Device->CreateSamplerState(&smpDesc, &g_SamplerLinear);
D3D11_RASTERIZER_DESC rasterDesc = {};
rasterDesc.FillMode = D3D11_FILL_SOLID;
rasterDesc.CullMode = D3D11_CULL_NONE;
g_D3D11Device->CreateRasterizerState(&rasterDesc, &g_RasterState);
#if DO_COMPUTE_GPU
D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
int camSize;
GetObjectCount(g_SphereCount, g_ObjSize, g_MatSize, camSize);
assert(g_ObjSize == 20);
assert(g_MatSize == 36);
assert(camSize == 88);
D3D11_BUFFER_DESC bdesc = {};
bdesc.ByteWidth = g_SphereCount * g_ObjSize;
bdesc.Usage = D3D11_USAGE_DEFAULT;
bdesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
bdesc.CPUAccessFlags = 0;
bdesc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
bdesc.StructureByteStride = g_ObjSize;
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataSpheres);
srvDesc.Format = DXGI_FORMAT_UNKNOWN;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
srvDesc.Buffer.FirstElement = 0;
srvDesc.Buffer.NumElements = g_SphereCount;
g_D3D11Device->CreateShaderResourceView(g_DataSpheres, &srvDesc, &g_SRVSpheres);
bdesc.ByteWidth = g_SphereCount * g_MatSize;
bdesc.StructureByteStride = g_MatSize;
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataMaterials);
srvDesc.Buffer.NumElements = g_SphereCount;
g_D3D11Device->CreateShaderResourceView(g_DataMaterials, &srvDesc, &g_SRVMaterials);
bdesc.ByteWidth = sizeof(ComputeParams);
bdesc.StructureByteStride = sizeof(ComputeParams);
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataParams);
srvDesc.Buffer.NumElements = 1;
g_D3D11Device->CreateShaderResourceView(g_DataParams, &srvDesc, &g_SRVParams);
bdesc.ByteWidth = g_SphereCount * 4;
bdesc.StructureByteStride = 4;
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataEmissives);
srvDesc.Buffer.NumElements = g_SphereCount;
g_D3D11Device->CreateShaderResourceView(g_DataEmissives, &srvDesc, &g_SRVEmissives);
bdesc.ByteWidth = 4;
bdesc.BindFlags |= D3D11_BIND_UNORDERED_ACCESS;
bdesc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
bdesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataCounter);
uavDesc.Format = DXGI_FORMAT_R32_TYPELESS;
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements = 1;
uavDesc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
g_D3D11Device->CreateUnorderedAccessView(g_DataCounter, &uavDesc, &g_UAVCounter);
uavDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE2D;
uavDesc.Texture2D.MipSlice = 0;
g_D3D11Device->CreateUnorderedAccessView(g_BackbufferTexture, &uavDesc, &g_BackbufferUAV);
g_D3D11Device->CreateUnorderedAccessView(g_BackbufferTexture2, &uavDesc, &g_BackbufferUAV2);
D3D11_QUERY_DESC qDesc = {};
qDesc.Query = D3D11_QUERY_TIMESTAMP;
g_D3D11Device->CreateQuery(&qDesc, &g_QueryBegin);
g_D3D11Device->CreateQuery(&qDesc, &g_QueryEnd);
qDesc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
g_D3D11Device->CreateQuery(&qDesc, &g_QueryDisjoint);
#endif // #if DO_COMPUTE_GPU
static int framesLeft = 10;
// Main message loop
MSG msg = { 0 };
while (msg.message != WM_QUIT)
{
if (PeekMessage(&msg, NULL, 0U, 0U, PM_REMOVE))
{
TranslateMessage(&msg);
DispatchMessage(&msg);
}
else
{
RenderFrame();
if( --framesLeft == 0 ) break;
}
}
ShutdownTest();
ShutdownD3DDevice();
return (int) msg.wParam;
}
ATOM MyRegisterClass(HINSTANCE hInstance)
{
ZoneScoped;
WNDCLASSEXW wcex;
memset(&wcex, 0, sizeof(wcex));
wcex.cbSize = sizeof(wcex);
wcex.style = CS_HREDRAW | CS_VREDRAW;
wcex.lpfnWndProc = WndProc;
wcex.cbClsExtra = 0;
wcex.cbWndExtra = 0;
wcex.hInstance = hInstance;
wcex.hCursor = LoadCursor(nullptr, IDC_ARROW);
wcex.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
wcex.lpszClassName = L"TestClass";
return RegisterClassExW(&wcex);
}
BOOL InitInstance(HINSTANCE hInstance, int nCmdShow)
{
ZoneScoped;
g_HInstance = hInstance;
RECT rc = { 0, 0, kBackbufferWidth, kBackbufferHeight };
DWORD style = WS_OVERLAPPED | WS_CAPTION | WS_SYSMENU | WS_MINIMIZEBOX;
AdjustWindowRect(&rc, style, FALSE);
HWND hWnd = CreateWindowW(L"TestClass", L"Test", style, CW_USEDEFAULT, CW_USEDEFAULT, rc.right-rc.left, rc.bottom-rc.top, nullptr, nullptr, hInstance, nullptr);
if (!hWnd)
return FALSE;
g_Wnd = hWnd;
ShowWindow(hWnd, nCmdShow);
UpdateWindow(hWnd);
return TRUE;
}
static uint64_t s_Time;
static int s_Count;
static char s_Buffer[200];
static unsigned s_Flags = kFlagProgressive;
static int s_FrameCount = 0;
static void RenderFrame()
{
ZoneScoped;
LARGE_INTEGER time1;
#if DO_COMPUTE_GPU
QueryPerformanceCounter(&time1);
float t = float(clock()) / CLOCKS_PER_SEC;
UpdateTest(t, s_FrameCount, kBackbufferWidth, kBackbufferHeight, s_Flags);
g_BackbufferIndex = 1 - g_BackbufferIndex;
void* dataSpheres = alloca(g_SphereCount * g_ObjSize);
void* dataMaterials = alloca(g_SphereCount * g_MatSize);
void* dataEmissives = alloca(g_SphereCount * 4);
ComputeParams dataParams;
GetSceneDesc(dataSpheres, dataMaterials, &dataParams.cam, dataEmissives, &dataParams.emissiveCount);
dataParams.sphereCount = g_SphereCount;
dataParams.screenWidth = kBackbufferWidth;
dataParams.screenHeight = kBackbufferHeight;
dataParams.frames = s_FrameCount;
dataParams.invWidth = 1.0f / kBackbufferWidth;
dataParams.invHeight = 1.0f / kBackbufferHeight;
float lerpFac = float(s_FrameCount) / float(s_FrameCount + 1);
if (s_Flags & kFlagAnimate)
lerpFac *= DO_ANIMATE_SMOOTHING;
if (!(s_Flags & kFlagProgressive))
lerpFac = 0;
dataParams.lerpFac = lerpFac;
g_D3D11Ctx->UpdateSubresource(g_DataSpheres, 0, NULL, dataSpheres, 0, 0);
g_D3D11Ctx->UpdateSubresource(g_DataMaterials, 0, NULL, dataMaterials, 0, 0);
g_D3D11Ctx->UpdateSubresource(g_DataParams, 0, NULL, &dataParams, 0, 0);
g_D3D11Ctx->UpdateSubresource(g_DataEmissives, 0, NULL, dataEmissives, 0, 0);
ID3D11ShaderResourceView* srvs[] = {
g_BackbufferIndex == 0 ? g_BackbufferSRV2 : g_BackbufferSRV,
g_SRVSpheres,
g_SRVMaterials,
g_SRVParams,
g_SRVEmissives
};
g_D3D11Ctx->CSSetShaderResources(0, ARRAYSIZE(srvs), srvs);
ID3D11UnorderedAccessView* uavs[] = {
g_BackbufferIndex == 0 ? g_BackbufferUAV : g_BackbufferUAV2,
g_UAVCounter
};
g_D3D11Ctx->CSSetUnorderedAccessViews(0, ARRAYSIZE(uavs), uavs, NULL);
g_D3D11Ctx->CSSetShader(g_ComputeShader, NULL, 0);
g_D3D11Ctx->Begin(g_QueryDisjoint);
g_D3D11Ctx->End(g_QueryBegin);
g_D3D11Ctx->Dispatch(kBackbufferWidth/kCSGroupSizeX, kBackbufferHeight/kCSGroupSizeY, 1);
g_D3D11Ctx->End(g_QueryEnd);
uavs[0] = NULL;
g_D3D11Ctx->CSSetUnorderedAccessViews(0, ARRAYSIZE(uavs), uavs, NULL);
++s_FrameCount;
#else
QueryPerformanceCounter(&time1);
float t = float(clock()) / CLOCKS_PER_SEC;
static size_t s_RayCounter = 0;
int rayCount;
UpdateTest(t, s_FrameCount, kBackbufferWidth, kBackbufferHeight, s_Flags);
DrawTest(t, s_FrameCount, kBackbufferWidth, kBackbufferHeight, g_Backbuffer, rayCount, s_Flags);
s_FrameCount++;
s_RayCounter += rayCount;
LARGE_INTEGER time2;
QueryPerformanceCounter(&time2);
uint64_t dt = time2.QuadPart - time1.QuadPart;
++s_Count;
s_Time += dt;
if (s_Count > 10)
{
LARGE_INTEGER frequency;
QueryPerformanceFrequency(&frequency);
double s = double(s_Time) / double(frequency.QuadPart) / s_Count;
sprintf_s(s_Buffer, sizeof(s_Buffer), "%.2fms (%.1f FPS) %.1fMrays/s %.2fMrays/frame frames %i\n", s * 1000.0f, 1.f / s, s_RayCounter / s_Count / s * 1.0e-6f, s_RayCounter / s_Count * 1.0e-6f, s_FrameCount);
SetWindowTextA(g_Wnd, s_Buffer);
OutputDebugStringA(s_Buffer);
s_Count = 0;
s_Time = 0;
s_RayCounter = 0;
}
D3D11_MAPPED_SUBRESOURCE mapped;
g_D3D11Ctx->Map(g_BackbufferTexture, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
const uint8_t* src = (const uint8_t*)g_Backbuffer;
uint8_t* dst = (uint8_t*)mapped.pData;
for (int y = 0; y < kBackbufferHeight; ++y)
{
memcpy(dst, src, kBackbufferWidth * 16);
src += kBackbufferWidth * 16;
dst += mapped.RowPitch;
}
g_D3D11Ctx->Unmap(g_BackbufferTexture, 0);
#endif
g_D3D11Ctx->VSSetShader(g_VertexShader, NULL, 0);
g_D3D11Ctx->PSSetShader(g_PixelShader, NULL, 0);
g_D3D11Ctx->PSSetShaderResources(0, 1, g_BackbufferIndex == 0 ? &g_BackbufferSRV : &g_BackbufferSRV2);
g_D3D11Ctx->PSSetSamplers(0, 1, &g_SamplerLinear);
g_D3D11Ctx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
g_D3D11Ctx->RSSetState(g_RasterState);
g_D3D11Ctx->Draw(3, 0);
g_D3D11SwapChain->Present(0, 0);
FrameMark;
#if DO_COMPUTE_GPU
g_D3D11Ctx->End(g_QueryDisjoint);
// get GPU times
while (g_D3D11Ctx->GetData(g_QueryDisjoint, NULL, 0, 0) == S_FALSE) { Sleep(0); }
D3D11_QUERY_DATA_TIMESTAMP_DISJOINT tsDisjoint;
g_D3D11Ctx->GetData(g_QueryDisjoint, &tsDisjoint, sizeof(tsDisjoint), 0);
if (!tsDisjoint.Disjoint)
{
UINT64 tsBegin, tsEnd;
// Note: on some GPUs/drivers, even after the disjoint query above has reported that its data is ready,
// the timestamp queries issued before it might still not have their data available.
while (g_D3D11Ctx->GetData(g_QueryBegin, &tsBegin, sizeof(tsBegin), 0) == S_FALSE) { Sleep(0); }
while (g_D3D11Ctx->GetData(g_QueryEnd, &tsEnd, sizeof(tsEnd), 0) == S_FALSE) { Sleep(0); }
float s = float(tsEnd - tsBegin) / float(tsDisjoint.Frequency);
static uint64_t s_RayCounter;
D3D11_MAPPED_SUBRESOURCE mapped;
g_D3D11Ctx->Map(g_DataCounter, 0, D3D11_MAP_READ, 0, &mapped);
s_RayCounter += *(const int*)mapped.pData;
g_D3D11Ctx->Unmap(g_DataCounter, 0);
int zeroCount = 0;
g_D3D11Ctx->UpdateSubresource(g_DataCounter, 0, NULL, &zeroCount, 0, 0);
static float s_Time;
++s_Count;
s_Time += s;
if (s_Count > 150)
{
s = s_Time / s_Count;
sprintf_s(s_Buffer, sizeof(s_Buffer), "%.2fms (%.1f FPS) %.1fMrays/s %.2fMrays/frame frames %i\n", s * 1000.0f, 1.f / s, s_RayCounter / s_Count / s * 1.0e-6f, s_RayCounter / s_Count * 1.0e-6f, s_FrameCount);
SetWindowTextA(g_Wnd, s_Buffer);
s_Count = 0;
s_Time = 0;
s_RayCounter = 0;
}
}
#endif // #if DO_COMPUTE_GPU
}
LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
switch (message)
{
case WM_PAINT:
{
PAINTSTRUCT ps;
HDC hdc = BeginPaint(hWnd, &ps);
EndPaint(hWnd, &ps);
}
break;
case WM_DESTROY:
PostQuitMessage(0);
break;
case WM_CHAR:
if (wParam == 'a')
s_Flags = s_Flags ^ kFlagAnimate;
if (wParam == 'p')
{
s_Flags = s_Flags ^ kFlagProgressive;
s_FrameCount = 0;
}
break;
default:
return DefWindowProc(hWnd, message, wParam, lParam);
}
return 0;
}
static HRESULT InitD3DDevice()
{
ZoneScoped;
HRESULT hr = S_OK;
RECT rc;
GetClientRect(g_Wnd, &rc);
UINT width = rc.right - rc.left;
UINT height = rc.bottom - rc.top;
UINT createDeviceFlags = 0;
#ifdef _DEBUG
createDeviceFlags |= D3D11_CREATE_DEVICE_DEBUG;
#endif
D3D_FEATURE_LEVEL featureLevels[] =
{
D3D_FEATURE_LEVEL_11_0,
};
UINT numFeatureLevels = ARRAYSIZE(featureLevels);
hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, createDeviceFlags, featureLevels, numFeatureLevels, D3D11_SDK_VERSION, &g_D3D11Device, &g_D3D11FeatureLevel, &g_D3D11Ctx);
if (FAILED(hr))
return hr;
// Get DXGI factory
IDXGIFactory1* dxgiFactory = nullptr;
{
IDXGIDevice* dxgiDevice = nullptr;
hr = g_D3D11Device->QueryInterface(__uuidof(IDXGIDevice), reinterpret_cast<void**>(&dxgiDevice));
if (SUCCEEDED(hr))
{
IDXGIAdapter* adapter = nullptr;
hr = dxgiDevice->GetAdapter(&adapter);
if (SUCCEEDED(hr))
{
hr = adapter->GetParent(__uuidof(IDXGIFactory1), reinterpret_cast<void**>(&dxgiFactory));
adapter->Release();
}
dxgiDevice->Release();
}
}
if (FAILED(hr))
return hr;
// Create swap chain
DXGI_SWAP_CHAIN_DESC sd;
ZeroMemory(&sd, sizeof(sd));
sd.BufferCount = 1;
sd.BufferDesc.Width = width;
sd.BufferDesc.Height = height;
sd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
sd.BufferDesc.RefreshRate.Numerator = 60;
sd.BufferDesc.RefreshRate.Denominator = 1;
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
sd.OutputWindow = g_Wnd;
sd.SampleDesc.Count = 1;
sd.SampleDesc.Quality = 0;
sd.Windowed = TRUE;
hr = dxgiFactory->CreateSwapChain(g_D3D11Device, &sd, &g_D3D11SwapChain);
// Prevent Alt-Enter
dxgiFactory->MakeWindowAssociation(g_Wnd, DXGI_MWA_NO_ALT_ENTER);
dxgiFactory->Release();
if (FAILED(hr))
return hr;
// RTV
ID3D11Texture2D* pBackBuffer = nullptr;
hr = g_D3D11SwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), reinterpret_cast<void**>(&pBackBuffer));
if (FAILED(hr))
return hr;
hr = g_D3D11Device->CreateRenderTargetView(pBackBuffer, nullptr, &g_D3D11RenderTarget);
pBackBuffer->Release();
if (FAILED(hr))
return hr;
g_D3D11Ctx->OMSetRenderTargets(1, &g_D3D11RenderTarget, nullptr);
// Viewport
D3D11_VIEWPORT vp;
vp.Width = (float)width;
vp.Height = (float)height;
vp.MinDepth = 0.0f;
vp.MaxDepth = 1.0f;
vp.TopLeftX = 0;
vp.TopLeftY = 0;
g_D3D11Ctx->RSSetViewports(1, &vp);
return S_OK;
}
static void ShutdownD3DDevice()
{
ZoneScoped;
if (g_D3D11Ctx) g_D3D11Ctx->ClearState();
if (g_D3D11RenderTarget) g_D3D11RenderTarget->Release();
if (g_D3D11SwapChain) g_D3D11SwapChain->Release();
if (g_D3D11Ctx) g_D3D11Ctx->Release();
if (g_D3D11Device) g_D3D11Device->Release();
}


@@ -0,0 +1,13 @@
struct vs2ps
{
float2 uv : TEXCOORD0;
float4 pos : SV_Position;
};
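// Generates a fullscreen triangle from the vertex ID alone (drawn with Draw(3, 0), no vertex buffer):
// vid 0, 1, 2 -> uv (0,0), (2,0), (0,2) -> clip-space pos (-1,-1), (3,-1), (-1,3).
vs2ps main(uint vid : SV_VertexID)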
{
vs2ps o;
o.uv = float2((vid << 1) & 2, vid & 2);
o.pos = float4(o.uv * float2(2, 2) + float2(-1, -1), 0, 1);
return o;
}


@@ -0,0 +1,24 @@
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <http://unlicense.org>

77
extra/color.cpp Normal file

@@ -0,0 +1,77 @@
#include <algorithm>
#include <string.h>
#include <stdio.h>
#include <stdint.h>
#include <math.h>
inline float sqrtfast( float v )
{
union
{
int i;
float f;
} u;
u.f = v;
u.i -= 1 << 23;
u.i >>= 1;
u.i += 1 << 29;
return u.f;
}
inline float linear2sRGB( float v )
{
float s1 = sqrtfast( v );
float s2 = sqrtfast( s1 );
float s3 = sqrtfast( s2 );
return 0.585122381f * s1 + 0.783140355f * s2 - 0.368262736f * s3;
}
int lerp( int v0, int v1, float t )
{
return int( ( 1-t ) * v0 + t * v1 );
}
inline float sRGB2linear( float v )
{
return v * ( v * ( v * 0.305306011f + 0.682171111f ) + 0.012522878f );
}
int main()
{
int c0 = std::min( 255, int( sRGB2linear( 1.f ) * 255 ) );
int c1 = std::min( 255, int( sRGB2linear( 0x44 / 255.f ) * 255 ) );
int s0 = std::min( 255, int( sRGB2linear( 1.f ) * 255 * 0.5 ) );
int s1 = std::min( 255, int( sRGB2linear( 0x44 / 255.f ) * 255 * 0.5 ) );
float target = 80.f;
uint32_t t[256];
memset( t, 0, sizeof( uint32_t ) * 256 );
for( int i=1; i<128; i++ )
{
float m = (i-1) / target;
int l0 = std::min( 255, lerp( s0, c0, m ) );
int l1 = std::min( 255, lerp( s1, c1, m ) );
int g0 = std::min( 255, int( linear2sRGB( l0/255.f ) * 255 ) );
int g1 = std::min( 255, int( linear2sRGB( l1/255.f ) * 255 ) );
// Override the sRGB-converted values: the table is emitted from the linear values.
g0 = l0;
g1 = l1;
t[i] = 0xFF000000 | ( g1 << 16 ) | ( g0 << 8 ) | g1;
t[uint8_t(-i)] = 0xFF000000 | ( g1 << 16 ) | ( g1 << 8 ) | g0;
}
printf( "uint32_t MemDecayColor[256] = {\n" );
for( int i=0; i<256; i += 8 )
{
printf( " " );
for( int j=i; j<i+8; j++ )
{
printf( " 0x%X,", t[j] );
}
printf( "\n" );
}
printf( "};\n" );
}

22
extra/dxt1divtable.c Normal file
View File

@@ -0,0 +1,22 @@
#include <stdint.h>
#include <stdio.h>
int main()
{
for( int i=0; i<255*3+1; i++ )
{
// replace 4 with 2 for ARM NEON table
uint32_t range = ( 4 << 16 ) / ( 1+i );
if( range > 0xFFFF ) range = 0xFFFF;
if( i % 16 == 15 )
{
printf( "0x%04x,\n", range );
}
else
{
printf( "0x%04x, ", range );
}
}
printf( "\n" );
return 0;
}

36
extra/dxt1table.c Normal file
View File

@@ -0,0 +1,36 @@
#include <stdint.h>
#include <stdio.h>
static const uint8_t IndexTable[4] = { 1, 3, 2, 0 };
int convert( int v )
{
int v0 = v & 0x3;
int v1 = ( v >> 2 ) & 0x3;
int v2 = ( v >> 4 ) & 0x3;
int v3 = ( v >> 6 );
int t0 = IndexTable[v0];
int t1 = IndexTable[v1];
int t2 = IndexTable[v2];
int t3 = IndexTable[v3];
return t0 | ( t1 << 2 ) | ( t2 << 4 ) | ( t3 << 6 );
}
int main()
{
for( int i=0; i<256; i++ )
{
if( i % 16 == 15 )
{
printf( "%i,\n", convert( i ) );
}
else
{
printf( "%i,\t", convert( i ) );
}
}
printf( "\n" );
return 0;
}

4
extra/systrace/build.sh Executable file
View File

@@ -0,0 +1,4 @@
#!/bin/sh
clang tracy_systrace.c -s -Os -ffunction-sections -fdata-sections -Wl,--gc-sections -fno-stack-protector -Wl,-z,norelro -Wl,--build-id=none -nostdlib -ldl -o tracy_systrace
strip --strip-all -R .note.gnu.gold-version -R .comment -R .note -R .note.gnu.build-id -R .note.ABI-tag -R .eh_frame -R .eh_frame_hdr -R .gnu.hash -R .gnu.version -R .got tracy_systrace
sstrip -z tracy_systrace

@@ -0,0 +1,53 @@
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <dlfcn.h>
enum { BufSize = 64*1024 };
typedef int (*open_t)( const char*, int, ... );
typedef void (*exit_t)( int );
typedef int (*poll_t)( struct pollfd*, nfds_t, int timeout );
typedef int (*nanosleep_t)( const struct timespec*, struct timespec* );
typedef ssize_t (*read_t)( int, void*, size_t );
typedef ssize_t (*write_t)( int, const void*, size_t );
void _start()
{
void* libc = dlopen( "libc.so", RTLD_LAZY );
open_t sym_open = dlsym( libc, "open" );
exit_t sym_exit = dlsym( libc, "exit" );
poll_t sym_poll = dlsym( libc, "poll" );
nanosleep_t sym_nanosleep = dlsym( libc, "nanosleep" );
read_t sym_read = dlsym( libc, "read" );
write_t sym_write = dlsym( libc, "write" );
char buf[BufSize];
int kernelFd = sym_open( "/sys/kernel/debug/tracing/trace_pipe", O_RDONLY );
if( kernelFd < 0 ) sym_exit( 0 );
struct pollfd pfd;
pfd.fd = kernelFd;
pfd.events = POLLIN | POLLERR;
struct timespec sleepTime;
sleepTime.tv_sec = 0;
sleepTime.tv_nsec = 1000 * 1000 * 10;
for(;;)
{
while( sym_poll( &pfd, 1, 0 ) <= 0 ) sym_nanosleep( &sleepTime, NULL );
const int rd = sym_read( kernelFd, buf, BufSize );
if( rd <= 0 ) break;
sym_write( STDOUT_FILENO, buf, rd );
}
sym_exit( 0 );
}

42
extra/x11_colors.c Normal file
View File

@@ -0,0 +1,42 @@
#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE* f = fopen( "rgb.txt", "rb" );
if( !f ) { perror( "rgb.txt" ); return 1; }
char buf[1024];
int off = 0;
for(;;)
{
int sz = fread( buf+off, 1, 1, f );
if( sz == 0 || buf[off] == '\r' || buf[off] == '\n' )
{
if( off == 0 )
{
if( sz == 0 ) break;
continue;
}
int ok = 1;
for( int i=13; i<off; i++ )
{
if( buf[i] == ' ' ) ok = 0;
}
if( ok == 1 )
{
buf[off] = '\0';
int r, g, b;
sscanf( buf, "%i %i %i", &r, &g, &b );
printf( "%s = 0x%02x%02x%02x,\n", buf+13, r, g, b );
}
off = 0;
}
else
{
off++;
}
if( sz == 0 ) break;
}
fclose( f );
}

BIN
icon/icon.ico Normal file

Binary file not shown.

Size: 4.7 KiB

BIN
icon/icon.pdf Normal file

Binary file not shown.

BIN
icon/icon.png Normal file

Binary file not shown.

Size: 925 B

138
icon/icon.svg Normal file
View File

@@ -0,0 +1,138 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="100mm"
height="100mm"
viewBox="0 0 100 100"
version="1.1"
id="svg8"
inkscape:version="0.92.4 (5da689c313, 2019-01-14)"
sodipodi:docname="icon.svg">
<defs
id="defs2" />
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="1.4"
inkscape:cx="103.40559"
inkscape:cy="137.55963"
inkscape:document-units="mm"
inkscape:current-layer="layer1"
showgrid="true"
showguides="true"
inkscape:guide-bbox="true"
inkscape:window-width="2560"
inkscape:window-height="1377"
inkscape:window-x="-8"
inkscape:window-y="-8"
inkscape:window-maximized="1">
<inkscape:grid
type="xygrid"
id="grid3713"
units="mm"
spacingx="0.99999999"
spacingy="0.99999999" />
<sodipodi:guide
position="49.999999,90.999998"
orientation="1,0"
id="guide4820"
inkscape:locked="false" />
</sodipodi:namedview>
<metadata
id="metadata5">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1"
transform="translate(0,-197)">
<rect
style="fill:#d6e4ff;fill-opacity:1;stroke:#000000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;paint-order:normal;stroke-dashoffset:0"
id="rect3988"
width="90"
height="90"
x="5"
y="201.99998"
rx="0.52916664"
ry="1.4214079e-06" />
<g
id="g4861"
transform="translate(-5.9999905e-7,-1.9999986)">
<rect
ry="1.421408e-06"
rx="0.52916664"
y="217"
x="14.999999"
height="10"
width="70"
id="rect4803"
style="fill:#006ed6;fill-opacity:1;stroke:#004586;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" />
<rect
ry="1.421408e-06"
rx="0.52916664"
y="228"
x="38"
height="10"
width="24"
id="rect4803-8"
style="fill:#006ed6;fill-opacity:1;stroke:#004586;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" />
<rect
ry="1.421408e-06"
rx="0.52916664"
y="239"
x="39"
height="10"
width="22"
id="rect4803-8-0"
style="fill:#006ed6;fill-opacity:1;stroke:#004586;stroke-width:0.99999994;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" />
<rect
ry="1.421408e-06"
rx="0.52916664"
y="250"
x="40"
height="10"
width="20"
id="rect4803-8-7"
style="fill:#006ed6;fill-opacity:1;stroke:#004586;stroke-width:0.99999994;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" />
<rect
ry="1.421408e-06"
rx="0.52916664"
y="261"
x="41"
height="10"
width="17.999998"
id="rect4803-8-77"
style="fill:#006ed6;fill-opacity:1;stroke:#004586;stroke-width:0.99999994;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" />
<rect
ry="1.421408e-06"
rx="0.52916664"
y="272"
x="42.000004"
height="10"
width="15.999996"
id="rect4803-8-1"
style="fill:#006ed6;fill-opacity:1;stroke:#004586;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" />
</g>
</g>
</svg>

Size: 4.7 KiB

Some files were not shown because too many files have changed in this diff.