Compare commits

...

2665 Commits
sigint ... v0.7

Author SHA1 Message Date
Bartosz Taudul
cbcf393332 Release 0.7.0. 2020-06-11 14:30:34 +02:00
Bartosz Taudul
3992eb0542 Move RDO threshold logic to tables. 2020-06-11 14:09:19 +02:00
Bartosz Taudul
2ab605d232 Use dedicated max-of-three-elements function. 2020-06-11 13:45:04 +02:00
Bartosz Taudul
2608ceca05 Improve memory access patterns in RDO compression. 2020-06-11 13:36:17 +02:00
Bartosz Taudul
2723144678 Don't create empty ghost children vectors. 2020-06-11 12:27:17 +02:00
Bartosz Taudul
99544f4655 Clamp previous ghost zones to current sample time. 2020-06-11 12:18:44 +02:00
Bartosz Taudul
191ff93822 Optimize DXT1 index order fixing. 2020-06-11 03:04:09 +02:00
Bartosz Taudul
aa3b0de1f5 Use proper cpuid flag. 2020-06-10 16:25:19 +02:00
Bartosz Taudul
0744355a67 Fix off-by-one (found by lispbub, #43). 2020-06-10 12:21:58 +02:00
Bartosz Taudul
d1ef8ea90b Set owner of file dialogs on windows. 2020-06-10 01:52:17 +02:00
Bartosz Taudul
483bc3f549 Add D3D 12 to README. 2020-06-09 21:00:05 +02:00
Bartosz Taudul
45eb1a6ddb Update NEWS. 2020-06-09 18:18:11 +02:00
Bartosz Taudul
e932cbe162 Add missing OpenCL mentions. 2020-06-09 18:17:43 +02:00
Bartosz Taudul
8bbc40beb2 Bump version, protocol for D3D12. 2020-06-09 11:20:50 +02:00
Bartosz Taudul
5eff06e809 Merge pull request #40 from Xenonic/master
Direct3D 12 GPU Profiling
2020-06-09 11:20:19 +02:00
Andrew Depke
39479b8d93 Merge branch 'master' into master 2020-06-08 23:50:20 -06:00
Andrew Depke
7127e36217 Detailed TracyD3D12NewFrame and synchronization 2020-06-08 23:40:16 -06:00
Andrew Depke
501b356b2b Added semi-automatic query synchronization for N-buffered rendering 2020-06-08 22:57:27 -06:00
Andrew Depke
9473272512 Fixed queryId not looping back 2020-06-08 16:57:31 -06:00
Andrew Depke
6e03bb1c2c Reverted out-of-order execution sorting 2020-06-08 16:24:20 -06:00
Bartosz Taudul
284d49b34b Change rdtscp check to rdtsc check. 2020-06-08 19:35:42 +02:00
Bartosz Taudul
1e8c842444 Update manual. 2020-06-08 18:27:41 +02:00
Bartosz Taudul
929d399995 Fix determination of line width. 2020-06-08 14:02:11 +02:00
Andrew Depke
c8bfa43f22 Added query data sorting to support out-of-order execution 2020-06-08 04:02:54 -06:00
Andrew Depke
c70922f3db Work on nested zones support 2020-06-07 04:55:20 -06:00
Andrew Depke
d15b83b669 Updated manual for Direct3D 12 2020-06-07 02:05:51 -06:00
Andrew Depke
03993072c5 Added mapping range to prevent debug layer warnings 2020-06-07 01:03:43 -06:00
Andrew Depke
bffcc52536 Updated AUTHORS. 2020-06-07 00:51:52 -06:00
Andrew Depke
3282a8d27c Added server support for D3D12 contexts 2020-06-07 00:40:08 -06:00
Andrew Depke
4be5e0bfa1 Initial Direct3D 12 profiling implementation 2020-06-07 00:25:43 -06:00
Bartosz Taudul
c384ec132f Fix position of source separator line. 2020-06-06 20:50:24 +02:00
Bartosz Taudul
9c49ee3dd3 Don't mark windows as write-modified when only reading data. 2020-06-06 20:46:46 +02:00
Bartosz Taudul
bee70ee72b Add OpenCL to description. 2020-06-06 15:37:16 +02:00
Bartosz Taudul
e78bbf3492 Update NEWS. 2020-06-06 15:36:26 +02:00
Bartosz Taudul
eb497f2b9f Symbol resolution should be possible on iOS. 2020-06-06 15:00:57 +02:00
Bartosz Taudul
d35d9b60ff Bump protocol and version for OpenCL support. 2020-06-06 14:57:48 +02:00
Bartosz Taudul
57f1ef05c7 Merge pull request #31 from mcleary/opencl-support
Add OpenCL trace support
2020-06-06 14:56:29 +02:00
Bartosz Taudul
06158de6da Update README. 2020-06-06 12:37:05 +02:00
Bartosz Taudul
ecfeb01aad Set source view content width to max value, regardless of clipping. 2020-06-06 12:37:05 +02:00
Bartosz Taudul
4dc07d6e60 Merge pull request #39 from lbuchy/FixZoneScopedNCS
Fix missing comma in ZoneScopedNCS macro
2020-06-06 12:08:12 +02:00
Logan Buchy
22ef78333d Fix missing comma in ZoneScopedNCS macro 2020-06-05 21:30:16 -07:00
Bartosz Taudul
c33d9cbc6e Merge pull request #38 from benvanik/patch-1
Capture Vulkan timestamps at BOTTOM_OF_PIPE instead of TOP_OF_PIPE
2020-06-06 00:05:36 +02:00
Ben Vanik
5029c12de1 Capture Vulkan timestamps at BOTTOM_OF_PIPE instead of TOP_OF_PIPE
This is so that the GPU will wait for all previous commands to finish (hit bottom of pipe) before taking the enter timestamp and then whatever was recorded within the scope to finish (hit bottom of pipe) before taking the leave timestamp. TOP_OF_PIPE (particularly at the scope exit) was just recording the time of when the recorded work within the VkCtxScope was starting (hit top of the pipe) and not waiting for it to complete.
2020-06-05 15:02:04 -07:00
Bartosz Taudul
a3a7183293 Disable inclusion of non-windows in ctrl-tab list. 2020-06-05 19:23:27 +02:00
Bartosz Taudul
de357b6193 Update ImGui to docking@5d472c489. 2020-06-05 19:21:07 +02:00
Bartosz Taudul
16b116ee83 Enable horizontal scrollbar in source view. 2020-06-05 19:04:12 +02:00
Thales Sabino
a46f83364e Add OpenCL trace support
- Adds the file TracyOpenCL.hpp which contains the API to annotate OpenCL applications
- It works in a similar fashion to the Vulkan annotations
- Adds an example OpenCL application in examples/OpenCLVectorAdd
- Adds "OpenCL Context" to the UI
- Manual entry for annotating OpenCL zones
2020-06-05 10:15:47 +01:00
Bartosz Taudul
6793b34fb5 Update zstd to 1.4.5. 2020-06-04 21:03:27 +02:00
Bartosz Taudul
71d789063e Show only relevant options in asm view. 2020-06-04 19:59:13 +02:00
Bartosz Taudul
3fbc2c8036 Increase buffer size. 2020-06-04 19:54:11 +02:00
Bartosz Taudul
e19a981b8c Fix display of unknown source locations in asm view. 2020-06-04 19:39:43 +02:00
Bartosz Taudul
ce2e01bcd7 Skip processing uarch data if AT&T mode is enabled. 2020-06-04 19:30:44 +02:00
Bartosz Taudul
adc2c12a67 Clear asm data when opening just source view. 2020-06-04 19:20:24 +02:00
Bartosz Taudul
215a58afad Mention dsymutil. 2020-06-04 17:58:48 +02:00
Bartosz Taudul
65314e0c90 Set proper SymbolData contents on symbol retrieval error. 2020-06-04 17:46:39 +02:00
Bartosz Taudul
a33caaaaaa Merge pull request #36 from graydon/warning-fixes
Warning fixes
2020-06-04 11:35:59 +02:00
Graydon Hoare
28a29d071f only write SysTime::used if fscanf succeeds 2020-06-03 19:54:49 -07:00
Graydon Hoare
93b7b5a8e7 ensure regs is initialized even if cpuid fails 2020-06-03 19:54:48 -07:00
Bartosz Taudul
c52936855e Don't read beyond buffer end. 2020-06-04 02:46:46 +02:00
Bartosz Taudul
917da8cff3 Properly terminate combo table. 2020-06-04 02:41:45 +02:00
Bartosz Taudul
0891245b49 Describe incompatible protocol in the tooltip. 2020-06-04 02:15:21 +02:00
Bartosz Taudul
f227bb4d9c Update instruction tables to "may 2020". 2020-06-03 21:17:52 +02:00
Bartosz Taudul
9e500bc897 Handle merging inlined ghost zones. 2020-05-31 21:47:52 +02:00
Bartosz Taudul
994b88f898 Ghost index is only available is statistics are enabled. 2020-05-31 15:24:11 +02:00
Bartosz Taudul
067189c355 Extract ghost zone adding to a separate function. 2020-05-31 14:51:33 +02:00
Bartosz Taudul
d9e97ce772 Add postponed ghost zones when frame data becomes available. 2020-05-31 14:31:39 +02:00
Bartosz Taudul
1154343a20 Don't add ghost zones if full callstack data isn't available. 2020-05-31 14:17:54 +02:00
Bartosz Taudul
de5f8df9d3 UpdateSampleStatistics() returns if all samples were processed.
This effectively is a check if all frames in a callstack are available.
2020-05-31 14:05:16 +02:00
Bartosz Taudul
940b598cf8 Update manual. 2020-05-30 15:39:35 +02:00
Bartosz Taudul
7ce915c4f6 Allow display of symbol address in statistics view. 2020-05-30 15:39:34 +02:00
Bartosz Taudul
5955efabb0 Use combo box for smart/entry/sample location selection. 2020-05-30 15:39:34 +02:00
Bartosz Taudul
8d149c59f8 Mention the failures of capstone library. 2020-05-30 15:39:34 +02:00
Bartosz Taudul
54eb75b063 Report symbol entry address in inline function discovery. 2020-05-30 15:38:59 +02:00
Bartosz Taudul
ff27656533 Backport some fixes from libbacktrace upstream repo. 2020-05-30 14:23:29 +02:00
Bartosz Taudul
71102e3c6d Update NEWS. 2020-05-29 19:01:26 +02:00
Bartosz Taudul
85a10f292b Merge pull request #35 from graydon/localhost-only
Add TRACY_ONLY_LOCALHOST macro to avoid listening on all interfaces.
2020-05-29 11:23:16 +02:00
Graydon Hoare
e76b8ae423 Add TRACY_ONLY_LOCALHOST macro to avoid listening on all interfaces. 2020-05-28 22:13:06 -07:00
Bartosz Taudul
ce0d8d9fb7 Merge pull request #33 from graydon/minor-fixes
Minor fixes
2020-05-28 21:12:02 +02:00
Graydon Hoare
aace9bc76e Add .deps and .dirstamp to .gitignore to allow use as submodule in automake projects. 2020-05-28 11:17:43 -07:00
Graydon Hoare
afac7760ce Add Zone{Text,Value,Name}V macros for conditionally-compiled calls to varname.{Text,Value,Name} 2020-05-28 11:17:42 -07:00
Bartosz Taudul
47b8385c5a Merge pull request #32 from jmanc3/support-horizontal-scrolling
Added support for horizontal scrolling.
2020-05-28 12:00:57 +02:00
jmanc3
e22ece8e79 Added support for horizontal scrolling. 2020-05-27 21:11:49 -05:00
Bartosz Taudul
54a029356d Explicitly store GPU context type. 2020-05-27 18:16:53 +02:00
Bartosz Taudul
4f3934ae6a Use correct symbol address. 2020-05-26 02:08:30 +02:00
Bartosz Taudul
898a10ef82 Display base function is symbol as '[ - self - ]'. 2020-05-25 21:42:01 +02:00
Bartosz Taudul
39ce605711 Update manual. 2020-05-25 01:19:42 +02:00
Bartosz Taudul
734ee0b001 Update NEWS. 2020-05-25 01:16:25 +02:00
Bartosz Taudul
bdba77c0f5 Allow displaying self zone time in frames overview. 2020-05-25 01:15:06 +02:00
Bartosz Taudul
74a79a6921 Add zone child time getter with clamping to time range. 2020-05-25 01:14:44 +02:00
Bartosz Taudul
e0d7ffe754 Update manual. 2020-05-24 16:25:26 +02:00
Bartosz Taudul
7dff7f3e1a Update NEWS. 2020-05-24 16:18:58 +02:00
Bartosz Taudul
7ceb4005cd Don't use alloca in inlined functions inside a loop. 2020-05-24 16:17:54 +02:00
Bartosz Taudul
2b304581cf Implement transfer of integral values for zones. 2020-05-24 16:13:09 +02:00
Bartosz Taudul
1bcde1f2ff Only one branch. 2020-05-24 13:41:41 +02:00
Bartosz Taudul
45a9878193 Don't depend on zero-termination of source code. 2020-05-23 17:07:23 +02:00
Bartosz Taudul
d3b60f913d Extend "first frame" time. 2020-05-23 16:52:58 +02:00
Bartosz Taudul
1ddd395aad Properly setup View, when using command line parameters. 2020-05-23 16:49:27 +02:00
Bartosz Taudul
9e8b68b43d Update NEWS. 2020-05-23 16:40:54 +02:00
Bartosz Taudul
0b900c0a3c Add crash popup. 2020-05-23 16:40:15 +02:00
Bartosz Taudul
64b8e23458 Put win32 artifacts into better directory structure. 2020-05-23 16:17:23 +02:00
Bartosz Taudul
08101061fe Update manual. 2020-05-23 16:10:13 +02:00
Bartosz Taudul
abbf003c24 Update NEWS. 2020-05-23 15:58:23 +02:00
Bartosz Taudul
18b2e3e5be Don't depend on View in Worker. 2020-05-23 15:53:58 +02:00
Bartosz Taudul
2f35c785ee Save/load cached source files. 2020-05-23 15:44:19 +02:00
Bartosz Taudul
5b96809f6f Add notification when source file is loaded from cache. 2020-05-23 15:44:19 +02:00
Bartosz Taudul
dc0f2db3f2 CheckString() doesn't check if string query is still pending. 2020-05-23 15:44:19 +02:00
Bartosz Taudul
ee22cf3b0c Use cached source files. 2020-05-23 15:44:19 +02:00
Bartosz Taudul
d470202bca Cached source files accessor. 2020-05-23 15:44:19 +02:00
Bartosz Taudul
964d65fd3b Cache callstack/symbol/code source files. 2020-05-23 15:44:19 +02:00
Bartosz Taudul
f7341141cf Cache zone source location source files. 2020-05-23 15:44:19 +02:00
Bartosz Taudul
97a5957adc Display source file cache stats. 2020-05-23 15:44:19 +02:00
Bartosz Taudul
ae1155852a Source file cache plumbing. 2020-05-23 15:44:18 +02:00
Bartosz Taudul
f38464cf55 Rename SymbolCodeData to MemoryBlock. 2020-05-23 14:54:16 +02:00
Bartosz Taudul
d052e4c79d Substitution-less source file validator. 2020-05-23 14:54:16 +02:00
Bartosz Taudul
fa2559b3f0 Add StringRef hasher, comparator. 2020-05-23 14:54:16 +02:00
Bartosz Taudul
670a292416 Report whether string is available, or if a query was dispatched. 2020-05-23 13:08:57 +02:00
kudansam
1151ec1328 Fix defines when compiling with -Werror=undef
Some ARM defines fail when compiling with -Werror=undef as they rely on
the missing define mapping to 0.
2020-05-22 15:48:59 +02:00
Bartosz Taudul
619523b43e Symbol might not have ip statistics. 2020-05-21 19:20:20 +02:00
Bartosz Taudul
41972f62a3 Merge pull request #27 from mcleary/prevent-crash-when-exiting
Fix crash when running Tracy from DLLs
2020-05-21 15:32:29 +02:00
Thales Sabino
c2c234cf5a Fix crash when running Tracy from DLLs
Instantiating Tracy from within a DLL will tie its internal threads life-time to the DLL. Windows does not guarantee
that threads will be alive after the main function. This has implications in the Profiler dtor since will try to perform
some deallocations, however, _memory_deallocate_large will try to get the heap of the current thread which can
be invalid at the point of shutdown causing a crash. Checking the pointer here will won't make TRACE_NO_EXIT
work, but it will prevent the Profiler from crashing.
2020-05-21 14:26:29 +01:00
Bartosz Taudul
e9c13b254b Get exclusive samples count for a proper symbol.
If inlines are grouped under a base symbol, the base symbol data
includes all the inline sample counts. This was interfering with control
logic determining if sample parents window can be displayed.
2020-05-19 19:10:40 +02:00
Bartosz Taudul
e376866436 Check for exclusive samples in inlined symbols list. 2020-05-19 19:06:01 +02:00
Bartosz Taudul
4eb78f5c86 Auto-initialize profiler in delayed init scenario. 2020-05-19 13:55:54 +02:00
Bartosz Taudul
fad7e72fd4 Harden against uninitialized rpmalloc.
Initialize rpmalloc either by explicitly calling InitRPMallocThread(),
or by forcing initialization of thread local variables block.
2020-05-19 13:51:11 +02:00
ikrima
dd4c2cf9fa FIX: TracyC Api incorrect spelling of function/build break 2020-05-19 01:22:32 +02:00
Bartosz Taudul
883264c0df Fix typo. 2020-05-18 18:25:49 +02:00
Bartosz Taudul
3a302c18bc Replace bsf with tzcnt. 2020-05-17 01:09:57 +02:00
Bartosz Taudul
665c6d6699 Don't check for allocation validity.
Will fail anyway right afterwards, if nullptr.
2020-05-16 16:40:25 +02:00
Bartosz Taudul
8c1e53f65a Update manual. 2020-05-16 14:56:11 +02:00
Bartosz Taudul
f9ff6f4161 Clamping is not needed. 2020-05-15 02:30:52 +02:00
Bartosz Taudul
f6663b187a Add ghost zone label to ghost zone tooltips. 2020-05-15 02:27:48 +02:00
Bartosz Taudul
ad3cac8578 Incorporate thread colors in ghost zones. 2020-05-15 02:25:28 +02:00
Bartosz Taudul
23de7cb294 Improve source file tooltip. 2020-05-15 01:39:14 +02:00
Bartosz Taudul
03c3a3e7c7 Select proper operand for LEA processing in AT&T mode. 2020-05-15 01:37:08 +02:00
Bartosz Taudul
992fba7e07 Move profiler FPS display to upper-right corner. 2020-05-14 17:53:46 +02:00
Bartosz Taudul
ba9b35dc3b Don't memcpy to nullptr. 2020-05-14 17:38:48 +02:00
Bartosz Taudul
c8ff8540f9 Sleep when connection attempt fails. 2020-05-14 02:27:57 +02:00
Bartosz Taudul
d10ef8c501 Use non-blocking connect() call.
Exploit connect() error codes to determine whether connection was
established. Using poll/select proved to be problematic.
2020-05-14 02:27:04 +02:00
Bartosz Taudul
b166451750 Keep viewShutdown in atomic. 2020-05-13 19:20:20 +02:00
Bartosz Taudul
1fdae70cf8 Store socket handle in atomic. 2020-05-13 19:12:52 +02:00
Bartosz Taudul
7ea9c4baf2 Proper locking for queue/in-flight queries. 2020-05-13 18:52:20 +02:00
Bartosz Taudul
7d57a2ea6d Cast to correct type. 2020-05-13 18:38:40 +02:00
Bartosz Taudul
3f3378e13a Fix typo. 2020-05-13 18:37:37 +02:00
Bartosz Taudul
884de148c9 Target native architecture. 2020-05-13 18:35:56 +02:00
Bartosz Taudul
21c168156c Restrict client to C++11. 2020-05-13 18:15:12 +02:00
Bartosz Taudul
29d7639c38 Downgrade test application C++ version to C++11. 2020-05-13 18:11:55 +02:00
Bartosz Taudul
ce15ad763f Check for shared mutex availability. 2020-05-13 18:11:48 +02:00
Bartosz Taudul
ef4c690c32 Update manual. 2020-05-13 18:06:39 +02:00
Bartosz Taudul
864d86e8b6 Merge pull request #21 from ethercrow/import-chrome-args-metadata
import-chrome: import zone metadata from "args" key
2020-05-12 16:07:57 +02:00
Dmitry Ivanov
bcbf8edd8a import-chrome: import zone metadata from "args" key 2020-05-12 16:05:33 +02:00
Bartosz Taudul
8e11cd5ebb Add support for custom text in ImportEventTimeline. 2020-05-12 11:44:36 +02:00
Bartosz Taudul
1c8aece53c No saving if there's no file selector. 2020-05-11 22:13:07 +02:00
Bartosz Taudul
2195e032a6 Update manual. 2020-05-11 22:11:55 +02:00
Bartosz Taudul
e330b96b3d Allow saving only lines within jump range. 2020-05-11 21:59:45 +02:00
Bartosz Taudul
0790e92cad Support saving asm range. 2020-05-11 21:12:43 +02:00
Bartosz Taudul
9d5d116014 Extract asm saving to a separate function. 2020-05-11 21:08:50 +02:00
Bartosz Taudul
a30a0b852f Update manual. 2020-05-11 19:34:25 +02:00
Bartosz Taudul
6ac970a93f Merge pull request #19 from ethercrow/patch-1
Make TracySourceView.cpp build with somewhat old clang
2020-05-11 16:27:01 +02:00
Dmitry Ivanov
8f509dd41a Make TracySourceView.cpp build with somewhat old clang
This was the only issue preventing build on macOS High Sierra with whatever version of clang it has.
2020-05-11 16:23:50 +02:00
Bartosz Taudul
dafecf2a19 Allow enumerating asm lines from a given base instruction. 2020-05-11 13:35:49 +02:00
Bartosz Taudul
8a7913c095 Update xxh3 to master @ ea9c098e93. 2020-05-11 02:33:12 +02:00
Bartosz Taudul
e9f93f5bc7 Send lean frame images. 2020-05-10 20:16:08 +02:00
Bartosz Taudul
03b5dfacd6 Send lean callstack samples. 2020-05-10 20:00:51 +02:00
Bartosz Taudul
09388f3c99 Send lean callstack allocs. 2020-05-10 19:56:36 +02:00
Bartosz Taudul
5a774c82cc Send lean callstacks. 2020-05-10 19:43:12 +02:00
Bartosz Taudul
f0ade07be8 Send lean memory callstacks. 2020-05-10 19:28:08 +02:00
Bartosz Taudul
2dc07fca0b Send lean allocated source locations. 2020-05-10 19:20:59 +02:00
Bartosz Taudul
94d0232ed3 Update manual. 2020-05-10 17:08:30 +02:00
Bartosz Taudul
50c66174dd Add ability to show callstack to an asm line. 2020-05-10 16:56:38 +02:00
Bartosz Taudul
2f8e817e16 Make PackPointer() part of worker's interface. 2020-05-10 16:56:13 +02:00
Bartosz Taudul
d84495d0e1 Mark inline symbols. 2020-05-10 16:13:19 +02:00
Bartosz Taudul
fdd50840a7 Add a function for showing sample parents. 2020-05-10 16:07:45 +02:00
Bartosz Taudul
91bb392678 Avoid executing strlen() twice in assert-enabled builds. 2020-05-10 15:55:12 +02:00
Bartosz Taudul
507b5e3a46 Merge pull request #18 from txfx/fix-gcc-pedantic
Remove extra semicolons at the end of namespaces
2020-05-10 15:44:15 +02:00
txfx
412d252eea Remove extra semicolons at the end of namespaces 2020-05-10 15:32:39 +02:00
Bartosz Taudul
dee808dd1b Display jump labels in the UI. 2020-05-09 15:14:33 +02:00
Bartosz Taudul
2543bb5e63 Shorter jump labels. 2020-05-09 15:14:25 +02:00
Bartosz Taudul
0de39a1d33 Construct location table during disassembly. 2020-05-09 14:58:06 +02:00
Bartosz Taudul
281f13f4c3 Update manual. 2020-05-09 13:57:06 +02:00
Bartosz Taudul
8cbd209ede Display number of selected lines. 2020-05-09 13:53:11 +02:00
Bartosz Taudul
b9a4446fc1 Update manual. 2020-05-09 13:38:58 +02:00
Bartosz Taudul
8caf6b02c6 Allow switching between Intel and AT&T assembly syntax. 2020-05-09 12:58:09 +02:00
Bartosz Taudul
16aca7e6c8 Update NEWS. 2020-05-09 02:38:01 +02:00
Bartosz Taudul
ad4387a0c0 Implement saving disassembly to a file. 2020-05-09 02:37:18 +02:00
Bartosz Taudul
8b56dd5468 Prevent division by zero. 2020-05-08 01:55:03 +02:00
Bartosz Taudul
70818b49b9 Force connection popup boolean should decay. 2020-05-08 01:49:15 +02:00
Bartosz Taudul
5b29e65bc5 Initial value of DecayValue might be active. 2020-05-08 01:48:37 +02:00
Bartosz Taudul
211dfd7f7e Bump imgui to ecf82ca8066. 2020-05-08 00:05:59 +02:00
Bartosz Taudul
2da6c6b6f5 Fix enforced connection popup position wrt viewports. 2020-05-07 15:27:11 +02:00
Bartosz Taudul
b97b3700c3 Drop appveyor badge from README. 2020-05-07 02:36:34 +02:00
Bartosz Taudul
2ca6b6f2fe Implement display of grouped instruction pointer statistics. 2020-05-07 02:33:37 +02:00
Bartosz Taudul
15454d2253 Select microarchitecture basing on cpuid. 2020-05-07 00:53:31 +02:00
Bartosz Taudul
eab3adfa1d Display CPU info. 2020-05-06 19:18:17 +02:00
Bartosz Taudul
6fda74e281 Save/load cpu id. 2020-05-06 19:18:17 +02:00
Bartosz Taudul
a47c7d467f Send x86 processor info in welcome message. 2020-05-06 19:18:17 +02:00
Bartosz Taudul
f13413922d Use one cpuid implementation. 2020-05-06 18:52:36 +02:00
Bartosz Taudul
322e39d9ad Terser CI names. 2020-05-06 13:30:01 +02:00
Bartosz Taudul
468d3d1077 Add missing include. 2020-05-06 12:50:09 +02:00
Bartosz Taudul
aa7e5b87c4 Merge pull request #15 from rokups/rk/misc-fixes
Misc fixes
2020-05-06 11:08:08 +02:00
Bartosz Taudul
bcaa07cdb4 Add latex manual CI job. 2020-05-06 02:05:53 +02:00
Bartosz Taudul
5dc20aaef3 Upload win32 artifacts. 2020-05-06 02:05:53 +02:00
Bartosz Taudul
14b9124a03 Add gcc-based CI script. 2020-05-06 01:16:39 +02:00
Rokas Kupstys
e40f0c4f2e Fix MinGW build. 2020-05-05 13:23:46 +03:00
Rokas Kupstys
04eaf358d0 Fix linking error in some configurations. Unresolved CallTrace symbol was observed in static MSVC RelWithDebInfo build (but not in debug build). 2020-05-05 13:23:46 +03:00
Rokas Kupstys
6727cc2da4 Add empty TRACY_API instead of using dllexport for static builds on windows. Using dllexport is not correct, because it marks APIs in static lib for export and these APIs would get exported from a DLL that links to tracy.
Make API use TRACY_EXPORTS, and replace TRACYPROFILER_EXPORTS with TRACY_EXPORTS in vcxproj projects.

Swap dllimport with dllexport. Reason for this is a common idiom in CMake: target_compile_definitions(tracy PUBLIC -DTRACY_IMPORTS PRIVATE -DTRACY_EXPORTS). This idiom adds both defines in tracy target, but targets that link to tracy only get TRACY_IMPORTS. Swapped statements ensure that tracy always dllexports it's api and consuming targets always dllimport it.
2020-05-05 13:23:46 +03:00
Bartosz Taudul
3d8ebf9157 Add MSVC github actions config 2020-05-05 02:03:18 +02:00
Bartosz Taudul
c87d8f017f Disable viewports. 2020-05-05 00:48:53 +02:00
Bartosz Taudul
57819055da Merge pull request #13 from ikrima/viewports-pr
ImGui Multiviewport fixes
2020-05-05 00:48:08 +02:00
ikrima
1b5879e176 ImGui Multiviewport fixes
- set ImGuiConfigFlags_ViewportsEnable
- correct render loop logic with viewport api calls, SetNextWindowViewport(), UpdatePlatformWindows(), RenderPlatformWindowsDefault()
- Fix: coords in abs space now, SetNextWindowPos()

NOTE:
- I have viewports turned on by default so you can easy test (comment out io.ConfigFlags |= ImGuiConfigFlags_ViewportsEnable; and you get old behavior)
- Jankiness with multiviewports isn't bc perf hit; it's bc profiler reduces it's tick rate when it's not in focus. So, that bit of logic needs to be updated if you really care
- I haven't encountered any issues over past week but discount that by 50% since i'm new to tracy. No promises some UI wasn't regresssed
- Key things to watch out for is enabling viewports turns ImGui into using absolute monitor coords instead of window coords (ie SetPosition(0,0) => monitor top left, not window top/left
2020-05-04 02:17:15 -07:00
Bartosz Taudul
1d74c400ac Update manual. 2020-05-03 21:29:18 +02:00
Bartosz Taudul
f64e588094 Update NEWS. 2020-05-03 21:20:37 +02:00
Bartosz Taudul
d896e51c5d Save/load discovered clients filters. 2020-05-03 21:19:40 +02:00
Bartosz Taudul
9c56626bdb Implement filtering of discovered clients. 2020-05-03 21:10:25 +02:00
Bartosz Taudul
74e55f584c Small toggle button. 2020-05-03 20:49:21 +02:00
Bartosz Taudul
bb39913339 Cosmetics. 2020-05-03 20:49:13 +02:00
Bartosz Taudul
d99129f0e4 Move ToggleButton() out of TracyView.cpp. 2020-05-03 20:40:53 +02:00
Bartosz Taudul
5ad1023088 Update manual. 2020-05-03 16:01:35 +02:00
Bartosz Taudul
bdd6dc625e Update NEWS. 2020-05-03 14:39:04 +02:00
Bartosz Taudul
88b2b04b0d Display running threads in CPU usage tooltip. 2020-05-03 14:34:22 +02:00
Bartosz Taudul
fb801fa484 Extract time value at mouse cursor to a variable. 2020-05-03 14:34:05 +02:00
Bartosz Taudul
9d94cdbb52 Parametrize color highlighting. 2020-05-03 14:33:47 +02:00
Bartosz Taudul
3aed0ba150 Unformatted colored text printing with uint32 color. 2020-05-03 14:33:11 +02:00
Bartosz Taudul
b69bf49082 Separate thread context data getter. 2020-05-03 14:21:27 +02:00
Bartosz Taudul
ad46f981f9 Update NEWS. 2020-05-03 13:55:09 +02:00
Bartosz Taudul
222d3d661e Change "go to frame" window to popup. 2020-05-03 13:54:37 +02:00
Bartosz Taudul
d4e490f47e Keep frames graph at the top of the window (not in dock space). 2020-05-03 13:30:05 +02:00
Bartosz Taudul
bb043f96ee Use internal NoTabBar flag for central node. 2020-05-03 13:29:51 +02:00
Bartosz Taudul
e3acb635a3 Tune up work area window padding. 2020-05-03 13:27:58 +02:00
Bartosz Taudul
32aa23822b Disable docking in central node. 2020-05-03 13:23:48 +02:00
Bartosz Taudul
3350a78cd8 Put work area in center docking node. 2020-05-03 13:23:26 +02:00
Bartosz Taudul
73fda0b188 Disable docking in the main window. 2020-05-03 13:21:34 +02:00
Bartosz Taudul
476b809c4d Enable docking flag. 2020-05-03 13:21:04 +02:00
Bartosz Taudul
c5f1128209 Switch imgui to docking branch. 2020-05-03 13:20:50 +02:00
Bartosz Taudul
aa7851d1d7 Merge pull request #12 from ikrima/master
Fit & Finish: Build sanitization & Static analysis warning fixes
2020-05-03 00:07:28 +02:00
ikrima
707117c04f Build sanitization & Static analysis warning fixes
- Wrapping FORCEINLINE & WIN32_LEAN_AND_MEAN definess with ifndef bc other libraries may define it and trigger redefinition warning
- Possibly contentious given tone in the manual (:P) but removing variable shadowing in TracySysTrace.cpp
  - Alternate Solution: Add #define TRACY_FORCE_SILENT_WARNINGS toggle-able flag. If flag is enabled, push/pop warning disables that have to be included in client code
2020-05-02 14:52:57 -07:00
ikrima
979f41efd1 Add -disableMetrics to vcpkg script 2020-05-02 14:52:57 -07:00
Bartosz Taudul
e7e3d1105c Register usage is only available in capstone 4.x. 2020-05-02 03:09:16 +02:00
Bartosz Taudul
47ed3d01af Update manual. 2020-05-02 02:34:15 +02:00
Bartosz Taudul
2e75990b6c Fix wrong indent. 2020-05-02 02:34:15 +02:00
Bartosz Taudul
e132849fe2 Don't highlight lines with no dependencies. 2020-05-02 02:34:15 +02:00
Bartosz Taudul
a7ccb3e811 Update NEWS. 2020-05-02 02:34:15 +02:00
Bartosz Taudul
95c9259193 Draw register dependency decorations on scroll bar. 2020-05-02 02:34:15 +02:00
Bartosz Taudul
50a5cce985 Reduce search range. 2020-05-02 02:34:15 +02:00
Bartosz Taudul
0fdb5e6592 Calculate register dependency data. 2020-05-02 02:34:15 +02:00
Bartosz Taudul
93f255e95b Invalidate asm line selection when disassembly is performed. 2020-05-02 02:34:15 +02:00
Bartosz Taudul
654dc2b901 Detect conditional jumps. 2020-05-02 02:34:15 +02:00
Bartosz Taudul
4390aa1015 Print register data in asm lines. 2020-05-01 22:39:54 +02:00
Bartosz Taudul
f4b06ed1fc Register line selection. 2020-05-01 20:35:09 +02:00
Bartosz Taudul
8b2b2f650f Add space for register data in each asm line. 2020-05-01 20:35:09 +02:00
Bartosz Taudul
bb4b08e8cf Don't display operands, if none. 2020-05-01 16:15:33 +02:00
Bartosz Taudul
47b8f052bd Include flags register. 2020-05-01 16:14:27 +02:00
Bartosz Taudul
611bfe49df Display list of read and write registers. 2020-05-01 13:20:19 +02:00
Bartosz Taudul
8014fce6e1 Store list of read and write registers for each asm instruction. 2020-05-01 13:01:47 +02:00
Bartosz Taudul
38116b88a5 Create x86 common register mapping table. 2020-05-01 13:01:47 +02:00
Bartosz Taudul
b74caae685 Handle ending a zone twice. 2020-04-30 19:05:13 +02:00
Bartosz Taudul
a40ba8f4e9 Switch query queue icon to satellite dish. 2020-04-30 18:41:08 +02:00
Bartosz Taudul
035bb2236d Always preserve order of queries. 2020-04-30 02:25:25 +02:00
Bartosz Taudul
8fa0a4dc9e Update mbps data block after terminating connection. 2020-04-29 02:36:38 +02:00
Bartosz Taudul
4634c5cdd3 Update manual. 2020-04-27 19:21:32 +02:00
Bartosz Taudul
2175fa6701 Shorten labels. 2020-04-27 19:21:32 +02:00
Bartosz Taudul
83d6566020 Optional visualization of uarch latency. 2020-04-27 19:21:32 +02:00
Bartosz Taudul
d6e633edd0 Fix typo. 2020-04-27 16:10:43 +02:00
Bartosz Taudul
adc60bf394 Separate uarch data retrieval from tooltip display. 2020-04-27 15:31:32 +02:00
Bartosz Taudul
abd00e28b8 Hackish support for LEA variants. 2020-04-27 00:59:49 +02:00
Bartosz Taudul
70605fc8ed Workaround issues with operand width mismatch. 2020-04-27 00:59:49 +02:00
Bartosz Taudul
5da60b53d0 Add micro architecture tooltips. 2020-04-27 00:59:49 +02:00
Bartosz Taudul
3f00f3f605 Add 'less-than of equal to' character to font. 2020-04-27 00:59:49 +02:00
Bartosz Taudul
800f740fd5 Add micro architecture data. 2020-04-27 00:59:49 +02:00
Bartosz Taudul
9488ee0f9e Add micro architecture data processing utility. 2020-04-27 00:59:49 +02:00
Bartosz Taudul
f43755625c Add uarch selection UI. 2020-04-26 15:00:40 +02:00
Bartosz Taudul
6266d482ae Be explicit about displaying machine code. 2020-04-26 14:51:58 +02:00
Bartosz Taudul
dba594a857 Store CPU architecture. 2020-04-26 14:23:16 +02:00
Bartosz Taudul
5ae2c415b7 Draw a line indicating zeroth column of source code. 2020-04-25 13:52:21 +02:00
Bartosz Taudul
6b831173e4 Don't display asm counts if no asm available. 2020-04-25 13:51:58 +02:00
Bartosz Taudul
c2d84fa288 Mute selectable colors. 2020-04-25 13:22:45 +02:00
Bartosz Taudul
78a56640c3 Open connection popup when a connection is established. 2020-04-25 13:14:27 +02:00
Bartosz Taudul
368caddd00 Separate coloring for types and special values. 2020-04-25 01:01:10 +02:00
Bartosz Taudul
51659ed123 Enable syntax highlighting. 2020-04-25 00:21:15 +02:00
Bartosz Taudul
21506386c4 Allow specification of end address in TextColoredUnformatted. 2020-04-25 00:21:15 +02:00
Bartosz Taudul
3e583b1373 Add C++ tokenizer. 2020-04-25 00:21:15 +02:00
Bartosz Taudul
c87c464f23 Use proper symbol address. 2020-04-24 16:16:53 +02:00
Bartosz Taudul
747f26ef74 Display used CPUs as range, if possible. 2020-04-24 02:02:16 +02:00
Bartosz Taudul
170aeea864 Remove CPU topology tooltip from zone info window. 2020-04-24 01:44:25 +02:00
Bartosz Taudul
14ec246659 Fix typo. 2020-04-24 00:55:57 +02:00
Bartosz Taudul
9a77a59cb2 Display sample percentage columns only if there's data. 2020-04-24 00:49:38 +02:00
Bartosz Taudul
0ff87d8f40 Slight rewording, local paths. 2020-04-22 01:23:03 +02:00
Bartosz Taudul
ffb1f7d465 Merge pull request #9 from nosferalatu/vcpkg_dependencies
Local vcpkg installation
2020-04-22 01:14:24 +02:00
David Farrell
d5cd9d0221 Updated manual's vcpkg instructions 2020-04-21 16:04:33 -07:00
David Farrell
80fdf7517a Use debug vcpkg libraries 2020-04-21 15:17:11 -07:00
David Farrell
09e8ba1208 Updated manual with instructions for install_vcpkg_dependencies.bat 2020-04-21 10:59:37 -07:00
David Farrell
a4a20ddc42 Updated Visual Studio project files to use vcpkg directory for dependencies
This modifies all of the include and lib paths to point to vcpkg/vcpkg/installed/x64-windows-static/include and lib.

With these changes, all executables in Tracy build out of the box in all configurations (assuming you have run the install_vcpkg_depencies.bat script first).

Perhaps it would be better to use a single Visual Studio .props file that all of the .vcxproj files point to so that the include and lib paths are set in a single place, but for now the paths are set separately in each .vcxproj.
2020-04-21 10:59:08 -07:00
David Farrell
b05c2f5327 Added install_vcpkg_dependencies.bat to set up dependencies
This script will build both vcpkg and the dependencies needed by Tracy. It puts everything in the vcpkg/vcpkg directory, and changes no other state on the machine. It is perfectly safe to erase the vcpkg/vcpkg directory and re-run the script to build the dependencies again, although that should only be needed once.

To add new vcpkg dependencies, just modify the 'vcpkg install ...' line in the .bat file.
2020-04-21 10:52:25 -07:00
Bartosz Taudul
865593146a Fix skipping symbol code. 2020-04-19 23:34:34 +02:00
Bartosz Taudul
94276c51ac Update manual. 2020-04-19 16:11:31 +02:00
Bartosz Taudul
e48095062b Allow displaying machine code bytes in disassembly. 2020-04-19 16:07:24 +02:00
Bartosz Taudul
afb9bdce86 Store instruction lengths. 2020-04-19 16:07:24 +02:00
Bartosz Taudul
421f0895b7 Filter invalid jumps. 2020-04-19 16:07:24 +02:00
Bartosz Taudul
ea00efa857 Display disassembly failure notification. 2020-04-19 16:07:24 +02:00
Bartosz Taudul
b157d4c161 Detect disassembly failures. 2020-04-19 14:40:36 +02:00
Bartosz Taudul
c78e11872c Fix jump arrow mouse hover detection. 2020-04-19 14:28:59 +02:00
Bartosz Taudul
1f3b6d01ab Cosmetics. 2020-04-19 14:10:26 +02:00
Bartosz Taudul
0186586fd9 Update manual. 2020-04-18 14:49:14 +02:00
Bartosz Taudul
91ad77d86a Save/load source substitutions. 2020-04-18 14:25:04 +02:00
Bartosz Taudul
ff4b4fd9d9 Update NEWS. 2020-04-17 19:28:39 +02:00
Bartosz Taudul
7a6bc6f554 Substitute source file names in source view. 2020-04-17 19:28:39 +02:00
Bartosz Taudul
01d7fefe52 Perform source file name substitution. 2020-04-17 19:28:39 +02:00
Bartosz Taudul
47cfb4ae35 Expose source substitution interface. 2020-04-17 19:28:39 +02:00
Bartosz Taudul
5f22e35c26 Add UI for source location substitutions. 2020-04-17 19:28:39 +02:00
Bartosz Taudul
b937ad101f Fix handling of ImGui ID stack. 2020-04-17 19:28:39 +02:00
Bartosz Taudul
c79c052528 Display percentage numbers of sample composition times. 2020-04-17 19:28:38 +02:00
Bartosz Taudul
a5bff2f7e5 Sleep to force rescheduling main thread during init.
This fixes problems with first context switch data region possibly not being
available for the main thread, if no rescheduling was performed after sys
tracing has started.
2020-04-14 22:45:32 +02:00
Bartosz Taudul
db9557fc84 Use separate texture compression context for saving traces. 2020-04-14 20:07:30 +02:00
Bartosz Taudul
c2dd3913d7 Cleanup context switch data. 2020-04-14 02:34:28 +02:00
Bartosz Taudul
9fc76990e1 Copy proper amount of memory. 2020-04-14 02:22:48 +02:00
Bartosz Taudul
366153a94f No signed left shifts. 2020-04-14 02:22:48 +02:00
Bartosz Taudul
c54dc10464 Cleanup zone children vectors. 2020-04-14 02:22:47 +02:00
Bartosz Taudul
55f582faaf Use correct print format specifier. 2020-04-14 02:22:47 +02:00
Bartosz Taudul
dd0fb49098 Fix typo. 2020-04-14 02:22:47 +02:00
Bartosz Taudul
b0a58d4664 Don't shift left negative values. 2020-04-14 02:22:47 +02:00
Bartosz Taudul
5db956f546 Update manual. 2020-04-13 21:44:45 +02:00
Bartosz Taudul
f98dfd47fc Update NEWS. 2020-04-13 21:41:52 +02:00
Bartosz Taudul
3b85c51e5f Search for free listen port, if default is occupied. 2020-04-13 21:40:52 +02:00
Bartosz Taudul
1f3dc927c0 Close socket when listening fails. 2020-04-13 21:40:35 +02:00
Bartosz Taudul
5437976e65 Cosmetics. 2020-04-13 21:39:51 +02:00
Bartosz Taudul
0508586108 Update manual. 2020-04-13 17:52:43 +02:00
Bartosz Taudul
5233f8d4ad Mark source lines which generated assembly. 2020-04-13 17:47:41 +02:00
Bartosz Taudul
c43f5e14f2 Update manual. 2020-04-13 15:09:34 +02:00
Bartosz Taudul
2d25e969e9 Fix time span indicators visual jitter. 2020-04-13 15:00:54 +02:00
Bartosz Taudul
a2c4f8c2d1 Prominently expose profiler memory usage. 2020-04-13 14:41:05 +02:00
Bartosz Taudul
b389ccbb38 Issue just one read call when handling server queries. 2020-04-13 14:32:31 +02:00
Bartosz Taudul
1bbece649f Implement socket read without exit check. 2020-04-13 14:22:58 +02:00
Bartosz Taudul
e4ec666479 Don't use std::function in sockets. 2020-04-13 14:14:36 +02:00
Bartosz Taudul
a2187565d1 Optimize non-native-size memcpy. 2020-04-13 13:45:21 +02:00
Bartosz Taudul
aa8b84aa6c Update to ImGui 1.76. 2020-04-13 00:01:53 +02:00
Bartosz Taudul
b8647f968a Don't animate threads on first frame. 2020-04-12 23:41:18 +02:00
Bartosz Taudul
a074d18dfa Don't display source files, if none available. 2020-04-12 23:26:02 +02:00
Bartosz Taudul
5fd5091efd Fix handling of unknown symbols. 2020-04-12 23:18:38 +02:00
Bartosz Taudul
fd027c65e7 Remove -fomit-frame-pointer. 2020-04-12 21:55:47 +02:00
Bartosz Taudul
3398c969ac Disable scrollbar for source view window. 2020-04-12 17:11:51 +02:00
Bartosz Taudul
ef56c7fa7c Display source files time composition in selected function/symbol. 2020-04-12 17:08:58 +02:00
Bartosz Taudul
078014826b Fix detection of hovering over source lines. 2020-04-12 16:21:03 +02:00
Bartosz Taudul
0794cf56ff Sort inline functions list by time spent in function. 2020-04-12 16:13:39 +02:00
Bartosz Taudul
a0f7cb41c3 Merge building inline symbol list with stats collection. 2020-04-12 16:11:24 +02:00
Bartosz Taudul
de18dd46b6 Don't build inline symbols list, if not needed. 2020-04-12 16:05:49 +02:00
Bartosz Taudul
633902cce5 Display inline functions time composition in symbol. 2020-04-12 16:05:01 +02:00
Bartosz Taudul
58cf97ef5d Display wall time in addition to sample counts. 2020-04-11 22:14:56 +02:00
Bartosz Taudul
3ef76df0d1 Update manual. 2020-04-11 20:30:24 +02:00
Bartosz Taudul
c4bddf59e2 Allow access to sampling data before instrumentation is ready. 2020-04-11 18:21:46 +02:00
Bartosz Taudul
6c76c8098b Draw hotness markers next to sample percentage counts. 2020-04-11 01:59:15 +02:00
Bartosz Taudul
0a8287c72d Link to manual at github releases page. 2020-04-11 01:41:03 +02:00
Bartosz Taudul
2c11418d33 Calculate max sample counts during ip map creation. 2020-04-11 01:34:44 +02:00
Bartosz Taudul
f9cf6df3ad Core package is implicit in vcpkg. 2020-04-11 01:25:16 +02:00
Bartosz Taudul
5bc01124c2 Draw jump range and target on scroll bar, when highlighted. 2020-04-10 23:31:25 +02:00
Bartosz Taudul
ca66dc9ba0 More code deduplication. 2020-04-10 23:13:51 +02:00
Bartosz Taudul
ac37898331 Go to white-hot color for ip count over max. 2020-04-10 23:10:26 +02:00
Bartosz Taudul
126a587aa3 Less code duplication. 2020-04-10 23:07:52 +02:00
Bartosz Taudul
61828070c5 Display tooltip for sample percentage. 2020-04-10 23:03:47 +02:00
Bartosz Taudul
f8231bb109 Change main repository to github. 2020-04-10 17:48:59 +02:00
Bartosz Taudul
895e06d778 Draw asm line hotness. 2020-04-10 17:27:57 +02:00
Bartosz Taudul
f6400880b0 Scroll bar decorations for asm lines. 2020-04-10 17:27:57 +02:00
Bartosz Taudul
bcfd32e49f Decorate source scroll bar with line hotness. 2020-04-10 17:27:57 +02:00
Bartosz Taudul
e51844eba3 Decorate source scroll bar with selected and highlighted line. 2020-04-10 16:56:56 +02:00
Bartosz Taudul
ff9fea0abd Cherry-pick f7852fa from imgui. 2020-04-10 15:34:18 +02:00
Bartosz Taudul
a6c0ac4273 Tighten assembly counts in source view. 2020-04-10 00:40:31 +02:00
Bartosz Taudul
14ec7ea6cd Tighten line numbers in source view. 2020-04-10 00:37:46 +02:00
Bartosz Taudul
497e73182a Ditto for source navigation. 2020-04-09 23:28:51 +02:00
Bartosz Taudul
072fed288a Use left and right mouse buttons for asm navigation.
Left click on source line just selects the line and the corresponding
asm lines.
Right click does the above and focuses asm view on the first selected
line.
2020-04-09 23:12:17 +02:00
Bartosz Taudul
47d56f6259 Proper scaling of instruction pointer counts. 2020-04-09 22:52:44 +02:00
Bartosz Taudul
f0c7a751c1 Context-sensitive auto-selection of stats mode in source view. 2020-04-09 22:37:49 +02:00
Bartosz Taudul
9d2c03bc5b Allow showing sample data for whole symbol. 2020-04-09 22:23:57 +02:00
Bartosz Taudul
a2385a8b24 Use correct address range. 2020-04-09 22:21:21 +02:00
Bartosz Taudul
0e1c9e2cd1 Highlight source line corresponding to hovered asm line. 2020-04-09 22:02:06 +02:00
Bartosz Taudul
0791871955 Highlight asm lines for hovered source line. 2020-04-09 21:57:28 +02:00
Bartosz Taudul
1e965edb54 Don't separate inlines by default. 2020-04-09 19:44:42 +02:00
Bartosz Taudul
a339d397ce Don't select addresses outside symbol. 2020-04-09 14:10:07 +02:00
Leander Beernaert
981f1e2e32 Merged in AngryPixelShader/tracy/listener-ipv4-fallback (pull request #46)
ListenSocket: Fallback to ipv4
2020-04-09 10:34:52 +00:00
Leander Beernaert
ac9f12a5f6 Review fixes
Update bracket style.

Remove erroneous else block.
2020-04-09 12:32:35 +02:00
Leander Beernaert
010376518f Fix incorrect handling of ipv4 case 2020-04-09 12:27:54 +02:00
Leander Beernaert
4201ebb28d ListenSocket: Fallback to ipv4
If we can't create a listener socket with the ipv6 protocol, try to
create one with the ipv4 protocol instead. This fixes the ListenSocket
on machines where ipv6 is not available or has been completely disabled.

This patch also exists ListenSocket::Listen() early if we fail to create
the socket.
2020-04-09 12:23:51 +02:00
Bartosz Taudul
241f59b59f Sprinkle some icons. 2020-04-09 02:33:02 +02:00
Bartosz Taudul
554366ad9f Search for address within current symbol. 2020-04-09 02:12:49 +02:00
Bartosz Taudul
d2ebc58be3 Set sensible combo box heights. 2020-04-09 02:09:54 +02:00
Bartosz Taudul
0f42dc2e4c Fix source-less sample count calculations. 2020-04-09 02:04:22 +02:00
Bartosz Taudul
3177865fc2 Follow jump by clicking on jump arrows. 2020-04-09 02:02:06 +02:00
Bartosz Taudul
a715df6338 Tighten assembly source location display. 2020-04-09 01:52:22 +02:00
Bartosz Taudul
6dd765c101 Tighten mnemonic display. 2020-04-09 01:45:38 +02:00
Bartosz Taudul
bae08c27c8 Tighter assembly address display. 2020-04-09 01:31:27 +02:00
Bartosz Taudul
436bd6b9ff Smaller fixed font size. 2020-04-09 01:15:06 +02:00
Bartosz Taudul
643c0867ed Add jump arrows tooltip. 2020-04-09 01:09:57 +02:00
Bartosz Taudul
2cd789662b Handle source-less asm lines selection. 2020-04-08 23:59:21 +02:00
Bartosz Taudul
08c58fe8e3 Separate asm lines selection. 2020-04-08 23:59:10 +02:00
Bartosz Taudul
25346c7a55 Disable movement in source view sub-children. 2020-04-08 23:32:36 +02:00
Bartosz Taudul
450229f5e4 Only change assembly target line when necessary. 2020-04-08 23:30:42 +02:00
Bartosz Taudul
3a1f980a36 Prevent opening obsolete source files. 2020-04-08 23:07:59 +02:00
Bartosz Taudul
3e2260bdcb Add color boxes to file selection. 2020-04-08 23:06:38 +02:00
Bartosz Taudul
d300d17f9e Match source and assembly selection. 2020-04-08 22:57:42 +02:00
Bartosz Taudul
c3abcd9dc1 Update NEWS. 2020-04-08 22:33:12 +02:00
Bartosz Taudul
bb338a1c97 Symbol file selector. 2020-04-08 22:25:36 +02:00
Bartosz Taudul
a1bad4b7be Build list of symbol source files. 2020-04-08 22:18:00 +02:00
Bartosz Taudul
0551cd8e44 Switching between source files from asm view. 2020-04-08 22:10:58 +02:00
Bartosz Taudul
3f01d3bcb1 Selection of inlined function within symbol. 2020-04-08 22:04:33 +02:00
Bartosz Taudul
006919ec55 Mixed source/assembly symbol view. 2020-04-08 22:04:00 +02:00
Bartosz Taudul
5f4145ef0a Update manual. 2020-04-08 19:33:55 +02:00
Bartosz Taudul
b353e8752d Display inlined function stats within symbols. 2020-04-08 19:09:47 +02:00
Bartosz Taudul
dc236a27cc Warn about retrieving context address. 2020-04-08 18:00:31 +02:00
Bartosz Taudul
b05625d444 Display count of inlined functions in symbols. 2020-04-08 17:17:45 +02:00
Bartosz Taudul
51d5ef5b4e Allow merging inlined function stats into base symbol. 2020-04-08 17:12:15 +02:00
Bartosz Taudul
02e1a7669c Add offset-less GetSymbolForAddress(). 2020-04-08 16:55:49 +02:00
Bartosz Taudul
a34cfacb5c Stabilize symbol sorting. 2020-04-08 15:55:34 +02:00
Bartosz Taudul
fe98921e4c Add UI for disabling inlines in sample statistics. 2020-04-08 15:52:53 +02:00
Bartosz Taudul
f59f4f266e Add inline symbols list accessor. 2020-04-08 15:34:14 +02:00
Bartosz Taudul
2a06f1545b Store count of proper and inline symbols in trace dump. 2020-04-08 12:52:06 +02:00
Bartosz Taudul
1da1d31e1c Store list of inline symbols. 2020-04-08 12:44:12 +02:00
Bartosz Taudul
a7fffe7e13 Separate opening source and symbol views. 2020-04-08 02:12:09 +02:00
Bartosz Taudul
09cf160088 Split source and symbol views in SourceView. 2020-04-08 01:58:23 +02:00
Bartosz Taudul
1c0ec60b23 Don't try to display assembly line counts if no symbol is selected. 2020-04-08 01:48:22 +02:00
Bartosz Taudul
6d08f0a196 Update NEWS. 2020-04-07 22:17:50 +02:00
Bartosz Taudul
2e418f24fa Update manual. 2020-04-07 22:17:24 +02:00
Bartosz Taudul
b69aaf04e9 Add support for QPC timer. 2020-04-07 22:01:31 +02:00
Bartosz Taudul
34b512d04b Don't declare unused variables on cygwin. 2020-04-07 21:41:12 +02:00
Bartosz Taudul
8d9a611874 Get rid of unicode ifdefs. 2020-04-07 21:35:37 +02:00
Bartosz Taudul
69c5e667ae Dynamically load Get/SetThreadDescription. 2020-04-07 21:33:03 +02:00
Bartosz Taudul
fc11537e12 Update manual. 2020-04-07 02:12:46 +02:00
Bartosz Taudul
54870e128c Add cautionary information to the manual. 2020-04-06 11:17:24 +02:00
Carsten Juttner
7d47e78025 Merged in carjay/tracy/client_selection_fix (pull request #45)
Unable to select clients on same system that only differ in their port number
2020-04-05 15:16:41 +00:00
Bartosz Taudul
7fca642c3d Compress full-quality DXT1 on AVX2 path. 2020-04-05 17:10:43 +02:00
Bartosz Taudul
b7f32c2a4c Update DXT1 benchmark with Ryzen timings.
Measured at the most commonly reached frequency. Data for peak at max
achieved frequency:
Reference: 173.2 us
SSE: 22.2 us
AVX2: 13.4 us
2020-04-05 17:10:43 +02:00
Bartosz Taudul
29dfb151cb Add Ryzen execution times example. 2020-04-05 16:34:50 +02:00
Carsten Juttner
fbc7a1e452 Since the name is just the IP address it is not unique in case the difference is only in the port number.
So make it unique for selection to work not just for the first element in this case.
2020-04-05 16:19:43 +02:00
Bartosz Taudul
b91c88cdf6 Remove misleading example. 2020-04-05 16:02:22 +02:00
Bartosz Taudul
2ad3f9b51f Index data is already available. 2020-04-05 15:09:13 +02:00
Bartosz Taudul
f1f4f48c38 Perform rate distortion optimization on frame images. 2020-04-05 15:04:18 +02:00
Bartosz Taudul
b19d5731ac Move DXT1 index fixup to texture compression class. 2020-04-05 14:05:43 +02:00
Bartosz Taudul
a6468b6b6e Sleep when clearing queues if listen port is occupied. 2020-04-04 21:08:13 +02:00
Bartosz Taudul
eba427cc5b Update manual. 2020-04-04 14:50:34 +02:00
Bartosz Taudul
6d435e08c8 Handle nullptr in FindStringIdx. 2020-04-04 14:42:00 +02:00
Bartosz Taudul
38bfa7bdb6 Include return instructions in jump out list. 2020-04-04 14:30:48 +02:00
Bartosz Taudul
8c260c9d12 Draw jump outs from symbols. 2020-04-04 14:30:47 +02:00
Bartosz Taudul
268af5b67c Adapt to DPI scaling. 2020-04-04 14:30:47 +02:00
Bartosz Taudul
78a0773f38 Collect jumps outside symbol. 2020-04-04 13:51:55 +02:00
Bartosz Taudul
e02e595eec Align jump arrows to pixels. 2020-04-04 13:42:19 +02:00
Bartosz Taudul
aae161e31a Draw jumps in assembly view. 2020-04-04 03:41:10 +02:00
Bartosz Taudul
bda5c1d13e Ignore out-of-symbol jumps. 2020-04-04 03:13:21 +02:00
Bartosz Taudul
f2b044438d Don't print empty text, just advance cursor. 2020-04-04 02:45:01 +02:00
Bartosz Taudul
3711a66592 Calculate jump table. 2020-04-04 02:25:12 +02:00
Bartosz Taudul
27c125f23b Update manual. 2020-04-03 02:00:20 +02:00
Bartosz Taudul
3f236b7e91 Handle ^C in capture utility on windows. 2020-04-03 02:00:07 +02:00
Bartosz Taudul
a11b52d94f Update NEWS. 2020-04-03 01:46:35 +02:00
Bartosz Taudul
9f15d402de The capture utility will connect to localhost by default. 2020-04-03 01:46:07 +02:00
Bartosz Taudul
b016d9e295 Going from source location to assembly line. 2020-04-02 13:10:18 +02:00
Bartosz Taudul
430aa5564d Add missing vcpkg triplets. 2020-04-02 12:35:42 +02:00
Bartosz Taudul
252e02ba2e One more place where server queue is handled. 2020-04-02 12:15:50 +02:00
Bartosz Taudul
700f189921 Wait for server query queue to drain before terminating. 2020-04-02 12:15:04 +02:00
Bartosz Taudul
562e675a0e Save/load code location data. 2020-04-02 12:12:10 +02:00
Bartosz Taudul
50d9932378 Display number of assembly instructions for each line. 2020-04-02 02:35:41 +02:00
Bartosz Taudul
2b8cf5d132 Add addressess accessor. 2020-04-02 02:17:22 +02:00
Bartosz Taudul
22e9135ab1 Retrieve file name string idx in source view. 2020-04-02 02:15:10 +02:00
Bartosz Taudul
16686739f6 Rebuild string map on trace load. 2020-04-02 02:15:00 +02:00
Bartosz Taudul
e7f4f58886 StringIdx search from string. 2020-04-02 02:08:00 +02:00
Bartosz Taudul
387fdb30b0 Map source location to assembly instructions. 2020-04-02 02:01:58 +02:00
Bartosz Taudul
9f0a6b8231 Make pdqsort always available. 2020-04-02 02:00:21 +02:00
Bartosz Taudul
5dd70c4306 Update manual. 2020-04-02 01:54:59 +02:00
Bartosz Taudul
d3c278cb02 Make source location display optional. 2020-04-02 01:37:56 +02:00
Bartosz Taudul
39cb9f4a32 Display source locations in assembly view. 2020-04-02 01:32:23 +02:00
Bartosz Taudul
b2c2bfc2aa Move HSV color conversion to a separate source file. 2020-04-02 01:04:59 +02:00
Bartosz Taudul
2303f18d39 Add guards to source view header. 2020-04-02 01:02:42 +02:00
Bartosz Taudul
59a49f0698 Jump from assembly to source line. 2020-04-02 00:53:27 +02:00
Bartosz Taudul
fe2dc8c489 Update NEWS. 2020-04-02 00:40:57 +02:00
Bartosz Taudul
327e30fe7c Add source file location tooltip for assembly instructions.
This has bad UX, better solution is needed.
2020-04-02 00:39:17 +02:00
Bartosz Taudul
d204742bb8 Accessor for getting source file, line from address. 2020-04-02 00:31:53 +02:00
Bartosz Taudul
2dd5912fee Cosmetics. 2020-04-02 00:31:41 +02:00
Bartosz Taudul
6392e4d38d Display number of received code locations. 2020-04-02 00:00:46 +02:00
Bartosz Taudul
c8d1f4d3d6 Add code locations size accessor. 2020-04-01 22:38:47 +02:00
Bartosz Taudul
0ec89e9aae Store code address -> source file+line mapping. 2020-04-01 22:37:19 +02:00
Bartosz Taudul
b2a8b53efa Query source location of each assembly instruction. 2020-04-01 21:43:03 +02:00
Bartosz Taudul
0ba0125eb5 Cosmetics. 2020-04-01 21:42:14 +02:00
Bartosz Taudul
a8e8a4a167 Add code address to function, line decoder. 2020-04-01 21:41:33 +02:00
Bartosz Taudul
9e8089ec1a Improve handling of query queue. 2020-04-01 21:05:25 +02:00
Bartosz Taudul
b6ce693ede Include inline parents when filtering sample statistics. 2020-04-01 13:24:32 +02:00
Bartosz Taudul
0f2095f84a Add missing break. 2020-04-01 13:10:26 +02:00
Bartosz Taudul
57779a8ed9 Cosmetics. 2020-03-31 20:44:44 +02:00
Bartosz Taudul
d08f3a6ea0 Check for samples, not ghost zones. 2020-03-31 02:20:34 +02:00
Bartosz Taudul
b7e56d3030 Update manual. 2020-03-31 01:10:39 +02:00
Bartosz Taudul
b957087456 Add "smart" symbol location to sampled statistics. 2020-03-30 23:58:42 +02:00
Bartosz Taudul
eb48d24182 Store context switch data for threads with ghost zones. 2020-03-30 23:41:21 +02:00
Bartosz Taudul
0ad24f6485 Display graphical representation of line percentage. 2020-03-30 22:49:06 +02:00
Bartosz Taudul
44096dfcf2 Move DrawTextContrast() to TracyImGui.hpp. 2020-03-30 22:39:34 +02:00
Bartosz Taudul
c1ed44bd35 Common percentage printing function. 2020-03-30 22:26:45 +02:00
Bartosz Taudul
11aedf2b27 Proper processing of symbol locations in live capture. 2020-03-30 17:10:59 +02:00
Bartosz Taudul
6fe5d0575f Add parent symbol for inlined symbols in sampled statistics. 2020-03-30 02:50:34 +02:00
Bartosz Taudul
17a5faa5e0 Display parent symbol for inline symbols in source view. 2020-03-30 02:46:29 +02:00
Bartosz Taudul
f62df6125f Update NEWS. 2020-03-29 23:02:20 +02:00
Bartosz Taudul
30771bf7cb Gather failure data before terminating connection. 2020-03-29 23:01:57 +02:00
Bartosz Taudul
02cf838ceb Update manual. 2020-03-29 14:12:59 +02:00
Bartosz Taudul
ddfd755ddc Add sponsor button to welcome dialog. 2020-03-29 14:06:54 +02:00
Bartosz Taudul
99b09290b7 Add funding button to README. 2020-03-29 13:57:04 +02:00
Bartosz Taudul
bb978b0eff Add funding metadata. 2020-03-29 13:18:08 +02:00
Bartosz Taudul
36ddd0b98b Don't use new to allocate memory on the client. 2020-03-28 21:27:19 +01:00
Bartosz Taudul
48e4d33bea Support call stacks longer than 255 entries. 2020-03-28 18:04:33 +01:00
Bartosz Taudul
9b8eb69886 Apparently sampled call stacks may be empty. 2020-03-28 16:09:44 +01:00
Bartosz Taudul
d43461584a Don't jump out to symbols without source and code. 2020-03-28 15:11:23 +01:00
Bartosz Taudul
2e1aa844fe Don't try to open invalid files. 2020-03-28 15:06:36 +01:00
Bartosz Taudul
c23f4b2390 Update manual. 2020-03-28 14:42:47 +01:00
Bartosz Taudul
78eb774822 Assembly addresses can be displayed relative to symbol. 2020-03-28 14:42:47 +01:00
Bartosz Taudul
9837e06816 Implement cross-symbol jumping. 2020-03-28 14:27:29 +01:00
Bartosz Taudul
013bb5a4f2 Use generic group categories. 2020-03-28 14:00:21 +01:00
Bartosz Taudul
078109eca2 Update manual. 2020-03-28 01:36:51 +01:00
Bartosz Taudul
86aad15e0c Display jump/call target address. 2020-03-28 01:36:51 +01:00
Bartosz Taudul
28db0f6227 Wait for data to be ready. 2020-03-28 01:17:35 +01:00
Bartosz Taudul
8dba099a56 Revert "Base address is not needed."
This reverts commit 058369bc7a.
2020-03-28 00:57:41 +01:00
Bartosz Taudul
22cae56ab1 Decode jump/call addresses. 2020-03-28 00:53:48 +01:00
Bartosz Taudul
fd3b9ca1e5 No need to format string. 2020-03-27 23:59:09 +01:00
Bartosz Taudul
17009b315f Remove unused variable. 2020-03-27 23:59:03 +01:00
Bartosz Taudul
86ca85f39d Initialize variable. 2020-03-27 23:58:49 +01:00
Bartosz Taudul
058369bc7a Base address is not needed. 2020-03-27 23:58:25 +01:00
Bartosz Taudul
5249eb4428 Fix init order. 2020-03-27 23:57:23 +01:00
Bartosz Taudul
5675044443 Display image name, if source file is unknown. 2020-03-27 21:54:40 +01:00
Bartosz Taudul
d065d28244 Allow assembly view in all calls to SetTextEditorFile. 2020-03-27 21:46:57 +01:00
Bartosz Taudul
78ea40d70c Check for asm/source availability in SetTextEditorFile. 2020-03-27 21:46:57 +01:00
Bartosz Taudul
efbf13fcd4 Don't store inlined symbols locations. 2020-03-27 21:16:23 +01:00
Bartosz Taudul
45b8622bc9 Search for base address when if symbol address is inlined. 2020-03-27 21:04:23 +01:00
Bartosz Taudul
4b78559228 Update manual. 2020-03-27 18:02:05 +01:00
Bartosz Taudul
31a1517d2f Display disassembly of inlined symbols. 2020-03-27 17:59:41 +01:00
Bartosz Taudul
720ed0460b Differentiate between symbol and base address. 2020-03-27 17:59:16 +01:00
Bartosz Taudul
992f4c8c2d Implement search for symbol base from address. 2020-03-27 17:39:42 +01:00
Bartosz Taudul
4c381e13e9 Store list of symbol locations. 2020-03-27 17:34:51 +01:00
Bartosz Taudul
52a853b26f Don't show invalid contents warning for disassembly. 2020-03-27 17:14:46 +01:00
Bartosz Taudul
39923e942b Update NEWS. 2020-03-27 02:08:51 +01:00
Bartosz Taudul
27be025805 Update manual. 2020-03-27 02:07:54 +01:00
Bartosz Taudul
a466362938 Use proper function to send terminate query. 2020-03-27 02:02:36 +01:00
Bartosz Taudul
51bae7855d Display number of in-flight queries. 2020-03-27 02:00:26 +01:00
Bartosz Taudul
089681267f Allow viewing assembly without corresponding source code. 2020-03-27 01:47:06 +01:00
Bartosz Taudul
10ca8b5440 Assembly display is not dependant on sample data. 2020-03-27 01:24:50 +01:00
Bartosz Taudul
2a54f2df5d Reverse the fileselector compile option. 2020-03-26 23:11:26 +01:00
Bartosz Taudul
d495431f24 Reverse the root window compile option. 2020-03-26 23:08:29 +01:00
Bartosz Taudul
58bb5d40c5 Remove support for non-extended font builds. 2020-03-26 23:04:44 +01:00
Bartosz Taudul
5ec1bd0f5e Update manual. 2020-03-26 22:49:46 +01:00
Bartosz Taudul
55be253fb2 Update NEWS. 2020-03-26 22:40:07 +01:00
Bartosz Taudul
c098a03d8f Implement listing all symbols. 2020-03-26 22:39:16 +01:00
Bartosz Taudul
e58b9e870e Use explicit data structure to store symbol list data. 2020-03-26 22:39:16 +01:00
Bartosz Taudul
7018caadb9 Add UI for control of displaying all symbols. 2020-03-26 22:39:16 +01:00
Bartosz Taudul
696c351e6a Allow listing symbols, even if no sampling data has been gathered. 2020-03-26 22:39:16 +01:00
Bartosz Taudul
e6b0bfc90d Display "no entries" message in statistics menu, if appropriate. 2020-03-26 22:39:16 +01:00
Bartosz Taudul
ef96ecd9b8 Use shorter descriptions. 2020-03-26 22:09:56 +01:00
Bartosz Taudul
2db117a7ac Disassemble symbols even if source file has not changed. 2020-03-26 02:23:43 +01:00
Bartosz Taudul
3de4283bd2 Display code size in source file view window. 2020-03-26 02:23:09 +01:00
Bartosz Taudul
6a96b5f1dc Use better wording. 2020-03-26 02:18:24 +01:00
Bartosz Taudul
40281ce2a1 Add default no-op to switch. 2020-03-26 01:07:25 +01:00
Bartosz Taudul
db4971e6c0 Also on windows CI. 2020-03-26 00:20:40 +01:00
Bartosz Taudul
c333606600 Install capstone on CI. 2020-03-26 00:20:04 +01:00
Bartosz Taudul
a49638e9a6 Link with capstone on unix. 2020-03-26 00:18:38 +01:00
Bartosz Taudul
6d5bccdd51 Update manual. 2020-03-26 00:18:38 +01:00
Bartosz Taudul
ca8156efdf Update NEWS. 2020-03-26 00:18:38 +01:00
Bartosz Taudul
4f417854e5 Display disassembly. 2020-03-26 00:18:38 +01:00
Bartosz Taudul
b091c0d4a8 Hide unknown symbols by default in sample statistics. 2020-03-25 23:44:48 +01:00
Bartosz Taudul
7ac03be43b Fix braino. 2020-03-25 22:50:13 +01:00
Bartosz Taudul
3e134cdce5 Disassemble symbol code fragments. 2020-03-25 22:37:34 +01:00
Bartosz Taudul
79db7f4457 Add symbol code accessor. 2020-03-25 22:15:22 +01:00
Bartosz Taudul
53d0b91f26 CPU architecture accessor. 2020-03-25 22:12:18 +01:00
Bartosz Taudul
39da6c7c19 Store CPU architecture. 2020-03-25 21:48:24 +01:00
Bartosz Taudul
add5b29d03 Report CPU architecture in welcome message. 2020-03-25 21:28:02 +01:00
Bartosz Taudul
5a3dedea97 Update manual. 2020-03-25 20:59:59 +01:00
Bartosz Taudul
033433b883 Save/load symbol code. 2020-03-25 20:52:59 +01:00
Bartosz Taudul
a05ec0eed2 Update NEWS. 2020-03-25 20:45:58 +01:00
Bartosz Taudul
ce449ac0e2 Notify server that parameter was handled. 2020-03-25 20:37:26 +01:00
Bartosz Taudul
ea507289c6 Add missing query space extensions. 2020-03-25 20:33:50 +01:00
Bartosz Taudul
cda285ceb7 Display notification when queries are backlogged. 2020-03-25 20:25:33 +01:00
Bartosz Taudul
582e46005e Display symbols code size. 2020-03-25 20:08:39 +01:00
Bartosz Taudul
f114ec3f80 Add code transfer from client to server. 2020-03-25 20:04:55 +01:00
Bartosz Taudul
3e0e120222 Add extra parameter to server queries. 2020-03-25 20:04:01 +01:00
Bartosz Taudul
383918bca4 Display symbol size in sampled statistics window. 2020-03-25 18:45:05 +01:00
Bartosz Taudul
bf52883331 Store symbol length in trace dumps. 2020-03-25 18:37:08 +01:00
Bartosz Taudul
c515a53986 Wrapper for reading nine elements at once. 2020-03-25 18:35:48 +01:00
Bartosz Taudul
c999a74d34 Symbol length transfer. 2020-03-25 18:32:36 +01:00
Bartosz Taudul
d47e6819a8 Collect symbol sizes. 2020-03-25 18:28:28 +01:00
Bartosz Taudul
954c43912d Update manual. 2020-03-25 02:18:17 +01:00
Bartosz Taudul
2417f63bf2 Build instruction pointers map when loading trace. 2020-03-25 01:56:13 +01:00
Bartosz Taudul
43ab2540b0 Update NEWS. 2020-03-25 01:14:50 +01:00
Bartosz Taudul
eae664bd1b Display instruction pointer counts in source view. 2020-03-25 01:09:02 +01:00
Bartosz Taudul
c603eaa1b6 Add symbol instruction pointers map accessor. 2020-03-25 01:08:29 +01:00
Bartosz Taudul
4c92a2619f Pass symbol address to source view. 2020-03-25 00:07:31 +01:00
Bartosz Taudul
1999352004 Remove junk. 2020-03-25 00:00:15 +01:00
Bartosz Taudul
4068ab30e8 Build sampled instruction pointers map for symbols. 2020-03-24 23:54:30 +01:00
Bartosz Taudul
f45276a596 Cherry-pick imgui fix @4986dba27. 2020-03-24 18:12:36 +01:00
Bartosz Taudul
a7cedddcef Use clipper to render source view. 2020-03-24 18:10:56 +01:00
Bartosz Taudul
1c23d7e67a Update manual. 2020-03-23 02:31:06 +01:00
Bartosz Taudul
96a330e034 Improve ghost zones source location logic. 2020-03-23 01:59:57 +01:00
Bartosz Taudul
f248159d55 Update NEWS. 2020-03-22 20:57:01 +01:00
Bartosz Taudul
9672dba765 Replace source file viewer with one that actually works.
This is much simpler, custom implementation of a text file viewer. It is
able to perform these two tasks as intended:
- center source view on the selected line,
- highlight that line.
2020-03-22 20:53:59 +01:00
Bartosz Taudul
57e14c9e5c Update NEWS. 2020-03-22 18:57:43 +01:00
Bartosz Taudul
4626ff4e93 Update NEWS. 2020-03-22 18:56:10 +01:00
Bartosz Taudul
13b5ac92d8 Add notification about display of empty labels. 2020-03-22 18:55:45 +01:00
Bartosz Taudul
4ae13ff3dd Build ghost frame data during live capture. 2020-03-21 18:26:42 +01:00
Bartosz Taudul
d2a53b79d7 Don't check if vector is empty, if we're sure it's not. 2020-03-21 17:59:09 +01:00
Bartosz Taudul
8aeba9dc79 Wait for ghost zones to be ready. 2020-03-21 17:57:43 +01:00
Bartosz Taudul
159cf8c477 Add non-empty version of push_next() to Vector. 2020-03-21 17:56:24 +01:00
Bartosz Taudul
d262ca53ea Add missing zeros to exact time printout. 2020-03-21 16:13:41 +01:00
Bartosz Taudul
c32d9c74b1 Properly display unknown sampled frames in ghost zones. 2020-03-21 15:43:20 +01:00
Bartosz Taudul
8a81d2210c Non-consecutive stack samples are no longer present. 2020-03-21 15:28:34 +01:00
Bartosz Taudul
6c0c508280 Ignore kernel-only stacks.
It is common to receive duplicate stack traces for the same timestamp
(and thread), one containing proper user-space stack, and the second one
containing only kernel frames. Discard the second one, as there's no
documentation how this should be interpreted and the kernel stack is
mostly useless.
2020-03-21 15:25:30 +01:00
Bartosz Taudul
59e859e59a Remove benaphore, use std::mutex on cygwin. 2020-03-19 02:06:54 +01:00
Bartosz Taudul
fe32385a12 Update manual. 2020-03-19 02:06:54 +01:00
Bartosz Taudul
000666a5e2 Update NEWS. 2020-03-19 00:57:50 +01:00
Bartosz Taudul
df7f087b08 Implement ghost zone skipping. 2020-03-19 00:56:56 +01:00
Bartosz Taudul
a2bf5ac199 Display ghost zones by default, if no instrumented zones. 2020-03-19 00:42:20 +01:00
Bartosz Taudul
e11bf1d62d Display frame address in tooltip. 2020-03-19 00:35:19 +01:00
Bartosz Taudul
6444051382 Frames may be missing. 2020-03-19 00:35:18 +01:00
Bartosz Taudul
92e2597192 Fix ghost children times. 2020-03-19 00:35:18 +01:00
Bartosz Taudul
1f4dbd1b2e Parallelize background jobs. 2020-03-19 00:35:18 +01:00
Bartosz Taudul
a48e804e96 Don't reconstruct mem alloc plot in no-statistics builds. 2020-03-19 00:35:18 +01:00
Bartosz Taudul
ac84e77333 Ghost zones display prototype. 2020-03-19 00:35:18 +01:00
Bartosz Taudul
4e4ee2ff2c Add number of call stack samples to thread tooltip. 2020-03-19 00:35:18 +01:00
Bartosz Taudul
77d30adee9 Add per-thread ghost zones switch. 2020-03-19 00:35:18 +01:00
Bartosz Taudul
6daa429b69 Add hidden ghost zones indicator. 2020-03-19 00:35:18 +01:00
Bartosz Taudul
4384b812f1 Smaller nested checkboxes. 2020-03-19 00:35:18 +01:00
Bartosz Taudul
335866be88 Add UI to control ghost zones drawing.
Disabled for now, will be needed in future.
2020-03-19 00:35:18 +01:00
Bartosz Taudul
adb45ed5df Allow checking if ghost zones are ready. 2020-03-17 00:16:39 +01:00
Bartosz Taudul
2f674833b2 Add ghost zones display to view options. 2020-03-16 23:05:07 +01:00
Bartosz Taudul
eb5f7a27e7 Match ghost zones by symbol address. 2020-03-16 23:05:07 +01:00
Bartosz Taudul
b89874850f Pack frame identifiers in ghost zones. 2020-03-16 23:05:07 +01:00
Bartosz Taudul
ead597bacc Display count of ghost zones. 2020-03-16 23:05:07 +01:00
Bartosz Taudul
452341059b Build ghost zones tree. 2020-03-16 23:05:06 +01:00
Bartosz Taudul
693db74380 Add CallstackFrameId comparator. 2020-03-16 23:05:06 +01:00
Bartosz Taudul
377ed48416 Don't over-reserve map. 2020-03-16 23:05:06 +01:00
Bartosz Taudul
aeb3bc410b Pack FrameImage struct. 2020-03-16 23:05:06 +01:00
Bartosz Taudul
c06ea4a3e8 Fix layout. 2020-03-16 23:04:50 +01:00
Bartosz Taudul
8e1333468d Add missing images. 2020-03-14 21:13:08 +01:00
Bartosz Taudul
e785a57c65 Add high-quality frame image capture description. 2020-03-14 21:10:33 +01:00
Bartosz Taudul
b8cc3f59d6 Count number of input and compressed frame image bytes. 2020-03-14 14:35:57 +01:00
Bartosz Taudul
0776dddc35 Display image compression ratio in bits per pixel. 2020-03-14 13:10:15 +01:00
Bartosz Taudul
42ccf332b9 Update manual. 2020-03-14 02:13:03 +01:00
Bartosz Taudul
20a7bf2b23 There are no parents for mid-stack frames. 2020-03-14 02:06:39 +01:00
Bartosz Taudul
a0cdaa8c0b Update NEWS. 2020-03-14 01:47:51 +01:00
Bartosz Taudul
1bb7d05ba0 Display time percentage in statistics menu. 2020-03-14 01:47:18 +01:00
Bartosz Taudul
5046664b8b Use "self time" consistently in the UI. 2020-03-14 01:38:46 +01:00
Bartosz Taudul
4631d884d6 Cleanup samples vector. 2020-03-10 21:46:24 +01:00
Bartosz Taudul
c7afda2562 Exit processing loops when trace has stopped. 2020-03-10 18:56:49 +01:00
Bartosz Taudul
52132b9a29 Update manual. 2020-03-08 16:20:08 +01:00
Bartosz Taudul
e30161c34e Add IP/port tooltip to discovered clients list. 2020-03-08 16:19:58 +01:00
Bartosz Taudul
c6bb08355c Allow specification of port through env variable. 2020-03-08 16:14:36 +01:00
Bartosz Taudul
2671245d41 Update manual. 2020-03-08 15:29:45 +01:00
Bartosz Taudul
4dee706143 Update NEWS. 2020-03-08 15:07:40 +01:00
Bartosz Taudul
1da62c2190 Send deferred lock names. 2020-03-08 15:05:35 +01:00
Bartosz Taudul
a6deabaeee Allow port entry in address field. 2020-03-08 15:02:20 +01:00
Bartosz Taudul
ff37ab7bc6 Handle discovery of multiple clients on the same IP. 2020-03-08 14:51:56 +01:00
Bartosz Taudul
127224acc6 Send listen port in broadcast message. 2020-03-08 14:37:59 +01:00
Bartosz Taudul
14c896573d Separate config for data and broadcast port. 2020-03-08 14:34:09 +01:00
Bartosz Taudul
2ffaa88c9e Fix typo. 2020-03-08 14:19:08 +01:00
Bartosz Taudul
6c95ab4d1e Update manual. 2020-03-08 14:05:57 +01:00
Bartosz Taudul
c83012a83a Update NEWS. 2020-03-08 14:05:53 +01:00
Bartosz Taudul
09d54cf9d9 Display custom lock names. 2020-03-08 13:59:31 +01:00
Bartosz Taudul
e7240cb77d Custom lock name transfer. 2020-03-08 13:47:38 +01:00
Bartosz Taudul
0fe606dd8b Update NEWS. 2020-03-06 22:15:13 +01:00
Bartosz Taudul
3c22134f78 Reconnect to client, if requested. 2020-03-06 22:11:29 +01:00
Bartosz Taudul
9668234903 Allow requesting reconnect on trace discard. 2020-03-06 22:11:17 +01:00
Bartosz Taudul
d25614d50f Allow address/port retrieval from View. 2020-03-06 22:11:00 +01:00
Bartosz Taudul
f945278959 Fix rpmalloc on android. 2020-03-02 17:10:47 +01:00
Bartosz Taudul
2b584a94c5 Fix GLES check. 2020-03-02 17:06:25 +01:00
Bartosz Taudul
71bf352af2 Use mach-o/dyld.h on osx. 2020-03-02 13:51:39 +01:00
Bartosz Taudul
fd363d7d85 Implement getexecname() for osx. 2020-03-02 13:47:34 +01:00
Bartosz Taudul
31dc44feee Fix casts. 2020-03-02 12:29:32 +01:00
Bartosz Taudul
50123298a7 Unify texture compression implementations. 2020-03-02 02:08:14 +01:00
Bartosz Taudul
aa0bf47ec3 Extract texture compression functionality. 2020-03-02 02:00:35 +01:00
Bartosz Taudul
0df309b45c Forward declare LockType. 2020-03-02 01:58:48 +01:00
Bartosz Taudul
abd44069ae Fix off-by-one. 2020-03-01 14:04:10 +01:00
Bartosz Taudul
0b3431289f Fix libbacktrace config for BSD. 2020-03-01 13:02:29 +01:00
Bartosz Taudul
fdc3b01054 Update libbacktrace to ca0de0517. 2020-03-01 12:55:32 +01:00
Bartosz Taudul
c36ed4b8b8 Boring warning fixes. 2020-03-01 01:48:20 +01:00
Bartosz Taudul
8f9ba5d54a Rearrange UI. 2020-03-01 01:32:31 +01:00
Bartosz Taudul
c23984dd6a Fix static assert in rpmalloc. 2020-03-01 01:31:31 +01:00
Bartosz Taudul
5887c9d12c Remove unused variable. 2020-03-01 01:28:56 +01:00
Bartosz Taudul
e93b574c5d Fill-in missing image name. 2020-03-01 01:27:21 +01:00
Bartosz Taudul
e9a32d5dc7 Greatly increase queue block size.
Previous block size could hold only 256 elements (8KB), which stressed
out the memory allocator. Storing 65536 elements (2MB) per block almost
completely reduces the allocator pressure.
2020-03-01 01:15:13 +01:00
Bartosz Taudul
82f463724c Update rpmalloc to 1.4.0.
Notable changes: use C++11 atomics everywhere.
2020-03-01 01:02:25 +01:00
Bartosz Taudul
83316f1299 Fix pointer fixup. 2020-02-29 23:40:21 +01:00
Bartosz Taudul
00dd46b521 Add second screenshot. 2020-02-29 20:09:48 +01:00
Bartosz Taudul
d1ff99d6e3 Callstack frame map must not be touched by statistics. 2020-02-29 19:49:33 +01:00
Bartosz Taudul
8d5755521e Fix no statistics build. 2020-02-29 19:31:51 +01:00
Bartosz Taudul
de3a48f958 Better symbol information. 2020-02-29 19:28:42 +01:00
Bartosz Taudul
452d571e79 Update manual. 2020-02-29 19:19:12 +01:00
Bartosz Taudul
58bf6f8fad Update NEWS. 2020-02-29 19:00:27 +01:00
Bartosz Taudul
07756476b0 Allow viewing global entry stats of a callstack. 2020-02-29 18:59:01 +01:00
Bartosz Taudul
39361f71a1 Allow GetSymbolStats() to fail gracefully. 2020-02-29 18:59:01 +01:00
Bartosz Taudul
9b53792f20 Add call stack sample parents window. 2020-02-29 18:59:01 +01:00
Bartosz Taudul
2589b45af0 Add accessors for new data. 2020-02-29 18:39:45 +01:00
Bartosz Taudul
bc14ca86f3 Count proper number of parent stacks. 2020-02-29 17:58:17 +01:00
Bartosz Taudul
c18584fdbc Iterator may be invalidated. 2020-02-29 17:08:02 +01:00
Bartosz Taudul
4843a1d458 Collect parent call stacks for symbols. 2020-02-29 16:41:22 +01:00
Bartosz Taudul
90f5ae536c Display count of parent call stacks and frames. 2020-02-29 16:32:33 +01:00
Bartosz Taudul
6cc4de8d28 Construct parent call stacks for sampled stack traces. 2020-02-29 16:24:15 +01:00
Bartosz Taudul
935eb4cc61 Use xxh3 for VarArray hashing. 2020-02-29 15:31:05 +01:00
Bartosz Taudul
aed91a4d09 Update manual. 2020-02-29 14:55:14 +01:00
Bartosz Taudul
bdc86f5338 Process postponed samples when new frame data is available. 2020-02-29 14:12:04 +01:00
Bartosz Taudul
0d112e20a5 Don't perform unnecessary hashmap search. 2020-02-29 13:58:11 +01:00
Bartosz Taudul
3412d2232f Postpone processing incomplete sampled callstacks. 2020-02-29 13:45:50 +01:00
Bartosz Taudul
7e4088ed61 Update manual. 2020-02-29 00:58:59 +01:00
Bartosz Taudul
4fac1517d9 Update NEWS. 2020-02-29 00:51:03 +01:00
Bartosz Taudul
7947087694 Print frame image compression ration. 2020-02-29 00:49:42 +01:00
Bartosz Taudul
71a11554cb Update manual. 2020-02-27 23:11:52 +01:00
Bartosz Taudul
49a0072bab Use correct wording. 2020-02-27 23:11:45 +01:00
Bartosz Taudul
5f48f8a3aa Allow hiding symbols with unknown location. 2020-02-27 19:45:41 +01:00
Bartosz Taudul
fea3444625 Use percent print wrapper in more places. 2020-02-27 17:20:34 +01:00
Bartosz Taudul
4fa049188b Add total time percentage for entries in sampling statistics. 2020-02-27 17:05:56 +01:00
Bartosz Taudul
e438e1c53d Add another percent print helper. 2020-02-27 17:05:48 +01:00
Bartosz Taudul
c5e8739435 Live update of sample statistics data. 2020-02-27 16:48:50 +01:00
Bartosz Taudul
8aa70211c0 Display inline functions. 2020-02-27 15:28:58 +01:00
Bartosz Taudul
bc18862dc6 Allow displaying instruction location. 2020-02-27 15:10:50 +01:00
Bartosz Taudul
7dd929a39e Preserve symbol call location. 2020-02-27 15:07:29 +01:00
Bartosz Taudul
2e5ce52086 Implement reading 7 elements. 2020-02-27 15:07:29 +01:00
Bartosz Taudul
17e7add105 By default show self times in statistics menu. 2020-02-27 14:20:19 +01:00
Bartosz Taudul
710a2a64e4 Fix copy pasta. 2020-02-27 14:08:56 +01:00
Bartosz Taudul
4346620afa No need to copy module name. 2020-02-27 13:45:39 +01:00
Bartosz Taudul
fd8a9465d4 Cosmetics. 2020-02-27 13:40:41 +01:00
Bartosz Taudul
9ae71ac4ee Dl_info doesn't destroy data. 2020-02-27 13:28:45 +01:00
Bartosz Taudul
5f6b3d2cd5 No need for module name intermediate buffer. 2020-02-27 13:24:36 +01:00
Bartosz Taudul
474383b656 Only copy symbol strings, if needed. 2020-02-27 13:17:26 +01:00
Bartosz Taudul
2df6f9068a Don't retrieve symbol name for address. 2020-02-27 12:58:01 +01:00
Bartosz Taudul
be5793987e Don't send symbol name. 2020-02-27 12:49:48 +01:00
Bartosz Taudul
9d718eb1e8 Preserve inlined symbol names. 2020-02-27 12:39:05 +01:00
Bartosz Taudul
374ddbe2a6 Replace tabs with radio buttons. 2020-02-27 12:24:28 +01:00
Bartosz Taudul
8173f24515 Add filter to sampling statistics. 2020-02-27 02:31:30 +01:00
Bartosz Taudul
a66f4e0614 Update NEWS. 2020-02-27 02:09:20 +01:00
Bartosz Taudul
27466fc56b Add callstack sampling statistics. 2020-02-27 02:08:20 +01:00
Bartosz Taudul
852e37c8dd Calculate callstack sample data on trace load. 2020-02-27 01:22:36 +01:00
Bartosz Taudul
8288f7c6b7 No need to pack ZoneThreadData. 2020-02-27 01:00:41 +01:00
Bartosz Taudul
c99537c402 Provide default value for sourceLocationZonesReady. 2020-02-27 00:26:58 +01:00
Bartosz Taudul
56dce646cc Symbol address decoding on unix. 2020-02-26 23:38:04 +01:00
Bartosz Taudul
4ddafdeeaf Symbol address decoding for old androids. 2020-02-26 23:24:18 +01:00
Bartosz Taudul
7c506d5426 Remove unused variables. 2020-02-26 23:24:11 +01:00
Bartosz Taudul
75312c7bb6 Display count of symbols. 2020-02-26 23:02:39 +01:00
Bartosz Taudul
a35795f793 Use more appropriate name. 2020-02-26 23:02:34 +01:00
Bartosz Taudul
4511a4de8c Save/load symbol information. 2020-02-26 22:53:18 +01:00
Bartosz Taudul
d04e8ab1d0 Update manual. 2020-02-26 22:48:18 +01:00
Bartosz Taudul
847069a59d Expose symbol source location data. 2020-02-26 22:46:02 +01:00
Bartosz Taudul
26cee8acf0 Perform symbol information queries. 2020-02-26 22:35:15 +01:00
Bartosz Taudul
ef05570540 Symbol address decoding (win32 implementation). 2020-02-26 22:32:42 +01:00
Bartosz Taudul
03ff08a934 Increase max name size. 2020-02-26 22:32:09 +01:00
Bartosz Taudul
d1fcf80c2d Move definition of max symbol name size to one place. 2020-02-26 22:30:11 +01:00
Bartosz Taudul
c0f49c648b Validate size. 2020-02-26 22:27:10 +01:00
Bartosz Taudul
4f052a6b9c Update manual. 2020-02-26 19:16:27 +01:00
Bartosz Taudul
14a068062e Update NEWS. 2020-02-26 19:16:27 +01:00
Bartosz Taudul
02c99fca67 Enable display of symbol address in callstack window. 2020-02-26 19:16:27 +01:00
Bartosz Taudul
890cec9872 Retrieve symbol addresses on unix. 2020-02-26 02:25:45 +01:00
Bartosz Taudul
9231261d73 Retrieve image name on unix. 2020-02-26 02:11:51 +01:00
Bartosz Taudul
fe80a7ed46 Retrieve symbol address on old androids. 2020-02-26 02:06:44 +01:00
Bartosz Taudul
74aa3f861e Update manual. 2020-02-26 01:29:14 +01:00
Bartosz Taudul
98f31b3d43 Update NEWS. 2020-02-26 01:05:37 +01:00
Bartosz Taudul
8f36ee2c89 Display module name in callstack window. 2020-02-26 01:04:51 +01:00
Bartosz Taudul
eb7e8162ff Handle module names on server side. 2020-02-26 00:55:43 +01:00
Bartosz Taudul
abf8c42a7c Send module name. 2020-02-26 00:33:09 +01:00
Bartosz Taudul
7d0dac9ae2 Store callstack frame module name. 2020-02-26 00:32:47 +01:00
Bartosz Taudul
4cf520db93 Unify copying symbol strings. 2020-02-26 00:02:30 +01:00
Bartosz Taudul
d6c0720f8a Save/load sampling period. 2020-02-25 23:46:16 +01:00
Bartosz Taudul
af58649113 Store symbol addresses. 2020-02-25 23:42:59 +01:00
Bartosz Taudul
ca894be51d Store sampling period on server. 2020-02-25 23:13:28 +01:00
Bartosz Taudul
c5b2d14f8c Send sampling period in welcome message. 2020-02-25 23:12:31 +01:00
Bartosz Taudul
2b7f5091f1 Store sampling period. 2020-02-25 23:08:52 +01:00
Bartosz Taudul
3402d16548 Send symbol base address. 2020-02-25 23:03:40 +01:00
Bartosz Taudul
85ffe0ea04 Don't search module list for kernel addresses. 2020-02-24 23:04:53 +01:00
Bartosz Taudul
ece32b47df Zero capacity is invalid. 2020-02-24 23:04:53 +01:00
Bartosz Taudul
4695f60937 Fix display of kernel addresses. 2020-02-24 23:04:53 +01:00
Bartosz Taudul
9c9e854005 Replace list with vector.
Maybe next time let's not forget that there's already a custom
allocating vector available.
2020-02-24 23:04:53 +01:00
Bartosz Taudul
d60641cac4 Update manual. 2020-02-24 01:43:15 +01:00
Bartosz Taudul
e6200899de Update NEWS. 2020-02-24 01:33:43 +01:00
Bartosz Taudul
9e1c24f93e Add icon to filter entry boxes. 2020-02-24 01:33:12 +01:00
Bartosz Taudul
2e3b13ced5 Optional frame image display in messages list. 2020-02-24 01:29:57 +01:00
Bartosz Taudul
d63a15a570 Don't init unused memory on vector realloc.
Warning, this change may break things in unexpected ways!
2020-02-23 20:20:19 +01:00
Bartosz Taudul
d150e688f5 Simpler vector init. 2020-02-23 19:47:09 +01:00
Bartosz Taudul
00d91cb9ab Optimize slab allocation. 2020-02-23 19:43:00 +01:00
Bartosz Taudul
085e1fd43f Deduplicate code. 2020-02-23 19:16:33 +01:00
Bartosz Taudul
adbde78a7a Update NEWS. 2020-02-23 18:24:39 +01:00
Bartosz Taudul
24cd73e366 Fix linux tracing with long pids. 2020-02-23 18:23:53 +01:00
Bartosz Taudul
0fa1d25d98 Disable trace annotations. 2020-02-23 18:20:48 +01:00
Bartosz Taudul
462cfe2fc1 Update manual. 2020-02-23 16:10:24 +01:00
Bartosz Taudul
37d9a20b4a Use caret right to indicate inlined frames, instead of "--". 2020-02-23 16:08:00 +01:00
Bartosz Taudul
7f5e23f2ac Handle one more case of duplicate samples. 2020-02-23 15:57:36 +01:00
Bartosz Taudul
625d380f7a Return value is not used. 2020-02-23 15:53:23 +01:00
Bartosz Taudul
759fd15c03 Don't load vector size twice. 2020-02-23 15:35:08 +01:00
Bartosz Taudul
02d200878d Process queue data in-place. 2020-02-23 15:18:24 +01:00
Bartosz Taudul
96034bca3e Force inline AppendData(), NeedDataSize(). 2020-02-23 14:44:19 +01:00
Bartosz Taudul
6968a42abe Force inline LZ4_NbCommonBytes(). 2020-02-23 14:40:09 +01:00
Bartosz Taudul
bd34c24b84 Increase block size. 2020-02-23 12:35:30 +01:00
Bartosz Taudul
31d2bc1740 Fix typo. 2020-02-23 11:49:01 +01:00
Bartosz Taudul
358de714c8 Don't use "???" external thread name. 2020-02-23 11:39:51 +01:00
Bartosz Taudul
2ec3813cb3 Collapse multiple callstack samples. 2020-02-23 00:24:04 +01:00
Bartosz Taudul
5e2390604d Use proper check for "draw stack samples" option visibility. 2020-02-22 23:20:59 +01:00
Bartosz Taudul
a650717c1d Update manual. 2020-02-22 23:17:40 +01:00
Bartosz Taudul
6b3f47f675 Update NEWS. 2020-02-22 22:35:18 +01:00
Bartosz Taudul
90277953c7 Replace duplicate samples. 2020-02-22 21:36:27 +01:00
Bartosz Taudul
26b13abac8 Pre-fill module cache. 2020-02-22 21:32:18 +01:00
Bartosz Taudul
0a02cf32be Add module name cache. 2020-02-22 21:32:10 +01:00
Bartosz Taudul
6532f622cb Mute color of unknown call frames. 2020-02-22 21:17:36 +01:00
Bartosz Taudul
096e8cd8ae Retrieve module name if symbol name cannot be found. 2020-02-22 21:06:32 +01:00
Bartosz Taudul
d0930e9053 Use maximum possible sampling rate. 2020-02-22 19:08:15 +01:00
Bartosz Taudul
4273939cf5 Local threads must have at least one zone captured. 2020-02-22 18:52:38 +01:00
Bartosz Taudul
d4f9810006 Draw callstack sample data. 2020-02-22 18:52:38 +01:00
Bartosz Taudul
4502858407 Use maximum possible etw buffer size (1MB). 2020-02-22 18:52:38 +01:00
Bartosz Taudul
64d6caf695 Add Int24, Int48 natvis. 2020-02-22 18:52:38 +01:00
Bartosz Taudul
b425374cb7 Add callstack samples drawing option. 2020-02-22 18:52:38 +01:00
Bartosz Taudul
597911e5a8 Save/load callstack samples. 2020-02-22 18:52:38 +01:00
Bartosz Taudul
e270603117 Don't write reference time to memory in each iteration. 2020-02-22 18:52:37 +01:00
Bartosz Taudul
ca39c9877e Display number of callstack samples. 2020-02-22 18:52:37 +01:00
Bartosz Taudul
437771ea85 Process callstack sample data. 2020-02-22 18:52:37 +01:00
Bartosz Taudul
054a6f8563 Send time deltas in callstack sample data packets. 2020-02-22 16:42:47 +01:00
Bartosz Taudul
1ee80e0df5 Send/free callstack sample payloads. 2020-02-22 16:20:43 +01:00
Bartosz Taudul
3b0ed5337b Provide TraceSetInformation() definition for cygwin. 2020-02-22 16:03:07 +01:00
Bartosz Taudul
13febeb902 Use dllimport/dllexport on cygwin. 2020-02-22 16:02:02 +01:00
Bartosz Taudul
baf8e6fe80 No support for sampling on 32-bit windows.
Note that 32-bit applications running on 64-bit windows will perform
sampling.
2020-02-22 14:16:04 +01:00
Bartosz Taudul
23fe3e623d 64-bit only version of callstack payload sender. 2020-02-22 14:05:01 +01:00
Bartosz Taudul
9e9c7db5b1 Send sampled call stacks. 2020-02-22 13:42:09 +01:00
Bartosz Taudul
f186540c4f Fix callstack pointers in 32-bit builds. 2020-02-22 13:38:09 +01:00
Bartosz Taudul
9b9474ada1 Request stack traces for execution sampling events. 2020-02-22 13:13:49 +01:00
Bartosz Taudul
28d0f387ea Setup execution sampling profiling. 2020-02-22 13:13:32 +01:00
Bartosz Taudul
ad77b4f73b Store current process id. 2020-02-22 13:11:16 +01:00
Bartosz Taudul
1f671fbacc Keep sys trace variables local. 2020-02-22 13:08:35 +01:00
Bartosz Taudul
539ccf5a61 Check provider id in etw callback. 2020-02-22 12:56:33 +01:00
Bartosz Taudul
ba0715b295 Replace remaining manual children checks with HasChildren(). 2020-02-21 00:36:45 +01:00
Bartosz Taudul
ecc9369da2 Return zone extra during allocation. 2020-02-20 23:39:40 +01:00
Bartosz Taudul
4bf0af321f Wrapper for allocation and retrieval of zone extra. 2020-02-20 23:37:55 +01:00
Bartosz Taudul
217c16de20 Display exact start and end time for CPU data. 2020-02-20 23:21:54 +01:00
Bartosz Taudul
100afe304e Fix typo. 2020-02-20 23:11:13 +01:00
Bartosz Taudul
c5dbd749e7 Combine ContextSwitchCpu writes. 2020-02-20 02:09:09 +01:00
Bartosz Taudul
54573fb970 Combine ContextSwitchData writes. 2020-02-20 02:05:23 +01:00
Bartosz Taudul
d96fcc95c2 Update list of tracy callstack frames. 2020-02-20 00:31:38 +01:00
Bartosz Taudul
3bd5da9896 Update manual. 2020-02-19 23:51:01 +01:00
Bartosz Taudul
a7d5f38d51 Update NEWS. 2020-02-19 23:33:39 +01:00
Bartosz Taudul
b138db9c4d Display mode in find zone menu. 2020-02-19 23:33:39 +01:00
Bartosz Taudul
b26f642c6e Change "average" to "mean". 2020-02-19 23:29:18 +01:00
Bartosz Taudul
d4f99e4459 Perform cheaper check first. 2020-02-19 22:43:37 +01:00
Bartosz Taudul
9b1fe23b03 Update manual. 2020-02-15 13:50:05 +01:00
Bartosz Taudul
0b82902618 Optimize scalar DXT1 compression. 2020-02-15 13:43:40 +01:00
Bartosz Taudul
b4d8cdd714 Release 0.6.3. 2020-02-13 19:26:37 +01:00
Bartosz Taudul
25f6c5f884 Disable charconv on gcc/clang.
Because there's no way to check whether libstdc++ and libc++ finally
decided to implement the standard fully.
2020-02-13 19:22:08 +01:00
Bartosz Taudul
3140dbb34a Size is never zero at start. 2020-02-13 18:41:32 +01:00
Bartosz Taudul
c870b10387 Offset in just-read block is always zero. 2020-02-13 18:39:46 +01:00
Bartosz Taudul
26584b00c3 Issue just one read call per zone.
Separate reads are issued only for first and last of children.
2020-02-13 18:11:54 +01:00
Bartosz Taudul
ebf2c3ad5b No need to check if file has ended. 2020-02-13 15:41:20 +01:00
Bartosz Taudul
4d892cbae9 Preload next block size. 2020-02-13 15:37:37 +01:00
Bartosz Taudul
232379c72c Optimize reading CPU data. 2020-02-13 01:14:12 +01:00
Bartosz Taudul
c03b8b72da Optimize reading context switches. 2020-02-13 01:12:01 +01:00
Bartosz Taudul
3bb0f33dcc Optimize reading plot data. 2020-02-13 01:04:40 +01:00
Bartosz Taudul
c0a2e9b3f7 Ditto during capture. 2020-02-13 00:54:54 +01:00
Bartosz Taudul
cc0f1f514c Store memory event time and thread data together. 2020-02-13 00:52:29 +01:00
Bartosz Taudul
f9b19631c0 Read memory data in one go. 2020-02-13 00:37:54 +01:00
Bartosz Taudul
51e04673c6 8-element reader wrapper. 2020-02-13 00:35:20 +01:00
Bartosz Taudul
8e825d91e0 Keep refTime in a register. 2020-02-12 20:59:36 +01:00
Bartosz Taudul
39d24d0d4a Set start and srcloc in one go. 2020-02-12 20:46:56 +01:00
Bartosz Taudul
5c6bfcbeee Add combined start + srcloc setter for ZoneEvent. 2020-02-12 20:46:56 +01:00
Bartosz Taudul
354115ef9b Reduce granularity of zone reading progress updates. 2020-02-12 20:03:14 +01:00
Bartosz Taudul
1492536bdb Handle FileReadError. 2020-02-12 19:53:37 +01:00
Bartosz Taudul
eb79507501 Read files using mmap. 2020-02-12 19:43:05 +01:00
Bartosz Taudul
3b345aab37 FileRead::IsEOF() is no longer used. 2020-02-12 19:27:15 +01:00
Bartosz Taudul
cc805b7b74 Add mmap() wrapper. 2020-02-12 19:24:30 +01:00
Bartosz Taudul
fa1747bdb2 Faster total zone count calculation during loading. 2020-02-12 19:15:46 +01:00
Bartosz Taudul
e88df069bd Load zone child number along with zone data. 2020-02-12 02:14:21 +01:00
Bartosz Taudul
925909aa3a Wrapper for reading 6 elements at once. 2020-02-12 02:13:50 +01:00
Bartosz Taudul
1655bf284f More Int24/Int48 optimizations. 2020-02-12 02:00:26 +01:00
Bartosz Taudul
838c0aaaa9 Check if BUS_MCEERR_AR and BUS_MCEERR_AO are defined. 2020-02-12 01:27:03 +01:00
Bartosz Taudul
f562ff780c Don't care about atomic increments of counters. 2020-02-12 00:53:03 +01:00
Bartosz Taudul
88f3e554da Read all CPU zone variables at once. 2020-02-12 00:36:59 +01:00
Bartosz Taudul
5227bc3549 Read all GPU zone variables at once. 2020-02-12 00:34:09 +01:00
Bartosz Taudul
6cab3dcd3a Helper for reading 5 variables. 2020-02-12 00:32:00 +01:00
Bartosz Taudul
00ab76fa19 Discard scratch buffer tricks for better performance. 2020-02-12 00:29:45 +01:00
Bartosz Taudul
f3bb4030f6 Fix ContextSwitchData natvis. 2020-02-11 23:45:04 +01:00
Bartosz Taudul
e0ec98d766 Keep ztime in register. 2020-02-11 23:20:38 +01:00
Bartosz Taudul
2c8d519d70 Fix typo. 2020-02-11 15:12:06 +01:00
Bartosz Taudul
0d2e7f72a7 Move GetThreadHandleImpl() from header to source file.
This removes dependency on unistd.h header (required for syscall() on
linux), which includes a definition of getopt(), which may conflict with
a custom getopt implementation (for example, one that does work on
windows).
2020-02-11 14:40:49 +01:00
Bartosz Taudul
bfdebc5775 Add emphasis on build instructions being in the manual. 2020-02-11 14:15:48 +01:00
Bartosz Taudul
86644ecda0 Store intermediate results in registers, not in memory. 2020-02-11 02:35:50 +01:00
Bartosz Taudul
caace1ce11 Directly access memory, omitting shift.
As always, clang generated the right code here anyways...
2020-02-11 02:24:40 +01:00
Bartosz Taudul
5a995d804b Draw zig-zag as a single stroke. 2020-02-10 23:34:33 +01:00
Bartosz Taudul
06cd57fe6e Delay rounding h/2. 2020-02-10 23:24:58 +01:00
Bartosz Taudul
d80e7cc025 Remove unnecessary round() calls. 2020-02-10 23:20:35 +01:00
Bartosz Taudul
eff040dca6 Optimize drawing zig-zags. 2020-02-10 22:53:59 +01:00
Bartosz Taudul
ad3aa73085 Change order of background tasks.
Now tasks are performed in the following order:
- Context switch based CPU usage graph.
- Memory allocations plot.
- Zone statistics.

This prioritizes appearance of the most notable things first.
2020-02-10 22:31:10 +01:00
Bartosz Taudul
f149f353eb Update ImGui to 1.75. 2020-02-10 16:26:04 +01:00
Bartosz Taudul
abfa4c65df Update fun list of iDevices. 2020-02-10 16:13:32 +01:00
Bartosz Taudul
6f4a10be04 Optimize Int48 reconstruction. 2020-02-10 01:38:45 +01:00
Bartosz Taudul
76afef9117 Direct checks for context switch end validity. 2020-02-10 01:26:31 +01:00
Bartosz Taudul
ae0392a0e5 Don't needlessly convert doubles to integers. 2020-02-10 01:05:42 +01:00
Bartosz Taudul
6e7d5fecb4 Update NEWS. 2020-02-10 00:43:37 +01:00
Bartosz Taudul
d0c2a26ced Fix continuous frames tooltips.
At high zoom-out levels frame tooltip will be always displayed now for
continuous frame sets. This greatly helps with scrubbing to an
appropriate location, basing of frame images.
2020-02-10 00:41:18 +01:00
Bartosz Taudul
ffdd5290bf Clip wait regions display. 2020-02-10 00:12:34 +01:00
Bartosz Taudul
4fc675814a Update NEWS. 2020-02-09 21:33:12 +01:00
Bartosz Taudul
53e5eb749d Compress frame images using zstd.
Memory usage and trace load times:

!comp         587 MB,  439 ms  ->    541 MB,  523 ms    (92%, 119%)
android-vk    197 MB,  136 ms  ->    188 MB,  178 ms    (95%, 130%)
big2         4463 MB,  2.93 s  ->   4198 MB,  3.65 s    (94%, 124%)
fi            483 MB,  346 ms  ->    416 MB,  409 ms    (86%, 118%)
fi-big       3307 MB,  3.15 s  ->   2985 MB,  3.53 s    (90%, 112%)
large       19.74 GB, 10.05 s  ->  19.28 GB, 11.16 s    (97%, 110%)
2020-02-09 21:22:12 +01:00
Bartosz Taudul
99f2734d28 Remove stack dependency. 2020-02-09 17:22:41 +01:00
Bartosz Taudul
78c2f68649 Force inline LZ4 read/write functions. 2020-02-09 17:22:41 +01:00
Bartosz Taudul
28d749982a Fix typo. 2020-02-08 20:03:38 +01:00
Bartosz Taudul
e524f4a050 Update manual. 2020-02-08 19:24:16 +01:00
Bartosz Taudul
99613291fb Restore original 16 KB LZ4 hash table size. 2020-02-08 18:12:44 +01:00
Bartosz Taudul
c658f43bc3 Update NEWS. 2020-02-08 16:16:16 +01:00
Bartosz Taudul
ec5c7cf8a7 Reading zstd compressed traces. 2020-02-08 16:14:43 +01:00
Bartosz Taudul
817e052457 Expose zstd in update utility. 2020-02-08 16:14:43 +01:00
Bartosz Taudul
145514687c Add zstd compression to FileWrite. 2020-02-08 16:14:43 +01:00
Bartosz Taudul
64cfad317d Add zstd to project files. 2020-02-08 15:42:08 +01:00
Bartosz Taudul
f5e6029a51 Add Zstd revision 06a57cf5. 2020-02-08 14:56:59 +01:00
Bartosz Taudul
aaa163a140 Update manual. 2020-02-08 13:45:02 +01:00
Bartosz Taudul
642fd1e8da Display update utility elapsed time. 2020-02-08 13:35:36 +01:00
Bartosz Taudul
edae590b2c Update NEWS. 2020-02-08 13:19:21 +01:00
Bartosz Taudul
4c7f698946 Display compression ratio in update utility. 2020-02-08 13:18:42 +01:00
Bartosz Taudul
f58c96e4b3 Display trace file size in capture utility. 2020-02-08 13:10:41 +01:00
Bartosz Taudul
dd650e08ec Display saved trace size. 2020-02-08 13:07:02 +01:00
Bartosz Taudul
6f554d7f00 Gather file compression statistics. 2020-02-08 12:57:35 +01:00
Bartosz Taudul
068e752aa7 No charconv for MSVC < 16.4.
https://docs.microsoft.com/en-us/cpp/overview/visual-cpp-language-conformance?view=vs-2019#note_charconv
2020-02-06 00:37:27 +01:00
Bartosz Taudul
db408df395 Explain combined values in trace info window. 2020-02-05 23:49:10 +01:00
Bartosz Taudul
7425cd7112 Also display non-user plot data count. 2020-02-05 23:41:53 +01:00
Bartosz Taudul
1bad607e6c Replace "2x" with "2×". 2020-02-05 23:37:03 +01:00
Bartosz Taudul
8624bb95ba Fix double separator in options, if no GPU zones are available. 2020-02-05 23:32:59 +01:00
Bartosz Taudul
9556878b23 Display appropriate notification when there's no message data. 2020-02-05 23:30:52 +01:00
Bartosz Taudul
9fc9f71666 Properly initialize callstack frame tree. 2020-02-05 23:16:18 +01:00
Bartosz Taudul
7a6c96a989 Update NEWS. 2020-02-05 17:08:41 +01:00
Bartosz Taudul
0bb6162a01 Allow sorting memory data listings. 2020-02-05 17:08:41 +01:00
Bartosz Taudul
9aecd43f24 Display exact time in find zone range limit information. 2020-02-03 19:34:51 +01:00
Bartosz Taudul
8b0da6f508 Update manual. 2020-02-02 15:47:17 +01:00
Bartosz Taudul
1f68b739af Update NEWS. 2020-02-02 15:13:50 +01:00
Bartosz Taudul
eca27b9d12 Display exact time, where appropriate. 2020-02-02 15:13:20 +01:00
Bartosz Taudul
e9be4934c4 Display plot item time. 2020-02-02 15:13:20 +01:00
Bartosz Taudul
b1342a10b0 Exact time printing functionality. 2020-02-02 15:13:20 +01:00
Bartosz Taudul
b418c55e89 Verify tiny int printing input. 2020-02-02 14:31:59 +01:00
Bartosz Taudul
182efc0b04 Limit timeline view to <-5,5> days range. 2020-02-02 14:11:42 +01:00
Bartosz Taudul
8627a31912 Maximum time range is 1.6 days. 2020-02-02 14:11:42 +01:00
Bartosz Taudul
ea4b64909f Fix 32-bit short_ptr. 2020-02-02 02:36:28 +01:00
Bartosz Taudul
b55fa19f72 Fix division by zero. 2020-02-01 18:01:24 +01:00
Bartosz Taudul
555203c46a Don't display unknown frames if there's no callstack frames. 2020-02-01 17:49:27 +01:00
Bartosz Taudul
024182b6a0 Fit trace description entry field to window width. 2020-02-01 17:36:51 +01:00
Bartosz Taudul
d9eb04f68e Update NEWS. 2020-01-31 19:20:19 +01:00
Bartosz Taudul
d680fa6cd6 Clip memory allocation lists. 2020-01-31 19:18:20 +01:00
Bartosz Taudul
a84eec1aef Clip found zones list. 2020-01-31 19:07:20 +01:00
Bartosz Taudul
f86fbe7fd5 Allow no-nonsense access to internal short_ptr pointer. 2020-01-31 19:06:21 +01:00
Bartosz Taudul
5f91b48e0c Clip child zone list submissions. 2020-01-31 18:58:31 +01:00
Bartosz Taudul
4a6bc284af Optimize "string (percentage)" printing. 2020-01-31 18:26:28 +01:00
Bartosz Taudul
ec8c661b38 Remove unneeded sprintfs. 2020-01-31 18:26:28 +01:00
Bartosz Taudul
87989a5f09 Again with this non-conformance stupidity! 2020-01-31 18:09:25 +01:00
Bartosz Taudul
289637384d Perform less work if printing integers. 2020-01-31 02:32:12 +01:00
Bartosz Taudul
1475635382 Use std::to_chars() in RealToString(). 2020-01-31 02:02:47 +01:00
Bartosz Taudul
f017b27ae2 RealToString() is always called with separator set to true. 2020-01-31 01:43:24 +01:00
Bartosz Taudul
3050f19e0b Leverage std::to_chars. 2020-01-31 01:30:33 +01:00
Bartosz Taudul
36fb1f96e2 Custom float-printing function. 2020-01-31 01:19:08 +01:00
Bartosz Taudul
8d5f4d7363 Always use init once to initialize rpmalloc. 2020-01-30 20:08:34 +01:00
Bartosz Taudul
9a5104dacf Extract and pass a value, which will be changing. 2020-01-30 01:57:49 +01:00
Bartosz Taudul
c04824890d Add border to frame range highlight. 2020-01-29 02:17:13 +01:00
Bartosz Taudul
a218ca4412 Use correct format specifier. 2020-01-28 22:01:39 +01:00
Bartosz Taudul
7b0483dc16 Fix typo. 2020-01-28 22:00:07 +01:00
Bartosz Taudul
6946c1b205 Reorder initialization list. 2020-01-28 22:00:07 +01:00
Bartosz Taudul
6e0c238f71 Remove unneeded lambda capture. 2020-01-28 21:55:32 +01:00
Bartosz Taudul
e7ce7282a1 Update NEWS. 2020-01-28 21:53:56 +01:00
Bartosz Taudul
212b497f1f Update manual. 2020-01-28 21:52:54 +01:00
Bartosz Taudul
022528bb47 Use Martin Ankerl's robin hood unordered map.
ska::flat_hash_map has bugs and its development is dead.
2020-01-28 21:49:36 +01:00
Bartosz Taudul
ac9479aa3f Change OpenGL headers comment to compilation error. 2020-01-27 17:50:44 +01:00
Bartosz Taudul
c1ba647f5b Fix compilation errors. 2020-01-26 17:48:19 +01:00
Bartosz Taudul
61d71d8716 Update NEWS. 2020-01-26 17:34:36 +01:00
Bartosz Taudul
ff56f1d2f8 Update example application memory requirements. 2020-01-26 17:30:05 +01:00
Bartosz Taudul
1915a9e5f6 Display extra zone data count. 2020-01-26 17:28:37 +01:00
Bartosz Taudul
3e45e4abd9 Store zone children counts as uint32, not uint64.
This, along with the previous change has the following effect on trace
file sizes:

old/0.tracy (0.6.2) {6512 KB} -> new/0.tracy (0.6.3) {6518 KB}  100.10% size change
old/android.tracy (0.6.2) {488901 KB} -> new/android.tracy (0.6.3) {489710 KB}  100.17% size change
old/android-vk.tracy (0.6.2) {78049 KB} -> new/android-vk.tracy (0.6.3) {76570 KB}  98.10% size change
old/asset-new.tracy (0.6.2) {74224 KB} -> new/asset-new.tracy (0.6.3) {74181 KB}  99.94% size change
old/asset-new-id.tracy (0.6.2) {79900 KB} -> new/asset-new-id.tracy (0.6.3) {79875 KB}  99.97% size change
old/asset-old.tracy (0.6.2) {76245 KB} -> new/asset-old.tracy (0.6.3) {76420 KB}  100.23% size change
old/big.tracy (0.6.2) {922594 KB} -> new/big.tracy (0.6.3) {860068 KB}  93.22% size change
old/big2.tracy (0.6.2) {2028646 KB} -> new/big2.tracy (0.6.3) {1990121 KB}  98.10% size change
old/callstack.tracy (0.6.2) {14343 KB} -> new/callstack.tracy (0.6.3) {17707 KB}  123.45% size change
old/callstack-bsd.tracy (0.6.2) {14551 KB} -> new/callstack-bsd.tracy (0.6.3) {12652 KB}  86.94% size change
old/callstack-linux.tracy (0.6.2) {6953 KB} -> new/callstack-linux.tracy (0.6.3) {7012 KB}  100.86% size change
old/callstack-lua.tracy (0.6.2) {20439 KB} -> new/callstack-lua.tracy (0.6.3) {25889 KB}  126.66% size change
old/chicken.tracy (0.6.2) {311549 KB} -> new/chicken.tracy (0.6.3) {293828 KB}  94.31% size change
old/color.tracy (0.6.2) {865 KB} -> new/color.tracy (0.6.3) {866 KB}  100.13% size change
old/crash.tracy (0.6.2) {130 KB} -> new/crash.tracy (0.6.3) {130 KB}  99.85% size change
old/crash2.tracy (0.6.2) {1403 KB} -> new/crash2.tracy (0.6.3) {1327 KB}  94.56% size change
old/ctx.tracy (0.6.2) {3207 KB} -> new/ctx.tracy (0.6.3) {3203 KB}  99.89% size change
old/ctx-android.tracy (0.6.2) {88240 KB} -> new/ctx-android.tracy (0.6.3) {86209 KB}  97.70% size change
old/ctx-big.tracy (0.6.2) {88702 KB} -> new/ctx-big.tracy (0.6.3) {87038 KB}  98.12% size change
old/darkrl.tracy (0.6.2) {15458 KB} -> new/darkrl.tracy (0.6.3) {14560 KB}  94.19% size change
old/darkrl2.tracy (0.6.2) {7824 KB} -> new/darkrl2.tracy (0.6.3) {7435 KB}  95.02% size change
old/darkrl-light-big.tracy (0.6.2) {259652 KB} -> new/darkrl-light-big.tracy (0.6.3) {234625 KB}  90.36% size change
old/darkrl-old.tracy (0.6.2) {66299 KB} -> new/darkrl-old.tracy (0.6.3) {61883 KB}  93.34% size change
old/dxtc-bad.tracy (0.6.2) {7078 KB} -> new/dxtc-bad.tracy (0.6.3) {7048 KB}  99.57% size change
old/frameimages.tracy (0.6.2) {206425 KB} -> new/frameimages.tracy (0.6.3) {203537 KB}  98.60% size change
old/frameimages-big.tracy (0.6.2) {1177638 KB} -> new/frameimages-big.tracy (0.6.3) {1150496 KB}  97.70% size change
old/gn-opengl.tracy (0.6.2) {28587 KB} -> new/gn-opengl.tracy (0.6.3) {27355 KB}  95.69% size change
old/gn-vulkan.tracy (0.6.2) {28553 KB} -> new/gn-vulkan.tracy (0.6.3) {27050 KB}  94.74% size change
old/long.tracy (0.6.2) {1152078 KB} -> new/long.tracy (0.6.3) {1124731 KB}  97.63% size change
old/mem.tracy (0.6.2) {1187810 KB} -> new/mem.tracy (0.6.3) {1187668 KB}  99.99% size change
old/messages-callstack.tracy (0.6.2) {8743 KB} -> new/messages-callstack.tracy (0.6.3) {8608 KB}  98.46% size change
old/multi.tracy (0.6.2) {7735 KB} -> new/multi.tracy (0.6.3) {7304 KB}  94.43% size change
old/new.tracy (0.6.2) {1101 KB} -> new/new.tracy (0.6.3) {1076 KB}  97.79% size change
old/q3bsp-mt.tracy (0.6.2) {912230 KB} -> new/q3bsp-mt.tracy (0.6.3) {849329 KB}  93.10% size change
old/q3bsp-st.tracy (0.6.2) {227162 KB} -> new/q3bsp-st.tracy (0.6.3) {221594 KB}  97.55% size change
old/raytracer.tracy (0.6.2) {1105411 KB} -> new/raytracer.tracy (0.6.3) {977307 KB}  88.41% size change
old/selfprofile.tracy (0.6.2) {196894 KB} -> new/selfprofile.tracy (0.6.3) {184351 KB}  93.63% size change
old/tbrowser.tracy (0.6.2) {8776 KB} -> new/tbrowser.tracy (0.6.3) {7997 KB}  91.13% size change
old/test.tracy (0.6.2) {40498 KB} -> new/test.tracy (0.6.3) {39751 KB}  98.15% size change
old/topology.tracy (0.6.2) {3733 KB} -> new/topology.tracy (0.6.3) {3739 KB}  100.16% size change
old/topology-android.tracy (0.6.2) {5292 KB} -> new/topology-android.tracy (0.6.3) {5177 KB}  97.82% size change
old/tracy-dynamic.tracy (0.6.2) {672684 KB} -> new/tracy-dynamic.tracy (0.6.3) {608221 KB}  90.42% size change
old/tracy-static.tracy (0.6.2) {2310589 KB} -> new/tracy-static.tracy (0.6.3) {2136791 KB}  92.48% size change
old/virtualfile_hc.tracy (0.6.2) {72169 KB} -> new/virtualfile_hc.tracy (0.6.3) {72142 KB}  99.96% size change
old/vk-mt.tracy (0.6.2) {10815 KB} -> new/vk-mt.tracy (0.6.3) {10714 KB}  99.07% size change
old/zfile_hc.tracy (0.6.2) {39065 KB} -> new/zfile_hc.tracy (0.6.3) {39063 KB}  100.00% size change
2020-01-26 17:16:46 +01:00
Bartosz Taudul
f2a226407f Store extra zone data separately.
Extra zone data consists of:
- custom zone name,
- zone text,
- zone callstack index.

If neither of these data values is stored in zone, 5 bytes are saved. If
any one of them is required, extra 4 bytes are added, for an index into
extra data array.

Memory savings:

android         2371 MB -> 2324 MB
big             7593 MB -> 6747 MB
chicken         1687 MB -> 1501 MB
drl-l-b         1119 MB -> 1013 MB
long            4289 MB -> 4190 MB
q3bsp-mt        4399 MB -> 3918 MB
q3bsp-st        1067 MB -> 1027 MB
raytracer       6057 MB -> 5342 MB
selfprofile     1177 MB -> 1079 MB
tracy-dynamic   4489 MB -> 4013 MB
tracy-static    16.2 GB -> 14.3 GB
2020-01-26 16:19:07 +01:00
Bartosz Taudul
3aa427fdd6 Remove queue delay adjustment due to zone text. 2020-01-26 15:09:12 +01:00
Bartosz Taudul
0e79b3ab27 Update NEWS. 2020-01-25 17:49:31 +01:00
Bartosz Taudul
35a2baf5ef GPU zones may also be inactive. 2020-01-25 17:44:46 +01:00
Bartosz Taudul
03656a4b0f Build release library. 2020-01-25 17:29:23 +01:00
Bartosz Taudul
885fa16373 Don't retrieve connection id, if zone is not active. 2020-01-25 17:21:30 +01:00
Bartosz Taudul
ced2d52043 Update CI config. 2020-01-25 17:18:59 +01:00
Bartosz Taudul
536e3e3da3 Add shared library project files. 2020-01-25 17:16:08 +01:00
Bartosz Taudul
3ffbb56fa0 Update manual. 2020-01-25 17:16:08 +01:00
Bartosz Taudul
aa94df0845 Replace rpmalloc_thread_initialize with InitRPMallocThread(). 2020-01-25 17:16:08 +01:00
Bartosz Taudul
ab2fbd6164 Move ParamaterSetup() implementation to header. 2020-01-25 16:51:17 +01:00
Bartosz Taudul
13370dc01c Hide RtlWalkFrameChain inside library. 2020-01-25 16:49:29 +01:00
Bartosz Taudul
ae5c446652 Update manual. 2020-01-25 16:36:58 +01:00
Bartosz Taudul
7c49f4e816 Remove TracyClientDLL.cpp. 2020-01-25 16:36:58 +01:00
Bartosz Taudul
a90004b983 Move Set/GetThreadName() to Tracy API. 2020-01-25 16:36:58 +01:00
Bartosz Taudul
2377911313 Package/core tooltip cosmetics. 2020-01-24 21:32:36 +01:00
Bartosz Taudul
c43bd2bfe2 Add dedicated function to check if zone has children. 2020-01-24 02:17:38 +01:00
Bartosz Taudul
ad3f37aec6 No need for two same check. 2020-01-24 02:16:29 +01:00
Bartosz Taudul
cf07e2564d Update manual. 2020-01-23 22:09:18 +01:00
Bartosz Taudul
443771ecff Only display zone "statistics" button if data is available. 2020-01-23 19:49:23 +01:00
Bartosz Taudul
f4fdd0221b Update NEWS. 2020-01-23 19:11:21 +01:00
Bartosz Taudul
1c1e5d5ee7 Don't check for invalid zones in source location data. 2020-01-23 19:10:56 +01:00
Bartosz Taudul
e31b529b4a Count zones at zone end, not zone begin.
This makes sure only finished, non-zero-timespan zones are counted in
the statistics.
2020-01-23 19:10:56 +01:00
Bartosz Taudul
23601904cd Replace one more end validity check. 2020-01-23 18:50:13 +01:00
Bartosz Taudul
54a767bf81 Use just sign bit to check end value validity. 2020-01-22 22:25:04 +01:00
Bartosz Taudul
9420234a98 Force inline flat_hash_map wrappers. 2020-01-22 22:06:35 +01:00
Bartosz Taudul
3010c7ce63 Don't use proxy for a pointer. 2020-01-22 01:54:25 +01:00
Bartosz Taudul
ea424a4c8d Use custom vector. 2020-01-20 23:58:36 +01:00
Bartosz Taudul
e5ae1ea2cc Only perform search, if necessary. 2020-01-20 23:53:04 +01:00
Bartosz Taudul
6de489a34f Use flat_hash_map for groups. 2020-01-20 23:43:10 +01:00
Bartosz Taudul
6358b4588d Preserve groups order. 2020-01-20 23:40:07 +01:00
Bartosz Taudul
6b3165d3cc Perform map lookup in one place. 2020-01-20 23:34:48 +01:00
Bartosz Taudul
38e7d12b0b Use parallel sort in find zone menu. 2020-01-20 23:21:43 +01:00
Bartosz Taudul
7d78923967 Move parallel sort header mumbo jumbo to a separate file. 2020-01-20 23:21:43 +01:00
Bartosz Taudul
6f31eb2a9d Disable MSVC idiocy. 2020-01-20 22:49:03 +01:00
Bartosz Taudul
46dc85c10c Fix parent identifier extension to 64 bits.
Source location identifiers are signed 16 bits. Extending this value to
64 bits without first casting it to unsigned 16 bit caused bit extension
of the sign bit, making the value clash with "unselected" group
identifier.
2020-01-19 15:25:45 +01:00
Bartosz Taudul
55d03cb03e Hide async queue setup/commit behind macros. 2020-01-19 15:06:11 +01:00
Bartosz Taudul
77a6039e54 Force inline small functions in LZ4. 2020-01-17 01:08:06 +01:00
Bartosz Taudul
cb501298b5 Make QueueDataSize constexpr. 2020-01-14 19:42:11 +01:00
Bartosz Taudul
25082b2bec Don't report CPU topology if delayed init is active.
Reporting topology requires producer to be available, which creates a
deadlock during delayed init data structures construction.

Calling GetProducer() results in a call to GetProfilerThreadData(),
which in turn calls GetProfilerData() to construct its thread local
variable. However, at this point we already are calling
GetProfilerData() (to construct the profiler itself). This would result
in an incorrect double construction of data, but the code already
prevents this by allowing init code to be entered only once. Hence the
deadlock.

Currently this is a non-issue, as no platform which can report CPU
topology needs to use delayed init.
2020-01-14 19:41:34 +01:00
Bartosz Taudul
d272e19bfa Update NEWS. 2020-01-14 02:07:04 +01:00
Bartosz Taudul
91f86ce5b5 Tid to pid mapping may be already known. 2020-01-14 02:06:36 +01:00
Bartosz Taudul
4f8eb53e8b Capture exact tid to pid mapping on windows. 2020-01-14 02:06:22 +01:00
Bartosz Taudul
3ecad1095a Update NEWS. 2020-01-10 02:13:44 +01:00
Bartosz Taudul
727ac634a9 Redraw window contents during resize. 2020-01-10 02:13:44 +01:00
Bartosz Taudul
5f48b08215 Move profiler render to a separate function. 2020-01-10 02:13:44 +01:00
Bartosz Taudul
2f718b57c1 Add TODO. 2020-01-08 18:16:17 +01:00
Bartosz Taudul
fa3811ef61 Bump date. 2020-01-02 00:33:01 +01:00
Bartosz Taudul
3a460d3183 Use _mm_pause() instead of std::this_thread::yield() if possible. 2019-12-31 14:59:54 +01:00
Bartosz Taudul
8b56386ccd Keep atomics on separate cache lines. 2019-12-31 14:46:01 +01:00
Bartosz Taudul
ede90710e5 Release 0.6.2. 2019-12-30 13:23:10 +01:00
Bartosz Taudul
f3d2ca29dd Update manual. 2019-12-30 12:57:44 +01:00
Bartosz Taudul
ed5f534bbb Use proper data type size. 2019-12-28 18:28:37 +01:00
Bartosz Taudul
1d97efa059 Update NEWS. 2019-12-28 18:17:34 +01:00
Bartosz Taudul
24727175d7 Implement copying user data location to clipboard. 2019-12-28 18:17:06 +01:00
Bartosz Taudul
9e76ddb9fd User data directory location getter. 2019-12-28 18:16:45 +01:00
Bartosz Taudul
36942a5ddb Resetting groups should also unselect group. 2019-12-28 18:03:30 +01:00
Bartosz Taudul
0ff6d6364b Don't operate on empty vector. 2019-12-28 18:03:13 +01:00
Bartosz Taudul
bb121b29e3 Update NEWS. 2019-12-28 17:51:50 +01:00
Bartosz Taudul
17afcbce4f Highlight zone from zone list on histogram. 2019-12-28 17:50:54 +01:00
Bartosz Taudul
086c9ce4c8 Update manual. 2019-12-28 17:43:20 +01:00
Bartosz Taudul
cdede08ad6 Update NEWS. 2019-12-28 17:43:13 +01:00
Bartosz Taudul
46ed2fb802 Enhance time distribution zone search.
When the find zone menu is opened from the zone time distribution list,
time range limit will be enabled, preventing display of zones outside of
the selected zone range. This allows much easier finding of child zones
within the selected zone scope.

Limiting zone search to a specified thread(s) might also be helpful
here, but it can be worked around by looking only at zones in a single
thread group.
2019-12-28 17:28:39 +01:00
Bartosz Taudul
e083b2773a Add icon to zone time range display. 2019-12-28 17:24:11 +01:00
Bartosz Taudul
870c9c734f Draw range limit highlight. 2019-12-28 17:23:09 +01:00
Bartosz Taudul
46f3390365 Allow limiting zone time range in find zone menu. 2019-12-28 17:18:07 +01:00
Bartosz Taudul
33e7d175d4 Add zone time range limit controls. 2019-12-28 15:57:07 +01:00
Bartosz Taudul
10f4dbef72 Update NEWS. 2019-12-28 15:49:17 +01:00
Bartosz Taudul
6c6b78bcbd Fix find zone selection time. 2019-12-28 15:48:45 +01:00
Alexander Bock
d1917e1856 Merged in abock/tracy/Alexander-Bock/removed-unused-parameter-for-the-case-wh-1577543619722 (pull request #43)
Removed unused parameter for the case when the TracyOpenGL.hpp is included by TRACY_ENABLE is not defined
2019-12-28 14:36:03 +00:00
Alexander Bock
ce8f10f2b1 Removed unused parameter for the case when the TracyOpenGL.hpp is included by TRACY_ENABLE is not defined 2019-12-28 14:33:47 +00:00
Bartosz Taudul
2b46e63c49 Update NEWS. 2019-12-25 15:08:30 +01:00
Bartosz Taudul
115c66324d Add inclusive time display to time distribution. 2019-12-25 15:07:52 +01:00
Bartosz Taudul
1032eb95d2 Set sane default window size for CPU data. 2019-12-19 17:45:55 +01:00
Bartosz Taudul
ee83c89fae Notify about cleanup stage during import. 2019-12-19 17:31:13 +01:00
Bartosz Taudul
f7f0ec0cec Fix memcpy from nullptr. 2019-12-19 17:30:37 +01:00
Bartosz Taudul
ef9bcb6696 Don't send query if no connection to client.
Fixes chrome import.
2019-12-19 17:23:46 +01:00
Bartosz Taudul
65d146ddca Allow checking if socket is valid. 2019-12-19 17:23:40 +01:00
Bartosz Taudul
02e3f45ed2 Update manual. 2019-12-18 13:39:12 +01:00
Bartosz Taudul
aba221dbb4 Update NEWS. 2019-12-18 13:33:27 +01:00
Bartosz Taudul
e8db86d092 Implement ZoneText() string merging. 2019-12-18 13:33:01 +01:00
Bartosz Taudul
30dbf48fda Update manual. 2019-12-16 21:55:14 +01:00
Dmitry Ivanov
a4d12c26c7 Make chrome:tracing importer support both array and object forms of input. 2019-12-16 21:52:23 +01:00
Bartosz Taudul
14e096d052 Set (some of) the required variables. 2019-12-16 20:42:24 +01:00
Bartosz Taudul
c5ce93136f Update manual. 2019-12-16 19:24:02 +01:00
Bartosz Taudul
0137404449 Invalid source location should be 0, not UINT16_MAX (?). 2019-12-16 19:11:23 +01:00
Bartosz Taudul
5f37faa6ad Update NEWS. 2019-12-16 18:56:11 +01:00
Bartosz Taudul
ab57bfbe73 Chrome:tracing importer implementation. 2019-12-16 18:55:21 +01:00
Bartosz Taudul
ced47f227c Add "import data" constructor to Worker. 2019-12-16 18:55:02 +01:00
Bartosz Taudul
95af214abe Prevent 0 ns timeline view range. 2019-12-16 18:40:38 +01:00
Bartosz Taudul
ccd20af7a1 Add nlohmann-json 3.7.3. 2019-12-16 00:11:00 +01:00
Bartosz Taudul
333f8a0da0 Copy update utility to import-chrome. 2019-12-15 23:28:19 +01:00
Bartosz Taudul
4eaa3d90dd Thread ids are no longer pthread_t. 2019-12-13 15:57:08 +01:00
Bartosz Taudul
4ef2ce4622 Fix _mm256_cvtsi256_si32 on gcc. 2019-12-12 02:13:12 +01:00
Bartosz Taudul
d4f4b73d56 Move socket buffer size from header to source file. 2019-12-08 23:14:48 +01:00
Bartosz Taudul
bb69b5fae2 Update manual. 2019-12-08 16:17:34 +01:00
Bartosz Taudul
9236f0eb8c Update manual. 2019-12-06 01:04:12 +01:00
Bartosz Taudul
129b80ef0f Free source location, if zone is not active. 2019-12-06 00:42:42 +01:00
Bartosz Taudul
0ba95a5b29 Update NEWS. 2019-12-06 00:29:02 +01:00
Bartosz Taudul
b9cdf2cbb7 Expose srcloc allocation in C API. 2019-12-06 00:25:52 +01:00
Bartosz Taudul
399b87fecc Add allocated srcloc zone begin emit functions to C API. 2019-12-06 00:22:49 +01:00
Bartosz Taudul
68ff33d0ba Extract source location allocation functionality. 2019-12-06 00:15:46 +01:00
Bartosz Taudul
e8fcc250a1 Report CPU topology on Linux. 2019-11-30 01:51:29 +01:00
Bartosz Taudul
208d155b96 Update manual. 2019-11-30 01:25:43 +01:00
Bartosz Taudul
cb21fe9eb6 Update NEWS. 2019-11-29 23:10:16 +01:00
Bartosz Taudul
0507460a16 Add markers for package and core thread migrations. 2019-11-29 23:09:30 +01:00
Bartosz Taudul
9cf7629e9b Display CPU package, core tooltip for zone CPU list. 2019-11-29 23:09:11 +01:00
Bartosz Taudul
7ac47ab6bb Display package, core info on CPU data timeline. 2019-11-29 22:57:18 +01:00
Bartosz Taudul
db3e802643 Build reverse CPU topology map. 2019-11-29 22:46:57 +01:00
Bartosz Taudul
712403e9fd Transfer, display, save CPU topology data. 2019-11-29 22:41:41 +01:00
Bartosz Taudul
5148ecdc50 Update NEWS. 2019-11-29 18:30:41 +01:00
Bartosz Taudul
1f8077fd21 Update AUTHORS. 2019-11-29 18:30:12 +01:00
Bartosz Taudul
59371eef5a Obtain CPU topology on windows. 2019-11-29 18:29:31 +01:00
Michał Cichoń
1a0cb07319 Merged in thedmd86/tracy/feature/libbacktrace-mach-o-support (pull request #42)
libbacktrace: Add support for Mach-O (dSYM)
2019-11-29 11:10:31 +00:00
thedmd
a1e2c533f6 libbacktrace: Add support for Mach-O (dSYM)
`macho.cpp` was backported from official libbacktrace repository.
2019-11-29 12:04:47 +01:00
Bartosz Taudul
5b83ae4381 Release 0.6.1. 2019-11-28 23:24:15 +01:00
Bartosz Taudul
2aad20e2f7 Update NEWS. 2019-11-27 01:43:49 +01:00
Bartosz Taudul
ade01e98ad Add time-relative switch to message list in zone info window. 2019-11-27 01:42:44 +01:00
Bartosz Taudul
3e47fd6bcd Display thread color boxes in messages list. 2019-11-27 01:38:47 +01:00
Bartosz Taudul
a7d2d5f08b Fix DeferItem() call. 2019-11-26 01:10:50 +01:00
Bartosz Taudul
28a9800eb7 Update manual. 2019-11-26 00:57:25 +01:00
Bartosz Taudul
cc87cebee3 Display trace parameters only when the connection is active. 2019-11-26 00:57:06 +01:00
Bartosz Taudul
7963617f86 Update NEWS. 2019-11-26 00:00:35 +01:00
Bartosz Taudul
4551553eb4 Implement setting client parameters from server. 2019-11-25 23:59:48 +01:00
Bartosz Taudul
c5c9dfb0c9 Native callstacks are now optional in allocated callstack messages. 2019-11-25 22:54:10 +01:00
Bartosz Taudul
c6b8c6a3a6 Update ImGui to 1.74. 2019-11-25 22:36:54 +01:00
Bartosz Taudul
f500f6e837 Update NEWS. 2019-11-24 01:42:40 +01:00
Bartosz Taudul
ace780ea74 Add CPU data thread highlight. 2019-11-24 01:40:32 +01:00
Bartosz Taudul
07e69a88c2 Fix CPU thread highlight for Vulkan zones. 2019-11-24 01:28:23 +01:00
Bartosz Taudul
2379d422cf GPU zones highlight whole CPU thread timeline. 2019-11-24 01:21:49 +01:00
Bartosz Taudul
7f689dadbe Use backspace icon for buttons erasing text entry. 2019-11-24 01:06:04 +01:00
Bartosz Taudul
1cc5dea616 Ignore BSD tracy-specific callstack frames. 2019-11-21 21:51:58 +01:00
Bartosz Taudul
7a9dcbb572 Update NEWS. 2019-11-21 21:49:03 +01:00
Bartosz Taudul
fb6a92380d Drop support for pre-v0.5 traces. 2019-11-21 21:48:35 +01:00
Bartosz Taudul
b749e816e9 Update NEWS. 2019-11-21 21:35:48 +01:00
Bartosz Taudul
0a6fd9ef5f Link with execinfo on FreeBSD. 2019-11-21 20:41:57 +01:00
Bartosz Taudul
3e4913dc8a Reuse socket address also on BSD. 2019-11-21 20:41:57 +01:00
Bartosz Taudul
37eef59d54 Implement reading sys time on BSD. 2019-11-21 20:41:57 +01:00
Bartosz Taudul
c7a22cc1ff Use libbacktrace on BSD. 2019-11-21 20:41:57 +01:00
Bartosz Taudul
bf778e2989 Add support for BSD in libbacktrace.
https://gcc.gnu.org/ml/gcc-patches/2018-03/msg00068.html
2019-11-21 20:41:57 +01:00
Bartosz Taudul
d1fb639b78 BSD needs libexecinfo for callstack capture. 2019-11-21 02:37:11 +01:00
Bartosz Taudul
bd7b0a8197 Support callstack capture on BSD. 2019-11-21 02:34:42 +01:00
Bartosz Taudul
3c4a7463b8 Retrieve proper thread ids on BSD. 2019-11-21 02:29:17 +01:00
Bartosz Taudul
c79449a6a1 Get proper program name on BSD. 2019-11-21 02:16:12 +01:00
Bartosz Taudul
7940977dba Report physical memory size on BSD. 2019-11-21 02:14:08 +01:00
Bartosz Taudul
3282360382 Bind on both IPv6 and IPv4 on BSD. 2019-11-21 02:03:32 +01:00
Bartosz Taudul
a9cd5b331f BSD needs netinet/in.h for struct sockaddr_in and friends. 2019-11-21 01:28:48 +01:00
Bartosz Taudul
31b6ff4bae Release 0.6.0. 2019-11-17 19:56:42 +01:00
Bartosz Taudul
3854ae11b2 Revert "Remove dead code."
This reverts commit a36b73f745.
2019-11-17 17:38:02 +01:00
Bartosz Taudul
c62732804a Round CPU data font height. 2019-11-17 14:30:10 +01:00
Bartosz Taudul
1e29d12819 More saturation in dynamic colors. 2019-11-16 23:03:46 +01:00
Bartosz Taudul
8ca67e49e4 Scale frame images in tooltips according to DPI scaling. 2019-11-16 22:58:51 +01:00
Bartosz Taudul
2d22372de3 Scale playback contents according to DPI scale. 2019-11-16 22:54:52 +01:00
Bartosz Taudul
37a658d933 Add srcloc color box to zone tooltips. 2019-11-16 22:38:43 +01:00
Bartosz Taudul
a36b73f745 Remove dead code. 2019-11-16 18:34:05 +01:00
Bartosz Taudul
134f108a30 Update NEWS. 2019-11-16 17:58:04 +01:00
Bartosz Taudul
f670d82796 Show migrations when thread is hovered in CPU data window. 2019-11-16 16:51:56 +01:00
Bartosz Taudul
2318bc8553 Pre-v0.5 support will be dropped after v0.6. 2019-11-16 16:38:18 +01:00
Bartosz Taudul
41f9dc0aa1 Cosmetics. 2019-11-16 16:37:08 +01:00
Bartosz Taudul
d9f71643ac Lock event time is known, don't reconstruct it. 2019-11-15 22:50:08 +01:00
Bartosz Taudul
a46731996d Thread list size is known from iteration. 2019-11-15 22:44:44 +01:00
Bartosz Taudul
db930f7f93 Reserve space for thread map, list. 2019-11-15 22:44:36 +01:00
Bartosz Taudul
18fd928a9d Don't display callstack message column if there are no callstacks. 2019-11-15 20:34:19 +01:00
Bartosz Taudul
31e1558467 Use standard includes. 2019-11-15 20:17:55 +01:00
Bartosz Taudul
d7d6a0fa9d More consistent srcloc/thread colors in zone info windows. 2019-11-15 20:13:13 +01:00
Bartosz Taudul
a15e83e590 Use default zone coloring for Lua zones. 2019-11-15 20:07:01 +01:00
Bartosz Taudul
49e3bc8b21 Don't draw unneeded separator. 2019-11-15 20:04:59 +01:00
Bartosz Taudul
5f0cab6b63 Display call stack calls in memory allocation window. 2019-11-15 20:02:21 +01:00
Bartosz Taudul
973fd941d5 Extract call stack calls drawing functionality. 2019-11-15 19:59:13 +01:00
Bartosz Taudul
a518564006 No extended font in no-extended-font path. 2019-11-15 19:58:50 +01:00
Bartosz Taudul
8fd019b474 Update manual. 2019-11-15 01:37:19 +01:00
Bartosz Taudul
26ab2f633f Update NEWS. 2019-11-15 01:23:01 +01:00
Bartosz Taudul
12037b88ff Display messages callstack in messages list. 2019-11-15 01:22:26 +01:00
Bartosz Taudul
49945c7198 Process message callstacks. 2019-11-15 01:22:26 +01:00
Bartosz Taudul
60ae748635 Add C API no-callstack redirect macros. 2019-11-14 23:53:35 +01:00
Bartosz Taudul
95e3fb1663 Add missing C API empty macros. 2019-11-14 23:51:01 +01:00
Bartosz Taudul
9f53150a6a Update macros. 2019-11-14 23:50:52 +01:00
Bartosz Taudul
8286b0b72f Plumbing for message call stacks. 2019-11-14 23:40:41 +01:00
Bartosz Taudul
0befc75f83 Fix conflicts with X.h. 2019-11-14 18:24:29 +01:00
Bartosz Taudul
9cf46e6ae6 Fix lock time announce/terminate in older traces. 2019-11-13 02:04:35 +01:00
Bartosz Taudul
ce997cf6b0 Update manual. 2019-11-11 22:11:26 +01:00
Bartosz Taudul
f7ff0781b6 Properly set background done state in no-statistics builds. 2019-11-11 00:20:33 +01:00
Bartosz Taudul
b946c1d39e Only enable magic fitted vectors in no-statistics builds.
Source location zones pointer fixup is just too slow to be feasible.

Note: no-statistics builds of the graphical profiler don't perform fixup
of view-related pointers (e.g. zone info window zone pointer). This
won't cause crashes, because the pointers are still valid, but the
displayed data will be incorrect and potentially changing in time, as
the pointer can be reused for completely other zone.

Memory usage of ToyPathTracer data, in various scenarios:

Capture + statistics:   7121 MB
Load + statistics:      6057 MB
Capture - statistics:   4876 MB
Load - statistics:      4521 MB
2019-11-11 00:20:33 +01:00
Bartosz Taudul
e1e3bbbe3e Fixup source location zones pointers. 2019-11-11 00:20:33 +01:00
Bartosz Taudul
ae33aa4869 Fitted zone vectors are now magic vectors.
The pointed-to zones in the original children vector can't be freed, so
they are put into a zone pool for re-use by future zones.
2019-11-11 00:20:33 +01:00
Bartosz Taudul
4f962d2fcc Add ZoneEvent re-use pool. 2019-11-11 00:20:33 +01:00
Bartosz Taudul
85ae52b725 Update manual. 2019-11-10 23:30:49 +01:00
Bartosz Taudul
fd7ad586af Make display of zone time in frames toggleable.
And disable it by default, as it can be heavy on resources.
2019-11-10 23:27:37 +01:00
Bartosz Taudul
fa53c2e683 Don't care about memory usage tracking data races. 2019-11-10 19:21:24 +01:00
Bartosz Taudul
9504d6c68f Don't try to delete empty Vectors. 2019-11-10 17:54:50 +01:00
Bartosz Taudul
f2801491bf Don't copy back pointer. 2019-11-10 17:48:54 +01:00
Bartosz Taudul
44f1d3dc1c Use proper memory ordering. 2019-11-10 17:30:38 +01:00
Bartosz Taudul
d4a1168491 Messages are inserted for current thread context. 2019-11-10 17:23:04 +01:00
Bartosz Taudul
003bed573c Use ThreadData cache in zone validation. 2019-11-10 17:20:55 +01:00
Bartosz Taudul
b1c88cd1f2 Cache ThreadData pointer for current thread context. 2019-11-10 17:17:07 +01:00
Bartosz Taudul
ded49edf4c Fix magic vectors in single-threaded Vulkan tooltip. 2019-11-10 16:50:19 +01:00
Bartosz Taudul
672093cf0e Adapt WriteTimeline() to magic vectors. 2019-11-10 16:34:38 +01:00
Bartosz Taudul
4eb8acc973 Magic vectors in automatic GPU drift detection. 2019-11-10 02:27:46 +01:00
Bartosz Taudul
1b6c79fa7b More magic vector fixes. 2019-11-10 02:10:21 +01:00
Bartosz Taudul
226a7b7cfb Magic vectors in GPU children list. 2019-11-10 02:03:31 +01:00
Bartosz Taudul
c65d524725 Magic vectors in GPU zone info window. 2019-11-10 01:59:20 +01:00
Bartosz Taudul
d32e3cb867 Adapt GPU zone utility functions to magic vectors. 2019-11-10 01:56:28 +01:00
Bartosz Taudul
9b52152e77 Adapt GetZoneEnd() for magic vectors. 2019-11-10 01:43:28 +01:00
Bartosz Taudul
7c277234e7 Load GPU zones into magic vectors. 2019-11-10 01:36:13 +01:00
Bartosz Taudul
4ed4e1005c Magic vectors in GPU drawing setup. 2019-11-10 01:35:57 +01:00
Bartosz Taudul
675e6a8d1a Support magic vectors for GPU zones. 2019-11-10 01:30:10 +01:00
Bartosz Taudul
06ad948abc Adapt zone children to magic vectors. 2019-11-10 01:23:44 +01:00
Bartosz Taudul
50efa8f672 Adapt time distribution calculation to magic vectors. 2019-11-10 01:08:15 +01:00
Bartosz Taudul
0c1f3ac16d Adapt zone getters to magic vectors. 2019-11-10 00:57:44 +01:00
Bartosz Taudul
f8edd3a37b Zone statistics reconstructions has to use magic vectors. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
065ba4ce5a Load zones into magic vectors. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
8ab2cf09b7 Handle magic vectors during dispatch. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
60c2b53d47 Add magic field to Vector. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
7be19193d9 Use adapters during zone level iteration. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
85e7125fee Add Vector iterator adapters. 2019-11-10 00:00:40 +01:00
Bartosz Taudul
40e9c8807d Remove unused lambda capture. 2019-11-10 00:00:15 +01:00
Bartosz Taudul
3a317c81c6 Fix logic error. 2019-11-09 23:57:08 +01:00
Bartosz Taudul
b3698ebb0f Merge read calls. 2019-11-09 00:48:20 +01:00
Bartosz Taudul
3e65532eaa Add Read3(), Read4() helpers. 2019-11-09 00:27:49 +01:00
Bartosz Taudul
2131eed4e7 Support multiple types in Read2(). 2019-11-09 00:25:12 +01:00
Bartosz Taudul
e80a19234e Don't store and read compressed thread. 2019-11-09 00:23:09 +01:00
Bartosz Taudul
467d675262 Zone reads can be merged. 2019-11-09 00:08:26 +01:00
Bartosz Taudul
23c59a6fc9 Use query cache. 2019-11-08 23:59:20 +01:00
Bartosz Taudul
ec895372b7 Thread is not needed in ReadTimeline(). 2019-11-08 23:56:11 +01:00
Bartosz Taudul
6ec734c264 Split ReadTimelineUpdateStatistics(). 2019-11-08 23:53:43 +01:00
Bartosz Taudul
c20da5ea70 Move unimportant fields to back of FileRead class. 2019-11-08 23:31:17 +01:00
Bartosz Taudul
31e2bc1141 Free Vector's memory during move assignment. 2019-11-08 22:52:23 +01:00
Bartosz Taudul
a1488a74a1 Perform Vector's swap() as a bitwise move. 2019-11-08 22:50:22 +01:00
Bartosz Taudul
b6213cfbc5 Define Vector's max capacity in one place. 2019-11-08 22:48:44 +01:00
Bartosz Taudul
4b0654afe5 Update manual. 2019-11-07 23:59:12 +01:00
Bartosz Taudul
5df7444cbb Replace djb hash with xxh3. 2019-11-07 23:52:52 +01:00
Bartosz Taudul
17ee1aed5f Add xxhash.
https://github.com/Cyan4973/xxHash/tree/master
e2f4695899e831171ecd2e780078474712ea61d3
2019-11-07 23:52:12 +01:00
Bartosz Taudul
4a9138fc51 Reduce FrameEvent size by 4 bytes.
While it would be nice to store frame times on 48 bytes, it is not
currently possible, as older traces have full 64 bit frame time stamps,
which are only then offset to first frame start time.
2019-11-07 23:05:13 +01:00
Bartosz Taudul
77a449a8f0 Update manual. 2019-11-07 22:37:11 +01:00
Bartosz Taudul
675cbc51cc Store memory free indices as 32 bit.
More than 4 billion memory events seems unlikely.

Memory savings in "mem" trace: 5747 MB -> 5427 MB.
2019-11-07 22:36:51 +01:00
Bartosz Taudul
655864eb7c Enable crash handler on cygwin.
Crash is properly recorded, but the profiler hangs while waiting for
shutdown finish.
2019-11-07 19:20:13 +01:00
Bartosz Taudul
3fd74a92f9 Native threads are used on mingw. 2019-11-07 19:02:54 +01:00
Bartosz Taudul
0f6101b19a Fix mingw/cygwin thread name setter/getter. 2019-11-07 18:58:08 +01:00
Bartosz Taudul
bb2d44ae08 All time deltas must be processed. 2019-11-07 16:14:23 +01:00
Bartosz Taudul
351e220d30 Don't calculate queue delay if delayed init is used.
Queue calibration requires queue access during profiler construction. This
in turn requires construction of profiler data block, *which at this point
is underway*, because the profiler is being constructed.
2019-06-19 17:29:04 +02:00
Bartosz Taudul
c98f1f0b6b Make sure profiler is initialized only once in delayed init scenario. 2019-06-19 17:28:18 +02:00
Bartosz Taudul
9702461b09 Display elapsed time in capture utility. 2019-11-07 01:51:45 +01:00
Bartosz Taudul
ea2c329510 Input data *must not* be changed.
Not even for a short moment.
2019-11-07 01:29:11 +01:00
Bartosz Taudul
4a4fe82a1b No need to inject string terminator.
Comparison in m_data.stringMap already takes string size into account,
as an charutil::StringKey optimization.
2019-11-07 01:28:29 +01:00
Bartosz Taudul
dfad9695d2 Compress frame image data right as it arrives.
This removes the need to store temporary uncompressed image buffers,
which involves constant memory allocation and freeing. Instead, just one
permanent buffer is used, and only because the input data cannot change
during processing.
2019-11-06 23:29:59 +01:00
Bartosz Taudul
46d33f45bf Frame image packer doesn't care about width and height. 2019-11-06 22:53:01 +01:00
Bartosz Taudul
10a3516099 Delete uncompressed frame image data. 2019-11-06 22:38:19 +01:00
Bartosz Taudul
d741fb0af9 Plot can be empty if it was only configured. 2019-11-06 12:08:20 +01:00
Bartosz Taudul
d4f58ddaf3 Use native windows threads on cygwin, mingw. 2019-11-06 01:42:14 +01:00
Bartosz Taudul
df0e28a61f Remove more unneeded includes. 2019-11-06 01:37:58 +01:00
Bartosz Taudul
3abdd7cdaf Remove LZ4 include from TracyProtocol.hpp. 2019-11-06 01:30:20 +01:00
Bartosz Taudul
f53637891a Remove LZ4 include from TracyWorker.hpp. 2019-11-06 01:25:38 +01:00
Bartosz Taudul
5d3392428e Remove unneeded includes. 2019-11-06 01:21:22 +01:00
Bartosz Taudul
6015c964a9 Enable LZ4 fast decompression loop on MSVC. 2019-11-05 22:00:13 +01:00
Bartosz Taudul
ca198e44d3 Remove dead code from concurrentqueue. 2019-11-05 21:40:52 +01:00
Bartosz Taudul
b5590ed197 Include <mutex> for std::once. 2019-11-05 21:40:35 +01:00
Bartosz Taudul
3e9bb80217 More header cleanup. 2019-11-05 20:15:53 +01:00
Bartosz Taudul
6bbf273581 Partial header inclusion cleanup. 2019-11-05 20:09:40 +01:00
Bartosz Taudul
25c39a3311 Update manual. 2019-11-05 18:16:58 +01:00
Bartosz Taudul
c558a9a436 Update NEWS. 2019-11-05 18:10:32 +01:00
Bartosz Taudul
cfce429fca Format plot values according to requested formatting. 2019-11-05 18:08:42 +01:00
Bartosz Taudul
661c4a417b Process and store plot value formatting. 2019-11-05 18:02:08 +01:00
Bartosz Taudul
907574e637 Allow remote plot configuration. 2019-11-05 17:45:19 +01:00
Bartosz Taudul
a7a739eea9 Use precalculated context switch usage data. 2019-11-05 01:41:27 +01:00
Bartosz Taudul
51090e5fb9 Implement ctx switch usage reconstruction. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
8128b3894a Add vector debug macro.
Natvis is lacking in functionality, so this has to do.
2019-11-05 01:28:44 +01:00
Bartosz Taudul
946e328198 Fix 32-bit short_ptr. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
6a500ccdb3 Don't display CPU usage until data is ready. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
50b96c757e Context switch usage reconstruction skeleton. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
a62c4135ad Add context switch usage struct. 2019-11-05 01:28:44 +01:00
Bartosz Taudul
09d6f3f917 Check if CPU graph is not obscured. 2019-11-04 01:15:49 +01:00
Bartosz Taudul
9bc6a3e0ee Add zone color boxes to parent groups in find zone menu. 2019-11-03 22:52:24 +01:00
Bartosz Taudul
68bc82c11b Simplify README. 2019-11-03 22:45:30 +01:00
Bartosz Taudul
9034c9d9e6 Update profiler screenshot. 2019-11-03 22:39:14 +01:00
Bartosz Taudul
209c1fdc72 Small radio buttons in find zone menu. 2019-11-03 22:32:34 +01:00
Bartosz Taudul
f34609fd9b Set per-cpu kernel buffer size to 512 KB.
The default setting was causing events to be lost on Android.
2019-11-03 21:52:20 +01:00
Bartosz Taudul
b8d459d48b Use proper string size (for consistency).
On Android code path this value is ignored.
2019-11-03 21:51:49 +01:00
Bartosz Taudul
9b5ec8451f Remove dead assignment. 2019-11-03 16:57:31 +01:00
Bartosz Taudul
dfc35c1bf1 Fix crashes when callstack frames are not yet available. 2019-11-03 16:44:26 +01:00
Bartosz Taudul
5620597fb4 Use short ptr in VarArray. 2019-11-03 16:29:45 +01:00
Bartosz Taudul
390558b627 Update memory requirements. 2019-11-03 16:29:45 +01:00
Bartosz Taudul
1b33bfd522 Update manual. 2019-11-03 16:29:45 +01:00
Bartosz Taudul
d9c3238462 Save 2 bytes per PlotItem.
Memory savings:

android     2614 MB -> 2487 MB (95%)
chicken     1932 MB -> 1852 MB (95%)
mem         6067 MB -> 5747 MB (94%)
q3bsp-mt    5059 MB -> 5017 MB (99%)
q3bsp-st    1211 MB -> 1171 MB (96%)
2019-11-03 16:29:45 +01:00
Bartosz Taudul
29dcc5c8bc Don't zero-initialize Int48. 2019-11-03 14:33:13 +01:00
Bartosz Taudul
acce6867f1 Selecting a zone in time distribution list opens zone statistics. 2019-11-03 03:08:23 +01:00
Bartosz Taudul
13a7444f03 Add zone color boxes to time distribution table. 2019-11-02 23:14:49 +01:00
Bartosz Taudul
c294e62f5e Add zone color boxes to child zone list. 2019-11-02 23:11:37 +01:00
Bartosz Taudul
1a6f04f6ce Add zone color boxes to zone trace. 2019-11-02 23:05:11 +01:00
Bartosz Taudul
3a304ad054 Add zone color boxes to statistics menu. 2019-11-02 23:00:42 +01:00
Bartosz Taudul
04cb7732b8 Add zone color boxes to compare menu. 2019-11-02 22:58:50 +01:00
Bartosz Taudul
4dde1ca070 Add zone color boxes to find zone menu. 2019-11-02 22:48:00 +01:00
Bartosz Taudul
b7cd28ef72 Add source location color retriever. 2019-11-02 22:45:11 +01:00
Bartosz Taudul
8d7299fe1f Get 64-bit file size. 2019-11-02 22:11:40 +01:00
Bartosz Taudul
4bc1588a5e Clear proper vector. 2019-11-02 16:57:18 +01:00
Bartosz Taudul
ce82bb816b Use short ptr for find zone grouping data.
Overall, the short ptr changes have the following effect on memory
usage:

big         9007 MB -> 8670 MB (96%)
chicken     2007 MB -> 1932 MB (96%)
drl-l-b     1383 MB -> 1304 MB (94%)
q3bsp-mt    5252 MB -> 5059 MB (96%)
long        5152 MB -> 4799 MB (93%)
fi-big      4141 MB -> 4000 MB (96%)
2019-11-02 16:54:12 +01:00
Bartosz Taudul
0df29d1e0b Use short ptr for source location payload data. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
04c92f8d19 Use short ptr for callstack payload storage. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
b0e52f20f8 Use short ptr for FrameImage storage. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
72efbe28ed Use short ptr for message data. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
52062f96d0 Use short ptr for GPU context map. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
308c280e40 Use short ptr for GPU context query data. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
1e4022e05b Use proper comparison. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
03656b2320 Remove unused variable. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
a40bbacb17 Use short ptr for CPU zone data. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
cb20bf01f9 Use short ptr for GPU zone data. 2019-11-02 16:54:11 +01:00
Bartosz Taudul
c7664b0a98 Use short ptr in LockEventPtr. 2019-11-02 16:17:45 +01:00
Bartosz Taudul
181d16459c Use short ptr for Vector data. 2019-11-02 16:17:45 +01:00
Bartosz Taudul
ea23d2b91a Use short ptr for frame images. 2019-11-02 15:43:32 +01:00
Bartosz Taudul
2a28c6cc72 Use short ptr for callstack frame data. 2019-11-02 15:43:32 +01:00
Bartosz Taudul
654f54d877 Add short pointer class, storing 6 bytes. 2019-11-02 15:43:32 +01:00
Bartosz Taudul
45ff14d678 Fix saving source location payload data. 2019-11-02 14:28:59 +01:00
Bartosz Taudul
16bc862904 Save sizes of children vectors to prevent reallocation. 2019-11-02 12:38:32 +01:00
Bartosz Taudul
c99dc5c431 Disable SetGpuStart() assert for compat with old traces.
Currently the unknown GPU start is indicated by a -1 value, but it was
maximum int value previously. While the assert check is valid for newly
created traces, it will fire off if an older trace is loaded.

Temporarily disabling the check (effectively until only 0.6 traces are
supported) fixes the problem, as the max int value (0x7f...) has its
high bits removed and the low bytes will be sign extended during number
reconstruction, making it -1, as intended.
2019-11-02 02:41:51 +01:00
Bartosz Taudul
a9738deae7 Update manual. 2019-11-01 20:49:02 +01:00
Bartosz Taudul
b4103c56a5 Update NEWS. 2019-11-01 20:49:02 +01:00
Bartosz Taudul
0552d75400 Allow filtering entries in statistics menu. 2019-11-01 20:49:02 +01:00
Bartosz Taudul
5ff40b05b3 Update manual. 2019-11-01 20:29:02 +01:00
Bartosz Taudul
f88ec0c141 Convert namespaces combo box to radio buttons. 2019-11-01 20:23:22 +01:00
Bartosz Taudul
13b656fe61 Make srcloc dynamic color depend on function name. 2019-11-01 20:17:25 +01:00
Bartosz Taudul
ca0fae33d1 Remove obsolete assert.
Before-terminate-events now include events that have time delta
processing, with no memory to free.
2019-11-01 20:10:24 +01:00
Bartosz Taudul
2d46b50dd0 Update manual. 2019-11-01 02:13:02 +01:00
Bartosz Taudul
bb25c82110 Update NEWS. 2019-11-01 02:08:47 +01:00
Bartosz Taudul
d38257ea90 Add zone coloring mode based on source location. 2019-11-01 02:07:55 +01:00
Bartosz Taudul
39988ad636 Check for shutdown in background processing thread. 2019-10-31 21:41:21 +01:00
Bartosz Taudul
6a6009dbdf Update manual. 2019-10-31 15:00:35 +01:00
Bartosz Taudul
190dd456a7 Update NEWS. 2019-10-31 15:00:35 +01:00
Bartosz Taudul
978071f2ba Allow grouping zones by parent. 2019-10-31 15:00:22 +01:00
Bartosz Taudul
c0df3dd965 Implement getting zone parent when thread id is known. 2019-10-31 14:59:52 +01:00
Bartosz Taudul
456deefdbc Keep child idx on stack. 2019-10-30 23:55:21 +01:00
Bartosz Taudul
25b610a36f Pack child into GPU start/end in GpuEvent (saves 4 bytes).
long    5152 MB -> 5061 MB
2019-10-30 23:50:37 +01:00
Bartosz Taudul
7319293081 Use proper scale for next time of collapsed items. 2019-10-30 23:17:46 +01:00
Bartosz Taudul
e8286600d1 Use -1 as invalid GPU start time. 2019-10-30 23:12:43 +01:00
Bartosz Taudul
7ce8c772ad Disallow negative GPU times.
Shouldn't happen, but GPU timestamps are a shitshow, so better be safe
than sorry.
2019-10-30 22:37:07 +01:00
Bartosz Taudul
0ac432dd25 Better GPU time check. 2019-10-30 22:35:58 +01:00
Bartosz Taudul
ae4794ab4c Save 2 bytes in ContextSwitchData and ContextSwitchCpu. 2019-10-30 22:25:46 +01:00
Bartosz Taudul
99d198d0bf Pack csAlloc in MemEvent (saves 3 bytes).
Memory usage change on selected traces:

android     2699 MB -> 2613 MB
chicken     2019 MB -> 2007 MB
mem         6308 MB -> 6068 MB
q3bsp-mt    5283 MB -> 5252 MB
q3bsp-st    1241 MB -> 1211 MB
2019-10-30 22:01:13 +01:00
Bartosz Taudul
94da3b8467 Update manual. 2019-10-29 23:11:08 +01:00
Bartosz Taudul
d54ff0f9c2 Update NEWS. 2019-10-29 23:11:03 +01:00
Bartosz Taudul
1f0c18882c Don't collect sys time after application has exited. 2019-10-29 23:05:14 +01:00
Bartosz Taudul
079e21ea43 Leave two threads for smooth operation of profiler. 2019-10-29 22:53:03 +01:00
Bartosz Taudul
3e19fbc2fb Instrument functions. 2019-10-29 22:45:30 +01:00
Bartosz Taudul
516ec6883d Limit number of rendered frames. 2019-10-29 22:45:01 +01:00
Bartosz Taudul
5bcf288333 Integrate Tracy. 2019-10-29 22:27:04 +01:00
Bartosz Taudul
546eeda1cd Ignore compiled shaders. 2019-10-29 22:25:10 +01:00
Bartosz Taudul
0b1eff8b0d Add aras-p's ToyPathTracer.
https://github.com/aras-p/ToyPathTracer
b076563906169aa2f9e6d7218ef85decf81f8f72
2019-10-29 22:21:34 +01:00
Bartosz Taudul
789b95f259 Force inline small functions. 2019-10-29 01:32:09 +01:00
Bartosz Taudul
8c8f15c420 Force inline Slab::AllocInit(). 2019-10-29 01:19:40 +01:00
Bartosz Taudul
0ceba49d78 Update NEWS. 2019-10-28 23:44:54 +01:00
Bartosz Taudul
706e031046 Update manual. 2019-10-28 23:43:44 +01:00
Bartosz Taudul
6f0dc2885f Fix connection abort. 2019-10-28 23:32:51 +01:00
Bartosz Taudul
8050622b0f Read and decompress network data on a separate thread. 2019-10-28 23:22:50 +01:00
Bartosz Taudul
e0356ae01e Cosmetics. 2019-10-28 22:53:06 +01:00
Bartosz Taudul
99b7e8ad92 Close socket when shutting down. 2019-10-28 22:52:52 +01:00
Bartosz Taudul
788ca2e5df Spawn no-op network thread. 2019-10-28 22:45:10 +01:00
Bartosz Taudul
fb71800557 Update manual. 2019-10-28 22:15:12 +01:00
Bartosz Taudul
106411e1f6 Add missing freeaddrinfo(). 2019-10-27 13:39:01 +01:00
Bartosz Taudul
5956366118 Need to explicitly specify gl3w as OpenGL loader. 2019-10-27 12:45:31 +01:00
Bartosz Taudul
7f07f5beb4 Free child time stack. 2019-10-26 23:32:16 +02:00
Bartosz Taudul
312b7190f8 Mention that only release builds should be profiled. 2019-10-26 16:59:54 +02:00
Bartosz Taudul
f024a05a01 Document another funny optimization. 2019-10-26 16:49:52 +02:00
Bartosz Taudul
01985f50ef Cache source location zones counter search. 2019-10-26 16:33:40 +02:00
Bartosz Taudul
dfe99c2604 Update capture utility in the manual. 2019-10-26 16:33:40 +02:00
Bartosz Taudul
f1fe2df780 Add data transferred display to capture utility. 2019-10-26 16:18:03 +02:00
Bartosz Taudul
dda192985a General updates to the manual. 2019-10-26 16:05:43 +02:00
Bartosz Taudul
0b142a7b29 Remove FAQ. 2019-10-26 15:33:40 +02:00
Bartosz Taudul
492b7f9134 Update connection speed in the manual. 2019-10-26 14:37:45 +02:00
Bartosz Taudul
6aab54cfc4 Improve frame time graph in the manual. 2019-10-26 14:10:47 +02:00
Bartosz Taudul
f7155d7a77 Update context switches in the manual. 2019-10-26 14:00:32 +02:00
Bartosz Taudul
cccabe9b64 Update connection popup in the manual. 2019-10-26 13:54:57 +02:00
Bartosz Taudul
1d0084aa28 Add cache for last accessed source location zones. 2019-10-25 21:29:55 +02:00
Bartosz Taudul
b5419944aa Only write to memory if value has changed. 2019-10-25 21:28:55 +02:00
Bartosz Taudul
779063a18b Cache last shrinked source location. 2019-10-25 21:07:28 +02:00
Bartosz Taudul
294793367f Cache last CheckSourceLocation query.
Just knowing that the query was performed is enough here -- this
function adds a new source location entry, if there already isn't one.
2019-10-25 21:01:33 +02:00
Bartosz Taudul
0f2503d334 Send time deltas in GPU time events. 2019-10-25 19:52:01 +02:00
Bartosz Taudul
1ce25d3aef Init cache in-place. 2019-10-25 19:19:35 +02:00
Bartosz Taudul
8fa5188176 Send delta times for context switches. 2019-10-25 19:13:11 +02:00
Bartosz Taudul
25b3cdc1ee Send thread wakeups when handling disconnect request. 2019-10-25 18:22:42 +02:00
Bartosz Taudul
c8e5489e99 Group caches together. 2019-10-25 18:16:27 +02:00
Bartosz Taudul
29c42cc8d7 Fix assert. 2019-10-25 01:00:32 +02:00
Bartosz Taudul
17a51c898e No need to check if vector is empty. 2019-10-25 00:54:46 +02:00
Bartosz Taudul
b5e759bc5a Don't calculate child index twice. 2019-10-25 00:54:46 +02:00
Bartosz Taudul
70f1074490 Don't iterate over children to calculate zone self time. 2019-10-25 00:33:44 +02:00
Bartosz Taudul
d6a8a8532f Prevent storing variable on stack. 2019-10-24 23:40:21 +02:00
Bartosz Taudul
1fe76be955 Don't reconstruct lock event time on insert. 2019-10-24 23:25:04 +02:00
Bartosz Taudul
b83d0f46d9 Improve updating last time.
Avoid LHS, don't write if don't need to.
2019-10-24 23:23:52 +02:00
Bartosz Taudul
721f3c8925 Callstack is already zero-initialized. 2019-10-24 23:05:39 +02:00
Bartosz Taudul
45332fd837 Don't read memory when setting values. 2019-10-24 23:03:13 +02:00
Bartosz Taudul
c9da5f1474 Use cached thread retriever. 2019-10-24 22:34:18 +02:00
Bartosz Taudul
5873561b54 Add cached thread retriever. 2019-10-24 22:33:48 +02:00
Bartosz Taudul
06bc802107 Avoid load-hit-store. 2019-10-24 22:24:00 +02:00
Bartosz Taudul
04b132b6e2 Check if requested data size doesn't overflow buffer. 2019-10-24 21:22:22 +02:00
Bartosz Taudul
01ceedb57a Focus out labels in connection window. 2019-10-24 00:54:19 +02:00
Bartosz Taudul
c5a6c7bf63 Display transferred data size. 2019-10-24 00:47:25 +02:00
Bartosz Taudul
1cfb5adc44 Count transferred data size. 2019-10-24 00:47:16 +02:00
Bartosz Taudul
2d31ca993e Update NEWS. 2019-10-24 00:13:12 +02:00
Bartosz Taudul
ba61a9ed84 Transfer time deltas, not absolute times.
This change significantly reduces network bandwidth requirements.

Implemented for:
- CPU zones,
- GPU zones,
- locks,
- plots,
- memory events.
2019-10-24 00:06:41 +02:00
Bartosz Taudul
cf88265304 Full 64-bit register is set by rdtsc. 2019-10-21 01:13:55 +02:00
Bartosz Taudul
699ff43f1e Update timings. 2019-10-20 22:18:20 +02:00
Bartosz Taudul
07b66cd4ab Move fake source location out of loop. 2019-10-20 22:18:05 +02:00
Bartosz Taudul
909503403b Simplify delay calibration. 2019-10-20 22:13:29 +02:00
Bartosz Taudul
411e4d42ac Move disassembly from FAQ to manual. 2019-10-20 21:23:16 +02:00
Bartosz Taudul
c774534b47 Use rdtsc instead of rdtscp.
But rdtscp is serializing!

No, it's not. Quoting the Intel Instruction Set Reference:

"The RDTSCP instruction is not a serializing instruction, but it does
wait until all previous instructions have executed and all previous
loads are globally visible. But it does not wait for previous stores to
be globally visible, and subsequent instructions may begin execution
before the read operation is performed.",

"The RDTSC instruction is not a serializing instruction. It does not
necessarily wait until all previous instructions have been executed
before reading the counter. Similarly, subsequent instructions may begin
execution before the read operation is performed."

So, the difference is in waiting for prior instructions to finish
executing. Notice that even in the rdtscp case, execution of the
following instructions may commence before time measurement is finished
and data stores may be still pending.

But, you may say, Intel in its "How to Benchmark Code Execution Times"
document shows that using rdtscp is superior to rdstc. Well, not
exactly. What they do show is that when a *single function* is
considered, there are ways to measure its execution time with little to
no error.

This is not what Tracy is doing.

In our case there is no way to determine absolute "this is before" and
"this is after" points of a zone, as we probably already are inside
another zone.  Stopping the CPU execution, so that a deeply nested zone
may be measured with great precision, will skew the measurements of all
parent zones.

And this is not what we want to measure, anyway. We are not interested
in how a *single function* behaves, but how a *whole program* behaves.
The out-of-order CPU behavior may influence the measurements? Good! We
are interested in that. We want to see *how* the code is really
executed. How is *stopping* the CPU to make a timer read an appropriate
thing to do, when we want to see how a program is performing?

At least that's the theory.

And besides all that, the profiling overhead is now reduced.
2019-10-20 20:52:33 +02:00
Bartosz Taudul
30fc2f02ab Omit calculation of on-stack variable address. 2019-10-20 19:42:29 +02:00
Bartosz Taudul
5c92eae3ed Add early exit for invalid times. 2019-10-20 18:47:50 +02:00
Bartosz Taudul
d592af9c2f Fix TRACY_NO_STATISTICS build. 2019-10-20 17:32:20 +02:00
Bartosz Taudul
5816dc2b11 Don't cache timedist data if ctx switch data is incomplete. 2019-10-20 17:03:30 +02:00
Bartosz Taudul
ccdc102d5a Cache zone time distribution data. 2019-10-20 03:24:58 +02:00
Bartosz Taudul
4d761def61 Microoptimize comparison. 2019-10-16 20:26:39 +02:00
Bartosz Taudul
14292f9e35 Update manual. 2019-10-15 21:57:49 +02:00
Bartosz Taudul
f89bc970ee Update NEWS. 2019-10-15 21:50:22 +02:00
Bartosz Taudul
bfbd09b619 Add CPU usage graph tooltip. 2019-10-15 21:47:37 +02:00
Bartosz Taudul
7a9d4aecd3 Fix graph height calculation. 2019-10-15 21:41:06 +02:00
Bartosz Taudul
4372ad1bc3 Allow disabling CPU usage graph. 2019-10-15 21:37:16 +02:00
Bartosz Taudul
c28bab59b5 Improve look of CPU usage graph. 2019-10-15 21:20:00 +02:00
Bartosz Taudul
5aeeefefbd Draw CPU usage graph. 2019-10-15 16:55:15 +02:00
Bartosz Taudul
3ae5c125f6 Implement counting CPU usage (ctx switch) at a given time. 2019-10-15 16:54:43 +02:00
Bartosz Taudul
3ce6b1205f Don't iterate over 256 CPUs. 2019-10-15 16:13:53 +02:00
Bartosz Taudul
eccb0b1e4a Track max CPU present in context switch data. 2019-10-15 16:13:53 +02:00
Bartosz Taudul
bdb8516d04 Make sure context switch end time wasn't set already. 2019-10-15 14:54:28 +02:00
Bartosz Taudul
a20c6604c3 Add natvis for ContextSwitchData and ContextSwitchCpu. 2019-10-15 14:11:02 +02:00
Bartosz Taudul
fefa3b4693 Improve options UI. 2019-10-15 01:49:36 +02:00
Bartosz Taudul
dffe65f8e2 Update manual. 2019-10-14 20:52:18 +02:00
Bartosz Taudul
f0c77b4ef4 Add annotation list window. 2019-10-14 20:52:18 +02:00
Bartosz Taudul
1ad246b4ca Update manual. 2019-10-14 20:17:28 +02:00
Bartosz Taudul
c6207ed0e9 Move extra tools to main window button bar popup. 2019-10-14 20:07:55 +02:00
Bartosz Taudul
fc7f77eb7a Add implementation of disablable button. 2019-10-14 20:06:57 +02:00
Bartosz Taudul
6de8e6987f Sort annotations. 2019-10-14 19:04:37 +02:00
Bartosz Taudul
5c47467c88 Fix includes. 2019-10-13 17:13:15 +02:00
Bartosz Taudul
671a8f673e Don't interact with unfocused annotations. 2019-10-13 17:01:55 +02:00
Bartosz Taudul
98ab83c69b Update manual. 2019-10-13 17:00:07 +02:00
Bartosz Taudul
ae2c9b4859 Update NEWS. 2019-10-13 16:30:07 +02:00
Bartosz Taudul
e462335f83 Save/load annotations. 2019-10-13 16:29:24 +02:00
Bartosz Taudul
c2f38d0db7 Implement removal of user data files. 2019-10-13 16:29:02 +02:00
Bartosz Taudul
9d0316342d Move Annotation struct to a proper place. 2019-10-13 16:28:40 +02:00
Bartosz Taudul
20cf1d9f83 Implement color selection for annotation region. 2019-10-13 16:14:22 +02:00
Bartosz Taudul
f9e860f559 Display annotation text on timeline. 2019-10-13 15:59:48 +02:00
Bartosz Taudul
1527e7bc10 Add annotation modification window. 2019-10-13 15:50:37 +02:00
Bartosz Taudul
5fed86dae7 Allow adding annotations to timeline. 2019-10-13 15:28:52 +02:00
Bartosz Taudul
215dc8a804 More compact GpuEvent struct (save 4 bytes).
Memory usage reduction of various traces:

big         9011 -> 9007
frameimages 561  -> 552
fi-big      4144 -> 4139
long        5253 -> 5125
2019-10-13 14:42:52 +02:00
Bartosz Taudul
c044df6324 Display number of GPU zones. 2019-10-13 14:21:28 +02:00
Bartosz Taudul
1ae49c14a2 GPU zone count accessor. 2019-10-13 14:13:28 +02:00
Bartosz Taudul
5e1894dd79 Count GPU zones. 2019-10-13 14:13:04 +02:00
Bartosz Taudul
c3870f8837 Use proper type. 2019-10-10 20:30:08 +02:00
Bartosz Taudul
707f113bda Add missing NOMINMAX definitions. 2019-10-10 20:29:06 +02:00
Bartosz Taudul
d4620b4157 Fix UI. 2019-10-09 22:33:02 +02:00
Bartosz Taudul
0a358ac1f0 Time distribution may now only include running time. 2019-10-09 22:13:52 +02:00
Bartosz Taudul
6ced346e08 Different sorting modes for zone time distribution. 2019-10-09 21:42:46 +02:00
Bartosz Taudul
f03b7f33ed Update NEWS. 2019-10-07 22:33:55 +02:00
Bartosz Taudul
ed1f722c51 Display trace file name in trace info window. 2019-10-07 21:36:19 +02:00
Bartosz Taudul
4c4099877d Track trace file name in TracyView. 2019-10-07 21:36:19 +02:00
Bartosz Taudul
c6f320d2d8 Store file name in FileRead. 2019-10-07 21:32:27 +02:00
Bartosz Taudul
7cf3608493 Avoid unused variables. 2019-10-05 02:11:45 +02:00
Bartosz Taudul
4ba885ac95 Update manual. 2019-10-04 21:47:30 +02:00
Bartosz Taudul
3e3bd30290 Update NEWS. 2019-10-04 21:38:08 +02:00
Bartosz Taudul
1cd5ccb3c1 Display zone time distribution. 2019-10-04 21:34:00 +02:00
Bartosz Taudul
871e1f1c37 Describe workaround for exiting from within a zone. 2019-10-04 20:43:08 +02:00
Bartosz Taudul
e481b5ba22 Add missing thread sent indication. 2019-10-04 19:18:47 +02:00
Bartosz Taudul
4e7e9ee3b1 Update manual. 2019-10-04 18:53:06 +02:00
Bartosz Taudul
5111275770 Highlight hovered zone on the find zone zones list. 2019-10-04 13:02:26 +02:00
Bartosz Taudul
b913c17f5b Add "no grouping" mode to find zone zones list. 2019-10-04 12:42:05 +02:00
Bartosz Taudul
9e1935f070 Make C API symbols visible across dlls. 2019-10-03 22:39:26 +02:00
Bartosz Taudul
eba64b30a3 TracySystem.cpp should be always compiled in. 2019-10-03 22:34:06 +02:00
Bartosz Taudul
f2bb933f49 Use proper background color. 2019-10-02 00:49:30 +02:00
Bartosz Taudul
3b223c64d4 Darken to background color to hide overhang.
This only handles the root window case. When the profiler is embedded in
other application, the window background color is not matched.
2019-10-01 23:17:36 +02:00
Bartosz Taudul
db29d309a2 Lambda capture is not needed here. 2019-10-01 22:42:43 +02:00
Bartosz Taudul
68f476834f Make sure TracyCountBits() always returns uint64_t. 2019-10-01 22:42:29 +02:00
Bartosz Taudul
65ea33a60f Store memory callstack data as 24-bit ints.
This reduces MemEvent size from 40 to 38 bytes.

Memory usage reduction:

chicken     2027 -> 2019
mem         6468 -> 6308
q3bsp-mt    5304 -> 5283
2019-10-01 22:38:17 +02:00
Bartosz Taudul
f0b957ec56 Store callstacks on 24 bits.
ZoneEvent is now 27 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9224 -> 9011  (97%)
chicken         2044 -> 2027  (99%)
drl-l-b         1443 -> 1383  (95%)
long            5327 -> 5253  (98%)
q3bsp-mt        5400 -> 5304  (98%)
selfprofile     1403 -> 1382  (98%)
2019-10-01 22:38:17 +02:00
Bartosz Taudul
c631e33f81 Add 24-bit int implementation. 2019-10-01 21:48:34 +02:00
Bartosz Taudul
472959b29f Remove irrelevant comment. 2019-10-01 01:15:43 +02:00
Bartosz Taudul
717a212563 Save another 2 bytes per ZoneEvent.
ZoneEvent is not 28 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9527 -> 9224  (96%)
chicken         2107 -> 2044  (97%)
drl-l-b         1479 -> 1443  (97%)
long            5412 -> 5327  (98%)
q3bsp-mt        5592 -> 5400  (96%)
selfprofile     1443 -> 1403  (97%)
2019-10-01 01:05:37 +02:00
Bartosz Taudul
4964aa9547 Assert on getting index only for active strings. 2019-10-01 00:40:58 +02:00
Bartosz Taudul
acfcfb09ce Hide context switch options, if no data is available. 2019-09-30 23:46:10 +02:00
Bartosz Taudul
ffdb6d8a3b Update manual. 2019-09-30 23:43:07 +02:00
Bartosz Taudul
36fdc71588 Update NEWS. 2019-09-30 23:40:13 +02:00
Bartosz Taudul
0e56682964 Darkening of inactive thread regions. 2019-09-30 23:37:36 +02:00
Bartosz Taudul
e758e98ca4 Update manual. 2019-09-29 21:16:44 +02:00
Bartosz Taudul
80ff267a77 Update NEWS. 2019-09-29 21:03:40 +02:00
Bartosz Taudul
599fa17e4f Expose extreme compression level in update utility. 2019-09-29 21:03:08 +02:00
Bartosz Taudul
6e7e8eff87 Set extreme compression level to really be extreme. 2019-09-29 21:02:01 +02:00
Bartosz Taudul
2470936050 Don't perform background tasks during trace upgrade. 2019-09-29 20:52:25 +02:00
Bartosz Taudul
947eb56f3d Add loading/saving messages to update utility. 2019-09-29 20:48:18 +02:00
Bartosz Taudul
d228bcb622 Pack StringIdx in 24 bits.
This reduces ZoneEvent size from 32 to 30 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9902 -> 9527  (96%)
chicken         2172 -> 2107  (97%)
ctx-big          311 ->  309  (99%)
drl-l-b         1570 -> 1479  (94%)
long            5496 -> 5412  (98%)
mem             6468 -> 6468  (100%)
q3bsp-mt        5784 -> 5592  (96%)
selfprofile     1486 -> 1443  (97%)
2019-09-29 20:32:42 +02:00
Bartosz Taudul
781ebeb835 Add table initializing alloc to slab allocator. 2019-09-29 20:18:16 +02:00
Bartosz Taudul
59632f0d37 One more place to check if srcloc zones are ready. 2019-09-29 20:17:47 +02:00
Bartosz Taudul
873d536845 Display number of strings. 2019-09-29 19:22:50 +02:00
Bartosz Taudul
c91ae667d1 Add string count getter. 2019-09-29 19:22:15 +02:00
Bartosz Taudul
cb6a3f3334 Highlight CPU data timeline from thread tooltip. 2019-09-29 18:55:31 +02:00
Bartosz Taudul
3b8ab5715f Highlight CPU data timeline from CPU data window. 2019-09-29 18:53:58 +02:00
Bartosz Taudul
cafb5d6a99 Highlight threads on CPU data timeline. 2019-09-29 18:49:48 +02:00
Aleksei Skriabin
05a2fa487f Merged in Vuhdo/tracy/strstr_nocase_fix (pull request #41)
strstr_nocase() typo fix.
2019-09-28 17:55:31 +00:00
Aleksei Skriabin
c0c2f4536a strstr_nocase() typo fix. 2019-09-28 14:20:29 +05:00
Bartosz Taudul
2356069eac Update manual. 2019-09-27 18:15:32 +02:00
Bartosz Taudul
130365f4ff Inject tracy_systrace into filesystem and use instead of cat.
Statistics for a one-minute trace:

  Capture tool | Running time | Running regions
---------------+--------------+-----------------
      cat      |    25.11 s   |     392,300
tracy_systrace |    10.41 s   |      12,249
2019-09-27 15:51:29 +02:00
Bartosz Taudul
3dba4088ee Embed precompiled tracy_systrace for android. 2019-09-27 15:50:58 +02:00
Bartosz Taudul
0850a5e4a3 Use a proper build script. 2019-09-27 00:06:45 +02:00
Bartosz Taudul
6094d69479 Manually load required symbols. 2019-09-27 00:05:41 +02:00
Bartosz Taudul
9de2d312a3 Tiny binary. 2019-09-26 23:54:08 +02:00
Bartosz Taudul
6f5dd44f1f Helper for reading data from kernel more efficiently. 2019-09-26 22:55:02 +02:00
Bartosz Taudul
c09f3c0676 Add thread color boxes to CPU data window. 2019-09-25 02:12:35 +02:00
Bartosz Taudul
0cc0b456cc Update NEWS. 2019-09-24 23:59:20 +02:00
Bartosz Taudul
6c5627d8e4 Add thread color boxes to memory allocations listings. 2019-09-24 23:58:11 +02:00
Bartosz Taudul
581fd920a1 Add thread color boxes to lock info. 2019-09-24 23:52:52 +02:00
Bartosz Taudul
12e2bcb691 Add thread color boxes to zone info windows. 2019-09-24 23:51:47 +02:00
Bartosz Taudul
ad2dd09c25 Add thread color boxes to zone tooltips. 2019-09-24 23:50:00 +02:00
Bartosz Taudul
47f81d0ba4 Add thread color box to memory plot tooltip. 2019-09-24 23:47:51 +02:00
Bartosz Taudul
9c86102bad Add thread color box to CPU data on timeline. 2019-09-24 23:46:54 +02:00
Bartosz Taudul
a7e3324eba Add thread color boxes to GPU context tooltips. 2019-09-24 23:45:36 +02:00
Bartosz Taudul
6ffbd00b0c Add thread color box to crash info. 2019-09-24 23:42:25 +02:00
Bartosz Taudul
c73a74b8d5 Add thread color boxes to memory allocation info. 2019-09-24 23:41:28 +02:00
Bartosz Taudul
e9b815a3b8 Show thread color boxes in find zone menu. 2019-09-24 23:38:29 +02:00
Bartosz Taudul
06fe469598 Add thread color boxes to messages thread list. 2019-09-24 23:33:33 +02:00
Bartosz Taudul
e7578777c3 Update ImGui to 1.73. 2019-09-24 23:32:03 +02:00
Bartosz Taudul
63184f8762 Better Vulkan thread heuristics. 2019-09-24 00:55:24 +02:00
Bartosz Taudul
891e7711e9 Update manual. 2019-09-24 00:20:41 +02:00
Bartosz Taudul
49abad2dec Update manual. 2019-09-23 17:30:00 +02:00
Bartosz Taudul
7503bb1aee Update NEWS. 2019-09-23 17:28:32 +02:00
Bartosz Taudul
a5ba74ed13 Handle multiple Vulkan threads. 2019-09-23 17:27:49 +02:00
Bartosz Taudul
0f68e1e981 Send thread id in GPU zone end message.
We don't care about OpenGL zone thread ids, so the identifier is zeroed.
2019-09-23 16:06:14 +02:00
Bartosz Taudul
daf64c703a Serialize Vulkan GPU profiling messages.
Since Vulkan can be multi-threaded, the guarantee of GPU time data
arriving after CPU time data can't be held with asynchronous messages.
Use serial queue instead.
2019-09-23 15:38:16 +02:00
Bartosz Taudul
9a49f49cfd Also build test with TRACY_ON_DEMAND enabled. 2019-09-21 15:50:27 +02:00
Bartosz Taudul
2a9b1b3cf3 Allow easy adding of tracy flags in test application. 2019-09-21 15:49:54 +02:00
Bartosz Taudul
a5fecc350b Update manual. 2019-09-21 15:47:37 +02:00
Bartosz Taudul
c2728832af Update NEWS. 2019-09-21 15:43:31 +02:00
Bartosz Taudul
82cd667b30 Allow specifying network port in server. 2019-09-21 15:43:01 +02:00
Bartosz Taudul
fb63dd89bc Update manual. 2019-09-21 15:21:29 +02:00
Bartosz Taudul
e13cbf52fd Allow changing tracy port in client. 2019-09-21 15:11:15 +02:00
Bartosz Taudul
140654961c Update NEWS. 2019-09-21 15:03:35 +02:00
Bartosz Taudul
dfb9ae1a90 Update manual. 2019-09-21 15:03:09 +02:00
Bartosz Taudul
a221f121ba Extract lock state handling to a separate context class. 2019-09-21 14:55:14 +02:00
Bartosz Taudul
4c736aecfa Use fibonacci hashing to determine thread colors. 2019-09-21 14:03:42 +02:00
Bartosz Taudul
7a1fb4e0bd Proper message when call stack trees are not available. 2019-09-21 00:57:12 +02:00
Bartosz Taudul
46f7235e32 Display proper message when there are no active allocations. 2019-09-21 00:54:30 +02:00
Bartosz Taudul
feddd58b46 Better way to scale ImGui style. 2019-09-21 00:52:13 +02:00
Bartosz Taudul
d8e0853cd8 Multithreaded frame image compression. 2019-09-20 23:03:12 +02:00
Bartosz Taudul
6f5a23a198 Add task dispatcher to server. 2019-09-20 22:58:12 +02:00
Bartosz Taudul
e1e5d6bd47 Add const version of PackFrameImage().
Temporary buffer needs to be handled outside of the function.
2019-09-20 22:55:55 +02:00
Bartosz Taudul
b362baed5f Minor UI improvements. 2019-09-19 01:10:33 +02:00
Bartosz Taudul
6fbfd12d1f Update manual. 2019-09-16 22:02:47 +02:00
Bartosz Taudul
0fed97b241 Update NEWS. 2019-09-16 22:02:47 +02:00
Bartosz Taudul
6a0512fe16 Allow comparing frame times. 2019-09-16 22:02:47 +02:00
Bartosz Taudul
8fe9b56b6f Calculate frame statistics. 2019-09-16 22:02:47 +02:00
Bartosz Taudul
b99675ae60 Use thread color for collapsed zones. 2019-09-16 20:34:55 +02:00
Bartosz Taudul
36b2b8f71f Always return static thread color if dynamic colors are disabled. 2019-09-16 20:31:32 +02:00
Bartosz Taudul
5796f19a3b Focus out exact memory plot value. 2019-09-16 20:27:16 +02:00
Bartosz Taudul
7673028dba Fix skipping memory data. 2019-09-16 15:42:25 +02:00
Bartosz Taudul
5429f04614 Don't use source location data before it's ready. 2019-09-16 15:37:57 +02:00
Bartosz Taudul
e0105451f6 Update manual. 2019-09-12 20:14:54 +02:00
Bartosz Taudul
ab5a128674 Update NEWS. 2019-09-12 20:09:37 +02:00
Bartosz Taudul
6d00a56c61 Draw thread migrations across CPU cores. 2019-09-12 20:08:57 +02:00
Bartosz Taudul
c1731f864b Update manual. 2019-09-11 19:05:53 +02:00
Bartosz Taudul
23b6e5156b Display thread color in thread tooltip. 2019-09-11 19:01:27 +02:00
Bartosz Taudul
2872edce5d Use thread colors in context switch graph. 2019-09-11 18:56:54 +02:00
Bartosz Taudul
8ddafe4153 Extract color highlight functionality. 2019-09-11 18:52:25 +02:00
Bartosz Taudul
0850145811 Disable color box drag and drop. 2019-09-11 18:48:28 +02:00
Bartosz Taudul
2cec6f5482 Add thread colors to options menu. 2019-09-11 18:44:06 +02:00
Bartosz Taudul
4ea62ecb06 Extract small color box drawing. 2019-09-11 18:38:10 +02:00
Bartosz Taudul
00409b0b94 Extract thread color getter. 2019-09-11 18:34:48 +02:00
Bartosz Taudul
9cd359f0b9 Update manual. 2019-09-08 14:38:40 +02:00
Bartosz Taudul
2464eedbd4 Update NEWS. 2019-09-08 14:34:35 +02:00
Bartosz Taudul
a5a6b11b63 Zones can now have dynamic colors. 2019-09-08 14:33:30 +02:00
Bartosz Taudul
2714152f84 Allow calculating zone depth. 2019-09-08 14:16:12 +02:00
Bartosz Taudul
cdc4575dba Setup tid -> thread data mapping when loading trace. 2019-09-08 14:15:40 +02:00
Bartosz Taudul
ea6a0a58a7 Thread data accessor. 2019-09-08 14:07:16 +02:00
Bartosz Taudul
3a9ff94580 Update manual. 2019-09-08 13:38:19 +02:00
Bartosz Taudul
1e57ae3f49 Update NEWS. 2019-09-08 13:20:13 +02:00
Bartosz Taudul
c9a1d3d7e5 Display zone color in zone info window. 2019-09-08 13:19:43 +02:00
Bartosz Taudul
b7522ec4c1 Allow getting zone color sans higlights, etc. 2019-09-08 13:16:00 +02:00
Bartosz Taudul
6ef282dd1a Notify user that the data might not be correct. 2019-09-07 18:20:26 +02:00
Bartosz Taudul
17e6a97552 Let's leave this here. 2019-09-07 17:49:54 +02:00
Bartosz Taudul
a0814a2e5c Correctly calculate discontinuous frames time. 2019-09-07 17:39:39 +02:00
Bartosz Taudul
aac0a36a2d Don't use source location zones before they are ready. 2019-09-07 17:23:11 +02:00
Bartosz Taudul
70ae2f763d Update manual. 2019-09-07 17:20:51 +02:00
Bartosz Taudul
b4e019e7e7 Update NEWS. 2019-09-07 17:00:05 +02:00
Bartosz Taudul
3449f0777e Display zone time on frames plot. 2019-09-07 16:55:49 +02:00
Bartosz Taudul
0b1a6047f6 Add different highlight for zones selected on histogram. 2019-09-07 15:33:11 +02:00
Bartosz Taudul
57a2b62edc Display number of threads for pids in CPU data list. 2019-09-04 01:43:56 +02:00
Bartosz Taudul
0837463f05 Describe how wonderful linux interfaces are. 2019-09-03 21:45:19 +02:00
Bartosz Taudul
37661fd2ee Fix 32 bit NEON version of DXT1 compression.
This reverts commit b32e8fa24e.

Apparently it is possible to receive non-uniform data in alpha channel, which
breaks the original assumption about not needing the mask. This seemed to be a
problem only on 32 bit NEON implementation of DXT1 compression. Other
implementations handle such data without degradation of visual output.
2019-09-03 21:37:07 +02:00
Bartosz Taudul
aa2530d442 Display external thread name (if applicable) on CPU data timeline. 2019-08-31 19:37:05 +02:00
Bartosz Taudul
be36e7a19c Update manual. 2019-08-31 01:08:03 +02:00
Bartosz Taudul
86cb477811 Pack ZoneThreadData.
This reduces struct size from 10 to 8 bytes. Assumes 48-bit pointers
(4-level paging)!

Memory savings (MB):

android     2766    ->  2757    (99%)
big         10.29 G ->  9902    (96%)
chicken     2244    ->  2172    (96%)
ctx-android 228     ->  224     (98%)
drl-l-b     1635    ->  1570    (96%)
gn-vulkan   244     ->  240     (98%)
long        5656    ->  5496    (97%)
q3bsp-mt    6043    ->  5784    (95%)
selfprofile 1554    ->  1486    (95%)
2019-08-31 00:55:51 +02:00
Bartosz Taudul
3ec534cdf3 Prevent "ntdll.dll" from appearing as a thread name. 2019-08-30 23:09:07 +02:00
Bartosz Taudul
7a6564feae Only recycle producers, if there's no data in queue.
("The queue" is per-thread partial queue here.)

This fixes a problem where one thread writes to the queue, then is
terminated, making the (partially filled) queue available for other
threads to recycle. If another thread re-owns the queue, it will change
the associated thread id, while part of the queue was filled by the
original thread. This obviously created invalid data during dequeue.

The fix makes the recycling process check not only for queue inactivity
(which is marked when the original thread terminates), but also if the
queue is empty, preventing mixing data from different threads.
2019-08-30 14:28:44 +02:00
Bartosz Taudul
1c0c6311ec Fix skipping data when loading traces. 2019-08-30 01:16:42 +02:00
Bartosz Taudul
217a3781e6 Fix possible wrong process name for pid 0. 2019-08-30 00:59:54 +02:00
Bartosz Taudul
19f8f9f101 Use proper type. 2019-08-30 00:56:11 +02:00
Bartosz Taudul
a8d204821e Signed left shift is undefined. 2019-08-29 18:42:29 +02:00
Bartosz Taudul
adfc4eb59b Store UdpListen instance in an unique ptr. 2019-08-29 18:36:55 +02:00
Bartosz Taudul
5e8b2a0723 Display wakeup times in zone wait regions list. 2019-08-28 23:03:16 +02:00
Bartosz Taudul
0e89105bdb Update NEWS. 2019-08-28 21:40:58 +02:00
Bartosz Taudul
fc0593a840 Update manual. 2019-08-28 21:38:51 +02:00
Bartosz Taudul
6f25ad5fcb Save per-trace options. 2019-08-28 21:35:08 +02:00
Bartosz Taudul
fc5293b1ae Only scroll message list to bottom if capture is live. 2019-08-28 21:04:28 +02:00
Bartosz Taudul
a2f968d843 Compress thread id in MessageData. 2019-08-28 21:03:01 +02:00
Bartosz Taudul
ede26b0caf Fix skipping zone levels. 2019-08-28 20:47:19 +02:00
Bartosz Taudul
ee14ff6d6e Update manual. 2019-08-28 20:39:29 +02:00
Bartosz Taudul
85027c185d Extract notification area drawing to a separate function. 2019-08-28 20:27:39 +02:00
Bartosz Taudul
a8eb99efcc Add notification icons when a drawing a category is disabled. 2019-08-28 20:24:14 +02:00
Bartosz Taudul
5b0ccef373 Change some icons. 2019-08-28 20:17:38 +02:00
Bartosz Taudul
fd5014be6f GetThreadString() is no longer used. 2019-08-28 20:08:16 +02:00
Bartosz Taudul
28a20e631e Preserve frame graph position and scale. 2019-08-28 19:52:36 +02:00
Bartosz Taudul
17d4a82ca5 Preserve timeline vertical scroll position. 2019-08-28 19:49:27 +02:00
Bartosz Taudul
abde0c252d Update NEWS. 2019-08-28 19:46:08 +02:00
Bartosz Taudul
f37797db44 Save/load view state. 2019-08-28 19:45:22 +02:00
Bartosz Taudul
dc5444ff0f Notify UserData that view state should be preserved.
This is only active when a trace is loaded from a file (and state should
be persistent for future sessions using this trace), or when state is
saved to a file (so that future sessions will use current state).

No state is preserved by default, i.e. when the trace was not saved to a
file.
2019-08-28 19:37:01 +02:00
Bartosz Taudul
949c9cb121 Move some view data to a separate structure. 2019-08-28 19:35:54 +02:00
Bartosz Taudul
38bfae13dd Add helper function for opening files. 2019-08-28 19:28:31 +02:00
Bartosz Taudul
2a0d6ce4ad Add notification area indicator for hidden timeline items. 2019-08-28 18:36:05 +02:00
Bartosz Taudul
ed83762a1a Keep things simple. 2019-08-28 01:29:58 +02:00
Bartosz Taudul
d95e24f66b Update NEWS. 2019-08-27 23:20:56 +02:00
Bartosz Taudul
ef287c8aab Display external thread names of profiled program on CPU data timeline. 2019-08-27 23:17:53 +02:00
Bartosz Taudul
8eb7220dd7 Use the new thread name getter. 2019-08-27 23:08:14 +02:00
Bartosz Taudul
3c092b4bec Add thread name getter combining local and external thread names. 2019-08-27 23:00:13 +02:00
Bartosz Taudul
f8e3d1ad0a Try to fix current program's thread names.
External thread names can be cut-off to include only the first 15-or-so
characters. If a local thread name is known and its beginning matches
the external name, use the local name instead.
2019-08-27 22:41:03 +02:00
Bartosz Taudul
8bb13ca09e Use captured program name in CPU data.
This fixes android application names, which are cut to show only last
15-or-so letters.
2019-08-27 22:35:53 +02:00
Bartosz Taudul
f76f38777e Signed minus unsigned is unsigned... 2019-08-26 19:09:12 +02:00
Bartosz Taudul
eb78ecd0fd Display frame number in playback window. 2019-08-26 19:01:59 +02:00
Bartosz Taudul
3e4d3efbdb Extract frame number getter. 2019-08-26 19:01:51 +02:00
Bartosz Taudul
00b26c1acf Fix TRACY_NO_SYSTEM_TRACING. 2019-08-26 18:02:10 +02:00
Bartosz Taudul
fbeee3cf61 Fix (?) invalid function pointer signature. 2019-08-26 17:59:58 +02:00
Bartosz Taudul
78127dc357 System threads only allow limited information queries. 2019-08-25 00:33:22 +02:00
Bartosz Taudul
e5a11ad593 Allow sorting CPU data table by different columns. 2019-08-25 00:17:06 +02:00
Bartosz Taudul
5e2614bcfa Update NEWS. 2019-08-24 23:50:18 +02:00
Bartosz Taudul
4376757912 Display thread ids in options menu. 2019-08-24 23:43:36 +02:00
Bartosz Taudul
2b9ec14c92 Display threads ids as base-10 numbers. 2019-08-24 23:41:33 +02:00
Bartosz Taudul
deb59b4c38 Somehow fix event ordering. 2019-08-24 01:43:55 +02:00
Bartosz Taudul
1e74a89924 Check if there's data to read from kernel.
Reading from kernel pipe, while being a blocking operation, spin locks the
thread.
2019-08-24 01:06:21 +02:00
Bartosz Taudul
8f6e94d75c Sleep if sys trace pipe buffer underruns. 2019-08-24 00:42:00 +02:00
Bartosz Taudul
2d50d07438 Allow completely disabling system tracing. 2019-08-21 01:16:25 +02:00
Bartosz Taudul
5c8937eba2 Update manual. 2019-08-20 23:59:47 +02:00
Bartosz Taudul
0cbb853945 Add missing SetThreadName() calls. 2019-08-20 16:23:00 +02:00
Bartosz Taudul
332262dd84 Shorter thread names. 2019-08-20 16:22:54 +02:00
Bartosz Taudul
247acd03ee Kernel tracing on android. 2019-08-20 15:49:40 +02:00
Bartosz Taudul
e427d67347 Don't bail out if unimportant variables are not available. 2019-08-20 12:19:05 +02:00
Bartosz Taudul
bfda30be0b Use su on android to set tracing variables. 2019-08-20 12:18:46 +02:00
Bartosz Taudul
1712431dfd Compress external threads. Saves 4 bytes per ctx switch.
Dropped support for loading context switch data in previous versions of
traces.
2019-08-19 23:09:58 +02:00
Bartosz Taudul
21e7a4bb16 Extract thread compression into a separate class. 2019-08-19 22:56:58 +02:00
Bartosz Taudul
94382f54ca Move FileVersion() to TracyFileHeader.hpp. 2019-08-19 22:56:58 +02:00
Bartosz Taudul
9d87a8394d Add missing getline() implementation for android API < 18. 2019-08-19 15:26:09 +02:00
Bartosz Taudul
fd245bb5df Fix includes for gettid() on android. 2019-08-19 15:09:47 +02:00
Bartosz Taudul
9be6f4a414 Fix typo. 2019-08-19 13:03:37 +02:00
Bartosz Taudul
d209bb4d01 Add missing function pointer checks. 2019-08-19 12:47:27 +02:00
Bartosz Taudul
e60b2884f4 Mark local threads with different color. 2019-08-18 14:57:44 +02:00
Bartosz Taudul
19857473e3 Also collect information on local threads. 2019-08-18 14:56:17 +02:00
Bartosz Taudul
9a3974b8f1 Display process times in graphical form. 2019-08-18 14:51:25 +02:00
Bartosz Taudul
2eed28b19f Highlight current process. 2019-08-18 14:46:59 +02:00
Bartosz Taudul
ae9cae781a Display CPU migrations percentage. 2019-08-18 14:44:00 +02:00
Bartosz Taudul
691fe06bfe Compare pids to determine if thread is local untracked. 2019-08-18 14:40:04 +02:00
Bartosz Taudul
95f4162870 Display number of tracked processes. 2019-08-18 14:30:52 +02:00
Bartosz Taudul
7a036b56b1 Add icon to CPU data button. 2019-08-18 14:30:01 +02:00
Bartosz Taudul
c5060da185 Display unknown pid as unknown. 2019-08-18 14:28:56 +02:00
Bartosz Taudul
faac08865a Display basic information about CPU usage. 2019-08-18 12:28:38 +02:00
Bartosz Taudul
3b8518f7b6 Save/load CPU thread data. 2019-08-18 01:53:38 +02:00
Bartosz Taudul
62dbe522c5 Add accessors. 2019-08-18 01:51:02 +02:00
Bartosz Taudul
103645c2fa Calculate cpu thread data statistics. 2019-08-18 01:50:49 +02:00
Bartosz Taudul
1498417a8d Save/load tid to pid mapping. 2019-08-17 22:36:21 +02:00
Bartosz Taudul
20e8a5ecc8 Create tid to pid mapping. 2019-08-17 22:32:41 +02:00
Bartosz Taudul
fa573ef4cf Display PID. 2019-08-17 22:21:02 +02:00
Bartosz Taudul
678e942e9f Transfer PID of profiled program. 2019-08-17 22:19:04 +02:00
Bartosz Taudul
1024992493 React to enter key in "go to frame" dialog. 2019-08-17 22:01:06 +02:00
Bartosz Taudul
258cf38d64 Fix flicker. 2019-08-17 21:59:08 +02:00
Bartosz Taudul
77c636c3fd Retrieve module name for threads with no names on windows. 2019-08-17 21:24:40 +02:00
Bartosz Taudul
0ea8789f39 Display CPU core in waking up thread popup. 2019-08-17 21:24:40 +02:00
Bartosz Taudul
f7589bde02 Trace thread wakeups on linux. 2019-08-17 17:18:11 +02:00
Bartosz Taudul
580944af65 Update manual. 2019-08-17 17:11:12 +02:00
Bartosz Taudul
414f903cc5 Collect thread wakeup data. 2019-08-17 17:05:29 +02:00
Bartosz Taudul
f957f64ce1 No magic numbers. 2019-08-17 16:26:59 +02:00
Bartosz Taudul
26be78530f Use signed number to calculate frame offset. 2019-08-17 15:22:54 +02:00
Bartosz Taudul
e9080bdbcd Hardcode windows PID 4 as "System". 2019-08-17 03:44:47 +02:00
Bartosz Taudul
40eb8a5a03 Proper check for invalid handle. 2019-08-17 03:44:11 +02:00
Bartosz Taudul
65e62dea06 Display thread ids next to thread names in CPU data. 2019-08-17 03:06:54 +02:00
Bartosz Taudul
6c1dd8eaec Cast thread handle to DWORD. 2019-08-16 21:21:37 +02:00
Bartosz Taudul
6c53cac15e Fix uninitialized variable. 2019-08-16 21:20:04 +02:00
Bartosz Taudul
d7104c752a Cygwin compat layer. 2019-08-16 21:16:04 +02:00
Bartosz Taudul
819ef2a82b External process/thread name retrieval on linux. 2019-08-16 21:00:42 +02:00
Bartosz Taudul
26e93b35c6 Update manual. 2019-08-16 20:31:16 +02:00
Bartosz Taudul
f63b6d0985 Update NEWS. 2019-08-16 19:53:08 +02:00
Bartosz Taudul
e975c4d7bf Also retrieve external thread names. 2019-08-16 19:49:16 +02:00
Bartosz Taudul
134a8c5d2a Fix positioning. 2019-08-16 19:32:25 +02:00
Bartosz Taudul
edd5338faa Display untracked threads. 2019-08-16 19:30:46 +02:00
Bartosz Taudul
ccaf92afc4 Save/load external process names. 2019-08-16 19:24:38 +02:00
Bartosz Taudul
fe7f56b022 Implement retrieval of external process names. 2019-08-16 19:22:23 +02:00
Bartosz Taudul
56e6795c76 Add per-cpu context switch tooltips. 2019-08-16 18:39:03 +02:00
Bartosz Taudul
7e81f3250e Add CPU tooltip. 2019-08-16 18:39:03 +02:00
Bartosz Taudul
8e71e2dba5 Draw per-CPU global context switch data. 2019-08-16 18:22:57 +02:00
Bartosz Taudul
c212661714 Allow determining whether thread is local to profiled program. 2019-08-16 17:59:25 +02:00
Bartosz Taudul
cef7e4b8d0 Save/load per-cpu context switches. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
8bc4258e29 Display count of per-cpu context switch data. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
a92034d59d CPU data accessor. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
69527d2f71 Collect per-cpu context switch data. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
9e0fe226df Add small font. 2019-08-16 16:02:57 +02:00
Bartosz Taudul
83fddd9aa6 Fix unicode builds. 2019-08-16 13:09:27 +02:00
Bartosz Taudul
9d5240c597 Mutable char array is required here due to shit API design. 2019-08-16 13:03:20 +02:00
Bartosz Taudul
42c71d7e46 Fix loading old traces. 2019-08-16 00:24:29 +02:00
Bartosz Taudul
95879d2bd9 Use proper UI element to indicate selectable items. 2019-08-16 00:12:03 +02:00
Bartosz Taudul
889eddd646 Pack ContextSwitchData. Saves 3 bytes per context switch region. 2019-08-15 23:53:47 +02:00
Bartosz Taudul
e90ddf7ee5 Don't search whole data set twice. 2019-08-15 23:03:37 +02:00
Bartosz Taudul
c22c259a13 Pack time and thread in MemEvent.
This saves 4 bytes per logged memory allocation. Memory savings for
selected traces:

android     2945 MB -> 2766 MB
chicken     2261 MB -> 2245 MB
q3bsp-mt    6085 MB -> 6043 MB
mem         6788 MB -> 6468 MB
2019-08-15 23:02:43 +02:00
Bartosz Taudul
9618ee3581 Fix skipping locks. 2019-08-15 22:24:27 +02:00
Bartosz Taudul
e43a57f6b3 Remove irrelevant comments. 2019-08-15 21:51:47 +02:00
Bartosz Taudul
a635e54a79 Pack MessageData. 2019-08-15 21:42:24 +02:00
Bartosz Taudul
04c8830f86 Cosmetics. 2019-08-15 21:38:00 +02:00
Bartosz Taudul
45401fc54c Use proper variable name. 2019-08-15 21:34:19 +02:00
Bartosz Taudul
8b73dece98 Preserve magic time values when loading old traces. 2019-08-15 21:30:37 +02:00
Bartosz Taudul
41beff29a9 Remove redundant GetTimeBegin().
Traces now start at zero time.
2019-08-15 21:04:20 +02:00
Bartosz Taudul
c9d7b96c81 Prevent int16_t -> int64_t promotion on negative numbers. 2019-08-15 20:58:16 +02:00
Bartosz Taudul
3db3952135 Hackfix for broken lock terminate times. 2019-08-15 20:45:00 +02:00
Bartosz Taudul
5e20b3f28a Pack time and source location in LockEvent. 2019-08-15 20:39:16 +02:00
Bartosz Taudul
2e31c26ae5 Update manual. 2019-08-15 20:21:09 +02:00
Bartosz Taudul
723e6ac192 Update NEWS. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
bf3ad57456 Pack start time and srcloc together in ZoneEvent.
This reduces ZoneEvent struct size by 2 bytes. Memory savings on various
captures:

10.62 GB -> 10.29 GB
 2342 MB ->  2276 MB
 1706 MB ->  1635 MB
 6277 MB ->  6085 MB
2019-08-15 20:17:36 +02:00
Bartosz Taudul
3e06daef31 Update manual. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
3148c5d736 Update NEWS. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
f5775a2d6e Display list of CPUs on which zone was running. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
042e6c9e11 Set initial time of old traces to 0. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
350e526ec0 Fix crash when zone exists before thread context switches appear. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
b322d20c19 Store received timestamps offset to 0. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
c021f4cf2c Update NEWS. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
659907c972 Store srcloc identifiers using 16 bit.
This reduces various structure sizes by 2 bytes. Memory usage reduction
on various traces:

big               11 GB -> 10.62 GB
chicken         2436 MB ->  2342 MB
drl-light-big   1761 MB ->  1706 MB
q3bsp-mt        6469 MB ->  6277 MB
2019-08-15 20:15:48 +02:00
Bartosz Taudul
416113fdcb Drop support for ETC1 frame images. 2019-08-15 16:29:50 +02:00
Bartosz Taudul
32c7d13159 Count size of some more structures. 2019-08-15 14:15:40 +02:00
Bartosz Taudul
8a205ef224 Update NEWS. 2019-08-15 02:29:02 +02:00
Bartosz Taudul
14a373a3b8 Add number of CPU cores to host info. 2019-08-15 02:28:35 +02:00
Bartosz Taudul
aa00b1c4c4 Add Win10 wait reasons. 2019-08-15 01:48:50 +02:00
Bartosz Taudul
69077e4e6f Finish sending context switches during disconnect. 2019-08-14 23:06:13 +02:00
Bartosz Taudul
6dc79cf14e Cosmetics. 2019-08-14 23:05:58 +02:00
Bartosz Taudul
c0b524d8de Add a separate method for clearing serial queue. 2019-08-14 22:39:12 +02:00
Bartosz Taudul
bccb845908 Update manual. 2019-08-14 22:19:11 +02:00
Bartosz Taudul
690a6d12d7 Properly handle incomplete context switch data. 2019-08-14 22:10:54 +02:00
Bartosz Taudul
7549c50bab Fix time range reset condition. 2019-08-14 21:53:09 +02:00
Bartosz Taudul
29819321b9 Update manual. 2019-08-14 21:45:34 +02:00
Bartosz Taudul
77a8e47c7d Update NEWS. 2019-08-14 21:33:43 +02:00
Bartosz Taudul
26f417a841 Add option to display running time in find zone menu. 2019-08-14 21:33:43 +02:00
Bartosz Taudul
9ec0724ffb Support dynamic recalculation of min, max and total time. 2019-08-14 21:33:42 +02:00
Bartosz Taudul
ee77ff020a Optimize calculation of zone running time. 2019-08-14 20:47:21 +02:00
Bartosz Taudul
a194c93740 Allow checking if context switch data is available. 2019-08-14 20:26:55 +02:00
Bartosz Taudul
9a364fe5fe Cache context switch data queries. 2019-08-14 20:16:11 +02:00
Bartosz Taudul
cf4e04440e Update manual. 2019-08-14 18:42:04 +02:00
Bartosz Taudul
a5ef38812e Display list of regions where thread was waiting. 2019-08-14 18:28:52 +02:00
Bartosz Taudul
d520f1cc48 Display zone running time in zone tooltip. 2019-08-14 18:28:52 +02:00
Bartosz Taudul
1ae540c7a1 Display zone running time in zone info window. 2019-08-14 18:28:52 +02:00
Bartosz Taudul
858c94e12e Add interface for calculation zone running time. 2019-08-14 18:28:52 +02:00
Bartosz Taudul
0b12db5ee6 Display number of thread running state regions. 2019-08-14 17:36:19 +02:00
Bartosz Taudul
fadac0b433 Display thread running time. 2019-08-14 17:12:48 +02:00
Bartosz Taudul
3e01ca3269 Calculate how long thread was in running time. 2019-08-14 17:12:48 +02:00
Bartosz Taudul
72918cda19 Include recorded context switches in thread lifetime. 2019-08-14 17:03:33 +02:00
Bartosz Taudul
ca9078845c Update NEWS. 2019-08-14 16:53:19 +02:00
Bartosz Taudul
71b54dd48a Always collect thread names.
This fixes an issue when a thread was destroyed before its name could be
retrieved.
2019-08-14 16:52:04 +02:00
Bartosz Taudul
5e199d1ab3 Support ftrace on ARM. 2019-08-14 16:28:54 +02:00
Bartosz Taudul
5fbb811f5d Degrade ARM timer to monotonic raw clock.
The monotonic raw clock has the same accuracy as reading cntvct registers, but
using clock_gettime() has a measurable impact on queueing time (135 us vs
83 us).

This change is needed to enable ftrace time readings on ARM linux, which
doesn't provide any way to get raw cntvct readings, like x86-tsc on x86.
2019-08-14 16:19:02 +02:00
Bartosz Taudul
42865d7c7b Don't set x86-tsc clock on non-x86 platforms. 2019-08-14 15:14:36 +02:00
Bartosz Taudul
54a9132bb5 Skip context switch events in on demand mode, if no connection. 2019-08-14 15:09:33 +02:00
Bartosz Taudul
602c38c6c0 Allow checking timer implementation. 2019-08-14 14:35:44 +02:00
Bartosz Taudul
e39b1abce5 Handle linux wait states. 2019-08-14 14:02:31 +02:00
Bartosz Taudul
3988b56c92 Capture context switches on linux. 2019-08-14 13:56:15 +02:00
Bartosz Taudul
0bb0c10e3c Revert "Save one byte on ContextSwitchData."
Counting bits is hard, let's go shopping.
2019-08-14 13:55:05 +02:00
Bartosz Taudul
3996516fce One more SetThreadName() to change. 2019-08-14 02:27:01 +02:00
Bartosz Taudul
92b6da7cc2 SetThreadName() only works on the current thread.
This breaking change is required, because kernel trace facilities use
kernel thread ids, which are inaccessible from the pthread_t level.
2019-08-14 02:22:45 +02:00
Bartosz Taudul
339b7fd2a6 Use kernel thread ids on linux. 2019-08-14 01:57:36 +02:00
Bartosz Taudul
8925d026a9 Cosmetics. 2019-08-14 01:57:36 +02:00
Bartosz Taudul
e5c40b74ee Update manual. 2019-08-13 21:18:52 +02:00
Bartosz Taudul
2be38d912e Update NEWS. 2019-08-13 17:19:43 +02:00
Bartosz Taudul
73cbf2eead Use windows thread ids on cygwin. 2019-08-13 16:22:58 +02:00
Bartosz Taudul
71a5cffc13 Add context switch tooltips. 2019-08-13 16:20:43 +02:00
Bartosz Taudul
f285e0f5cc Save one byte on ContextSwitchData. 2019-08-13 15:16:46 +02:00
Bartosz Taudul
d77c87ae1c Allow disabling context switch drawing. 2019-08-13 15:16:46 +02:00
Bartosz Taudul
874a2596f7 Improve context switches drawing. 2019-08-13 15:16:46 +02:00
Bartosz Taudul
b313e46139 Keep event trace properties to terminate trace on exit. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
7f856a1b16 Very bad context switch visualization. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
9417ad994d Save/load context switch data. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
1c937ad9bb Implement skipping frame image data. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
8c494eabbf Display number of context switch regions. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
0b03fed61c Add context switch accessor. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
419f74280d Store context switches. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
90d26cb1b6 Collect and send context switch events. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
fe0f1aea07 Add system tracing skeleton. 2019-08-12 23:05:34 +02:00
Bartosz Taudul
8aa0be39d5 Drop support for CPU id queries. 2019-08-12 23:05:34 +02:00
Bartosz Taudul
0b944c88bb Add a note about condition variables. 2019-08-12 17:01:01 +02:00
Bartosz Taudul
d6f32a0839 Serialize lock processing.
This makes is much easier to process on the server and opens new
optimization possibilities. It also fixes theoretical problems, which
may be caused by invalid ordering of events with the same timestamp.
2019-08-12 13:51:01 +02:00
Bartosz Taudul
0431c03556 Add serial queue interface. 2019-08-12 13:27:15 +02:00
Bartosz Taudul
760357d6ea Explain why there are two methods for filling serial queue. 2019-08-12 13:19:10 +02:00
Bartosz Taudul
7fbf2fa2ec Update NEWS. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
6398ecb344 Drop support for pre-0.4 traces. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
154c902e03 Handle legacy file versions. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
54076e717c Update font awesome to 5.10.1. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
a9b41eb657 Rework processing bad files. 2019-08-12 12:04:27 +02:00
Bartosz Taudul
9b6328f962 Release 0.5.0. 2019-08-10 22:14:14 +02:00
Bartosz Taudul
530f293c49 Better way to handle auto scrolling. 2019-08-10 22:06:51 +02:00
Bartosz Taudul
6fca188603 Update tech docs. 2019-08-08 19:25:35 +02:00
Bartosz Taudul
4d2c7899ab Allow skipping invariant TSC check. 2019-08-08 19:21:39 +02:00
Bartosz Taudul
3a221dafde Display error messages on console, if available. 2019-08-08 19:18:05 +02:00
Bartosz Taudul
aada588129 Proper buffer reset. 2019-08-04 17:48:19 +02:00
Bartosz Taudul
8ae90a6cbd Merge branch 'connection-popup' 2019-08-04 16:20:02 +02:00
Bartosz Taudul
177b79a528 Update manual. 2019-08-04 16:19:51 +02:00
Bartosz Taudul
d2490bb62b Update NEWS. 2019-08-04 15:58:05 +02:00
Bartosz Taudul
853e9c17e3 Display client address. 2019-08-04 15:56:52 +02:00
Bartosz Taudul
07da2e506a Fix deadlock problems. 2019-08-04 15:55:42 +02:00
Rokas K. (rku)
37b03559f0 Merged in rokups/tracy/mingw-fixes (pull request #39)
Fix multiple build errors when compiling with MinGW.
2019-08-04 13:36:42 +00:00
Rokas Kupstys
b391e4c21a Fix multiple build errors when compiling with MinGW. 2019-08-04 15:49:46 +03:00
Rokas Kupstys
b24ac75111 Move connection window into a popup when connected. 2019-08-04 13:58:43 +03:00
Bartosz Taudul
eed7039853 Another GPU time adjust fix. 2019-08-04 01:42:44 +02:00
Bartosz Taudul
e87b8d455e Use Theil estimator randomized approximation. 2019-08-04 01:40:11 +02:00
Bartosz Taudul
8953a2652e Update manual. 2019-08-04 00:48:29 +02:00
Bartosz Taudul
dee70ea8d9 Update NEWS. 2019-08-04 00:40:40 +02:00
Bartosz Taudul
6898fd9e42 GPU time adjust fixes. 2019-08-04 00:38:08 +02:00
Bartosz Taudul
9b7384b407 Fix multiple GPU drift entry fields. 2019-08-04 00:33:31 +02:00
Bartosz Taudul
323c37bd33 Fix GPU zone search. 2019-08-04 00:30:09 +02:00
Bartosz Taudul
a642abfde0 Implement automatic GPU clock drift calculation. 2019-08-04 00:23:23 +02:00
Bartosz Taudul
401b879ece Update NEWS. 2019-08-03 15:20:54 +02:00
Bartosz Taudul
da88e32887 Display FPS counts next to frame times. 2019-08-03 15:20:31 +02:00
Bartosz Taudul
51bdbdb71f Update manual. 2019-08-03 15:09:19 +02:00
Bartosz Taudul
6f9e3aaa50 Update NEWS. 2019-08-03 14:56:50 +02:00
Bartosz Taudul
6c958f6177 Increase height of frame graph. 2019-08-03 14:55:08 +02:00
Bartosz Taudul
58003e7a6b Draw target frame time lines. 2019-08-03 14:55:08 +02:00
Bartosz Taudul
a76622d17a Cache last searched ThreadData. 2019-08-03 14:35:01 +02:00
Bartosz Taudul
8a0701025d Update imgui to 1.72b. 2019-08-02 20:56:32 +02:00
Bartosz Taudul
b3a1f932c3 Update tech docs. 2019-08-02 20:46:20 +02:00
Bartosz Taudul
fb745de5ed Update NEWS. 2019-08-02 20:30:15 +02:00
Bartosz Taudul
12969ee497 Track thread context.
This change exploits the fact that events are processed in batches
originating from a single thread. A single message changing thread
context is enough to handle multiple messages, as opposed to inclusion
of thread identifier in each message.
2019-08-02 20:18:08 +02:00
Bartosz Taudul
9b6c405485 Bin number shouldn't be floating point. 2019-08-02 19:43:08 +02:00
Bartosz Taudul
138743f880 Update manual. 2019-08-01 23:24:51 +02:00
Bartosz Taudul
ba162940a3 Update NEWS. 2019-08-01 23:14:56 +02:00
Bartosz Taudul
a4e7a341c0 Proper handling of disconnect request. 2019-08-01 23:14:09 +02:00
Bartosz Taudul
344d36086f Simplify loop. 2019-07-31 18:53:51 +02:00
Bartosz Taudul
f41834370c Also display number of visible messages. 2019-07-31 02:16:14 +02:00
Bartosz Taudul
ccd88a9e27 Add text coloring to memory window. 2019-07-31 02:06:01 +02:00
Bartosz Taudul
68df815ef6 Display total message count. 2019-07-31 00:34:24 +02:00
Bartosz Taudul
526f3a55bc Update imgui to 1.72. 2019-07-30 22:53:52 +02:00
Bartosz Taudul
ca3571fd2b Still more. 2019-07-30 01:30:31 +02:00
Bartosz Taudul
47423e6263 And more. 2019-07-30 01:29:13 +02:00
Bartosz Taudul
d3783ae359 Remove magic template syntax. 2019-07-30 01:28:21 +02:00
Bartosz Taudul
9c28b82954 RPMallocInit and RPMallocThreadInit are identical. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
28220a5fbf Update manual. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
e289f2b8c0 Update tech docs. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
a6a3f45810 Fill in thread id during dequeue, not during enqueue. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
142ef53b42 Dequeue items from a single thread. 2019-07-29 23:44:08 +02:00
Bartosz Taudul
c7f769c52b Allow dequeuing from a single producer, retrieving thread id. 2019-07-29 23:29:30 +02:00
Bartosz Taudul
6cad76ae67 Store thread id in queue producer. 2019-07-29 23:13:06 +02:00
Bartosz Taudul
7ae9a28e32 Drop BlockingConcurrentQueue. 2019-07-29 22:58:13 +02:00
Bartosz Taudul
480a427e07 No need to hash thread ids anymore. 2019-07-29 22:36:04 +02:00
Bartosz Taudul
c60af95053 Remove unused const. 2019-07-29 22:33:32 +02:00
Bartosz Taudul
2d42abf552 Remove CannoAlloc functions. 2019-07-29 22:31:32 +02:00
Bartosz Taudul
b142860c8d More implicit producer removal. 2019-07-29 22:29:39 +02:00
Bartosz Taudul
db6eceb1a6 Producers must be explicit. 2019-07-29 22:25:28 +02:00
Bartosz Taudul
89928fde7b Queue must be always able to alloc. 2019-07-29 22:13:16 +02:00
Bartosz Taudul
a03734afa6 Remove more debug code. 2019-07-29 22:01:06 +02:00
Bartosz Taudul
e9a0145cd5 Remove MCDBGQ_NOLOCKFREE_IMPLICITPRODBLOCKINDEX. 2019-07-29 21:56:53 +02:00
Bartosz Taudul
b496f1ff90 Remove MOODYCAMEL_QUEUE_INTERNAL_DEBUG. 2019-07-29 21:52:49 +02:00
Bartosz Taudul
beaadc3a56 Remove always disabled MCDBGQ_TRACKMEM code. 2019-07-29 21:51:29 +02:00
Bartosz Taudul
82a4a6d9cc Add tracy_ prefix to concurrentqueue.h file name. 2019-07-29 21:47:50 +02:00
Bartosz Taudul
5dff7b5d1e AVX2 version of plot min max calculation.
Slightly faster (~5%) than the autovectorized serial code.
2019-07-29 20:59:22 +02:00
Bartosz Taudul
7a878cf4c7 Pause playback when playback window is closed. 2019-07-29 01:51:45 +02:00
Bartosz Taudul
461f49feb8 Fix drawing zones at extreme zoom out levels.
This is needed due to int64_t -> uint64_t zone end cast hack. There are
no side effects: zero time represents start of the timer, which would be
unix epoch or system bootup time.
2019-07-28 01:58:59 +02:00
Bartosz Taudul
3e74d041c9 Link with Thread Building Blocks, if available. 2019-07-28 01:53:39 +02:00
Bartosz Taudul
2e8d20b6e8 Keep zone info windows headers at top. 2019-07-27 13:28:18 +02:00
Bartosz Taudul
1afcd24dc6 Use big font in zone info windows. 2019-07-27 13:25:31 +02:00
Bartosz Taudul
245c6f9f01 Use big font in lock info window. 2019-07-27 13:18:59 +02:00
Bartosz Taudul
93195b6647 Move trace version display to trace statistics section. 2019-07-27 13:14:44 +02:00
Bartosz Taudul
2654a3010c Keep trace info header at top of the window. 2019-07-27 13:13:50 +02:00
Bartosz Taudul
e1af87744b Use less space for call stack tree headers. 2019-07-27 13:10:53 +02:00
Bartosz Taudul
fb11d67d8e Keep memory window header at top. 2019-07-27 13:06:23 +02:00
Bartosz Taudul
5c3095707a Filter out invalid Windows filename characters.
Do so even on unix, to allow easy transfer of user config between
different machines.
2019-07-27 01:21:11 +02:00
Bartosz Taudul
705a2fa3f4 Update manual. 2019-07-27 01:09:39 +02:00
Bartosz Taudul
9962873522 Update NEWS. 2019-07-26 23:44:20 +02:00
Bartosz Taudul
a7ef99d0b0 Keep find zone, compare headers at top. 2019-07-26 23:43:41 +02:00
Bartosz Taudul
f2cdb64aae Display trace descriptions in compare menu. 2019-07-26 23:33:49 +02:00
Bartosz Taudul
be3b458f28 Load second trace user data in compare menu. 2019-07-26 23:25:03 +02:00
Bartosz Taudul
e5d5af84fd Allow setting custom description of the trace. 2019-07-26 23:21:28 +02:00
Bartosz Taudul
c7e32a16ec Assert on invalid file names. 2019-07-26 23:15:12 +02:00
Bartosz Taudul
27965e8690 Add user data storage handler. 2019-07-26 23:15:12 +02:00
Bartosz Taudul
34cc7183d0 Trace-specific save path retrieval. 2019-07-26 23:15:12 +02:00
Bartosz Taudul
3ec1771f5a Move config directory retrieval to a separate function. 2019-07-26 22:42:50 +02:00
Bartosz Taudul
c1b70c6519 Display histogram time range on histogram. 2019-07-26 22:25:21 +02:00
Bartosz Taudul
276d764141 Fix cygwin. 2019-07-26 00:02:57 +02:00
Bartosz Taudul
36de7b2cc7 Fix incomplete headers. 2019-07-25 23:41:42 +02:00
Bartosz Taudul
e659220602 Use generic std::call_once() on other platforms. 2019-07-25 23:30:47 +02:00
Bartosz Taudul
5f96c55a3e Add background tasks notification tooltip. 2019-07-25 21:21:20 +02:00
Bartosz Taudul
9f29ddd562 Messages window should be scrollable due to thread list. 2019-07-25 20:50:30 +02:00
Bartosz Taudul
d3e8fe0133 Add messages filter clear button. 2019-07-25 20:49:44 +02:00
Bartosz Taudul
31aeadeba0 Update NEWS. 2019-07-25 20:45:16 +02:00
Bartosz Taudul
0f574b5547 Verify source file modification time against capture time. 2019-07-25 20:44:10 +02:00
Bartosz Taudul
16a40f2e1f Revert "Explicitly link with required libraries."
This reverts commit abaa0e8f6e.
2019-07-25 20:41:58 +02:00
Bartosz Taudul
c4b472b6e0 Ditto for call stack window. 2019-07-25 20:34:35 +02:00
Bartosz Taudul
269c3d4530 Keep statistics window headers always on top of the window. 2019-07-25 19:57:29 +02:00
Bartosz Taudul
2291b91ee0 Remove unnecessary separators. 2019-07-25 19:50:22 +02:00
Bartosz Taudul
d31d1f5946 Detect and report clang-cl. 2019-07-25 19:03:58 +02:00
Bartosz Taudul
55c9f41060 Vulkan queue doesn't need to be stored. 2019-07-25 18:54:32 +02:00
Bartosz Taudul
5a81fd5e6b Try updating to newest vcpkg on CI. 2019-07-25 18:46:57 +02:00
Bartosz Taudul
30f76d34a3 Fix printf warnings. 2019-07-25 18:41:52 +02:00
Bartosz Taudul
37c76edcd8 Explicitly require long long abs(). 2019-07-25 18:36:27 +02:00
Bartosz Taudul
abaa0e8f6e Explicitly link with required libraries.
This fixed clang-cl build.
2019-07-25 18:30:34 +02:00
Bartosz Taudul
1b79c35aac Don't use char8_t. 2019-07-25 12:58:16 +02:00
Bartosz Taudul
bd66c9dc5a Update NEWS. 2019-07-24 23:55:26 +02:00
Bartosz Taudul
7074b8ed8f Display notification popup during trace cleanup. 2019-07-24 23:54:47 +02:00
Bartosz Taudul
076bf1b475 Go back to build directory. 2019-07-24 23:32:41 +02:00
Bartosz Taudul
b19d633d68 Update tech docs. 2019-07-24 23:15:45 +02:00
Bartosz Taudul
a7e0b1614a Update manual. 2019-07-24 23:14:53 +02:00
Bartosz Taudul
52452038ad Update CI config. 2019-07-24 22:33:15 +02:00
Bartosz Taudul
57615775ea Migrate to VS2019, vcpkg. 2019-07-24 22:24:17 +02:00
Bartosz Taudul
e5a3d7aa25 Workaround scroll-to-message regression. 2019-07-24 21:40:39 +02:00
Bartosz Taudul
9ad9045078 Disable messages following when focusing on a message. 2019-07-24 02:21:51 +02:00
Bartosz Taudul
d908e148ec Update DXT1 AVX timings. 2019-07-22 20:01:16 +02:00
Bartosz Taudul
092e830264 Use shifts instead of const vector and. 2019-07-22 19:56:47 +02:00
Bartosz Taudul
cdbaec38eb Update tech docs. 2019-07-20 16:46:54 +02:00
Bartosz Taudul
db80673b9e Update DXT1 AVX timings. 2019-07-20 14:54:52 +02:00
Bartosz Taudul
178dc9eba7 Combine block data directly in AVX registers. 2019-07-20 14:52:34 +02:00
Bartosz Taudul
396c28011e Update DXT1 compression timings. 2019-07-19 22:16:33 +02:00
Bartosz Taudul
a6300ef7d1 Ditto on ARM. 2019-07-19 22:13:56 +02:00
Bartosz Taudul
dc49f2f76a Move DXT1 index conversion to server. 2019-07-19 21:46:58 +02:00
Bartosz Taudul
11ba77ced5 Use pthread_once() to initialize rpmalloc on linux. 2019-07-19 20:15:56 +02:00
Bartosz Taudul
4c28593031 Fix races in rpmalloc initialization.
Ensure rpmalloc_thread_initialize() int worker threads is called only after
rpmalloc_initialize() was called on the main profiler thread.
2019-07-19 19:25:27 +02:00
Bartosz Taudul
cef8124247 Replace or with addition to enable usra instruction. 2019-07-19 01:40:27 +02:00
Bartosz Taudul
fd4689a6e2 Don't perform unnecessary ands. 2019-07-19 01:19:52 +02:00
Bartosz Taudul
06296283b7 Fix texture completeness. 2019-07-19 00:53:34 +02:00
Bartosz Taudul
5da2076214 Add optional 2x zoom to frame images playback. 2019-07-19 00:51:52 +02:00
Bartosz Taudul
1c0c5f5282 Disable bilinear filtering for frame images. 2019-07-19 00:51:42 +02:00
Bartosz Taudul
dc992266fd Simplify OpenGL query checks. 2019-07-16 19:42:06 +02:00
Bartosz Taudul
9ec6c1e12d Basic technical documentation. 2019-07-15 21:00:12 +02:00
Bartosz Taudul
b99315ffbe Add some notes on how to get the most accurate results. 2019-07-13 20:49:56 +02:00
Bartosz Taudul
74a40c230f MinGW is also supported. 2019-07-13 20:49:50 +02:00
Bartosz Taudul
0ce93f714b Cosmetics. 2019-07-13 20:49:36 +02:00
Bartosz Taudul
ff9637e884 Update DXT1 timings table.
Clang is able to get much better times on ARM (around 430 us for both
ARM32 and ARM64 NEON). The reference implementation is 1.13 ms on clang.
2019-07-13 20:24:58 +02:00
Bartosz Taudul
f65373ece7 Replace two packs with one shuffle. 2019-07-13 20:01:12 +02:00
Bartosz Taudul
fc83f97ad3 Same for AVX/SSE. 2019-07-13 19:34:08 +02:00
Bartosz Taudul
62a167541c No need to mask out indices. 2019-07-13 19:07:25 +02:00
Bartosz Taudul
5633dc5a87 Add ARM64 NEON timings for DXT1 compression. 2019-07-13 15:32:07 +02:00
Alex
0c5ea710b0 Merged in z33ky/tracy/const-frame-image (pull request #37)
Constify frame-image pointer in API.
2019-07-13 13:09:21 +00:00
Bartosz Taudul
7bb9549e84 ARM64 specific NEON implementation of DXT1 compression. 2019-07-13 14:31:33 +02:00
Alexander 'z33ky' Hirsch
c6e8dc8d63 Constify frame-image pointer in API. 2019-07-13 12:33:55 +02:00
Bartosz Taudul
4c93952ffb Update manual. 2019-07-13 02:03:26 +02:00
Bartosz Taudul
eceff55f5a Add message filtering. 2019-07-13 01:48:43 +02:00
Bartosz Taudul
4944efa51f Update NEWS. 2019-07-13 01:27:26 +02:00
Bartosz Taudul
387674a40a Auto-scroll message list to bottom. 2019-07-13 01:25:37 +02:00
Bartosz Taudul
bcecd6e3a6 Always keep message list options at top. 2019-07-13 00:40:02 +02:00
Bartosz Taudul
c48ab4cb23 Use big font in trace information window. 2019-07-12 19:19:36 +02:00
Bartosz Taudul
7fb9bde9e9 Pass big font to TracyView. 2019-07-12 19:16:56 +02:00
Bartosz Taudul
fc28f827bc Rearrange trace information window. 2019-07-12 19:12:04 +02:00
Bartosz Taudul
5a4c7518ed Update manual. 2019-07-12 19:03:05 +02:00
Bartosz Taudul
290c895f83 Update NEWS. 2019-07-12 18:47:20 +02:00
Bartosz Taudul
2e774f4626 Save/load application info. 2019-07-12 18:45:35 +02:00
Bartosz Taudul
8c9d46ef29 Display application info in info window. 2019-07-12 18:39:07 +02:00
Bartosz Taudul
d64ab7db5a Store app info messages. 2019-07-12 18:34:46 +02:00
Bartosz Taudul
60d2384a6a Allow sending application information messages. 2019-07-12 18:34:46 +02:00
Bartosz Taudul
cd018e88a4 Update manual. 2019-07-11 20:32:39 +02:00
Bartosz Taudul
689f4999e3 Reorder threads by drag and drop. 2019-07-11 20:29:20 +02:00
Bartosz Taudul
29d8911c6b Fix Vector::erase(). 2019-07-11 20:29:20 +02:00
Bartosz Taudul
a1ce5fc1f6 Add include for built-in __get_cpuid() on gcc/clang. 2019-07-10 02:09:19 +02:00
Bartosz Taudul
90369335cf Update NEWS. 2019-07-10 02:05:27 +02:00
Bartosz Taudul
c164a70b9d Check for rdstcp/invariant tsc support. 2019-07-10 02:04:14 +02:00
Bartosz Taudul
c0670848d2 Reuse variable. 2019-07-08 02:08:06 +02:00
Bartosz Taudul
05dd9a5e59 Update DXT1 timings. 2019-07-08 00:16:06 +02:00
Bartosz Taudul
17dbbe67de Remove dependency on range subtraction. 2019-07-08 00:14:36 +02:00
Bartosz Taudul
a33205e3bd Update DXT1 timings. 2019-07-08 00:01:57 +02:00
Bartosz Taudul
af1bd3e1fa Faster horizontal add. 2019-07-07 23:57:23 +02:00
Bartosz Taudul
bde9045af5 Update DXT1 timings.
SSE takes a hit due to unfavourable codegen.
2019-07-06 00:51:19 +02:00
Bartosz Taudul
b32e8fa24e Ditto for NEON. 2019-07-06 00:18:53 +02:00
Bartosz Taudul
d236d4b70f Ditto for AVX2. 2019-07-06 00:05:32 +02:00
Bartosz Taudul
f62b21c21d Masking alpha out is not needed.
We assume that alpha value is constant for the whole image. The range
calculation is max - min, so alpha zeroes out. The color normalization
to range is color - min, so alpha also zeroes out here.
2019-07-05 23:58:19 +02:00
Bartosz Taudul
e9676ea1d5 Update DXT1 timings. 2019-07-05 18:38:52 +02:00
Bartosz Taudul
03189a30b8 Two ands less in NEON DXT1 compression. 2019-07-05 18:37:25 +02:00
Bartosz Taudul
275d992cb1 Two ands less in AVX2 DXT1 compression. 2019-07-05 18:22:42 +02:00
Bartosz Taudul
c89358d6b9 Two ands less in SSE DXT1 compression. 2019-07-05 18:17:50 +02:00
Bartosz Taudul
5bfc62f1bf iOS device name decoding. 2019-06-19 09:59:46 +02:00
Bartosz Taudul
59b4f84ce5 Display unknown implementer, part as hex values. 2019-07-03 21:18:17 +02:00
Bartosz Taudul
c6f6c368b2 Decode ARM CPU names. 2019-07-03 21:01:34 +02:00
Bartosz Taudul
e26ab8e9f6 Make forwarding functions more compact. 2019-07-03 18:05:38 +02:00
Bartosz Taudul
94b470ba66 Chomp newline from end of thread string. 2019-07-03 16:12:37 +02:00
Bartosz Taudul
d664b93ae0 Describe why there's no CPU usage graph in android traces. 2019-07-03 00:08:30 +02:00
Bartosz Taudul
f80a0a87bd Remove const qualifier from TracyCZoneCtx.
Some containers don't support storing const types. This struct, as
visible to user, is immutable, so treat it as if const was declared
here.
2019-07-01 19:16:15 +02:00
Bartosz Taudul
080ec6e836 Expand manual wrt manual zone scope management. 2019-07-01 18:29:24 +02:00
Bartosz Taudul
bdfb568742 Fix div tables for max range on all channels. 2019-07-01 12:31:06 +02:00
Bartosz Taudul
684a119a2c Fix order of checks for including intrinsics. 2019-07-01 11:45:16 +02:00
Bartosz Taudul
6b06b64caf Smaller histogram controls. 2019-06-30 18:11:19 +02:00
Bartosz Taudul
3c45476012 Update timings again. 2019-06-30 12:16:22 +02:00
Bartosz Taudul
983c48994b Write block data directly to memory. 2019-06-30 11:44:32 +02:00
Bartosz Taudul
9b8c18f99e Improve readability. 2019-06-30 11:44:00 +02:00
Bartosz Taudul
43042a2aa8 Update DXT1 timings table. 2019-06-30 03:39:37 +02:00
Bartosz Taudul
52b6bdb55a Force inline ProcessRGB functions. 2019-06-30 03:33:14 +02:00
Bartosz Taudul
ddd89dcce5 Add DXT1 AVX2 timings. 2019-06-30 03:23:20 +02:00
Bartosz Taudul
8c06f7288c AVX2 DXT1 compression. 2019-06-30 03:20:58 +02:00
Bartosz Taudul
a1e3d9765f Update DXT1 SSE timings. 2019-06-29 12:23:29 +02:00
Bartosz Taudul
2e893bba91 Use division tables. 2019-06-29 12:16:49 +02:00
Bartosz Taudul
b73f428739 Add DXT1 div table generator. 2019-06-29 11:49:52 +02:00
Bartosz Taudul
370fead4b2 Update DXT1 timings table. 2019-06-29 02:10:35 +02:00
Bartosz Taudul
ab9f036f5e Integrate CheckSolid into ProcessRGB. 2019-06-29 02:04:08 +02:00
Bartosz Taudul
50ac219e97 Update NEON timings in DXT1 table. 2019-06-28 22:40:04 +02:00
Bartosz Taudul
faf6bb97a4 DXT1 NEON color index packing. 2019-06-28 22:36:44 +02:00
Bartosz Taudul
4ee45259f2 Update SSE timings in DXT1 table. 2019-06-28 22:00:59 +02:00
Bartosz Taudul
2df1eaaa7e Pack color indices using SSE. 2019-06-28 21:58:10 +02:00
Bartosz Taudul
d593e5dfa9 DXT1 SIMD color index table generator. 2019-06-28 21:57:38 +02:00
Bartosz Taudul
3208d6c803 Add ARM NEON DXT1 compression timings to manual. 2019-06-28 14:26:00 +02:00
Bartosz Taudul
fcb5b4b888 NEON DXT1 compression. 2019-06-28 14:24:16 +02:00
Bartosz Taudul
e8d4ba492b Unify shifts. 2019-06-28 13:05:32 +02:00
Bartosz Taudul
be4900c822 NEON CheckSolid. 2019-06-28 01:47:04 +02:00
Bartosz Taudul
33486fa3cf Update ARM timings. 2019-06-27 22:47:26 +02:00
Bartosz Taudul
3c066f1527 Simplify code. 2019-06-27 22:40:03 +02:00
Bartosz Taudul
77c6acbc48 Update manual. 2019-06-27 22:30:05 +02:00
Bartosz Taudul
72a0d4c2ab Rest of SSE DXTC compression. 2019-06-27 22:29:44 +02:00
Bartosz Taudul
137b28e110 SSE CheckSolid. 2019-06-27 22:29:44 +02:00
Bartosz Taudul
3d590b6b8c Initialize rpmalloc in compression thread. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
10bcc8c770 Switch to DXT1 textures in profiler utility. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
1939c31165 Experimental DXT1 compressor. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
79eb1b9029 Swap queue and dequeue only if queue has contents. 2019-06-27 13:37:09 +02:00
Bartosz Taudul
aa4ce30dff Update manual. 2019-06-27 13:32:57 +02:00
Bartosz Taudul
7dc7ece2bd Add staging area for frame images.
Compressing frame images on a separate thread may cause frame image
arrival before frames are sent. Fix this issue by creating a staging
area in which frame images will wait for frames to arrive.

This probably breaks playback functionality, as non-existent frames may
be queried, but this problem seems to be very hard to find, so let's
ignore it for now.
2019-06-27 13:24:35 +02:00
Bartosz Taudul
bb35f9a897 Compress frame images in a separate thread. 2019-06-27 13:24:35 +02:00
Bartosz Taudul
7ebd2162c6 Add ETC1 compression thread. 2019-06-26 22:57:24 +02:00
Bartosz Taudul
f565e11976 Store frame images in queue. 2019-06-26 22:52:24 +02:00
Bartosz Taudul
fc106079c5 Remove CPU migration highlight for zones. 2019-06-26 21:35:09 +02:00
Bartosz Taudul
3bf23e15bb Update manual. 2019-06-26 21:07:12 +02:00
Bartosz Taudul
e3aa0a5c88 Update NEWS. 2019-06-26 21:03:34 +02:00
Bartosz Taudul
bc3c375b02 Display crash icon in notification area. 2019-06-26 21:02:04 +02:00
Bartosz Taudul
b8794f64be Extract crash tooltip to a separate function. 2019-06-26 21:01:54 +02:00
Bartosz Taudul
281dcf7c1f Cast to proper types. 2019-06-26 19:33:37 +02:00
Bartosz Taudul
8ce41b3543 Proper init order of thread local thread handle. 2019-06-26 19:32:52 +02:00
Bartosz Taudul
64980a1e6f Use async resolv service. 2019-06-26 18:49:21 +02:00
Bartosz Taudul
5e97e83401 Address can't change. 2019-06-26 18:46:51 +02:00
Bartosz Taudul
913c1e57a6 Add threaded resolv service. 2019-06-26 18:46:51 +02:00
Bartosz Taudul
a8cb257474 Revert "Resolve client host name using DNS."
This reverts commit 48df667a37.
2019-06-26 17:58:23 +02:00
Bartosz Taudul
0aa0b4ac8a Try lower query counts in out-of-memory situations. 2019-06-26 16:43:56 +02:00
Bartosz Taudul
659631fc06 Make vulkan query count variable. 2019-06-26 16:42:51 +02:00
Bartosz Taudul
bc7f2c49c8 GetThreadHandle() might be used by application's code. 2019-06-25 15:44:49 +02:00
Bartosz Taudul
0b656c3469 Update manual. 2019-06-24 21:15:33 +02:00
Bartosz Taudul
9ca254307a Add callstack versions of C API macros. 2019-06-24 21:10:41 +02:00
Bartosz Taudul
c749a2e3fe Add C API for plots and messages. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
48e08acb62 Add C API for frame markup. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
ee99ce833c Implement memory allocation tracking for C API. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
281477f7f9 Tokens must be retrieved for each enqueue. 2019-06-24 20:12:14 +02:00
Bartosz Taudul
06a41708a7 Move TLS accesses close together. 2019-06-24 19:38:44 +02:00
Bartosz Taudul
c4f0965851 Don't use cached thread id to retrieve main thread id. 2019-06-24 19:38:07 +02:00
Bartosz Taudul
a56c47a6a0 Store thread handle in a thread local variable.
This saves us a non-inlineable function call. Thread local block is
accessed anyway, since we need to get the token, so we already have the
pointer and don't need to get it a second time (which is done inside
Windows' GetCurrentThreadId()). We also don't need to store the thread
id in ScopedZone anymore, as it was a micro-optimization to save us the
second GetThreadHandle() call.

This change has a measurable effect of reducing enqueue time from ~10 to
~8 ns.

A further optimization would be to completely skip thread handle
retrieval during zone capture and do it instead on retrieval of data
from the queue. Since each thread has its own producer ("token"), the
thread handle should be accessible during the dequeue operation. This is
a much more invasive change, that would require a) modification of the
queue, b) additional processing of dequeued data to inject the thread
handle.
2019-06-24 19:19:47 +02:00
Bartosz Taudul
46b75c5a19 Only enable tracy-internal GetThreadHandle if tracy is enabled. 2019-06-24 19:18:52 +02:00
Bartosz Taudul
79bfac9ca9 Use proper popcnt for gcc/clang (including cygwin). 2019-06-24 18:56:04 +02:00
Bartosz Taudul
9375afdbed All variables must be defined before goto. 2019-06-23 00:36:25 +02:00
Bartosz Taudul
6bdfedead2 Update nfd to ceb75f7abf3. 2019-06-23 00:35:19 +02:00
Bartosz Taudul
815ad7df28 Update manual. 2019-06-23 00:21:56 +02:00
Bartosz Taudul
a8dcd5d153 Ctrl-click on frame in frame overview to show playback window. 2019-06-23 00:11:46 +02:00
Bartosz Taudul
f125254d14 Cosmetics. 2019-06-23 00:00:16 +02:00
Bartosz Taudul
2f707bd152 Improve frame label drawing logic. 2019-06-22 23:49:30 +02:00
Bartosz Taudul
7217a99dc2 Always show at least one pixel of a frame in frame overview. 2019-06-22 22:48:32 +02:00
Bartosz Taudul
c48cd10f35 Don't divide by zero in zero-length zones. 2019-06-22 22:42:57 +02:00
Bartosz Taudul
1d4117f515 Fix typo. 2019-06-22 14:55:01 +02:00
Bartosz Taudul
ad26eaa9f1 Don't put "select/unselect all" buttons in a separate line. 2019-06-22 14:43:58 +02:00
Bartosz Taudul
0944eab707 Add background tasks icon. 2019-06-22 14:37:17 +02:00
Bartosz Taudul
4d4190c825 Update NEWS. 2019-06-22 14:25:35 +02:00
Bartosz Taudul
e33690c5f3 Allow switching whitespace visibility in source code view. 2019-06-22 14:24:39 +02:00
Bartosz Taudul
53fe688bff Update ImGuiColorTextEdit to 0a88824f7de8d. 2019-06-22 14:19:10 +02:00
Bartosz Taudul
18cef20db9 Silence signed/unsigned comparison warnings. 2019-06-22 14:15:25 +02:00
Bartosz Taudul
8f7be5a0fa Allow only 2^32-1 frame images. 2019-06-22 14:11:45 +02:00
Bartosz Taudul
fadf8e3e0a Can't read negative number of bytes.
This completely ignores error handling, which probably should be added.
The code behavior doesn't change, as the existing comparisons and
asserts already promoted the signed value to unsigned.
2019-06-22 14:08:48 +02:00
Bartosz Taudul
1c41229766 Use proper type for buffer size comparison. 2019-06-22 14:07:53 +02:00
Bartosz Taudul
70a7033a64 Use proper type for iteration. 2019-06-22 14:07:26 +02:00
Bartosz Taudul
1ea647a1dd Use proper type for srcloc highlight decay value. 2019-06-22 14:06:25 +02:00
Bartosz Taudul
aaefd6e1d6 Simplify code. 2019-06-22 14:06:10 +02:00
Bartosz Taudul
6a82f666a7 Cosmetics. 2019-06-22 14:05:18 +02:00
Bartosz Taudul
54ae4c84ba Silence warning about unused variable. 2019-06-22 14:04:48 +02:00
Bartosz Taudul
de953bfaa8 Use proper data type for callstack storage in GPU zones. 2019-06-22 14:04:27 +02:00
Bartosz Taudul
323f0e1ae3 Don't create variable for exception in catch block. 2019-06-22 13:41:24 +02:00
Bartosz Taudul
eb4c7ca9ea Ignore useless warnings. 2019-06-22 13:40:00 +02:00
Bartosz Taudul
a3ce08a9f9 Display zone time as percentage of average zone time. 2019-06-22 13:22:13 +02:00
Bartosz Taudul
5fde56d96a Remove hidden zone time without profiling tooltip. 2019-06-22 13:10:46 +02:00
Bartosz Taudul
850815534e Insert frame mark at beginning of on-demand connection. 2019-06-21 19:39:41 +02:00
Bartosz Taudul
fd9fc880a6 Send current time in on-demand welcome message. 2019-06-21 19:39:41 +02:00
Bartosz Taudul
48df667a37 Resolve client host name using DNS.
This operation is blocking and should be made asynchronous.
2019-06-21 19:27:41 +02:00
Bartosz Taudul
659ef87974 Animate highlighted messages on the timeline. 2019-06-21 14:25:51 +02:00
Bartosz Taudul
bb44e80e5a Use smaller UI elements in selected places. 2019-06-21 14:15:46 +02:00
Bartosz Taudul
8259816de3 Improve playback interruptions on user input. 2019-06-21 13:08:41 +02:00
Bartosz Taudul
a916c28269 Build test application on appveyor. 2019-06-19 22:17:11 +02:00
Bartosz Taudul
ae4f9663aa Selecting frames stops playback. 2019-06-19 20:05:23 +02:00
Bartosz Taudul
51135c1d20 Pulse hover-info line on histograms. 2019-06-19 20:01:41 +02:00
Bartosz Taudul
d44c4b00fb Implement outliers cutoff in compare menu. 2019-06-18 22:27:25 +02:00
Bartosz Taudul
d66be0e033 Update manual. 2019-06-18 21:02:49 +02:00
Bartosz Taudul
3fcd73680c Simulate client activity time advancement. 2019-06-18 20:56:42 +02:00
Bartosz Taudul
800d95c089 Display discovered clients activity times. 2019-06-18 20:51:12 +02:00
Bartosz Taudul
5309e6d94a Broadcast client activity time. 2019-06-18 20:46:12 +02:00
Bartosz Taudul
1a32edebf2 Extract text printing functions. 2019-06-18 20:43:28 +02:00
Bartosz Taudul
aa5259b20a Use the same port (8086) for both TCP and UDP traffic. 2019-06-18 20:28:03 +02:00
Bartosz Taudul
0e5a7263d9 Define broadcast message, add versioning. 2019-06-18 20:26:40 +02:00
Bartosz Taudul
0b394c3f53 Don't need to keep last broadcast time in Profiler class. 2019-06-18 20:15:09 +02:00
Bartosz Taudul
99e638b3fc Normalize values during compare by default. 2019-06-18 19:41:20 +02:00
Bartosz Taudul
5e6bc30bab Support GL_EXT_disjoint_timer_query with EXT postfix. 2019-06-18 16:34:27 +02:00
Bartosz Taudul
2d3e7ee796 More aggressive broadcast repeat timeout. 2019-06-18 00:54:58 +02:00
Bartosz Taudul
53863fe0e7 Set sane initial window sizes. 2019-06-17 23:49:10 +02:00
Bartosz Taudul
ae70f694dd Update manual. 2019-06-17 20:25:25 +02:00
Bartosz Taudul
b8b1fae900 Don't confuse user by suggesting the list is complete. 2019-06-17 20:24:47 +02:00
Bartosz Taudul
dd4c61e964 Update NEWS. 2019-06-17 20:04:14 +02:00
Bartosz Taudul
11dc8e67e5 Change broadcast rate from 5s to 3s. 2019-06-17 19:57:17 +02:00
Bartosz Taudul
6bf8081f5b Remove debug leftovers. 2019-06-17 19:52:44 +02:00
Bartosz Taudul
12e44fc605 Missing include. 2019-06-17 19:51:58 +02:00
Bartosz Taudul
5a359aa376 Allow connecting to broadcasting clients. 2019-06-17 19:50:34 +02:00
Bartosz Taudul
67daff1452 Display list of broadcasting clients. 2019-06-17 19:45:47 +02:00
Bartosz Taudul
36989da2c6 Also store client address. 2019-06-17 19:45:36 +02:00
Bartosz Taudul
265913d969 Process client broadcasts. 2019-06-17 19:34:48 +02:00
Bartosz Taudul
e0bbb41976 Add UDP listen socket and IP address wrapper. 2019-06-17 19:23:43 +02:00
Bartosz Taudul
de058d2a0d Don't hardcode broadcast port. 2019-06-17 18:37:34 +02:00
Bartosz Taudul
1b3b3a94a2 Broadcast protocol version and process name. 2019-06-17 18:34:35 +02:00
Bartosz Taudul
0b9ef7e514 Disable broadcast if TRACY_NO_BROADCAST is defined. 2019-06-17 18:18:58 +02:00
Bartosz Taudul
e609c0fdce UDP broadcast loop. 2019-06-17 02:25:09 +02:00
Bartosz Taudul
40e517594b Add UDP broadcast socket. 2019-06-17 02:24:55 +02:00
Bartosz Taudul
5db6cc4eee Update manual. 2019-06-17 01:24:48 +02:00
Bartosz Taudul
60f0b81faf More compact welcome dialog. 2019-06-17 01:21:55 +02:00
Bartosz Taudul
38ebc2e989 Add icon to "go to frame" button. 2019-06-17 01:13:32 +02:00
Bartosz Taudul
eed849c589 Add reset button to min bin value fields. 2019-06-17 01:12:24 +02:00
Bartosz Taudul
add5c0fb87 Perform proper division. 2019-06-17 01:09:25 +02:00
Bartosz Taudul
b2bbd95430 Changing log time requires bin cache reset. 2019-06-17 01:05:46 +02:00
Bartosz Taudul
e30cf7eafa Update NEWS. 2019-06-17 01:02:52 +02:00
Bartosz Taudul
f27cead040 Add hovered frame markers on histogram. 2019-06-17 01:01:56 +02:00
Bartosz Taudul
099933e66d Add outlier removal to frame time histogram. 2019-06-17 00:44:34 +02:00
Bartosz Taudul
b1f49d4c69 Update manual. 2019-06-16 17:22:29 +02:00
Bartosz Taudul
507e4db14b Update NEWS. 2019-06-16 17:15:47 +02:00
Bartosz Taudul
efe65e2e64 Display currently hovered zone on histogram. 2019-06-16 17:14:47 +02:00
Bartosz Taudul
6a4f7ce1ca Track currently hovered zone. 2019-06-16 17:05:56 +02:00
Bartosz Taudul
6e8b5381a5 Ctrl-click on a zone to go straight to zone statistics. 2019-06-16 17:00:25 +02:00
Bartosz Taudul
d361261993 Open playback from frame using ctrl+left click. 2019-06-16 16:49:21 +02:00
Bartosz Taudul
d683699ba9 Don't recalculate histogram bins every frame.
This remedies slowdown that was only visible when a histogram of a large
number of zones (~100 million) was displayed. The slowdown was caused by
std::accumulate() calls over whole set of zones.
2019-06-16 16:41:52 +02:00
Bartosz Taudul
14398dd4e8 Move bin setup closer to bin usage. 2019-06-16 16:29:18 +02:00
Bartosz Taudul
761405e2a7 Clip histogram highlight to graph area. 2019-06-16 16:23:24 +02:00
Bartosz Taudul
26178dfb00 Update manual. 2019-06-16 02:23:11 +02:00
Bartosz Taudul
df163f627b Update NEWS. 2019-06-16 01:59:30 +02:00
Bartosz Taudul
89f798158f Implement outlier cutoff on histogram. 2019-06-16 01:58:44 +02:00
Bartosz Taudul
8009c6412e Add "minimum values in bin" parameter to histogram. 2019-06-16 01:58:44 +02:00
Bartosz Taudul
4186a71ee7 Cache sorted begin and end iterators. 2019-06-16 01:28:36 +02:00
Bartosz Taudul
26f223e4cd Don't shrink histogram bin buffers. 2019-06-16 00:25:22 +02:00
Bartosz Taudul
f7a590de98 Improve ETC1 timing table. 2019-06-15 21:08:35 +02:00
Bartosz Taudul
103be314e7 Update NEON ETC1 compression timings. 2019-06-15 15:38:05 +02:00
Bartosz Taudul
014c3ed63b Use non-reference, optimized NEON ETC1 compression. 2019-06-15 15:35:57 +02:00
Bartosz Taudul
31a4a45b14 Ignore memory free faults if running on apple.
There's a case in MoltenVK initialization where overloading operator new
and operator delete works for std::string destruction, but not
construction.
2019-06-13 14:15:17 +02:00
Bartosz Taudul
ab4e99229d Indicate whether client is running on apple shitware. 2019-06-13 14:05:15 +02:00
Bartosz Taudul
e05669a80f Add ETC1 compression timings for ARM. 2019-06-13 02:12:03 +02:00
Bartosz Taudul
e5d5abf59a Add NEON path for ETC1 compression. 2019-06-13 02:04:19 +02:00
Bartosz Taudul
5c1bae812a Add frame images to test application. 2019-06-13 01:53:47 +02:00
Bartosz Taudul
2eb67b4684 Add test image. 2019-06-13 01:48:13 +02:00
Bartosz Taudul
516bdcec9b Rewrite playback logic. 2019-06-13 00:12:06 +02:00
Bartosz Taudul
c43f8562ec Rename "sync view" to "sync timeline". 2019-06-12 23:46:14 +02:00
Bartosz Taudul
756379b9d8 Update manual. 2019-06-12 23:45:27 +02:00
Bartosz Taudul
bdfd2c07be Right-click on a frame to set frame in playback. 2019-06-12 23:14:19 +02:00
Bartosz Taudul
796ca57067 Update imgui to 1.71. 2019-06-12 22:53:23 +02:00
Bartosz Taudul
d3e0163dd4 Add byteswap for apple. 2019-06-12 16:54:44 +02:00
Bartosz Taudul
8827f568e4 Update manual. 2019-06-12 15:35:00 +02:00
Bartosz Taudul
afa967afb0 Flip frame image if need be. 2019-06-12 15:30:08 +02:00
Bartosz Taudul
37d1457b44 Frame image may need flipping. 2019-06-12 15:28:32 +02:00
Bartosz Taudul
29fd4b1fe9 Don't animate frame changes during playback. 2019-06-12 13:25:45 +02:00
Bartosz Taudul
61bad76e5a Update manual. 2019-06-12 01:48:11 +02:00
Bartosz Taudul
a936f22a91 Add frame images playback window. 2019-06-12 01:48:11 +02:00
Bartosz Taudul
eb6ac5e6e1 Store frame reference in frame images. 2019-06-12 00:55:02 +02:00
Bartosz Taudul
38b76ea32d Add frame images vector accessor. 2019-06-12 00:14:44 +02:00
Bartosz Taudul
04dd33f5c4 Fix mismatched linkage. 2019-06-11 23:51:12 +02:00
Rokas K. (rku)
c4e05b6264 Merged in rokups/tracy/dllimport-cleanup (pull request #36)
Clean up imported functions in multi-dll projects.

Approved-by: Till Rathmann <till.rathmann@gmx.de>
2019-06-11 15:04:34 +00:00
Bartosz Taudul
de544ef959 Update manual. 2019-06-11 02:25:03 +02:00
Bartosz Taudul
84a52c5d62 Add join discord button. 2019-06-11 02:12:34 +02:00
Bartosz Taudul
57b8b425ba Discard send buffer data after disconnect. 2019-06-10 02:11:29 +02:00
Bartosz Taudul
2cf50427be Add FastVector to natvis. 2019-06-10 01:50:26 +02:00
Bartosz Taudul
2bc7a9bd30 Close listen socket in destructor. 2019-06-09 18:14:26 +02:00
Bartosz Taudul
5f8eadfb16 Release zone id stack. 2019-06-09 17:56:41 +02:00
Bartosz Taudul
a3173965d6 Same for Vis() reference. 2019-06-09 17:51:37 +02:00
Bartosz Taudul
2aa6f70765 Drawing locks may invalidate Vis() iterator. 2019-06-09 17:46:59 +02:00
Bartosz Taudul
80dff1ede1 Add connection id for on-demand mode.
Long-lived zones could send their end events without begin events in a
following scenario:

1. On-demand connection is made.
2. Zone begin is emitted, m_active is set to true.
3. Connection is terminated.
4. A new connection is made.
5. Zone end is emitted, because m_active is true.

To this point it was assumed that all zone end events will happen before
a new connection is made, but it's not necessarily true.
2019-06-09 17:15:47 +02:00
Bartosz Taudul
0db9c73d76 Immediately react to connection termination. 2019-06-09 16:51:39 +02:00
Bartosz Taudul
cc5bad294a More strict memory ordering for on-demand connection status. 2019-06-09 16:48:00 +02:00
Bartosz Taudul
e2d42fae2f We're done here, don't try to send termination request. 2019-06-09 16:25:52 +02:00
Bartosz Taudul
496f866add Don't send data when connection is terminated.
There are only two cases for which HandleServerQuery() returns false.
Either data can't be read from the socket (which is checked by HasData()
call before calling HandleServerQuery()), or if the server sent
termination query. In both these cases there's no need to send data
anymore.
2019-06-09 16:19:40 +02:00
Bartosz Taudul
23e7850162 Make DequeueStatus enum class. 2019-06-09 16:14:30 +02:00
Bartosz Taudul
34d89d39a1 Prevent double freeing of socket. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
b1f8d9fba1 Send server termination query on server disconnect. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
2c780f1af4 Allow sending immediate termination query from server. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
139299389b Add comments to client connection handling. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
d6d7b82529 Ignore invalid frame images in on-demand mode. 2019-06-09 15:37:49 +02:00
Bartosz Taudul
4c2ff80ac8 Restore frame counting for on-demand mode. 2019-06-09 15:23:01 +02:00
Bartosz Taudul
50cda7720f Handle frame image instrumentation failures. 2019-06-09 13:44:53 +02:00
Bartosz Taudul
5d5b12dce4 Add a note about expected lifetime of image data. 2019-06-09 13:20:46 +02:00
Bartosz Taudul
99c8144154 Show performance difference of async capture. 2019-06-09 13:17:08 +02:00
Bartosz Taudul
22d7b2c78d Polishing words. 2019-06-09 12:50:14 +02:00
Bartosz Taudul
bef1988800 Compress frame images using LZ4. 2019-06-08 12:17:18 +02:00
Bartosz Taudul
c3c116317d Fences must be deleted. 2019-06-08 12:08:20 +02:00
Bartosz Taudul
00a468162d Fix signed/unsigned comparison. 2019-06-08 00:57:25 +02:00
Bartosz Taudul
5470dae120 Add AVX2 ETC1 timings to the manual. 2019-06-08 00:54:46 +02:00
Bartosz Taudul
9ef128995a Add AVX2 version of etcpak. 2019-06-08 00:50:39 +02:00
Bartosz Taudul
7e9539ef2d AVX implies SSE 4.1. 2019-06-08 00:39:19 +02:00
Bartosz Taudul
76379a761a Update manual. 2019-06-08 00:06:37 +02:00
Bartosz Taudul
1954caa806 Fix listings. 2019-06-08 00:04:10 +02:00
Bartosz Taudul
2c01f244a8 Update NEWS. 2019-06-07 20:14:21 +02:00
Bartosz Taudul
fc5a8f7e3a Assign frame image to the correct frame (including offset). 2019-06-07 20:13:08 +02:00
Bartosz Taudul
784c4da53a Include frame offset in frame image message. 2019-06-07 20:09:29 +02:00
Rokas Kupstys
9bd1037347 Clean up imported functions in multi-dll projects. 2019-06-07 19:50:08 +03:00
Bartosz Taudul
8c912890f0 Proper case in includes. 2019-06-07 01:35:35 +02:00
Bartosz Taudul
d271634a95 Keep one ETC1 compression buffer. 2019-06-07 01:29:24 +02:00
Bartosz Taudul
34a6fe7055 _bswap may be already defined. 2019-06-07 01:07:51 +02:00
Bartosz Taudul
ff5170d0e9 Silence warnings. 2019-06-07 01:03:28 +02:00
Bartosz Taudul
42a30bffe1 Frame images are now ETC1 compressed. 2019-06-07 00:31:51 +02:00
Bartosz Taudul
a654b642ef Compress frame images to ETC1 before sending. 2019-06-07 00:31:51 +02:00
Bartosz Taudul
aff3246f82 Add ETC1 compressor. 2019-06-07 00:31:51 +02:00
Bartosz Taudul
646e7327b8 Show loading progress of frame images. 2019-06-06 23:40:37 +02:00
Bartosz Taudul
f8a4909c96 Display number of frame images in a trace. 2019-06-06 23:21:36 +02:00
Bartosz Taudul
9cd95db4e3 Delay creation of frame image texture. 2019-06-06 23:14:49 +02:00
Bartosz Taudul
129155946b Actually set current texture pointer. 2019-06-06 23:10:01 +02:00
Bartosz Taudul
6b2741ccdb Save/load frame images. 2019-06-06 23:08:19 +02:00
Bartosz Taudul
6ae4afa0f4 Display frame images also on frame time graph. 2019-06-06 22:43:39 +02:00
Bartosz Taudul
08fb2b7337 Tooltip cosmetics. 2019-06-06 22:32:20 +02:00
Bartosz Taudul
c46576a68c Flip UV. 2019-06-06 22:22:57 +02:00
Bartosz Taudul
cd2f572a2f Use proper index. 2019-06-06 22:22:57 +02:00
Bartosz Taudul
beea31edd0 Show frame images in frame tooltips. 2019-06-06 22:22:57 +02:00
Bartosz Taudul
82d4fe7236 Add texture wrapper. 2019-06-06 22:14:51 +02:00
Bartosz Taudul
af56f41e32 Add frame image accessor. 2019-06-06 22:14:51 +02:00
Bartosz Taudul
34b84bb284 Add frame image index to frame data. 2019-06-06 21:44:48 +02:00
Bartosz Taudul
e5bb6011c5 Frame image transfer prototype. 2019-06-06 21:39:54 +02:00
Bartosz Taudul
a37348c5c7 Increase contrast. 2019-06-06 01:45:41 +02:00
Bartosz Taudul
2b917c6adf Draw index area labels with contrast. 2019-06-06 01:40:23 +02:00
Bartosz Taudul
45039fc417 Don't format colored text where not necessary. 2019-06-03 01:36:03 +02:00
Bartosz Taudul
4f5286a860 Add unformatted colored text extension function. 2019-06-03 01:35:53 +02:00
Bartosz Taudul
ff6768986e Move imgui extension function to an appropriate place. 2019-06-03 01:35:32 +02:00
Bartosz Taudul
c433e76c7a Use TextUnformatted in TextCentered. 2019-06-03 01:28:45 +02:00
Bartosz Taudul
42fefde161 Protect against plot range equal zero. 2019-06-03 01:19:01 +02:00
Bartosz Taudul
76ed1e666f Remove unused variable. 2019-06-02 20:01:19 +02:00
Bartosz Taudul
d5b3ec9503 Locale keeps being changed by system libraries... 2019-06-02 19:59:31 +02:00
Bartosz Taudul
83b838f783 Use C locale for decimal point character. 2019-06-02 19:39:07 +02:00
Bartosz Taudul
557d4d7de4 Add logo to manual. 2019-06-02 18:12:15 +02:00
Bartosz Taudul
7c7e32d49e Set window icon. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
9e3cf3d387 Add 64x64 embedded png icon. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
c0326b9ba0 Add stb_image. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
8ae33fbb1e Add icon to win32 profiler executable. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
55c1e263b7 Add icon files. 2019-06-02 18:05:49 +02:00
Bartosz Taudul
5181f1be5b Update manual. 2019-06-02 15:57:59 +02:00
Bartosz Taudul
07cb6b0e93 Update NEWS. 2019-06-02 15:40:56 +02:00
Bartosz Taudul
79215ea73e Implement linked selection in compare menu. 2019-06-02 15:40:19 +02:00
Bartosz Taudul
794d155fde Update manual. 2019-06-02 15:22:29 +02:00
Bartosz Taudul
ceaf1226b2 Update NEWS. 2019-06-02 15:06:57 +02:00
Bartosz Taudul
c05766e637 Display notification about worker background tasks. 2019-06-02 15:00:50 +02:00
Bartosz Taudul
5681096486 Track status of worker background tasks. 2019-06-02 15:00:38 +02:00
Bartosz Taudul
96b1df67b9 Get proper yMin, yMax values. 2019-06-02 13:58:30 +02:00
Bartosz Taudul
9bbaab8897 Draw on a correct window. 2019-06-02 13:40:35 +02:00
Bartosz Taudul
3a561b4eed Save thread state should be atomic. 2019-06-02 13:15:55 +02:00
Bartosz Taudul
0059cb3ab0 Switch default namespace display to "short". 2019-06-02 12:57:42 +02:00
Bartosz Taudul
7aca6b72d1 Don't block worker when in save file dialog. 2019-05-28 19:57:18 +02:00
Bartosz Taudul
c93170cd42 Move saving trace dump to a separate thread. 2019-05-28 19:56:18 +02:00
Bartosz Taudul
845f3a2ddf Use std::shared_mutex for locking worker access. 2019-05-28 19:21:53 +02:00
Bartosz Taudul
145ca30df9 There's no __popcnt64 in 32 bit winapi. 2019-05-28 18:18:54 +02:00
Bartosz Taudul
b3812146cb Fix atomics initialization. 2019-05-27 14:09:55 +02:00
Bartosz Taudul
18a9741b5d Use proper check. 2019-05-22 14:19:25 +02:00
Bartosz Taudul
340837e202 Callstack decode for android api <= 21.
libbacktrace/elf.cpp:3249:3: error: use of undeclared identifier 'dl_iterate_phdr'
2019-05-22 14:14:30 +02:00
Bartosz Taudul
84efe070fe Make callstack logic more obvious. 2019-05-22 14:05:44 +02:00
Bartosz Taudul
61d775ecc8 Calculate end point before loop. 2019-05-19 16:26:59 +02:00
Bartosz Taudul
8f85a0da2c Don't check twice for the same thing. 2019-05-19 16:23:19 +02:00
Bartosz Taudul
007e434a05 Force inline FillPages(). 2019-05-19 13:46:53 +02:00
Bartosz Taudul
9122d3516c Force inline GetPage(). 2019-05-19 13:45:02 +02:00
Bartosz Taudul
30c398cd96 Don't allocate memory for empty pages in memory map. 2019-05-19 13:15:54 +02:00
Bartosz Taudul
952e466287 Rearrange code. 2019-05-19 12:47:45 +02:00
Bartosz Taudul
860a4625e1 Update manual. 2019-05-12 16:28:59 +02:00
Bartosz Taudul
7cdd8954dc Update NEWS. 2019-05-12 16:27:26 +02:00
Bartosz Taudul
b95d834891 Split contended and uncontended locks in lock list. 2019-05-12 16:26:19 +02:00
Bartosz Taudul
0da1e8551f Track lock contention status. 2019-05-12 16:17:17 +02:00
Bartosz Taudul
a714cd4369 Typo. 2019-05-12 15:59:53 +02:00
Bartosz Taudul
63066cf6a5 Fix logic error. 2019-05-12 15:48:25 +02:00
Bartosz Taudul
e612cef6c2 Optimize drawing frames. 2019-05-11 13:47:06 +02:00
Bartosz Taudul
fcb052cd13 Update manual. 2019-05-10 21:11:58 +02:00
Bartosz Taudul
fd815655ac Update NEWS. 2019-05-10 21:11:54 +02:00
Bartosz Taudul
7cc5149355 Improve timeline message tooltips. 2019-05-10 20:36:06 +02:00
Bartosz Taudul
74575250a5 Save message color data in trace dumps. 2019-05-10 20:32:47 +02:00
Bartosz Taudul
8cbd83c752 Use message color on message lists. 2019-05-10 20:26:27 +02:00
Bartosz Taudul
4850e19ebd Store color in message data. 2019-05-10 20:26:27 +02:00
Bartosz Taudul
797ebd3caf Cosmetics. 2019-05-10 20:20:08 +02:00
Bartosz Taudul
efc54babe3 Transfer of colored messages. 2019-05-10 20:17:44 +02:00
Bartosz Taudul
6a09229ae7 Remove error bars and collection cost from GPU zone display.
There's no way to know how much this takes on a GPU.
2019-05-10 02:31:23 +02:00
Bartosz Taudul
721a818dcc Visual transition of error bars and collection cost markers. 2019-05-10 02:27:42 +02:00
Bartosz Taudul
9051b6e206 Update imgui color text edit to a179931d. 2019-05-09 18:56:13 +02:00
Bartosz Taudul
273a874de9 Update imgui to 1.70. 2019-05-09 18:53:31 +02:00
Bartosz Taudul
ee1a653667 Update manual. 2019-05-09 13:43:28 +02:00
Bartosz Taudul
0c6dde2eef Update NEWS. 2019-05-09 13:38:37 +02:00
Bartosz Taudul
54c8a882c9 Allow restricting call stack frame tree to active allocations. 2019-05-09 13:37:28 +02:00
Bartosz Taudul
98eaacec90 Update LZ4 to 1.9.1. 2019-05-01 16:53:48 +02:00
Bartosz Taudul
9ec8704dad Don't include LZ4 headers in tracy headers.
The LZ4 implementation is wrapped in tracy namespace, but it also adds
some defines, which may conflict with other LZ4 implementations.
2019-05-01 12:57:42 +02:00
Bartosz Taudul
303bbdd512 Update manual. 2019-04-26 23:29:05 +02:00
Bartosz Taudul
209a34e6ab Update NEWS. 2019-04-26 23:20:51 +02:00
Bartosz Taudul
a18a6869bc Allow limiting frame stats to visible frames. 2019-04-26 23:19:31 +02:00
Bartosz Taudul
fdd96fe251 Allow changing frame set from trace info window. 2019-04-26 22:49:36 +02:00
Bartosz Taudul
0e32de293f Update manual. 2019-04-23 17:20:05 +02:00
Bartosz Taudul
26aa3a23fb Display number of visible data points on a plot. 2019-04-23 17:17:25 +02:00
Bartosz Taudul
2c9d9d0d27 /proc/stat might be inaccessible. 2019-04-04 15:25:26 +02:00
Bartosz Taudul
cffc6e21d3 Use open to open webpage on osx. 2019-04-04 13:58:13 +02:00
Bartosz Taudul
a7886cf82c Replace linear search with hash lookup. 2019-04-03 16:24:16 +02:00
Bartosz Taudul
82dad3fb97 Use proper type. 2019-04-01 20:43:42 +02:00
Bartosz Taudul
d0d5184c04 Set options for proper socket. 2019-04-01 20:14:00 +02:00
Bartosz Taudul
687915299a Don't try to set socket options on invalid socket. 2019-04-01 20:08:27 +02:00
Bartosz Taudul
8d0d6b576a Update manual. 2019-04-01 20:06:43 +02:00
Bartosz Taudul
78e8d4aefe Display query backlog. 2019-04-01 19:55:54 +02:00
Bartosz Taudul
20e6813461 Store send queue size in mbps block. 2019-04-01 19:55:37 +02:00
Bartosz Taudul
d8d30bd875 Update NEWS. 2019-04-01 19:49:57 +02:00
Bartosz Taudul
9010b2c142 Put queries into queue if send buffer is full. 2019-04-01 19:47:29 +02:00
Bartosz Taudul
deeea0ee70 Track space left in send buffer. 2019-04-01 19:37:39 +02:00
Bartosz Taudul
57dff0abc9 Add server query queue. 2019-04-01 19:26:50 +02:00
Bartosz Taudul
c07c6d11b7 Define server query packet. 2019-04-01 19:21:53 +02:00
Bartosz Taudul
57cd6d3ed5 Allow retrieval of socket send buffer size. 2019-04-01 18:50:37 +02:00
Bartosz Taudul
45750a05a1 Only smooth zoom now. 2019-04-01 18:39:09 +02:00
Bartosz Taudul
cd774b9e96 Store two entries in zone self time cache.
This accounts for situation when zone information window is open and a
tooltip for another zone is displayed.
2019-03-30 00:54:22 +01:00
Bartosz Taudul
48a07bf4f8 Cache zone self times. 2019-03-30 00:52:25 +01:00
Bartosz Taudul
b0ab3c6139 High compression mode is now a bit better. 2019-03-27 02:26:39 +01:00
Bartosz Taudul
fef417f286 Store total number of CPU and GPU zones in trace. 2019-03-27 01:46:54 +01:00
Bartosz Taudul
2e6ac050f4 Use custom vector swap. 2019-03-26 23:02:39 +01:00
Bartosz Taudul
6c5efbfdce Implement custom vector swap. 2019-03-26 23:02:32 +01:00
Bartosz Taudul
a632d9e2a3 Add zone vector cache.
Zone children will be now collected in staging vectors. When the zone is
ended (and no children can be added anymore to it), a size-fitted vector
is allocated using slab allocation. The over-allocated vector is then
put into cache for use in future zones.

This is only active for vectors <= 8192 elements, or 64 KB (chosen
arbitrarily), to reduce time spent on copying memory.

Overall, this change should have the following effects:
- System memory allocation pressure reduction, due to re-usage of
  vectors, which eliminates the need for constant growth.
- Reduction of memory usage, because children vectors are now fitted to
  required size.
- Slight increase of zone processing time, due to memory copying?
2019-03-26 22:06:00 +01:00
Bartosz Taudul
11f4dcbf1e Consistent variable naming. 2019-03-26 21:41:44 +01:00
Bartosz Taudul
52f76a45ed Display separators for bin counts in compare menu. 2019-03-26 20:27:28 +01:00
Bartosz Taudul
99fca9e069 Fix loading old traces when skipping locks. 2019-03-26 20:25:29 +01:00
Bartosz Taudul
fe675b91be Ditto for frame counts. 2019-03-26 20:19:56 +01:00
Bartosz Taudul
021368fb59 Display bin counts with separators. 2019-03-26 20:18:20 +01:00
Bartosz Taudul
df3e8597c4 Focusing timeline on crash from trace info window. 2019-03-24 23:55:38 +01:00
Bartosz Taudul
7792963e31 Interaction with crash label in options menu. 2019-03-24 23:52:36 +01:00
Bartosz Taudul
2f397c892b Middle click on crash label to center view on it. 2019-03-24 23:50:33 +01:00
Bartosz Taudul
f52c8e9855 Update manual. 2019-03-24 23:47:02 +01:00
Bartosz Taudul
bd838ac84a Update NEWS. 2019-03-24 13:55:12 +01:00
Bartosz Taudul
1c495f077b Allow changing display order of threads. 2019-03-24 13:54:36 +01:00
Bartosz Taudul
f7eca24e18 Use ordered thread vector in message list. 2019-03-24 13:41:14 +01:00
Bartosz Taudul
a633c50991 Use ordered threads vector in options. 2019-03-24 13:41:02 +01:00
Bartosz Taudul
e957590350 Mirror thread data in a reorderable vector. 2019-03-24 13:37:43 +01:00
Bartosz Taudul
6ad820a76a Display tooltip for plot point over limits. 2019-03-23 02:24:45 +01:00
Bartosz Taudul
532bf19efa Don't draw many illegible plot points. 2019-03-22 20:11:24 +01:00
Bartosz Taudul
e6baee2bf9 Reduce number of max plot probes per column. 2019-03-22 20:11:10 +01:00
Bartosz Taudul
ff6034dfbf Change label to us. 2019-03-22 18:54:47 +01:00
Bartosz Taudul
e879016ffa Add Lua callstack capture time measurement. 2019-03-22 14:47:08 +01:00
Bartosz Taudul
302ad87686 Fix typo. 2019-03-21 22:06:37 +01:00
Bartosz Taudul
94ed1c637c Try to check if cntcvt reads are monotonic.
https://lore.kernel.org/patchwork/patch/904607/
2019-03-21 21:59:51 +01:00
Bartosz Taudul
7f57b3dba9 Fallback to reading CLOCK_MONOTONIC_RAW, if available. 2019-03-21 21:49:23 +01:00
Bartosz Taudul
3ccb831efb Fix calculation of frame histogram data. 2019-03-21 21:30:08 +01:00
Bartosz Taudul
e79fa04a8b Don't fail when timer accuracy is low. 2019-03-21 21:24:07 +01:00
Bartosz Taudul
fa556d2d65 Use common access-and-insert pattern for VisData. 2019-03-19 22:12:24 +01:00
Bartosz Taudul
fddba168c6 Track next time to search for. 2019-03-18 19:39:37 +01:00
Bartosz Taudul
f530dfb0e9 Apply the same optimization for GPU zones. 2019-03-18 18:48:27 +01:00
Bartosz Taudul
94a1957338 Optimize zone skipping. 2019-03-18 18:42:58 +01:00
Bartosz Taudul
02db5f52d1 Pass nspx to zone drawing functions. 2019-03-18 18:40:03 +01:00
Bartosz Taudul
2931c83442 Lookup further at the beginning of the collapsed zones area. 2019-03-18 18:32:45 +01:00
Bartosz Taudul
e19f2f26e1 Optimize drawing collapsed CPU zones. 2019-03-18 18:24:27 +01:00
Bartosz Taudul
b5fce70f25 Fix rapid advancing to next frames. 2019-03-17 20:51:54 +01:00
Bartosz Taudul
5fb478a7df Update NEWS. 2019-03-17 17:22:17 +01:00
Bartosz Taudul
e034eabeb8 Animate plot ranges. 2019-03-17 17:21:30 +01:00
Bartosz Taudul
b6ccb9d686 Allocation times may be displayed relative to zone start. 2019-03-17 16:53:09 +01:00
Bartosz Taudul
d2cca5dc3f Allow custom time offset in memory allocation list. 2019-03-17 16:47:44 +01:00
Bartosz Taudul
f0aadfe066 Don't push the same zone on zone info stack multiple times. 2019-03-17 16:43:20 +01:00
Bartosz Taudul
06421cf5ca Always auto-resize memory allocation info window. 2019-03-17 16:39:27 +01:00
Bartosz Taudul
2f22776249 Update manual. 2019-03-17 16:35:50 +01:00
Bartosz Taudul
fdb06fdd1f Update NEWS. 2019-03-17 16:33:44 +01:00
Bartosz Taudul
4914ef6b14 Display zone messages in zone info window. 2019-03-17 16:33:18 +01:00
Bartosz Taudul
016f7ac4b6 Allow retrieval of zone's thread data. 2019-03-17 16:17:47 +01:00
Bartosz Taudul
b4bfdb7872 Dim information about no memory events. 2019-03-17 02:56:26 +01:00
Bartosz Taudul
17718b4d25 Fix asserts. 2019-03-16 20:36:06 +01:00
Bartosz Taudul
28dfa21fda Move conditional out of loop. 2019-03-16 14:46:21 +01:00
Bartosz Taudul
7e6a8135df Remove double indirection in GetNextLockEvent(). 2019-03-16 14:18:43 +01:00
Bartosz Taudul
6db1a9ccd4 Use lock thread ranges in lock tooltips. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
833151b868 Don't search for lock events outside of thread range. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
200621f952 Use lock ranges for early exclusion test. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
67f14be6aa Update lock ranges when loading trace. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
8ced8a457c Update thread time range on lock event insert. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
dc981550a1 Load lock event time to a variable. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
4d66317bc3 Add per-thread time ranges to lock maps. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
71e20e7e7f Store lock map as flat_hash_map with pointer values. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
5fbc14c487 Fix skipping plots in version >= 0.4.5. 2019-03-15 15:27:37 +01:00
Bartosz Taudul
b43d962194 Set labels for input text fields. 2019-03-15 02:35:27 +01:00
Bartosz Taudul
6a36bb2fc2 Add hints to input text fields. 2019-03-15 01:31:06 +01:00
Bartosz Taudul
476287b5f2 Update imguicolortextedit. 2019-03-15 01:06:27 +01:00
Bartosz Taudul
a10ec49a60 Don't use obsolete function. 2019-03-15 01:00:43 +01:00
Bartosz Taudul
98b4b69386 Update imgui to 1.69. 2019-03-15 00:55:53 +01:00
Bartosz Taudul
e6ca8fc75f Update NEWS. 2019-03-14 01:35:40 +01:00
Bartosz Taudul
5177629130 Add standard deviation explanation tooltips. 2019-03-14 01:34:50 +01:00
Bartosz Taudul
18e7b9df11 Add standard deviations to compare menu. 2019-03-14 01:32:50 +01:00
Bartosz Taudul
a0299cc63a Optimize calculation of standard deviation. 2019-03-14 01:23:37 +01:00
Bartosz Taudul
f57cac9042 Initialize SourceLocationZones in-place. 2019-03-14 01:15:19 +01:00
Bartosz Taudul
d3fdd6b1d1 Display standard deviation. 2019-03-14 01:14:06 +01:00
Bartosz Taudul
d64f07f853 Don't search for thread for empty timelines. 2019-03-14 01:10:57 +01:00
Bartosz Taudul
b7fe29f750 Offload timeline statistics update to a background thread. 2019-03-13 01:46:05 +01:00
Bartosz Taudul
737738ac73 Wait for source location zones in update tool.
Not really an issue, as it is build without full fledged statistics.
2019-03-13 01:28:42 +01:00
Bartosz Taudul
3b051b1119 Add callstack depth vs time plot. 2019-03-12 20:23:36 +01:00
Bartosz Taudul
01c7712c92 More extensive call stack capture times table. 2019-03-10 23:37:41 +01:00
Bartosz Taudul
935f69469b Call stack is limited to 62 frames on windows. 2019-03-10 23:37:27 +01:00
Bartosz Taudul
9563c8316d Optimize lock drawing.
Don't iterate over locks that are present in only one thread, if only
contended lock events are to be displayed. Such locks cannot be
contended.
2019-03-09 14:20:34 +01:00
Bartosz Taudul
e8d9de2f77 Briefly describe contributor's work. 2019-03-09 12:36:54 +01:00
Bartosz Taudul
cbfd524b6c Set sane messages window column widths. 2019-03-09 00:37:58 +01:00
Bartosz Taudul
815d7fdcb4 Set sane callstack window column widths. 2019-03-09 00:34:04 +01:00
Bartosz Taudul
5445ffb149 Set sane statistics window column widths. 2019-03-09 00:30:53 +01:00
Bartosz Taudul
8fc0727d54 Update NEWS. 2019-03-09 00:16:14 +01:00
Bartosz Taudul
0748655797 Allow opening source file view from statistics menu. 2019-03-09 00:15:23 +01:00
Bartosz Taudul
761a08b055 Dim location in statistics menu. 2019-03-09 00:08:57 +01:00
Bartosz Taudul
9fd8a20d7c Use small checkbox in appropriate places. 2019-03-08 18:39:41 +01:00
Bartosz Taudul
e004dc85a9 Display waiting dots in "waiting for connection" window. 2019-03-07 17:00:40 +01:00
Bartosz Taudul
f69f9d4660 Disable window transparency. 2019-03-07 01:18:24 +01:00
Bartosz Taudul
535d7b2da1 Add waiting dots to statistics menu. 2019-03-07 00:59:43 +01:00
Bartosz Taudul
aa054f1f46 Add waiting dots to compare traces menu. 2019-03-07 00:59:02 +01:00
Bartosz Taudul
6e4bc7d9c5 Add waiting dots to memory data in zone info window. 2019-03-07 00:57:32 +01:00
Bartosz Taudul
d547700e50 Update time in a common location. 2019-03-07 00:57:25 +01:00
Bartosz Taudul
d0d7131e35 Properly restore threadMap.
This fixes thread ids returned by CompressThread in loaded traces. The
bug was manifesting by not displaying memory events in zone info window.
2019-03-07 00:49:25 +01:00
Bartosz Taudul
07bcca9dc0 Don't pre-fill threadExpand, if not needed. 2019-03-07 00:49:06 +01:00
Bartosz Taudul
f2f19241e6 Display waiting dots in find zone menu during precompute. 2019-03-06 18:25:39 +01:00
Bartosz Taudul
d5914d2e7b Extract drawing waiting dots. 2019-03-06 18:16:21 +01:00
Bartosz Taudul
a4740c1b1c Add animation to loading progress window. 2019-03-06 02:49:21 +01:00
Bartosz Taudul
cee625b375 Animate frame selection expansion. 2019-03-06 01:45:39 +01:00
Bartosz Taudul
4b1c0ff0c5 Fix frame selection when zoom anim is active. 2019-03-06 01:45:26 +01:00
Bartosz Taudul
a26d0bf2b4 Update NEWS. 2019-03-06 01:17:40 +01:00
Bartosz Taudul
00de21f7e7 Smooth zooming on mouse scroll. 2019-03-06 01:15:38 +01:00
Bartosz Taudul
dc2743bcc0 Manual refinements. 2019-03-05 23:39:36 +01:00
Bartosz Taudul
17fb589415 Try dladdr() resolution if libbacktrace fails. 2019-03-05 20:43:47 +01:00
Bartosz Taudul
49f1277e55 Cast void* to char*. 2019-03-05 20:20:55 +01:00
Bartosz Taudul
3a7ed53c5c Update manual. 2019-03-05 20:09:33 +01:00
Bartosz Taudul
b45224d8de Update NEWS. 2019-03-05 19:50:08 +01:00
Bartosz Taudul
171434075d Group call stack improvements together. 2019-03-05 19:49:43 +01:00
Bartosz Taudul
364c20a771 Correct parameter number. 2019-03-05 19:48:02 +01:00
Bartosz Taudul
e798f0f28d Don't send lua callstacks if platform doesn't support callstacks. 2019-03-05 19:47:31 +01:00
Bartosz Taudul
bc87762012 Merge native callstack with the allocated one. 2019-03-05 19:44:08 +01:00
Bartosz Taudul
ebf09bebae Only one callstack may be in-flight at any time.
Save for the allocated callstack, but this will be solved in another
way. There's no need to keep pending callstacks in a map.
2019-03-05 19:44:08 +01:00
Bartosz Taudul
afe2fad1a7 Send native callstack before allocated one. 2019-03-05 19:18:43 +01:00
Bartosz Taudul
4509412efb Fast callstack retrieval for linux. 2019-03-05 18:56:39 +01:00
Bartosz Taudul
1bbf296351 Use fast callstack frame decoding to cut callstack. 2019-03-05 02:42:51 +01:00
Bartosz Taudul
cb62b63fe2 Fast callstack frame decoder.
Returns only function name, doesn't retrieve inlined functions, doesn't
perform demangling.
2019-03-05 02:42:51 +01:00
Bartosz Taudul
b11f932078 Cut lua callstack at lua_pcall. 2019-03-05 02:42:51 +01:00
Bartosz Taudul
ec73178733 Move callstack cutting to a separate function. 2019-03-05 02:42:51 +01:00
Bartosz Taudul
b37a9055d5 Force inline SendLuaCallstack(). 2019-03-05 02:42:51 +01:00
Bartosz Taudul
3e81e44d75 Separate processing for allocated callstacks.
Only native callstack is used for the moment.
2019-03-05 02:42:50 +01:00
Bartosz Taudul
d229c1bc1b Send native callstack along with allocated callstack. 2019-03-05 02:42:50 +01:00
Bartosz Taudul
e13286936c Mark templated functions inline. 2019-03-03 22:09:20 +01:00
Bartosz Taudul
f6913eecf0 Don't display custom stack frames as pointers. 2019-03-03 18:20:55 +01:00
Bartosz Taudul
dc74297439 Add missing const qualifiers. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
bef31ba073 Separate message for zone begin with alloc src loc and callstack. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
66b8a13e77 Store callstack alloc payloads. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
9fc022346b Replace frame pointers with callstack frame ids. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
664847211c Pack/unpack frame pointer to callstack frame id. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
1feedb17ac Add callstack frame identifier and the required plumbing. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
cf8d17c2ec Move VarArray hash calculation to a separate function. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
e3c31e4a4e Send callstack alloc payload. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
b8501956f9 Lua callstack sending. 2019-03-03 18:04:02 +01:00
Bartosz Taudul
2c16bdd538 Don't go out of tracy namespace. 2019-03-03 01:59:24 +01:00
Bartosz Taudul
c0d0d0d42b Display keys as keys in manual. 2019-03-01 02:08:57 +01:00
Bartosz Taudul
fc63b6b07d Display trace version in trace info window. 2019-03-01 01:47:36 +01:00
Bartosz Taudul
6065d25335 Extend list of tracy callstack frames. 2019-03-01 01:41:10 +01:00
Bartosz Taudul
e284248995 Fix display of last message. 2019-03-01 01:30:56 +01:00
Bartosz Taudul
8fd09fe8f0 Get proper width. 2019-03-01 01:20:20 +01:00
Bartosz Taudul
42c94f7142 Handle discontinuous frame end mismatch failure. 2019-02-28 19:32:42 +01:00
Bartosz Taudul
d80dc82b96 Don't display invalid thread in failure dialog. 2019-02-28 19:31:45 +01:00
Bartosz Taudul
06f83bbdef Update manual. 2019-02-28 19:25:30 +01:00
Bartosz Taudul
d863245b49 Serialize discontinuous frame messages. 2019-02-28 19:21:23 +01:00
Bartosz Taudul
f15bfb88a3 Update manual. 2019-02-27 21:34:00 +01:00
Bartosz Taudul
b4daad684c Display frame numbers in zone trace. 2019-02-27 21:12:56 +01:00
Bartosz Taudul
8d784f7fae Display inline frames in all call stacks. 2019-02-27 20:55:58 +01:00
Bartosz Taudul
38c4909d96 Update NEWS. 2019-02-27 20:38:13 +01:00
Bartosz Taudul
422ed1f452 Special mode for callstack grouping in find zone menu. 2019-02-27 20:37:38 +01:00
Bartosz Taudul
851ae9077b Make small callstack button tooltip optional. 2019-02-27 19:59:49 +01:00
Bartosz Taudul
e20e7caab0 Increase size of frame left/right buttons. 2019-02-27 19:58:44 +01:00
Bartosz Taudul
d08ea6e2bd Update NEWS. 2019-02-25 15:12:17 +01:00
Bartosz Taudul
b89db6e926 Don't send CPU usage data when there's no readings. 2019-02-25 15:11:35 +01:00
Bartosz Taudul
963d2b3ca8 CPU usage getter for apple. 2019-02-25 15:04:06 +01:00
Bartosz Taudul
85f29a0f22 Collect system time before server connection is made. 2019-02-24 19:12:17 +01:00
Bartosz Taudul
3c77ad7179 Update NEWS. 2019-02-24 19:04:49 +01:00
Bartosz Taudul
bafc8a1330 Implement getting CPU usage in linux. 2019-02-24 19:02:49 +01:00
Bartosz Taudul
0c75a5178c Update NEWS. 2019-02-24 18:42:34 +01:00
Bartosz Taudul
06dddd15b4 Update manual. 2019-02-24 18:41:47 +01:00
Bartosz Taudul
e9aeb0c522 Darken timeline outside of capture area. 2019-02-24 18:35:03 +01:00
Bartosz Taudul
fb96d60256 Adjust last time in mem alloc/free and sys time messages. 2019-02-24 18:26:23 +01:00
Bartosz Taudul
4610f79b94 Update manual. 2019-02-24 17:49:20 +01:00
Bartosz Taudul
9f621bf67f Improve lock label tooltips. 2019-02-24 17:44:44 +01:00
Bartosz Taudul
c78aedae62 Zoom-to-range for lock labels. 2019-02-24 17:30:58 +01:00
Bartosz Taudul
021d369b80 Fix calculation of thread lock extent. 2019-02-24 17:30:18 +01:00
Bartosz Taudul
271d7ccaa3 Bring plot tooltip up-to-par. 2019-02-24 17:01:46 +01:00
Bartosz Taudul
bf2ecbae36 Middle-click on thread label to zoom to thread extent. 2019-02-24 17:01:46 +01:00
Bartosz Taudul
62162d4cdb Display count of messages and locks in thread tooltip. 2019-02-24 17:01:46 +01:00
Bartosz Taudul
6dceec4ea8 Improve thread tooltip.
- In addition to the first event time, also display last one and the
  time period.
- Include messages and locks in thread events discovery.
2019-02-24 17:01:46 +01:00
Bartosz Taudul
a0b5ac33cc Always show label of a crashed thread. 2019-02-23 01:34:45 +01:00
Bartosz Taudul
ba769ab23a Update manual. 2019-02-23 01:32:06 +01:00
Bartosz Taudul
ccabba58d9 Update NEWS. 2019-02-23 01:28:57 +01:00
Bartosz Taudul
af3eb93e4a Hide tracks that don't have anything to display. 2019-02-23 01:28:12 +01:00
Bartosz Taudul
7ab326c4fe Don't clip area above track. 2019-02-23 00:59:09 +01:00
Bartosz Taudul
5f3f6d18bb Cosmetics. 2019-02-23 00:37:48 +01:00
Bartosz Taudul
53992b9b6b Don't hide hex thread id in tooltip. 2019-02-23 00:34:01 +01:00
Bartosz Taudul
29fcddca0b Display frame count in frame type selection dropdown. 2019-02-22 21:07:33 +01:00
Bartosz Taudul
ae53c8e6f0 Don't display threads with no messages. 2019-02-22 21:07:33 +01:00
Bartosz Taudul
48b5b25a6a Display count of messages in threads. 2019-02-22 21:07:33 +01:00
Bartosz Taudul
2ee86ef126 Display bigger program name in welcome dialog. 2019-02-22 02:44:41 +01:00
Bartosz Taudul
9bd13b02e9 Small string changes in the welcome dialog. 2019-02-22 02:41:13 +01:00
Bartosz Taudul
81c2515199 Bump protocol version. 2019-02-21 23:25:43 +01:00
Bartosz Taudul
542d081027 Update manual. 2019-02-21 23:24:18 +01:00
Bartosz Taudul
06fcfdd493 Update NEWS. 2019-02-21 23:11:49 +01:00
Bartosz Taudul
0b9fa8f3c8 Track CPU usage also on cygwin. 2019-02-21 23:11:09 +01:00
Bartosz Taudul
d0c1b9bf67 Proper formatting of plot values. 2019-02-21 23:07:32 +01:00
Bartosz Taudul
e190faa7e1 Save/load CPU usage plot. 2019-02-21 22:56:59 +01:00
Bartosz Taudul
e9baa80bf3 Process CPU usage reports. 2019-02-21 22:56:59 +01:00
Bartosz Taudul
9f4f5bcb63 CPU usage retrieval. 2019-02-21 22:45:53 +01:00
Bartosz Taudul
c8fb3f24cc Update NEWS. 2019-02-21 21:19:53 +01:00
Bartosz Taudul
f1dd4ef3d9 Animate thread position and height. 2019-02-21 21:18:41 +01:00
Bartosz Taudul
e945902f40 Merge visibility and show full options into one struct. 2019-02-21 20:24:08 +01:00
Bartosz Taudul
bc713463d8 Improve zooming animation. 2019-02-21 20:00:29 +01:00
Bartosz Taudul
b33c61cead Update imgui to 1.68. 2019-02-21 18:40:27 +01:00
Bartosz Taudul
938d8ce69e Properly initialize demangled pointer. 2019-02-21 15:04:17 +01:00
Bartosz Taudul
44009b6fda Use mach_absolute_time() to get time on iOS. 2019-02-21 14:45:13 +01:00
Bartosz Taudul
e839a3153f Just use getprogname(). 2019-02-21 11:40:56 +01:00
Bartosz Taudul
c4d46f1c24 No libproc.h on iOS. 2019-02-21 11:33:45 +01:00
Till Rathmann
9d7c4a2861 Merged in tillrathmann/tracy (pull request #33)
Fixed DLL support
2019-02-20 17:24:12 +00:00
Till Rathmann
29140afe0c Fixed compiler warnings. 2019-02-20 17:50:49 +01:00
Till Rathmann
77abc3bffd Fixed DLL support. 2019-02-20 16:15:13 +01:00
Bartosz Taudul
5ddc605dd6 Update manual. 2019-02-20 16:04:30 +01:00
Bartosz Taudul
22329ae5d9 Collect call stacks on apple. 2019-02-20 16:01:41 +01:00
Bartosz Taudul
34d24b16bb Retrieve memory size on apple. 2019-02-20 13:52:55 +01:00
Bartosz Taudul
9c966b6224 Process name retrieval on apple. 2019-02-20 13:13:29 +01:00
Bartosz Taudul
8f75839d66 Fix apple target detection. 2019-02-20 12:43:48 +01:00
Bartosz Taudul
0fd3025a7e Fix typo. 2019-02-20 02:51:03 +01:00
Bartosz Taudul
5afadcb11d Fix if condition. 2019-02-19 21:51:41 +01:00
Bartosz Taudul
a75d602f6e Update manual. 2019-02-19 21:37:40 +01:00
Bartosz Taudul
34695fca60 Update README. 2019-02-19 20:46:17 +01:00
Bartosz Taudul
4048ebf7c5 Update NEWS. 2019-02-19 20:44:55 +01:00
Bartosz Taudul
ef5e30056e Implement delayed initialization of the profiler.
Enabled on osx, ios.
2019-02-19 20:43:30 +01:00
Bartosz Taudul
d560f7a203 Cosmetics. 2019-02-19 19:36:30 +01:00
Bartosz Taudul
3f914834b7 Hide rest of statics. 2019-02-19 19:33:37 +01:00
Bartosz Taudul
9fabafbeca Fix DLL code. 2019-02-19 18:46:59 +01:00
Bartosz Taudul
2421e05c27 Prevent direct access to s_profiler. 2019-02-19 18:38:08 +01:00
Bartosz Taudul
d865d1cc87 Disallow direct access to s_token. 2019-02-19 18:27:00 +01:00
Bartosz Taudul
44753dd4ac thread_local implies static. 2019-02-19 16:52:05 +01:00
Bartosz Taudul
3a562ae6c9 Fix display of unresolved call stack frames. 2019-02-19 16:37:34 +01:00
Bartosz Taudul
9040953e13 Proper constructors are needed. 2019-02-18 14:57:46 +01:00
Bartosz Taudul
c9c4d2845a Provide empty {Gpu,Vk}CtxScope classes if tracy is disabled.
This may be needed if some wrapping is done, abstracting the OpenGL and
Vulkan tracing.
2019-02-18 14:45:09 +01:00
Bartosz Taudul
081b1069f6 Properly count number of locks in options menu. 2019-02-17 17:19:17 +01:00
Bartosz Taudul
9dd1d6a744 Don't display locks with no lock events. 2019-02-17 17:13:20 +01:00
Bartosz Taudul
062058315e Update manual. 2019-02-17 17:10:18 +01:00
Bartosz Taudul
a66f1e2135 Update NEWS. 2019-02-17 17:07:35 +01:00
Bartosz Taudul
a2819baa35 Split locks as single and multithreaded in options menu. 2019-02-17 17:06:39 +01:00
Bartosz Taudul
5cc738593f Fix drawing lock highlight. 2019-02-17 16:57:52 +01:00
Bartosz Taudul
92a7e02e73 Highlight locks hovered in the options menu. 2019-02-17 16:53:33 +01:00
Bartosz Taudul
bec27f7d60 Handle highlighting lock in fast-exit code path. 2019-02-17 16:49:18 +01:00
Bartosz Taudul
1e32821097 Move drawing lock header to a separate function. 2019-02-17 16:49:03 +01:00
Bartosz Taudul
d2a39e29e1 Update manual. 2019-02-17 16:35:10 +01:00
Bartosz Taudul
f4fc604845 Update NEWS. 2019-02-17 16:23:07 +01:00
Bartosz Taudul
ea4f4ebb3a Highlight selected/hovered lock. 2019-02-17 16:20:56 +01:00
Bartosz Taudul
d4fb6fde2b Fix printf type. 2019-02-17 00:29:01 +01:00
Bartosz Taudul
4422fce55c Don't decompress GpuZone threads while saving trace.
Saving the threads compressed in GPU zones and memory event has the
following result on trace dump sizes:

043/aa.tracy (0.4.3) {10055 KB} -> 044/aa.tracy (0.4.4) {9975 KB}  99.20% size change
043/android.tracy (0.4.3) {542739 KB} -> 044/android.tracy (0.4.4) {519248 KB}  95.67% size change
043/asset-new.tracy (0.4.3) {78403 KB} -> 044/asset-new.tracy (0.4.4) {75899 KB}  96.81% size change
043/asset-new-id.tracy (0.4.3) {84341 KB} -> 044/asset-new-id.tracy (0.4.4) {81771 KB}  96.95% size change
043/asset-old.tracy (0.4.3) {80688 KB} -> 044/asset-old.tracy (0.4.4) {78410 KB}  97.18% size change
043/big.tracy (0.4.3) {939577 KB} -> 044/big.tracy (0.4.4) {938427 KB}  99.88% size change
043/callstack.tracy (0.4.3) {14557 KB} -> 044/callstack.tracy (0.4.4) {14465 KB}  99.37% size change
043/callstack-linux.tracy (0.4.3) {6949 KB} -> 044/callstack-linux.tracy (0.4.4) {6942 KB}  99.90% size change
043/crash.tracy (0.4.3) {131 KB} -> 044/crash.tracy (0.4.4) {127 KB}  97.10% size change
043/crash2.tracy (0.4.3) {1422 KB} -> 044/crash2.tracy (0.4.4) {1412 KB}  99.29% size change
043/darkrl.tracy (0.4.3) {15767 KB} -> 044/darkrl.tracy (0.4.4) {15663 KB}  99.34% size change
043/darkrl2.tracy (0.4.3) {7947 KB} -> 044/darkrl2.tracy (0.4.4) {7886 KB}  99.23% size change
043/darkrl-old.tracy (0.4.3) {67448 KB} -> 044/darkrl-old.tracy (0.4.4) {67004 KB}  99.34% size change
043/deadlock.tracy (0.4.3) {5984 KB} -> 044/deadlock.tracy (0.4.4) {5986 KB}  100.03% size change
043/gn-opengl.tracy (0.4.3) {29005 KB} -> 044/gn-opengl.tracy (0.4.4) {28885 KB}  99.59% size change
043/gn-vulkan.tracy (0.4.3) {29352 KB} -> 044/gn-vulkan.tracy (0.4.4) {29257 KB}  99.68% size change
043/long.tracy (0.4.3) {1182800 KB} -> 044/long.tracy (0.4.4) {1176584 KB}  99.47% size change
043/mem.tracy (0.4.3) {1369067 KB} -> 044/mem.tracy (0.4.4) {1262406 KB}  92.21% size change
043/multi.tracy (0.4.3) {8004 KB} -> 044/multi.tracy (0.4.4) {7944 KB}  99.24% size change
043/new.tracy (0.4.3) {1108 KB} -> 044/new.tracy (0.4.4) {1099 KB}  99.18% size change
043/q3bsp-mt.tracy (0.4.3) {949855 KB} -> 044/q3bsp-mt.tracy (0.4.4) {937574 KB}  98.71% size change
043/q3bsp-st.tracy (0.4.3) {240347 KB} -> 044/q3bsp-st.tracy (0.4.4) {230092 KB}  95.73% size change
043/selfprofile.tracy (0.4.3) {197708 KB} -> 044/selfprofile.tracy (0.4.4) {197659 KB}  99.98% size change
043/tbrowser.tracy (0.4.3) {9503 KB} -> 044/tbrowser.tracy (0.4.4) {9503 KB}  100.00% size change
043/test.tracy (0.4.3) {40700 KB} -> 044/test.tracy (0.4.4) {40699 KB}  100.00% size change
043/virtualfile_hc.tracy (0.4.3) {72424 KB} -> 044/virtualfile_hc.tracy (0.4.4) {72304 KB}  99.83% size change
043/zfile_hc.tracy (0.4.3) {39419 KB} -> 044/zfile_hc.tracy (0.4.4) {39328 KB}  99.77% size change
2019-02-17 00:29:01 +01:00
Bartosz Taudul
760e9105d0 Don't decompress memory thread data while saving trace. 2019-02-17 00:27:41 +01:00
Bartosz Taudul
9ee494c0f4 Store thread compression layout in trace dump. 2019-02-16 22:48:29 +01:00
Bartosz Taudul
d030674b83 Simplify loading memory events. 2019-02-16 22:32:14 +01:00
Bartosz Taudul
569a9fb9be Change order of file version checks during loading memory events. 2019-02-16 22:26:50 +01:00
Bartosz Taudul
88b7961421 Allocate memory for all zones at the current level at once. 2019-02-16 20:53:07 +01:00
Bartosz Taudul
470600fbc2 Don't thrash memory bandwith during file load. 2019-02-16 20:42:50 +01:00
Bartosz Taudul
c127f51767 Load time offsets to scratch buffers. 2019-02-15 02:46:25 +01:00
Bartosz Taudul
8fd685c877 Properly track memory usage in slab allocator. 2019-02-15 02:28:31 +01:00
Bartosz Taudul
23d12d2633 Allocate new block, if we're at the end of current one. 2019-02-15 02:04:37 +01:00
Bartosz Taudul
7b023e533d Use big allocation mode for Vector's reserve_exact. 2019-02-15 01:59:33 +01:00
Bartosz Taudul
930190f2cb Support big allocations in slab allocator. 2019-02-15 01:59:33 +01:00
Bartosz Taudul
1cefd4d8ac Don't use reserve_exact for temporary things. 2019-02-15 01:43:30 +01:00
Bartosz Taudul
127be8e995 GpuEvent doesn't need init. 2019-02-15 01:31:58 +01:00
Bartosz Taudul
e8d15e8295 Mirror zone child grouping for GPU zones. 2019-02-14 01:38:34 +01:00
Bartosz Taudul
e24ac42755 Add self time to GPU zone info window. 2019-02-14 01:31:06 +01:00
Bartosz Taudul
0fad23dbae Add GPU zone self time in tooltip. 2019-02-14 01:28:27 +01:00
Bartosz Taudul
f06609eb61 GPU child zones time getter. 2019-02-14 01:28:12 +01:00
Bartosz Taudul
92c1420c30 Improve handling of post-load background jobs.
Background tasks (source location zones sorting, reconstruction of
memory plot) are now started only after trace loading is finished.
Multithreaded sorting was previously impacting trace load times.

Only one thread is used to perform both jobs, one after another.
2019-02-14 01:17:37 +01:00
Bartosz Taudul
080873003b Simplify support for 0.2 traces. 2019-02-14 01:13:11 +01:00
Bartosz Taudul
bd1c1d044b Force inline read/write time offset functions. 2019-02-14 00:17:50 +01:00
Bartosz Taudul
631f81e9dc Use Vector to store children data instead of std::vector. 2019-02-13 02:32:25 +01:00
Bartosz Taudul
40d0c72982 Use memcpy and memset instead of per-element copy and zero. 2019-02-13 02:23:56 +01:00
Bartosz Taudul
d854998856 Support non-trivially-copyable items in Vector. 2019-02-13 02:20:31 +01:00
Bartosz Taudul
08642d034b Preserve string length in string map. 2019-02-12 22:11:15 +01:00
Bartosz Taudul
17e1894034 Add specialized string key for hash map. 2019-02-12 22:11:15 +01:00
Bartosz Taudul
ec37f59c14 Replace manual comparison with memcmp. 2019-02-12 22:11:15 +01:00
Bartosz Taudul
e4e20b47ca Handle dropped connection in capture utility. 2019-02-12 11:13:53 +01:00
Bartosz Taudul
d32c070a9e Two more places where connection can silently drop. 2019-02-12 11:07:12 +01:00
Bartosz Taudul
7f11260bf0 Handle dropped connection during handshake. 2019-02-12 01:41:09 +01:00
Bartosz Taudul
8717fe5730 Window position may be negative. 2019-02-12 01:26:14 +01:00
Bartosz Taudul
e254f049a5 Update manual. 2019-02-10 17:33:39 +01:00
Bartosz Taudul
c22d7f9b62 Update NEWS. 2019-02-10 17:25:19 +01:00
Bartosz Taudul
147b31f014 Implement grouping children zones. 2019-02-10 17:21:01 +01:00
Bartosz Taudul
76186f3221 Allow zone name retrieval from source location. 2019-02-10 16:45:19 +01:00
Bartosz Taudul
48c721c4b9 Fix natvis display of exact reserved vector's capacity. 2019-02-10 16:36:09 +01:00
Bartosz Taudul
740486a0ce Add children locations grouping button. 2019-02-10 16:14:13 +01:00
Bartosz Taudul
b7bd3696b7 Do not draw time subdividers on a nanosecond scale. 2019-02-10 16:04:04 +01:00
Bartosz Taudul
c7e64bb8a8 Replace select() with poll(). 2019-02-10 15:45:23 +01:00
Bartosz Taudul
d18c3432a4 Fix call stack window. 2019-02-10 13:38:14 +01:00
Bartosz Taudul
2d50664180 Use multiply instead of divide. 2019-02-10 13:01:16 +01:00
Bartosz Taudul
f1940aab2e Use help marker helper function. 2019-02-10 03:02:54 +01:00
Bartosz Taudul
96e38501b6 Use unformatted text drawing where possible. 2019-02-10 02:50:34 +01:00
Bartosz Taudul
ecdb672130 Add simple checks against invalid window position. 2019-02-10 02:11:59 +01:00
Bartosz Taudul
3a8abdf9c1 Integer time specialization is not needed anymore. 2019-02-10 01:14:34 +01:00
Bartosz Taudul
2ad0258925 Don't print trailing zeros in fractions (e.g. 2.5 instead of 2.50). 2019-02-10 01:12:22 +01:00
Bartosz Taudul
af16872693 Don't display fractional part if it's 0. 2019-02-10 01:03:35 +01:00
Bartosz Taudul
e4f4fee6d4 Optimize printing days. 2019-02-10 01:02:57 +01:00
Bartosz Taudul
ee66b1354d IntTable10 is not needed. 2019-02-10 00:51:13 +01:00
Bartosz Taudul
d940e315bd Optimize TimeToString(). 2019-02-08 22:11:06 +01:00
Bartosz Taudul
3c4394489c Workaround GCC bug #67274.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67274
2019-02-08 11:54:29 +01:00
Bartosz Taudul
a47202b9ac Let's try building on ubuntu 1804. 2019-02-08 02:38:26 +01:00
Bartosz Taudul
0a03f25c9f Add Dedmen Miller to AUTHORS. 2019-02-08 02:37:30 +01:00
Bartosz Taudul
053932249c Style fixes. 2019-02-08 02:29:24 +01:00
Dedmen Miller (Dedmenmiller)
8fb6c0dfcb Merged in dedmenmiller/tracy/findZoneSorting (pull request #31)
Add sorting for findZone zonelist
2019-02-08 00:53:48 +00:00
Dedmen Miller
ab0dc0da11 Use memcpy 2019-02-07 16:10:28 +01:00
Dedmen Miller (Dedmenmiller)
bfdba8c2a5 Merged in dedmenmiller/tracy/cleanerTimeToString (pull request #32)
Cleaner TimeToString
2019-02-07 14:13:52 +00:00
Dedmen Miller
59ae188a7f Cleanup 2019-02-07 14:51:34 +01:00
Dedmen Miller
7361d696c5 Return proper buf 2019-02-07 14:38:42 +01:00
Dedmen Miller
bfa5386bbe Cleanup 2019-02-07 14:36:33 +01:00
Dedmen Miller
e4ef491fdf Cleaner TimeToString 2019-02-07 13:14:52 +01:00
Dedmen Miller
92c872dfc0 Added sorting for findZone zonelist 2019-02-07 12:25:05 +01:00
Bartosz Taudul
0e6350d95e Grouping by function names is a more sane default. 2019-02-06 23:09:38 +01:00
Bartosz Taudul
90c1428aac Update manual. 2019-02-06 23:05:58 +01:00
Bartosz Taudul
f18aa9c33f Update NEWS. 2019-02-06 22:39:15 +01:00
Bartosz Taudul
b945f83169 Don't separate inclusive/exclusive counts.
There is no way for one frame to have both. Coloring is preserved and is
now determined by presence of children.
2019-02-06 22:36:21 +01:00
Bartosz Taudul
1953a1a1d5 Notify user about pitfalls of function name grouping. 2019-02-06 22:02:59 +01:00
Bartosz Taudul
70ea9e7712 Implement grouping call stack tree by function names. 2019-02-06 21:56:49 +01:00
Bartosz Taudul
044b7e1522 Add function name grouping controls. 2019-02-06 21:45:26 +01:00
Bartosz Taudul
7aa24864bf Make it easier to add new matches against tracy own stack frames. 2019-02-06 21:07:41 +01:00
Bartosz Taudul
104415ced8 Display base frame, not inline frame, if inlines are not shown. 2019-02-06 14:17:18 +01:00
Bartosz Taudul
bb4d390bc7 Update manual. 2019-02-06 14:03:54 +01:00
Bartosz Taudul
bb8002ec08 Update NEWS. 2019-02-06 13:54:23 +01:00
Bartosz Taudul
c2e9c00a38 Add top-down call stack memory tree. 2019-02-06 13:53:14 +01:00
Bartosz Taudul
c689a494da Move call stack paths calculation to a separate function. 2019-02-06 13:46:50 +01:00
Bartosz Taudul
dbf8115771 Same for linux. 2019-02-04 02:33:03 +01:00
Bartosz Taudul
4dc05195ca Skip internal call stack capture inline frames for MSVC. 2019-02-04 02:27:13 +01:00
Bartosz Taudul
9dd869a5eb Fix call stacks on cygwin. 2019-02-02 13:58:17 +01:00
Bartosz Taudul
e801943b90 Array index is changing here. 2019-01-31 18:37:59 +01:00
Bartosz Taudul
52a7f3a39a Update manual. 2019-01-30 01:56:31 +01:00
Bartosz Taudul
b2c57151a6 Update NEWS. 2019-01-30 01:54:52 +01:00
Bartosz Taudul
b0d319890b Allow sorting find zone groups by mean time per call. 2019-01-30 01:54:18 +01:00
Bartosz Taudul
c5fd347401 Initialize variable. 2019-01-29 23:18:36 +01:00
Bartosz Taudul
89ddfd0006 Remove dead code. 2019-01-29 23:18:36 +01:00
Bartosz Taudul
653caf159f Assign return value only once. 2019-01-29 22:21:01 +01:00
Bartosz Taudul
852fe03cbc More references. 2019-01-29 22:10:14 +01:00
Bartosz Taudul
5e3390894d Use preincrementation for iterators. 2019-01-29 22:01:47 +01:00
Bartosz Taudul
d6c616848c Use reference instead of repeated deep dereferences. 2019-01-29 21:59:52 +01:00
Bartosz Taudul
1585be7ff3 Rearrange Socket to reduce struct size. 2019-01-29 21:56:10 +01:00
Bartosz Taudul
b7fd0bdc9c Use proper type. 2019-01-29 21:53:56 +01:00
Bartosz Taudul
1b3f10148d Fix logic snafu. 2019-01-29 21:46:14 +01:00
Bartosz Taudul
a708bebbfd Use language neutral header for callstack capability detection.
This fixes call stack collection in C API when TRACY_CALLSTACK is
defined.
2019-01-27 13:41:32 +01:00
Bartosz Taudul
16b398ffeb Update manual. 2019-01-27 00:22:25 +01:00
Bartosz Taudul
01bddf95a6 Trace inline function calls on MSVC call stacks. 2019-01-26 23:50:58 +01:00
Bartosz Taudul
d86e36cc62 Fix progress of loading CPU zones. 2019-01-26 22:18:07 +01:00
Bartosz Taudul
39680ad315 Boost lock loading time. 2019-01-24 22:44:09 +01:00
Bartosz Taudul
606a4502e0 Fix MSVC build.
"SIGINT is not supported for any Win32 application."
2019-01-24 20:04:08 +01:00
Bartosz Taudul
901d690d55 Update manual. 2019-01-24 19:27:14 +01:00
Bartosz Taudul
f83df89a58 Update NEWS. 2019-01-24 19:14:10 +01:00
Bartosz Taudul
34c9cf512e Disconnect on ^C in capture utility. 2019-01-24 19:13:09 +01:00
Bartosz Taudul
66a5e06803 Allow disconnecting from a client. 2019-01-24 19:00:34 +01:00
Bartosz Taudul
922993cbbb Add placeholder worker disconnect command. 2019-01-24 18:51:55 +01:00
Dedmen Miller (Dedmenmiller)
f474669ab5 Merged in dedmenmiller/tracy-1/dedmenmiller/fixed-offset-in-histogram-with-nonlog-ti-1548339368610 (pull request #30)
Fixed offset in histogram with non-log time
2019-01-24 15:03:22 +00:00
Dedmen Miller (Dedmenmiller)
e83e63caa4 Fix other lines 2019-01-24 15:02:36 +00:00
Dedmen Miller (Dedmenmiller)
72966a24a3 Fixed offset in histogram with non-log time 2019-01-24 14:16:23 +00:00
Bartosz Taudul
c67d91c6ac Display numerical thread id in thread tooltip. 2019-01-23 18:15:19 +01:00
Bartosz Taudul
71f1a0b31e Display self time percentage in find zone menu. 2019-01-23 18:11:47 +01:00
Bartosz Taudul
56b530e99c Fix tooltip active area. 2019-01-23 18:04:31 +01:00
Bartosz Taudul
5019b2e507 Update manual. 2019-01-23 14:31:02 +01:00
Bartosz Taudul
0dc323a74e Update NEWS. 2019-01-23 14:26:31 +01:00
Bartosz Taudul
7f015b1b24 Implement self time in find zone menu. 2019-01-23 14:25:45 +01:00
Bartosz Taudul
92766430d9 Add "self time" checkbox to find zone menu. 2019-01-23 14:25:28 +01:00
Bartosz Taudul
42af2d14cc Calculate self min and max times of source location zones. 2019-01-23 14:24:22 +01:00
Bartosz Taudul
118fab1561 Fast version of zone child time getter.
This one can only be used when all child zones are properly ended.
2019-01-23 13:59:14 +01:00
Bartosz Taudul
3d2cc2d54d Display zone self time. 2019-01-23 13:44:11 +01:00
Bartosz Taudul
06292f1a3f Add zone child time getter. 2019-01-23 13:39:44 +01:00
Bartosz Taudul
ef17699887 Fix order of inline and base subframes. 2019-01-21 17:12:01 +01:00
Bartosz Taudul
c4f755e77b Update manual. 2019-01-20 19:44:07 +01:00
Bartosz Taudul
07ff01f4dd Update NEWS. 2019-01-20 19:35:25 +01:00
Bartosz Taudul
49b0a3500d Enable tracing incline functions in callstacks. 2019-01-20 19:33:37 +01:00
Bartosz Taudul
ddad475c19 Make it possible to store multiple frames at single frame address. 2019-01-20 19:11:48 +01:00
Bartosz Taudul
b9dc9f043c Make nohash operator() const. 2019-01-20 18:41:26 +01:00
Bartosz Taudul
bf7cc0a0d5 Add missing header for PRIxMAX. 2019-01-20 17:17:09 +01:00
Bartosz Taudul
481024e4ce Add libbacktrace to list of libraries in manual. 2019-01-20 17:08:56 +01:00
Bartosz Taudul
ed69b04cee Update NEWS. 2019-01-20 16:57:31 +01:00
Bartosz Taudul
9e7714c45a Decode callstack frames using libbacktrace. 2019-01-20 16:55:59 +01:00
Bartosz Taudul
79a5a860a5 Compile with libbacktrace on linux. 2019-01-20 16:55:33 +01:00
Bartosz Taudul
420a50feea Function inlining test. 2019-01-20 16:55:09 +01:00
Bartosz Taudul
0ce3dfaba7 Add modified libbacktrace.
https://github.com/ianlancetaylor/libbacktrace
5a99ff7fed66b8ea8f09c9805c138524a7035ece
2019-01-20 16:53:45 +01:00
Bartosz Taudul
d4e9baa0d9 Display time savings also as time percentage. 2019-01-20 03:16:32 +01:00
Rokas K. (rku)
31bbdfe2f2 Merged in rokups/tracy/mingw-support (pull request #26)
MingW support
2019-01-20 00:44:44 +00:00
Bartosz Taudul
f6edbccfc8 Fix triangle rendering. 2019-01-19 14:22:45 +01:00
Bartosz Taudul
ac791fd19f Update imgui to 1.67. Also update imguicolortextedit. 2019-01-19 14:05:54 +01:00
Rokas Kupstys
36c76456f7 Fix mistakes from MingW support commit. 2019-01-19 15:03:43 +02:00
Rokas Kupstys
8157e3a0b3 Fix builds with MingW. 2019-01-19 13:53:10 +02:00
Bartosz Taudul
32f0a27d3b Update manual. 2019-01-16 02:10:21 +01:00
Bartosz Taudul
92f3a4bba0 Add ZoneText and ZoneName to the C API. 2019-01-16 02:10:21 +01:00
Bartosz Taudul
49e270d8a6 Detect zone end without begin failure. 2019-01-16 00:45:48 +01:00
Bartosz Taudul
7b6b6862ed Add histogram, compare screenshots to README. 2019-01-15 19:55:41 +01:00
Bartosz Taudul
6a84520126 C is also supported. 2019-01-15 19:50:41 +01:00
Bartosz Taudul
c10e6051f2 Update manual. 2019-01-15 19:50:24 +01:00
Bartosz Taudul
b72d30af80 Allow disabling zone verification. 2019-01-15 18:59:05 +01:00
Bartosz Taudul
708fdfea49 Track memory alloc+free matching failures. 2019-01-15 18:56:26 +01:00
Bartosz Taudul
ecf9a299de Check for proper number of failure reasons. 2019-01-15 18:56:17 +01:00
Bartosz Taudul
76ab70a948 Simplify failure detection code. 2019-01-15 18:55:47 +01:00
Bartosz Taudul
3e3ee0ec2f There may be no source location associated with failure. 2019-01-15 18:54:41 +01:00
Bartosz Taudul
9944a73444 Store failure reason strings in Worker. 2019-01-15 18:42:15 +01:00
Bartosz Taudul
3cd97138fc Capture utility also displays failure messages. 2019-01-14 23:52:38 +01:00
Bartosz Taudul
5a3856dff0 Update NEWS. 2019-01-14 23:45:23 +01:00
Bartosz Taudul
57decf5875 Display failure information. 2019-01-14 23:42:58 +01:00
Bartosz Taudul
ac6e7439e2 TODO: track memory allocation tracking failures. 2019-01-14 23:26:32 +01:00
Bartosz Taudul
c3246ca3b5 Gracefully store failure states. 2019-01-14 23:22:31 +01:00
Bartosz Taudul
4dc339c933 Close connection when zone validation fails. 2019-01-14 23:12:11 +01:00
Bartosz Taudul
c3b67e4482 Perform zone stack validation. 2019-01-14 23:08:34 +01:00
Bartosz Taudul
dcc6bee607 Process zone validation messages. 2019-01-14 22:56:10 +01:00
Bartosz Taudul
8e52ab318b Send zone validation messages.
This is only performed for C API, as C++ scoped zones are always
properly ordered, due to RAII. With manual submission of zone begin and
end events there's no such guarantee.
2019-01-14 22:36:54 +01:00
Bartosz Taudul
970108fbbf Track zone id for verification purposes. 2019-01-14 22:36:54 +01:00
Bartosz Taudul
a2fd09d938 Add zone validation queue item. 2019-01-14 22:36:54 +01:00
Bartosz Taudul
1a8518dcc2 Allow filtering zones in on-demand mode. 2019-01-14 22:36:54 +01:00
Bartosz Taudul
1f0d1fdfdc C API prototype. 2019-01-14 21:07:29 +01:00
Bartosz Taudul
73cbd7dc3a Add deadlock test. 2019-01-14 18:48:16 +01:00
Bartosz Taudul
a5736a9c1b Change crash visuals in options menu. 2019-01-14 18:48:16 +01:00
Bartosz Taudul
12bd93ca5b Update manual. 2019-01-14 13:16:00 +01:00
Bartosz Taudul
a95e8a5424 Hide internals behind TracyVkCtx typedef. 2019-01-14 12:40:54 +01:00
Bartosz Taudul
070888f80d Make it possible to have multiple vulkan contexts.
API change!
2019-01-10 17:11:17 +01:00
Bartosz Taudul
ae288c6a6a Move vulkan macros at the end of TracyVulkan.hpp. 2019-01-10 16:41:04 +01:00
Bartosz Taudul
da8b01357d Proper skipping of locks in 0.4.1+ (fixes compare menu). 2019-01-08 17:19:04 +01:00
Bartosz Taudul
cb50cf9de6 Last time is stored in worker. 2019-01-08 15:44:29 +01:00
Bartosz Taudul
9c6d037859 Another unneeded capture. 2019-01-06 21:15:49 +01:00
Bartosz Taudul
096022a718 Proper string printing. 2019-01-06 21:15:26 +01:00
Bartosz Taudul
d1beb12dc3 Remove unused variable. 2019-01-06 21:14:02 +01:00
Bartosz Taudul
13a0ddfe03 No need to perform capture here. 2019-01-06 21:11:36 +01:00
Bartosz Taudul
fbe8eb3585 Fix initialization of atomics. 2019-01-06 21:09:56 +01:00
Bartosz Taudul
6a1c552c61 Reduce zone loading time. 2019-01-06 20:49:37 +01:00
Bartosz Taudul
d6953d5e73 Update NEWS. 2019-01-06 19:20:39 +01:00
Bartosz Taudul
dabdf1360f Display trace loading time. 2019-01-06 19:20:17 +01:00
Bartosz Taudul
77c9a8c407 Add support for notification text in View. 2019-01-06 19:14:24 +01:00
Bartosz Taudul
980c54e349 Track trace loading time. 2019-01-06 19:09:50 +01:00
Bartosz Taudul
5ac26ce084 Init common Worker variables in header. 2019-01-06 19:04:50 +01:00
Bartosz Taudul
a313ed4720 Track separate time offset for GPU times.
This is second version of 0.4.2 dump file format. Previous 0.4.2 format
cannot be read anymore.

041/aa.tracy (0.4.1) {18987 KB} -> 042/aa.tracy (0.4.2) {10051 KB}  52.94% size change
041/android.tracy (0.4.1) {696753 KB} -> 042/android.tracy (0.4.2) {542738 KB}  77.90% size change
041/asset-new.tracy (0.4.1) {97163 KB} -> 042/asset-new.tracy (0.4.2) {78402 KB}  80.69% size change
041/asset-new-id.tracy (0.4.1) {105683 KB} -> 042/asset-new-id.tracy (0.4.2) {84341 KB}  79.81% size change
041/asset-old.tracy (0.4.1) {100205 KB} -> 042/asset-old.tracy (0.4.2) {80688 KB}  80.52% size change
041/big.tracy (0.4.1) {2246014 KB} -> 042/big.tracy (0.4.2) {939578 KB}  41.83% size change
041/crash.tracy (0.4.1) {143 KB} -> 042/crash.tracy (0.4.2) {131 KB}  91.37% size change
041/crash2.tracy (0.4.1) {3411 KB} -> 042/crash2.tracy (0.4.2) {1420 KB}  41.63% size change
041/darkrl.tracy (0.4.1) {31818 KB} -> 042/darkrl.tracy (0.4.2) {15762 KB}  49.54% size change
041/darkrl2.tracy (0.4.1) {18778 KB} -> 042/darkrl2.tracy (0.4.2) {7945 KB}  42.31% size change
041/darkrl-old.tracy (0.4.1) {151346 KB} -> 042/darkrl-old.tracy (0.4.2) {67449 KB}  44.57% size change
041/deadlock.tracy (0.4.1) {53 KB} -> 042/deadlock.tracy (0.4.2) {52 KB}  98.55% size change
041/gn-opengl.tracy (0.4.1) {45860 KB} -> 042/gn-opengl.tracy (0.4.2) {29005 KB}  63.25% size change
041/gn-vulkan.tracy (0.4.1) {45618 KB} -> 042/gn-vulkan.tracy (0.4.2) {29352 KB}  64.34% size change
041/long.tracy (0.4.1) {1583550 KB} -> 042/long.tracy (0.4.2) {1182800 KB}  74.69% size change
041/mem.tracy (0.4.1) {1243058 KB} -> 042/mem.tracy (0.4.2) {1369067 KB}  110.14% size change
041/multi.tracy (0.4.1) {14519 KB} -> 042/multi.tracy (0.4.2) {8000 KB}  55.10% size change
041/new.tracy (0.4.1) {1439 KB} -> 042/new.tracy (0.4.2) {1105 KB}  76.75% size change
041/q3bsp-mt.tracy (0.4.1) {1414323 KB} -> 042/q3bsp-mt.tracy (0.4.2) {949855 KB}  67.16% size change
041/q3bsp-st.tracy (0.4.1) {301334 KB} -> 042/q3bsp-st.tracy (0.4.2) {240347 KB}  79.76% size change
041/selfprofile.tracy (0.4.1) {399648 KB} -> 042/selfprofile.tracy (0.4.2) {197704 KB}  49.47% size change
041/tbrowser.tracy (0.4.1) {13052 KB} -> 042/tbrowser.tracy (0.4.2) {9503 KB}  72.81% size change
041/test.tracy (0.4.1) {60309 KB} -> 042/test.tracy (0.4.2) {40700 KB}  67.49% size change
041/virtualfile_hc.tracy (0.4.1) {108967 KB} -> 042/virtualfile_hc.tracy (0.4.2) {72424 KB}  66.46% size change
041/zfile_hc.tracy (0.4.1) {58814 KB} -> 042/zfile_hc.tracy (0.4.2) {39418 KB}  67.02% size change
2019-01-03 21:52:43 +01:00
Bartosz Taudul
d49b005900 Display dump file size change in the update utility. 2018-12-30 23:47:43 +01:00
Bartosz Taudul
f8ef5b726a Store time deltas, instead of absolute time in trace dumps.
This change greatly reduces the size of saved dumps, but increase the
cost of processing during loading. One notable outlier in the dataset
below is mem.tracy, which increased in size, even if changes in the
memory dump saving scheme decrease size of the other traces.

041/aa.tracy (0.4.1) {18987 KB} -> 042/aa.tracy (0.4.2) {10140 KB}  53.40% size change
041/android.tracy (0.4.1) {696753 KB} -> 042/android.tracy (0.4.2) {542738 KB}  77.90% size change
041/asset-new.tracy (0.4.1) {97163 KB} -> 042/asset-new.tracy (0.4.2) {78402 KB}  80.69% size change
041/asset-new-id.tracy (0.4.1) {105683 KB} -> 042/asset-new-id.tracy (0.4.2) {84341 KB}  79.81% size change
041/asset-old.tracy (0.4.1) {100205 KB} -> 042/asset-old.tracy (0.4.2) {80688 KB}  80.52% size change
041/big.tracy (0.4.1) {2246014 KB} -> 042/big.tracy (0.4.2) {943083 KB}  41.99% size change
041/crash.tracy (0.4.1) {143 KB} -> 042/crash.tracy (0.4.2) {131 KB}  91.39% size change
041/crash2.tracy (0.4.1) {3411 KB} -> 042/crash2.tracy (0.4.2) {1425 KB}  41.80% size change
041/darkrl.tracy (0.4.1) {31818 KB} -> 042/darkrl.tracy (0.4.2) {15897 KB}  49.96% size change
041/darkrl2.tracy (0.4.1) {18778 KB} -> 042/darkrl2.tracy (0.4.2) {8002 KB}  42.62% size change
041/darkrl-old.tracy (0.4.1) {151346 KB} -> 042/darkrl-old.tracy (0.4.2) {67945 KB}  44.89% size change
041/deadlock.tracy (0.4.1) {53 KB} -> 042/deadlock.tracy (0.4.2) {52 KB}  98.55% size change
041/gn-opengl.tracy (0.4.1) {45860 KB} -> 042/gn-opengl.tracy (0.4.2) {30983 KB}  67.56% size change
041/gn-vulkan.tracy (0.4.1) {45618 KB} -> 042/gn-vulkan.tracy (0.4.2) {31349 KB}  68.72% size change
041/long.tracy (0.4.1) {1583550 KB} -> 042/long.tracy (0.4.2) {1225316 KB}  77.38% size change
041/mem.tracy (0.4.1) {1243058 KB} -> 042/mem.tracy (0.4.2) {1369291 KB}  110.15% size change
041/multi.tracy (0.4.1) {14519 KB} -> 042/multi.tracy (0.4.2) {8110 KB}  55.86% size change
041/new.tracy (0.4.1) {1439 KB} -> 042/new.tracy (0.4.2) {1108 KB}  77.01% size change
041/q3bsp-mt.tracy (0.4.1) {1414323 KB} -> 042/q3bsp-mt.tracy (0.4.2) {949855 KB}  67.16% size change
041/q3bsp-st.tracy (0.4.1) {301334 KB} -> 042/q3bsp-st.tracy (0.4.2) {240347 KB}  79.76% size change
041/selfprofile.tracy (0.4.1) {399648 KB} -> 042/selfprofile.tracy (0.4.2) {197713 KB}  49.47% size change
041/tbrowser.tracy (0.4.1) {13052 KB} -> 042/tbrowser.tracy (0.4.2) {9503 KB}  72.81% size change
041/test.tracy (0.4.1) {60309 KB} -> 042/test.tracy (0.4.2) {40700 KB}  67.49% size change
041/virtualfile_hc.tracy (0.4.1) {108967 KB} -> 042/virtualfile_hc.tracy (0.4.2) {72839 KB}  66.85% size change
041/zfile_hc.tracy (0.4.1) {58814 KB} -> 042/zfile_hc.tracy (0.4.2) {39608 KB}  67.35% size change
2018-12-30 23:42:17 +01:00
Bartosz Taudul
59ed5775d9 Release v0.4.1. 2018-12-30 17:57:55 +01:00
Bartosz Taudul
6c9337563d Update year in copyright notice. 2018-12-30 17:51:17 +01:00
Bartosz Taudul
370eda557c Manual improvements. 2018-12-30 17:50:52 +01:00
302 changed files with 249740 additions and 35499 deletions

View File

@@ -2,18 +2,26 @@ version: '{build}'
platform:
- x64
image:
- Visual Studio 2017
- Ubuntu
before_build:
- cmd: cd profiler\build\win32
- cmd: nuget restore
- cmd: cd ..\..\..
- Visual Studio 2019
- Ubuntu1804
install:
- cmd: cd c:\tools\vcpkg
- cmd: git pull
- cmd: bootstrap-vcpkg.bat
- cmd: vcpkg install freetype glfw3 capstone[core,arm,arm64,x86] --triplet x64-windows-static
- cmd: vcpkg integrate install
- cmd: cd %APPVEYOR_BUILD_FOLDER%
build_script:
- cmd: msbuild .\update\build\win32\update.vcxproj
- cmd: msbuild .\profiler\build\win32\Tracy.vcxproj
- cmd: msbuild .\capture\build\win32\capture.vcxproj
- sh: sudo apt-get update && sudo apt-get -y install libglfw3-dev libgtk2.0-dev
- cmd: msbuild .\library\win32\TracyProfiler.vcxproj /property:Configuration=Release
- sh: sudo apt-get update && sudo apt-get -y install libglfw3-dev libgtk2.0-dev libcapstone-dev
- sh: make -C update/build/unix debug release
- sh: make -C profiler/build/unix debug release
- sh: make -C capture/build/unix debug release
- sh: make -C library/unix debug release
- sh: make -C test
- sh: make -C test clean
- sh: make -C test TRACYFLAGS=-DTRACY_ON_DEMAND
test: off

1
.github/FUNDING.yml vendored Normal file
View File

@@ -0,0 +1 @@
github: wolfpld

BIN
.github/sponsor.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.0 KiB

30
.github/workflows/gcc.yml vendored Normal file
View File

@@ -0,0 +1,30 @@
name: gcc
on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Install libraries
run: sudo apt-get update && sudo apt-get -y install libglfw3-dev libgtk2.0-dev libcapstone-dev
- name: Profiler GUI
run: make -C profiler/build/unix debug release
- name: Update utility
run: make -C update/build/unix debug release
- name: Capture utility
run: make -C capture/build/unix debug release
- name: Library
run: make -C library/unix debug release
- name: Test application
run: |
make -C test
make -C test clean
make -C test TRACYFLAGS=-DTRACY_ON_DEMAND

28
.github/workflows/latex.yml vendored Normal file
View File

@@ -0,0 +1,28 @@
name: Manual
on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Fix stupidity
run: |
cp AUTHORS AUTHORS.
cp LICENSE LICENSE.
- name: Compile LaTeX
uses: xu-cheng/latex-action@v2
with:
working_directory: manual
root_file: tracy.tex
- uses: actions/upload-artifact@v2
with:
name: manual
path: manual/tracy.pdf

47
.github/workflows/msvc.yml vendored Normal file
View File

@@ -0,0 +1,47 @@
name: MSVC
on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
jobs:
build:
runs-on: windows-2019
steps:
- uses: actions/checkout@v2
- uses: microsoft/setup-msbuild@v1.0.0
- name: Integrate vcpkg
run: vcpkg integrate install
- name: Build vcpkg libraries
run: vcpkg install freetype glfw3 capstone[arm,arm64,x86] --triplet x64-windows-static
- name: Profiler GUI Debug
run: msbuild .\profiler\build\win32\Tracy.vcxproj /property:Configuration=Debug /property:Platform=x64
- name: Profiler GUI Release
run: msbuild .\profiler\build\win32\Tracy.vcxproj /property:Configuration=Release /property:Platform=x64
- name: Update utility Debug
run: msbuild .\update\build\win32\update.vcxproj /property:Configuration=Debug /property:Platform=x64
- name: Update utility Release
run: msbuild .\update\build\win32\update.vcxproj /property:Configuration=Release /property:Platform=x64
- name: Capture utility Debug
run: msbuild .\capture\build\win32\capture.vcxproj /property:Configuration=Debug /property:Platform=x64
- name: Capture utility Release
run: msbuild .\capture\build\win32\capture.vcxproj /property:Configuration=Release /property:Platform=x64
- name: Library
run: msbuild .\library\win32\TracyProfiler.vcxproj /property:Configuration=Release /property:Platform=x64
- name: Package binaries
run: |
mkdir bin
mkdir bin\dev
copy profiler\build\win32\x64\Release\Tracy.exe bin
copy update\build\win32\x64\Release\update.exe bin
copy capture\build\win32\x64\Release\capture.exe bin
copy library\win32\x64\Release\TracyProfiler.dll bin\dev
copy library\win32\x64\Release\TracyProfiler.lib bin\dev
7z a Tracy.7z bin
- uses: actions/upload-artifact@v2
with:
path: Tracy.7z

20
.gitignore vendored
View File

@@ -12,10 +12,18 @@ imgui.ini
test/tracy_test
test/tracy_test.exe
*/build/unix/*-*
manual/tracy.aux
manual/tracy.log
manual/tracy.out
manual/tracy.pdf
manual/tracy.synctex.gz
manual/tracy.toc
manual/t*.aux
manual/t*.log
manual/t*.out
manual/t*.pdf
manual/t*.synctex.gz
manual/t*.toc
manual/t*.bbl
manual/t*.blg
profiler/build/win32/packages
profiler/build/win32/Tracy.aps
# include the vcpkg install script but not the files it produces
vcpkg/*
!vcpkg/install_vcpkg_dependencies.bat
.deps/
.dirstamp

16
AUTHORS
View File

@@ -1,7 +1,11 @@
Bartosz Taudul <wolf.pld@gmail.com>
Kamil Klimek <kamil.klimek@sharkbits.com>
Bartosz Szreder <zgredder@gmail.com>
Arvid Gerstmann <dev@arvid-g.de>
Rokas Kupstys <rokups@zoho.com>
Till Rathmann <till.rathmann@gmx.de>
Sherief Farouk <sherief.personal@gmail.com>
Kamil Klimek <kamil.klimek@sharkbits.com> (initial find zone implementation)
Bartosz Szreder <zgredder@gmail.com> (view/worker split)
Arvid Gerstmann <dev@arvid-g.de> (compatibility fixes)
Rokas Kupstys <rokups@zoho.com> (compatibility fixes, initial CI work, MingW support)
Till Rathmann <till.rathmann@gmx.de> (DLL support)
Sherief Farouk <sherief.personal@gmail.com> (compatibility fixes)
Dedmen Miller <dedmen@dedmen.de> (find zone bug fixes, improvements)
Michał Cichoń <michcic@gmail.com> (OSX call stack decoding backport)
Thales Sabino <thales@codeplay.com> (OpenCL support)
Andrew Depke <andrewdepke@gmail.com> (Direct3D 12 support)

77
FAQ.md
View File

@@ -1,77 +0,0 @@
# A quick tracy FAQ
### I already use VTune/perf/Very Sleepy/callgrind/MSVC profiler.
These are statistical profilers, which can be used to find hot spots in the code. This is very useful, but it won't show you the underlying reason for semi-random frame stutter that may occur every couple of seconds.
### You can use Telemetry for that.
Telemetry license costs about 8000 $ per year. Tracy is open source software. Telemetry doesn't have Lua bindings.
### You can use the free Brofiler. Crytek does use it, so it has to be good.
After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 5 ns timer resolution. Brofiler event logging infrastructure seems to be over-engineered. Brofiler can't track lock contention, nor does it have Lua bindings.
### So tracy is supposedly faster?
My measurements show that logging a single zone with tracy takes only 15 ns. In theory, if the program was doing nothing else, tracy should be able to log 66 million zones per second.
### Bullshit, RAD is advertising that they are able only to log about a million zones, over the network nevertheless: "Capture over a million timing zones per second in real-time!"
Tracy can perform network transfer of 15 million zones per second. Should the client and server be on separate machines, this number will be even higher, but you will need more than a gigabit link to achieve the maximum throughput. [Click here for a video of a max-throughput capture.](https://www.youtube.com/watch?v=DSMIHShKGAc)
### Can I connect to my application at any time and start profiling at this moment?
By default no, all events are registered from the beginning of program execution and are waiting in a queue. There's a separate on-demand mode, enabled by using a `TRACY_ON_DEMAND` macro.
### Am I seeing correctly that the profiler allocates one gigabyte of memory per second?
Only in extreme cases. Normal usage has much lower memory pressure.
### Why do you do magic with the static initialization order? Everyone says that's a bad practice.
It allows tracking construction of static objects and memory allocations performed before main() is entered.
### There's no support for consoles.
Welp. But there's mobile support.
### I do need console support.
The code is open. Write your own, then send a patch.
### I don't believe you can capture a zone in 15 ns. Show me the code!
Following is the annotated assembly code (generated from C++ sources) that's responsible for logging start of the zone:
```
call qword ptr [__imp_GetCurrentThreadId]
mov r14d,eax
mov qword ptr [rsp+0F0h],r14 // save thread id for later use
mov r12d,10h
mov rax,qword ptr gs:[58h] // TLS
mov r15,qword ptr [rax] // queue address
mov rdi,qword ptr [r12+r15] // data address
mov rbp,qword ptr [rdi+20h] // buffer counter
mov rbx,rbp
and ebx,7Fh // 128 item buffer
jne Application::InnerLoop+66h --+
mov rdx,rbp |
mov rcx,rdi |
call enqueue_begin_alloc | // reclaim/alloc next buffer
shl rbx,5 <---------------------+ // buffer items are 32 bytes
add rbx,qword ptr [rdi+40h]
mov byte ptr [rbx],4 // queue item type
rdtscp
mov dword ptr [rbx+19h],ecx // cpu id
shl rdx,20h
or rax,rdx // 64 bit timestamp
mov qword ptr [rbx+1],rax
mov qword ptr [rbx+9],r14 // thread id
lea rax,[__tracy_source_location] // static struct address
mov qword ptr [rbx+11h],rax
lea rax,[rbp+1] // increment buffer counter
mov qword ptr [rdi+20h],rax
```
There's also a second code block, for the end of the zone. It's similar, but a bit smaller, as it can use some of the variables that were retrieved above.

View File

@@ -1,7 +1,7 @@
Tracy Profiler (https://bitbucket.org/wolfpld/tracy) is licensed under the
Tracy Profiler (https://github.com/wolfpld/tracy) is licensed under the
3-clause BSD license.
Copyright (c) 2017-2018, Bartosz Taudul <wolf.pld@gmail.com>
Copyright (c) 2017-2020, Bartosz Taudul <wolf.pld@gmail.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without

254
NEWS
View File

@@ -6,9 +6,261 @@ Note: Release numbers are nothing more than numbers. There are some
"missing" versions due to trace file changes during development. This is not
a mistake.
v0.5 (xxxx-xx-xx)
v0.7 (2020-06-11)
-----------------
This is the last release which will be able to load pre-v0.6 traces. Use the
update utility to convert your old traces now!
- chrome:tracing importer now imports zone metadata from "args" key.
- Added display of statistical mode to find zone menu.
- Automatic stack sampling is now available on windows.
- Properly handle tracing on long-running systems.
- Message list entries can now show associated frame image.
- Call stack window will now display module names.
- Symbol location in call stack window may now also display symbol address.
- Statistics menu can now be used to display call stack sampling data or
list available symbols.
- All call paths leading to the sampled instruction in a call stack can be
now displayed.
- Frame image compression ratio (lossless in-memory compression, not taking
into account DXT compression) is displayed in playback window.
- Allow reconnection straight from the discard data dialog.
- Added ability to set custom names for locks.
- Improved handling of network ports.
- Added time percentage display to instrumentation statistics.
- Display of ghost zones (generated from automated call stack sampling).
- Notify when empty labels display is enabled.
- Small fragments of executable code will be now sent from client to server.
- Added notification about query backlog.
- Fixed performance problem with query backlog.
- Display number of in-flight queries, in addition to query backlog.
- Improved failure reports.
- The capture utility will connect to localhost by default.
- Added optional support for QPC timer on windows.
- Complete rewrite of source file viewer. It is now 100% reliable when going
to a source location.
- Symbol source view was added.
- Extension of source file viewer.
- Can display source file, assembly view, or both at the same time.
- May include display of statistical profiling data.
- Ability to switch between source files which were used to build the
symbol.
- Ability to switch between inlined functions which are incorporated into
the symbol.
- Graphical representation of control flow in program.
- Display of micro-architectural data for each assembly instruction.
- Tracking register dependencies between assembly instructions.
- Disassembly may be saved to a file, in order to be processed by external
tools.
- If the default listening port is occupied, profiler will now try listening
on other ports.
- Added possibility to perform source file names substitution.
- Profiler windows can be now docked.
- CPU usage tooltip now displays a list of running threads.
- Added possibility to filter discovered clients list.
- Source files are now cached during capture.
- Profiler will now display a popup when application crashes.
- Added ability to send simple integral values as extra payload for zones.
- Per-frame zone times on the frames plot can now display self time.
- Ability to bind only on localhost interface.
- OpenCL profiling.
- Direct3D 12 profiling.
v0.6.3 (2020-02-13)
-------------------
- Fixed performance issues with loading saved traces on Ryzen CPUs.
- Profiler window contents are now properly updated during window resize.
- Improved tid to pid mapping on windows.
- Zero length and unfinished zones are no longer taken into account for
statistics.
- Build files for shared library are now available (experimental).
- GPU zones now also have "active" parameter.
- Further reduction of memory usage and on-disk trace size.
- Replaced ska::flat_hash_map with robin-hood-hashing.
- Speed-up rendering of long lists of items.
- Exact event time is displayed in some places in the UI.
- Memory allocation lists can now be sorted.
- Added display of trace file compression ratio.
- Optional Zstd compression of trace files.
- Frame images are now internally compressed using Zstd (instead of LZ4).
- Fix display of continuous frame set tooltips.
v0.6.2 (2019-12-30)
-------------------
- Improved call stack decoding on OSX.
- Collection of CPU topology data.
- C API now supports allocated source locations.
- Added chrome:tracing importer.
- Allow merging of ZoneText() strings.
- Time distribution can now show both exclusive and inclusive times.
- Display proper value of selection time in find zone menu.
- Implemented limiting find zone search to a specified time range.
- Highlight hovered zone from find zone menu zone list on the histogram.
- Allow copying user data directory location to the clipboard.
v0.6.1 (2019-11-28)
-------------------
- Dropped support for pre-v0.5 traces.
- Improve BSD support.
- GPU zone CPU thread highlight will now highlight whole thread, not only
the thread name.
- Added CPU thread highlight for CPU data items.
- Client parameters may be now set from the server.
- Minor UI fixes.
v0.6 (2019-11-17)
-----------------
This is the last release which will be able to load pre-v0.5 traces. Use the
update utility to convert your old traces now!
- Dropped support for pre-v0.4 traces.
- Major memory usage decrease.
- Significant network bandwidth decrease.
- Implemented context switch capture on selected platforms.
- Zone timings in various UI places can now take into account only the
time when the thread was executing.
- Zone information window can now display regions in which thread was
suspended by the operating system.
- CPUs on which the zone was running are enumerated.
- Thread activity regions can be graphed on the timeline.
- API breakage: SetThreadName() now only works on current thread.
- Fixed thread name retrieval after thread is destroyed.
- Added number of CPU cores to host info.
- Limited number of possible source locations to 64K.
- Limited supported capture length to 1.6 days.
- CPU cores are now displayed on the timeline.
- Thread execution workload is displayed, including threads from external
programs.
- Thread migrations across CPU cores can be graphed.
- System-wide workload distribution is now plotted on the timeline.
- Added "CPU data" window showing programs competing for CPU during the
capture.
- Switched to using native thread identifiers (relatively small numbers), as
opposed to pthreads identifiers, which in reality were pointers.
- Improved thread name discovery if context switch capture is enabled.
- Per-trace state is now preserved between profiling sessions:
- Timeline view position.
- Item categories draw/hide settings.
- Timeline zones will be highlighted using a different color, when a
matching time range is selected on histogram.
- Per-frame zone times are now displayed on the frames plot when a zone is
selected in the find zone menu.
- Zone color is now displayed in zone information window.
- Zone colors can now be determined basing on depth and thread or source
location.
- Thread colors are displayed across the profiler application.
- Frame times can be now compared.
- Expose more lock handling functionality.
- Network port can be now specified by the user.
- Proper handling of multithreaded Vulkan code.
- Added extreme compression level in update utility.
- Added time distribution data in the zone information window.
- Trace file name is now displayed in trace information window.
- Annotations can be now added to the timeline.
- Server now performs network data retrieval and decompression on a dedicated
thread.
- Added examples of Tracy integration.
- Allow grouping of zones in the find zone menu by zone parent or with no
grouping.
- Zone list in the statistics window can be now filtered.
- Implemented configuration of plots.
- Messages can now collect call stacks.
v0.5 (2019-08-10)
-----------------
This is the last release which will be able to load pre-v0.4 traces. Use the
update utility to convert your old traces now!
- Major decrease of trace dump file size.
- Major optimizations across the board.
- Vcpkg is now used for library management on Windows.
- Display dump file size change in the update utility.
- Added notification area.
- Display trace loading time.
- Display background processing tasks after trace is loaded.
- Display trace save notification.
- Show crash icon, if there was a crash.
- Added C API.
- Profiling session may now gracefully terminate, due to incorrect
instrumentation. A popup with termination reason will be displayed.
- Call stack improvements.
- Call stack frames now have a proper source file and file line
information on Linux.
- Single call stack frame may now have multiple entries, representing
inlined function calls.
- Call stack grouping in the find zone menu now has a special display
mode.
- Call stack memory allocations tree improvements:
- Add top-down variant to complement the previously available bottom-up
one.
- Add ability to group tree nodes by function name.
- Allow restricting tree to display only active allocations.
- Added support for Lua call stack capture.
- Self time of zones may be now displayed in the find zone menu.
- Added ability to disconnect from a client.
- Find zone groups can now be sorted by mean time per call.
- Zones displayed in the find zone menu can be now grouped by order of
appearance, execution time or name.
- Time is now displayed without trailing fractional zeros (e.g. "2.5 ms"
instead of "2.50 ms").
- Child zones displayed in zone info window can be now grouped by source
location.
- Selected or hovered lock is now highlighted on the timeline.
- Locks are now grouped into single and multithreaded (contended and
uncontended) in the options menu locks list.
- On broken platforms the profiler can now be initialized as needed (and
possible), taking a performance and functionality hit.
- User experience improvements in the graphical profiler.
- Thread position and height is now animated, to eliminate flickering that
was happening when depth of displayed zones was changing.
- Zooming in/out using the mouse wheel is now animated.
- Plot range adjustment is now animated.
- Various other UI improvements.
- System CPU usage is now being monitored.
- Threads that have nothing to display in the current view are now hidden by
default.
- Dimmed-out the timeline outside the profiling area.
- Source file view can now be opened also from statistics menu.
- Display standard deviation in find zone and compare traces menus.
- Display zone messages in zone information window.
- Display order of threads can be changed in the options menu.
- Prevent deadlocks by querying socket send buffer size.
- Frame set statistics can be now limited to frames visible on the screen.
- Messages can be now colored.
- Zone selection in compare traces menu can be now linked to the other
trace.
- Added support for frame image (screen shot) storage.
- Implemented ability to cut off outliers on histograms.
- Zone or frame that is currently hovered by the mouse cursor will be
highlighted on the histogram.
- Server now displays available clients in the local network.
- Source code whitespace visibility can now be enabled or disabled.
- Profiler will now check if proper timer readings can be performed on
x86/x64.
- Application can now log app-specific information, similarly to how the
host info reports system information.
- Message list will automatically scroll down to the most recent message.
- Feature will disable when the list is scrolled by user.
- To re-enable, scroll to the bottom of the list.
- Message list can be now filtered.
- A notification popup will be displayed during trace cleanup.
- Source file view won't be available if a source file is newer than the
capture.
- Added ability to set custom trace descriptions.
- Added frame time target lines.
- FPS counts are now displayed next to frame times.
- GPU drift value can be now automatically measured.
- Connection window is now a popup hidden under a dedicated button.
v0.4.1 (2018-12-30)
-------------------
- Active frame set can be now switched by clicking on a frame set on the
timeline.
- Add ability to go to a specified frame.

View File

@@ -1,69 +1,22 @@
# Tracy Profiler
[![Build status](https://ci.appveyor.com/api/projects/status/968a88arq06gm3el/branch/master?svg=true)](https://ci.appveyor.com/project/wolfpld/tracy/branch/master)
[![Sponsor](.github/sponsor.png)](https://github.com/sponsors/wolfpld/)
Tracy is a real time, nanosecond resolution frame profiler that can be used for remote or embedded telemetry of your application. It can profile CPU (C++, Lua), GPU (OpenGL, Vulkan) and memory. It also can display locks held by threads and their interactions with each other.
### A real time, nanosecond resolution, remote telemetry, hybrid frame and sampling profiler for games and other applications.
Tracy supports profiling CPU (C, C++11, Lua), GPU (OpenGL, Vulkan, OpenCL, Direct3D 12), memory, locks, context switches, per-frame screenshots and more.
For usage **and build process** instructions, consult the user manual [at the following address](https://github.com/wolfpld/tracy/releases).
![](doc/profiler.png)
Tracy requires compiler support for C++11, Thread Local Storage and a way to workaround static initialization order fiasco. There are no other requirements. The following platforms are confirmed to be working (this is not a complete list):
![](doc/profiler2.png)
- Windows (x86, x64)
- Linux (x86, x64, ARM, ARM64)
- Android (ARM, x86)
- FreeBSD (x64)
- Cygwin (x64)
- WSL (x64)
- OSX (x64)
The following compilers are supported:
- MSVC
- gcc
- clang
[Changelog](NEWS)
[Introduction to Tracy Profiler v0.2](https://www.youtube.com/watch?v=fB5B46lbapc)
[New features in Tracy Profiler v0.3](https://www.youtube.com/watch?v=3SXpDpDh2Uo)
[New features in Tracy Profiler v0.4](https://www.youtube.com/watch?v=eAkgkaO8B9o)
[A quick FAQ.](FAQ.md)
[List of changes.](NEWS)
### High-level overview
![](doc/design.svg)
Tracy is split into client and server side. The client side collects events using a high-efficiency queue and awaits for an incoming connection. The server part connects to client and receives collected data from the client, which is then reconstructed into a viewable timeline. The transfer is performed using a TCP connection.
### Performance impact
To check how much slowdown is introduced by using Tracy, I have profiled [etcpak](https://bitbucket.org/wolfpld/etcpak), which is the fastest ETC texture compression utility there is. I used an 8192×8192 test image as input data and instrumented everything down to the 4×4 pixel block compression function (that's 4 million blocks to compress). It should be noted that Tracy needs to calibrate its internal timers at each run. This introduces a delay of 115 ms (on my machine), which is negligible when doing lengthy profiling runs, but it skews the results of etcpak timing. The following times have this delay subtracted, to give focus on zone collection impact, which is the thing that really matters here.
| Scenario | Zones | Clean run | Profiling run | Difference |
|-------------------------------------------------------|---------|-----------|---------------|------------|
| Compression of an image to ETC1 format | 4194568 | 0.94 s | 1.003 s | +0.063 s |
| Compression of an image to ETC2 format, with mip-maps | 5592822 | 1.034 s | 1.119 s | +0.085 s |
In both scenarios the per-zone time cost is at ~15 ns. This is in line with the measured 8 ns single event collection time (each zone has to report start and end event).
## Usage instructions
The user manual for Tracy is available [at the following address](https://bitbucket.org/wolfpld/tracy/downloads/tracy.pdf). It provides information about the integration process, required code markup and so on.
## Features
#### Marking locks
![](doc/locks.png)
#### Plotting data
![](doc/plot.png)
#### Message log
![](doc/messages.png)
#### Approximation of capture cost
![](doc/cost.png)
[New features in Tracy Profiler v0.4](https://www.youtube.com/watch?v=eAkgkaO8B9o)
[New features in Tracy Profiler v0.5](https://www.youtube.com/watch?v=P6E7qLMmzTQ)
[New features in Tracy Profiler v0.6](https://www.youtube.com/watch?v=uJkrFgriuOo)
[New features in Tracy Profiler v0.7](https://www.youtube.com/watch?v=_hU7vw00MZ4)

7
TODO Normal file
View File

@@ -0,0 +1,7 @@
"Would be nice to have" list for 1.0 release:
=============================================
* Pack queue items tightly in the queues.
* Use level-of-detail system for plots.
* Use per-thread lock data structures.
* Use DTrace for BSD/OSX context switch capture.

View File

@@ -17,13 +17,19 @@
#define ZoneScopedNC(x,y)
#define ZoneText(x,y)
#define ZoneTextV(x,y,z)
#define ZoneName(x,y)
#define ZoneNameV(x,y,z)
#define ZoneValue(x)
#define ZoneValueV(x,y)
#define FrameMark
#define FrameMarkNamed(x)
#define FrameMarkStart(x)
#define FrameMarkEnd(x)
#define FrameImage(x,y,z,w,a)
#define TracyLockable( type, varname ) type varname;
#define TracyLockableN( type, varname, desc ) type varname;
#define TracySharedLockable( type, varname ) type varname;
@@ -31,11 +37,16 @@
#define LockableBase( type ) type
#define SharedLockableBase( type ) type
#define LockMark(x) (void)x;
#define LockableName(x,y,z);
#define TracyPlot(x,y)
#define TracyPlotConfig(x,y)
#define TracyMessage(x,y)
#define TracyMessageL(x)
#define TracyMessageC(x,y,z)
#define TracyMessageLC(x,y)
#define TracyAppInfo(x,y)
#define TracyAlloc(x,y)
#define TracyFree(x)
@@ -53,6 +64,14 @@
#define TracyAllocS(x,y,z)
#define TracyFreeS(x,y)
#define TracyMessageS(x,y,z)
#define TracyMessageLS(x,y)
#define TracyMessageCS(x,y,z,w)
#define TracyMessageLCS(x,y,z)
#define TracyParameterRegister(x)
#define TracyParameterSetup(x,y,z,w)
#else
#include "client/TracyLock.hpp"
@@ -77,13 +96,19 @@
#define ZoneScopedNC( name, color ) ZoneNamedNC( ___tracy_scoped_zone, name, color, true )
#define ZoneText( txt, size ) ___tracy_scoped_zone.Text( txt, size );
#define ZoneTextV( varname, txt, size ) varname.Text( txt, size );
#define ZoneName( txt, size ) ___tracy_scoped_zone.Name( txt, size );
#define ZoneNameV( varname, txt, size ) varname.Name( txt, size );
#define ZoneValue( value ) ___tracy_scoped_zone.Value( value );
#define ZoneValueV( varname, value ) varname.Value( value );
#define FrameMark tracy::Profiler::SendFrameMark();
#define FrameMarkNamed( name ) tracy::Profiler::SendFrameMark( name, tracy::QueueType::FrameMarkMsg );
#define FrameMark tracy::Profiler::SendFrameMark( nullptr );
#define FrameMarkNamed( name ) tracy::Profiler::SendFrameMark( name );
#define FrameMarkStart( name ) tracy::Profiler::SendFrameMark( name, tracy::QueueType::FrameMarkMsgStart );
#define FrameMarkEnd( name ) tracy::Profiler::SendFrameMark( name, tracy::QueueType::FrameMarkMsgEnd );
#define FrameImage( image, width, height, offset, flip ) tracy::Profiler::SendFrameImage( image, width, height, offset, flip );
#define TracyLockable( type, varname ) tracy::Lockable<type> varname { [] () -> const tracy::SourceLocationData* { static const tracy::SourceLocationData srcloc { nullptr, #type " " #varname, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define TracyLockableN( type, varname, desc ) tracy::Lockable<type> varname { [] () -> const tracy::SourceLocationData* { static const tracy::SourceLocationData srcloc { nullptr, desc, __FILE__, __LINE__, 0 }; return &srcloc; }() };
#define TracySharedLockable( type, varname ) tracy::SharedLockable<type> varname { [] () -> const tracy::SourceLocationData* { static const tracy::SourceLocationData srcloc { nullptr, #type " " #varname, __FILE__, __LINE__, 0 }; return &srcloc; }() };
@@ -91,16 +116,27 @@
#define LockableBase( type ) tracy::Lockable<type>
#define SharedLockableBase( type ) tracy::SharedLockable<type>
#define LockMark( varname ) static const tracy::SourceLocationData __tracy_lock_location_##varname { nullptr, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; varname.Mark( &__tracy_lock_location_##varname );
#define LockableName( varname, txt, size ) varname.CustomName( txt, size );
#define TracyPlot( name, val ) tracy::Profiler::PlotData( name, val );
#define TracyPlotConfig( name, type ) tracy::Profiler::ConfigurePlot( name, type );
#define TracyMessage( txt, size ) tracy::Profiler::Message( txt, size );
#define TracyMessageL( txt ) tracy::Profiler::Message( txt );
#define TracyAppInfo( txt, size ) tracy::Profiler::MessageAppInfo( txt, size );
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyMessage( txt, size ) tracy::Profiler::Message( txt, size, TRACY_CALLSTACK );
# define TracyMessageL( txt ) tracy::Profiler::Message( txt, TRACY_CALLSTACK );
# define TracyMessageC( txt, size, color ) tracy::Profiler::MessageColor( txt, size, color, TRACY_CALLSTACK );
# define TracyMessageLC( txt, color ) tracy::Profiler::MessageColor( txt, color, TRACY_CALLSTACK );
# define TracyAlloc( ptr, size ) tracy::Profiler::MemAllocCallstack( ptr, size, TRACY_CALLSTACK );
# define TracyFree( ptr ) tracy::Profiler::MemFreeCallstack( ptr, TRACY_CALLSTACK );
#else
# define TracyMessage( txt, size ) tracy::Profiler::Message( txt, size, 0 );
# define TracyMessageL( txt ) tracy::Profiler::Message( txt, 0 );
# define TracyMessageC( txt, size, color ) tracy::Profiler::MessageColor( txt, size, color, 0 );
# define TracyMessageLC( txt, color ) tracy::Profiler::MessageColor( txt, color, 0 );
# define TracyAlloc( ptr, size ) tracy::Profiler::MemAlloc( ptr, size );
# define TracyFree( ptr ) tracy::Profiler::MemFree( ptr );
#endif
@@ -114,10 +150,15 @@
# define ZoneScopedS( depth ) ZoneNamedS( ___tracy_scoped_zone, depth, true )
# define ZoneScopedNS( name, depth ) ZoneNamedNS( ___tracy_scoped_zone, name, depth, true )
# define ZoneScopedCS( color, depth ) ZoneNamedCS( ___tracy_scoped_zone, color, depth, true )
# define ZoneScopedNCS( name, color, depth ) ZoneNamedNCS( ___tracy_scoped_zone, name, color depth, true )
# define ZoneScopedNCS( name, color, depth ) ZoneNamedNCS( ___tracy_scoped_zone, name, color, depth, true )
# define TracyAllocS( ptr, size, depth ) tracy::Profiler::MemAllocCallstack( ptr, size, depth );
# define TracyFreeS( ptr, depth ) tracy::Profiler::MemFreeCallstack( ptr, depth );
# define TracyMessageS( txt, size, depth ) tracy::Profiler::Message( txt, size, depth );
# define TracyMessageLS( txt, depth ) tracy::Profiler::Message( txt, depth );
# define TracyMessageCS( txt, size, color, depth ) tracy::Profiler::MessageColor( txt, size, color, depth );
# define TracyMessageLCS( txt, color, depth ) tracy::Profiler::MessageColor( txt, color, depth );
#else
# define ZoneNamedS( varname, depth, active ) ZoneNamed( varname, active )
# define ZoneNamedNS( varname, name, depth, active ) ZoneNamedN( varname, name, active )
@@ -131,8 +172,16 @@
# define TracyAllocS( ptr, size, depth ) TracyAlloc( ptr, size )
# define TracyFreeS( ptr, depth ) TracyFree( ptr )
# define TracyMessageS( txt, size, depth ) TracyMessage( txt, size )
# define TracyMessageLS( txt, depth ) TracyMessageL( txt )
# define TracyMessageCS( txt, size, color, depth ) TracyMessageC( txt, size, color )
# define TracyMessageLCS( txt, color, depth ) TracyMessageLC( txt, color )
#endif
#define TracyParameterRegister( cb ) tracy::Profiler::ParameterRegister( cb );
#define TracyParameterSetup( idx, name, isBool, val ) tracy::Profiler::ParameterSetup( idx, name, isBool, val );
#endif
#endif

197
TracyC.h Normal file
View File

@@ -0,0 +1,197 @@
#ifndef __TRACYC_HPP__
#define __TRACYC_HPP__
#include <stddef.h>
#include <stdint.h>
#include "client/TracyCallstack.h"
#include "common/TracyApi.h"
#ifdef __cplusplus
extern "C" {
#endif
#ifndef TRACY_ENABLE
typedef const void* TracyCZoneCtx;
#define TracyCZone(c,x)
#define TracyCZoneN(c,x,y)
#define TracyCZoneC(c,x,y)
#define TracyCZoneNC(c,x,y,z)
#define TracyCZoneEnd(c)
#define TracyCZoneText(c,x,y)
#define TracyCZoneName(c,x,y)
#define TracyCZoneValue(c,x)
#define TracyCAlloc(x,y)
#define TracyCFree(x)
#define TracyCFrameMark
#define TracyCFrameMarkNamed(x)
#define TracyCFrameMarkStart(x)
#define TracyCFrameMarkEnd(x)
#define TracyCFrameImage(x,y,z,w,a)
#define TracyCPlot(x,y)
#define TracyCMessage(x,y)
#define TracyCMessageL(x)
#define TracyCMessageC(x,y,z)
#define TracyCMessageLC(x,y)
#define TracyCAppInfo(x,y)
#define TracyCZoneS(x,y,z)
#define TracyCZoneNS(x,y,z,w)
#define TracyCZoneCS(x,y,z,w)
#define TracyCZoneNCS(x,y,z,w,a)
#define TracyCAllocS(x,y,z)
#define TracyCFreeS(x,y)
#define TracyCMessageS(x,y,z)
#define TracyCMessageLS(x,y)
#define TracyCMessageCS(x,y,z,w)
#define TracyCMessageLCS(x,y,z)
#else
#ifndef TracyConcat
# define TracyConcat(x,y) TracyConcatIndirect(x,y)
#endif
#ifndef TracyConcatIndirect
# define TracyConcatIndirect(x,y) x##y
#endif
struct ___tracy_source_location_data
{
const char* name;
const char* function;
const char* file;
uint32_t line;
uint32_t color;
};
struct ___tracy_c_zone_context
{
uint32_t id;
int active;
};
// Some containers don't support storing const types.
// This struct, as visible to user, is immutable, so treat it as if const was declared here.
typedef /*const*/ struct ___tracy_c_zone_context TracyCZoneCtx;
TRACY_API uint64_t ___tracy_alloc_srcloc( uint32_t line, const char* source, const char* function );
TRACY_API uint64_t ___tracy_alloc_srcloc_name( uint32_t line, const char* source, const char* function, const char* name, size_t nameSz );
TRACY_API TracyCZoneCtx ___tracy_emit_zone_begin( const struct ___tracy_source_location_data* srcloc, int active );
TRACY_API TracyCZoneCtx ___tracy_emit_zone_begin_callstack( const struct ___tracy_source_location_data* srcloc, int depth, int active );
TRACY_API TracyCZoneCtx ___tracy_emit_zone_begin_alloc( uint64_t srcloc, int active );
TRACY_API TracyCZoneCtx ___tracy_emit_zone_begin_alloc_callstack( uint64_t srcloc, int depth, int active );
TRACY_API void ___tracy_emit_zone_end( TracyCZoneCtx ctx );
TRACY_API void ___tracy_emit_zone_text( TracyCZoneCtx ctx, const char* txt, size_t size );
TRACY_API void ___tracy_emit_zone_name( TracyCZoneCtx ctx, const char* txt, size_t size );
TRACY_API void ___tracy_emit_zone_value( TracyCZoneCtx ctx, uint64_t value );
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyCZone( ctx, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyCZoneN( ctx, name, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyCZoneC( ctx, color, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyCZoneNC( ctx, name, color, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), TRACY_CALLSTACK, active );
#else
# define TracyCZone( ctx, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin( &TracyConcat(__tracy_source_location,__LINE__), active );
# define TracyCZoneN( ctx, name, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin( &TracyConcat(__tracy_source_location,__LINE__), active );
# define TracyCZoneC( ctx, color, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin( &TracyConcat(__tracy_source_location,__LINE__), active );
# define TracyCZoneNC( ctx, name, color, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin( &TracyConcat(__tracy_source_location,__LINE__), active );
#endif
#define TracyCZoneEnd( ctx ) ___tracy_emit_zone_end( ctx );
#define TracyCZoneText( ctx, txt, size ) ___tracy_emit_zone_text( ctx, txt, size );
#define TracyCZoneName( ctx, txt, size ) ___tracy_emit_zone_name( ctx, txt, size );
#define TracyCZoneValue( ctx, value ) ___tracy_emit_zone_value( ctx, value );
TRACY_API void ___tracy_emit_memory_alloc( const void* ptr, size_t size );
TRACY_API void ___tracy_emit_memory_alloc_callstack( const void* ptr, size_t size, int depth );
TRACY_API void ___tracy_emit_memory_free( const void* ptr );
TRACY_API void ___tracy_emit_memory_free_callstack( const void* ptr, int depth );
TRACY_API void ___tracy_emit_message( const char* txt, size_t size, int callstack );
TRACY_API void ___tracy_emit_messageL( const char* txt, int callstack );
TRACY_API void ___tracy_emit_messageC( const char* txt, size_t size, uint32_t color, int callstack );
TRACY_API void ___tracy_emit_messageLC( const char* txt, uint32_t color, int callstack );
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyCAlloc( ptr, size ) ___tracy_emit_memory_alloc_callstack( ptr, size, TRACY_CALLSTACK )
# define TracyCFree( ptr ) ___tracy_emit_memory_free_callstack( ptr, TRACY_CALLSTACK )
# define TracyCMessage( txt, size ) ___tracy_emit_message( txt, size, TRACY_CALLSTACK );
# define TracyCMessageL( txt ) ___tracy_emit_messageL( txt, TRACY_CALLSTACK );
# define TracyCMessageC( txt, size, color ) ___tracy_emit_messageC( txt, size, color, TRACY_CALLSTACK );
# define TracyCMessageLC( txt, color ) ___tracy_emit_messageLC( txt, color, TRACY_CALLSTACK );
#else
# define TracyCAlloc( ptr, size ) ___tracy_emit_memory_alloc( ptr, size );
# define TracyCFree( ptr ) ___tracy_emit_memory_free( ptr );
# define TracyCMessage( txt, size ) ___tracy_emit_message( txt, size, 0 );
# define TracyCMessageL( txt ) ___tracy_emit_messageL( txt, 0 );
# define TracyCMessageC( txt, size, color ) ___tracy_emit_messageC( txt, size, color, 0 );
# define TracyCMessageLC( txt, color ) ___tracy_emit_messageLC( txt, color, 0 );
#endif
TRACY_API void ___tracy_emit_frame_mark( const char* name );
TRACY_API void ___tracy_emit_frame_mark_start( const char* name );
TRACY_API void ___tracy_emit_frame_mark_end( const char* name );
TRACY_API void ___tracy_emit_frame_image( const void* image, uint16_t w, uint16_t h, uint8_t offset, int flip );
#define TracyCFrameMark ___tracy_emit_frame_mark( 0 );
#define TracyCFrameMarkNamed( name ) ___tracy_emit_frame_mark( name );
#define TracyCFrameMarkStart( name ) ___tracy_emit_frame_mark_start( name );
#define TracyCFrameMarkEnd( name ) ___tracy_emit_frame_mark_end( name );
#define TracyCFrameImage( image, width, height, offset, flip ) ___tracy_emit_frame_image( image, width, height, offset, flip );
TRACY_API void ___tracy_emit_plot( const char* name, double val );
TRACY_API void ___tracy_emit_message_appinfo( const char* txt, size_t size );
#define TracyCPlot( name, val ) ___tracy_emit_plot( name, val );
#define TracyCAppInfo( txt, color ) ___tracy_emit_message_appinfo( txt, color );
#ifdef TRACY_HAS_CALLSTACK
# define TracyCZoneS( ctx, depth, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define TracyCZoneNS( ctx, name, depth, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define TracyCZoneCS( ctx, color, depth, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { NULL, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define TracyCZoneNCS( ctx, name, color, depth, active ) static const struct ___tracy_source_location_data TracyConcat(__tracy_source_location,__LINE__) = { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; TracyCZoneCtx ctx = ___tracy_emit_zone_begin_callstack( &TracyConcat(__tracy_source_location,__LINE__), depth, active );
# define TracyCAllocS( ptr, size, depth ) ___tracy_emit_memory_alloc_callstack( ptr, size, depth )
# define TracyCFreeS( ptr, depth ) ___tracy_emit_memory_free_callstack( ptr, depth )
# define TracyCMessageS( txt, size, depth ) ___tracy_emit_message( txt, size, depth );
# define TracyCMessageLS( txt, depth ) ___tracy_emit_messageL( txt, depth );
# define TracyCMessageCS( txt, size, color, depth ) ___tracy_emit_messageC( txt, size, color, depth );
# define TracyCMessageLCS( txt, color, depth ) ___tracy_emit_messageLC( txt, color, depth );
#else
# define TracyCZoneS( ctx, depth, active ) TracyCZone( ctx, active )
# define TracyCZoneNS( ctx, name, depth, active ) TracyCZoneN( ctx, name, active )
# define TracyCZoneCS( ctx, color, depth, active ) TracyCZoneC( ctx, color, active )
# define TracyCZoneNCS( ctx, name, color, depth, active ) TracyCZoneNC( ctx, name, color, active )
# define TracyCAllocS( ptr, size, depth ) TracyCAlloc( ptr, size )
# define TracyCFreeS( ptr, depth ) TracyCFree( ptr )
# define TracyCMessageS( txt, size, depth ) TracyCMessage( txt, size )
# define TracyCMessageLS( txt, depth ) TracyCMessageL( txt )
# define TracyCMessageCS( txt, size, color, depth ) TracyCMessageC( txt, size, color )
# define TracyCMessageLCS( txt, color, depth ) TracyCMessageLC( txt, color )
#endif
#endif
#ifdef __cplusplus
}
#endif
#endif

View File

@@ -18,8 +18,26 @@
#include "common/tracy_lz4.cpp"
#include "client/TracyProfiler.cpp"
#include "client/TracyCallstack.cpp"
#include "client/TracySysTime.cpp"
#include "client/TracySysTrace.cpp"
#include "common/TracySocket.cpp"
#include "client/tracy_rpmalloc.cpp"
#include "client/TracyDxt1.cpp"
#if TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 3 || TRACY_HAS_CALLSTACK == 4 || TRACY_HAS_CALLSTACK == 6
# include "libbacktrace/alloc.cpp"
# include "libbacktrace/dwarf.cpp"
# include "libbacktrace/fileline.cpp"
# include "libbacktrace/mmapio.cpp"
# include "libbacktrace/posix.cpp"
# include "libbacktrace/sort.cpp"
# include "libbacktrace/state.cpp"
# if TRACY_HAS_CALLSTACK == 4
# include "libbacktrace/macho.cpp"
# else
# include "libbacktrace/elf.cpp"
# endif
#endif
#ifdef _MSC_VER
# pragma comment(lib, "ws2_32.lib")

View File

@@ -1,73 +0,0 @@
//
// Tracy profiler
// ----------------
//
// On multi-DLL projects compile and
// link with this source file (and none
// other) in the executable and in
// DLLs / shared objects that link to
// the main DLL.
//
// Define TRACY_ENABLE to enable profiler.
#include "common/TracySystem.cpp"
#ifdef TRACY_ENABLE
#include "client/TracyProfiler.hpp"
#include "client/concurrentqueue.h"
#include "common/TracyQueue.hpp"
namespace tracy
{
#ifdef _MSC_VER
# define DLL_IMPORT __declspec(dllimport)
#else
# define DLL_IMPORT
#endif
DLL_IMPORT moodycamel::ConcurrentQueue<QueueItem>::ExplicitProducer* get_token();
DLL_IMPORT void*(*get_rpmalloc())(size_t size);
DLL_IMPORT void(*get_rpfree())(void* ptr);
DLL_IMPORT Profiler& get_profiler();
#if defined TRACY_HW_TIMER && __ARM_ARCH >= 6
DLL_IMPORT int64_t(*get_GetTimeImpl())();
int64_t(*GetTimeImpl)() = get_GetTimeImpl();
#endif
#ifdef TRACY_COLLECT_THREAD_NAMES
DLL_IMPORT std::atomic<ThreadNameData*>& get_threadNameData();
DLL_IMPORT void(*get_rpmalloc_thread_initialize())();
std::atomic<ThreadNameData*>& s_threadNameData = get_threadNameData();
void(*rpmalloc_thread_initialize_fpt)() = get_rpmalloc_thread_initialize();
void rpmalloc_thread_initialize(void)
{
rpmalloc_thread_initialize_fpt();
}
#endif
static void*(*rpmalloc_fpt)(size_t size) = get_rpmalloc();
static void(*rpfree_fpt)(void* ptr) = get_rpfree();
RPMALLOC_RESTRICT void* rpmalloc(size_t size)
{
return rpmalloc_fpt(size);
}
void rpfree(void* ptr)
{
rpfree_fpt(ptr);
}
Profiler& s_profiler = get_profiler();
thread_local ProducerWrapper s_token { get_token() };
}
#endif

351
TracyD3D12.hpp Normal file
View File

@@ -0,0 +1,351 @@
#ifndef __TRACYD3D12_HPP__
#define __TRACYD3D12_HPP__
#ifndef TRACY_ENABLE
#define TracyD3D12Context(device, queue) nullptr
#define TracyD3D12Destroy(ctx)
#define TracyD3D12NamedZone(ctx, varname, cmdList, name, active)
#define TracyD3D12NamedZoneC(ctx, varname, cmdList, name, color, active)
#define TracyD3D12Zone(ctx, cmdList, name)
#define TracyD3D12ZoneC(ctx, cmdList, name, color)
#define TracyD3D12Collect(ctx)
namespace tracy
{
class D3D12ZoneScope {};
}
using TracyD3D12Ctx = void*;
#else
#include "Tracy.hpp"
#include "client/TracyProfiler.hpp"
#include <cstdlib>
#include <cassert>
#include <d3d12.h>
#include <dxgi.h>
#include <wrl/client.h>
#include <queue>
namespace tracy
{
struct D3D12QueryPayload
{
uint32_t m_queryIdStart = 0;
uint32_t m_queryCount = 0;
};
// Command queue context.
class D3D12QueueCtx
{
friend class D3D12ZoneScope;
static constexpr uint32_t MaxQueries = 64 * 1024; // Queries are begin and end markers, so we can store half as many total time durations. Must be even!
bool m_initialized = false;
ID3D12Device* m_device;
ID3D12CommandQueue* m_queue;
uint8_t m_context;
Microsoft::WRL::ComPtr<ID3D12QueryHeap> m_queryHeap;
Microsoft::WRL::ComPtr<ID3D12Resource> m_readbackBuffer;
// In-progress payload.
uint32_t m_queryLimit = MaxQueries;
uint32_t m_queryCounter = 0;
uint32_t m_previousQueryCounter = 0;
uint32_t m_activePayload = 0;
Microsoft::WRL::ComPtr<ID3D12Fence> m_payloadFence;
std::queue<D3D12QueryPayload> m_payloadQueue;
public:
D3D12QueueCtx(ID3D12Device* device, ID3D12CommandQueue* queue)
: m_device(device)
, m_queue(queue)
, m_context(GetGpuCtxCounter().fetch_add(1, std::memory_order_relaxed))
{
// Verify we support timestamp queries on this queue.
if (queue->GetDesc().Type == D3D12_COMMAND_LIST_TYPE_COPY)
{
D3D12_FEATURE_DATA_D3D12_OPTIONS3 featureData{};
if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS3, &featureData, sizeof(featureData))))
{
assert(false && "Platform does not support profiling of copy queues.");
}
}
uint64_t timestampFrequency;
if (FAILED(queue->GetTimestampFrequency(&timestampFrequency)))
{
assert(false && "Failed to get timestamp frequency.");
}
uint64_t cpuTimestamp;
uint64_t gpuTimestamp;
if (FAILED(queue->GetClockCalibration(&gpuTimestamp, &cpuTimestamp)))
{
assert(false && "Failed to get queue clock calibration.");
}
cpuTimestamp = Profiler::GetTime();
D3D12_QUERY_HEAP_DESC heapDesc{};
heapDesc.Type = queue->GetDesc().Type == D3D12_COMMAND_LIST_TYPE_COPY ? D3D12_QUERY_HEAP_TYPE_COPY_QUEUE_TIMESTAMP : D3D12_QUERY_HEAP_TYPE_TIMESTAMP;
heapDesc.Count = m_queryLimit;
heapDesc.NodeMask = 0; // #TODO: Support multiple adapters.
while (FAILED(device->CreateQueryHeap(&heapDesc, IID_PPV_ARGS(&m_queryHeap))))
{
m_queryLimit /= 2;
heapDesc.Count = m_queryLimit;
}
// Create a readback buffer, which will be used as a destination for the query data.
D3D12_RESOURCE_DESC readbackBufferDesc{};
readbackBufferDesc.Alignment = 0;
readbackBufferDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
readbackBufferDesc.Width = m_queryLimit * sizeof(uint64_t);
readbackBufferDesc.Height = 1;
readbackBufferDesc.DepthOrArraySize = 1;
readbackBufferDesc.Format = DXGI_FORMAT_UNKNOWN;
readbackBufferDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR; // Buffers are always row major.
readbackBufferDesc.MipLevels = 1;
readbackBufferDesc.SampleDesc.Count = 1;
readbackBufferDesc.SampleDesc.Quality = 0;
readbackBufferDesc.Flags = D3D12_RESOURCE_FLAG_NONE;
D3D12_HEAP_PROPERTIES readbackHeapProps{};
readbackHeapProps.Type = D3D12_HEAP_TYPE_READBACK;
readbackHeapProps.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
readbackHeapProps.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
readbackHeapProps.CreationNodeMask = 0;
readbackHeapProps.VisibleNodeMask = 0; // #TODO: Support multiple adapters.
if (FAILED(device->CreateCommittedResource(&readbackHeapProps, D3D12_HEAP_FLAG_NONE, &readbackBufferDesc, D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&m_readbackBuffer))))
{
assert(false && "Failed to create query readback buffer.");
}
if (FAILED(device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&m_payloadFence))))
{
assert(false && "Failed to create payload fence.");
}
auto* item = Profiler::QueueSerial();
MemWrite(&item->hdr.type, QueueType::GpuNewContext);
MemWrite(&item->gpuNewContext.cpuTime, cpuTimestamp);
MemWrite(&item->gpuNewContext.gpuTime, gpuTimestamp);
memset(&item->gpuNewContext.thread, 0, sizeof(item->gpuNewContext.thread));
MemWrite(&item->gpuNewContext.period, 1E+09f / static_cast<float>(timestampFrequency));
MemWrite(&item->gpuNewContext.context, m_context);
MemWrite(&item->gpuNewContext.accuracyBits, uint8_t{ 0 });
MemWrite(&item->gpuNewContext.type, GpuContextType::Direct3D12);
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem(*item);
#endif
Profiler::QueueSerialFinish();
m_initialized = true;
}
void NewFrame()
{
m_payloadQueue.emplace(D3D12QueryPayload{ m_previousQueryCounter, m_queryCounter });
m_previousQueryCounter += m_queryCounter;
m_queryCounter = 0;
if (m_previousQueryCounter >= m_queryLimit)
{
m_previousQueryCounter -= m_queryLimit;
}
m_queue->Signal(m_payloadFence.Get(), ++m_activePayload);
}
void Collect()
{
ZoneScopedC(Color::Red4);
#ifdef TRACY_ON_DEMAND
if (!GetProfiler().IsConnected())
{
m_queryCounter = 0;
return;
}
#endif
// Find out what payloads are available.
const auto newestReadyPayload = m_payloadFence->GetCompletedValue();
const auto payloadCount = m_payloadQueue.size() - (m_activePayload - newestReadyPayload);
if (!payloadCount)
{
return; // No payloads are available yet, exit out.
}
D3D12_RANGE mapRange{ 0, m_queryLimit * sizeof(uint64_t) };
// Map the readback buffer so we can fetch the query data from the GPU.
void* readbackBufferMapping = nullptr;
if (FAILED(m_readbackBuffer->Map(0, &mapRange, &readbackBufferMapping)))
{
assert(false && "Failed to map readback buffer.");
}
auto* timestampData = static_cast<uint64_t*>(readbackBufferMapping);
for (uint32_t i = 0; i < payloadCount; ++i)
{
const auto& payload = m_payloadQueue.front();
for (uint32_t j = 0; j < payload.m_queryCount; ++j)
{
const auto counter = (payload.m_queryIdStart + j) % m_queryLimit;
const auto timestamp = timestampData[counter];
const auto queryId = counter;
auto* item = Profiler::QueueSerial();
MemWrite(&item->hdr.type, QueueType::GpuTime);
MemWrite(&item->gpuTime.gpuTime, timestamp);
MemWrite(&item->gpuTime.queryId, static_cast<uint16_t>(queryId));
MemWrite(&item->gpuTime.context, m_context);
Profiler::QueueSerialFinish();
}
m_payloadQueue.pop();
}
m_readbackBuffer->Unmap(0, nullptr);
}
private:
tracy_force_inline uint32_t NextQueryId()
{
assert(m_queryCounter < m_queryLimit && "Submitted too many GPU queries! Consider increasing MaxQueries.");
const uint32_t id = (m_previousQueryCounter + m_queryCounter) % m_queryLimit;
m_queryCounter += 2; // Allocate space for a begin and end query.
return id;
}
tracy_force_inline uint8_t GetId() const
{
return m_context;
}
};
class D3D12ZoneScope
{
const bool m_active;
D3D12QueueCtx* m_ctx = nullptr;
ID3D12GraphicsCommandList* m_cmdList = nullptr;
uint32_t m_queryId = 0; // Used for tracking in nested zones.
public:
tracy_force_inline D3D12ZoneScope(D3D12QueueCtx* ctx, ID3D12GraphicsCommandList* cmdList, const SourceLocationData* srcLocation, bool active)
#ifdef TRACY_ON_DEMAND
: m_active(active && GetProfiler().IsConnected())
#else
: m_active(active)
#endif
{
if (!m_active) return;
m_ctx = ctx;
m_cmdList = cmdList;
m_queryId = ctx->NextQueryId();
cmdList->EndQuery(ctx->m_queryHeap.Get(), D3D12_QUERY_TYPE_TIMESTAMP, m_queryId);
auto* item = Profiler::QueueSerial();
#if defined(TRACY_HAS_CALLSTACK) && defined(TRACY_CALLSTACK)
MemWrite(&item->hdr.type, QueueType::GpuZoneBeginCallstackSerial);
#else
MemWrite(&item->hdr.type, QueueType::GpuZoneBeginSerial);
#endif
MemWrite(&item->gpuZoneBegin.cpuTime, Profiler::GetTime());
MemWrite(&item->gpuZoneBegin.srcloc, reinterpret_cast<uint64_t>(srcLocation));
MemWrite(&item->gpuZoneBegin.thread, GetThreadHandle());
MemWrite(&item->gpuZoneBegin.queryId, static_cast<uint16_t>(m_queryId));
MemWrite(&item->gpuZoneBegin.context, ctx->GetId());
Profiler::QueueSerialFinish();
#if defined(TRACY_HAS_CALLSTACK) && defined(TRACY_CALLSTACK)
GetProfiler().SendCallstack(TRACY_CALLSTACK);
#endif
}
tracy_force_inline ~D3D12ZoneScope()
{
if (!m_active) return;
const auto queryId = m_queryId + 1; // Our end query slot is immediately after the begin slot.
m_cmdList->EndQuery(m_ctx->m_queryHeap.Get(), D3D12_QUERY_TYPE_TIMESTAMP, queryId);
auto* item = Profiler::QueueSerial();
MemWrite(&item->hdr.type, QueueType::GpuZoneEndSerial);
MemWrite(&item->gpuZoneEnd.cpuTime, Profiler::GetTime());
MemWrite(&item->gpuZoneEnd.thread, GetThreadHandle());
MemWrite(&item->gpuZoneEnd.queryId, static_cast<uint16_t>(queryId));
MemWrite(&item->gpuZoneEnd.context, m_ctx->GetId());
Profiler::QueueSerialFinish();
m_cmdList->ResolveQueryData(m_ctx->m_queryHeap.Get(), D3D12_QUERY_TYPE_TIMESTAMP, m_queryId, 2, m_ctx->m_readbackBuffer.Get(), m_queryId * sizeof(uint64_t));
}
};
static inline D3D12QueueCtx* CreateD3D12Context(ID3D12Device* device, ID3D12CommandQueue* queue)
{
InitRPMallocThread();
auto* ctx = static_cast<D3D12QueueCtx*>(tracy_malloc(sizeof(D3D12QueueCtx)));
new (ctx) D3D12QueueCtx{ device, queue };
return ctx;
}
static inline void DestroyD3D12Context(D3D12QueueCtx* ctx)
{
ctx->~D3D12QueueCtx();
tracy_free(ctx);
}
}
using TracyD3D12Ctx = tracy::D3D12QueueCtx*;
#define TracyD3D12Context(device, queue) tracy::CreateD3D12Context(device, queue);
#define TracyD3D12Destroy(ctx) tracy::DestroyD3D12Context(ctx);
#define TracyD3D12NewFrame(ctx) ctx->NewFrame();
#define TracyD3D12NamedZone(ctx, varname, cmdList, name, active) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location, __LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::D3D12ZoneScope varname{ ctx, cmdList, &TracyConcat(__tracy_gpu_source_location, __LINE__), active };
#define TracyD3D12NamedZoneC(ctx, varname, cmdList, name, color, active) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location, __LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::D3D12ZoneScope varname{ ctx, cmdList, &TracyConcat(__tracy_gpu_source_location, __LINE__), active };
#define TracyD3D12Zone(ctx, cmdList, name) TracyD3D12NamedZone(ctx, ___tracy_gpu_zone, cmdList, name, true)
#define TracyD3D12ZoneC(ctx, cmdList, name, color) TracyD3D12NamedZoneC(ctx, ___tracy_gpu_zone, cmdList, name, color, true)
#define TracyD3D12Collect(ctx) ctx->Collect();
#endif
#endif

View File

@@ -23,6 +23,10 @@ static inline void LuaRegister( lua_State* L )
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneBeginN" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneBeginS" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneBeginNS" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneEnd" );
lua_pushcfunction( L, detail::noop );
lua_setfield( L, -2, "ZoneText" );
@@ -81,6 +85,18 @@ static inline void LuaRemove( char* script )
memset( script, ' ', end - script );
script = end;
}
else if( strncmp( script + 10, "BeginS(", 7 ) == 0 )
{
auto end = FindEnd( script + 17 );
memset( script, ' ', end - script );
script = end;
}
else if( strncmp( script + 10, "BeginNS(", 8 ) == 0 )
{
auto end = FindEnd( script + 18 );
memset( script, ' ', end - script );
script = end;
}
else
{
script += 10;
@@ -112,6 +128,7 @@ static inline void LuaRemove( char* script )
#include "common/TracyColor.hpp"
#include "common/TracyAlign.hpp"
#include "common/TracyForceInline.hpp"
#include "common/TracySystem.hpp"
#include "client/TracyProfiler.hpp"
@@ -119,161 +136,190 @@ namespace tracy
{
#ifdef TRACY_ON_DEMAND
extern thread_local LuaZoneState s_luaZoneState;
TRACY_API LuaZoneState& GetLuaZoneState();
#endif
namespace detail
{
static inline int LuaZoneBegin( lua_State* L )
#ifdef TRACY_HAS_CALLSTACK
static tracy_force_inline void SendLuaCallstack( lua_State* L, uint32_t depth )
{
assert( depth <= 64 );
lua_Debug dbg[64];
const char* func[64];
uint32_t fsz[64];
uint32_t ssz[64];
uint32_t spaceNeeded = 4; // cnt
uint32_t cnt;
for( cnt=0; cnt<depth; cnt++ )
{
if( lua_getstack( L, cnt+1, dbg+cnt ) == 0 ) break;
lua_getinfo( L, "Snl", dbg+cnt );
func[cnt] = dbg[cnt].name ? dbg[cnt].name : dbg[cnt].short_src;
fsz[cnt] = uint32_t( strlen( func[cnt] ) );
ssz[cnt] = uint32_t( strlen( dbg[cnt].source ) );
spaceNeeded += fsz[cnt] + ssz[cnt];
}
spaceNeeded += cnt * ( 4 + 4 + 4 ); // source line, function string length, source string length
auto ptr = (char*)tracy_malloc( spaceNeeded + 4 );
auto dst = ptr;
memcpy( dst, &spaceNeeded, 4 ); dst += 4;
memcpy( dst, &cnt, 4 ); dst += 4;
for( uint32_t i=0; i<cnt; i++ )
{
const uint32_t line = dbg[i].currentline;
memcpy( dst, &line, 4 ); dst += 4;
memcpy( dst, fsz+i, 4 ); dst += 4;
memcpy( dst, func[i], fsz[i] ); dst += fsz[i];
memcpy( dst, ssz+i, 4 ); dst += 4;
memcpy( dst, dbg[i].source, ssz[i] ), dst += ssz[i];
}
assert( dst - ptr == spaceNeeded + 4 );
TracyLfqPrepare( QueueType::CallstackAlloc );
MemWrite( &item->callstackAlloc.ptr, (uint64_t)ptr );
MemWrite( &item->callstackAlloc.nativePtr, (uint64_t)Callstack( depth ) );
TracyLfqCommit;
}
static inline int LuaZoneBeginS( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
const auto zoneCnt = s_luaZoneState.counter++;
if( zoneCnt != 0 && !s_luaZoneState.active ) return 0;
s_luaZoneState.active = s_profiler.IsConnected();
if( !s_luaZoneState.active ) return 0;
const auto zoneCnt = GetLuaZoneState().counter++;
if( zoneCnt != 0 && !GetLuaZoneState().active ) return 0;
GetLuaZoneState().active = GetProfiler().IsConnected();
if( !GetLuaZoneState().active ) return 0;
#endif
const uint32_t color = Color::DeepSkyBlue3;
TracyLfqPrepare( QueueType::ZoneBeginAllocSrcLocCallstack );
lua_Debug dbg;
lua_getstack( L, 1, &dbg );
lua_getinfo( L, "Snl", &dbg );
const auto srcloc = Profiler::AllocSourceLocation( dbg.currentline, dbg.source, dbg.name ? dbg.name : dbg.short_src );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, srcloc );
TracyLfqCommit;
const uint32_t line = dbg.currentline;
const auto func = dbg.name ? dbg.name : dbg.short_src;
const auto fsz = strlen( func );
const auto ssz = strlen( dbg.source );
// Data layout:
// 4b payload size
// 4b color
// 4b source line
// fsz function name
// 1b null terminator
// ssz source file name
// 1b null terminator
const uint32_t sz = uint32_t( 4 + 4 + 4 + fsz + 1 + ssz + 1 );
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, &sz, 4 );
memcpy( ptr + 4, &color, 4 );
memcpy( ptr + 8, &line, 4 );
memcpy( ptr + 12, func, fsz+1 );
memcpy( ptr + 12 + fsz + 1, dbg.source, ssz + 1 );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBeginAllocSrcLoc );
#ifdef TRACY_RDTSCP_OPT
MemWrite( &item->zoneBegin.time, Profiler::GetTime( item->zoneBegin.cpu ) );
#ifdef TRACY_CALLSTACK
const uint32_t depth = TRACY_CALLSTACK;
#else
uint32_t cpu;
MemWrite( &item->zoneBegin.time, Profiler::GetTime( cpu ) );
MemWrite( &item->zoneBegin.cpu, cpu );
const auto depth = uint32_t( lua_tointeger( L, 1 ) );
#endif
MemWrite( &item->zoneBegin.thread, GetThreadHandle() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
SendLuaCallstack( L, depth );
return 0;
}
static inline int LuaZoneBeginNS( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
const auto zoneCnt = GetLuaZoneState().counter++;
if( zoneCnt != 0 && !GetLuaZoneState().active ) return 0;
GetLuaZoneState().active = GetProfiler().IsConnected();
if( !GetLuaZoneState().active ) return 0;
#endif
TracyLfqPrepare( QueueType::ZoneBeginAllocSrcLocCallstack );
lua_Debug dbg;
lua_getstack( L, 1, &dbg );
lua_getinfo( L, "Snl", &dbg );
size_t nsz;
const auto name = lua_tolstring( L, 1, &nsz );
const auto srcloc = Profiler::AllocSourceLocation( dbg.currentline, dbg.source, dbg.name ? dbg.name : dbg.short_src, name, nsz );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, srcloc );
TracyLfqCommit;
#ifdef TRACY_CALLSTACK
const uint32_t depth = TRACY_CALLSTACK;
#else
const auto depth = uint32_t( lua_tointeger( L, 2 ) );
#endif
SendLuaCallstack( L, depth );
return 0;
}
#endif
static inline int LuaZoneBegin( lua_State* L )
{
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
return LuaZoneBeginS( L );
#else
#ifdef TRACY_ON_DEMAND
const auto zoneCnt = GetLuaZoneState().counter++;
if( zoneCnt != 0 && !GetLuaZoneState().active ) return 0;
GetLuaZoneState().active = GetProfiler().IsConnected();
if( !GetLuaZoneState().active ) return 0;
#endif
TracyLfqPrepare( QueueType::ZoneBeginAllocSrcLoc );
lua_Debug dbg;
lua_getstack( L, 1, &dbg );
lua_getinfo( L, "Snl", &dbg );
const auto srcloc = Profiler::AllocSourceLocation( dbg.currentline, dbg.source, dbg.name ? dbg.name : dbg.short_src );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, srcloc );
TracyLfqCommit;
return 0;
#endif
}
static inline int LuaZoneBeginN( lua_State* L )
{
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
return LuaZoneBeginNS( L );
#else
#ifdef TRACY_ON_DEMAND
const auto zoneCnt = s_luaZoneState.counter++;
if( zoneCnt != 0 && !s_luaZoneState.active ) return 0;
s_luaZoneState.active = s_profiler.IsConnected();
if( !s_luaZoneState.active ) return 0;
const auto zoneCnt = GetLuaZoneState().counter++;
if( zoneCnt != 0 && !GetLuaZoneState().active ) return 0;
GetLuaZoneState().active = GetProfiler().IsConnected();
if( !GetLuaZoneState().active ) return 0;
#endif
const uint32_t color = Color::DeepSkyBlue3;
TracyLfqPrepare( QueueType::ZoneBeginAllocSrcLoc );
lua_Debug dbg;
lua_getstack( L, 1, &dbg );
lua_getinfo( L, "Snl", &dbg );
const uint32_t line = dbg.currentline;
const auto func = dbg.name ? dbg.name : dbg.short_src;
size_t nsz;
const auto name = lua_tolstring( L, 1, &nsz );
const auto fsz = strlen( func );
const auto ssz = strlen( dbg.source );
// Data layout:
// 4b payload size
// 4b color
// 4b source line
// fsz function name
// 1b null terminator
// ssz source file name
// 1b null terminator
// nsz zone name
const uint32_t sz = uint32_t( 4 + 4 + 4 + fsz + 1 + ssz + 1 + nsz );
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, &sz, 4 );
memcpy( ptr + 4, &color, 4 );
memcpy( ptr + 8, &line, 4 );
memcpy( ptr + 12, func, fsz+1 );
memcpy( ptr + 12 + fsz + 1, dbg.source, ssz + 1 );
memcpy( ptr + 12 + fsz + 1 + ssz + 1, name, nsz );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBeginAllocSrcLoc );
#ifdef TRACY_RDTSCP_OPT
MemWrite( &item->zoneBegin.time, Profiler::GetTime( item->zoneBegin.cpu ) );
#else
uint32_t cpu;
MemWrite( &item->zoneBegin.time, Profiler::GetTime( cpu ) );
MemWrite( &item->zoneBegin.cpu, cpu );
#endif
MemWrite( &item->zoneBegin.thread, GetThreadHandle() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
const auto srcloc = Profiler::AllocSourceLocation( dbg.currentline, dbg.source, dbg.name ? dbg.name : dbg.short_src, name, nsz );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, srcloc );
TracyLfqCommit;
return 0;
#endif
}
static inline int LuaZoneEnd( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
assert( s_luaZoneState.counter != 0 );
s_luaZoneState.counter--;
if( !s_luaZoneState.active ) return 0;
if( !s_profiler.IsConnected() )
assert( GetLuaZoneState().counter != 0 );
GetLuaZoneState().counter--;
if( !GetLuaZoneState().active ) return 0;
if( !GetProfiler().IsConnected() )
{
s_luaZoneState.active = false;
GetLuaZoneState().active = false;
return 0;
}
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneEnd );
#ifdef TRACY_RDTSCP_OPT
MemWrite( &item->zoneEnd.time, Profiler::GetTime( item->zoneEnd.cpu ) );
#else
uint32_t cpu;
MemWrite( &item->zoneEnd.time, Profiler::GetTime( cpu ) );
MemWrite( &item->zoneEnd.cpu, cpu );
#endif
MemWrite( &item->zoneEnd.thread, GetThreadHandle() );
tail.store( magic + 1, std::memory_order_release );
TracyLfqPrepare( QueueType::ZoneEnd );
MemWrite( &item->zoneEnd.time, Profiler::GetTime() );
TracyLfqCommit;
return 0;
}
static inline int LuaZoneText( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
if( !s_luaZoneState.active ) return 0;
if( !s_profiler.IsConnected() )
if( !GetLuaZoneState().active ) return 0;
if( !GetProfiler().IsConnected() )
{
s_luaZoneState.active = false;
GetLuaZoneState().active = false;
return 0;
}
#endif
@@ -281,27 +327,22 @@ static inline int LuaZoneText( lua_State* L )
auto txt = lua_tostring( L, 1 );
const auto size = strlen( txt );
Magic magic;
auto& token = s_token.ptr;
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneText );
MemWrite( &item->zoneText.thread, GetThreadHandle() );
TracyLfqPrepare( QueueType::ZoneText );
MemWrite( &item->zoneText.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
return 0;
}
static inline int LuaZoneName( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
if( !s_luaZoneState.active ) return 0;
if( !s_profiler.IsConnected() )
if( !GetLuaZoneState().active ) return 0;
if( !GetProfiler().IsConnected() )
{
s_luaZoneState.active = false;
GetLuaZoneState().active = false;
return 0;
}
#endif
@@ -309,41 +350,31 @@ static inline int LuaZoneName( lua_State* L )
auto txt = lua_tostring( L, 1 );
const auto size = strlen( txt );
Magic magic;
auto& token = s_token.ptr;
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneName );
MemWrite( &item->zoneText.thread, GetThreadHandle() );
TracyLfqPrepare( QueueType::ZoneName );
MemWrite( &item->zoneText.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
return 0;
}
static inline int LuaMessage( lua_State* L )
{
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return 0;
if( !GetProfiler().IsConnected() ) return 0;
#endif
auto txt = lua_tostring( L, 1 );
const auto size = strlen( txt );
Magic magic;
auto& token = s_token.ptr;
TracyLfqPrepare( QueueType::Message );
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::Message );
MemWrite( &item->message.time, Profiler::GetTime() );
MemWrite( &item->message.thread, GetThreadHandle() );
MemWrite( &item->message.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
return 0;
}
@@ -356,6 +387,17 @@ static inline void LuaRegister( lua_State* L )
lua_setfield( L, -2, "ZoneBegin" );
lua_pushcfunction( L, detail::LuaZoneBeginN );
lua_setfield( L, -2, "ZoneBeginN" );
#ifdef TRACY_HAS_CALLSTACK
lua_pushcfunction( L, detail::LuaZoneBeginS );
lua_setfield( L, -2, "ZoneBeginS" );
lua_pushcfunction( L, detail::LuaZoneBeginNS );
lua_setfield( L, -2, "ZoneBeginNS" );
#else
lua_pushcfunction( L, detail::LuaZoneBegin );
lua_setfield( L, -2, "ZoneBeginS" );
lua_pushcfunction( L, detail::LuaZoneBeginN );
lua_setfield( L, -2, "ZoneBeginNS" );
#endif
lua_pushcfunction( L, detail::LuaZoneEnd );
lua_setfield( L, -2, "ZoneEnd" );
lua_pushcfunction( L, detail::LuaZoneText );

333
TracyOpenCL.hpp Normal file
View File

@@ -0,0 +1,333 @@
#ifndef __TRACYOPENCL_HPP__
#define __TRACYOPENCL_HPP__
#if !defined TRACY_ENABLE
#define TracyCLContext(x, y) nullptr
#define TracyCLDestroy(x)
#define TracyCLNamedZone(c, x, y, z, w)
#define TracyCLNamedZoneC(c, x, y, z, w, a)
#define TracyCLZone(c, x, y)
#define TracyCLZoneC(c, x, y, z)
#define TracyCLCollect(c)
#define TracyCLNamedZoneS(c, x, y, z, w, a)
#define TracyCLNamedZoneCS(c, x, y, z, w, v, a)
#define TracyCLZoneS(c, x, y, z)
#define TracyCLZoneCS(c, x, y, z, w)
namespace tracy
{
class OpenCLCtxScope {};
}
using TracyCLCtx = void*;
#else
#include <CL/cl.h>
#include <atomic>
#include <cassert>
#include "Tracy.hpp"
#include "client/TracyCallstack.hpp"
#include "client/TracyProfiler.hpp"
#include "common/TracyAlloc.hpp"
namespace tracy {
enum class EventPhase : uint8_t
{
Begin,
End
};
struct EventInfo
{
cl_event event;
EventPhase phase;
};
class OpenCLCtx
{
public:
enum { QueryCount = 64 * 1024 };
OpenCLCtx(cl_context context, cl_device_id device)
: m_contextId(GetGpuCtxCounter().fetch_add(1, std::memory_order_relaxed))
, m_head(0)
, m_tail(0)
{
assert(m_contextId != 255);
m_hostStartTime = Profiler::GetTime();
m_deviceStartTime = GetDeviceTimestamp(context, device);
auto item = Profiler::QueueSerial();
MemWrite(&item->hdr.type, QueueType::GpuNewContext);
MemWrite(&item->gpuNewContext.cpuTime, m_hostStartTime);
MemWrite(&item->gpuNewContext.gpuTime, m_hostStartTime);
memset(&item->gpuNewContext.thread, 0, sizeof(item->gpuNewContext.thread));
MemWrite(&item->gpuNewContext.period, 1.0f);
MemWrite(&item->gpuNewContext.type, GpuContextType::OpenCL);
MemWrite(&item->gpuNewContext.context, (uint8_t) m_contextId);
MemWrite(&item->gpuNewContext.accuracyBits, (uint8_t)0);
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem(*item);
#endif
Profiler::QueueSerialFinish();
}
void Collect()
{
ZoneScopedC(Color::Red4);
if (m_tail == m_head) return;
#ifdef TRACY_ON_DEMAND
if (!GetProfiler().IsConnected())
{
m_head = m_tail = 0;
}
#endif
while (m_tail != m_head)
{
EventInfo eventInfo = m_query[m_tail];
cl_event event = eventInfo.event;
cl_int eventStatus;
cl_int err = clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, nullptr);
assert(err == CL_SUCCESS);
if (eventStatus != CL_COMPLETE) return;
cl_int eventInfoQuery = (eventInfo.phase == EventPhase::Begin)
? CL_PROFILING_COMMAND_START
: CL_PROFILING_COMMAND_END;
cl_ulong eventTimeStamp = 0;
err = clGetEventProfilingInfo(event, eventInfoQuery, sizeof(cl_ulong), &eventTimeStamp, nullptr);
assert(err == CL_SUCCESS);
assert(eventTimeStamp != 0);
auto item = Profiler::QueueSerial();
MemWrite(&item->hdr.type, QueueType::GpuTime);
MemWrite(&item->gpuTime.gpuTime, TimestampOffset(eventTimeStamp));
MemWrite(&item->gpuTime.queryId, (uint16_t)m_tail);
MemWrite(&item->gpuTime.context, m_contextId);
Profiler::QueueSerialFinish();
if (eventInfo.phase == EventPhase::End)
{
// Done with the event, so release it
assert(clReleaseEvent(event) == CL_SUCCESS);
}
m_tail = (m_tail + 1) % QueryCount;
}
}
tracy_force_inline uint8_t GetId() const
{
return m_contextId;
}
tracy_force_inline unsigned int NextQueryId(EventInfo eventInfo)
{
const auto id = m_head;
m_head = (m_head + 1) % QueryCount;
assert(m_head != m_tail);
m_query[id] = eventInfo;
return id;
}
tracy_force_inline EventInfo& GetQuery(unsigned int id)
{
assert(id < QueryCount);
return m_query[id];
}
private:
tracy_force_inline int64_t GetHostStartTime() const
{
return m_hostStartTime;
}
tracy_force_inline int64_t GetDeviceStartTime() const
{
return m_deviceStartTime;
}
tracy_force_inline int64_t TimestampOffset(int64_t deviceTimestamp) const
{
return m_hostStartTime + (deviceTimestamp - m_deviceStartTime);
}
tracy_force_inline int64_t GetDeviceTimestamp(cl_context context, cl_device_id device) const
{
cl_ulong deviceTimestamp = 0;
cl_int err = CL_SUCCESS;
cl_command_queue queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);
assert(err == CL_SUCCESS);
uint32_t dummyValue = 42;
cl_mem dummyBuffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(uint32_t), nullptr, &err);
assert(err == CL_SUCCESS);
cl_event writeBufferEvent;
err = clEnqueueWriteBuffer(queue, dummyBuffer, CL_TRUE, 0, sizeof(uint32_t), &dummyValue, 0, nullptr, &writeBufferEvent);
assert(err == CL_SUCCESS);
err = clWaitForEvents(1, &writeBufferEvent);
assert(err == CL_SUCCESS);
cl_int eventStatus;
err = clGetEventInfo(writeBufferEvent, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &eventStatus, nullptr);
assert(err == CL_SUCCESS);
assert(eventStatus == CL_COMPLETE);
err = clGetEventProfilingInfo(writeBufferEvent, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &deviceTimestamp, nullptr);
assert(err == CL_SUCCESS);
err = clReleaseEvent(writeBufferEvent);
assert(err == CL_SUCCESS);
err = clReleaseMemObject(dummyBuffer);
assert(err == CL_SUCCESS);
err = clReleaseCommandQueue(queue);
assert(err == CL_SUCCESS);
return (int64_t)deviceTimestamp;
}
unsigned int m_contextId;
EventInfo m_query[QueryCount];
unsigned int m_head;
unsigned int m_tail;
int64_t m_hostStartTime;
int64_t m_deviceStartTime;
};
class OpenCLCtxScope {
public:
tracy_force_inline OpenCLCtxScope(OpenCLCtx* ctx, const SourceLocationData* srcLoc, bool is_active)
#ifdef TRACY_ON_DEMAND
: m_active(is_active&& GetProfiler().IsConnected())
#else
: m_active(is_active)
#endif
, m_ctx(ctx)
, m_event(nullptr)
{
if (!m_active) return;
m_beginQueryId = ctx->NextQueryId(EventInfo{ nullptr, EventPhase::Begin });
auto item = Profiler::QueueSerial();
MemWrite(&item->hdr.type, QueueType::GpuZoneBeginSerial);
MemWrite(&item->gpuZoneBegin.cpuTime, Profiler::GetTime());
MemWrite(&item->gpuZoneBegin.srcloc, (uint64_t)srcLoc);
MemWrite(&item->gpuZoneBegin.thread, GetThreadHandle());
MemWrite(&item->gpuZoneBegin.queryId, (uint16_t)m_beginQueryId);
MemWrite(&item->gpuZoneBegin.context, ctx->GetId());
Profiler::QueueSerialFinish();
}
tracy_force_inline OpenCLCtxScope(OpenCLCtx* ctx, const SourceLocationData* srcLoc, int depth, bool is_active)
#ifdef TRACY_ON_DEMAND
: m_active(is_active&& GetProfiler().IsConnected())
#else
: m_active(is_active)
#endif
, m_ctx(ctx)
, m_event(nullptr)
{
if (!m_active) return;
m_beginQueryId = ctx->NextQueryId(EventInfo{ nullptr, EventPhase::Begin });
auto item = Profiler::QueueSerial();
MemWrite(&item->hdr.type, QueueType::GpuZoneBeginCallstackSerial);
MemWrite(&item->gpuZoneBegin.cpuTime, Profiler::GetTime());
MemWrite(&item->gpuZoneBegin.srcloc, (uint64_t)srcLoc);
MemWrite(&item->gpuZoneBegin.thread, GetThreadHandle());
MemWrite(&item->gpuZoneBegin.queryId, (uint16_t)m_beginQueryId);
MemWrite(&item->gpuZoneBegin.context, ctx->GetId());
Profiler::QueueSerialFinish();
GetProfiler().SendCallstack(depth);
}
tracy_force_inline void SetEvent(cl_event event)
{
m_event = event;
assert(clRetainEvent(m_event) == CL_SUCCESS);
m_ctx->GetQuery(m_beginQueryId).event = m_event;
}
tracy_force_inline ~OpenCLCtxScope()
{
const auto queryId = m_ctx->NextQueryId(EventInfo{ m_event, EventPhase::End });
auto item = Profiler::QueueSerial();
MemWrite(&item->hdr.type, QueueType::GpuZoneEndSerial);
MemWrite(&item->gpuZoneEnd.cpuTime, Profiler::GetTime());
MemWrite(&item->gpuZoneEnd.thread, GetThreadHandle());
MemWrite(&item->gpuZoneEnd.queryId, (uint16_t)queryId);
MemWrite(&item->gpuZoneEnd.context, m_ctx->GetId());
Profiler::QueueSerialFinish();
}
const bool m_active;
OpenCLCtx* m_ctx;
cl_event m_event;
unsigned int m_beginQueryId;
};
static inline OpenCLCtx* CreateCLContext(cl_context context, cl_device_id device)
{
InitRPMallocThread();
auto ctx = (OpenCLCtx*)tracy_malloc(sizeof(OpenCLCtx));
new (ctx) OpenCLCtx(context, device);
return ctx;
}
static inline void DestroyCLContext(OpenCLCtx* ctx)
{
ctx->~OpenCLCtx();
tracy_free(ctx);
}
} // namespace tracy
using TracyCLCtx = tracy::OpenCLCtx*;
#define TracyCLContext(context, device) tracy::CreateCLContext(context, device);
#define TracyCLDestroy(ctx) tracy::DestroyCLContext(ctx);
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyCLNamedZone(ctx, varname, name, active) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::OpenCLCtxScope varname(ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyCLNamedZoneC(ctx, varname, name, color, active) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::OpenCLCtxScope varname(ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyCLZone(ctx, name) TracyCLNamedZoneS(ctx, __tracy_gpu_zone, name, TRACY_CALLSTACK, true)
# define TracyCLZoneC(ctx, name, color) TracyCLNamedZoneCS(ctx, __tracy_gpu_zone, name, color, TRACY_CALLSTACK, true)
#else
# define TracyCLNamedZone(ctx, varname, name, active) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__){ name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::OpenCLCtxScope varname(ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), active);
# define TracyCLNamedZoneC(ctx, varname, name, color, active) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__){ name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::OpenCLCtxScope varname(ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), active);
# define TracyCLZone(ctx, name) TracyCLNamedZone(ctx, __tracy_gpu_zone, name, true)
# define TracyCLZoneC(ctx, name, color) TracyCLNamedZoneC(ctx, __tracy_gpu_zone, name, color, true )
#endif
#ifdef TRACY_HAS_CALLSTACK
# define TracyCLNamedZoneS(ctx, varname, name, depth, active) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__){ name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::OpenCLCtxScope varname(ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), depth, active);
# define TracyCLNamedZoneCS(ctx, varname, name, color, depth, active) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__){ name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::OpenCLCtxScope varname(ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), depth, active);
# define TracyCLZoneS(ctx, name, depth) TracyCLNamedZoneS(ctx, __tracy_gpu_zone, name, depth, true)
# define TracyCLZoneCS(ctx, name, color, depth) TracyCLNamedZoneCS(ctx, __tracy_gpu_zone, name, color, depth, true)
#else
#define TracyCLNamedZoneS(ctx, varname, name, depth, active) TracyCLNamedZone(ctx, varname, name, active)
#define TracyCLNamedZoneCS(ctx, varname, name, color, depth, active) TracyCLNamedZoneC(ctx, varname, name, color, active)
#define TracyCLZoneS(ctx, name, depth) TracyCLZone(ctx, name)
#define TracyCLZoneCS(ctx, name, color, depth) TracyCLZoneC(ctx, name, color)
#endif
#define TracyCLNamedZoneSetEvent(varname, event) varname.SetEvent(event)
#define TracyCLZoneSetEvent(event) __tracy_gpu_zone.SetEvent(event)
#define TracyCLCollect(ctx) ctx->Collect()
#endif
#endif

View File

@@ -1,22 +1,35 @@
#ifndef __TRACYOPENGL_HPP__
#define __TRACYOPENGL_HPP__
// Include this file after you include OpenGL 3.2 headers.
#if !defined GL_TIMESTAMP && !defined GL_TIMESTAMP_EXT
# error "You must include OpenGL 3.2 headers before including TracyOpenGL.hpp"
#endif
#if !defined TRACY_ENABLE || defined __APPLE__
#define TracyGpuContext
#define TracyGpuNamedZone(x,y)
#define TracyGpuNamedZoneC(x,y,z)
#define TracyGpuNamedZone(x,y,z)
#define TracyGpuNamedZoneC(x,y,z,w)
#define TracyGpuZone(x)
#define TracyGpuZoneC(x,y)
#define TracyGpuCollect
#define TracyGpuNamedZoneS(x,y,z)
#define TracyGpuNamedZoneCS(x,y,z,w)
#define TracyGpuNamedZoneS(x,y,z,w)
#define TracyGpuNamedZoneCS(x,y,z,w,a)
#define TracyGpuZoneS(x,y)
#define TracyGpuZoneCS(x,y,z)
namespace tracy
{
struct SourceLocationData;
class GpuCtxScope
{
public:
GpuCtxScope( const SourceLocationData*, bool ) {}
GpuCtxScope( const SourceLocationData*, int, bool ) {}
};
}
#else
#include <atomic>
@@ -29,28 +42,36 @@
#include "common/TracyAlign.hpp"
#include "common/TracyAlloc.hpp"
#define TracyGpuContext tracy::s_gpuCtx.ptr = (tracy::GpuCtx*)tracy::tracy_malloc( sizeof( tracy::GpuCtx ) ); new(tracy::s_gpuCtx.ptr) tracy::GpuCtx;
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyGpuNamedZone( varname, name ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), TRACY_CALLSTACK );
# define TracyGpuNamedZoneC( varname, name, color ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), TRACY_CALLSTACK );
# define TracyGpuZone( name ) TracyGpuNamedZoneS( ___tracy_gpu_zone, name, TRACY_CALLSTACK )
# define TracyGpuZoneC( name, color ) TracyGpuNamedZoneCS( ___tracy_gpu_zone, name, color, TRACY_CALLSTACK )
#else
# define TracyGpuNamedZone( varname, name ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__) );
# define TracyGpuNamedZoneC( varname, name, color ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__) );
# define TracyGpuZone( name ) TracyGpuNamedZone( ___tracy_gpu_zone, name )
# define TracyGpuZoneC( name, color ) TracyGpuNamedZoneC( ___tracy_gpu_zone, name, color )
#if !defined GL_TIMESTAMP && defined GL_TIMESTAMP_EXT
# define GL_TIMESTAMP GL_TIMESTAMP_EXT
# define GL_QUERY_COUNTER_BITS GL_QUERY_COUNTER_BITS_EXT
# define glGetQueryObjectiv glGetQueryObjectivEXT
# define glGetQueryObjectui64v glGetQueryObjectui64vEXT
# define glQueryCounter glQueryCounterEXT
#endif
#define TracyGpuCollect tracy::s_gpuCtx.ptr->Collect();
#define TracyGpuContext tracy::InitRPMallocThread(); tracy::GetGpuCtx().ptr = (tracy::GpuCtx*)tracy::tracy_malloc( sizeof( tracy::GpuCtx ) ); new(tracy::GetGpuCtx().ptr) tracy::GpuCtx;
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyGpuNamedZone( varname, name, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyGpuNamedZoneC( varname, name, color, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), TRACY_CALLSTACK, active );
# define TracyGpuZone( name ) TracyGpuNamedZoneS( ___tracy_gpu_zone, name, TRACY_CALLSTACK, true )
# define TracyGpuZoneC( name, color ) TracyGpuNamedZoneCS( ___tracy_gpu_zone, name, color, TRACY_CALLSTACK, true )
#else
# define TracyGpuNamedZone( varname, name, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), active );
# define TracyGpuNamedZoneC( varname, name, color, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), active );
# define TracyGpuZone( name ) TracyGpuNamedZone( ___tracy_gpu_zone, name, true )
# define TracyGpuZoneC( name, color ) TracyGpuNamedZoneC( ___tracy_gpu_zone, name, color, true )
#endif
#define TracyGpuCollect tracy::GetGpuCtx().ptr->Collect();
#ifdef TRACY_HAS_CALLSTACK
# define TracyGpuNamedZoneS( varname, name, depth ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), depth );
# define TracyGpuNamedZoneCS( varname, name, color, depth ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), depth );
# define TracyGpuZoneS( name, depth ) TracyGpuNamedZoneS( ___tracy_gpu_zone, name, depth )
# define TracyGpuZoneCS( name, color, depth ) TracyGpuNamedZoneCS( ___tracy_gpu_zone, name, color, depth )
# define TracyGpuNamedZoneS( varname, name, depth, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), depth, active );
# define TracyGpuNamedZoneCS( varname, name, color, depth, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::GpuCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), depth, active );
# define TracyGpuZoneS( name, depth ) TracyGpuNamedZoneS( ___tracy_gpu_zone, name, depth, true )
# define TracyGpuZoneCS( name, color, depth ) TracyGpuNamedZoneCS( ___tracy_gpu_zone, name, color, depth, true )
#else
# define TracyGpuNamedZoneS( varname, name, depth ) TracyGpuNamedZone( varname, name )
# define TracyGpuNamedZoneCS( varname, name, color, depth ) TracyGpuNamedZoneC( varname, name, color )
# define TracyGpuNamedZoneS( varname, name, depth, active ) TracyGpuNamedZone( varname, name, active )
# define TracyGpuNamedZoneCS( varname, name, color, depth, active ) TracyGpuNamedZoneC( varname, name, color, active )
# define TracyGpuZoneS( name, depth ) TracyGpuZone( name )
# define TracyGpuZoneCS( name, color, depth ) TracyGpuZoneC( name, color )
#endif
@@ -58,8 +79,6 @@
namespace tracy
{
extern std::atomic<uint8_t> s_gpuCtxCounter;
class GpuCtx
{
friend class GpuCtxScope;
@@ -68,7 +87,7 @@ class GpuCtx
public:
GpuCtx()
: m_context( s_gpuCtxCounter.fetch_add( 1, std::memory_order_relaxed ) )
: m_context( GetGpuCtxCounter().fetch_add( 1, std::memory_order_relaxed ) )
, m_head( 0 )
, m_tail( 0 )
{
@@ -84,23 +103,21 @@ public:
glGetQueryiv( GL_TIMESTAMP, GL_QUERY_COUNTER_BITS, &bits );
const float period = 1.f;
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::GpuNewContext );
const auto thread = GetThreadHandle();
TracyLfqPrepare( QueueType::GpuNewContext );
MemWrite( &item->gpuNewContext.cpuTime, tcpu );
MemWrite( &item->gpuNewContext.gpuTime, tgpu );
MemWrite( &item->gpuNewContext.thread, GetThreadHandle() );
MemWrite( &item->gpuNewContext.thread, thread );
MemWrite( &item->gpuNewContext.period, period );
MemWrite( &item->gpuNewContext.context, m_context );
MemWrite( &item->gpuNewContext.accuracyBits, (uint8_t)bits );
MemWrite( &item->gpuNewContext.type, GpuContextType::OpenGl );
#ifdef TRACY_ON_DEMAND
s_profiler.DeferItem( *item );
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
}
void Collect()
@@ -110,49 +127,28 @@ public:
if( m_tail == m_head ) return;
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() )
if( !GetProfiler().IsConnected() )
{
m_head = m_tail = 0;
return;
}
#endif
auto start = m_tail;
auto end = m_head + QueryCount;
auto cnt = ( end - start ) % QueryCount;
while( cnt > 1 )
while( m_tail != m_head )
{
auto mid = start + cnt / 2;
GLint available;
glGetQueryObjectiv( m_query[mid % QueryCount], GL_QUERY_RESULT_AVAILABLE, &available );
if( available )
{
start = mid;
}
else
{
end = mid;
}
cnt = ( end - start ) % QueryCount;
}
glGetQueryObjectiv( m_query[m_tail], GL_QUERY_RESULT_AVAILABLE, &available );
if( !available ) return;
start %= QueryCount;
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
while( m_tail != start )
{
uint64_t time;
glGetQueryObjectui64v( m_query[m_tail], GL_QUERY_RESULT, &time );
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::GpuTime );
TracyLfqPrepare( QueueType::GpuTime );
MemWrite( &item->gpuTime.gpuTime, (int64_t)time );
MemWrite( &item->gpuTime.queryId, (uint16_t)m_tail );
MemWrite( &item->gpuTime.context, m_context );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
m_tail = ( m_tail + 1 ) % QueryCount;
}
}
@@ -183,86 +179,71 @@ private:
unsigned int m_tail;
};
extern thread_local GpuCtxWrapper s_gpuCtx;
class GpuCtxScope
{
public:
tracy_force_inline GpuCtxScope( const SourceLocationData* srcloc )
tracy_force_inline GpuCtxScope( const SourceLocationData* srcloc, bool is_active )
#ifdef TRACY_ON_DEMAND
: m_active( s_profiler.IsConnected() )
: m_active( is_active && GetProfiler().IsConnected() )
#else
: m_active( is_active )
#endif
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto queryId = s_gpuCtx.ptr->NextQueryId();
glQueryCounter( s_gpuCtx.ptr->TranslateOpenGlQueryId( queryId ), GL_TIMESTAMP );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::GpuZoneBegin );
const auto queryId = GetGpuCtx().ptr->NextQueryId();
glQueryCounter( GetGpuCtx().ptr->TranslateOpenGlQueryId( queryId ), GL_TIMESTAMP );
TracyLfqPrepare( QueueType::GpuZoneBegin );
MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
memset( &item->gpuZoneBegin.thread, 0, sizeof( item->gpuZoneBegin.thread ) );
MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneBegin.context, s_gpuCtx.ptr->GetId() );
tail.store( magic + 1, std::memory_order_release );
MemWrite( &item->gpuZoneBegin.context, GetGpuCtx().ptr->GetId() );
TracyLfqCommit;
}
tracy_force_inline GpuCtxScope( const SourceLocationData* srcloc, int depth )
tracy_force_inline GpuCtxScope( const SourceLocationData* srcloc, int depth, bool is_active )
#ifdef TRACY_ON_DEMAND
: m_active( s_profiler.IsConnected() )
: m_active( is_active && GetProfiler().IsConnected() )
#else
: m_active( is_active )
#endif
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto queryId = s_gpuCtx.ptr->NextQueryId();
glQueryCounter( s_gpuCtx.ptr->TranslateOpenGlQueryId( queryId ), GL_TIMESTAMP );
const auto queryId = GetGpuCtx().ptr->NextQueryId();
glQueryCounter( GetGpuCtx().ptr->TranslateOpenGlQueryId( queryId ), GL_TIMESTAMP );
const auto thread = GetThreadHandle();
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::GpuZoneBeginCallstack );
TracyLfqPrepare( QueueType::GpuZoneBeginCallstack );
MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
MemWrite( &item->gpuZoneBegin.thread, thread );
MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneBegin.context, s_gpuCtx.ptr->GetId() );
tail.store( magic + 1, std::memory_order_release );
MemWrite( &item->gpuZoneBegin.context, GetGpuCtx().ptr->GetId() );
TracyLfqCommit;
s_profiler.SendCallstack( depth, thread );
GetProfiler().SendCallstack( depth );
}
tracy_force_inline ~GpuCtxScope()
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto queryId = s_gpuCtx.ptr->NextQueryId();
glQueryCounter( s_gpuCtx.ptr->TranslateOpenGlQueryId( queryId ), GL_TIMESTAMP );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::GpuZoneEnd );
const auto queryId = GetGpuCtx().ptr->NextQueryId();
glQueryCounter( GetGpuCtx().ptr->TranslateOpenGlQueryId( queryId ), GL_TIMESTAMP );
TracyLfqPrepare( QueueType::GpuZoneEnd );
MemWrite( &item->gpuZoneEnd.cpuTime, Profiler::GetTime() );
memset( &item->gpuZoneEnd.thread, 0, sizeof( item->gpuZoneEnd.thread ) );
MemWrite( &item->gpuZoneEnd.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneEnd.context, s_gpuCtx.ptr->GetId() );
tail.store( magic + 1, std::memory_order_release );
MemWrite( &item->gpuZoneEnd.context, GetGpuCtx().ptr->GetId() );
TracyLfqCommit;
}
private:
#ifdef TRACY_ON_DEMAND
const bool m_active;
#endif
};
}

View File

@@ -3,18 +3,25 @@
#if !defined TRACY_ENABLE
#define TracyVkContext(x,y,z,w)
#define TracyVkDestroy
#define TracyVkNamedZone(x,y,z)
#define TracyVkNamedZoneC(x,y,z,w)
#define TracyVkZone(x,y)
#define TracyVkZoneC(x,y,z)
#define TracyVkCollect(x)
#define TracyVkContext(x,y,z,w) nullptr
#define TracyVkDestroy(x)
#define TracyVkNamedZone(c,x,y,z,w)
#define TracyVkNamedZoneC(c,x,y,z,w,a)
#define TracyVkZone(c,x,y)
#define TracyVkZoneC(c,x,y,z)
#define TracyVkCollect(c,x)
#define TracyVkNamedZoneS(x,y,z,w)
#define TracyVkNamedZoneCS(x,y,z,w,v)
#define TracyVkZoneS(x,y,z)
#define TracyVkZoneCS(x,y,z,w)
#define TracyVkNamedZoneS(c,x,y,z,w,a)
#define TracyVkNamedZoneCS(c,x,y,z,w,v,a)
#define TracyVkZoneS(c,x,y,z)
#define TracyVkZoneCS(c,x,y,z,w)
namespace tracy
{
class VkCtxScope {};
}
using TracyVkCtx = void*;
#else
@@ -25,38 +32,9 @@
#include "client/TracyProfiler.hpp"
#include "client/TracyCallstack.hpp"
#define TracyVkContext( physdev, device, queue, cmdbuf ) tracy::s_vkCtx.ptr = (tracy::VkCtx*)tracy::tracy_malloc( sizeof( tracy::VkCtx ) ); new(tracy::s_vkCtx.ptr) tracy::VkCtx( physdev, device, queue, cmdbuf );
#define TracyVkDestroy tracy::s_vkCtx.ptr->~VkCtx(); tracy::tracy_free( tracy::s_vkCtx.ptr ); tracy::s_vkCtx.ptr = nullptr;
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyVkNamedZone( varname, cmdbuf, name ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::VkCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, TRACY_CALLSTACK );
# define TracyVkNamedZoneC( varname, cmdbuf, name, color ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::VkCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, TRACY_CALLSTACK );
# define TracyVkZone( cmdbuf, name ) TracyVkNamedZoneS( ___tracy_gpu_zone, cmdbuf, name, TRACY_CALLSTACK )
# define TracyVkZoneC( cmdbuf, name, color ) TracyVkNamedZoneCS( ___tracy_gpu_zone, cmdbuf, name, color, TRACY_CALLSTACK )
#else
# define TracyVkNamedZone( varname, cmdbuf, name ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::VkCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf );
# define TracyVkNamedZoneC( varname, cmdbuf, name, color ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::VkCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf );
# define TracyVkZone( cmdbuf, name ) TracyVkNamedZone( ___tracy_gpu_zone, cmdbuf, name )
# define TracyVkZoneC( cmdbuf, name, color ) TracyVkNamedZoneC( ___tracy_gpu_zone, cmdbuf, name, color )
#endif
#define TracyVkCollect( cmdbuf ) tracy::s_vkCtx.ptr->Collect( cmdbuf );
#ifdef TRACY_HAS_CALLSTACK
# define TracyVkNamedZoneS( varname, cmdbuf, name, depth ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::VkCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, depth );
# define TracyVkNamedZoneCS( varname, cmdbuf, name, color, depth ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::VkCtxScope varname( &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, depth );
# define TracyVkZoneS( cmdbuf, name, depth ) TracyVkNamedZoneS( ___tracy_gpu_zone, cmdbuf, name, depth )
# define TracyVkZoneCS( cmdbuf, name, color, depth ) TracyVkNamedZoneCS( ___tracy_gpu_zone, cmdbuf, name, color, depth )
#else
# define TracyVkNamedZoneS( varname, cmdbuf, name, depth ) TracyVkNamedZone( varname, cmdbuf, name )
# define TracyVkNamedZoneCS( varname, cmdbuf, name, color, depth ) TracyVkNamedZoneC( varname, cmdbuf, name, color )
# define TracyVkZoneS( cmdbuf, name, depth ) TracyVkZone( cmdbuf, name )
# define TracyVkZoneCS( cmdbuf, name, color, depth ) TracyVkZoneC( cmdbuf, name, color )
#endif
namespace tracy
{
extern std::atomic<uint8_t> s_gpuCtxCounter;
class VkCtx
{
friend class VkCtxScope;
@@ -66,11 +44,11 @@ class VkCtx
public:
VkCtx( VkPhysicalDevice physdev, VkDevice device, VkQueue queue, VkCommandBuffer cmdbuf )
: m_device( device )
, m_queue( queue )
, m_context( s_gpuCtxCounter.fetch_add( 1, std::memory_order_relaxed ) )
, m_context( GetGpuCtxCounter().fetch_add( 1, std::memory_order_relaxed ) )
, m_head( 0 )
, m_tail( 0 )
, m_oldCnt( 0 )
, m_queryCount( QueryCount )
{
assert( m_context != 255 );
@@ -80,9 +58,13 @@ public:
VkQueryPoolCreateInfo poolInfo = {};
poolInfo.sType = VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO;
poolInfo.queryCount = QueryCount;
poolInfo.queryCount = m_queryCount;
poolInfo.queryType = VK_QUERY_TYPE_TIMESTAMP;
vkCreateQueryPool( device, &poolInfo, nullptr, &m_query );
while( vkCreateQueryPool( device, &poolInfo, nullptr, &m_query ) != VK_SUCCESS )
{
m_queryCount /= 2;
poolInfo.queryCount = m_queryCount;
}
VkCommandBufferBeginInfo beginInfo = {};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
@@ -94,7 +76,7 @@ public:
submitInfo.pCommandBuffers = &cmdbuf;
vkBeginCommandBuffer( cmdbuf, &beginInfo );
vkCmdResetQueryPool( cmdbuf, m_query, 0, QueryCount );
vkCmdResetQueryPool( cmdbuf, m_query, 0, m_queryCount );
vkEndCommandBuffer( cmdbuf );
vkQueueSubmit( queue, 1, &submitInfo, VK_NULL_HANDLE );
vkQueueWaitIdle( queue );
@@ -115,10 +97,7 @@ public:
vkQueueSubmit( queue, 1, &submitInfo, VK_NULL_HANDLE );
vkQueueWaitIdle( queue );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuNewContext );
MemWrite( &item->gpuNewContext.cpuTime, tcpu );
MemWrite( &item->gpuNewContext.gpuTime, tgpu );
@@ -126,16 +105,19 @@ public:
MemWrite( &item->gpuNewContext.period, period );
MemWrite( &item->gpuNewContext.context, m_context );
MemWrite( &item->gpuNewContext.accuracyBits, uint8_t( 0 ) );
MemWrite( &item->gpuNewContext.type, GpuContextType::Vulkan );
#ifdef TRACY_ON_DEMAND
s_profiler.DeferItem( *item );
GetProfiler().DeferItem( *item );
#endif
Profiler::QueueSerialFinish();
tail.store( magic + 1, std::memory_order_release );
m_res = (int64_t*)tracy_malloc( sizeof( int64_t ) * m_queryCount );
}
~VkCtx()
{
tracy_free( m_res );
vkDestroyQueryPool( m_device, m_query, nullptr );
}
@@ -146,9 +128,9 @@ public:
if( m_tail == m_head ) return;
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() )
if( !GetProfiler().IsConnected() )
{
vkCmdResetQueryPool( cmdbuf, m_query, 0, QueryCount );
vkCmdResetQueryPool( cmdbuf, m_query, 0, m_queryCount );
m_head = m_tail = 0;
return;
}
@@ -162,41 +144,36 @@ public:
}
else
{
cnt = m_head < m_tail ? QueryCount - m_tail : m_head - m_tail;
cnt = m_head < m_tail ? m_queryCount - m_tail : m_head - m_tail;
}
int64_t res[QueryCount];
if( vkGetQueryPoolResults( m_device, m_query, m_tail, cnt, sizeof( res ), res, sizeof( *res ), VK_QUERY_RESULT_64_BIT ) == VK_NOT_READY )
if( vkGetQueryPoolResults( m_device, m_query, m_tail, cnt, sizeof( int64_t ) * m_queryCount, m_res, sizeof( int64_t ), VK_QUERY_RESULT_64_BIT ) == VK_NOT_READY )
{
m_oldCnt = cnt;
return;
}
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
for( unsigned int idx=0; idx<cnt; idx++ )
{
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuTime );
MemWrite( &item->gpuTime.gpuTime, res[idx] );
MemWrite( &item->gpuTime.gpuTime, m_res[idx] );
MemWrite( &item->gpuTime.queryId, uint16_t( m_tail + idx ) );
MemWrite( &item->gpuTime.context, m_context );
tail.store( magic + 1, std::memory_order_release );
Profiler::QueueSerialFinish();
}
vkCmdResetQueryPool( cmdbuf, m_query, m_tail, cnt );
m_tail += cnt;
if( m_tail == QueryCount ) m_tail = 0;
if( m_tail == m_queryCount ) m_tail = 0;
}
private:
tracy_force_inline unsigned int NextQueryId()
{
const auto id = m_head;
m_head = ( m_head + 1 ) % QueryCount;
m_head = ( m_head + 1 ) % m_queryCount;
assert( m_head != m_tail );
return id;
}
@@ -207,106 +184,138 @@ private:
}
VkDevice m_device;
VkQueue m_queue;
VkQueryPool m_query;
uint8_t m_context;
unsigned int m_head;
unsigned int m_tail;
unsigned int m_oldCnt;
};
unsigned int m_queryCount;
extern VkCtxWrapper s_vkCtx;
int64_t* m_res;
};
class VkCtxScope
{
public:
tracy_force_inline VkCtxScope( const SourceLocationData* srcloc, VkCommandBuffer cmdbuf )
: m_cmdbuf( cmdbuf )
tracy_force_inline VkCtxScope( VkCtx* ctx, const SourceLocationData* srcloc, VkCommandBuffer cmdbuf, bool is_active )
#ifdef TRACY_ON_DEMAND
, m_active( s_profiler.IsConnected() )
: m_active( is_active && GetProfiler().IsConnected() )
#else
: m_active( is_active )
#endif
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
auto ctx = s_vkCtx.ptr;
const auto queryId = ctx->NextQueryId();
vkCmdWriteTimestamp( cmdbuf, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, ctx->m_query, queryId );
m_cmdbuf = cmdbuf;
m_ctx = ctx;
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::GpuZoneBegin );
const auto queryId = ctx->NextQueryId();
vkCmdWriteTimestamp( cmdbuf, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, ctx->m_query, queryId );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuZoneBeginSerial );
MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
MemWrite( &item->gpuZoneBegin.thread, GetThreadHandle() );
MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneBegin.context, ctx->GetId() );
tail.store( magic + 1, std::memory_order_release );
Profiler::QueueSerialFinish();
}
tracy_force_inline VkCtxScope( const SourceLocationData* srcloc, VkCommandBuffer cmdbuf, int depth )
: m_cmdbuf( cmdbuf )
tracy_force_inline VkCtxScope( VkCtx* ctx, const SourceLocationData* srcloc, VkCommandBuffer cmdbuf, int depth, bool is_active )
#ifdef TRACY_ON_DEMAND
, m_active( s_profiler.IsConnected() )
: m_active( is_active && GetProfiler().IsConnected() )
#else
: m_active( is_active )
#endif
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
const auto thread = GetThreadHandle();
m_cmdbuf = cmdbuf;
m_ctx = ctx;
auto ctx = s_vkCtx.ptr;
const auto queryId = ctx->NextQueryId();
vkCmdWriteTimestamp( cmdbuf, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, ctx->m_query, queryId );
vkCmdWriteTimestamp( cmdbuf, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, ctx->m_query, queryId );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::GpuZoneBeginCallstack );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuZoneBeginCallstackSerial );
MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
MemWrite( &item->gpuZoneBegin.thread, thread );
MemWrite( &item->gpuZoneBegin.thread, GetThreadHandle() );
MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneBegin.context, ctx->GetId() );
tail.store( magic + 1, std::memory_order_release );
Profiler::QueueSerialFinish();
s_profiler.SendCallstack( depth, thread );
GetProfiler().SendCallstack( depth );
}
tracy_force_inline ~VkCtxScope()
{
#ifdef TRACY_ON_DEMAND
if( !m_active ) return;
#endif
auto ctx = s_vkCtx.ptr;
const auto queryId = ctx->NextQueryId();
vkCmdWriteTimestamp( m_cmdbuf, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, ctx->m_query, queryId );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::GpuZoneEnd );
const auto queryId = m_ctx->NextQueryId();
vkCmdWriteTimestamp( m_cmdbuf, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, m_ctx->m_query, queryId );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::GpuZoneEndSerial );
MemWrite( &item->gpuZoneEnd.cpuTime, Profiler::GetTime() );
MemWrite( &item->gpuZoneEnd.thread, GetThreadHandle() );
MemWrite( &item->gpuZoneEnd.queryId, uint16_t( queryId ) );
MemWrite( &item->gpuZoneEnd.context, ctx->GetId() );
tail.store( magic + 1, std::memory_order_release );
MemWrite( &item->gpuZoneEnd.context, m_ctx->GetId() );
Profiler::QueueSerialFinish();
}
private:
VkCommandBuffer m_cmdbuf;
#ifdef TRACY_ON_DEMAND
const bool m_active;
#endif
VkCommandBuffer m_cmdbuf;
VkCtx* m_ctx;
};
static inline VkCtx* CreateVkContext( VkPhysicalDevice physdev, VkDevice device, VkQueue queue, VkCommandBuffer cmdbuf )
{
InitRPMallocThread();
auto ctx = (VkCtx*)tracy_malloc( sizeof( VkCtx ) );
new(ctx) VkCtx( physdev, device, queue, cmdbuf );
return ctx;
}
static inline void DestroyVkContext( VkCtx* ctx )
{
ctx->~VkCtx();
tracy_free( ctx );
}
}
using TracyVkCtx = tracy::VkCtx*;
#define TracyVkContext( physdev, device, queue, cmdbuf ) tracy::CreateVkContext( physdev, device, queue, cmdbuf );
#define TracyVkDestroy( ctx ) tracy::DestroyVkContext( ctx );
#if defined TRACY_HAS_CALLSTACK && defined TRACY_CALLSTACK
# define TracyVkNamedZone( ctx, varname, cmdbuf, name, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, TRACY_CALLSTACK, active );
# define TracyVkNamedZoneC( ctx, varname, cmdbuf, name, color, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, TRACY_CALLSTACK, active );
# define TracyVkZone( ctx, cmdbuf, name ) TracyVkNamedZoneS( ctx, ___tracy_gpu_zone, cmdbuf, name, TRACY_CALLSTACK, true )
# define TracyVkZoneC( ctx, cmdbuf, name, color ) TracyVkNamedZoneCS( ctx, ___tracy_gpu_zone, cmdbuf, name, color, TRACY_CALLSTACK, true )
#else
# define TracyVkNamedZone( ctx, varname, cmdbuf, name, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, active );
# define TracyVkNamedZoneC( ctx, varname, cmdbuf, name, color, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, active );
# define TracyVkZone( ctx, cmdbuf, name ) TracyVkNamedZone( ctx, ___tracy_gpu_zone, cmdbuf, name, true )
# define TracyVkZoneC( ctx, cmdbuf, name, color ) TracyVkNamedZoneC( ctx, ___tracy_gpu_zone, cmdbuf, name, color, true )
#endif
#define TracyVkCollect( ctx, cmdbuf ) ctx->Collect( cmdbuf );
#ifdef TRACY_HAS_CALLSTACK
# define TracyVkNamedZoneS( ctx, varname, cmdbuf, name, depth, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, 0 }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, depth, active );
# define TracyVkNamedZoneCS( ctx, varname, cmdbuf, name, color, depth, active ) static const tracy::SourceLocationData TracyConcat(__tracy_gpu_source_location,__LINE__) { name, __FUNCTION__, __FILE__, (uint32_t)__LINE__, color }; tracy::VkCtxScope varname( ctx, &TracyConcat(__tracy_gpu_source_location,__LINE__), cmdbuf, depth, active );
# define TracyVkZoneS( ctx, cmdbuf, name, depth ) TracyVkNamedZoneS( ctx, ___tracy_gpu_zone, cmdbuf, name, depth, true )
# define TracyVkZoneCS( ctx, cmdbuf, name, color, depth ) TracyVkNamedZoneCS( ctx, ___tracy_gpu_zone, cmdbuf, name, color, depth, true )
#else
# define TracyVkNamedZoneS( ctx, varname, cmdbuf, name, depth, active ) TracyVkNamedZone( ctx, varname, cmdbuf, name, active )
# define TracyVkNamedZoneCS( ctx, varname, cmdbuf, name, color, depth, active ) TracyVkNamedZoneC( ctx, varname, cmdbuf, name, color, active )
# define TracyVkZoneS( ctx, cmdbuf, name, depth ) TracyVkZone( ctx, cmdbuf, name )
# define TracyVkZoneCS( ctx, cmdbuf, name, color, depth ) TracyVkZoneC( ctx, cmdbuf, name, color )
#endif
#endif
#endif

View File

@@ -1,8 +1,8 @@
CFLAGS +=
CXXFLAGS := $(CFLAGS) -std=gnu++17
DEFINES += -DTRACY_NO_STATISTICS
INCLUDES :=
LIBS := -lpthread
INCLUDES := $(shell pkg-config --cflags capstone)
LIBS := $(shell pkg-config --libs capstone) -lpthread
PROJECT := capture
IMAGE := $(PROJECT)-$(BUILD)
@@ -14,6 +14,11 @@ BASE2 := $(shell egrep 'ClCompile.*c"' ../win32/$(PROJECT).vcxproj | sed -e 's/.
SRC := $(filter-out $(FILTER),$(BASE))
SRC2 := $(filter-out $(FILTER),$(BASE2))
TBB := $(shell ld -ltbb -o /dev/null 2>/dev/null; echo $$?)
ifeq ($(TBB),0)
LIBS += -ltbb
endif
OBJDIRBASE := obj/$(BUILD)
OBJDIR := $(OBJDIRBASE)/o/o/o
@@ -46,7 +51,7 @@ $(IMAGE): $(OBJ) $(OBJ2)
$(CXX) $(CXXFLAGS) $(DEFINES) $(OBJ) $(OBJ2) $(LIBS) -o $@
ifneq "$(MAKECMDGOALS)" "clean"
-include $(addprefix $(OBJDIR)/,$(SRC:.cpp=.d)) %(addprefix $(OBJDIR)/,$(SRC2:.c=.d))
-include $(addprefix $(OBJDIR)/,$(SRC:.cpp=.d)) $(addprefix $(OBJDIR)/,$(SRC2:.c=.d))
endif
clean:

View File

@@ -1,11 +1,7 @@
ARCH := $(shell uname -m)
CFLAGS := -O3 -s -fomit-frame-pointer
CFLAGS := -O3 -s -march=native
DEFINES := -DNDEBUG
BUILD := release
ifeq ($(ARCH),x86_64)
CFLAGS += -msse4.1
endif
include build.mk

View File

@@ -22,32 +22,33 @@
<VCProjectVersion>15.0</VCProjectVersion>
<ProjectGuid>{447D58BF-94CD-4469-BB90-549C05D03E00}</ProjectGuid>
<RootNamespace>capture</RootNamespace>
<WindowsTargetPlatformVersion>10.0.16299.0</WindowsTargetPlatformVersion>
<WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
<VcpkgTriplet>x64-windows-static</VcpkgTriplet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v142</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v142</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v142</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v142</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
@@ -88,10 +89,12 @@
<PreprocessorDefinitions>TRACY_NO_STATISTICS;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;WIN32_LEAN_AND_MEAN;NOMINMAX;_USE_MATH_DEFINES;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<EnableEnhancedInstructionSet>AdvancedVectorExtensions2</EnableEnhancedInstructionSet>
<LanguageStandard>stdcpplatest</LanguageStandard>
<AdditionalIncludeDirectories>..\..\..\vcpkg\vcpkg\installed\x64-windows-static\include</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<AdditionalDependencies>ws2_32.lib;%(AdditionalDependencies)</AdditionalDependencies>
<AdditionalDependencies>ws2_32.lib;capstone.lib;%(AdditionalDependencies)</AdditionalDependencies>
<SubSystem>Console</SubSystem>
<AdditionalLibraryDirectories>..\..\..\vcpkg\vcpkg\installed\x64-windows-static\debug\lib</AdditionalLibraryDirectories>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
@@ -120,12 +123,14 @@
<PreprocessorDefinitions>TRACY_NO_STATISTICS;NDEBUG;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;WIN32_LEAN_AND_MEAN;NOMINMAX;_USE_MATH_DEFINES;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<EnableEnhancedInstructionSet>AdvancedVectorExtensions2</EnableEnhancedInstructionSet>
<LanguageStandard>stdcpplatest</LanguageStandard>
<AdditionalIncludeDirectories>..\..\..\vcpkg\vcpkg\installed\x64-windows-static\include</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<AdditionalDependencies>ws2_32.lib;%(AdditionalDependencies)</AdditionalDependencies>
<AdditionalDependencies>ws2_32.lib;capstone.lib;%(AdditionalDependencies)</AdditionalDependencies>
<SubSystem>Console</SubSystem>
<AdditionalLibraryDirectories>..\..\..\vcpkg\vcpkg\installed\x64-windows-static\lib</AdditionalLibraryDirectories>
</Link>
</ItemDefinitionGroup>
<ItemGroup>
@@ -134,7 +139,37 @@
<ClCompile Include="..\..\..\common\tracy_lz4.cpp" />
<ClCompile Include="..\..\..\common\tracy_lz4hc.cpp" />
<ClCompile Include="..\..\..\server\TracyMemory.cpp" />
<ClCompile Include="..\..\..\server\TracyMmap.cpp" />
<ClCompile Include="..\..\..\server\TracyPrint.cpp" />
<ClCompile Include="..\..\..\server\TracyTaskDispatch.cpp" />
<ClCompile Include="..\..\..\server\TracyTextureCompression.cpp" />
<ClCompile Include="..\..\..\server\TracyThreadCompress.cpp" />
<ClCompile Include="..\..\..\server\TracyWorker.cpp" />
<ClCompile Include="..\..\..\zstd\debug.c" />
<ClCompile Include="..\..\..\zstd\entropy_common.c" />
<ClCompile Include="..\..\..\zstd\error_private.c" />
<ClCompile Include="..\..\..\zstd\fse_compress.c" />
<ClCompile Include="..\..\..\zstd\fse_decompress.c" />
<ClCompile Include="..\..\..\zstd\hist.c" />
<ClCompile Include="..\..\..\zstd\huf_compress.c" />
<ClCompile Include="..\..\..\zstd\huf_decompress.c" />
<ClCompile Include="..\..\..\zstd\pool.c" />
<ClCompile Include="..\..\..\zstd\threading.c" />
<ClCompile Include="..\..\..\zstd\xxhash.c" />
<ClCompile Include="..\..\..\zstd\zstdmt_compress.c" />
<ClCompile Include="..\..\..\zstd\zstd_common.c" />
<ClCompile Include="..\..\..\zstd\zstd_compress.c" />
<ClCompile Include="..\..\..\zstd\zstd_compress_literals.c" />
<ClCompile Include="..\..\..\zstd\zstd_compress_sequences.c" />
<ClCompile Include="..\..\..\zstd\zstd_compress_superblock.c" />
<ClCompile Include="..\..\..\zstd\zstd_ddict.c" />
<ClCompile Include="..\..\..\zstd\zstd_decompress.c" />
<ClCompile Include="..\..\..\zstd\zstd_decompress_block.c" />
<ClCompile Include="..\..\..\zstd\zstd_double_fast.c" />
<ClCompile Include="..\..\..\zstd\zstd_fast.c" />
<ClCompile Include="..\..\..\zstd\zstd_lazy.c" />
<ClCompile Include="..\..\..\zstd\zstd_ldm.c" />
<ClCompile Include="..\..\..\zstd\zstd_opt.c" />
<ClCompile Include="..\..\src\capture.cpp" />
<ClCompile Include="..\..\src\getopt.c" />
</ItemGroup>
@@ -147,19 +182,51 @@
<ClInclude Include="..\..\..\common\TracyQueue.hpp" />
<ClInclude Include="..\..\..\common\TracySocket.hpp" />
<ClInclude Include="..\..\..\common\TracySystem.hpp" />
<ClInclude Include="..\..\..\common\tracy_benaphore.h" />
<ClInclude Include="..\..\..\common\tracy_lz4.hpp" />
<ClInclude Include="..\..\..\common\tracy_lz4hc.hpp" />
<ClInclude Include="..\..\..\common\tracy_sema.h" />
<ClInclude Include="..\..\..\server\TracyCharUtil.hpp" />
<ClInclude Include="..\..\..\server\TracyEvent.hpp" />
<ClInclude Include="..\..\..\server\TracyFileRead.hpp" />
<ClInclude Include="..\..\..\server\TracyFileWrite.hpp" />
<ClInclude Include="..\..\..\server\TracyMemory.hpp" />
<ClInclude Include="..\..\..\server\TracyMmap.hpp" />
<ClInclude Include="..\..\..\server\TracyPopcnt.hpp" />
<ClInclude Include="..\..\..\server\TracyPrint.hpp" />
<ClInclude Include="..\..\..\server\TracySlab.hpp" />
<ClInclude Include="..\..\..\server\TracyTaskDispatch.hpp" />
<ClInclude Include="..\..\..\server\TracyTextureCompression.hpp" />
<ClInclude Include="..\..\..\server\TracyThreadCompress.hpp" />
<ClInclude Include="..\..\..\server\TracyVector.hpp" />
<ClInclude Include="..\..\..\server\TracyWorker.hpp" />
<ClInclude Include="..\..\..\server\tracy_flat_hash_map.hpp" />
<ClInclude Include="..\..\..\zstd\bitstream.h" />
<ClInclude Include="..\..\..\zstd\compiler.h" />
<ClInclude Include="..\..\..\zstd\cpu.h" />
<ClInclude Include="..\..\..\zstd\debug.h" />
<ClInclude Include="..\..\..\zstd\error_private.h" />
<ClInclude Include="..\..\..\zstd\fse.h" />
<ClInclude Include="..\..\..\zstd\hist.h" />
<ClInclude Include="..\..\..\zstd\huf.h" />
<ClInclude Include="..\..\..\zstd\mem.h" />
<ClInclude Include="..\..\..\zstd\pool.h" />
<ClInclude Include="..\..\..\zstd\threading.h" />
<ClInclude Include="..\..\..\zstd\xxhash.h" />
<ClInclude Include="..\..\..\zstd\zstd.h" />
<ClInclude Include="..\..\..\zstd\zstdmt_compress.h" />
<ClInclude Include="..\..\..\zstd\zstd_compress_internal.h" />
<ClInclude Include="..\..\..\zstd\zstd_compress_literals.h" />
<ClInclude Include="..\..\..\zstd\zstd_compress_sequences.h" />
<ClInclude Include="..\..\..\zstd\zstd_compress_superblock.h" />
<ClInclude Include="..\..\..\zstd\zstd_cwksp.h" />
<ClInclude Include="..\..\..\zstd\zstd_ddict.h" />
<ClInclude Include="..\..\..\zstd\zstd_decompress_block.h" />
<ClInclude Include="..\..\..\zstd\zstd_decompress_internal.h" />
<ClInclude Include="..\..\..\zstd\zstd_double_fast.h" />
<ClInclude Include="..\..\..\zstd\zstd_errors.h" />
<ClInclude Include="..\..\..\zstd\zstd_fast.h" />
<ClInclude Include="..\..\..\zstd\zstd_internal.h" />
<ClInclude Include="..\..\..\zstd\zstd_lazy.h" />
<ClInclude Include="..\..\..\zstd\zstd_ldm.h" />
<ClInclude Include="..\..\..\zstd\zstd_opt.h" />
<ClInclude Include="..\..\src\getopt.h" />
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />

View File

@@ -10,6 +10,9 @@
<Filter Include="common">
<UniqueIdentifier>{e39d3623-47cd-4752-8da9-3ea324f964c1}</UniqueIdentifier>
</Filter>
<Filter Include="zstd">
<UniqueIdentifier>{043ecb94-f240-4986-94b0-bc5bbd415a82}</UniqueIdentifier>
</Filter>
</ItemGroup>
<ItemGroup>
<ClCompile Include="..\..\..\common\tracy_lz4.cpp">
@@ -36,6 +39,96 @@
<ClCompile Include="..\..\..\common\tracy_lz4hc.cpp">
<Filter>common</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyPrint.cpp">
<Filter>server</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyThreadCompress.cpp">
<Filter>server</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyTaskDispatch.cpp">
<Filter>server</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\debug.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\entropy_common.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\error_private.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\fse_compress.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\fse_decompress.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\hist.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\huf_compress.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\huf_decompress.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\pool.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\threading.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\xxhash.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_common.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_compress.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_compress_literals.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_compress_sequences.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_compress_superblock.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_ddict.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_decompress.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_decompress_block.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_double_fast.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_fast.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_lazy.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_ldm.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstd_opt.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\zstd\zstdmt_compress.c">
<Filter>zstd</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyMmap.cpp">
<Filter>server</Filter>
</ClCompile>
<ClCompile Include="..\..\..\server\TracyTextureCompression.cpp">
<Filter>server</Filter>
</ClCompile>
</ItemGroup>
<ItemGroup>
<ClInclude Include="..\..\..\common\tracy_lz4.hpp">
@@ -62,9 +155,6 @@
<ClInclude Include="..\..\..\common\TracySystem.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\tracy_flat_hash_map.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyCharUtil.hpp">
<Filter>server</Filter>
</ClInclude>
@@ -95,14 +185,113 @@
<ClInclude Include="..\..\..\common\TracyAlign.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\tracy_benaphore.h">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\tracy_sema.h">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\common\tracy_lz4hc.hpp">
<Filter>common</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyPrint.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyThreadCompress.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyTaskDispatch.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\bitstream.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\compiler.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\cpu.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\debug.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\error_private.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\fse.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\hist.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\huf.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\mem.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\pool.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\threading.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\xxhash.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_compress_internal.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_compress_literals.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_compress_sequences.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_compress_superblock.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_cwksp.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_ddict.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_decompress_block.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_decompress_internal.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_double_fast.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_errors.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_fast.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_internal.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_lazy.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_ldm.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstd_opt.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\zstd\zstdmt_compress.h">
<Filter>zstd</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyFileRead.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyMmap.hpp">
<Filter>server</Filter>
</ClInclude>
<ClInclude Include="..\..\..\server\TracyTextureCompression.hpp">
<Filter>server</Filter>
</ClInclude>
</ItemGroup>
</Project>

View File

@@ -5,6 +5,7 @@
#include <chrono>
#include <inttypes.h>
#include <mutex>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
@@ -12,94 +13,21 @@
#include "../../common/TracyProtocol.hpp"
#include "../../server/TracyFileWrite.hpp"
#include "../../server/TracyMemory.hpp"
#include "../../server/TracyPrint.hpp"
#include "../../server/TracyWorker.hpp"
#include "getopt.h"
static const char* TimeToString( int64_t ns )
bool disconnect = false;
void SigInt( int )
{
enum { Pool = 8 };
static char bufpool[Pool][64];
static int bufsel = 0;
char* buf = bufpool[bufsel];
bufsel = ( bufsel + 1 ) % Pool;
const char* sign = "";
if( ns < 0 )
{
sign = "-";
ns = -ns;
}
if( ns < 1000 )
{
sprintf( buf, "%s%" PRIi64 " ns", sign, ns );
}
else if( ns < 1000ll * 1000 )
{
sprintf( buf, "%s%.2f us", sign, ns / 1000. );
}
else if( ns < 1000ll * 1000 * 1000 )
{
sprintf( buf, "%s%.2f ms", sign, ns / ( 1000. * 1000. ) );
}
else if( ns < 1000ll * 1000 * 1000 * 60 )
{
sprintf( buf, "%s%.2f s", sign, ns / ( 1000. * 1000. * 1000. ) );
}
else
{
const auto m = int64_t( ns / ( 1000ll * 1000 * 1000 * 60 ) );
const auto s = int64_t( ns - m * ( 1000ll * 1000 * 1000 * 60 ) );
sprintf( buf, "%s%" PRIi64 ":%04.1f", sign, m, s / ( 1000. * 1000. * 1000. ) );
}
return buf;
disconnect = true;
}
static const char* RealToString( double val, bool separator )
{
enum { Pool = 8 };
static char bufpool[Pool][64];
static int bufsel = 0;
char* buf = bufpool[bufsel];
bufsel = ( bufsel + 1 ) % Pool;
sprintf( buf, "%f", val );
auto ptr = buf;
if( *ptr == '-' ) ptr++;
const auto vbegin = ptr;
if( separator )
{
while( *ptr != '\0' && *ptr != ',' && *ptr != '.' ) ptr++;
auto end = ptr;
while( *end != '\0' ) end++;
auto sz = end - ptr;
while( ptr - vbegin > 3 )
{
ptr -= 3;
memmove( ptr+1, ptr, sz );
*ptr = ',';
sz += 4;
}
}
while( *ptr != '\0' && *ptr != ',' && *ptr != '.' ) ptr++;
if( *ptr == '\0' ) return buf;
while( *ptr != '\0' ) ptr++;
ptr--;
while( *ptr == '0' && *ptr != ',' && *ptr != '.' ) ptr--;
if( *ptr != '.' && *ptr != ',' ) ptr++;
*ptr = '\0';
return buf;
}
void Usage()
{
printf( "Usage: capture -a address -o output.tracy\n" );
printf( "Usage: capture -o output.tracy [-a address] [-p port]\n" );
exit( 1 );
}
@@ -113,11 +41,12 @@ int main( int argc, char** argv )
}
#endif
const char* address = nullptr;
const char* address = "localhost";
const char* output = nullptr;
int port = 8086;
int c;
while( ( c = getopt( argc, argv, "a:o:" ) ) != -1 )
while( ( c = getopt( argc, argv, "a:o:p:" ) ) != -1 )
{
switch( c )
{
@@ -127,6 +56,9 @@ int main( int argc, char** argv )
case 'o':
output = optarg;
break;
case 'p':
port = atoi( optarg );
break;
default:
Usage();
break;
@@ -135,13 +67,12 @@ int main( int argc, char** argv )
if( !address || !output ) Usage();
printf( "Connecting to %s...", address );
printf( "Connecting to %s:%i...", address, port );
fflush( stdout );
tracy::Worker worker( address );
for(;;)
tracy::Worker worker( address, port );
while( !worker.IsConnected() )
{
const auto handshake = worker.GetHandshakeStatus();
if( handshake == tracy::HandshakeWelcome ) break;
if( handshake == tracy::HandshakeProtocolMismatch )
{
printf( "\nThe client you are trying to connect to uses incompatible protocol version.\nMake sure you are using the same Tracy version on both client and server.\n" );
@@ -152,17 +83,39 @@ int main( int argc, char** argv )
printf( "\nThe client you are trying to connect to is no longer able to sent profiling data,\nbecause another server was already connected to it.\nYou can do the following:\n\n 1. Restart the client application.\n 2. Rebuild the client application with on-demand mode enabled.\n" );
return 2;
}
if( handshake == tracy::HandshakeDropped )
{
printf( "\nThe client you are trying to connect to has disconnected during the initial\nconnection handshake. Please check your network configuration.\n" );
return 3;
}
}
while( !worker.HasData() ) std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
printf( "\nQueue delay: %s\nTimer resolution: %s\n", TimeToString( worker.GetDelay() ), TimeToString( worker.GetResolution() ) );
printf( "\nQueue delay: %s\nTimer resolution: %s\n", tracy::TimeToString( worker.GetDelay() ), tracy::TimeToString( worker.GetResolution() ) );
#ifdef _WIN32
signal( SIGINT, SigInt );
#else
struct sigaction sigint, oldsigint;
memset( &sigint, 0, sizeof( sigint ) );
sigint.sa_handler = SigInt;
sigaction( SIGINT, &sigint, &oldsigint );
#endif
auto& lock = worker.GetMbpsDataLock();
const auto t0 = std::chrono::high_resolution_clock::now();
while( worker.IsConnected() )
{
if( disconnect )
{
worker.Disconnect();
disconnect = false;
}
lock.lock();
const auto mbps = worker.GetMbpsData().back();
const auto compRatio = worker.GetCompRatio();
const auto netTotal = worker.GetDataTransferred();
lock.unlock();
if( mbps < 0.1f )
@@ -173,19 +126,36 @@ int main( int argc, char** argv )
{
printf( "\33[2K\r\033[36;1m%7.2f Mbps", mbps );
}
printf( " \033[0m /\033[36;1m%5.1f%% \033[0m=\033[33;1m%7.2f Mbps \033[0m| Mem: \033[31;1m%.2f MB\033[0m | \033[33mTime: %s\033[0m", compRatio * 100.f, mbps / compRatio, tracy::memUsage.load( std::memory_order_relaxed ) / ( 1024.f * 1024.f ), TimeToString( worker.GetLastTime() - worker.GetTimeBegin() ) );
printf( " \033[0m /\033[36;1m%5.1f%% \033[0m=\033[33;1m%7.2f Mbps \033[0m| \033[33mNet: \033[32m%s \033[0m| \033[33mMem: \033[31;1m%s\033[0m | \033[33mTime: %s\033[0m",
compRatio * 100.f,
mbps / compRatio,
tracy::MemSizeToString( netTotal ),
tracy::MemSizeToString( tracy::memUsage ),
tracy::TimeToString( worker.GetLastTime() ) );
fflush( stdout );
std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
}
const auto t1 = std::chrono::high_resolution_clock::now();
printf( "\nFrames: %" PRIu64 "\nTime span: %s\nZones: %s\nSaving trace...", worker.GetFrameCount( *worker.GetFramesBase() ), TimeToString( worker.GetLastTime() - worker.GetTimeBegin() ), RealToString( worker.GetZoneCount(), true ) );
const auto& failure = worker.GetFailureType();
if( failure != tracy::Worker::Failure::None )
{
printf( "\n\033[31;1mInstrumentation failure: %s\033[0m", tracy::Worker::GetFailureString( failure ) );
}
printf( "\nFrames: %" PRIu64 "\nTime span: %s\nZones: %s\nElapsed time: %s\nSaving trace...",
worker.GetFrameCount( *worker.GetFramesBase() ), tracy::TimeToString( worker.GetLastTime() ), tracy::RealToString( worker.GetZoneCount() ),
tracy::TimeToString( std::chrono::duration_cast<std::chrono::nanoseconds>( t1 - t0 ).count() ) );
fflush( stdout );
auto f = std::unique_ptr<tracy::FileWrite>( tracy::FileWrite::Open( output ) );
if( f )
{
worker.Write( *f );
printf( " \033[32;1mdone!\033[0m\n" );
f->Finish();
const auto stats = f->GetCompressionStatistics();
printf( "Trace size %s (%.2f%% ratio)\n", tracy::MemSizeToString( stats.second ), 100.f * stats.second / stats.first );
}
else
{

328
client/TracyArmCpuTable.hpp Normal file
View File

@@ -0,0 +1,328 @@
#ifdef _MSC_VER
# pragma warning(disable:4996)
#endif
namespace tracy
{
static const char* DecodeArmImplementer( uint32_t v )
{
static char buf[16];
switch( v )
{
case 0x41: return "ARM";
case 0x42: return "Broadcom";
case 0x43: return "Cavium";
case 0x44: return "DEC";
case 0x46: return "Fujitsu";
case 0x48: return "HiSilicon";
case 0x4d: return "Motorola";
case 0x4e: return "Nvidia";
case 0x50: return "Applied Micro";
case 0x51: return "Qualcomm";
case 0x53: return "Samsung";
case 0x54: return "Texas Instruments";
case 0x56: return "Marvell";
case 0x61: return "Apple";
case 0x66: return "Faraday";
case 0x68: return "HXT";
case 0x69: return "Intel";
default: break;
}
sprintf( buf, "0x%x", v );
return buf;
}
static const char* DecodeArmPart( uint32_t impl, uint32_t part )
{
static char buf[16];
switch( impl )
{
case 0x41:
switch( part )
{
case 0x810: return "810";
case 0x920: return "920";
case 0x922: return "922";
case 0x926: return "926";
case 0x940: return "940";
case 0x946: return "946";
case 0x966: return "966";
case 0xa20: return "1020";
case 0xa22: return "1022";
case 0xa26: return "1026";
case 0xb02: return "11 MPCore";
case 0xb36: return "1136";
case 0xb56: return "1156";
case 0xb76: return "1176";
case 0xc05: return " Cortex-A5";
case 0xc07: return " Cortex-A7";
case 0xc08: return " Cortex-A8";
case 0xc09: return " Cortex-A9";
case 0xc0c: return " Cortex-A12";
case 0xc0d: return " Rockchip RK3288";
case 0xc0f: return " Cortex-A15";
case 0xc0e: return " Cortex-A17";
case 0xc14: return " Cortex-R4";
case 0xc15: return " Cortex-R5";
case 0xc17: return " Cortex-R7";
case 0xc18: return " Cortex-R8";
case 0xc20: return " Cortex-M0";
case 0xc21: return " Cortex-M1";
case 0xc23: return " Cortex-M3";
case 0xc24: return " Cortex-M4";
case 0xc27: return " Cortex-M7";
case 0xc60: return " Cortex-M0+";
case 0xd00: return " AArch64 simulator";
case 0xd01: return " Cortex-A32";
case 0xd03: return " Cortex-A53";
case 0xd04: return " Cortex-A35";
case 0xd05: return " Cortex-A55";
case 0xd06: return " Cortex-A65";
case 0xd07: return " Cortex-A57";
case 0xd08: return " Cortex-A72";
case 0xd09: return " Cortex-A73";
case 0xd0a: return " Cortex-A75";
case 0xd0b: return " Cortex-A76";
case 0xd0c: return " Neoverse N1";
case 0xd0d: return " Cortex-A77";
case 0xd0e: return " Cortex-A76AE";
case 0xd0f: return " AEMv8";
case 0xd13: return " Cortex-R52";
case 0xd20: return " Cortex-M23";
case 0xd21: return " Cortex-M33";
case 0xd4a: return " Neoverse E1";
default: break;
}
case 0x42:
switch( part )
{
case 0xf: return " Brahma B15";
case 0x100: return " Brahma B53";
case 0x516: return " ThunderX2";
default: break;
}
case 0x43:
switch( part )
{
case 0xa0: return " ThunderX";
case 0xa1: return " ThunderX 88XX";
case 0xa2: return " ThunderX 81XX";
case 0xa3: return " ThunderX 83XX";
case 0xaf: return " ThunderX2 99xx";
default: break;
}
case 0x44:
switch( part )
{
case 0xa10: return " SA110";
case 0xa11: return " SA1100";
default: break;
}
case 0x46:
switch( part )
{
case 0x1: return " A64FX";
default: break;
}
case 0x48:
switch( part )
{
case 0xd01: return " TSV100";
case 0xd40: return " Kirin 980";
default: break;
}
case 0x4e:
switch( part )
{
case 0x0: return " Denver";
case 0x3: return " Denver 2";
case 0x4: return " Carmel";
default: break;
}
case 0x50:
switch( part )
{
case 0x0: return " X-Gene";
default: break;
}
case 0x51:
switch( part )
{
case 0xf: return " Scorpion";
case 0x2d: return " Scorpion";
case 0x4d: return " Krait";
case 0x6f: return " Krait";
case 0x200: return " Kryo";
case 0x201: return " Kryo Silver (Snapdragon 821)";
case 0x205: return " Kryo Gold";
case 0x211: return " Kryo Silver (Snapdragon 820)";
case 0x800: return " Kryo 260 / 280 Gold";
case 0x801: return " Kryo 260 / 280 Silver";
case 0x802: return " Kryo 385 Gold";
case 0x803: return " Kryo 385 Silver";
case 0x804: return " Kryo 485 Gold";
case 0xc00: return " Falkor";
case 0xc01: return " Saphira";
default: break;
}
case 0x53:
switch( part )
{
case 0x1: return " Exynos M1/M2";
case 0x2: return " Exynos M3";
default: break;
}
case 0x56:
switch( part )
{
case 0x131: return " Feroceon 88FR131";
case 0x581: return " PJ4 / PJ4B";
case 0x584: return " PJ4B-MP / PJ4C";
default: break;
}
case 0x61:
switch( part )
{
case 0x1: return " Cyclone";
case 0x2: return " Typhoon";
case 0x3: return " Typhoon/Capri";
case 0x4: return " Twister";
case 0x5: return " Twister/Elba/Malta";
case 0x6: return " Hurricane";
case 0x7: return " Hurricane/Myst";
default: break;
}
case 0x66:
switch( part )
{
case 0x526: return " FA526";
case 0x626: return " FA626";
default: break;
}
case 0x68:
switch( part )
{
case 0x0: return " Phecda";
default: break;
}
default: break;
}
sprintf( buf, " 0x%x", part );
return buf;
}
static const char* DecodeIosDevice( const char* id )
{
static const char* DeviceTable[] = {
"i386", "32-bit simulator",
"x86_64", "64-bit simulator",
"iPhone1,1", "iPhone",
"iPhone1,2", "iPhone 3G",
"iPhone2,1", "iPhone 3GS",
"iPhone3,1", "iPhone 4 (GSM)",
"iPhone3,2", "iPhone 4 (GSM)",
"iPhone3,3", "iPhone 4 (CDMA)",
"iPhone4,1", "iPhone 4S",
"iPhone5,1", "iPhone 5 (A1428)",
"iPhone5,2", "iPhone 5 (A1429)",
"iPhone5,3", "iPhone 5c (A1456/A1532)",
"iPhone5,4", "iPhone 5c (A1507/A1516/1526/A1529)",
"iPhone6,1", "iPhone 5s (A1433/A1533)",
"iPhone6,2", "iPhone 5s (A1457/A1518/A1528/A1530)",
"iPhone7,1", "iPhone 6 Plus",
"iPhone7,2", "iPhone 6",
"iPhone8,1", "iPhone 6S",
"iPhone8,2", "iPhone 6S Plus",
"iPhone8,4", "iPhone SE",
"iPhone9,1", "iPhone 7 (CDMA)",
"iPhone9,2", "iPhone 7 Plus (CDMA)",
"iPhone9,3", "iPhone 7 (GSM)",
"iPhone9,4", "iPhone 7 Plus (GSM)",
"iPhone10,1", "iPhone 8 (CDMA)",
"iPhone10,2", "iPhone 8 Plus (CDMA)",
"iPhone10,3", "iPhone X (CDMA)",
"iPhone10,4", "iPhone 8 (GSM)",
"iPhone10,5", "iPhone 8 Plus (GSM)",
"iPhone10,6", "iPhone X (GSM)",
"iPhone11,2", "iPhone XS",
"iPhone11,4", "iPhone XS Max",
"iPhone11,6", "iPhone XS Max China",
"iPhone11,8", "iPhone XR",
"iPhone12,1", "iPhone 11",
"iPhone12,3", "iPhone 11 Pro",
"iPhone12,5", "iPhone 11 Pro Max",
"iPad1,1", "iPad (A1219/A1337)",
"iPad2,1", "iPad 2 (A1395)",
"iPad2,2", "iPad 2 (A1396)",
"iPad2,3", "iPad 2 (A1397)",
"iPad2,4", "iPad 2 (A1395)",
"iPad2,5", "iPad Mini (A1432)",
"iPad2,6", "iPad Mini (A1454)",
"iPad2,7", "iPad Mini (A1455)",
"iPad3,1", "iPad 3 (A1416)",
"iPad3,2", "iPad 3 (A1403)",
"iPad3,3", "iPad 3 (A1430)",
"iPad3,4", "iPad 4 (A1458)",
"iPad3,5", "iPad 4 (A1459)",
"iPad3,6", "iPad 4 (A1460)",
"iPad4,1", "iPad Air (A1474)",
"iPad4,2", "iPad Air (A1475)",
"iPad4,3", "iPad Air (A1476)",
"iPad4,4", "iPad Mini 2 (A1489)",
"iPad4,5", "iPad Mini 2 (A1490)",
"iPad4,6", "iPad Mini 2 (A1491)",
"iPad4,7", "iPad Mini 3 (A1599)",
"iPad4,8", "iPad Mini 3 (A1600)",
"iPad4,9", "iPad Mini 3 (A1601)",
"iPad5,1", "iPad Mini 4 (A1538)",
"iPad5,2", "iPad Mini 4 (A1550)",
"iPad5,3", "iPad Air 2 (A1566)",
"iPad5,4", "iPad Air 2 (A1567)",
"iPad6,3", "iPad Pro 9.7\" (A1673)",
"iPad6,4", "iPad Pro 9.7\" (A1674)",
"iPad6,5", "iPad Pro 9.7\" (A1675)",
"iPad6,7", "iPad Pro 12.9\" (A1584)",
"iPad6,8", "iPad Pro 12.9\" (A1652)",
"iPad6,11", "iPad 5th gen (A1822)",
"iPad6,12", "iPad 5th gen (A1823)",
"iPad7,1", "iPad Pro 12.9\" 2nd gen (A1670)",
"iPad7,2", "iPad Pro 12.9\" 2nd gen (A1671/A1821)",
"iPad7,3", "iPad Pro 10.5\" (A1701)",
"iPad7,4", "iPad Pro 10.5\" (A1709)",
"iPad7,5", "iPad 6th gen (A1893)",
"iPad7,6", "iPad 6th gen (A1954)",
"iPad7,11", "iPad 7th gen 10.2\" (Wifi)",
"iPad7,12", "iPad 7th gen 10.2\" (Wifi+Cellular)",
"iPad8,1", "iPad Pro 11\" (A1980)",
"iPad8,2", "iPad Pro 11\" (A1980)",
"iPad8,3", "iPad Pro 11\" (A1934/A1979/A2013)",
"iPad8,4", "iPad Pro 11\" (A1934/A1979/A2013)",
"iPad8,5", "iPad Pro 12.9\" 3rd gen (A1876)",
"iPad8,6", "iPad Pro 12.9\" 3rd gen (A1876)",
"iPad8,7", "iPad Pro 12.9\" 3rd gen (A1895/A1983/A2014)",
"iPad8,8", "iPad Pro 12.9\" 3rd gen (A1895/A1983/A2014)",
"iPad11,1", "iPad Mini 5th gen (A2133)",
"iPad11,2", "iPad Mini 5th gen (A2124/A2125/A2126)",
"iPad11,3", "iPad Air 3rd gen (A2152)",
"iPad11,4", "iPad Air 3rd gen (A2123/A2153/A2154)",
"iPod1,1", "iPod Touch",
"iPod2,1", "iPod Touch 2nd gen",
"iPod3,1", "iPod Touch 3rd gen",
"iPod4,1", "iPod Touch 4th gen",
"iPod5,1", "iPod Touch 5th gen",
"iPod7,1", "iPod Touch 6th gen",
"iPod9,1", "iPod Touch 7th gen",
nullptr
};
auto ptr = DeviceTable;
while( *ptr )
{
if( strcmp( ptr[0], id ) == 0 ) return ptr[1];
ptr += 2;
}
return id;
}
}

View File

@@ -1,10 +1,18 @@
#include <new>
#include <stdio.h>
#include <string.h>
#include "TracyCallstack.hpp"
#include "TracyFastVector.hpp"
#include "../common/TracyAlloc.hpp"
#ifdef TRACY_HAS_CALLSTACK
#if TRACY_HAS_CALLSTACK == 1
# ifndef NOMINMAX
# define NOMINMAX
# endif
# include <windows.h>
# include <psapi.h>
# ifdef _MSC_VER
# pragma warning( push )
# pragma warning( disable : 4091 )
@@ -13,7 +21,11 @@
# ifdef _MSC_VER
# pragma warning( pop )
# endif
#elif TRACY_HAS_CALLSTACK >= 2
#elif TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 3 || TRACY_HAS_CALLSTACK == 4 || TRACY_HAS_CALLSTACK == 6
# include "../libbacktrace/backtrace.hpp"
# include <dlfcn.h>
# include <cxxabi.h>
#elif TRACY_HAS_CALLSTACK == 5
# include <dlfcn.h>
# include <cxxabi.h>
#endif
@@ -21,233 +33,705 @@
namespace tracy
{
static inline char* CopyString( const char* src, size_t sz )
{
assert( strlen( src ) == sz );
auto dst = (char*)tracy_malloc( sz + 1 );
memcpy( dst, src, sz );
dst[sz] = '\0';
return dst;
}
static inline char* CopyString( const char* src )
{
const auto sz = strlen( src );
auto dst = (char*)tracy_malloc( sz + 1 );
memcpy( dst, src, sz );
dst[sz] = '\0';
return dst;
}
#if TRACY_HAS_CALLSTACK == 1
extern "C" { t_RtlWalkFrameChain RtlWalkFrameChain = 0; }
enum { MaxCbTrace = 16 };
enum { MaxNameSize = 8*1024 };
int cb_num;
CallstackEntry cb_data[MaxCbTrace];
extern "C"
{
typedef unsigned long (__stdcall *t_RtlWalkFrameChain)( void**, unsigned long, unsigned long );
t_RtlWalkFrameChain RtlWalkFrameChain = 0;
}
#if defined __MINGW32__ && API_VERSION_NUMBER < 12
extern "C" {
// Actual required API_VERSION_NUMBER is unknown because it is undocumented. These functions are not present in at least v11.
DWORD IMAGEAPI SymAddrIncludeInlineTrace(HANDLE hProcess, DWORD64 Address);
BOOL IMAGEAPI SymQueryInlineTrace(HANDLE hProcess, DWORD64 StartAddress, DWORD StartContext, DWORD64 StartRetAddress,
DWORD64 CurAddress, LPDWORD CurContext, LPDWORD CurFrameIndex);
BOOL IMAGEAPI SymFromInlineContext(HANDLE hProcess, DWORD64 Address, ULONG InlineContext, PDWORD64 Displacement,
PSYMBOL_INFO Symbol);
BOOL IMAGEAPI SymGetLineFromInlineContext(HANDLE hProcess, DWORD64 qwAddr, ULONG InlineContext,
DWORD64 qwModuleBaseAddress, PDWORD pdwDisplacement, PIMAGEHLP_LINE64 Line64);
};
#endif
#ifndef __CYGWIN__
struct ModuleCache
{
uint64_t start;
uint64_t end;
char* name;
};
static FastVector<ModuleCache>* s_modCache;
#endif
void InitCallstack()
{
#ifdef UNICODE
RtlWalkFrameChain = (t_RtlWalkFrameChain)GetProcAddress( GetModuleHandle( L"ntdll.dll" ), "RtlWalkFrameChain" );
#else
RtlWalkFrameChain = (t_RtlWalkFrameChain)GetProcAddress( GetModuleHandle( "ntdll.dll" ), "RtlWalkFrameChain" );
#endif
RtlWalkFrameChain = (t_RtlWalkFrameChain)GetProcAddress( GetModuleHandleA( "ntdll.dll" ), "RtlWalkFrameChain" );
SymInitialize( GetCurrentProcess(), nullptr, true );
SymSetOptions( SYMOPT_LOAD_LINES );
#ifndef __CYGWIN__
HMODULE mod[1024];
DWORD needed;
HANDLE proc = GetCurrentProcess();
s_modCache = (FastVector<ModuleCache>*)tracy_malloc( sizeof( FastVector<ModuleCache> ) );
new(s_modCache) FastVector<ModuleCache>( 512 );
if( EnumProcessModules( proc, mod, sizeof( mod ), &needed ) != 0 )
{
const auto sz = needed / sizeof( HMODULE );
for( size_t i=0; i<sz; i++ )
{
MODULEINFO info;
if( GetModuleInformation( proc, mod[i], &info, sizeof( info ) ) != 0 )
{
const auto base = uint64_t( info.lpBaseOfDll );
char name[1024];
const auto res = GetModuleFileNameA( mod[i], name, 1021 );
if( res > 0 )
{
auto ptr = name + res;
while( ptr > name && *ptr != '\\' && *ptr != '/' ) ptr--;
if( ptr > name ) ptr++;
const auto namelen = name + res - ptr;
auto cache = s_modCache->push_next();
cache->start = base;
cache->end = base + info.SizeOfImage;
cache->name = (char*)tracy_malloc( namelen+3 );
cache->name[0] = '[';
memcpy( cache->name+1, ptr, namelen );
cache->name[namelen+1] = ']';
cache->name[namelen+2] = '\0';
}
}
}
}
#endif
}
CallstackEntry DecodeCallstackPtr( uint64_t ptr )
TRACY_API uintptr_t* CallTrace( int depth )
{
CallstackEntry ret;
auto trace = (uintptr_t*)tracy_malloc( ( 1 + depth ) * sizeof( uintptr_t ) );
const auto num = RtlWalkFrameChain( (void**)( trace + 1 ), depth, 0 );
*trace = num;
return trace;
}
const char* DecodeCallstackPtrFast( uint64_t ptr )
{
static char ret[MaxNameSize];
const auto proc = GetCurrentProcess();
char buf[sizeof( SYMBOL_INFO ) + 1024];
char buf[sizeof( SYMBOL_INFO ) + MaxNameSize];
auto si = (SYMBOL_INFO*)buf;
si->SizeOfStruct = sizeof( SYMBOL_INFO );
si->MaxNameLen = 1024;
si->MaxNameLen = MaxNameSize;
if( SymFromAddr( proc, ptr, nullptr, si ) == 0 )
{
memcpy( si->Name, "[unknown]", 10 );
si->NameLen = 9;
*ret = '\0';
}
else
{
memcpy( ret, si->Name, si->NameLen );
ret[si->NameLen] = '\0';
}
return ret;
}
static const char* GetModuleName( uint64_t addr )
{
if( ( addr & 0x8000000000000000 ) != 0 ) return "[kernel]";
#ifndef __CYGWIN__
for( auto& v : *s_modCache )
{
if( addr >= v.start && addr < v.end )
{
return v.name;
}
}
auto name = (char*)tracy_malloc( si->NameLen + 1 );
memcpy( name, si->Name, si->NameLen );
name[si->NameLen] = '\0';
HMODULE mod[1024];
DWORD needed;
HANDLE proc = GetCurrentProcess();
ret.name = name;
if( EnumProcessModules( proc, mod, sizeof( mod ), &needed ) != 0 )
{
const auto sz = needed / sizeof( HMODULE );
for( size_t i=0; i<sz; i++ )
{
MODULEINFO info;
if( GetModuleInformation( proc, mod[i], &info, sizeof( info ) ) != 0 )
{
const auto base = uint64_t( info.lpBaseOfDll );
if( addr >= base && addr < base + info.SizeOfImage )
{
char name[1024];
const auto res = GetModuleFileNameA( mod[i], name, 1021 );
if( res > 0 )
{
auto ptr = name + res;
while( ptr > name && *ptr != '\\' && *ptr != '/' ) ptr--;
if( ptr > name ) ptr++;
const auto namelen = name + res - ptr;
auto cache = s_modCache->push_next();
cache->start = base;
cache->end = base + info.SizeOfImage;
cache->name = (char*)tracy_malloc( namelen+3 );
cache->name[0] = '[';
memcpy( cache->name+1, ptr, namelen );
cache->name[namelen+1] = ']';
cache->name[namelen+2] = '\0';
return cache->name;
}
}
}
}
}
#endif
const char* filename;
return "[unknown]";
}
SymbolData DecodeSymbolAddress( uint64_t ptr )
{
SymbolData sym;
IMAGEHLP_LINE64 line;
DWORD displacement = 0;
line.SizeOfStruct = sizeof( IMAGEHLP_LINE64 );
if( SymGetLineFromAddr64( proc, ptr, &displacement, &line ) == 0 )
line.SizeOfStruct = sizeof(IMAGEHLP_LINE64);
if( SymGetLineFromAddr64( GetCurrentProcess(), ptr, &displacement, &line ) == 0 )
{
filename = "[unknown]";
ret.line = 0;
sym.file = "[unknown]";
sym.line = 0;
}
else
{
filename = line.FileName;
ret.line = line.LineNumber;
sym.file = line.FileName;
sym.line = line.LineNumber;
}
const auto fsz = strlen( filename );
auto file = (char*)tracy_malloc( fsz + 1 );
memcpy( file, filename, fsz );
file[fsz] = '\0';
ret.file = file;
return ret;
sym.needFree = false;
return sym;
}
#elif TRACY_HAS_CALLSTACK == 2
CallstackEntry DecodeCallstackPtr( uint64_t ptr )
SymbolData DecodeCodeAddress( uint64_t ptr )
{
CallstackEntry ret;
ret.line = 0;
SymbolData sym;
const auto proc = GetCurrentProcess();
bool done = false;
char* demangled = nullptr;
const char* symname = nullptr;
const char* symloc = nullptr;
auto vptr = (void*)ptr;
char** sym = nullptr;
ptrdiff_t symoff = 0;
IMAGEHLP_LINE64 line;
DWORD displacement = 0;
line.SizeOfStruct = sizeof(IMAGEHLP_LINE64);
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) )
#ifndef __CYGWIN__
DWORD inlineNum = SymAddrIncludeInlineTrace( proc, ptr );
DWORD ctx = 0;
DWORD idx;
BOOL doInline = FALSE;
if( inlineNum != 0 ) doInline = SymQueryInlineTrace( proc, ptr, 0, ptr, ptr, &ctx, &idx );
if( doInline )
{
symloc = dlinfo.dli_fname;
symname = dlinfo.dli_sname;
symoff = (char*)ptr - (char*)dlinfo.dli_saddr;
if( symname && symname[0] == '_' )
if( SymGetLineFromInlineContext( proc, ptr, ctx, 0, &displacement, &line ) != 0 )
{
size_t len = 0;
int status;
demangled = abi::__cxa_demangle( symname, nullptr, &len, &status );
if( status == 0 )
{
symname = demangled;
}
sym.file = line.FileName;
sym.line = line.LineNumber;
done = true;
}
}
if( !symname )
#endif
if( !done )
{
symname = "[unknown]";
}
if( !symloc )
{
symloc = "[unknown]";
}
if( symoff == 0 )
{
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + 1 );
memcpy( name, symname, namelen );
name[namelen] = '\0';
ret.name = name;
}
else
{
char buf[32];
sprintf( buf, " + %td", symoff );
const auto offlen = strlen( buf );
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + offlen + 1 );
memcpy( name, symname, namelen );
memcpy( name + namelen, buf, offlen );
name[namelen + offlen] = '\0';
ret.name = name;
}
char buf[32];
sprintf( buf, " [%p]", (void*)ptr );
const auto addrlen = strlen( buf );
const auto loclen = strlen( symloc );
auto loc = (char*)tracy_malloc( loclen + addrlen + 1 );
memcpy( loc, symloc, loclen );
memcpy( loc + loclen, buf, addrlen );
loc[loclen + addrlen] = '\0';
ret.file = loc;
if( sym ) free( sym );
if( demangled ) free( demangled );
return ret;
}
#elif TRACY_HAS_CALLSTACK == 3
CallstackEntry DecodeCallstackPtr( uint64_t ptr )
{
CallstackEntry ret;
ret.line = 0;
char* demangled = nullptr;
const char* symname = nullptr;
const char* symloc = nullptr;
auto vptr = (void*)ptr;
char** sym = nullptr;
ptrdiff_t symoff = 0;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) )
{
symloc = dlinfo.dli_fname;
symname = dlinfo.dli_sname;
symoff = (char*)ptr - (char*)dlinfo.dli_saddr;
if( symname && symname[0] == '_' )
if( SymGetLineFromAddr64( proc, ptr, &displacement, &line ) == 0 )
{
size_t len = 0;
int status;
demangled = abi::__cxa_demangle( symname, nullptr, &len, &status );
if( status == 0 )
{
symname = demangled;
}
}
}
if( !symname )
{
sym = backtrace_symbols( &vptr, 1 );
if( !sym )
{
symname = "[unknown]";
sym.file = "[unknown]";
sym.line = 0;
}
else
{
symname = *sym;
sym.file = line.FileName;
sym.line = line.LineNumber;
}
}
if( !symloc )
sym.needFree = false;
return sym;
}
CallstackEntryData DecodeCallstackPtr( uint64_t ptr )
{
int write;
const auto proc = GetCurrentProcess();
#ifndef __CYGWIN__
DWORD inlineNum = SymAddrIncludeInlineTrace( proc, ptr );
if( inlineNum > MaxCbTrace - 1 ) inlineNum = MaxCbTrace - 1;
DWORD ctx = 0;
DWORD idx;
BOOL doInline = FALSE;
if( inlineNum != 0 ) doInline = SymQueryInlineTrace( proc, ptr, 0, ptr, ptr, &ctx, &idx );
if( doInline )
{
symloc = "[unknown]";
write = inlineNum;
cb_num = 1 + inlineNum;
}
else
#endif
{
write = 0;
cb_num = 1;
}
char buf[sizeof( SYMBOL_INFO ) + MaxNameSize];
auto si = (SYMBOL_INFO*)buf;
si->SizeOfStruct = sizeof( SYMBOL_INFO );
si->MaxNameLen = MaxNameSize;
const auto moduleName = GetModuleName( ptr );
const auto symValid = SymFromAddr( proc, ptr, nullptr, si ) != 0;
IMAGEHLP_LINE64 line;
DWORD displacement = 0;
line.SizeOfStruct = sizeof(IMAGEHLP_LINE64);
{
const char* filename;
if( SymGetLineFromAddr64( proc, ptr, &displacement, &line ) == 0 )
{
filename = "[unknown]";
cb_data[write].line = 0;
}
else
{
filename = line.FileName;
cb_data[write].line = line.LineNumber;
}
cb_data[write].name = symValid ? CopyString( si->Name, si->NameLen ) : CopyString( moduleName );
cb_data[write].file = CopyString( filename );
if( symValid )
{
cb_data[write].symLen = si->Size;
cb_data[write].symAddr = si->Address;
}
else
{
cb_data[write].symLen = 0;
cb_data[write].symAddr = 0;
}
}
#ifndef __CYGWIN__
if( doInline )
{
for( DWORD i=0; i<inlineNum; i++ )
{
auto& cb = cb_data[i];
const auto symInlineValid = SymFromInlineContext( proc, ptr, ctx, nullptr, si ) != 0;
const char* filename;
if( SymGetLineFromInlineContext( proc, ptr, ctx, 0, &displacement, &line ) == 0 )
{
filename = "[unknown]";
cb.line = 0;
}
else
{
filename = line.FileName;
cb.line = line.LineNumber;
}
cb.name = symInlineValid ? CopyString( si->Name, si->NameLen ) : CopyString( moduleName );
cb.file = CopyString( filename );
if( symInlineValid )
{
cb.symLen = si->Size;
cb.symAddr = si->Address;
}
else
{
cb.symLen = 0;
cb.symAddr = 0;
}
ctx++;
}
}
#endif
return { cb_data, uint8_t( cb_num ), moduleName };
}
#elif TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 3 || TRACY_HAS_CALLSTACK == 4 || TRACY_HAS_CALLSTACK == 6
enum { MaxCbTrace = 16 };
struct backtrace_state* cb_bts;
int cb_num;
CallstackEntry cb_data[MaxCbTrace];
int cb_fixup;
void InitCallstack()
{
cb_bts = backtrace_create_state( nullptr, 0, nullptr, nullptr );
}
static int FastCallstackDataCb( void* data, uintptr_t pc, uintptr_t lowaddr, const char* fn, int lineno, const char* function )
{
if( function )
{
strcpy( (char*)data, function );
}
else
{
const char* symname = nullptr;
auto vptr = (void*)pc;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) )
{
symname = dlinfo.dli_sname;
}
if( symname )
{
strcpy( (char*)data, symname );
}
else
{
*(char*)data = '\0';
}
}
return 1;
}
static void FastCallstackErrorCb( void* data, const char* /*msg*/, int /*errnum*/ )
{
*(char*)data = '\0';
}
const char* DecodeCallstackPtrFast( uint64_t ptr )
{
static char ret[1024];
backtrace_pcinfo( cb_bts, ptr, FastCallstackDataCb, FastCallstackErrorCb, ret );
return ret;
}
static int SymbolAddressDataCb( void* data, uintptr_t pc, uintptr_t lowaddr, const char* fn, int lineno, const char* function )
{
auto& sym = *(SymbolData*)data;
if( !fn )
{
const char* symloc = nullptr;
Dl_info dlinfo;
if( dladdr( (void*)pc, &dlinfo ) ) symloc = dlinfo.dli_fname;
if( !symloc ) symloc = "[unknown]";
sym.file = symloc;
sym.line = 0;
sym.needFree = false;
}
else
{
sym.file = CopyString( fn );
sym.line = lineno;
sym.needFree = true;
}
return 1;
}
static void SymbolAddressErrorCb( void* data, const char* /*msg*/, int /*errnum*/ )
{
auto& sym = *(SymbolData*)data;
sym.file = "[unknown]";
sym.line = 0;
sym.needFree = false;
}
SymbolData DecodeSymbolAddress( uint64_t ptr )
{
SymbolData sym;
backtrace_pcinfo( cb_bts, ptr, SymbolAddressDataCb, SymbolAddressErrorCb, &sym );
return sym;
}
SymbolData DecodeCodeAddress( uint64_t ptr )
{
return DecodeSymbolAddress( ptr );
}
static int CallstackDataCb( void* /*data*/, uintptr_t pc, uintptr_t lowaddr, const char* fn, int lineno, const char* function )
{
enum { DemangleBufLen = 64*1024 };
char demangled[DemangleBufLen];
cb_data[cb_num].symLen = 0;
cb_data[cb_num].symAddr = (uint64_t)lowaddr;
if( !fn && !function )
{
const char* symname = nullptr;
const char* symloc = nullptr;
auto vptr = (void*)pc;
ptrdiff_t symoff = 0;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) )
{
symloc = dlinfo.dli_fname;
symname = dlinfo.dli_sname;
symoff = (char*)pc - (char*)dlinfo.dli_saddr;
if( symname && symname[0] == '_' )
{
size_t len = DemangleBufLen;
int status;
abi::__cxa_demangle( symname, demangled, &len, &status );
if( status == 0 )
{
symname = demangled;
}
}
}
if( !symname ) symname = "[unknown]";
if( !symloc ) symloc = "[unknown]";
if( symoff == 0 )
{
cb_data[cb_num].name = CopyString( symname );
}
else
{
char buf[32];
const auto offlen = sprintf( buf, " + %td", symoff );
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + offlen + 1 );
memcpy( name, symname, namelen );
memcpy( name + namelen, buf, offlen );
name[namelen + offlen] = '\0';
cb_data[cb_num].name = name;
}
char buf[32];
const auto addrlen = sprintf( buf, " [%p]", (void*)pc );
const auto loclen = strlen( symloc );
auto loc = (char*)tracy_malloc( loclen + addrlen + 1 );
memcpy( loc, symloc, loclen );
memcpy( loc + loclen, buf, addrlen );
loc[loclen + addrlen] = '\0';
cb_data[cb_num].file = loc;
cb_data[cb_num].line = 0;
}
else
{
if( !fn ) fn = "[unknown]";
if( !function )
{
function = "[unknown]";
}
else
{
if( function[0] == '_' )
{
size_t len = DemangleBufLen;
int status;
abi::__cxa_demangle( function, demangled, &len, &status );
if( status == 0 )
{
function = demangled;
}
}
}
cb_data[cb_num].name = CopyString( function );
cb_data[cb_num].file = CopyString( fn );
cb_data[cb_num].line = lineno;
}
if( ++cb_num >= MaxCbTrace )
{
return 1;
}
else
{
return 0;
}
}
static void CallstackErrorCb( void* /*data*/, const char* /*msg*/, int /*errnum*/ )
{
for( int i=0; i<cb_num; i++ )
{
tracy_free( (void*)cb_data[i].name );
tracy_free( (void*)cb_data[i].file );
}
cb_data[0].name = CopyString( "[error]" );
cb_data[0].file = CopyString( "[error]" );
cb_data[0].line = 0;
cb_num = 1;
}
void SymInfoCallback( void* /*data*/, uintptr_t pc, const char* symname, uintptr_t symval, uintptr_t symsize )
{
cb_data[cb_num-1].symLen = (uint32_t)symsize;
cb_data[cb_num-1].symAddr = (uint64_t)symval;
}
void SymInfoError( void* /*data*/, const char* /*msg*/, int /*errnum*/ )
{
cb_data[cb_num-1].symLen = 0;
cb_data[cb_num-1].symAddr = 0;
}
CallstackEntryData DecodeCallstackPtr( uint64_t ptr )
{
cb_num = 0;
backtrace_pcinfo( cb_bts, ptr, CallstackDataCb, CallstackErrorCb, nullptr );
assert( cb_num > 0 );
backtrace_syminfo( cb_bts, ptr, SymInfoCallback, SymInfoError, nullptr );
const char* symloc = nullptr;
Dl_info dlinfo;
if( dladdr( (void*)ptr, &dlinfo ) ) symloc = dlinfo.dli_fname;
return { cb_data, uint8_t( cb_num ), symloc ? symloc : "[unknown]" };
}
#elif TRACY_HAS_CALLSTACK == 5
void InitCallstack()
{
}
const char* DecodeCallstackPtrFast( uint64_t ptr )
{
static char ret[1024];
auto vptr = (void*)ptr;
const char* symname = nullptr;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) && dlinfo.dli_sname )
{
symname = dlinfo.dli_sname;
}
if( symname )
{
strcpy( ret, symname );
}
else
{
*ret = '\0';
}
return ret;
}
SymbolData DecodeSymbolAddress( uint64_t ptr )
{
const char* symloc = nullptr;
Dl_info dlinfo;
if( dladdr( (void*)ptr, &dlinfo ) ) symloc = dlinfo.dli_fname;
if( !symloc ) symloc = "[unknown]";
return SymbolData { symloc, 0, false };
}
SymbolData DecodeCodeAddress( uint64_t ptr )
{
return DecodeSymbolAddress( ptr );
}
CallstackEntryData DecodeCallstackPtr( uint64_t ptr )
{
static CallstackEntry cb;
cb.line = 0;
char* demangled = nullptr;
const char* symname = nullptr;
const char* symloc = nullptr;
auto vptr = (void*)ptr;
ptrdiff_t symoff = 0;
void* symaddr = nullptr;
Dl_info dlinfo;
if( dladdr( vptr, &dlinfo ) )
{
symloc = dlinfo.dli_fname;
symname = dlinfo.dli_sname;
symoff = (char*)ptr - (char*)dlinfo.dli_saddr;
symaddr = dlinfo.dli_saddr;
if( symname && symname[0] == '_' )
{
size_t len = 0;
int status;
demangled = abi::__cxa_demangle( symname, nullptr, &len, &status );
if( status == 0 )
{
symname = demangled;
}
}
}
if( !symname ) symname = "[unknown]";
if( !symloc ) symloc = "[unknown]";
if( symoff == 0 )
{
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + 1 );
memcpy( name, symname, namelen );
name[namelen] = '\0';
ret.name = name;
cb.name = CopyString( symname );
}
else
{
char buf[32];
sprintf( buf, " + %td", symoff );
const auto offlen = strlen( buf );
const auto offlen = sprintf( buf, " + %td", symoff );
const auto namelen = strlen( symname );
auto name = (char*)tracy_malloc( namelen + offlen + 1 );
memcpy( name, symname, namelen );
memcpy( name + namelen, buf, offlen );
name[namelen + offlen] = '\0';
ret.name = name;
cb.name = name;
}
char buf[32];
sprintf( buf, " [%p]", (void*)ptr );
const auto addrlen = strlen( buf );
const auto addrlen = sprintf( buf, " [%p]", (void*)ptr );
const auto loclen = strlen( symloc );
auto loc = (char*)tracy_malloc( loclen + addrlen + 1 );
memcpy( loc, symloc, loclen );
memcpy( loc + loclen, buf, addrlen );
loc[loclen + addrlen] = '\0';
ret.file = loc;
cb.file = loc;
cb.symLen = 0;
cb.symAddr = (uint64_t)symaddr;
if( sym ) free( sym );
if( demangled ) free( demangled );
return ret;
return { &cb, 1, symloc };
}
#endif

28
client/TracyCallstack.h Normal file
View File

@@ -0,0 +1,28 @@
#ifndef __TRACYCALLSTACK_H__
#define __TRACYCALLSTACK_H__
#if !defined _WIN32 && !defined __CYGWIN__
# include <sys/param.h>
#endif
#if defined _WIN32 || defined __CYGWIN__
# define TRACY_HAS_CALLSTACK 1
#elif defined __ANDROID__
# if !defined __arm__ || __ANDROID_API__ >= 21
# define TRACY_HAS_CALLSTACK 2
# else
# define TRACY_HAS_CALLSTACK 5
# endif
#elif defined __linux
# if defined _GNU_SOURCE && defined __GLIBC__
# define TRACY_HAS_CALLSTACK 3
# else
# define TRACY_HAS_CALLSTACK 2
# endif
#elif defined __APPLE__
# define TRACY_HAS_CALLSTACK 4
#elif defined BSD
# define TRACY_HAS_CALLSTACK 6
#endif
#endif

View File

@@ -1,24 +1,13 @@
#ifndef __TRACYCALLSTACK_HPP__
#define __TRACYCALLSTACK_HPP__
#if defined _WIN32 || defined __CYGWIN__
# define TRACY_HAS_CALLSTACK 1
extern "C"
{
typedef unsigned long (__stdcall *t_RtlWalkFrameChain)( void**, unsigned long, unsigned long );
extern t_RtlWalkFrameChain RtlWalkFrameChain;
}
#elif defined __ANDROID__
# define TRACY_HAS_CALLSTACK 2
#include "../common/TracyApi.h"
#include "TracyCallstack.h"
#if TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 5
# include <unwind.h>
#elif defined __linux
# if defined _GNU_SOURCE && defined __GLIBC__
# define TRACY_HAS_CALLSTACK 3
# include <execinfo.h>
# else
# define TRACY_HAS_CALLSTACK 2
# include <unwind.h>
# endif
#elif TRACY_HAS_CALLSTACK >= 3
# include <execinfo.h>
#endif
@@ -26,7 +15,6 @@ extern "C"
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include "../common/TracyAlloc.hpp"
#include "../common/TracyForceInline.hpp"
@@ -34,33 +22,46 @@ extern "C"
namespace tracy
{
struct SymbolData
{
const char* file;
uint32_t line;
bool needFree;
};
struct CallstackEntry
{
const char* name;
const char* file;
uint32_t line;
uint32_t symLen;
uint64_t symAddr;
};
CallstackEntry DecodeCallstackPtr( uint64_t ptr );
struct CallstackEntryData
{
const CallstackEntry* data;
uint8_t size;
const char* imageName;
};
SymbolData DecodeSymbolAddress( uint64_t ptr );
SymbolData DecodeCodeAddress( uint64_t ptr );
const char* DecodeCallstackPtrFast( uint64_t ptr );
CallstackEntryData DecodeCallstackPtr( uint64_t ptr );
void InitCallstack();
#if TRACY_HAS_CALLSTACK == 1
void InitCallstack();
TRACY_API uintptr_t* CallTrace( int depth );
static tracy_force_inline void* Callstack( int depth )
{
assert( depth >= 1 && depth < 63 );
auto trace = (uintptr_t*)tracy_malloc( ( 1 + depth ) * sizeof( uintptr_t ) );
const auto num = RtlWalkFrameChain( (void**)( trace + 1 ), depth, 0 );
*trace = num;
return trace;
return CallTrace( depth );
}
#elif TRACY_HAS_CALLSTACK == 2
static tracy_force_inline void InitCallstack() {}
#elif TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 5
struct BacktraceState
{
@@ -93,9 +94,7 @@ static tracy_force_inline void* Callstack( int depth )
return trace;
}
#elif TRACY_HAS_CALLSTACK == 3
static tracy_force_inline void InitCallstack() {}
#elif TRACY_HAS_CALLSTACK == 3 || TRACY_HAS_CALLSTACK == 4 || TRACY_HAS_CALLSTACK == 6
static tracy_force_inline void* Callstack( int depth )
{

629
client/TracyDxt1.cpp Normal file
View File

@@ -0,0 +1,629 @@
#include "TracyDxt1.hpp"
#include "../common/TracyForceInline.hpp"
#include <assert.h>
#include <stdint.h>
#include <string.h>
#ifdef __ARM_NEON
# include <arm_neon.h>
#endif
#if defined __AVX__ && !defined __SSE4_1__
# define __SSE4_1__
#endif
#if defined __SSE4_1__ || defined __AVX2__
# ifdef _MSC_VER
# include <intrin.h>
# else
# include <x86intrin.h>
# ifndef _mm256_cvtsi256_si32
# define _mm256_cvtsi256_si32( v ) ( _mm_cvtsi128_si32( _mm256_castsi256_si128( v ) ) )
# endif
# endif
#endif
namespace tracy
{
static inline uint16_t to565( uint8_t r, uint8_t g, uint8_t b )
{
return ( ( r & 0xF8 ) << 8 ) | ( ( g & 0xFC ) << 3 ) | ( b >> 3 );
}
static inline uint16_t to565( uint32_t c )
{
return
( ( c & 0xF80000 ) >> 19 ) |
( ( c & 0x00FC00 ) >> 5 ) |
( ( c & 0x0000F8 ) << 8 );
}
static const uint16_t DivTable[255*3+1] = {
0xffff, 0xffff, 0xffff, 0xffff, 0xcccc, 0xaaaa, 0x9249, 0x8000, 0x71c7, 0x6666, 0x5d17, 0x5555, 0x4ec4, 0x4924, 0x4444, 0x4000,
0x3c3c, 0x38e3, 0x35e5, 0x3333, 0x30c3, 0x2e8b, 0x2c85, 0x2aaa, 0x28f5, 0x2762, 0x25ed, 0x2492, 0x234f, 0x2222, 0x2108, 0x2000,
0x1f07, 0x1e1e, 0x1d41, 0x1c71, 0x1bac, 0x1af2, 0x1a41, 0x1999, 0x18f9, 0x1861, 0x17d0, 0x1745, 0x16c1, 0x1642, 0x15c9, 0x1555,
0x14e5, 0x147a, 0x1414, 0x13b1, 0x1352, 0x12f6, 0x129e, 0x1249, 0x11f7, 0x11a7, 0x115b, 0x1111, 0x10c9, 0x1084, 0x1041, 0x1000,
0x0fc0, 0x0f83, 0x0f48, 0x0f0f, 0x0ed7, 0x0ea0, 0x0e6c, 0x0e38, 0x0e07, 0x0dd6, 0x0da7, 0x0d79, 0x0d4c, 0x0d20, 0x0cf6, 0x0ccc,
0x0ca4, 0x0c7c, 0x0c56, 0x0c30, 0x0c0c, 0x0be8, 0x0bc5, 0x0ba2, 0x0b81, 0x0b60, 0x0b40, 0x0b21, 0x0b02, 0x0ae4, 0x0ac7, 0x0aaa,
0x0a8e, 0x0a72, 0x0a57, 0x0a3d, 0x0a23, 0x0a0a, 0x09f1, 0x09d8, 0x09c0, 0x09a9, 0x0991, 0x097b, 0x0964, 0x094f, 0x0939, 0x0924,
0x090f, 0x08fb, 0x08e7, 0x08d3, 0x08c0, 0x08ad, 0x089a, 0x0888, 0x0876, 0x0864, 0x0853, 0x0842, 0x0831, 0x0820, 0x0810, 0x0800,
0x07f0, 0x07e0, 0x07d1, 0x07c1, 0x07b3, 0x07a4, 0x0795, 0x0787, 0x0779, 0x076b, 0x075d, 0x0750, 0x0743, 0x0736, 0x0729, 0x071c,
0x070f, 0x0703, 0x06f7, 0x06eb, 0x06df, 0x06d3, 0x06c8, 0x06bc, 0x06b1, 0x06a6, 0x069b, 0x0690, 0x0685, 0x067b, 0x0670, 0x0666,
0x065c, 0x0652, 0x0648, 0x063e, 0x0634, 0x062b, 0x0621, 0x0618, 0x060f, 0x0606, 0x05fd, 0x05f4, 0x05eb, 0x05e2, 0x05d9, 0x05d1,
0x05c9, 0x05c0, 0x05b8, 0x05b0, 0x05a8, 0x05a0, 0x0598, 0x0590, 0x0588, 0x0581, 0x0579, 0x0572, 0x056b, 0x0563, 0x055c, 0x0555,
0x054e, 0x0547, 0x0540, 0x0539, 0x0532, 0x052b, 0x0525, 0x051e, 0x0518, 0x0511, 0x050b, 0x0505, 0x04fe, 0x04f8, 0x04f2, 0x04ec,
0x04e6, 0x04e0, 0x04da, 0x04d4, 0x04ce, 0x04c8, 0x04c3, 0x04bd, 0x04b8, 0x04b2, 0x04ad, 0x04a7, 0x04a2, 0x049c, 0x0497, 0x0492,
0x048d, 0x0487, 0x0482, 0x047d, 0x0478, 0x0473, 0x046e, 0x0469, 0x0465, 0x0460, 0x045b, 0x0456, 0x0452, 0x044d, 0x0448, 0x0444,
0x043f, 0x043b, 0x0436, 0x0432, 0x042d, 0x0429, 0x0425, 0x0421, 0x041c, 0x0418, 0x0414, 0x0410, 0x040c, 0x0408, 0x0404, 0x0400,
0x03fc, 0x03f8, 0x03f4, 0x03f0, 0x03ec, 0x03e8, 0x03e4, 0x03e0, 0x03dd, 0x03d9, 0x03d5, 0x03d2, 0x03ce, 0x03ca, 0x03c7, 0x03c3,
0x03c0, 0x03bc, 0x03b9, 0x03b5, 0x03b2, 0x03ae, 0x03ab, 0x03a8, 0x03a4, 0x03a1, 0x039e, 0x039b, 0x0397, 0x0394, 0x0391, 0x038e,
0x038b, 0x0387, 0x0384, 0x0381, 0x037e, 0x037b, 0x0378, 0x0375, 0x0372, 0x036f, 0x036c, 0x0369, 0x0366, 0x0364, 0x0361, 0x035e,
0x035b, 0x0358, 0x0355, 0x0353, 0x0350, 0x034d, 0x034a, 0x0348, 0x0345, 0x0342, 0x0340, 0x033d, 0x033a, 0x0338, 0x0335, 0x0333,
0x0330, 0x032e, 0x032b, 0x0329, 0x0326, 0x0324, 0x0321, 0x031f, 0x031c, 0x031a, 0x0317, 0x0315, 0x0313, 0x0310, 0x030e, 0x030c,
0x0309, 0x0307, 0x0305, 0x0303, 0x0300, 0x02fe, 0x02fc, 0x02fa, 0x02f7, 0x02f5, 0x02f3, 0x02f1, 0x02ef, 0x02ec, 0x02ea, 0x02e8,
0x02e6, 0x02e4, 0x02e2, 0x02e0, 0x02de, 0x02dc, 0x02da, 0x02d8, 0x02d6, 0x02d4, 0x02d2, 0x02d0, 0x02ce, 0x02cc, 0x02ca, 0x02c8,
0x02c6, 0x02c4, 0x02c2, 0x02c0, 0x02be, 0x02bc, 0x02bb, 0x02b9, 0x02b7, 0x02b5, 0x02b3, 0x02b1, 0x02b0, 0x02ae, 0x02ac, 0x02aa,
0x02a8, 0x02a7, 0x02a5, 0x02a3, 0x02a1, 0x02a0, 0x029e, 0x029c, 0x029b, 0x0299, 0x0297, 0x0295, 0x0294, 0x0292, 0x0291, 0x028f,
0x028d, 0x028c, 0x028a, 0x0288, 0x0287, 0x0285, 0x0284, 0x0282, 0x0280, 0x027f, 0x027d, 0x027c, 0x027a, 0x0279, 0x0277, 0x0276,
0x0274, 0x0273, 0x0271, 0x0270, 0x026e, 0x026d, 0x026b, 0x026a, 0x0268, 0x0267, 0x0265, 0x0264, 0x0263, 0x0261, 0x0260, 0x025e,
0x025d, 0x025c, 0x025a, 0x0259, 0x0257, 0x0256, 0x0255, 0x0253, 0x0252, 0x0251, 0x024f, 0x024e, 0x024d, 0x024b, 0x024a, 0x0249,
0x0247, 0x0246, 0x0245, 0x0243, 0x0242, 0x0241, 0x0240, 0x023e, 0x023d, 0x023c, 0x023b, 0x0239, 0x0238, 0x0237, 0x0236, 0x0234,
0x0233, 0x0232, 0x0231, 0x0230, 0x022e, 0x022d, 0x022c, 0x022b, 0x022a, 0x0229, 0x0227, 0x0226, 0x0225, 0x0224, 0x0223, 0x0222,
0x0220, 0x021f, 0x021e, 0x021d, 0x021c, 0x021b, 0x021a, 0x0219, 0x0218, 0x0216, 0x0215, 0x0214, 0x0213, 0x0212, 0x0211, 0x0210,
0x020f, 0x020e, 0x020d, 0x020c, 0x020b, 0x020a, 0x0209, 0x0208, 0x0207, 0x0206, 0x0205, 0x0204, 0x0203, 0x0202, 0x0201, 0x0200,
0x01ff, 0x01fe, 0x01fd, 0x01fc, 0x01fb, 0x01fa, 0x01f9, 0x01f8, 0x01f7, 0x01f6, 0x01f5, 0x01f4, 0x01f3, 0x01f2, 0x01f1, 0x01f0,
0x01ef, 0x01ee, 0x01ed, 0x01ec, 0x01eb, 0x01ea, 0x01e9, 0x01e9, 0x01e8, 0x01e7, 0x01e6, 0x01e5, 0x01e4, 0x01e3, 0x01e2, 0x01e1,
0x01e0, 0x01e0, 0x01df, 0x01de, 0x01dd, 0x01dc, 0x01db, 0x01da, 0x01da, 0x01d9, 0x01d8, 0x01d7, 0x01d6, 0x01d5, 0x01d4, 0x01d4,
0x01d3, 0x01d2, 0x01d1, 0x01d0, 0x01cf, 0x01cf, 0x01ce, 0x01cd, 0x01cc, 0x01cb, 0x01cb, 0x01ca, 0x01c9, 0x01c8, 0x01c7, 0x01c7,
0x01c6, 0x01c5, 0x01c4, 0x01c3, 0x01c3, 0x01c2, 0x01c1, 0x01c0, 0x01c0, 0x01bf, 0x01be, 0x01bd, 0x01bd, 0x01bc, 0x01bb, 0x01ba,
0x01ba, 0x01b9, 0x01b8, 0x01b7, 0x01b7, 0x01b6, 0x01b5, 0x01b4, 0x01b4, 0x01b3, 0x01b2, 0x01b2, 0x01b1, 0x01b0, 0x01af, 0x01af,
0x01ae, 0x01ad, 0x01ad, 0x01ac, 0x01ab, 0x01aa, 0x01aa, 0x01a9, 0x01a8, 0x01a8, 0x01a7, 0x01a6, 0x01a6, 0x01a5, 0x01a4, 0x01a4,
0x01a3, 0x01a2, 0x01a2, 0x01a1, 0x01a0, 0x01a0, 0x019f, 0x019e, 0x019e, 0x019d, 0x019c, 0x019c, 0x019b, 0x019a, 0x019a, 0x0199,
0x0198, 0x0198, 0x0197, 0x0197, 0x0196, 0x0195, 0x0195, 0x0194, 0x0193, 0x0193, 0x0192, 0x0192, 0x0191, 0x0190, 0x0190, 0x018f,
0x018f, 0x018e, 0x018d, 0x018d, 0x018c, 0x018b, 0x018b, 0x018a, 0x018a, 0x0189, 0x0189, 0x0188, 0x0187, 0x0187, 0x0186, 0x0186,
0x0185, 0x0184, 0x0184, 0x0183, 0x0183, 0x0182, 0x0182, 0x0181, 0x0180, 0x0180, 0x017f, 0x017f, 0x017e, 0x017e, 0x017d, 0x017d,
0x017c, 0x017b, 0x017b, 0x017a, 0x017a, 0x0179, 0x0179, 0x0178, 0x0178, 0x0177, 0x0177, 0x0176, 0x0175, 0x0175, 0x0174, 0x0174,
0x0173, 0x0173, 0x0172, 0x0172, 0x0171, 0x0171, 0x0170, 0x0170, 0x016f, 0x016f, 0x016e, 0x016e, 0x016d, 0x016d, 0x016c, 0x016c,
0x016b, 0x016b, 0x016a, 0x016a, 0x0169, 0x0169, 0x0168, 0x0168, 0x0167, 0x0167, 0x0166, 0x0166, 0x0165, 0x0165, 0x0164, 0x0164,
0x0163, 0x0163, 0x0162, 0x0162, 0x0161, 0x0161, 0x0160, 0x0160, 0x015f, 0x015f, 0x015e, 0x015e, 0x015d, 0x015d, 0x015d, 0x015c,
0x015c, 0x015b, 0x015b, 0x015a, 0x015a, 0x0159, 0x0159, 0x0158, 0x0158, 0x0158, 0x0157, 0x0157, 0x0156, 0x0156
};
static const uint16_t DivTableNEON[255*3+1] = {
0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
0x0000, 0x1c71, 0x1af2, 0x1999, 0x1861, 0x1745, 0x1642, 0x1555, 0x147a, 0x13b1, 0x12f6, 0x1249, 0x11a7, 0x1111, 0x1084, 0x1000,
0x0f83, 0x0f0f, 0x0ea0, 0x0e38, 0x0dd6, 0x0d79, 0x0d20, 0x0ccc, 0x0c7c, 0x0c30, 0x0be8, 0x0ba2, 0x0b60, 0x0b21, 0x0ae4, 0x0aaa,
0x0a72, 0x0a3d, 0x0a0a, 0x09d8, 0x09a9, 0x097b, 0x094f, 0x0924, 0x08fb, 0x08d3, 0x08ad, 0x0888, 0x0864, 0x0842, 0x0820, 0x0800,
0x07e0, 0x07c1, 0x07a4, 0x0787, 0x076b, 0x0750, 0x0736, 0x071c, 0x0703, 0x06eb, 0x06d3, 0x06bc, 0x06a6, 0x0690, 0x067b, 0x0666,
0x0652, 0x063e, 0x062b, 0x0618, 0x0606, 0x05f4, 0x05e2, 0x05d1, 0x05c0, 0x05b0, 0x05a0, 0x0590, 0x0581, 0x0572, 0x0563, 0x0555,
0x0547, 0x0539, 0x052b, 0x051e, 0x0511, 0x0505, 0x04f8, 0x04ec, 0x04e0, 0x04d4, 0x04c8, 0x04bd, 0x04b2, 0x04a7, 0x049c, 0x0492,
0x0487, 0x047d, 0x0473, 0x0469, 0x0460, 0x0456, 0x044d, 0x0444, 0x043b, 0x0432, 0x0429, 0x0421, 0x0418, 0x0410, 0x0408, 0x0400,
0x03f8, 0x03f0, 0x03e8, 0x03e0, 0x03d9, 0x03d2, 0x03ca, 0x03c3, 0x03bc, 0x03b5, 0x03ae, 0x03a8, 0x03a1, 0x039b, 0x0394, 0x038e,
0x0387, 0x0381, 0x037b, 0x0375, 0x036f, 0x0369, 0x0364, 0x035e, 0x0358, 0x0353, 0x034d, 0x0348, 0x0342, 0x033d, 0x0338, 0x0333,
0x032e, 0x0329, 0x0324, 0x031f, 0x031a, 0x0315, 0x0310, 0x030c, 0x0307, 0x0303, 0x02fe, 0x02fa, 0x02f5, 0x02f1, 0x02ec, 0x02e8,
0x02e4, 0x02e0, 0x02dc, 0x02d8, 0x02d4, 0x02d0, 0x02cc, 0x02c8, 0x02c4, 0x02c0, 0x02bc, 0x02b9, 0x02b5, 0x02b1, 0x02ae, 0x02aa,
0x02a7, 0x02a3, 0x02a0, 0x029c, 0x0299, 0x0295, 0x0292, 0x028f, 0x028c, 0x0288, 0x0285, 0x0282, 0x027f, 0x027c, 0x0279, 0x0276,
0x0273, 0x0270, 0x026d, 0x026a, 0x0267, 0x0264, 0x0261, 0x025e, 0x025c, 0x0259, 0x0256, 0x0253, 0x0251, 0x024e, 0x024b, 0x0249,
0x0246, 0x0243, 0x0241, 0x023e, 0x023c, 0x0239, 0x0237, 0x0234, 0x0232, 0x0230, 0x022d, 0x022b, 0x0229, 0x0226, 0x0224, 0x0222,
0x021f, 0x021d, 0x021b, 0x0219, 0x0216, 0x0214, 0x0212, 0x0210, 0x020e, 0x020c, 0x020a, 0x0208, 0x0206, 0x0204, 0x0202, 0x0200,
0x01fe, 0x01fc, 0x01fa, 0x01f8, 0x01f6, 0x01f4, 0x01f2, 0x01f0, 0x01ee, 0x01ec, 0x01ea, 0x01e9, 0x01e7, 0x01e5, 0x01e3, 0x01e1,
0x01e0, 0x01de, 0x01dc, 0x01da, 0x01d9, 0x01d7, 0x01d5, 0x01d4, 0x01d2, 0x01d0, 0x01cf, 0x01cd, 0x01cb, 0x01ca, 0x01c8, 0x01c7,
0x01c5, 0x01c3, 0x01c2, 0x01c0, 0x01bf, 0x01bd, 0x01bc, 0x01ba, 0x01b9, 0x01b7, 0x01b6, 0x01b4, 0x01b3, 0x01b2, 0x01b0, 0x01af,
0x01ad, 0x01ac, 0x01aa, 0x01a9, 0x01a8, 0x01a6, 0x01a5, 0x01a4, 0x01a2, 0x01a1, 0x01a0, 0x019e, 0x019d, 0x019c, 0x019a, 0x0199,
0x0198, 0x0197, 0x0195, 0x0194, 0x0193, 0x0192, 0x0190, 0x018f, 0x018e, 0x018d, 0x018b, 0x018a, 0x0189, 0x0188, 0x0187, 0x0186,
0x0184, 0x0183, 0x0182, 0x0181, 0x0180, 0x017f, 0x017e, 0x017d, 0x017b, 0x017a, 0x0179, 0x0178, 0x0177, 0x0176, 0x0175, 0x0174,
0x0173, 0x0172, 0x0171, 0x0170, 0x016f, 0x016e, 0x016d, 0x016c, 0x016b, 0x016a, 0x0169, 0x0168, 0x0167, 0x0166, 0x0165, 0x0164,
0x0163, 0x0162, 0x0161, 0x0160, 0x015f, 0x015e, 0x015d, 0x015c, 0x015b, 0x015a, 0x0159, 0x0158, 0x0158, 0x0157, 0x0156, 0x0155,
0x0154, 0x0153, 0x0152, 0x0151, 0x0150, 0x0150, 0x014f, 0x014e, 0x014d, 0x014c, 0x014b, 0x014a, 0x014a, 0x0149, 0x0148, 0x0147,
0x0146, 0x0146, 0x0145, 0x0144, 0x0143, 0x0142, 0x0142, 0x0141, 0x0140, 0x013f, 0x013e, 0x013e, 0x013d, 0x013c, 0x013b, 0x013b,
0x013a, 0x0139, 0x0138, 0x0138, 0x0137, 0x0136, 0x0135, 0x0135, 0x0134, 0x0133, 0x0132, 0x0132, 0x0131, 0x0130, 0x0130, 0x012f,
0x012e, 0x012e, 0x012d, 0x012c, 0x012b, 0x012b, 0x012a, 0x0129, 0x0129, 0x0128, 0x0127, 0x0127, 0x0126, 0x0125, 0x0125, 0x0124,
0x0123, 0x0123, 0x0122, 0x0121, 0x0121, 0x0120, 0x0120, 0x011f, 0x011e, 0x011e, 0x011d, 0x011c, 0x011c, 0x011b, 0x011b, 0x011a,
0x0119, 0x0119, 0x0118, 0x0118, 0x0117, 0x0116, 0x0116, 0x0115, 0x0115, 0x0114, 0x0113, 0x0113, 0x0112, 0x0112, 0x0111, 0x0111,
0x0110, 0x010f, 0x010f, 0x010e, 0x010e, 0x010d, 0x010d, 0x010c, 0x010c, 0x010b, 0x010a, 0x010a, 0x0109, 0x0109, 0x0108, 0x0108,
0x0107, 0x0107, 0x0106, 0x0106, 0x0105, 0x0105, 0x0104, 0x0104, 0x0103, 0x0103, 0x0102, 0x0102, 0x0101, 0x0101, 0x0100, 0x0100,
0x00ff, 0x00ff, 0x00fe, 0x00fe, 0x00fd, 0x00fd, 0x00fc, 0x00fc, 0x00fb, 0x00fb, 0x00fa, 0x00fa, 0x00f9, 0x00f9, 0x00f8, 0x00f8,
0x00f7, 0x00f7, 0x00f6, 0x00f6, 0x00f5, 0x00f5, 0x00f4, 0x00f4, 0x00f4, 0x00f3, 0x00f3, 0x00f2, 0x00f2, 0x00f1, 0x00f1, 0x00f0,
0x00f0, 0x00f0, 0x00ef, 0x00ef, 0x00ee, 0x00ee, 0x00ed, 0x00ed, 0x00ed, 0x00ec, 0x00ec, 0x00eb, 0x00eb, 0x00ea, 0x00ea, 0x00ea,
0x00e9, 0x00e9, 0x00e8, 0x00e8, 0x00e7, 0x00e7, 0x00e7, 0x00e6, 0x00e6, 0x00e5, 0x00e5, 0x00e5, 0x00e4, 0x00e4, 0x00e3, 0x00e3,
0x00e3, 0x00e2, 0x00e2, 0x00e1, 0x00e1, 0x00e1, 0x00e0, 0x00e0, 0x00e0, 0x00df, 0x00df, 0x00de, 0x00de, 0x00de, 0x00dd, 0x00dd,
0x00dd, 0x00dc, 0x00dc, 0x00db, 0x00db, 0x00db, 0x00da, 0x00da, 0x00da, 0x00d9, 0x00d9, 0x00d9, 0x00d8, 0x00d8, 0x00d7, 0x00d7,
0x00d7, 0x00d6, 0x00d6, 0x00d6, 0x00d5, 0x00d5, 0x00d5, 0x00d4, 0x00d4, 0x00d4, 0x00d3, 0x00d3, 0x00d3, 0x00d2, 0x00d2, 0x00d2,
0x00d1, 0x00d1, 0x00d1, 0x00d0, 0x00d0, 0x00d0, 0x00cf, 0x00cf, 0x00cf, 0x00ce, 0x00ce, 0x00ce, 0x00cd, 0x00cd, 0x00cd, 0x00cc,
0x00cc, 0x00cc, 0x00cb, 0x00cb, 0x00cb, 0x00ca, 0x00ca, 0x00ca, 0x00c9, 0x00c9, 0x00c9, 0x00c9, 0x00c8, 0x00c8, 0x00c8, 0x00c7,
0x00c7, 0x00c7, 0x00c6, 0x00c6, 0x00c6, 0x00c5, 0x00c5, 0x00c5, 0x00c5, 0x00c4, 0x00c4, 0x00c4, 0x00c3, 0x00c3, 0x00c3, 0x00c3,
0x00c2, 0x00c2, 0x00c2, 0x00c1, 0x00c1, 0x00c1, 0x00c1, 0x00c0, 0x00c0, 0x00c0, 0x00bf, 0x00bf, 0x00bf, 0x00bf, 0x00be, 0x00be,
0x00be, 0x00bd, 0x00bd, 0x00bd, 0x00bd, 0x00bc, 0x00bc, 0x00bc, 0x00bc, 0x00bb, 0x00bb, 0x00bb, 0x00ba, 0x00ba, 0x00ba, 0x00ba,
0x00b9, 0x00b9, 0x00b9, 0x00b9, 0x00b8, 0x00b8, 0x00b8, 0x00b8, 0x00b7, 0x00b7, 0x00b7, 0x00b7, 0x00b6, 0x00b6, 0x00b6, 0x00b6,
0x00b5, 0x00b5, 0x00b5, 0x00b5, 0x00b4, 0x00b4, 0x00b4, 0x00b4, 0x00b3, 0x00b3, 0x00b3, 0x00b3, 0x00b2, 0x00b2, 0x00b2, 0x00b2,
0x00b1, 0x00b1, 0x00b1, 0x00b1, 0x00b0, 0x00b0, 0x00b0, 0x00b0, 0x00af, 0x00af, 0x00af, 0x00af, 0x00ae, 0x00ae, 0x00ae, 0x00ae,
0x00ae, 0x00ad, 0x00ad, 0x00ad, 0x00ad, 0x00ac, 0x00ac, 0x00ac, 0x00ac, 0x00ac, 0x00ab, 0x00ab, 0x00ab, 0x00ab,
};
static tracy_force_inline uint64_t ProcessRGB( const uint8_t* src )
{
#ifdef __SSE4_1__
__m128i px0 = _mm_loadu_si128(((__m128i*)src) + 0);
__m128i px1 = _mm_loadu_si128(((__m128i*)src) + 1);
__m128i px2 = _mm_loadu_si128(((__m128i*)src) + 2);
__m128i px3 = _mm_loadu_si128(((__m128i*)src) + 3);
__m128i smask = _mm_set1_epi32( 0xF8FCF8 );
__m128i sd0 = _mm_and_si128( px0, smask );
__m128i sd1 = _mm_and_si128( px1, smask );
__m128i sd2 = _mm_and_si128( px2, smask );
__m128i sd3 = _mm_and_si128( px3, smask );
__m128i sc = _mm_shuffle_epi32(sd0, _MM_SHUFFLE(0, 0, 0, 0));
__m128i sc0 = _mm_cmpeq_epi8(sd0, sc);
__m128i sc1 = _mm_cmpeq_epi8(sd1, sc);
__m128i sc2 = _mm_cmpeq_epi8(sd2, sc);
__m128i sc3 = _mm_cmpeq_epi8(sd3, sc);
__m128i sm0 = _mm_and_si128(sc0, sc1);
__m128i sm1 = _mm_and_si128(sc2, sc3);
__m128i sm = _mm_and_si128(sm0, sm1);
if( _mm_testc_si128(sm, _mm_set1_epi32(-1)) )
{
return uint64_t( to565( src[0], src[1], src[2] ) ) << 16;
}
__m128i min0 = _mm_min_epu8( px0, px1 );
__m128i min1 = _mm_min_epu8( px2, px3 );
__m128i min2 = _mm_min_epu8( min0, min1 );
__m128i max0 = _mm_max_epu8( px0, px1 );
__m128i max1 = _mm_max_epu8( px2, px3 );
__m128i max2 = _mm_max_epu8( max0, max1 );
__m128i min3 = _mm_shuffle_epi32( min2, _MM_SHUFFLE( 2, 3, 0, 1 ) );
__m128i max3 = _mm_shuffle_epi32( max2, _MM_SHUFFLE( 2, 3, 0, 1 ) );
__m128i min4 = _mm_min_epu8( min2, min3 );
__m128i max4 = _mm_max_epu8( max2, max3 );
__m128i min5 = _mm_shuffle_epi32( min4, _MM_SHUFFLE( 0, 0, 2, 2 ) );
__m128i max5 = _mm_shuffle_epi32( max4, _MM_SHUFFLE( 0, 0, 2, 2 ) );
__m128i rmin = _mm_min_epu8( min4, min5 );
__m128i rmax = _mm_max_epu8( max4, max5 );
__m128i range1 = _mm_subs_epu8( rmax, rmin );
__m128i range2 = _mm_sad_epu8( rmax, rmin );
uint32_t vrange = _mm_cvtsi128_si32( range2 ) >> 1;
__m128i range = _mm_set1_epi16( DivTable[vrange] );
__m128i inset1 = _mm_srli_epi16( range1, 4 );
__m128i inset = _mm_and_si128( inset1, _mm_set1_epi8( 0xF ) );
__m128i min = _mm_adds_epu8( rmin, inset );
__m128i max = _mm_subs_epu8( rmax, inset );
__m128i c0 = _mm_subs_epu8( px0, rmin );
__m128i c1 = _mm_subs_epu8( px1, rmin );
__m128i c2 = _mm_subs_epu8( px2, rmin );
__m128i c3 = _mm_subs_epu8( px3, rmin );
__m128i is0 = _mm_maddubs_epi16( c0, _mm_set1_epi8( 1 ) );
__m128i is1 = _mm_maddubs_epi16( c1, _mm_set1_epi8( 1 ) );
__m128i is2 = _mm_maddubs_epi16( c2, _mm_set1_epi8( 1 ) );
__m128i is3 = _mm_maddubs_epi16( c3, _mm_set1_epi8( 1 ) );
__m128i s0 = _mm_hadd_epi16( is0, is1 );
__m128i s1 = _mm_hadd_epi16( is2, is3 );
__m128i m0 = _mm_mulhi_epu16( s0, range );
__m128i m1 = _mm_mulhi_epu16( s1, range );
__m128i p0 = _mm_packus_epi16( m0, m1 );
__m128i p1 = _mm_or_si128( _mm_srai_epi32( p0, 6 ), _mm_srai_epi32( p0, 12 ) );
__m128i p2 = _mm_or_si128( _mm_srai_epi32( p0, 18 ), p0 );
__m128i p3 = _mm_or_si128( p1, p2 );
__m128i p =_mm_shuffle_epi8( p3, _mm_set1_epi32( 0x0C080400 ) );
uint32_t vmin = _mm_cvtsi128_si32( min );
uint32_t vmax = _mm_cvtsi128_si32( max );
uint32_t vp = _mm_cvtsi128_si32( p );
return uint64_t( ( uint64_t( to565( vmin ) ) << 16 ) | to565( vmax ) | ( uint64_t( vp ) << 32 ) );
#elif defined __ARM_NEON
# ifdef __aarch64__
uint8x16x4_t px = vld4q_u8( src );
uint8x16_t lr = px.val[0];
uint8x16_t lg = px.val[1];
uint8x16_t lb = px.val[2];
uint8_t rmaxr = vmaxvq_u8( lr );
uint8_t rmaxg = vmaxvq_u8( lg );
uint8_t rmaxb = vmaxvq_u8( lb );
uint8_t rminr = vminvq_u8( lr );
uint8_t rming = vminvq_u8( lg );
uint8_t rminb = vminvq_u8( lb );
int rr = rmaxr - rminr;
int rg = rmaxg - rming;
int rb = rmaxb - rminb;
int vrange1 = rr + rg + rb;
uint16_t vrange2 = DivTableNEON[vrange1];
uint8_t insetr = rr >> 4;
uint8_t insetg = rg >> 4;
uint8_t insetb = rb >> 4;
uint8_t minr = rminr + insetr;
uint8_t ming = rming + insetg;
uint8_t minb = rminb + insetb;
uint8_t maxr = rmaxr - insetr;
uint8_t maxg = rmaxg - insetg;
uint8_t maxb = rmaxb - insetb;
uint8x16_t cr = vsubq_u8( lr, vdupq_n_u8( rminr ) );
uint8x16_t cg = vsubq_u8( lg, vdupq_n_u8( rming ) );
uint8x16_t cb = vsubq_u8( lb, vdupq_n_u8( rminb ) );
uint16x8_t is0l = vaddl_u8( vget_low_u8( cr ), vget_low_u8( cg ) );
uint16x8_t is0h = vaddl_u8( vget_high_u8( cr ), vget_high_u8( cg ) );
uint16x8_t is1l = vaddw_u8( is0l, vget_low_u8( cb ) );
uint16x8_t is1h = vaddw_u8( is0h, vget_high_u8( cb ) );
int16x8_t range = vdupq_n_s16( vrange2 );
uint16x8_t m0 = vreinterpretq_u16_s16( vqdmulhq_s16( vreinterpretq_s16_u16( is1l ), range ) );
uint16x8_t m1 = vreinterpretq_u16_s16( vqdmulhq_s16( vreinterpretq_s16_u16( is1h ), range ) );
uint8x8_t p00 = vmovn_u16( m0 );
uint8x8_t p01 = vmovn_u16( m1 );
uint8x16_t p0 = vcombine_u8( p00, p01 );
uint32x4_t p1 = vaddq_u32( vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 6 ), vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 12 ) );
uint32x4_t p2 = vaddq_u32( vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 18 ), vreinterpretq_u32_u8( p0 ) );
uint32x4_t p3 = vaddq_u32( p1, p2 );
uint16x4x2_t p4 = vuzp_u16( vget_low_u16( vreinterpretq_u16_u32( p3 ) ), vget_high_u16( vreinterpretq_u16_u32( p3 ) ) );
uint8x8x2_t p = vuzp_u8( vreinterpret_u8_u16( p4.val[0] ), vreinterpret_u8_u16( p4.val[0] ) );
uint32_t vp;
vst1_lane_u32( &vp, vreinterpret_u32_u8( p.val[0] ), 0 );
return uint64_t( ( uint64_t( to565( minr, ming, minb ) ) << 16 ) | to565( maxr, maxg, maxb ) | ( uint64_t( vp ) << 32 ) );
# else
uint32x4_t px0 = vld1q_u32( (uint32_t*)src );
uint32x4_t px1 = vld1q_u32( (uint32_t*)src + 4 );
uint32x4_t px2 = vld1q_u32( (uint32_t*)src + 8 );
uint32x4_t px3 = vld1q_u32( (uint32_t*)src + 12 );
uint32x4_t smask = vdupq_n_u32( 0xF8FCF8 );
uint32x4_t sd0 = vandq_u32( smask, px0 );
uint32x4_t sd1 = vandq_u32( smask, px1 );
uint32x4_t sd2 = vandq_u32( smask, px2 );
uint32x4_t sd3 = vandq_u32( smask, px3 );
uint32x4_t sc = vdupq_n_u32( sd0[0] );
uint32x4_t sc0 = vceqq_u32( sd0, sc );
uint32x4_t sc1 = vceqq_u32( sd1, sc );
uint32x4_t sc2 = vceqq_u32( sd2, sc );
uint32x4_t sc3 = vceqq_u32( sd3, sc );
uint32x4_t sm0 = vandq_u32( sc0, sc1 );
uint32x4_t sm1 = vandq_u32( sc2, sc3 );
int64x2_t sm = vreinterpretq_s64_u32( vandq_u32( sm0, sm1 ) );
if( sm[0] == -1 && sm[1] == -1 )
{
return uint64_t( to565( src[0], src[1], src[2] ) ) << 16;
}
uint32x4_t mask = vdupq_n_u32( 0xFFFFFF );
uint8x16_t l0 = vreinterpretq_u8_u32( vandq_u32( mask, px0 ) );
uint8x16_t l1 = vreinterpretq_u8_u32( vandq_u32( mask, px1 ) );
uint8x16_t l2 = vreinterpretq_u8_u32( vandq_u32( mask, px2 ) );
uint8x16_t l3 = vreinterpretq_u8_u32( vandq_u32( mask, px3 ) );
uint8x16_t min0 = vminq_u8( l0, l1 );
uint8x16_t min1 = vminq_u8( l2, l3 );
uint8x16_t min2 = vminq_u8( min0, min1 );
uint8x16_t max0 = vmaxq_u8( l0, l1 );
uint8x16_t max1 = vmaxq_u8( l2, l3 );
uint8x16_t max2 = vmaxq_u8( max0, max1 );
uint8x16_t min3 = vreinterpretq_u8_u32( vrev64q_u32( vreinterpretq_u32_u8( min2 ) ) );
uint8x16_t max3 = vreinterpretq_u8_u32( vrev64q_u32( vreinterpretq_u32_u8( max2 ) ) );
uint8x16_t min4 = vminq_u8( min2, min3 );
uint8x16_t max4 = vmaxq_u8( max2, max3 );
uint8x16_t min5 = vcombine_u8( vget_high_u8( min4 ), vget_low_u8( min4 ) );
uint8x16_t max5 = vcombine_u8( vget_high_u8( max4 ), vget_low_u8( max4 ) );
uint8x16_t rmin = vminq_u8( min4, min5 );
uint8x16_t rmax = vmaxq_u8( max4, max5 );
uint8x16_t range1 = vsubq_u8( rmax, rmin );
uint8x8_t range2 = vget_low_u8( range1 );
uint8x8x2_t range3 = vzip_u8( range2, vdup_n_u8( 0 ) );
uint16x4_t range4 = vreinterpret_u16_u8( range3.val[0] );
uint16_t vrange1;
uint16x4_t range5 = vpadd_u16( range4, range4 );
uint16x4_t range6 = vpadd_u16( range5, range5 );
vst1_lane_u16( &vrange1, range6, 0 );
uint32_t vrange2 = ( 2 << 16 ) / uint32_t( vrange1 + 1 );
uint16x8_t range = vdupq_n_u16( vrange2 );
uint8x16_t inset = vshrq_n_u8( range1, 4 );
uint8x16_t min = vaddq_u8( rmin, inset );
uint8x16_t max = vsubq_u8( rmax, inset );
uint8x16_t c0 = vsubq_u8( l0, rmin );
uint8x16_t c1 = vsubq_u8( l1, rmin );
uint8x16_t c2 = vsubq_u8( l2, rmin );
uint8x16_t c3 = vsubq_u8( l3, rmin );
uint16x8_t is0 = vpaddlq_u8( c0 );
uint16x8_t is1 = vpaddlq_u8( c1 );
uint16x8_t is2 = vpaddlq_u8( c2 );
uint16x8_t is3 = vpaddlq_u8( c3 );
uint16x4_t is4 = vpadd_u16( vget_low_u16( is0 ), vget_high_u16( is0 ) );
uint16x4_t is5 = vpadd_u16( vget_low_u16( is1 ), vget_high_u16( is1 ) );
uint16x4_t is6 = vpadd_u16( vget_low_u16( is2 ), vget_high_u16( is2 ) );
uint16x4_t is7 = vpadd_u16( vget_low_u16( is3 ), vget_high_u16( is3 ) );
uint16x8_t s0 = vcombine_u16( is4, is5 );
uint16x8_t s1 = vcombine_u16( is6, is7 );
uint16x8_t m0 = vreinterpretq_u16_s16( vqdmulhq_s16( vreinterpretq_s16_u16( s0 ), vreinterpretq_s16_u16( range ) ) );
uint16x8_t m1 = vreinterpretq_u16_s16( vqdmulhq_s16( vreinterpretq_s16_u16( s1 ), vreinterpretq_s16_u16( range ) ) );
uint8x8_t p00 = vmovn_u16( m0 );
uint8x8_t p01 = vmovn_u16( m1 );
uint8x16_t p0 = vcombine_u8( p00, p01 );
uint32x4_t p1 = vaddq_u32( vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 6 ), vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 12 ) );
uint32x4_t p2 = vaddq_u32( vshrq_n_u32( vreinterpretq_u32_u8( p0 ), 18 ), vreinterpretq_u32_u8( p0 ) );
uint32x4_t p3 = vaddq_u32( p1, p2 );
uint16x4x2_t p4 = vuzp_u16( vget_low_u16( vreinterpretq_u16_u32( p3 ) ), vget_high_u16( vreinterpretq_u16_u32( p3 ) ) );
uint8x8x2_t p = vuzp_u8( vreinterpret_u8_u16( p4.val[0] ), vreinterpret_u8_u16( p4.val[0] ) );
uint32_t vmin, vmax, vp;
vst1q_lane_u32( &vmin, vreinterpretq_u32_u8( min ), 0 );
vst1q_lane_u32( &vmax, vreinterpretq_u32_u8( max ), 0 );
vst1_lane_u32( &vp, vreinterpret_u32_u8( p.val[0] ), 0 );
return uint64_t( ( uint64_t( to565( vmin ) ) << 16 ) | to565( vmax ) | ( uint64_t( vp ) << 32 ) );
# endif
#else
uint32_t ref;
memcpy( &ref, src, 4 );
uint32_t refMask = ref & 0xF8FCF8;
auto stmp = src + 4;
for( int i=1; i<16; i++ )
{
uint32_t px;
memcpy( &px, stmp, 4 );
if( ( px & 0xF8FCF8 ) != refMask ) break;
stmp += 4;
}
if( stmp == src + 64 )
{
return uint64_t( to565( ref ) ) << 16;
}
uint8_t min[3] = { src[0], src[1], src[2] };
uint8_t max[3] = { src[0], src[1], src[2] };
auto tmp = src + 4;
for( int i=1; i<16; i++ )
{
for( int j=0; j<3; j++ )
{
if( tmp[j] < min[j] ) min[j] = tmp[j];
else if( tmp[j] > max[j] ) max[j] = tmp[j];
}
tmp += 4;
}
const uint32_t range = DivTable[max[0] - min[0] + max[1] - min[1] + max[2] - min[2]];
const uint32_t rmin = min[0] + min[1] + min[2];
for( int i=0; i<3; i++ )
{
const uint8_t inset = ( max[i] - min[i] ) >> 4;
min[i] += inset;
max[i] -= inset;
}
uint32_t data = 0;
for( int i=0; i<16; i++ )
{
const uint32_t c = src[0] + src[1] + src[2] - rmin;
const uint8_t idx = ( c * range ) >> 16;
data |= idx << (i*2);
src += 4;
}
return uint64_t( ( uint64_t( to565( min[0], min[1], min[2] ) ) << 16 ) | to565( max[0], max[1], max[2] ) | ( uint64_t( data ) << 32 ) );
#endif
}
#ifdef __AVX2__
static tracy_force_inline void ProcessRGB_AVX( const uint8_t* src, char*& dst )
{
__m256i px0 = _mm256_loadu_si256(((__m256i*)src) + 0);
__m256i px1 = _mm256_loadu_si256(((__m256i*)src) + 1);
__m256i px2 = _mm256_loadu_si256(((__m256i*)src) + 2);
__m256i px3 = _mm256_loadu_si256(((__m256i*)src) + 3);
__m256i smask = _mm256_set1_epi32( 0xF8FCF8 );
__m256i sd0 = _mm256_and_si256( px0, smask );
__m256i sd1 = _mm256_and_si256( px1, smask );
__m256i sd2 = _mm256_and_si256( px2, smask );
__m256i sd3 = _mm256_and_si256( px3, smask );
__m256i sc = _mm256_shuffle_epi32(sd0, _MM_SHUFFLE(0, 0, 0, 0));
__m256i sc0 = _mm256_cmpeq_epi8( sd0, sc );
__m256i sc1 = _mm256_cmpeq_epi8( sd1, sc );
__m256i sc2 = _mm256_cmpeq_epi8( sd2, sc );
__m256i sc3 = _mm256_cmpeq_epi8( sd3, sc );
__m256i sm0 = _mm256_and_si256( sc0, sc1 );
__m256i sm1 = _mm256_and_si256( sc2, sc3 );
__m256i sm = _mm256_and_si256( sm0, sm1 );
const int64_t solid0 = 1 - _mm_testc_si128( _mm256_castsi256_si128( sm ), _mm_set1_epi32( -1 ) );
const int64_t solid1 = 1 - _mm_testc_si128( _mm256_extracti128_si256( sm, 1 ), _mm_set1_epi32( -1 ) );
if( solid0 + solid1 == 0 )
{
const auto c0 = uint64_t( to565( src[0], src[1], src[2] ) ) << 16;
const auto c1 = uint64_t( to565( src[16], src[17], src[18] ) ) << 16;
memcpy( dst, &c0, 8 );
memcpy( dst+8, &c1, 8 );
dst += 16;
return;
}
__m256i min0 = _mm256_min_epu8( px0, px1 );
__m256i min1 = _mm256_min_epu8( px2, px3 );
__m256i min2 = _mm256_min_epu8( min0, min1 );
__m256i max0 = _mm256_max_epu8( px0, px1 );
__m256i max1 = _mm256_max_epu8( px2, px3 );
__m256i max2 = _mm256_max_epu8( max0, max1 );
__m256i min3 = _mm256_shuffle_epi32( min2, _MM_SHUFFLE( 2, 3, 0, 1 ) );
__m256i max3 = _mm256_shuffle_epi32( max2, _MM_SHUFFLE( 2, 3, 0, 1 ) );
__m256i min4 = _mm256_min_epu8( min2, min3 );
__m256i max4 = _mm256_max_epu8( max2, max3 );
__m256i min5 = _mm256_shuffle_epi32( min4, _MM_SHUFFLE( 0, 0, 2, 2 ) );
__m256i max5 = _mm256_shuffle_epi32( max4, _MM_SHUFFLE( 0, 0, 2, 2 ) );
__m256i rmin = _mm256_min_epu8( min4, min5 );
__m256i rmax = _mm256_max_epu8( max4, max5 );
__m256i range1 = _mm256_subs_epu8( rmax, rmin );
__m256i range2 = _mm256_sad_epu8( rmax, rmin );
uint16_t vrange0 = DivTable[_mm256_cvtsi256_si32( range2 ) >> 1];
uint16_t vrange1 = DivTable[_mm256_extract_epi16( range2, 8 ) >> 1];
__m256i range00 = _mm256_set1_epi16( vrange0 );
__m256i range = _mm256_inserti128_si256( range00, _mm_set1_epi16( vrange1 ), 1 );
__m256i inset1 = _mm256_srli_epi16( range1, 4 );
__m256i inset = _mm256_and_si256( inset1, _mm256_set1_epi8( 0xF ) );
__m256i min = _mm256_adds_epu8( rmin, inset );
__m256i max = _mm256_subs_epu8( rmax, inset );
__m256i c0 = _mm256_subs_epu8( px0, rmin );
__m256i c1 = _mm256_subs_epu8( px1, rmin );
__m256i c2 = _mm256_subs_epu8( px2, rmin );
__m256i c3 = _mm256_subs_epu8( px3, rmin );
__m256i is0 = _mm256_maddubs_epi16( c0, _mm256_set1_epi8( 1 ) );
__m256i is1 = _mm256_maddubs_epi16( c1, _mm256_set1_epi8( 1 ) );
__m256i is2 = _mm256_maddubs_epi16( c2, _mm256_set1_epi8( 1 ) );
__m256i is3 = _mm256_maddubs_epi16( c3, _mm256_set1_epi8( 1 ) );
__m256i s0 = _mm256_hadd_epi16( is0, is1 );
__m256i s1 = _mm256_hadd_epi16( is2, is3 );
__m256i m0 = _mm256_mulhi_epu16( s0, range );
__m256i m1 = _mm256_mulhi_epu16( s1, range );
__m256i p0 = _mm256_packus_epi16( m0, m1 );
__m256i p1 = _mm256_or_si256( _mm256_srai_epi32( p0, 6 ), _mm256_srai_epi32( p0, 12 ) );
__m256i p2 = _mm256_or_si256( _mm256_srai_epi32( p0, 18 ), p0 );
__m256i p3 = _mm256_or_si256( p1, p2 );
__m256i p =_mm256_shuffle_epi8( p3, _mm256_set1_epi32( 0x0C080400 ) );
__m256i mm0 = _mm256_unpacklo_epi8( _mm256_setzero_si256(), min );
__m256i mm1 = _mm256_unpacklo_epi8( _mm256_setzero_si256(), max );
__m256i mm2 = _mm256_unpacklo_epi64( mm1, mm0 );
__m256i mmr = _mm256_slli_epi64( _mm256_srli_epi64( mm2, 11 ), 11 );
__m256i mmg = _mm256_slli_epi64( _mm256_srli_epi64( mm2, 26 ), 5 );
__m256i mmb = _mm256_srli_epi64( _mm256_slli_epi64( mm2, 16 ), 59 );
__m256i mm3 = _mm256_or_si256( mmr, mmg );
__m256i mm4 = _mm256_or_si256( mm3, mmb );
__m256i mm5 = _mm256_shuffle_epi8( mm4, _mm256_set1_epi32( 0x09080100 ) );
__m256i d0 = _mm256_unpacklo_epi32( mm5, p );
__m256i d1 = _mm256_permute4x64_epi64( d0, _MM_SHUFFLE( 3, 2, 2, 0 ) );
__m128i d2 = _mm256_castsi256_si128( d1 );
__m128i mask = _mm_set_epi64x( 0xFFFF0000 | -solid1, 0xFFFF0000 | -solid0 );
__m128i d3 = _mm_and_si128( d2, mask );
_mm_storeu_si128( (__m128i*)dst, d3 );
dst += 16;
}
#endif
void CompressImageDxt1( const char* src, char* dst, int w, int h )
{
assert( (w % 4) == 0 && (h % 4) == 0 );
#ifdef __AVX2__
if( w%8 == 0 )
{
uint32_t buf[8*4];
int i = 0;
auto blocks = w * h / 32;
do
{
auto tmp = (char*)buf;
memcpy( tmp, src, 8*4 );
memcpy( tmp + 8*4, src + w * 4, 8*4 );
memcpy( tmp + 16*4, src + w * 8, 8*4 );
memcpy( tmp + 24*4, src + w * 12, 8*4 );
src += 8*4;
if( ++i == w/8 )
{
src += w * 3 * 4;
i = 0;
}
ProcessRGB_AVX( (uint8_t*)buf, dst );
}
while( --blocks );
}
else
#endif
{
uint32_t buf[4*4];
int i = 0;
auto ptr = dst;
auto blocks = w * h / 16;
do
{
auto tmp = (char*)buf;
memcpy( tmp, src, 4*4 );
memcpy( tmp + 4*4, src + w * 4, 4*4 );
memcpy( tmp + 8*4, src + w * 8, 4*4 );
memcpy( tmp + 12*4, src + w * 12, 4*4 );
src += 4*4;
if( ++i == w/4 )
{
src += w * 3 * 4;
i = 0;
}
const auto c = ProcessRGB( (uint8_t*)buf );
memcpy( ptr, &c, sizeof( uint64_t ) );
ptr += sizeof( uint64_t );
}
while( --blocks );
}
}
}

11
client/TracyDxt1.hpp Normal file
View File

@@ -0,0 +1,11 @@
#ifndef __TRACYDXT1_HPP__
#define __TRACYDXT1_HPP__
namespace tracy
{
void CompressImageDxt1( const char* src, char* dst, int w, int h );
}
#endif

View File

@@ -1,6 +1,7 @@
#ifndef __TRACYFASTVECTOR_HPP__
#define __TRACYFASTVECTOR_HPP__
#include <assert.h>
#include <stddef.h>
#include "../common/TracyAlloc.hpp"
@@ -21,6 +22,7 @@ public:
, m_write( m_ptr )
, m_end( m_ptr + capacity )
{
assert( capacity != 0 );
}
FastVector( const FastVector& ) = delete;

View File

@@ -11,207 +11,224 @@
namespace tracy
{
extern std::atomic<uint32_t> s_lockCounter;
class LockableCtx
{
public:
tracy_force_inline LockableCtx( const SourceLocationData* srcloc )
: m_id( GetLockCounter().fetch_add( 1, std::memory_order_relaxed ) )
#ifdef TRACY_ON_DEMAND
, m_lockCount( 0 )
, m_active( false )
#endif
{
assert( m_id != std::numeric_limits<uint32_t>::max() );
TracyLfqPrepare( QueueType::LockAnnounce );
MemWrite( &item->lockAnnounce.id, m_id );
MemWrite( &item->lockAnnounce.time, Profiler::GetTime() );
MemWrite( &item->lockAnnounce.lckloc, (uint64_t)srcloc );
MemWrite( &item->lockAnnounce.type, LockType::Lockable );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
TracyLfqCommit;
}
LockableCtx( const LockableCtx& ) = delete;
LockableCtx& operator=( const LockableCtx& ) = delete;
tracy_force_inline ~LockableCtx()
{
TracyLfqPrepare( QueueType::LockTerminate );
MemWrite( &item->lockTerminate.id, m_id );
MemWrite( &item->lockTerminate.time, Profiler::GetTime() );
MemWrite( &item->lockTerminate.type, LockType::Lockable );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
TracyLfqCommit;
}
tracy_force_inline bool BeforeLock()
{
#ifdef TRACY_ON_DEMAND
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return false;
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockWait );
MemWrite( &item->lockWait.thread, GetThreadHandle() );
MemWrite( &item->lockWait.id, m_id );
MemWrite( &item->lockWait.time, Profiler::GetTime() );
MemWrite( &item->lockWait.type, LockType::Lockable );
Profiler::QueueSerialFinish();
return true;
}
tracy_force_inline void AfterLock()
{
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterUnlock()
{
#ifdef TRACY_ON_DEMAND
m_lockCount.fetch_sub( 1, std::memory_order_relaxed );
if( !m_active.load( std::memory_order_relaxed ) ) return;
if( !GetProfiler().IsConnected() )
{
m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockRelease );
MemWrite( &item->lockRelease.thread, GetThreadHandle() );
MemWrite( &item->lockRelease.id, m_id );
MemWrite( &item->lockRelease.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterTryLock( bool acquired )
{
#ifdef TRACY_ON_DEMAND
if( !acquired ) return;
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return;
#endif
if( acquired )
{
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
}
tracy_force_inline void Mark( const SourceLocationData* srcloc )
{
#ifdef TRACY_ON_DEMAND
const auto active = m_active.load( std::memory_order_relaxed );
if( !active ) return;
const auto connected = GetProfiler().IsConnected();
if( !connected )
{
if( active ) m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockMark );
MemWrite( &item->lockMark.thread, GetThreadHandle() );
MemWrite( &item->lockMark.id, m_id );
MemWrite( &item->lockMark.srcloc, (uint64_t)srcloc );
Profiler::QueueSerialFinish();
}
tracy_force_inline void CustomName( const char* name, size_t size )
{
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, name, size );
ptr[size] = '\0';
TracyLfqPrepare( QueueType::LockName );
MemWrite( &item->lockName.id, m_id );
MemWrite( &item->lockName.name, (uint64_t)ptr );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
TracyLfqCommit;
}
private:
uint32_t m_id;
#ifdef TRACY_ON_DEMAND
std::atomic<uint32_t> m_lockCount;
std::atomic<bool> m_active;
#endif
};
template<class T>
class Lockable
{
public:
tracy_force_inline Lockable( const SourceLocationData* srcloc )
: m_id( s_lockCounter.fetch_add( 1, std::memory_order_relaxed ) )
#ifdef TRACY_ON_DEMAND
, m_lockCount( 0 )
, m_active( false )
#endif
: m_ctx( srcloc )
{
assert( m_id != std::numeric_limits<uint32_t>::max() );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockAnnounce );
MemWrite( &item->lockAnnounce.id, m_id );
MemWrite( &item->lockAnnounce.time, Profiler::GetTime() );
MemWrite( &item->lockAnnounce.lckloc, (uint64_t)srcloc );
MemWrite( &item->lockAnnounce.type, LockType::Lockable );
#ifdef TRACY_ON_DEMAND
s_profiler.DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
}
Lockable( const Lockable& ) = delete;
Lockable& operator=( const Lockable& ) = delete;
~Lockable()
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockTerminate );
MemWrite( &item->lockTerminate.id, m_id );
MemWrite( &item->lockTerminate.time, Profiler::GetTime() );
MemWrite( &item->lockTerminate.type, LockType::Lockable );
#ifdef TRACY_ON_DEMAND
s_profiler.DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
}
tracy_force_inline void lock()
{
#ifdef TRACY_ON_DEMAND
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = s_profiler.IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue )
{
m_lockable.lock();
return;
}
#endif
const auto thread = GetThreadHandle();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockWait );
MemWrite( &item->lockWait.id, m_id );
MemWrite( &item->lockWait.thread, thread );
MemWrite( &item->lockWait.time, Profiler::GetTime() );
MemWrite( &item->lockWait.type, LockType::Lockable );
tail.store( magic + 1, std::memory_order_release );
}
const auto runAfter = m_ctx.BeforeLock();
m_lockable.lock();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.thread, thread );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
}
if( runAfter ) m_ctx.AfterLock();
}
tracy_force_inline void unlock()
{
m_lockable.unlock();
#ifdef TRACY_ON_DEMAND
m_lockCount.fetch_sub( 1, std::memory_order_relaxed );
if( !m_active.load( std::memory_order_relaxed ) ) return;
if( !s_profiler.IsConnected() )
{
m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockRelease );
MemWrite( &item->lockRelease.id, m_id );
MemWrite( &item->lockRelease.thread, GetThreadHandle() );
MemWrite( &item->lockRelease.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
m_ctx.AfterUnlock();
}
tracy_force_inline bool try_lock()
{
const auto ret = m_lockable.try_lock();
#ifdef TRACY_ON_DEMAND
if( !ret ) return ret;
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = s_profiler.IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return ret;
#endif
if( ret )
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
}
return ret;
const auto acquired = m_lockable.try_lock();
m_ctx.AfterTryLock( acquired );
return acquired;
}
tracy_force_inline void Mark( const SourceLocationData* srcloc )
{
#ifdef TRACY_ON_DEMAND
const auto active = m_active.load( std::memory_order_relaxed );
if( !active ) return;
const auto connected = s_profiler.IsConnected();
if( !connected )
{
if( active ) m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
m_ctx.Mark( srcloc );
}
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockMark );
MemWrite( &item->lockMark.id, m_id );
MemWrite( &item->lockMark.thread, GetThreadHandle() );
MemWrite( &item->lockMark.srcloc, (uint64_t)srcloc );
tail.store( magic + 1, std::memory_order_release );
tracy_force_inline void CustomName( const char* name, size_t size )
{
m_ctx.CustomName( name, size );
}
private:
T m_lockable;
uint32_t m_id;
#ifdef TRACY_ON_DEMAND
std::atomic<uint32_t> m_lockCount;
std::atomic<bool> m_active;
#endif
LockableCtx m_ctx;
};
template<class T>
class SharedLockable
class SharedLockableCtx
{
public:
tracy_force_inline SharedLockable( const SourceLocationData* srcloc )
: m_id( s_lockCounter.fetch_add( 1, std::memory_order_relaxed ) )
tracy_force_inline SharedLockableCtx( const SourceLocationData* srcloc )
: m_id( GetLockCounter().fetch_add( 1, std::memory_order_relaxed ) )
#ifdef TRACY_ON_DEMAND
, m_lockCount( 0 )
, m_active( false )
@@ -219,45 +236,37 @@ public:
{
assert( m_id != std::numeric_limits<uint32_t>::max() );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockAnnounce );
TracyLfqPrepare( QueueType::LockAnnounce );
MemWrite( &item->lockAnnounce.id, m_id );
MemWrite( &item->lockAnnounce.time, Profiler::GetTime() );
MemWrite( &item->lockAnnounce.lckloc, (uint64_t)srcloc );
MemWrite( &item->lockAnnounce.type, LockType::SharedLockable );
#ifdef TRACY_ON_DEMAND
s_profiler.DeferItem( *item );
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
}
SharedLockable( const SharedLockable& ) = delete;
SharedLockable& operator=( const SharedLockable& ) = delete;
SharedLockableCtx( const SharedLockableCtx& ) = delete;
SharedLockableCtx& operator=( const SharedLockableCtx& ) = delete;
~SharedLockable()
tracy_force_inline ~SharedLockableCtx()
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockTerminate );
TracyLfqPrepare( QueueType::LockTerminate );
MemWrite( &item->lockTerminate.id, m_id );
MemWrite( &item->lockTerminate.time, Profiler::GetTime() );
MemWrite( &item->lockTerminate.type, LockType::SharedLockable );
#ifdef TRACY_ON_DEMAND
s_profiler.DeferItem( *item );
GetProfiler().DeferItem( *item );
#endif
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
}
tracy_force_inline void lock()
tracy_force_inline bool BeforeLock()
{
#ifdef TRACY_ON_DEMAND
bool queue = false;
@@ -265,106 +274,82 @@ public:
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = s_profiler.IsConnected();
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue )
{
m_lockable.lock();
return;
}
if( !queue ) return false;
#endif
const auto thread = GetThreadHandle();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockWait );
MemWrite( &item->lockWait.id, m_id );
MemWrite( &item->lockWait.thread, thread );
MemWrite( &item->lockWait.time, Profiler::GetTime() );
MemWrite( &item->lockWait.type, LockType::SharedLockable );
tail.store( magic + 1, std::memory_order_release );
}
m_lockable.lock();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.thread, thread );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
}
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockWait );
MemWrite( &item->lockWait.thread, GetThreadHandle() );
MemWrite( &item->lockWait.id, m_id );
MemWrite( &item->lockWait.time, Profiler::GetTime() );
MemWrite( &item->lockWait.type, LockType::SharedLockable );
Profiler::QueueSerialFinish();
return true;
}
tracy_force_inline void unlock()
tracy_force_inline void AfterLock()
{
m_lockable.unlock();
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterUnlock()
{
#ifdef TRACY_ON_DEMAND
m_lockCount.fetch_sub( 1, std::memory_order_relaxed );
if( !m_active.load( std::memory_order_relaxed ) ) return;
if( !s_profiler.IsConnected() )
if( !GetProfiler().IsConnected() )
{
m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockRelease );
MemWrite( &item->lockRelease.id, m_id );
MemWrite( &item->lockRelease.thread, GetThreadHandle() );
MemWrite( &item->lockRelease.id, m_id );
MemWrite( &item->lockRelease.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
Profiler::QueueSerialFinish();
}
tracy_force_inline bool try_lock()
tracy_force_inline void AfterTryLock( bool acquired )
{
const auto ret = m_lockable.try_lock();
#ifdef TRACY_ON_DEMAND
if( !ret ) return ret;
if( !acquired ) return;
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = s_profiler.IsConnected();
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return ret;
if( !queue ) return;
#endif
if( ret )
if( acquired )
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockObtain );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
Profiler::QueueSerialFinish();
}
return ret;
}
tracy_force_inline void lock_shared()
tracy_force_inline bool BeforeLockShared()
{
#ifdef TRACY_ON_DEMAND
bool queue = false;
@@ -372,103 +357,79 @@ public:
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = s_profiler.IsConnected();
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue )
{
m_lockable.lock_shared();
return;
}
if( !queue ) return false;
#endif
const auto thread = GetThreadHandle();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockSharedWait );
MemWrite( &item->lockWait.id, m_id );
MemWrite( &item->lockWait.thread, thread );
MemWrite( &item->lockWait.time, Profiler::GetTime() );
MemWrite( &item->lockWait.type, LockType::SharedLockable );
tail.store( magic + 1, std::memory_order_release );
}
m_lockable.lock_shared();
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::LockSharedObtain );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.thread, thread );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
}
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockSharedWait );
MemWrite( &item->lockWait.thread, GetThreadHandle() );
MemWrite( &item->lockWait.id, m_id );
MemWrite( &item->lockWait.time, Profiler::GetTime() );
MemWrite( &item->lockWait.type, LockType::SharedLockable );
Profiler::QueueSerialFinish();
return true;
}
tracy_force_inline void unlock_shared()
tracy_force_inline void AfterLockShared()
{
m_lockable.unlock_shared();
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockSharedObtain );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
Profiler::QueueSerialFinish();
}
tracy_force_inline void AfterUnlockShared()
{
#ifdef TRACY_ON_DEMAND
m_lockCount.fetch_sub( 1, std::memory_order_relaxed );
if( !m_active.load( std::memory_order_relaxed ) ) return;
if( !s_profiler.IsConnected() )
if( !GetProfiler().IsConnected() )
{
m_active.store( false, std::memory_order_relaxed );
return;
}
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockSharedRelease );
MemWrite( &item->lockRelease.id, m_id );
MemWrite( &item->lockRelease.thread, GetThreadHandle() );
MemWrite( &item->lockRelease.id, m_id );
MemWrite( &item->lockRelease.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
Profiler::QueueSerialFinish();
}
tracy_force_inline bool try_lock_shared()
tracy_force_inline void AfterTryLockShared( bool acquired )
{
const auto ret = m_lockable.try_lock_shared();
#ifdef TRACY_ON_DEMAND
if( !ret ) return ret;
if( !acquired ) return;
bool queue = false;
const auto locks = m_lockCount.fetch_add( 1, std::memory_order_relaxed );
const auto active = m_active.load( std::memory_order_relaxed );
if( locks == 0 || active )
{
const bool connected = s_profiler.IsConnected();
const bool connected = GetProfiler().IsConnected();
if( active != connected ) m_active.store( connected, std::memory_order_relaxed );
if( connected ) queue = true;
}
if( !queue ) return ret;
if( !queue ) return;
#endif
if( ret )
if( acquired )
{
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockSharedObtain );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.thread, GetThreadHandle() );
MemWrite( &item->lockObtain.id, m_id );
MemWrite( &item->lockObtain.time, Profiler::GetTime() );
tail.store( magic + 1, std::memory_order_release );
Profiler::QueueSerialFinish();
}
return ret;
}
tracy_force_inline void Mark( const SourceLocationData* srcloc )
@@ -476,7 +437,7 @@ public:
#ifdef TRACY_ON_DEMAND
const auto active = m_active.load( std::memory_order_relaxed );
if( !active ) return;
const auto connected = s_profiler.IsConnected();
const auto connected = GetProfiler().IsConnected();
if( !connected )
{
if( active ) m_active.store( false, std::memory_order_relaxed );
@@ -484,19 +445,29 @@ public:
}
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
auto item = Profiler::QueueSerial();
MemWrite( &item->hdr.type, QueueType::LockMark );
MemWrite( &item->lockMark.id, m_id );
MemWrite( &item->lockMark.thread, GetThreadHandle() );
MemWrite( &item->lockMark.id, m_id );
MemWrite( &item->lockMark.srcloc, (uint64_t)srcloc );
tail.store( magic + 1, std::memory_order_release );
Profiler::QueueSerialFinish();
}
tracy_force_inline void CustomName( const char* name, size_t size )
{
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, name, size );
ptr[size] = '\0';
TracyLfqPrepare( QueueType::LockName );
MemWrite( &item->lockName.id, m_id );
MemWrite( &item->lockName.name, (uint64_t)ptr );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
TracyLfqCommit;
}
private:
T m_lockable;
uint32_t m_id;
#ifdef TRACY_ON_DEMAND
@@ -505,7 +476,74 @@ private:
#endif
};
template<class T>
class SharedLockable
{
public:
tracy_force_inline SharedLockable( const SourceLocationData* srcloc )
: m_ctx( srcloc )
{
}
SharedLockable( const SharedLockable& ) = delete;
SharedLockable& operator=( const SharedLockable& ) = delete;
tracy_force_inline void lock()
{
const auto runAfter = m_ctx.BeforeLock();
m_lockable.lock();
if( runAfter ) m_ctx.AfterLock();
}
tracy_force_inline void unlock()
{
m_lockable.unlock();
m_ctx.AfterUnlock();
}
tracy_force_inline bool try_lock()
{
const auto acquired = m_lockable.try_lock();
m_ctx.AfterTryLock( acquired );
return acquired;
}
tracy_force_inline void lock_shared()
{
const auto runAfter = m_ctx.BeforeLockShared();
m_lockable.lock_shared();
if( runAfter ) m_ctx.AfterLockShared();
}
tracy_force_inline void unlock_shared()
{
m_lockable.unlock_shared();
m_ctx.AfterUnlockShared();
}
tracy_force_inline bool try_lock_shared()
{
const auto acquired = m_lockable.try_lock_shared();
m_ctx.AfterTryLockShared( acquired );
return acquired;
}
tracy_force_inline void Mark( const SourceLocationData* srcloc )
{
m_ctx.Mark( srcloc );
}
tracy_force_inline void CustomName( const char* name, size_t size )
{
m_ctx.CustomName( name, size );
}
private:
T m_lockable;
SharedLockableCtx m_ctx;
};
}
#endif

File diff suppressed because it is too large Load Diff

View File

@@ -3,40 +3,62 @@
#include <assert.h>
#include <atomic>
#include <chrono>
#include <stdint.h>
#include <string.h>
#include "concurrentqueue.h"
#include "tracy_concurrentqueue.h"
#include "TracyCallstack.hpp"
#include "TracySysTime.hpp"
#include "TracyFastVector.hpp"
#include "../common/tracy_lz4.hpp"
#include "../common/TracyQueue.hpp"
#include "../common/TracyAlign.hpp"
#include "../common/TracyAlloc.hpp"
#include "../common/TracyMutex.hpp"
#include "../common/TracySystem.hpp"
#include "../common/TracyProtocol.hpp"
#if defined _MSC_VER || defined __CYGWIN__
#if defined _WIN32 || defined __CYGWIN__
# include <intrin.h>
#endif
#if defined _MSC_VER || defined __CYGWIN__ || ( ( defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64 ) && !defined __ANDROID__ ) || __ARM_ARCH >= 6
# define TRACY_HW_TIMER
# if defined _MSC_VER || defined __CYGWIN__
// Enable optimization for MSVC __rdtscp() intrin, saving one LHS of a cpu value on the stack.
// This comes at the cost of an unaligned memory write.
# define TRACY_RDTSCP_OPT
# endif
#ifdef __APPLE__
# include <TargetConditionals.h>
# include <mach/mach_time.h>
#endif
#define TracyConcat(x,y) TracyConcatIndirect(x,y)
#define TracyConcatIndirect(x,y) x##y
#if defined _WIN32 || defined __CYGWIN__ || ( ( defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64 ) && !defined __ANDROID__ ) || __ARM_ARCH >= 6
# define TRACY_HW_TIMER
#endif
#if !defined TRACY_HW_TIMER || ( defined __ARM_ARCH && __ARM_ARCH >= 6 && !defined CLOCK_MONOTONIC_RAW )
#include <chrono>
#endif
#ifndef TracyConcat
# define TracyConcat(x,y) TracyConcatIndirect(x,y)
#endif
#ifndef TracyConcatIndirect
# define TracyConcatIndirect(x,y) x##y
#endif
namespace tracy
{
class GpuCtx;
class Profiler;
class Socket;
class UdpBroadcast;
struct GpuCtxWrapper
{
GpuCtx* ptr;
};
TRACY_API moodycamel::ConcurrentQueue<QueueItem>::ExplicitProducer* GetToken();
TRACY_API Profiler& GetProfiler();
TRACY_API std::atomic<uint32_t>& GetLockCounter();
TRACY_API std::atomic<uint8_t>& GetGpuCtxCounter();
TRACY_API GpuCtxWrapper& GetGpuCtx();
TRACY_API uint64_t GetThreadHandle();
TRACY_API void InitRPMallocThread();
struct SourceLocationData
{
@@ -47,25 +69,6 @@ struct SourceLocationData
uint32_t color;
};
struct ProducerWrapper
{
tracy::moodycamel::ConcurrentQueue<QueueItem>::ExplicitProducer* ptr;
};
extern thread_local ProducerWrapper s_token;
class GpuCtx;
struct GpuCtxWrapper
{
GpuCtx* ptr;
};
class VkCtx;
struct VkCtxWrapper
{
VkCtx* ptr;
};
#ifdef TRACY_ON_DEMAND
struct LuaZoneState
{
@@ -74,217 +77,317 @@ struct LuaZoneState
};
#endif
using Magic = tracy::moodycamel::ConcurrentQueueDefaultTraits::index_t;
#if __ARM_ARCH >= 6
extern int64_t (*GetTimeImpl)();
#endif
#define TracyLfqPrepare( _type ) \
moodycamel::ConcurrentQueueDefaultTraits::index_t __magic; \
auto __token = GetToken(); \
auto& __tail = __token->get_tail_index(); \
auto item = __token->enqueue_begin( __magic ); \
MemWrite( &item->hdr.type, _type );
class Profiler;
extern Profiler& s_profiler;
#define TracyLfqCommit \
__tail.store( __magic + 1, std::memory_order_release );
#define TracyLfqPrepareC( _type ) \
tracy::moodycamel::ConcurrentQueueDefaultTraits::index_t __magic; \
auto __token = tracy::GetToken(); \
auto& __tail = __token->get_tail_index(); \
auto item = __token->enqueue_begin( __magic ); \
tracy::MemWrite( &item->hdr.type, _type );
#define TracyLfqCommitC \
__tail.store( __magic + 1, std::memory_order_release );
typedef void(*ParameterCallback)( uint32_t idx, int32_t val );
class Profiler
{
struct FrameImageQueueItem
{
void* image;
uint64_t frame;
uint16_t w;
uint16_t h;
uint8_t offset;
bool flip;
};
public:
Profiler();
~Profiler();
static tracy_force_inline int64_t GetTime( uint32_t& cpu )
{
#ifdef TRACY_HW_TIMER
# if __ARM_ARCH >= 6
cpu = 0xFFFFFFFF;
return GetTimeImpl();
# elif defined _MSC_VER || defined __CYGWIN__
const auto t = int64_t( __rdtscp( &cpu ) );
return t;
# elif defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64
uint32_t eax, edx;
asm volatile ( "rdtscp" : "=a" (eax), "=d" (edx), "=c" (cpu) :: );
return ( uint64_t( edx ) << 32 ) + uint64_t( eax );
# endif
#else
cpu = 0xFFFFFFFF;
return std::chrono::duration_cast<std::chrono::nanoseconds>( std::chrono::high_resolution_clock::now().time_since_epoch() ).count();
#endif
}
static tracy_force_inline int64_t GetTime()
{
#ifdef TRACY_HW_TIMER
# if __ARM_ARCH >= 6
return GetTimeImpl();
# elif defined _MSC_VER || defined __CYGWIN__
unsigned int dontcare;
const auto t = int64_t( __rdtscp( &dontcare ) );
return t;
# elif defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64
# if defined TARGET_OS_IOS && TARGET_OS_IOS == 1
return mach_absolute_time();
# elif defined __ARM_ARCH && __ARM_ARCH >= 6
# ifdef CLOCK_MONOTONIC_RAW
struct timespec ts;
clock_gettime( CLOCK_MONOTONIC_RAW, &ts );
return int64_t( ts.tv_sec ) * 1000000000ll + int64_t( ts.tv_nsec );
# else
return std::chrono::duration_cast<std::chrono::nanoseconds>( std::chrono::high_resolution_clock::now().time_since_epoch() ).count();
# endif
# elif defined _WIN32 || defined __CYGWIN__
# ifdef TRACY_TIMER_QPC
return GetTimeQpc();
# else
return int64_t( __rdtsc() );
# endif
# elif defined __i386 || defined _M_IX86
uint32_t eax, edx;
asm volatile ( "rdtscp" : "=a" (eax), "=d" (edx) :: "%ecx" );
asm volatile ( "rdtsc" : "=a" (eax), "=d" (edx) );
return ( uint64_t( edx ) << 32 ) + uint64_t( eax );
# elif defined __x86_64__ || defined _M_X64
uint64_t rax, rdx;
asm volatile ( "rdtsc" : "=a" (rax), "=d" (rdx) );
return ( rdx << 32 ) + rax;
# endif
#else
return std::chrono::duration_cast<std::chrono::nanoseconds>( std::chrono::high_resolution_clock::now().time_since_epoch() ).count();
#endif
}
static tracy_force_inline void SendFrameMark()
tracy_force_inline uint32_t GetNextZoneId()
{
return m_zoneId.fetch_add( 1, std::memory_order_relaxed );
}
static tracy_force_inline QueueItem* QueueSerial()
{
auto& p = GetProfiler();
p.m_serialLock.lock();
return p.m_serialQueue.prepare_next();
}
static tracy_force_inline void QueueSerialFinish()
{
auto& p = GetProfiler();
p.m_serialQueue.commit_next();
p.m_serialLock.unlock();
}
static tracy_force_inline void SendFrameMark( const char* name )
{
if( !name ) GetProfiler().m_frameCount.fetch_add( 1, std::memory_order_relaxed );
#ifdef TRACY_ON_DEMAND
s_profiler.m_frameCount.fetch_add( 1, std::memory_order_relaxed );
if( !s_profiler.IsConnected() ) return;
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::FrameMarkMsg );
TracyLfqPrepare( QueueType::FrameMarkMsg );
MemWrite( &item->frameMark.time, GetTime() );
MemWrite( &item->frameMark.name, uint64_t( 0 ) );
tail.store( magic + 1, std::memory_order_release );
MemWrite( &item->frameMark.name, uint64_t( name ) );
TracyLfqCommit;
}
static tracy_force_inline void SendFrameMark( const char* name, QueueType type )
{
assert( type == QueueType::FrameMarkMsg || type == QueueType::FrameMarkMsgStart || type == QueueType::FrameMarkMsgEnd );
assert( type == QueueType::FrameMarkMsgStart || type == QueueType::FrameMarkMsgEnd );
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
auto item = QueueSerial();
MemWrite( &item->hdr.type, type );
MemWrite( &item->frameMark.time, GetTime() );
MemWrite( &item->frameMark.name, uint64_t( name ) );
tail.store( magic + 1, std::memory_order_release );
QueueSerialFinish();
}
static tracy_force_inline void SendFrameImage( const void* image, uint16_t w, uint16_t h, uint8_t offset, bool flip )
{
auto& profiler = GetProfiler();
#ifdef TRACY_ON_DEMAND
if( !profiler.IsConnected() ) return;
#endif
const auto sz = size_t( w ) * size_t( h ) * 4;
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, image, sz );
profiler.m_fiLock.lock();
auto fi = profiler.m_fiQueue.prepare_next();
fi->image = ptr;
fi->frame = profiler.m_frameCount.load( std::memory_order_relaxed ) - offset;
fi->w = w;
fi->h = h;
fi->flip = flip;
profiler.m_fiQueue.commit_next();
profiler.m_fiLock.unlock();
}
static tracy_force_inline void PlotData( const char* name, int64_t val )
{
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::PlotData );
TracyLfqPrepare( QueueType::PlotData );
MemWrite( &item->plotData.name, (uint64_t)name );
MemWrite( &item->plotData.time, GetTime() );
MemWrite( &item->plotData.type, PlotDataType::Int );
MemWrite( &item->plotData.data.i, val );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
}
static tracy_force_inline void PlotData( const char* name, float val )
{
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::PlotData );
TracyLfqPrepare( QueueType::PlotData );
MemWrite( &item->plotData.name, (uint64_t)name );
MemWrite( &item->plotData.time, GetTime() );
MemWrite( &item->plotData.type, PlotDataType::Float );
MemWrite( &item->plotData.data.f, val );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
}
static tracy_force_inline void PlotData( const char* name, double val )
{
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::PlotData );
TracyLfqPrepare( QueueType::PlotData );
MemWrite( &item->plotData.name, (uint64_t)name );
MemWrite( &item->plotData.time, GetTime() );
MemWrite( &item->plotData.type, PlotDataType::Double );
MemWrite( &item->plotData.data.d, val );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
}
static tracy_force_inline void Message( const char* txt, size_t size )
static tracy_force_inline void ConfigurePlot( const char* name, PlotFormatType type )
{
TracyLfqPrepare( QueueType::PlotConfig );
MemWrite( &item->plotConfig.name, (uint64_t)name );
MemWrite( &item->plotConfig.type, (uint8_t)type );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
TracyLfqCommit;
}
static tracy_force_inline void Message( const char* txt, size_t size, int callstack )
{
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
TracyLfqPrepare( callstack == 0 ? QueueType::Message : QueueType::MessageCallstack );
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::Message );
MemWrite( &item->message.time, GetTime() );
MemWrite( &item->message.thread, GetThreadHandle() );
MemWrite( &item->message.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
if( callstack != 0 ) tracy::GetProfiler().SendCallstack( callstack );
}
static tracy_force_inline void Message( const char* txt )
static tracy_force_inline void Message( const char* txt, int callstack )
{
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !GetProfiler().IsConnected() ) return;
#endif
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::MessageLiteral );
TracyLfqPrepare( callstack == 0 ? QueueType::MessageLiteral : QueueType::MessageLiteralCallstack );
MemWrite( &item->message.time, GetTime() );
MemWrite( &item->message.thread, GetThreadHandle() );
MemWrite( &item->message.text, (uint64_t)txt );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
if( callstack != 0 ) tracy::GetProfiler().SendCallstack( callstack );
}
static tracy_force_inline void MessageColor( const char* txt, size_t size, uint32_t color, int callstack )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
TracyLfqPrepare( callstack == 0 ? QueueType::MessageColor : QueueType::MessageColorCallstack );
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
MemWrite( &item->messageColor.time, GetTime() );
MemWrite( &item->messageColor.text, (uint64_t)ptr );
MemWrite( &item->messageColor.r, uint8_t( ( color ) & 0xFF ) );
MemWrite( &item->messageColor.g, uint8_t( ( color >> 8 ) & 0xFF ) );
MemWrite( &item->messageColor.b, uint8_t( ( color >> 16 ) & 0xFF ) );
TracyLfqCommit;
if( callstack != 0 ) tracy::GetProfiler().SendCallstack( callstack );
}
static tracy_force_inline void MessageColor( const char* txt, uint32_t color, int callstack )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
TracyLfqPrepare( callstack == 0 ? QueueType::MessageLiteralColor : QueueType::MessageLiteralColorCallstack );
MemWrite( &item->messageColor.time, GetTime() );
MemWrite( &item->messageColor.text, (uint64_t)txt );
MemWrite( &item->messageColor.r, uint8_t( ( color ) & 0xFF ) );
MemWrite( &item->messageColor.g, uint8_t( ( color >> 8 ) & 0xFF ) );
MemWrite( &item->messageColor.b, uint8_t( ( color >> 16 ) & 0xFF ) );
TracyLfqCommit;
if( callstack != 0 ) tracy::GetProfiler().SendCallstack( callstack );
}
static tracy_force_inline void MessageAppInfo( const char* txt, size_t size )
{
InitRPMallocThread();
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
TracyLfqPrepare( QueueType::MessageAppInfo );
MemWrite( &item->message.time, GetTime() );
MemWrite( &item->message.text, (uint64_t)ptr );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
TracyLfqCommit;
}
static tracy_force_inline void MemAlloc( const void* ptr, size_t size )
{
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !GetProfiler().IsConnected() ) return;
#endif
const auto thread = GetThreadHandle();
s_profiler.m_serialLock.lock();
GetProfiler().m_serialLock.lock();
SendMemAlloc( QueueType::MemAlloc, thread, ptr, size );
s_profiler.m_serialLock.unlock();
GetProfiler().m_serialLock.unlock();
}
static tracy_force_inline void MemFree( const void* ptr )
{
#ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !GetProfiler().IsConnected() ) return;
#endif
const auto thread = GetThreadHandle();
s_profiler.m_serialLock.lock();
GetProfiler().m_serialLock.lock();
SendMemFree( QueueType::MemFree, thread, ptr );
s_profiler.m_serialLock.unlock();
GetProfiler().m_serialLock.unlock();
}
static tracy_force_inline void MemAllocCallstack( const void* ptr, size_t size, int depth )
{
#ifdef TRACY_HAS_CALLSTACK
auto& profiler = GetProfiler();
# ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !profiler.IsConnected() ) return;
# endif
const auto thread = GetThreadHandle();
rpmalloc_thread_initialize();
InitRPMallocThread();
auto callstack = Callstack( depth );
s_profiler.m_serialLock.lock();
profiler.m_serialLock.lock();
SendMemAlloc( QueueType::MemAllocCallstack, thread, ptr, size );
SendCallstackMemory( callstack );
s_profiler.m_serialLock.unlock();
profiler.m_serialLock.unlock();
#else
MemAlloc( ptr, size );
#endif
@@ -293,46 +396,64 @@ public:
static tracy_force_inline void MemFreeCallstack( const void* ptr, int depth )
{
#ifdef TRACY_HAS_CALLSTACK
auto& profiler = GetProfiler();
# ifdef TRACY_ON_DEMAND
if( !s_profiler.IsConnected() ) return;
if( !profiler.IsConnected() ) return;
# endif
const auto thread = GetThreadHandle();
rpmalloc_thread_initialize();
InitRPMallocThread();
auto callstack = Callstack( depth );
s_profiler.m_serialLock.lock();
profiler.m_serialLock.lock();
SendMemFree( QueueType::MemFreeCallstack, thread, ptr );
SendCallstackMemory( callstack );
s_profiler.m_serialLock.unlock();
profiler.m_serialLock.unlock();
#else
MemFree( ptr );
#endif
}
static tracy_force_inline void SendCallstack( int depth, uint64_t thread )
static tracy_force_inline void SendCallstack( int depth )
{
#ifdef TRACY_HAS_CALLSTACK
auto ptr = Callstack( depth );
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::Callstack );
MemWrite( &item->callstack.ptr, ptr );
MemWrite( &item->callstack.thread, thread );
tail.store( magic + 1, std::memory_order_release );
TracyLfqPrepare( QueueType::Callstack );
MemWrite( &item->callstack.ptr, (uint64_t)ptr );
TracyLfqCommit;
#endif
}
void SendCallstack( int depth, uint64_t thread, const char* skipBefore );
static tracy_force_inline void ParameterRegister( ParameterCallback cb ) { GetProfiler().m_paramCallback = cb; }
static tracy_force_inline void ParameterSetup( uint32_t idx, const char* name, bool isBool, int32_t val )
{
TracyLfqPrepare( QueueType::ParamSetup );
tracy::MemWrite( &item->paramSetup.idx, idx );
tracy::MemWrite( &item->paramSetup.name, (uint64_t)name );
tracy::MemWrite( &item->paramSetup.isBool, (uint8_t)isBool );
tracy::MemWrite( &item->paramSetup.val, val );
#ifdef TRACY_ON_DEMAND
GetProfiler().DeferItem( *item );
#endif
TracyLfqCommit;
}
void SendCallstack( int depth, const char* skipBefore );
static void CutCallstack( void* callstack, const char* skipBefore );
static bool ShouldExit();
#ifdef TRACY_ON_DEMAND
tracy_force_inline bool IsConnected()
tracy_force_inline bool IsConnected() const
{
return m_isConnected.load( std::memory_order_relaxed );
return m_isConnected.load( std::memory_order_acquire );
}
tracy_force_inline uint64_t ConnectionId() const
{
return m_connectionId.load( std::memory_order_acquire );
}
tracy_force_inline void DeferItem( const QueueItem& item )
@@ -347,18 +468,81 @@ public:
void RequestShutdown() { m_shutdown.store( true, std::memory_order_relaxed ); m_shutdownManual.store( true, std::memory_order_relaxed ); }
bool HasShutdownFinished() const { return m_shutdownFinished.load( std::memory_order_relaxed ); }
void SendString( uint64_t ptr, const char* str, QueueType type );
// Allocated source location data layout:
// 4b payload size
// 4b color
// 4b source line
// fsz function name
// 1b null terminator
// ssz source file name
// 1b null terminator
// nsz zone name (optional)
static tracy_force_inline uint64_t AllocSourceLocation( uint32_t line, const char* source, const char* function )
{
const auto fsz = strlen( function );
const auto ssz = strlen( source );
const uint32_t sz = uint32_t( 4 + 4 + 4 + fsz + 1 + ssz + 1 );
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, &sz, 4 );
memset( ptr + 4, 0, 4 );
memcpy( ptr + 8, &line, 4 );
memcpy( ptr + 12, function, fsz+1 );
memcpy( ptr + 12 + fsz + 1, source, ssz + 1 );
return uint64_t( ptr );
}
static tracy_force_inline uint64_t AllocSourceLocation( uint32_t line, const char* source, const char* function, const char* name, size_t nameSz )
{
const auto fsz = strlen( function );
const auto ssz = strlen( source );
const uint32_t sz = uint32_t( 4 + 4 + 4 + fsz + 1 + ssz + 1 + nameSz );
auto ptr = (char*)tracy_malloc( sz );
memcpy( ptr, &sz, 4 );
memset( ptr + 4, 0, 4 );
memcpy( ptr + 8, &line, 4 );
memcpy( ptr + 12, function, fsz+1 );
memcpy( ptr + 12 + fsz + 1, source, ssz + 1 );
memcpy( ptr + 12 + fsz + 1 + ssz + 1, name, nameSz );
return uint64_t( ptr );
}
private:
enum DequeueStatus { Success, ConnectionLost, QueueEmpty };
enum class DequeueStatus { DataDequeued, ConnectionLost, QueueEmpty };
static void LaunchWorker( void* ptr ) { ((Profiler*)ptr)->Worker(); }
void Worker();
static void LaunchCompressWorker( void* ptr ) { ((Profiler*)ptr)->CompressWorker(); }
void CompressWorker();
void ClearQueues( tracy::moodycamel::ConsumerToken& token );
void ClearSerial();
DequeueStatus Dequeue( tracy::moodycamel::ConsumerToken& token );
DequeueStatus DequeueContextSwitches( tracy::moodycamel::ConsumerToken& token, int64_t& timeStop );
DequeueStatus DequeueSerial();
bool AppendData( const void* data, size_t len );
bool CommitData();
bool NeedDataSize( size_t len );
tracy_force_inline bool AppendData( const void* data, size_t len )
{
const auto ret = NeedDataSize( len );
AppendDataUnsafe( data, len );
return ret;
}
tracy_force_inline bool NeedDataSize( size_t len )
{
assert( len <= TargetFrameSize );
bool ret = true;
if( m_bufferOffset - m_bufferStart + len > TargetFrameSize )
{
ret = CommitData();
}
return ret;
}
tracy_force_inline void AppendDataUnsafe( const void* data, size_t len )
{
@@ -367,24 +551,32 @@ private:
}
bool SendData( const char* data, size_t len );
void SendString( uint64_t ptr, const char* str, QueueType type );
void SendLongString( uint64_t ptr, const char* str, size_t len, QueueType type );
void SendSourceLocation( uint64_t ptr );
void SendSourceLocationPayload( uint64_t ptr );
void SendCallstackPayload( uint64_t ptr );
void SendCallstackPayload64( uint64_t ptr );
void SendCallstackAlloc( uint64_t ptr );
void SendCallstackFrame( uint64_t ptr );
void SendCodeLocation( uint64_t ptr );
bool HandleServerQuery();
void HandleDisconnect();
void HandleParameter( uint64_t payload );
void HandleSymbolQuery( uint64_t symbol );
void HandleSymbolCodeQuery( uint64_t symbol, uint32_t size );
void CalibrateTimer();
void CalibrateDelay();
void ReportTopology();
static tracy_force_inline void SendCallstackMemory( void* ptr )
{
#ifdef TRACY_HAS_CALLSTACK
auto item = s_profiler.m_serialQueue.prepare_next();
auto item = GetProfiler().m_serialQueue.prepare_next();
MemWrite( &item->hdr.type, QueueType::CallstackMemory );
MemWrite( &item->callstackMemory.ptr, (uint64_t)ptr );
s_profiler.m_serialQueue.commit_next();
GetProfiler().m_serialQueue.commit_next();
#endif
}
@@ -392,7 +584,7 @@ private:
{
assert( type == QueueType::MemAlloc || type == QueueType::MemAllocCallstack );
auto item = s_profiler.m_serialQueue.prepare_next();
auto item = GetProfiler().m_serialQueue.prepare_next();
MemWrite( &item->hdr.type, type );
MemWrite( &item->memAlloc.time, GetTime() );
MemWrite( &item->memAlloc.thread, thread );
@@ -405,23 +597,28 @@ private:
else
{
assert( sizeof( size ) == 8 );
memcpy( &item->memAlloc.size, &size, 6 );
memcpy( &item->memAlloc.size, &size, 4 );
memcpy( ((char*)&item->memAlloc.size)+4, ((char*)&size)+4, 2 );
}
s_profiler.m_serialQueue.commit_next();
GetProfiler().m_serialQueue.commit_next();
}
static tracy_force_inline void SendMemFree( QueueType type, const uint64_t thread, const void* ptr )
{
assert( type == QueueType::MemFree || type == QueueType::MemFreeCallstack );
auto item = s_profiler.m_serialQueue.prepare_next();
auto item = GetProfiler().m_serialQueue.prepare_next();
MemWrite( &item->hdr.type, type );
MemWrite( &item->memFree.time, GetTime() );
MemWrite( &item->memFree.thread, thread );
MemWrite( &item->memFree.ptr, (uint64_t)ptr );
s_profiler.m_serialQueue.commit_next();
GetProfiler().m_serialQueue.commit_next();
}
#if ( defined _WIN32 || defined __CYGWIN__ ) && defined TRACY_TIMER_QPC
static int64_t GetTimeQpc();
#endif
double m_timerMul;
uint64_t m_resolution;
uint64_t m_delay;
@@ -432,28 +629,52 @@ private:
std::atomic<bool> m_shutdownManual;
std::atomic<bool> m_shutdownFinished;
Socket* m_sock;
UdpBroadcast* m_broadcast;
bool m_noExit;
uint32_t m_userPort;
std::atomic<uint32_t> m_zoneId;
int64_t m_samplingPeriod;
LZ4_stream_t* m_stream;
uint64_t m_threadCtx;
int64_t m_refTimeThread;
int64_t m_refTimeSerial;
int64_t m_refTimeCtx;
int64_t m_refTimeGpu;
void* m_stream; // LZ4_stream_t*
char* m_buffer;
int m_bufferOffset;
int m_bufferStart;
QueueItem* m_itemBuf;
char* m_lz4Buf;
FastVector<QueueItem> m_serialQueue, m_serialDequeue;
TracyMutex m_serialLock;
FastVector<FrameImageQueueItem> m_fiQueue, m_fiDequeue;
TracyMutex m_fiLock;
std::atomic<uint64_t> m_frameCount;
#ifdef TRACY_ON_DEMAND
std::atomic<bool> m_isConnected;
std::atomic<uint64_t> m_frameCount;
std::atomic<uint64_t> m_connectionId;
TracyMutex m_deferredLock;
FastVector<QueueItem> m_deferredQueue;
#endif
#ifdef TRACY_HAS_SYSTIME
void ProcessSysTime();
SysTime m_sysTime;
uint64_t m_sysTimeLast = 0;
#else
void ProcessSysTime() {}
#endif
ParameterCallback m_paramCallback;
};
};
}
#endif

View File

@@ -17,114 +17,96 @@ class ScopedZone
public:
tracy_force_inline ScopedZone( const SourceLocationData* srcloc, bool is_active = true )
#ifdef TRACY_ON_DEMAND
: m_active( s_profiler.IsConnected() )
: m_active( is_active && GetProfiler().IsConnected() )
#else
: m_active( is_active )
#endif
{
if( !m_active ) return;
const auto thread = GetThreadHandle();
m_thread = thread;
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBegin );
#ifdef TRACY_RDTSCP_OPT
MemWrite( &item->zoneBegin.time, Profiler::GetTime( item->zoneBegin.cpu ) );
#else
uint32_t cpu;
MemWrite( &item->zoneBegin.time, Profiler::GetTime( cpu ) );
MemWrite( &item->zoneBegin.cpu, cpu );
#ifdef TRACY_ON_DEMAND
m_connectionId = GetProfiler().ConnectionId();
#endif
MemWrite( &item->zoneBegin.thread, thread );
TracyLfqPrepare( QueueType::ZoneBegin );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)srcloc );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
}
tracy_force_inline ScopedZone( const SourceLocationData* srcloc, int depth, bool is_active = true )
#ifdef TRACY_ON_DEMAND
: m_active( s_profiler.IsConnected() )
: m_active( is_active && GetProfiler().IsConnected() )
#else
: m_active( is_active )
#endif
{
if( !m_active ) return;
const auto thread = GetThreadHandle();
m_thread = thread;
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneBeginCallstack );
#ifdef TRACY_RDTSCP_OPT
MemWrite( &item->zoneBegin.time, Profiler::GetTime( item->zoneBegin.cpu ) );
#else
uint32_t cpu;
MemWrite( &item->zoneBegin.time, Profiler::GetTime( cpu ) );
MemWrite( &item->zoneBegin.cpu, cpu );
#ifdef TRACY_ON_DEMAND
m_connectionId = GetProfiler().ConnectionId();
#endif
MemWrite( &item->zoneBegin.thread, thread );
TracyLfqPrepare( QueueType::ZoneBeginCallstack );
MemWrite( &item->zoneBegin.time, Profiler::GetTime() );
MemWrite( &item->zoneBegin.srcloc, (uint64_t)srcloc );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
s_profiler.SendCallstack( depth, thread );
GetProfiler().SendCallstack( depth );
}
tracy_force_inline ~ScopedZone()
{
if( !m_active ) return;
Magic magic;
auto& token = s_token.ptr;
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneEnd );
#ifdef TRACY_RDTSCP_OPT
MemWrite( &item->zoneEnd.time, Profiler::GetTime( item->zoneEnd.cpu ) );
#else
uint32_t cpu;
MemWrite( &item->zoneEnd.time, Profiler::GetTime( cpu ) );
MemWrite( &item->zoneEnd.cpu, cpu );
#ifdef TRACY_ON_DEMAND
if( GetProfiler().ConnectionId() != m_connectionId ) return;
#endif
MemWrite( &item->zoneEnd.thread, m_thread );
tail.store( magic + 1, std::memory_order_release );
TracyLfqPrepare( QueueType::ZoneEnd );
MemWrite( &item->zoneEnd.time, Profiler::GetTime() );
TracyLfqCommit;
}
tracy_force_inline void Text( const char* txt, size_t size )
{
if( !m_active ) return;
Magic magic;
auto& token = s_token.ptr;
#ifdef TRACY_ON_DEMAND
if( GetProfiler().ConnectionId() != m_connectionId ) return;
#endif
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneText );
MemWrite( &item->zoneText.thread, m_thread );
TracyLfqPrepare( QueueType::ZoneText );
MemWrite( &item->zoneText.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
}
tracy_force_inline void Name( const char* txt, size_t size )
{
if( !m_active ) return;
Magic magic;
auto& token = s_token.ptr;
#ifdef TRACY_ON_DEMAND
if( GetProfiler().ConnectionId() != m_connectionId ) return;
#endif
auto ptr = (char*)tracy_malloc( size+1 );
memcpy( ptr, txt, size );
ptr[size] = '\0';
auto& tail = token->get_tail_index();
auto item = token->enqueue_begin<tracy::moodycamel::CanAlloc>( magic );
MemWrite( &item->hdr.type, QueueType::ZoneName );
MemWrite( &item->zoneText.thread, m_thread );
TracyLfqPrepare( QueueType::ZoneName );
MemWrite( &item->zoneText.text, (uint64_t)ptr );
tail.store( magic + 1, std::memory_order_release );
TracyLfqCommit;
}
tracy_force_inline void Value( uint64_t value )
{
if( !m_active ) return;
#ifdef TRACY_ON_DEMAND
if( GetProfiler().ConnectionId() != m_connectionId ) return;
#endif
TracyLfqPrepare( QueueType::ZoneValue );
MemWrite( &item->zoneValue.value, value );
TracyLfqCommit;
}
private:
uint64_t m_thread;
const bool m_active;
#ifdef TRACY_ON_DEMAND
uint64_t m_connectionId;
#endif
};
}

108
client/TracySysTime.cpp Normal file
View File

@@ -0,0 +1,108 @@
#include "TracySysTime.hpp"
#ifdef TRACY_HAS_SYSTIME
# if defined _WIN32 || defined __CYGWIN__
# include <windows.h>
# elif defined __linux__
# include <stdio.h>
# include <inttypes.h>
# elif defined __APPLE__
# include <mach/mach_host.h>
# include <mach/host_info.h>
# elif defined BSD
# include <sys/types.h>
# include <sys/sysctl.h>
# endif
namespace tracy
{
# if defined _WIN32 || defined __CYGWIN__
static inline uint64_t ConvertTime( const FILETIME& t )
{
return ( uint64_t( t.dwHighDateTime ) << 32 ) | uint64_t( t.dwLowDateTime );
}
void SysTime::ReadTimes()
{
FILETIME idleTime;
FILETIME kernelTime;
FILETIME userTime;
GetSystemTimes( &idleTime, &kernelTime, &userTime );
idle = ConvertTime( idleTime );
const auto kernel = ConvertTime( kernelTime );
const auto user = ConvertTime( userTime );
used = kernel + user;
}
# elif defined __linux__
void SysTime::ReadTimes()
{
uint64_t user, nice, system;
FILE* f = fopen( "/proc/stat", "r" );
if( f )
{
int read = fscanf( f, "cpu %" PRIu64 " %" PRIu64 " %" PRIu64" %" PRIu64, &user, &nice, &system, &idle );
fclose( f );
if (read == 4)
{
used = user + nice + system;
}
}
}
# elif defined __APPLE__
void SysTime::ReadTimes()
{
host_cpu_load_info_data_t info;
mach_msg_type_number_t cnt = HOST_CPU_LOAD_INFO_COUNT;
host_statistics( mach_host_self(), HOST_CPU_LOAD_INFO, reinterpret_cast<host_info_t>( &info ), &cnt );
used = info.cpu_ticks[CPU_STATE_USER] + info.cpu_ticks[CPU_STATE_NICE] + info.cpu_ticks[CPU_STATE_SYSTEM];
idle = info.cpu_ticks[CPU_STATE_IDLE];
}
# elif defined BSD
void SysTime::ReadTimes()
{
u_long data[5];
size_t sz = sizeof( data );
sysctlbyname( "kern.cp_time", &data, &sz, nullptr, 0 );
used = data[0] + data[1] + data[2] + data[3];
idle = data[4];
}
#endif
SysTime::SysTime()
{
ReadTimes();
}
float SysTime::Get()
{
const auto oldUsed = used;
const auto oldIdle = idle;
ReadTimes();
const auto diffIdle = idle - oldIdle;
const auto diffUsed = used - oldUsed;
#if defined _WIN32 || defined __CYGWIN__
return diffUsed == 0 ? -1 : ( diffUsed - diffIdle ) * 100.f / diffUsed;
#elif defined __linux__ || defined __APPLE__ || defined BSD
const auto total = diffUsed + diffIdle;
return total == 0 ? -1 : diffUsed * 100.f / total;
#endif
}
}
#endif

36
client/TracySysTime.hpp Normal file
View File

@@ -0,0 +1,36 @@
#ifndef __TRACYSYSTIME_HPP__
#define __TRACYSYSTIME_HPP__
#if defined _WIN32 || defined __CYGWIN__ || defined __linux__ || defined __APPLE__
# define TRACY_HAS_SYSTIME
#else
# include <sys/param.h>
#endif
#ifdef BSD
# define TRACY_HAS_SYSTIME
#endif
#ifdef TRACY_HAS_SYSTIME
#include <stdint.h>
namespace tracy
{
class SysTime
{
public:
SysTime();
float Get();
void ReadTimes();
private:
uint64_t idle, used;
};
}
#endif
#endif

954
client/TracySysTrace.cpp Normal file
View File

@@ -0,0 +1,954 @@
#include "TracySysTrace.hpp"
#ifdef TRACY_HAS_SYSTEM_TRACING
# if defined _WIN32 || defined __CYGWIN__
# ifndef NOMINMAX
# define NOMINMAX
# endif
# define INITGUID
# include <assert.h>
# include <string.h>
# include <windows.h>
# include <dbghelp.h>
# include <evntrace.h>
# include <evntcons.h>
# include <psapi.h>
# include <winternl.h>
# include "../common/TracyAlloc.hpp"
# include "../common/TracySystem.hpp"
# include "TracyProfiler.hpp"
namespace tracy
{
DEFINE_GUID ( /* ce1dbfb4-137e-4da6-87b0-3f59aa102cbc */
PerfInfoGuid,
0xce1dbfb4,
0x137e,
0x4da6,
0x87, 0xb0, 0x3f, 0x59, 0xaa, 0x10, 0x2c, 0xbc
);
static TRACEHANDLE s_traceHandle;
static TRACEHANDLE s_traceHandle2;
static EVENT_TRACE_PROPERTIES* s_prop;
static DWORD s_pid;
struct CSwitch
{
uint32_t newThreadId;
uint32_t oldThreadId;
int8_t newThreadPriority;
int8_t oldThreadPriority;
uint8_t previousCState;
int8_t spareByte;
int8_t oldThreadWaitReason;
int8_t oldThreadWaitMode;
int8_t oldThreadState;
int8_t oldThreadWaitIdealProcessor;
uint32_t newThreadWaitTime;
uint32_t reserved;
};
struct ReadyThread
{
uint32_t threadId;
int8_t adjustReason;
int8_t adjustIncrement;
int8_t flag;
int8_t reserverd;
};
struct ThreadTrace
{
uint32_t processId;
uint32_t threadId;
uint32_t stackBase;
uint32_t stackLimit;
uint32_t userStackBase;
uint32_t userStackLimit;
uint32_t startAddr;
uint32_t win32StartAddr;
uint32_t tebBase;
uint32_t subProcessTag;
};
struct StackWalkEvent
{
uint64_t eventTimeStamp;
uint32_t stackProcess;
uint32_t stackThread;
uint64_t stack[192];
};
#ifdef __CYGWIN__
extern "C" typedef DWORD (WINAPI *t_GetProcessIdOfThread)( HANDLE );
extern "C" typedef DWORD (WINAPI *t_GetProcessImageFileNameA)( HANDLE, LPSTR, DWORD );
extern "C" ULONG WMIAPI TraceSetInformation(TRACEHANDLE SessionHandle, TRACE_INFO_CLASS InformationClass, PVOID TraceInformation, ULONG InformationLength);
t_GetProcessIdOfThread GetProcessIdOfThread = (t_GetProcessIdOfThread)GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "GetProcessIdOfThread" );
t_GetProcessImageFileNameA GetProcessImageFileNameA = (t_GetProcessImageFileNameA)GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "K32GetProcessImageFileNameA" );
#endif
extern "C" typedef NTSTATUS (WINAPI *t_NtQueryInformationThread)( HANDLE, THREADINFOCLASS, PVOID, ULONG, PULONG );
extern "C" typedef BOOL (WINAPI *t_EnumProcessModules)( HANDLE, HMODULE*, DWORD, LPDWORD );
extern "C" typedef BOOL (WINAPI *t_GetModuleInformation)( HANDLE, HMODULE, LPMODULEINFO, DWORD );
extern "C" typedef DWORD (WINAPI *t_GetModuleBaseNameA)( HANDLE, HMODULE, LPSTR, DWORD );
extern "C" typedef HRESULT (WINAPI *t_GetThreadDescription)( HANDLE, PWSTR* );
t_NtQueryInformationThread NtQueryInformationThread = (t_NtQueryInformationThread)GetProcAddress( GetModuleHandleA( "ntdll.dll" ), "NtQueryInformationThread" );
t_EnumProcessModules _EnumProcessModules = (t_EnumProcessModules)GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "K32EnumProcessModules" );
t_GetModuleInformation _GetModuleInformation = (t_GetModuleInformation)GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "K32GetModuleInformation" );
t_GetModuleBaseNameA _GetModuleBaseNameA = (t_GetModuleBaseNameA)GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "K32GetModuleBaseNameA" );
static t_GetThreadDescription _GetThreadDescription = 0;
void WINAPI EventRecordCallback( PEVENT_RECORD record )
{
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) return;
#endif
const auto& hdr = record->EventHeader;
switch( hdr.ProviderId.Data1 )
{
case 0x3d6fa8d1: // Thread Guid
if( hdr.EventDescriptor.Opcode == 36 )
{
const auto cswitch = (const CSwitch*)record->UserData;
TracyLfqPrepare( QueueType::ContextSwitch );
MemWrite( &item->contextSwitch.time, hdr.TimeStamp.QuadPart );
memcpy( &item->contextSwitch.oldThread, &cswitch->oldThreadId, sizeof( cswitch->oldThreadId ) );
memcpy( &item->contextSwitch.newThread, &cswitch->newThreadId, sizeof( cswitch->newThreadId ) );
memset( ((char*)&item->contextSwitch.oldThread)+4, 0, 4 );
memset( ((char*)&item->contextSwitch.newThread)+4, 0, 4 );
MemWrite( &item->contextSwitch.cpu, record->BufferContext.ProcessorNumber );
MemWrite( &item->contextSwitch.reason, cswitch->oldThreadWaitReason );
MemWrite( &item->contextSwitch.state, cswitch->oldThreadState );
TracyLfqCommit;
}
else if( hdr.EventDescriptor.Opcode == 50 )
{
const auto rt = (const ReadyThread*)record->UserData;
TracyLfqPrepare( QueueType::ThreadWakeup );
MemWrite( &item->threadWakeup.time, hdr.TimeStamp.QuadPart );
memcpy( &item->threadWakeup.thread, &rt->threadId, sizeof( rt->threadId ) );
memset( ((char*)&item->threadWakeup.thread)+4, 0, 4 );
TracyLfqCommit;
}
else if( hdr.EventDescriptor.Opcode == 1 || hdr.EventDescriptor.Opcode == 3 )
{
const auto tt = (const ThreadTrace*)record->UserData;
uint64_t tid = tt->threadId;
if( tid == 0 ) return;
uint64_t pid = tt->processId;
TracyLfqPrepare( QueueType::TidToPid );
MemWrite( &item->tidToPid.tid, tid );
MemWrite( &item->tidToPid.pid, pid );
TracyLfqCommit;
}
break;
case 0xdef2fe46: // StackWalk Guid
if( hdr.EventDescriptor.Opcode == 32 )
{
const auto sw = (const StackWalkEvent*)record->UserData;
if( sw->stackProcess == s_pid && ( sw->stack[0] & 0x8000000000000000 ) == 0 )
{
const uint64_t sz = ( record->UserDataLength - 16 ) / 8;
if( sz > 0 )
{
auto trace = (uint64_t*)tracy_malloc( ( 1 + sz ) * sizeof( uint64_t ) );
memcpy( trace, &sz, sizeof( uint64_t ) );
memcpy( trace+1, sw->stack, sizeof( uint64_t ) * sz );
TracyLfqPrepare( QueueType::CallstackSample );
MemWrite( &item->callstackSample.time, sw->eventTimeStamp );
MemWrite( &item->callstackSample.thread, (uint64_t)sw->stackThread );
MemWrite( &item->callstackSample.ptr, (uint64_t)trace );
TracyLfqCommit;
}
}
}
break;
default:
break;
}
}
bool SysTraceStart( int64_t& samplingPeriod )
{
if( !_GetThreadDescription ) _GetThreadDescription = (t_GetThreadDescription)GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "GetThreadDescription" );
s_pid = GetCurrentProcessId();
#if defined _WIN64
constexpr bool isOs64Bit = true;
#else
BOOL _iswow64;
IsWow64Process( GetCurrentProcess(), &_iswow64 );
const bool isOs64Bit = _iswow64;
#endif
TOKEN_PRIVILEGES priv = {};
priv.PrivilegeCount = 1;
priv.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
if( LookupPrivilegeValue( nullptr, SE_SYSTEM_PROFILE_NAME, &priv.Privileges[0].Luid ) == 0 ) return false;
HANDLE pt;
if( OpenProcessToken( GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES, &pt ) == 0 ) return false;
const auto adjust = AdjustTokenPrivileges( pt, FALSE, &priv, 0, nullptr, nullptr );
CloseHandle( pt );
if( adjust == 0 ) return false;
const auto status = GetLastError();
if( status != ERROR_SUCCESS ) return false;
if( isOs64Bit )
{
TRACE_PROFILE_INTERVAL interval = {};
interval.Interval = 1250; // 8 kHz
const auto intervalStatus = TraceSetInformation( 0, TraceSampledProfileIntervalInfo, &interval, sizeof( interval ) );
if( intervalStatus != ERROR_SUCCESS ) return false;
samplingPeriod = 125*1000;
}
const auto psz = sizeof( EVENT_TRACE_PROPERTIES ) + sizeof( KERNEL_LOGGER_NAME );
s_prop = (EVENT_TRACE_PROPERTIES*)tracy_malloc( psz );
memset( s_prop, 0, sizeof( EVENT_TRACE_PROPERTIES ) );
ULONG flags = EVENT_TRACE_FLAG_CSWITCH | EVENT_TRACE_FLAG_DISPATCHER | EVENT_TRACE_FLAG_THREAD;
if( isOs64Bit ) flags |= EVENT_TRACE_FLAG_PROFILE;
s_prop->EnableFlags = flags;
s_prop->LogFileMode = EVENT_TRACE_REAL_TIME_MODE;
s_prop->Wnode.BufferSize = psz;
s_prop->Wnode.Flags = WNODE_FLAG_TRACED_GUID;
#ifdef TRACY_TIMER_QPC
s_prop->Wnode.ClientContext = 1;
#else
s_prop->Wnode.ClientContext = 3;
#endif
s_prop->Wnode.Guid = SystemTraceControlGuid;
s_prop->BufferSize = 1024;
s_prop->LoggerNameOffset = sizeof( EVENT_TRACE_PROPERTIES );
memcpy( ((char*)s_prop) + sizeof( EVENT_TRACE_PROPERTIES ), KERNEL_LOGGER_NAME, sizeof( KERNEL_LOGGER_NAME ) );
auto backup = tracy_malloc( psz );
memcpy( backup, s_prop, psz );
const auto controlStatus = ControlTrace( 0, KERNEL_LOGGER_NAME, s_prop, EVENT_TRACE_CONTROL_STOP );
if( controlStatus != ERROR_SUCCESS && controlStatus != ERROR_WMI_INSTANCE_NOT_FOUND )
{
tracy_free( s_prop );
return false;
}
memcpy( s_prop, backup, psz );
tracy_free( backup );
const auto startStatus = StartTrace( &s_traceHandle, KERNEL_LOGGER_NAME, s_prop );
if( startStatus != ERROR_SUCCESS )
{
tracy_free( s_prop );
return false;
}
if( isOs64Bit )
{
CLASSIC_EVENT_ID stackId;
stackId.EventGuid = PerfInfoGuid;
stackId.Type = 46;
const auto stackStatus = TraceSetInformation( s_traceHandle, TraceStackTracingInfo, &stackId, sizeof( stackId ) );
if( stackStatus != ERROR_SUCCESS )
{
tracy_free( s_prop );
return false;
}
}
#ifdef UNICODE
WCHAR KernelLoggerName[sizeof( KERNEL_LOGGER_NAME )];
#else
char KernelLoggerName[sizeof( KERNEL_LOGGER_NAME )];
#endif
memcpy( KernelLoggerName, KERNEL_LOGGER_NAME, sizeof( KERNEL_LOGGER_NAME ) );
EVENT_TRACE_LOGFILE log = {};
log.LoggerName = KernelLoggerName;
log.ProcessTraceMode = PROCESS_TRACE_MODE_REAL_TIME | PROCESS_TRACE_MODE_EVENT_RECORD | PROCESS_TRACE_MODE_RAW_TIMESTAMP;
log.EventRecordCallback = EventRecordCallback;
s_traceHandle2 = OpenTrace( &log );
if( s_traceHandle2 == (TRACEHANDLE)INVALID_HANDLE_VALUE )
{
CloseTrace( s_traceHandle );
tracy_free( s_prop );
return false;
}
return true;
}
void SysTraceStop()
{
CloseTrace( s_traceHandle2 );
CloseTrace( s_traceHandle );
}
void SysTraceWorker( void* ptr )
{
SetThreadName( "Tracy SysTrace" );
ProcessTrace( &s_traceHandle2, 1, 0, 0 );
ControlTrace( 0, KERNEL_LOGGER_NAME, s_prop, EVENT_TRACE_CONTROL_STOP );
tracy_free( s_prop );
}
void SysTraceSendExternalName( uint64_t thread )
{
bool threadSent = false;
auto hnd = OpenThread( THREAD_QUERY_INFORMATION, FALSE, DWORD( thread ) );
if( hnd == 0 )
{
hnd = OpenThread( THREAD_QUERY_LIMITED_INFORMATION, FALSE, DWORD( thread ) );
}
if( hnd != 0 )
{
PWSTR tmp;
_GetThreadDescription( hnd, &tmp );
char buf[256];
if( tmp )
{
auto ret = wcstombs( buf, tmp, 256 );
if( ret != 0 )
{
GetProfiler().SendString( thread, buf, QueueType::ExternalThreadName );
threadSent = true;
}
}
const auto pid = GetProcessIdOfThread( hnd );
if( !threadSent && NtQueryInformationThread && _EnumProcessModules && _GetModuleInformation && _GetModuleBaseNameA )
{
void* ptr;
ULONG retlen;
auto status = NtQueryInformationThread( hnd, (THREADINFOCLASS)9 /*ThreadQuerySetWin32StartAddress*/, &ptr, sizeof( &ptr ), &retlen );
if( status == 0 )
{
const auto phnd = OpenProcess( PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pid );
if( phnd != INVALID_HANDLE_VALUE )
{
HMODULE modules[1024];
DWORD needed;
if( _EnumProcessModules( phnd, modules, 1024 * sizeof( HMODULE ), &needed ) != 0 )
{
const auto sz = std::min( DWORD( needed / sizeof( HMODULE ) ), DWORD( 1024 ) );
for( DWORD i=0; i<sz; i++ )
{
MODULEINFO info;
if( _GetModuleInformation( phnd, modules[i], &info, sizeof( info ) ) != 0 )
{
if( (uint64_t)ptr >= (uint64_t)info.lpBaseOfDll && (uint64_t)ptr <= (uint64_t)info.lpBaseOfDll + (uint64_t)info.SizeOfImage )
{
char buf2[1024];
if( _GetModuleBaseNameA( phnd, modules[i], buf2, 1024 ) != 0 )
{
GetProfiler().SendString( thread, buf2, QueueType::ExternalThreadName );
threadSent = true;
}
}
}
}
}
CloseHandle( phnd );
}
}
}
CloseHandle( hnd );
if( !threadSent )
{
GetProfiler().SendString( thread, "???", QueueType::ExternalThreadName );
threadSent = true;
}
if( pid != 0 )
{
{
uint64_t _pid = pid;
TracyLfqPrepare( QueueType::TidToPid );
MemWrite( &item->tidToPid.tid, thread );
MemWrite( &item->tidToPid.pid, _pid );
TracyLfqCommit;
}
if( pid == 4 )
{
GetProfiler().SendString( thread, "System", QueueType::ExternalName );
return;
}
else
{
const auto phnd = OpenProcess( PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid );
if( phnd != INVALID_HANDLE_VALUE )
{
char buf2[1024];
const auto sz = GetProcessImageFileNameA( phnd, buf2, 1024 );
CloseHandle( phnd );
if( sz != 0 )
{
auto ptr = buf2 + sz - 1;
while( ptr > buf2 && *ptr != '\\' ) ptr--;
if( *ptr == '\\' ) ptr++;
GetProfiler().SendString( thread, ptr, QueueType::ExternalName );
return;
}
}
}
}
}
if( !threadSent )
{
GetProfiler().SendString( thread, "???", QueueType::ExternalThreadName );
}
GetProfiler().SendString( thread, "???", QueueType::ExternalName );
}
}
# elif defined __linux__
# include <sys/types.h>
# include <sys/stat.h>
# include <sys/wait.h>
# include <fcntl.h>
# include <inttypes.h>
# include <limits>
# include <poll.h>
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
# include <unistd.h>
# include <atomic>
# include "TracyProfiler.hpp"
# ifdef __ANDROID__
# include "TracySysTracePayload.hpp"
# endif
namespace tracy
{
static const char BasePath[] = "/sys/kernel/debug/tracing/";
static const char TracingOn[] = "tracing_on";
static const char CurrentTracer[] = "current_tracer";
static const char TraceOptions[] = "trace_options";
static const char TraceClock[] = "trace_clock";
static const char SchedSwitch[] = "events/sched/sched_switch/enable";
static const char SchedWakeup[] = "events/sched/sched_wakeup/enable";
static const char BufferSizeKb[] = "buffer_size_kb";
static const char TracePipe[] = "trace_pipe";
static std::atomic<bool> traceActive { false };
#ifdef __ANDROID__
static bool TraceWrite( const char* path, size_t psz, const char* val, size_t vsz )
{
char tmp[256];
sprintf( tmp, "su -c 'echo \"%s\" > %s%s'", val, BasePath, path );
return system( tmp ) == 0;
}
#else
static bool TraceWrite( const char* path, size_t psz, const char* val, size_t vsz )
{
char tmp[256];
memcpy( tmp, BasePath, sizeof( BasePath ) - 1 );
memcpy( tmp + sizeof( BasePath ) - 1, path, psz );
int fd = open( tmp, O_WRONLY );
if( fd < 0 ) return false;
for(;;)
{
ssize_t cnt = write( fd, val, vsz );
if( cnt == (ssize_t)vsz )
{
close( fd );
return true;
}
if( cnt < 0 )
{
close( fd );
return false;
}
vsz -= cnt;
val += cnt;
}
}
#endif
#ifdef __ANDROID__
void SysTraceInjectPayload()
{
int pipefd[2];
if( pipe( pipefd ) == 0 )
{
const auto pid = fork();
if( pid == 0 )
{
// child
close( pipefd[1] );
if( dup2( pipefd[0], STDIN_FILENO ) >= 0 )
{
close( pipefd[0] );
execlp( "su", "su", "-c", "cat > /data/tracy_systrace", (char*)nullptr );
exit( 1 );
}
}
else if( pid > 0 )
{
// parent
close( pipefd[0] );
#ifdef __aarch64__
write( pipefd[1], tracy_systrace_aarch64_data, tracy_systrace_aarch64_size );
#else
write( pipefd[1], tracy_systrace_armv7_data, tracy_systrace_armv7_size );
#endif
close( pipefd[1] );
waitpid( pid, nullptr, 0 );
system( "su -c 'chmod 700 /data/tracy_systrace'" );
}
}
}
#endif
bool SysTraceStart( int64_t& samplingPeriod )
{
if( !TraceWrite( TracingOn, sizeof( TracingOn ), "0", 2 ) ) return false;
if( !TraceWrite( CurrentTracer, sizeof( CurrentTracer ), "nop", 4 ) ) return false;
TraceWrite( TraceOptions, sizeof( TraceOptions ), "norecord-cmd", 13 );
TraceWrite( TraceOptions, sizeof( TraceOptions ), "norecord-tgid", 14 );
TraceWrite( TraceOptions, sizeof( TraceOptions ), "noirq-info", 11 );
TraceWrite( TraceOptions, sizeof( TraceOptions ), "noannotate", 11 );
#if defined TRACY_HW_TIMER && ( defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64 )
if( !TraceWrite( TraceClock, sizeof( TraceClock ), "x86-tsc", 8 ) ) return false;
#elif __ARM_ARCH >= 6
if( !TraceWrite( TraceClock, sizeof( TraceClock ), "mono_raw", 9 ) ) return false;
#endif
if( !TraceWrite( SchedSwitch, sizeof( SchedSwitch ), "1", 2 ) ) return false;
if( !TraceWrite( SchedWakeup, sizeof( SchedWakeup ), "1", 2 ) ) return false;
if( !TraceWrite( BufferSizeKb, sizeof( BufferSizeKb ), "512", 4 ) ) return false;
#if defined __ANDROID__ && ( defined __aarch64__ || defined __ARM_ARCH )
SysTraceInjectPayload();
#endif
if( !TraceWrite( TracingOn, sizeof( TracingOn ), "1", 2 ) ) return false;
traceActive.store( true, std::memory_order_relaxed );
return true;
}
void SysTraceStop()
{
TraceWrite( TracingOn, sizeof( TracingOn ), "0", 2 );
traceActive.store( false, std::memory_order_relaxed );
}
static uint64_t ReadNumber( const char*& ptr )
{
uint64_t val = 0;
for(;;)
{
if( *ptr >= '0' && *ptr <= '9' )
{
val = val * 10 + ( *ptr - '0' );
ptr++;
}
else
{
return val;
}
}
}
static uint8_t ReadState( char state )
{
switch( state )
{
case 'D': return 101;
case 'I': return 102;
case 'R': return 103;
case 'S': return 104;
case 'T': return 105;
case 't': return 106;
case 'W': return 107;
case 'X': return 108;
case 'Z': return 109;
default: return 100;
}
}
#if defined __ANDROID__ && defined __ANDROID_API__ && __ANDROID_API__ < 18
/*-
* Copyright (c) 2011 The NetBSD Foundation, Inc.
* All rights reserved.
*
* This code is derived from software contributed to The NetBSD Foundation
* by Christos Zoulas.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
* TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
ssize_t getdelim(char **buf, size_t *bufsiz, int delimiter, FILE *fp)
{
char *ptr, *eptr;
if (*buf == NULL || *bufsiz == 0) {
*bufsiz = BUFSIZ;
if ((*buf = (char*)malloc(*bufsiz)) == NULL)
return -1;
}
for (ptr = *buf, eptr = *buf + *bufsiz;;) {
int c = fgetc(fp);
if (c == -1) {
if (feof(fp))
return ptr == *buf ? -1 : ptr - *buf;
else
return -1;
}
*ptr++ = c;
if (c == delimiter) {
*ptr = '\0';
return ptr - *buf;
}
if (ptr + 2 >= eptr) {
char *nbuf;
size_t nbufsiz = *bufsiz * 2;
ssize_t d = ptr - *buf;
if ((nbuf = (char*)realloc(*buf, nbufsiz)) == NULL)
return -1;
*buf = nbuf;
*bufsiz = nbufsiz;
eptr = nbuf + nbufsiz;
ptr = nbuf + d;
}
}
}
ssize_t getline(char **buf, size_t *bufsiz, FILE *fp)
{
return getdelim(buf, bufsiz, '\n', fp);
}
#endif
static void HandleTraceLine( const char* line )
{
line += 23;
while( *line != '[' ) line++;
line++;
const auto cpu = (uint8_t)ReadNumber( line );
line++; // ']'
while( *line == ' ' ) line++;
#if defined TRACY_HW_TIMER && ( defined __i386 || defined _M_IX86 || defined __x86_64__ || defined _M_X64 )
const auto time = ReadNumber( line );
#elif __ARM_ARCH >= 6
const auto ts = ReadNumber( line );
line++; // '.'
const auto tus = ReadNumber( line );
const auto time = ts * 1000000000ll + tus * 1000ll;
#endif
line += 2; // ': '
if( memcmp( line, "sched_switch", 12 ) == 0 )
{
line += 14;
while( memcmp( line, "prev_pid", 8 ) != 0 ) line++;
line += 9;
const auto oldPid = ReadNumber( line );
line++;
while( memcmp( line, "prev_state", 10 ) != 0 ) line++;
line += 11;
const auto oldState = (uint8_t)ReadState( *line );
line += 5;
while( memcmp( line, "next_pid", 8 ) != 0 ) line++;
line += 9;
const auto newPid = ReadNumber( line );
uint8_t reason = 100;
TracyLfqPrepare( QueueType::ContextSwitch );
MemWrite( &item->contextSwitch.time, time );
MemWrite( &item->contextSwitch.oldThread, oldPid );
MemWrite( &item->contextSwitch.newThread, newPid );
MemWrite( &item->contextSwitch.cpu, cpu );
MemWrite( &item->contextSwitch.reason, reason );
MemWrite( &item->contextSwitch.state, oldState );
TracyLfqCommit;
}
else if( memcmp( line, "sched_wakeup", 12 ) == 0 )
{
line += 14;
while( memcmp( line, "pid", 3 ) != 0 ) line++;
line += 4;
const auto pid = ReadNumber( line );
TracyLfqPrepare( QueueType::ThreadWakeup );
MemWrite( &item->threadWakeup.time, time );
MemWrite( &item->threadWakeup.thread, pid );
TracyLfqCommit;
}
}
#ifdef __ANDROID__
static void ProcessTraceLines( int fd )
{
// Linux pipe buffer is 64KB, additional 1KB is for unfinished lines
char* buf = (char*)tracy_malloc( (64+1)*1024 );
char* line = buf;
for(;;)
{
if( !traceActive.load( std::memory_order_relaxed ) ) break;
const auto rd = read( fd, line, 64*1024 );
if( rd <= 0 ) break;
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() )
{
if( rd < 64*1024 )
{
assert( line[rd-1] == '\n' );
line = buf;
std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) );
}
else
{
const auto end = line + rd;
line = end - 1;
while( line > buf && *line != '\n' ) line--;
if( line > buf )
{
line++;
const auto lsz = end - line;
memmove( buf, line, lsz );
line = buf + lsz;
}
}
continue;
}
#endif
const auto end = line + rd;
line = buf;
for(;;)
{
auto next = line;
while( next < end && *next != '\n' ) next++;
next++;
if( next >= end )
{
const auto lsz = end - line;
memmove( buf, line, lsz );
line = buf + lsz;
break;
}
HandleTraceLine( line );
line = next;
}
if( rd < 64*1024 )
{
std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) );
}
}
tracy_free( buf );
}
void SysTraceWorker( void* ptr )
{
SetThreadName( "Tracy SysTrace" );
int pipefd[2];
if( pipe( pipefd ) == 0 )
{
const auto pid = fork();
if( pid == 0 )
{
// child
close( pipefd[0] );
dup2( pipefd[1], STDERR_FILENO );
if( dup2( pipefd[1], STDOUT_FILENO ) >= 0 )
{
close( pipefd[1] );
#if defined __ANDROID__ && ( defined __aarch64__ || defined __ARM_ARCH )
execlp( "su", "su", "-c", "/data/tracy_systrace", (char*)nullptr );
#endif
execlp( "su", "su", "-c", "cat /sys/kernel/debug/tracing/trace_pipe", (char*)nullptr );
exit( 1 );
}
}
else if( pid > 0 )
{
// parent
close( pipefd[1] );
ProcessTraceLines( pipefd[0] );
close( pipefd[0] );
}
}
}
#else
static void ProcessTraceLines( int fd )
{
char* buf = (char*)tracy_malloc( 64*1024 );
struct pollfd pfd;
pfd.fd = fd;
pfd.events = POLLIN | POLLERR;
for(;;)
{
while( poll( &pfd, 1, 0 ) <= 0 )
{
if( !traceActive.load( std::memory_order_relaxed ) ) break;
std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) );
}
const auto rd = read( fd, buf, 64*1024 );
if( rd <= 0 ) break;
#ifdef TRACY_ON_DEMAND
if( !GetProfiler().IsConnected() ) continue;
#endif
auto line = buf;
const auto end = buf + rd;
for(;;)
{
auto next = line;
while( next < end && *next != '\n' ) next++;
if( next == end ) break;
assert( *next == '\n' );
next++;
HandleTraceLine( line );
line = next;
}
}
tracy_free( buf );
}
void SysTraceWorker( void* ptr )
{
SetThreadName( "Tracy SysTrace" );
char tmp[256];
memcpy( tmp, BasePath, sizeof( BasePath ) - 1 );
memcpy( tmp + sizeof( BasePath ) - 1, TracePipe, sizeof( TracePipe ) );
int fd = open( tmp, O_RDONLY );
if( fd < 0 ) return;
ProcessTraceLines( fd );
close( fd );
}
#endif
void SysTraceSendExternalName( uint64_t thread )
{
FILE* f;
char fn[256];
sprintf( fn, "/proc/%" PRIu64 "/comm", thread );
f = fopen( fn, "rb" );
if( f )
{
char buf[256];
const auto sz = fread( buf, 1, 256, f );
if( sz > 0 && buf[sz-1] == '\n' ) buf[sz-1] = '\0';
GetProfiler().SendString( thread, buf, QueueType::ExternalThreadName );
fclose( f );
}
else
{
GetProfiler().SendString( thread, "???", QueueType::ExternalThreadName );
}
sprintf( fn, "/proc/%" PRIu64 "/status", thread );
f = fopen( fn, "rb" );
if( f )
{
int pid = -1;
size_t lsz = 1024;
auto line = (char*)malloc( lsz );
for(;;)
{
auto rd = getline( &line, &lsz, f );
if( rd <= 0 ) break;
if( memcmp( "Tgid:\t", line, 6 ) == 0 )
{
pid = atoi( line + 6 );
break;
}
}
free( line );
fclose( f );
if( pid >= 0 )
{
{
uint64_t _pid = pid;
TracyLfqPrepare( QueueType::TidToPid );
MemWrite( &item->tidToPid.tid, thread );
MemWrite( &item->tidToPid.pid, _pid );
TracyLfqCommit;
}
sprintf( fn, "/proc/%i/comm", pid );
f = fopen( fn, "rb" );
if( f )
{
char buf[256];
const auto sz = fread( buf, 1, 256, f );
if( sz > 0 && buf[sz-1] == '\n' ) buf[sz-1] = '\0';
GetProfiler().SendString( thread, buf, QueueType::ExternalName );
fclose( f );
return;
}
}
}
GetProfiler().SendString( thread, "???", QueueType::ExternalName );
}
}
# endif
#endif

25
client/TracySysTrace.hpp Normal file
View File

@@ -0,0 +1,25 @@
#ifndef __TRACYSYSTRACE_HPP__
#define __TRACYSYSTRACE_HPP__
#if !defined TRACY_NO_SYSTEM_TRACING && ( defined _WIN32 || defined __CYGWIN__ || defined __linux__ )
# define TRACY_HAS_SYSTEM_TRACING
#endif
#ifdef TRACY_HAS_SYSTEM_TRACING
#include <stdint.h>
namespace tracy
{
bool SysTraceStart( int64_t& samplingPeriod );
void SysTraceStop();
void SysTraceWorker( void* ptr );
void SysTraceSendExternalName( uint64_t thread );
}
#endif
#endif

View File

@@ -0,0 +1,80 @@
// File: '/home/wolf/desktop/tracy_systrace.armv7' (1210 bytes)`
// File: '/home/wolf/desktop/tracy_systrace.aarch64' (1650 bytes)
// Exported using binary_to_compressed_c.cpp
namespace tracy
{
static const unsigned int tracy_systrace_armv7_size = 1210;
static const unsigned int tracy_systrace_armv7_data[1212/4] =
{
0x464c457f, 0x00010101, 0x00000000, 0x00000000, 0x00280003, 0x00000001, 0x00000208, 0x00000034, 0x00000000, 0x05000200, 0x00200034, 0x00280007,
0x00000000, 0x00000006, 0x00000034, 0x00000034, 0x00000034, 0x000000e0, 0x000000e0, 0x00000004, 0x00000004, 0x00000003, 0x00000114, 0x00000114,
0x00000114, 0x00000013, 0x00000013, 0x00000004, 0x00000001, 0x00000001, 0x00000000, 0x00000000, 0x00000000, 0x000003ed, 0x000003ed, 0x00000005,
0x00001000, 0x00000001, 0x000003ed, 0x000013ed, 0x000013ed, 0x000000cd, 0x000000cf, 0x00000006, 0x00001000, 0x00000002, 0x000003f0, 0x000013f0,
0x000013f0, 0x000000b8, 0x000000b8, 0x00000006, 0x00000004, 0x6474e551, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000006,
0x00000010, 0x70000001, 0x00000394, 0x00000394, 0x00000394, 0x00000008, 0x00000008, 0x00000004, 0x00000004, 0x7379732f, 0x2f6d6574, 0x2f6e6962,
0x6b6e696c, 0x00007265, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000001, 0x00000000, 0x00000000, 0x00000012, 0x00000016, 0x00000000,
0x00000000, 0x00000012, 0x6f6c6400, 0x006e6570, 0x4342494c, 0x62696c00, 0x732e6c64, 0x6c64006f, 0x006d7973, 0x00000001, 0x00000003, 0x00000001,
0x00000000, 0x00000000, 0x00000000, 0x00000001, 0x00000003, 0x00000002, 0x00000000, 0x00000000, 0x00000001, 0x00020000, 0x00000002, 0x00010001,
0x0000000d, 0x00000010, 0x00000000, 0x00050d63, 0x00020000, 0x00000008, 0x00000000, 0x000014b4, 0x00000116, 0x000014b8, 0x00000216, 0xe52de004,
0xe59fe004, 0xe08fe00e, 0xe5bef008, 0x000012bc, 0xe28fc600, 0xe28cca01, 0xe5bcf2bc, 0xe28fc600, 0xe28cca01, 0xe5bcf2b4, 0xe92d4ff0, 0xe28db01c,
0xe24dd01c, 0xe24dd801, 0xe59f0154, 0xe3a01001, 0xe08f0000, 0xebfffff1, 0xe59f1148, 0xe1a07000, 0xe08f1001, 0xebfffff0, 0xe59f113c, 0xe1a09000,
0xe1a00007, 0xe08f1001, 0xebffffeb, 0xe59f112c, 0xe1a04000, 0xe1a00007, 0xe08f1001, 0xebffffe6, 0xe59f111c, 0xe1a05000, 0xe1a00007, 0xe08f1001,
0xebffffe1, 0xe59f110c, 0xe1a06000, 0xe1a00007, 0xe08f1001, 0xebffffdc, 0xe58d0004, 0xe1a00007, 0xe59f10f4, 0xe08f1001, 0xebffffd7, 0xe1a0a000,
0xe59f00e8, 0xe3a01000, 0xe3a08000, 0xe08f0000, 0xe12fff39, 0xe1a07000, 0xe3700001, 0xca000001, 0xe3a00000, 0xe12fff34, 0xe3a00009, 0xe58d4000,
0xe1cd01b4, 0xe3090680, 0xe3400098, 0xe28d4010, 0xe58d000c, 0xe28d9018, 0xe58d8008, 0xe28d8008, 0xe58d7010, 0xea000003, 0xe1a02000, 0xe3a00001,
0xe1a01009, 0xe12fff3a, 0xe1a00004, 0xe3a01001, 0xe3a02000, 0xe12fff35, 0xe3500000, 0xca000008, 0xe1a00008, 0xe3a01000, 0xe12fff36, 0xe1a00004,
0xe3a01001, 0xe3a02000, 0xe12fff35, 0xe3500001, 0xbafffff6, 0xe59d3004, 0xe1a00007, 0xe1a01009, 0xe3a02801, 0xe12fff33, 0xe3500001, 0xaaffffe5,
0xe59d1000, 0xe3a00000, 0xe12fff31, 0xe24bd01c, 0xe8bd8ff0, 0x00000174, 0x0000016c, 0x0000015d, 0x0000014e, 0x0000013f, 0x00000135, 0x00000126,
0x00000114, 0x7ffffe74, 0x00000001, 0x6362696c, 0x006f732e, 0x6e65706f, 0x69786500, 0x6f700074, 0x6e006c6c, 0x736f6e61, 0x7065656c, 0x61657200,
0x72770064, 0x00657469, 0x7379732f, 0x72656b2f, 0x2f6c656e, 0x75626564, 0x72742f67, 0x6e696361, 0x72742f67, 0x5f656361, 0x65706970, 0x00000000,
0x00000003, 0x000014a8, 0x00000002, 0x00000010, 0x00000017, 0x000001cc, 0x00000014, 0x00000011, 0x00000015, 0x00000000, 0x00000006, 0x00000128,
0x0000000b, 0x00000010, 0x00000005, 0x00000158, 0x0000000a, 0x0000001c, 0x6ffffef5, 0x00000174, 0x00000004, 0x0000018c, 0x00000001, 0x0000000d,
0x0000001e, 0x00000008, 0x6ffffffb, 0x00000001, 0x6ffffff0, 0x000001a4, 0x6ffffffe, 0x000001ac, 0x6fffffff, 0x00000001, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x000001dc, 0x000001dc,
};
static const unsigned int tracy_systrace_aarch64_size = 1650;
static const unsigned int tracy_systrace_aarch64_data[1652/4] =
{
0x464c457f, 0x00010102, 0x00000000, 0x00000000, 0x00b70003, 0x00000001, 0x00000300, 0x00000000, 0x00000040, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00380040, 0x00400006, 0x00000000, 0x00000006, 0x00000005, 0x00000040, 0x00000000, 0x00000040, 0x00000000, 0x00000040, 0x00000000,
0x00000150, 0x00000000, 0x00000150, 0x00000000, 0x00000008, 0x00000000, 0x00000003, 0x00000004, 0x00000190, 0x00000000, 0x00000190, 0x00000000,
0x00000190, 0x00000000, 0x00000015, 0x00000000, 0x00000015, 0x00000000, 0x00000001, 0x00000000, 0x00000001, 0x00000005, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x000004d1, 0x00000000, 0x000004d1, 0x00000000, 0x00010000, 0x00000000, 0x00000001, 0x00000006,
0x000004d8, 0x00000000, 0x000104d8, 0x00000000, 0x000104d8, 0x00000000, 0x0000019a, 0x00000000, 0x000001a0, 0x00000000, 0x00010000, 0x00000000,
0x00000002, 0x00000006, 0x000004d8, 0x00000000, 0x000104d8, 0x00000000, 0x000104d8, 0x00000000, 0x00000170, 0x00000000, 0x00000170, 0x00000000,
0x00000008, 0x00000000, 0x6474e551, 0x00000006, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000010, 0x00000000, 0x7379732f, 0x2f6d6574, 0x2f6e6962, 0x6b6e696c, 0x34367265, 0x00000000, 0x00000001, 0x00000004,
0x00000003, 0x00000000, 0x00000000, 0x00000000, 0x00000002, 0x00000000, 0x00000001, 0x00000001, 0x00000001, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x000a0003, 0x00000300, 0x00000000,
0x00000000, 0x00000000, 0x0000000a, 0x00000012, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000011, 0x00000012, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x62696c00, 0x732e6c64, 0x6c64006f, 0x6e65706f, 0x736c6400, 0x4c006d79, 0x00434249, 0x00000000, 0x00020002, 0x00000000,
0x00010001, 0x00000001, 0x00000010, 0x00000000, 0x00050d63, 0x00020000, 0x00000017, 0x00000000, 0x00010668, 0x00000000, 0x00000402, 0x00000002,
0x00000000, 0x00000000, 0x00010670, 0x00000000, 0x00000402, 0x00000003, 0x00000000, 0x00000000, 0xa9bf7bf0, 0x90000090, 0xf9433211, 0x91198210,
0xd61f0220, 0xd503201f, 0xd503201f, 0xd503201f, 0x90000090, 0xf9433611, 0x9119a210, 0xd61f0220, 0x90000090, 0xf9433a11, 0x9119c210, 0xd61f0220,
0xf81b0ffc, 0xa9015ff8, 0xa90257f6, 0xa9034ff4, 0xa9047bfd, 0x910103fd, 0xd14043ff, 0xd10043ff, 0x90000000, 0x91120000, 0x320003e1, 0x97ffffed,
0x90000001, 0x91122021, 0xaa0003f7, 0x97ffffed, 0x90000001, 0xaa0003f8, 0x91123421, 0xaa1703e0, 0x97ffffe8, 0x90000001, 0xaa0003f3, 0x91124821,
0xaa1703e0, 0x97ffffe3, 0x90000001, 0xaa0003f4, 0x91125c21, 0xaa1703e0, 0x97ffffde, 0x90000001, 0xaa0003f5, 0x91128421, 0xaa1703e0, 0x97ffffd9,
0x90000001, 0xaa0003f6, 0x91129821, 0xaa1703e0, 0x97ffffd4, 0xaa0003f7, 0x90000000, 0x9112b000, 0x2a1f03e1, 0xd63f0300, 0x2a0003f8, 0x36f80060,
0x2a1f03e0, 0xd63f0260, 0x90000008, 0x3dc11d00, 0x52800128, 0xb81c83b8, 0x781cc3a8, 0x3d8003e0, 0x14000005, 0x93407c02, 0x320003e0, 0x910043e1,
0xd63f02e0, 0xd100e3a0, 0x320003e1, 0x2a1f03e2, 0xd63f0280, 0x7100001f, 0x5400014c, 0x910003e0, 0xaa1f03e1, 0xd63f02a0, 0xd100e3a0, 0x320003e1,
0x2a1f03e2, 0xd63f0280, 0x7100041f, 0x54ffff0b, 0x910043e1, 0x321003e2, 0x2a1803e0, 0xd63f02c0, 0x7100041f, 0x54fffd0a, 0x2a1f03e0, 0xd63f0260,
0x914043ff, 0x910043ff, 0xa9447bfd, 0xa9434ff4, 0xa94257f6, 0xa9415ff8, 0xf84507fc, 0xd65f03c0, 0x00000000, 0x00000000, 0x00989680, 0x00000000,
0x6362696c, 0x006f732e, 0x6e65706f, 0x69786500, 0x6f700074, 0x6e006c6c, 0x736f6e61, 0x7065656c, 0x61657200, 0x72770064, 0x00657469, 0x7379732f,
0x72656b2f, 0x2f6c656e, 0x75626564, 0x72742f67, 0x6e696361, 0x72742f67, 0x5f656361, 0x65706970, 0x00000000, 0x00000000, 0x00000001, 0x00000000,
0x00000001, 0x00000000, 0x00000004, 0x00000000, 0x000001a8, 0x00000000, 0x6ffffef5, 0x00000000, 0x000001c8, 0x00000000, 0x00000005, 0x00000000,
0x00000248, 0x00000000, 0x00000006, 0x00000000, 0x000001e8, 0x00000000, 0x0000000a, 0x00000000, 0x0000001c, 0x00000000, 0x0000000b, 0x00000000,
0x00000018, 0x00000000, 0x00000015, 0x00000000, 0x00000000, 0x00000000, 0x00000003, 0x00000000, 0x00010650, 0x00000000, 0x00000002, 0x00000000,
0x00000030, 0x00000000, 0x00000014, 0x00000000, 0x00000007, 0x00000000, 0x00000017, 0x00000000, 0x00000290, 0x00000000, 0x0000001e, 0x00000000,
0x00000008, 0x00000000, 0x6ffffffb, 0x00000000, 0x00000001, 0x00000000, 0x6ffffffe, 0x00000000, 0x00000270, 0x00000000, 0x6fffffff, 0x00000000,
0x00000001, 0x00000000, 0x6ffffff0, 0x00000000, 0x00000264, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x000104d8, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x000002c0, 0x00000000, 0x000002c0,
};
}

View File

@@ -1,7 +1,7 @@
#ifndef __TRACYTHREAD_HPP__
#define __TRACYTHREAD_HPP__
#ifdef _MSC_VER
#if defined _WIN32 || defined __CYGWIN__
# include <windows.h>
#else
# include <pthread.h>
@@ -10,7 +10,7 @@
namespace tracy
{
#ifdef _MSC_VER
#if defined _WIN32 || defined __CYGWIN__
class Thread
{

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,9 +1,9 @@
/* rpmalloc.h - Memory allocator - Public Domain - 2016 Mattias Jansson / Rampant Pixels
/* rpmalloc.h - Memory allocator - Public Domain - 2016 Mattias Jansson
*
* This library provides a cross-platform lock free thread caching malloc implementation in C11.
* The latest source code is always available at
*
* https://github.com/rampantpixels/rpmalloc
* https://github.com/mjansson/rpmalloc
*
* This library is put in the public domain; you can redistribute it and/or modify it without any restrictions.
*
@@ -12,51 +12,113 @@
#pragma once
#include <stddef.h>
#include "../common/TracyApi.h"
namespace tracy
{
#if defined(__clang__) || defined(__GNUC__)
# define RPMALLOC_ATTRIBUTE __attribute__((__malloc__))
# define RPMALLOC_RESTRICT
# define RPMALLOC_EXPORT __attribute__((visibility("default")))
# define RPMALLOC_ALLOCATOR
# define RPMALLOC_ATTRIB_MALLOC __attribute__((__malloc__))
# if defined(__clang_major__) && (__clang_major__ < 4)
# define RPMALLOC_ATTRIB_ALLOC_SIZE(size)
# define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size)
# else
# define RPMALLOC_ATTRIB_ALLOC_SIZE(size) __attribute__((alloc_size(size)))
# define RPMALLOC_ATTRIB_ALLOC_SIZE2(count, size) __attribute__((alloc_size(count, size)))
# endif
# define RPMALLOC_CDECL
#elif defined(_MSC_VER)
# define RPMALLOC_ATTRIBUTE
# define RPMALLOC_RESTRICT __declspec(restrict)
# define RPMALLOC_EXPORT
# define RPMALLOC_ALLOCATOR __declspec(allocator) __declspec(restrict)
# define RPMALLOC_ATTRIB_MALLOC
# define RPMALLOC_ATTRIB_ALLOC_SIZE(size)
# define RPMALLOC_ATTRIB_ALLOC_SIZE2(count,size)
# define RPMALLOC_CDECL __cdecl
#else
# define RPMALLOC_ATTRIBUTE
# define RPMALLOC_RESTRICT
# define RPMALLOC_EXPORT
# define RPMALLOC_ALLOCATOR
# define RPMALLOC_ATTRIB_MALLOC
# define RPMALLOC_ATTRIB_ALLOC_SIZE(size)
# define RPMALLOC_ATTRIB_ALLOC_SIZE2(count,size)
# define RPMALLOC_CDECL
#endif
//! Define RPMALLOC_CONFIGURABLE to enable configuring sizes
#ifndef RPMALLOC_CONFIGURABLE
#define RPMALLOC_CONFIGURABLE 0
#endif
//! Flag to rpaligned_realloc to not preserve content in reallocation
#define RPMALLOC_NO_PRESERVE 1
typedef struct rpmalloc_global_statistics_t {
//! Current amount of virtual memory mapped (only if ENABLE_STATISTICS=1)
//! Current amount of virtual memory mapped, all of which might not have been committed (only if ENABLE_STATISTICS=1)
size_t mapped;
//! Current amount of memory in global caches for small and medium sizes (<64KiB)
//! Peak amount of virtual memory mapped, all of which might not have been committed (only if ENABLE_STATISTICS=1)
size_t mapped_peak;
//! Current amount of memory in global caches for small and medium sizes (<32KiB)
size_t cached;
//! Total amount of memory mapped (only if ENABLE_STATISTICS=1)
//! Current amount of memory allocated in huge allocations, i.e larger than LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1)
size_t huge_alloc;
//! Peak amount of memory allocated in huge allocations, i.e larger than LARGE_SIZE_LIMIT which is 2MiB by default (only if ENABLE_STATISTICS=1)
size_t huge_alloc_peak;
//! Total amount of memory mapped since initialization (only if ENABLE_STATISTICS=1)
size_t mapped_total;
//! Total amount of memory unmapped (only if ENABLE_STATISTICS=1)
//! Total amount of memory unmapped since initialization (only if ENABLE_STATISTICS=1)
size_t unmapped_total;
} rpmalloc_global_statistics_t;
typedef struct rpmalloc_thread_statistics_t {
//! Current number of bytes available for allocation from active spans
size_t active;
//! Current number of bytes available in thread size class caches
//! Current number of bytes available in thread size class caches for small and medium sizes (<32KiB)
size_t sizecache;
//! Current number of bytes available in thread span caches
//! Current number of bytes available in thread span caches for small and medium sizes (<32KiB)
size_t spancache;
//! Current number of bytes in pending deferred deallocations
size_t deferred;
//! Total number of bytes transitioned from thread cache to global cache
//! Total number of bytes transitioned from thread cache to global cache (only if ENABLE_STATISTICS=1)
size_t thread_to_global;
//! Total number of bytes transitioned from global cache to thread cache
//! Total number of bytes transitioned from global cache to thread cache (only if ENABLE_STATISTICS=1)
size_t global_to_thread;
//! Per span count statistics (only if ENABLE_STATISTICS=1)
struct {
//! Currently used number of spans
size_t current;
//! High water mark of spans used
size_t peak;
//! Number of spans transitioned to global cache
size_t to_global;
//! Number of spans transitioned from global cache
size_t from_global;
//! Number of spans transitioned to thread cache
size_t to_cache;
//! Number of spans transitioned from thread cache
size_t from_cache;
//! Number of spans transitioned to reserved state
size_t to_reserved;
//! Number of spans transitioned from reserved state
size_t from_reserved;
//! Number of raw memory map calls (not hitting the reserve spans but resulting in actual OS mmap calls)
size_t map_calls;
} span_use[32];
//! Per size class statistics (only if ENABLE_STATISTICS=1)
struct {
//! Current number of allocations
size_t alloc_current;
//! Peak number of allocations
size_t alloc_peak;
//! Total number of allocations
size_t alloc_total;
//! Total number of frees
size_t free_total;
//! Number of spans transitioned to cache
size_t spans_to_cache;
//! Number of spans transitioned from cache
size_t spans_from_cache;
//! Number of spans transitioned from reserved state
size_t spans_from_reserved;
//! Number of raw memory map calls (not hitting the reserve spans but resulting in actual OS mmap calls)
size_t map_calls;
} size_use[128];
} rpmalloc_thread_statistics_t;
typedef struct rpmalloc_config_t {
@@ -67,85 +129,133 @@ typedef struct rpmalloc_config_t {
// actual start of the memory region due to this alignment. The alignment offset
// will be passed to the memory unmap function. The alignment offset MUST NOT be
// larger than 65535 (storable in an uint16_t), if it is you must use natural
// alignment to shift it into 16 bits.
// alignment to shift it into 16 bits. If you set a memory_map function, you
// must also set a memory_unmap function or else the default implementation will
// be used for both.
void* (*memory_map)(size_t size, size_t* offset);
//! Unmap the memory pages starting at address and spanning the given number of bytes.
// If release is set to 1, the unmap is for an entire span range as returned by
// a previous call to memory_map and that the entire range should be released.
// If release is set to 0, the unmap is a partial decommit of a subset of the mapped
// memory range.
void (*memory_unmap)(void* address, size_t size, size_t offset, int release);
//! Size of memory pages. The page size MUST be a power of two in [512,16384] range
// (2^9 to 2^14) unless 0 - set to 0 to use system page size. All memory mapping
// If release is set to non-zero, the unmap is for an entire span range as returned by
// a previous call to memory_map and that the entire range should be released. The
// release argument holds the size of the entire span range. If release is set to 0,
// the unmap is a partial decommit of a subset of the mapped memory range.
// If you set a memory_unmap function, you must also set a memory_map function or
// else the default implementation will be used for both.
void (*memory_unmap)(void* address, size_t size, size_t offset, size_t release);
//! Size of memory pages. The page size MUST be a power of two. All memory mapping
// requests to memory_map will be made with size set to a multiple of the page size.
// Used if RPMALLOC_CONFIGURABLE is defined to 1, otherwise system page size is used.
size_t page_size;
//! Size of a span of memory pages. MUST be a multiple of page size, and in [4096,262144]
// range (unless 0 - set to 0 to use the default span size).
//! Size of a span of memory blocks. MUST be a power of two, and in [4096,262144]
// range (unless 0 - set to 0 to use the default span size). Used if RPMALLOC_CONFIGURABLE
// is defined to 1.
size_t span_size;
//! Number of spans to map at each request to map new virtual memory blocks. This can
// be used to minimize the system call overhead at the cost of virtual memory address
// space. The extra mapped pages will not be written until actually used, so physical
// committed memory should not be affected in the default implementation.
// committed memory should not be affected in the default implementation. Will be
// aligned to a multiple of spans that match memory page size in case of huge pages.
size_t span_map_count;
//! Debug callback if memory guards are enabled. Called if a memory overwrite is detected
void (*memory_overwrite)(void* address);
//! Enable use of large/huge pages. If this flag is set to non-zero and page size is
// zero, the allocator will try to enable huge pages and auto detect the configuration.
// If this is set to non-zero and page_size is also non-zero, the allocator will
// assume huge pages have been configured and enabled prior to initializing the
// allocator.
// For Windows, see https://docs.microsoft.com/en-us/windows/desktop/memory/large-page-support
// For Linux, see https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
int enable_huge_pages;
} rpmalloc_config_t;
extern int
//! Initialize allocator with default configuration
RPMALLOC_EXPORT int
rpmalloc_initialize(void);
extern int
//! Initialize allocator with given configuration
RPMALLOC_EXPORT int
rpmalloc_initialize_config(const rpmalloc_config_t* config);
extern const rpmalloc_config_t*
//! Get allocator configuration
RPMALLOC_EXPORT const rpmalloc_config_t*
rpmalloc_config(void);
extern void
//! Finalize allocator
RPMALLOC_EXPORT void
rpmalloc_finalize(void);
extern void
//! Initialize allocator for calling thread
RPMALLOC_EXPORT void
rpmalloc_thread_initialize(void);
extern void
//! Finalize allocator for calling thread
RPMALLOC_EXPORT void
rpmalloc_thread_finalize(void);
extern void
//! Perform deferred deallocations pending for the calling thread heap
RPMALLOC_EXPORT void
rpmalloc_thread_collect(void);
extern int
//! Query if allocator is initialized for calling thread
RPMALLOC_EXPORT int
rpmalloc_is_thread_initialized(void);
extern void
//! Get per-thread statistics
RPMALLOC_EXPORT void
rpmalloc_thread_statistics(rpmalloc_thread_statistics_t* stats);
extern void
//! Get global statistics
RPMALLOC_EXPORT void
rpmalloc_global_statistics(rpmalloc_global_statistics_t* stats);
extern RPMALLOC_RESTRICT void*
rpmalloc(size_t size) RPMALLOC_ATTRIBUTE;
//! Dump all statistics in human readable format to file (should be a FILE*)
RPMALLOC_EXPORT void
rpmalloc_dump_statistics(void* file);
extern void
//! Allocate a memory block of at least the given size
TRACY_API RPMALLOC_ALLOCATOR void*
rpmalloc(size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(1);
//! Free the given memory block
TRACY_API void
rpfree(void* ptr);
extern RPMALLOC_RESTRICT void*
rpcalloc(size_t num, size_t size) RPMALLOC_ATTRIBUTE;
//! Allocate a memory block of at least the given size and zero initialize it
RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void*
rpcalloc(size_t num, size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE2(1, 2);
extern void*
rprealloc(void* ptr, size_t size);
//! Reallocate the given block to at least the given size
RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void*
rprealloc(void* ptr, size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(2);
extern void*
rpaligned_realloc(void* ptr, size_t alignment, size_t size, size_t oldsize, unsigned int flags);
//! Reallocate the given block to at least the given size and alignment,
// with optional control flags (see RPMALLOC_NO_PRESERVE).
// Alignment must be a power of two and a multiple of sizeof(void*),
// and should ideally be less than memory page size. A caveat of rpmalloc
// internals is that this must also be strictly less than the span size (default 64KiB)
RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void*
rpaligned_realloc(void* ptr, size_t alignment, size_t size, size_t oldsize, unsigned int flags) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(3);
extern RPMALLOC_RESTRICT void*
rpaligned_alloc(size_t alignment, size_t size) RPMALLOC_ATTRIBUTE;
//! Allocate a memory block of at least the given size and alignment.
// Alignment must be a power of two and a multiple of sizeof(void*),
// and should ideally be less than memory page size. A caveat of rpmalloc
// internals is that this must also be strictly less than the span size (default 64KiB)
RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void*
rpaligned_alloc(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(2);
extern RPMALLOC_RESTRICT void*
rpmemalign(size_t alignment, size_t size) RPMALLOC_ATTRIBUTE;
//! Allocate a memory block of at least the given size and alignment.
// Alignment must be a power of two and a multiple of sizeof(void*),
// and should ideally be less than memory page size. A caveat of rpmalloc
// internals is that this must also be strictly less than the span size (default 64KiB)
RPMALLOC_EXPORT RPMALLOC_ALLOCATOR void*
rpmemalign(size_t alignment, size_t size) RPMALLOC_ATTRIB_MALLOC RPMALLOC_ATTRIB_ALLOC_SIZE(2);
extern int
//! Allocate a memory block of at least the given size and alignment.
// Alignment must be a power of two and a multiple of sizeof(void*),
// and should ideally be less than memory page size. A caveat of rpmalloc
// internals is that this must also be strictly less than the span size (default 64KiB)
RPMALLOC_EXPORT int
rpposix_memalign(void **memptr, size_t alignment, size_t size);
extern size_t
//! Query the usable size of the given memory block (from given pointer to the end of block)
RPMALLOC_EXPORT size_t
rpmalloc_usable_size(void* ptr);
}

View File

@@ -1,7 +1,7 @@
#ifndef __TRACYALLOC_HPP__
#define __TRACYALLOC_HPP__
#include <cstdlib>
#include <stdlib.h>
#ifdef TRACY_ENABLE
# include "../client/tracy_rpmalloc.hpp"

16
common/TracyApi.h Normal file
View File

@@ -0,0 +1,16 @@
#ifndef __TRACYAPI_H__
#define __TRACYAPI_H__
#if defined _WIN32 || defined __CYGWIN__
# if defined TRACY_EXPORTS
# define TRACY_API __declspec(dllexport)
# elif defined TRACY_IMPORTS
# define TRACY_API __declspec(dllimport)
# else
# define TRACY_API
# endif
#else
# define TRACY_API __attribute__((visibility("default")))
#endif
#endif // __TRACYAPI_H__

View File

@@ -10,15 +10,6 @@ namespace tracy
using TracyMutex = std::shared_mutex;
}
#elif defined __CYGWIN__
#include "tracy_benaphore.h"
namespace tracy
{
using TracyMutex = NonRecursiveBenaphore;
}
#else
#include <mutex>

View File

@@ -4,31 +4,21 @@
#include <limits>
#include <stdint.h>
#include "../common/tracy_lz4.hpp"
namespace tracy
{
enum : uint32_t { ProtocolVersion = 1 };
constexpr unsigned Lz4CompressBound( unsigned isize ) { return isize + ( isize / 255 ) + 16; }
enum : uint32_t { ProtocolVersion = 35 };
enum : uint32_t { BroadcastVersion = 1 };
using lz4sz_t = uint32_t;
enum { TargetFrameSize = 256 * 1024 };
enum { LZ4Size = LZ4_COMPRESSBOUND( TargetFrameSize ) };
enum { LZ4Size = Lz4CompressBound( TargetFrameSize ) };
static_assert( LZ4Size <= std::numeric_limits<lz4sz_t>::max(), "LZ4Size greater than lz4sz_t" );
static_assert( TargetFrameSize * 2 >= 64 * 1024, "Not enough space for LZ4 stream buffer" );
enum ServerQuery : uint8_t
{
ServerQueryTerminate,
ServerQueryString,
ServerQueryThreadString,
ServerQuerySourceLocation,
ServerQueryPlotName,
ServerQueryCallstackFrame,
ServerQueryFrameName,
};
enum { HandshakeShibbolethSize = 8 };
static const char HandshakeShibboleth[HandshakeShibbolethSize] = { 'T', 'r', 'a', 'c', 'y', 'P', 'r', 'f' };
@@ -37,7 +27,8 @@ enum HandshakeStatus : uint8_t
HandshakePending,
HandshakeWelcome,
HandshakeProtocolMismatch,
HandshakeNotAvailable
HandshakeNotAvailable,
HandshakeDropped
};
enum { WelcomeMessageProgramNameSize = 64 };
@@ -45,6 +36,44 @@ enum { WelcomeMessageHostInfoSize = 1024 };
#pragma pack( 1 )
// Must increase left query space after handling!
enum ServerQuery : uint8_t
{
ServerQueryTerminate,
ServerQueryString,
ServerQueryThreadString,
ServerQuerySourceLocation,
ServerQueryPlotName,
ServerQueryCallstackFrame,
ServerQueryFrameName,
ServerQueryDisconnect,
ServerQueryExternalName,
ServerQueryParameter,
ServerQuerySymbol,
ServerQuerySymbolCode,
ServerQueryCodeLocation
};
struct ServerQueryPacket
{
ServerQuery type;
uint64_t ptr;
uint32_t extra;
};
enum { ServerQueryPacketSize = sizeof( ServerQueryPacket ) };
enum CpuArchitecture : uint8_t
{
CpuArchUnknown,
CpuArchX86,
CpuArchX64,
CpuArchArm32,
CpuArchArm64
};
struct WelcomeMessage
{
double timerMul;
@@ -53,7 +82,13 @@ struct WelcomeMessage
uint64_t delay;
uint64_t resolution;
uint64_t epoch;
uint64_t pid;
int64_t samplingPeriod;
uint8_t onDemand;
uint8_t isApple;
uint8_t cpuArch;
char cpuManufacturer[12];
uint32_t cpuId;
char programName[WelcomeMessageProgramNameSize];
char hostInfo[WelcomeMessageHostInfoSize];
};
@@ -64,10 +99,23 @@ enum { WelcomeMessageSize = sizeof( WelcomeMessage ) };
struct OnDemandPayloadMessage
{
uint64_t frames;
uint64_t currentTime;
};
enum { OnDemandPayloadMessageSize = sizeof( OnDemandPayloadMessage ) };
struct BroadcastMessage
{
uint32_t broadcastVersion;
uint32_t protocolVersion;
uint32_t listenPort;
uint32_t activeTime; // in seconds
char programName[WelcomeMessageProgramNameSize];
};
enum { BroadcastMessageSize = sizeof( BroadcastMessage ) };
#pragma pack()
}

View File

@@ -11,66 +11,122 @@ enum class QueueType : uint8_t
ZoneText,
ZoneName,
Message,
MessageColor,
MessageCallstack,
MessageColorCallstack,
MessageAppInfo,
ZoneBeginAllocSrcLoc,
ZoneBeginAllocSrcLocLean,
ZoneBeginAllocSrcLocCallstack,
ZoneBeginAllocSrcLocCallstackLean,
CallstackMemory,
CallstackMemoryLean,
Callstack,
Terminate,
KeepAlive,
Crash,
CrashReport,
CallstackLean,
CallstackAlloc,
CallstackAllocLean,
CallstackSample,
CallstackSampleLean,
FrameImage,
FrameImageLean,
ZoneBegin,
ZoneBeginCallstack,
ZoneEnd,
FrameMarkMsg,
FrameMarkMsgStart,
FrameMarkMsgEnd,
SourceLocation,
LockAnnounce,
LockTerminate,
LockWait,
LockObtain,
LockRelease,
LockSharedWait,
LockSharedObtain,
LockSharedRelease,
LockMark,
PlotData,
MessageLiteral,
GpuNewContext,
GpuZoneBegin,
GpuZoneBeginCallstack,
GpuZoneEnd,
GpuTime,
LockName,
MemAlloc,
MemFree,
MemAllocCallstack,
MemFreeCallstack,
GpuZoneBegin,
GpuZoneBeginCallstack,
GpuZoneEnd,
GpuZoneBeginSerial,
GpuZoneBeginCallstackSerial,
GpuZoneEndSerial,
PlotData,
ContextSwitch,
ThreadWakeup,
GpuTime,
Terminate,
KeepAlive,
ThreadContext,
Crash,
CrashReport,
ZoneValidation,
ZoneValue,
FrameMarkMsg,
FrameMarkMsgStart,
FrameMarkMsgEnd,
SourceLocation,
LockAnnounce,
LockTerminate,
LockMark,
MessageLiteral,
MessageLiteralColor,
MessageLiteralCallstack,
MessageLiteralColorCallstack,
GpuNewContext,
CallstackFrameSize,
CallstackFrame,
SymbolInformation,
CodeInformation,
SysTimeReport,
TidToPid,
PlotConfig,
ParamSetup,
ParamPingback,
CpuTopology,
StringData,
ThreadName,
CustomStringData,
PlotName,
SourceLocationPayload,
CallstackPayload,
CallstackAllocPayload,
FrameName,
FrameImageData,
ExternalName,
ExternalThreadName,
SymbolCode,
NUM_TYPES
};
#pragma pack( 1 )
struct QueueZoneBegin
struct QueueThreadContext
{
uint64_t thread;
};
struct QueueZoneBeginLean
{
int64_t time;
uint64_t thread;
};
struct QueueZoneBegin : public QueueZoneBeginLean
{
uint64_t srcloc; // ptr
uint32_t cpu;
};
struct QueueZoneEnd
{
int64_t time;
uint64_t thread;
uint32_t cpu;
};
struct QueueZoneValidation
{
uint32_t id;
};
struct QueueZoneValue
{
uint64_t value;
};
struct QueueStringTransfer
@@ -84,6 +140,19 @@ struct QueueFrameMark
uint64_t name; // ptr
};
struct QueueFrameImageLean
{
uint64_t frame;
uint16_t w;
uint16_t h;
uint8_t flip;
};
struct QueueFrameImage : public QueueFrameImageLean
{
uint64_t image; // ptr
};
struct QueueSourceLocation
{
uint64_t name;
@@ -97,7 +166,6 @@ struct QueueSourceLocation
struct QueueZoneText
{
uint64_t thread;
uint64_t text; // ptr
};
@@ -124,33 +192,39 @@ struct QueueLockTerminate
struct QueueLockWait
{
uint64_t thread;
uint32_t id;
int64_t time;
uint64_t thread;
LockType type;
};
struct QueueLockObtain
{
uint64_t thread;
uint32_t id;
int64_t time;
uint64_t thread;
};
struct QueueLockRelease
{
uint64_t thread;
uint32_t id;
int64_t time;
uint64_t thread;
};
struct QueueLockMark
{
uint32_t id;
uint64_t thread;
uint32_t id;
uint64_t srcloc; // ptr
};
struct QueueLockName
{
uint32_t id;
uint64_t name; // ptr
};
enum class PlotDataType : uint8_t
{
Float,
@@ -174,10 +248,26 @@ struct QueuePlotData
struct QueueMessage
{
int64_t time;
uint64_t thread;
uint64_t text; // ptr
};
struct QueueMessageColor : public QueueMessage
{
uint8_t r;
uint8_t g;
uint8_t b;
};
// Don't change order, only add new entries at the end, this is also used on trace dumps!
enum class GpuContextType : uint8_t
{
Invalid,
OpenGl,
Vulkan,
OpenCL,
Direct3D12
};
struct QueueGpuNewContext
{
int64_t cpuTime;
@@ -186,6 +276,7 @@ struct QueueGpuNewContext
float period;
uint8_t context;
uint8_t accuracyBits;
GpuContextType type;
};
struct QueueGpuZoneBegin
@@ -200,6 +291,7 @@ struct QueueGpuZoneBegin
struct QueueGpuZoneEnd
{
int64_t cpuTime;
uint64_t thread;
uint16_t queryId;
uint8_t context;
};
@@ -234,24 +326,117 @@ struct QueueCallstackMemory
struct QueueCallstack
{
uint64_t ptr;
};
struct QueueCallstackAlloc
{
uint64_t ptr;
uint64_t nativePtr;
};
struct QueueCallstackSampleLean
{
int64_t time;
uint64_t thread;
};
struct QueueCallstackSample : public QueueCallstackSampleLean
{
uint64_t ptr;
};
struct QueueCallstackFrameSize
{
uint64_t ptr;
uint8_t size;
uint64_t imageName;
};
struct QueueCallstackFrame
{
uint64_t ptr;
uint64_t name;
uint64_t file;
uint32_t line;
uint64_t symAddr;
char symLen[3];
};
struct QueueSymbolInformation
{
uint64_t file;
uint32_t line;
uint64_t symAddr;
};
struct QueueCodeInformation
{
uint64_t ptr;
uint64_t file;
uint32_t line;
};
struct QueueCrashReport
{
int64_t time;
uint64_t thread;
uint64_t text; // ptr
};
struct QueueSysTime
{
int64_t time;
float sysTime;
};
struct QueueContextSwitch
{
int64_t time;
uint64_t oldThread;
uint64_t newThread;
uint8_t cpu;
uint8_t reason;
uint8_t state;
};
struct QueueThreadWakeup
{
int64_t time;
uint64_t thread;
};
struct QueueTidToPid
{
uint64_t tid;
uint64_t pid;
};
enum class PlotFormatType : uint8_t
{
Number,
Memory,
Percentage
};
struct QueuePlotConfig
{
uint64_t name; // ptr
uint8_t type;
};
struct QueueParamSetup
{
uint32_t idx;
uint64_t name; // ptr
uint8_t isBool;
int32_t val;
};
struct QueueCpuTopology
{
uint32_t package;
uint32_t core;
uint32_t thread;
};
struct QueueHeader
{
union
@@ -266,10 +451,16 @@ struct QueueItem
QueueHeader hdr;
union
{
QueueThreadContext threadCtx;
QueueZoneBegin zoneBegin;
QueueZoneBeginLean zoneBeginLean;
QueueZoneEnd zoneEnd;
QueueZoneValidation zoneValidation;
QueueZoneValue zoneValue;
QueueStringTransfer stringTransfer;
QueueFrameMark frameMark;
QueueFrameImage frameImage;
QueueFrameImage frameImageLean;
QueueSourceLocation srcloc;
QueueZoneText zoneText;
QueueLockAnnounce lockAnnounce;
@@ -278,8 +469,10 @@ struct QueueItem
QueueLockObtain lockObtain;
QueueLockRelease lockRelease;
QueueLockMark lockMark;
QueueLockName lockName;
QueuePlotData plotData;
QueueMessage message;
QueueMessageColor messageColor;
QueueGpuNewContext gpuNewContext;
QueueGpuZoneBegin gpuZoneBegin;
QueueGpuZoneEnd gpuZoneEnd;
@@ -288,55 +481,104 @@ struct QueueItem
QueueMemFree memFree;
QueueCallstackMemory callstackMemory;
QueueCallstack callstack;
QueueCallstackAlloc callstackAlloc;
QueueCallstackSample callstackSample;
QueueCallstackSampleLean callstackSampleLean;
QueueCallstackFrameSize callstackFrameSize;
QueueCallstackFrame callstackFrame;
QueueSymbolInformation symbolInformation;
QueueCodeInformation codeInformation;
QueueCrashReport crashReport;
QueueSysTime sysTime;
QueueContextSwitch contextSwitch;
QueueThreadWakeup threadWakeup;
QueueTidToPid tidToPid;
QueuePlotConfig plotConfig;
QueueParamSetup paramSetup;
QueueCpuTopology cpuTopology;
};
};
#pragma pack()
enum { QueueItemSize = sizeof( QueueItem ) };
static const size_t QueueDataSize[] = {
static constexpr size_t QueueDataSize[] = {
sizeof( QueueHeader ) + sizeof( QueueZoneText ),
sizeof( QueueHeader ) + sizeof( QueueZoneText ), // zone name
sizeof( QueueHeader ) + sizeof( QueueMessage ),
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ), // allocated source location
sizeof( QueueHeader ) + sizeof( QueueCallstackMemory ),
sizeof( QueueHeader ) + sizeof( QueueCallstack ),
// above items must be first
sizeof( QueueHeader ), // terminate
sizeof( QueueHeader ), // keep alive
sizeof( QueueHeader ), // crash
sizeof( QueueHeader ) + sizeof( QueueCrashReport ),
sizeof( QueueHeader ) + sizeof( QueueMessageColor ),
sizeof( QueueHeader ) + sizeof( QueueMessage ), // callstack
sizeof( QueueHeader ) + sizeof( QueueMessageColor ), // callstack
sizeof( QueueHeader ) + sizeof( QueueMessage ), // app info
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ), // allocated source location, not for network transfer
sizeof( QueueHeader ) + sizeof( QueueZoneBeginLean ), // lean allocated source location
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ), // allocated source location, callstack, not for network transfer
sizeof( QueueHeader ) + sizeof( QueueZoneBeginLean ), // lean allocated source location, callstack
sizeof( QueueHeader ) + sizeof( QueueCallstackMemory ), // not for network transfer
sizeof( QueueHeader ), // lean callstack memory
sizeof( QueueHeader ) + sizeof( QueueCallstack ), // not for network transfer
sizeof( QueueHeader ), // lean callstack
sizeof( QueueHeader ) + sizeof( QueueCallstackAlloc ), // not for network transfer
sizeof( QueueHeader ), // lean callstack alloc
sizeof( QueueHeader ) + sizeof( QueueCallstackSample ), // not for network transfer
sizeof( QueueHeader ) + sizeof( QueueCallstackSampleLean ),
sizeof( QueueHeader ) + sizeof( QueueFrameImage ), // not for network transfer
sizeof( QueueHeader ) + sizeof( QueueFrameImageLean ),
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ),
sizeof( QueueHeader ) + sizeof( QueueZoneBegin ), // callstack
sizeof( QueueHeader ) + sizeof( QueueZoneEnd ),
sizeof( QueueHeader ) + sizeof( QueueLockWait ),
sizeof( QueueHeader ) + sizeof( QueueLockObtain ),
sizeof( QueueHeader ) + sizeof( QueueLockRelease ),
sizeof( QueueHeader ) + sizeof( QueueLockWait ), // shared
sizeof( QueueHeader ) + sizeof( QueueLockObtain ), // shared
sizeof( QueueHeader ) + sizeof( QueueLockRelease ), // shared
sizeof( QueueHeader ) + sizeof( QueueLockName ),
sizeof( QueueHeader ) + sizeof( QueueMemAlloc ),
sizeof( QueueHeader ) + sizeof( QueueMemFree ),
sizeof( QueueHeader ) + sizeof( QueueMemAlloc ), // callstack
sizeof( QueueHeader ) + sizeof( QueueMemFree ), // callstack
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ),
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ), // callstack
sizeof( QueueHeader ) + sizeof( QueueGpuZoneEnd ),
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ), // serial
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ), // serial, callstack
sizeof( QueueHeader ) + sizeof( QueueGpuZoneEnd ), // serial
sizeof( QueueHeader ) + sizeof( QueuePlotData ),
sizeof( QueueHeader ) + sizeof( QueueContextSwitch ),
sizeof( QueueHeader ) + sizeof( QueueThreadWakeup ),
sizeof( QueueHeader ) + sizeof( QueueGpuTime ),
// above items must be first
sizeof( QueueHeader ), // terminate
sizeof( QueueHeader ), // keep alive
sizeof( QueueHeader ) + sizeof( QueueThreadContext ),
sizeof( QueueHeader ), // crash
sizeof( QueueHeader ) + sizeof( QueueCrashReport ),
sizeof( QueueHeader ) + sizeof( QueueZoneValidation ),
sizeof( QueueHeader ) + sizeof( QueueZoneValue ),
sizeof( QueueHeader ) + sizeof( QueueFrameMark ), // continuous frames
sizeof( QueueHeader ) + sizeof( QueueFrameMark ), // start
sizeof( QueueHeader ) + sizeof( QueueFrameMark ), // end
sizeof( QueueHeader ) + sizeof( QueueSourceLocation ),
sizeof( QueueHeader ) + sizeof( QueueLockAnnounce ),
sizeof( QueueHeader ) + sizeof( QueueLockTerminate ),
sizeof( QueueHeader ) + sizeof( QueueLockWait ),
sizeof( QueueHeader ) + sizeof( QueueLockObtain ),
sizeof( QueueHeader ) + sizeof( QueueLockRelease ),
sizeof( QueueHeader ) + sizeof( QueueLockWait ),
sizeof( QueueHeader ) + sizeof( QueueLockObtain ),
sizeof( QueueHeader ) + sizeof( QueueLockRelease ),
sizeof( QueueHeader ) + sizeof( QueueLockMark ),
sizeof( QueueHeader ) + sizeof( QueuePlotData ),
sizeof( QueueHeader ) + sizeof( QueueMessage ), // literal
sizeof( QueueHeader ) + sizeof( QueueMessageColor ), // literal
sizeof( QueueHeader ) + sizeof( QueueMessage ), // literal, callstack
sizeof( QueueHeader ) + sizeof( QueueMessageColor ), // literal, callstack
sizeof( QueueHeader ) + sizeof( QueueGpuNewContext ),
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ),
sizeof( QueueHeader ) + sizeof( QueueGpuZoneBegin ), // callstack
sizeof( QueueHeader ) + sizeof( QueueGpuZoneEnd ),
sizeof( QueueHeader ) + sizeof( QueueGpuTime ),
sizeof( QueueHeader ) + sizeof( QueueMemAlloc ),
sizeof( QueueHeader ) + sizeof( QueueMemFree ),
sizeof( QueueHeader ) + sizeof( QueueMemAlloc ), // callstack
sizeof( QueueHeader ) + sizeof( QueueMemFree ), // callstack
sizeof( QueueHeader ) + sizeof( QueueCallstackFrameSize ),
sizeof( QueueHeader ) + sizeof( QueueCallstackFrame ),
sizeof( QueueHeader ) + sizeof( QueueSymbolInformation ),
sizeof( QueueHeader ) + sizeof( QueueCodeInformation ),
sizeof( QueueHeader ) + sizeof( QueueSysTime ),
sizeof( QueueHeader ) + sizeof( QueueTidToPid ),
sizeof( QueueHeader ) + sizeof( QueuePlotConfig ),
sizeof( QueueHeader ) + sizeof( QueueParamSetup ),
sizeof( QueueHeader ), // param pingback
sizeof( QueueHeader ) + sizeof( QueueCpuTopology ),
// keep all QueueStringTransfer below
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // string data
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // thread name
@@ -344,7 +586,12 @@ static const size_t QueueDataSize[] = {
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // plot name
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // allocated source location payload
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // callstack payload
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // callstack alloc payload
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // frame name
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // frame image data
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // external name
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // external thread name
sizeof( QueueHeader ) + sizeof( QueueStringTransfer ), // symbol code
};
static_assert( QueueItemSize == 32, "Queue item size not 32 bytes" );
@@ -352,6 +599,6 @@ static_assert( sizeof( QueueDataSize ) / sizeof( size_t ) == (uint8_t)QueueType:
static_assert( sizeof( void* ) <= sizeof( uint64_t ), "Pointer size > 8 bytes" );
static_assert( sizeof( void* ) == sizeof( uintptr_t ), "Pointer size != uintptr_t" );
};
}
#endif

View File

@@ -1,4 +1,3 @@
#include <algorithm>
#include <assert.h>
#include <new>
#include <stdio.h>
@@ -9,15 +8,27 @@
#include "TracyAlloc.hpp"
#include "TracySocket.hpp"
#ifdef _MSC_VER
#ifdef _WIN32
# ifndef NOMINMAX
# define NOMINMAX
# endif
# include <winsock2.h>
# include <ws2tcpip.h>
# pragma warning(disable:4244)
# pragma warning(disable:4267)
# ifdef _MSC_VER
# pragma warning(disable:4244)
# pragma warning(disable:4267)
# endif
# define poll WSAPoll
#else
# include <arpa/inet.h>
# include <sys/socket.h>
# include <sys/param.h>
# include <errno.h>
# include <fcntl.h>
# include <netinet/in.h>
# include <netdb.h>
# include <unistd.h>
# include <poll.h>
#endif
#ifndef MSG_NOSIGNAL
@@ -27,13 +38,13 @@
namespace tracy
{
#ifdef _MSC_VER
#ifdef _WIN32
typedef SOCKET socket_t;
#else
typedef int socket_t;
#endif
#ifdef _MSC_VER
#ifdef _WIN32
struct __wsinit
{
__wsinit()
@@ -53,37 +64,89 @@ void InitWinSock()
}
#endif
enum { BufSize = 128 * 1024 };
Socket::Socket()
: m_sock( -1 )
, m_buf( (char*)tracy_malloc( BufSize ) )
: m_buf( (char*)tracy_malloc( BufSize ) )
, m_bufPtr( nullptr )
, m_sock( -1 )
, m_bufLeft( 0 )
, m_ptr( nullptr )
{
#ifdef _MSC_VER
#ifdef _WIN32
InitWinSock();
#endif
}
Socket::Socket( int sock )
: m_sock( sock )
, m_buf( (char*)tracy_malloc( BufSize ) )
: m_buf( (char*)tracy_malloc( BufSize ) )
, m_bufPtr( nullptr )
, m_sock( sock )
, m_bufLeft( 0 )
, m_ptr( nullptr )
{
}
Socket::~Socket()
{
tracy_free( m_buf );
if( m_sock != -1 )
if( m_sock.load( std::memory_order_relaxed ) != -1 )
{
Close();
}
if( m_ptr )
{
freeaddrinfo( m_res );
#ifdef _WIN32
closesocket( m_connSock );
#else
close( m_connSock );
#endif
}
}
bool Socket::Connect( const char* addr, const char* port )
bool Socket::Connect( const char* addr, int port )
{
assert( m_sock == -1 );
assert( !IsValid() );
if( m_ptr )
{
const auto c = connect( m_connSock, m_ptr->ai_addr, m_ptr->ai_addrlen );
assert( c == -1 );
#if defined _WIN32 || defined __CYGWIN__
const auto err = WSAGetLastError();
if( err == WSAEALREADY || err == WSAEINPROGRESS ) return false;
if( err != WSAEISCONN )
{
freeaddrinfo( m_res );
closesocket( m_connSock );
m_ptr = nullptr;
return false;
}
#else
if( errno == EALREADY || errno == EINPROGRESS ) return false;
if( errno != EISCONN )
{
freeaddrinfo( m_res );
close( m_connSock );
m_ptr = nullptr;
return false;
}
#endif
#if defined _WIN32 || defined __CYGWIN__
u_long nonblocking = 0;
ioctlsocket( m_connSock, FIONBIO, &nonblocking );
#else
int flags = fcntl( m_connSock, F_GETFL, 0 );
fcntl( m_connSock, F_SETFL, flags & ~O_NONBLOCK );
#endif
m_sock.store( m_connSock, std::memory_order_relaxed );
freeaddrinfo( m_res );
m_ptr = nullptr;
return true;
}
struct addrinfo hints;
struct addrinfo *res, *ptr;
@@ -92,52 +155,87 @@ bool Socket::Connect( const char* addr, const char* port )
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
if( getaddrinfo( addr, port, &hints, &res ) != 0 ) return false;
char portbuf[32];
sprintf( portbuf, "%i", port );
if( getaddrinfo( addr, portbuf, &hints, &res ) != 0 ) return false;
int sock = 0;
for( ptr = res; ptr; ptr = ptr->ai_next )
{
if( ( sock = socket( ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol ) ) == -1 ) continue;
#if defined __APPLE__
int val = 1;
setsockopt( m_sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
setsockopt( sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
#endif
if( connect( sock, ptr->ai_addr, ptr->ai_addrlen ) == -1 )
{
#ifdef _MSC_VER
closesocket( sock );
#if defined _WIN32 || defined __CYGWIN__
u_long nonblocking = 1;
ioctlsocket( sock, FIONBIO, &nonblocking );
#else
close( sock );
int flags = fcntl( sock, F_GETFL, 0 );
fcntl( sock, F_SETFL, flags | O_NONBLOCK );
#endif
continue;
if( connect( sock, ptr->ai_addr, ptr->ai_addrlen ) == 0 )
{
break;
}
break;
else
{
#if defined _WIN32 || defined __CYGWIN__
const auto err = WSAGetLastError();
if( err != WSAEWOULDBLOCK )
{
closesocket( sock );
continue;
}
#else
if( errno != EINPROGRESS )
{
close( sock );
continue;
}
#endif
}
m_res = res;
m_ptr = ptr;
m_connSock = sock;
return false;
}
freeaddrinfo( res );
if( !ptr ) return false;
m_sock = sock;
#if defined _WIN32 || defined __CYGWIN__
u_long nonblocking = 0;
ioctlsocket( sock, FIONBIO, &nonblocking );
#else
int flags = fcntl( sock, F_GETFL, 0 );
fcntl( sock, F_SETFL, flags & ~O_NONBLOCK );
#endif
m_sock.store( sock, std::memory_order_relaxed );
return true;
}
void Socket::Close()
{
assert( m_sock != -1 );
#ifdef _MSC_VER
closesocket( m_sock );
const auto sock = m_sock.load( std::memory_order_relaxed );
assert( sock != -1 );
#ifdef _WIN32
closesocket( sock );
#else
close( m_sock );
close( sock );
#endif
m_sock = -1;
m_sock.store( -1, std::memory_order_relaxed );
}
int Socket::Send( const void* _buf, int len )
{
const auto sock = m_sock.load( std::memory_order_relaxed );
auto buf = (const char*)_buf;
assert( m_sock != -1 );
assert( sock != -1 );
auto start = buf;
while( len > 0 )
{
auto ret = send( m_sock, buf, len, MSG_NOSIGNAL );
auto ret = send( sock, buf, len, MSG_NOSIGNAL );
if( ret == -1 ) return -1;
len -= ret;
buf += ret;
@@ -145,7 +243,21 @@ int Socket::Send( const void* _buf, int len )
return int( buf - start );
}
int Socket::RecvBuffered( void* buf, int len, const timeval* tv )
int Socket::GetSendBufSize()
{
const auto sock = m_sock.load( std::memory_order_relaxed );
int bufSize;
#if defined _WIN32 || defined __CYGWIN__
int sz = sizeof( bufSize );
getsockopt( sock, SOL_SOCKET, SO_SNDBUF, (char*)&bufSize, &sz );
#else
socklen_t sz = sizeof( bufSize );
getsockopt( sock, SOL_SOCKET, SO_SNDBUF, &bufSize, &sz );
#endif
return bufSize;
}
int Socket::RecvBuffered( void* buf, int len, int timeout )
{
if( len <= m_bufLeft )
{
@@ -163,35 +275,30 @@ int Socket::RecvBuffered( void* buf, int len, const timeval* tv )
return ret;
}
if( len >= BufSize ) return Recv( buf, len, tv );
if( len >= BufSize ) return Recv( buf, len, timeout );
m_bufLeft = Recv( m_buf, BufSize, tv );
m_bufLeft = Recv( m_buf, BufSize, timeout );
if( m_bufLeft <= 0 ) return m_bufLeft;
const auto sz = std::min( len, m_bufLeft );
const auto sz = len < m_bufLeft ? len : m_bufLeft;
memcpy( buf, m_buf, sz );
m_bufPtr = m_buf + sz;
m_bufLeft -= sz;
return sz;
}
int Socket::Recv( void* _buf, int len, const timeval* tv )
int Socket::Recv( void* _buf, int len, int timeout )
{
const auto sock = m_sock.load( std::memory_order_relaxed );
auto buf = (char*)_buf;
fd_set fds;
FD_ZERO( &fds );
FD_SET( static_cast<socket_t>(m_sock), &fds );
struct pollfd fd;
fd.fd = (socket_t)sock;
fd.events = POLLIN;
#ifndef _WIN32
timeval _tv = *tv;
select( m_sock+1, &fds, nullptr, nullptr, &_tv );
#else
select( m_sock+1, &fds, nullptr, nullptr, tv );
#endif
if( FD_ISSET( m_sock, &fds ) )
if( poll( &fd, 1, timeout ) > 0 )
{
return recv( m_sock, buf, len, 0 );
return recv( sock, buf, len, 0 );
}
else
{
@@ -199,42 +306,45 @@ int Socket::Recv( void* _buf, int len, const timeval* tv )
}
}
bool Socket::Read( void* _buf, int len, const timeval* tv, std::function<bool()> exitCb )
bool Socket::Read( void* buf, int len, int timeout )
{
auto buf = (char*)_buf;
auto cbuf = (char*)buf;
while( len > 0 )
{
if( exitCb() ) return false;
const auto sz = RecvBuffered( buf, len, tv );
switch( sz )
{
case 0:
return false;
case -1:
#ifdef _WIN32
{
auto err = WSAGetLastError();
if( err == WSAECONNABORTED || err == WSAECONNRESET ) return false;
}
#endif
break;
default:
len -= sz;
buf += sz;
break;
}
if( !ReadImpl( cbuf, len, timeout ) ) return false;
}
return true;
}
bool Socket::ReadRaw( void* _buf, int len, const timeval* tv )
bool Socket::ReadImpl( char*& buf, int& len, int timeout )
{
const auto sz = RecvBuffered( buf, len, timeout );
switch( sz )
{
case 0:
return false;
case -1:
#ifdef _WIN32
{
auto err = WSAGetLastError();
if( err == WSAECONNABORTED || err == WSAECONNRESET ) return false;
}
#endif
break;
default:
len -= sz;
buf += sz;
break;
}
return true;
}
bool Socket::ReadRaw( void* _buf, int len, int timeout )
{
auto buf = (char*)_buf;
while( len > 0 )
{
const auto sz = Recv( buf, len, tv );
const auto sz = Recv( buf, len, timeout );
if( sz <= 0 ) return false;
len -= sz;
buf += sz;
@@ -244,32 +354,36 @@ bool Socket::ReadRaw( void* _buf, int len, const timeval* tv )
bool Socket::HasData()
{
const auto sock = m_sock.load( std::memory_order_relaxed );
if( m_bufLeft > 0 ) return true;
struct timeval tv;
memset( &tv, 0, sizeof( tv ) );
struct pollfd fd;
fd.fd = (socket_t)sock;
fd.events = POLLIN;
fd_set fds;
FD_ZERO( &fds );
FD_SET( static_cast<socket_t>(m_sock), &fds );
return poll( &fd, 1, 0 ) > 0;
}
return select( m_sock+1, &fds, nullptr, nullptr, &tv ) > 0;
bool Socket::IsValid() const
{
return m_sock.load( std::memory_order_relaxed ) >= 0;
}
ListenSocket::ListenSocket()
: m_sock( -1 )
{
#ifdef _MSC_VER
#ifdef _WIN32
InitWinSock();
#endif
}
ListenSocket::~ListenSocket()
{
if( m_sock != -1 ) Close();
}
bool ListenSocket::Listen( const char* port, int backlog )
bool ListenSocket::Listen( int port, int backlog )
{
assert( m_sock == -1 );
@@ -279,20 +393,40 @@ bool ListenSocket::Listen( const char* port, int backlog )
memset( &hints, 0, sizeof( hints ) );
hints.ai_family = AF_INET6;
hints.ai_socktype = SOCK_STREAM;
#ifndef TRACY_ONLY_LOCALHOST
hints.ai_flags = AI_PASSIVE;
#endif
if( getaddrinfo( nullptr, port, &hints, &res ) != 0 ) return false;
char portbuf[32];
sprintf( portbuf, "%i", port );
if( getaddrinfo( nullptr, portbuf, &hints, &res ) != 0 ) return false;
m_sock = socket( res->ai_family, res->ai_socktype, res->ai_protocol );
#if defined _MSC_VER || defined __CYGWIN__
if (m_sock == -1)
{
// IPV6 protocol may not be available/is disabled. Try to create a socket
// with the IPV4 protocol
hints.ai_family = AF_INET;
if( getaddrinfo( nullptr, portbuf, &hints, &res ) != 0 ) return false;
m_sock = socket( res->ai_family, res->ai_socktype, res->ai_protocol );
if( m_sock == -1 ) return false;
}
#if defined _WIN32 || defined __CYGWIN__
unsigned long val = 0;
setsockopt( m_sock, IPPROTO_IPV6, IPV6_V6ONLY, (const char*)&val, sizeof( val ) );
#elif defined BSD
int val = 0;
setsockopt( m_sock, IPPROTO_IPV6, IPV6_V6ONLY, (const char*)&val, sizeof( val ) );
val = 1;
setsockopt( m_sock, SOL_SOCKET, SO_REUSEADDR, &val, sizeof( val ) );
#else
int val = 1;
setsockopt( m_sock, SOL_SOCKET, SO_REUSEADDR, &val, sizeof( val ) );
#endif
if( bind( m_sock, res->ai_addr, res->ai_addrlen ) == -1 ) return false;
if( listen( m_sock, backlog ) == -1 ) return false;
if( bind( m_sock, res->ai_addr, res->ai_addrlen ) == -1 ) { freeaddrinfo( res ); Close(); return false; }
if( listen( m_sock, backlog ) == -1 ) { freeaddrinfo( res ); Close(); return false; }
freeaddrinfo( res );
return true;
}
@@ -301,32 +435,23 @@ Socket* ListenSocket::Accept()
struct sockaddr_storage remote;
socklen_t sz = sizeof( remote );
struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = 10000;
struct pollfd fd;
fd.fd = (socket_t)m_sock;
fd.events = POLLIN;
fd_set fds;
FD_ZERO( &fds );
FD_SET( static_cast<socket_t>(m_sock), &fds );
select( m_sock+1, &fds, nullptr, nullptr, &tv );
if( FD_ISSET( m_sock, &fds ) )
if( poll( &fd, 1, 10 ) > 0 )
{
int sock = accept( m_sock, (sockaddr*)&remote, &sz);
if( sock == -1 ) return nullptr;
#if defined __APPLE__
int val = 1;
setsockopt( sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
#endif
if( sock == -1 )
{
return nullptr;
}
else
{
auto ptr = (Socket*)tracy_malloc( sizeof( Socket ) );
new(ptr) Socket( sock );
return ptr;
}
auto ptr = (Socket*)tracy_malloc( sizeof( Socket ) );
new(ptr) Socket( sock );
return ptr;
}
else
{
@@ -337,7 +462,7 @@ Socket* ListenSocket::Accept()
void ListenSocket::Close()
{
assert( m_sock != -1 );
#ifdef _MSC_VER
#ifdef _WIN32
closesocket( m_sock );
#else
close( m_sock );
@@ -345,4 +470,200 @@ void ListenSocket::Close()
m_sock = -1;
}
UdpBroadcast::UdpBroadcast()
: m_sock( -1 )
{
#ifdef _WIN32
InitWinSock();
#endif
}
UdpBroadcast::~UdpBroadcast()
{
if( m_sock != -1 ) Close();
}
bool UdpBroadcast::Open( const char* addr, int port )
{
assert( m_sock == -1 );
struct addrinfo hints;
struct addrinfo *res, *ptr;
memset( &hints, 0, sizeof( hints ) );
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_DGRAM;
char portbuf[32];
sprintf( portbuf, "%i", port );
if( getaddrinfo( addr, portbuf, &hints, &res ) != 0 ) return false;
int sock = 0;
for( ptr = res; ptr; ptr = ptr->ai_next )
{
if( ( sock = socket( ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol ) ) == -1 ) continue;
#if defined __APPLE__
int val = 1;
setsockopt( sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
#endif
#if defined _WIN32 || defined __CYGWIN__
unsigned long broadcast = 1;
if( setsockopt( sock, SOL_SOCKET, SO_BROADCAST, (const char*)&broadcast, sizeof( broadcast ) ) == -1 )
#else
int broadcast = 1;
if( setsockopt( sock, SOL_SOCKET, SO_BROADCAST, &broadcast, sizeof( broadcast ) ) == -1 )
#endif
{
#ifdef _WIN32
closesocket( sock );
#else
close( sock );
#endif
continue;
}
break;
}
freeaddrinfo( res );
if( !ptr ) return false;
m_sock = sock;
return true;
}
void UdpBroadcast::Close()
{
assert( m_sock != -1 );
#ifdef _WIN32
closesocket( m_sock );
#else
close( m_sock );
#endif
m_sock = -1;
}
int UdpBroadcast::Send( int port, const void* data, int len )
{
assert( m_sock != -1 );
struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_port = htons( port );
addr.sin_addr.s_addr = INADDR_BROADCAST;
return sendto( m_sock, (const char*)data, len, MSG_NOSIGNAL, (sockaddr*)&addr, sizeof( addr ) );
}
IpAddress::IpAddress()
: m_number( 0 )
{
*m_text = '\0';
}
IpAddress::~IpAddress()
{
}
void IpAddress::Set( const struct sockaddr& addr )
{
#if __MINGW32__
auto ai = (struct sockaddr_in*)&addr;
#else
auto ai = (const struct sockaddr_in*)&addr;
#endif
inet_ntop( AF_INET, &ai->sin_addr, m_text, 17 );
m_number = ai->sin_addr.s_addr;
}
UdpListen::UdpListen()
: m_sock( -1 )
{
#ifdef _WIN32
InitWinSock();
#endif
}
UdpListen::~UdpListen()
{
if( m_sock != -1 ) Close();
}
bool UdpListen::Listen( int port )
{
assert( m_sock == -1 );
int sock;
if( ( sock = socket( AF_INET, SOCK_DGRAM, 0 ) ) == -1 ) return false;
#if defined __APPLE__
int val = 1;
setsockopt( sock, SOL_SOCKET, SO_NOSIGPIPE, &val, sizeof( val ) );
#endif
#if defined _WIN32 || defined __CYGWIN__
unsigned long reuse = 1;
setsockopt( m_sock, SOL_SOCKET, SO_REUSEADDR, (const char*)&reuse, sizeof( reuse ) );
#else
int reuse = 1;
setsockopt( m_sock, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof( reuse ) );
#endif
#if defined _WIN32 || defined __CYGWIN__
unsigned long broadcast = 1;
if( setsockopt( sock, SOL_SOCKET, SO_BROADCAST, (const char*)&broadcast, sizeof( broadcast ) ) == -1 )
#else
int broadcast = 1;
if( setsockopt( sock, SOL_SOCKET, SO_BROADCAST, &broadcast, sizeof( broadcast ) ) == -1 )
#endif
{
#ifdef _WIN32
closesocket( sock );
#else
close( sock );
#endif
return false;
}
struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_port = htons( port );
addr.sin_addr.s_addr = INADDR_ANY;
if( bind( sock, (sockaddr*)&addr, sizeof( addr ) ) == -1 )
{
#ifdef _WIN32
closesocket( sock );
#else
close( sock );
#endif
return false;
}
m_sock = sock;
return true;
}
void UdpListen::Close()
{
assert( m_sock != -1 );
#ifdef _WIN32
closesocket( m_sock );
#else
close( m_sock );
#endif
m_sock = -1;
}
const char* UdpListen::Read( size_t& len, IpAddress& addr )
{
static char buf[2048];
struct pollfd fd;
fd.fd = (socket_t)m_sock;
fd.events = POLLIN;
if( poll( &fd, 1, 10 ) <= 0 ) return nullptr;
sockaddr sa;
socklen_t salen = sizeof( struct sockaddr );
len = (size_t)recvfrom( m_sock, buf, 2048, 0, &sa, &salen );
addr.Set( sa );
return buf;
}
}

View File

@@ -1,34 +1,51 @@
#ifndef __TRACYSOCKET_HPP__
#define __TRACYSOCKET_HPP__
#include <functional>
#include <atomic>
#include <stdint.h>
struct timeval;
#include "TracyForceInline.hpp"
struct addrinfo;
struct sockaddr;
namespace tracy
{
#ifdef _MSC_VER
#ifdef _WIN32
void InitWinSock();
#endif
class Socket
{
enum { BufSize = 128 * 1024 };
public:
Socket();
Socket( int sock );
~Socket();
bool Connect( const char* addr, const char* port );
bool Connect( const char* addr, int port );
void Close();
int Send( const void* buf, int len );
int GetSendBufSize();
bool Read( void* buf, int len, const timeval* tv, std::function<bool()> exitCb );
bool ReadRaw( void* buf, int len, const timeval* tv );
bool Read( void* buf, int len, int timeout );
template<typename ShouldExit>
bool Read( void* buf, int len, int timeout, ShouldExit exitCb )
{
auto cbuf = (char*)buf;
while( len > 0 )
{
if( exitCb() ) return false;
if( !ReadImpl( cbuf, len, timeout ) ) return false;
}
return true;
}
bool ReadRaw( void* buf, int len, int timeout );
bool HasData();
bool IsValid() const;
Socket( const Socket& ) = delete;
Socket( Socket&& ) = delete;
@@ -36,14 +53,19 @@ public:
Socket& operator=( Socket&& ) = delete;
private:
int RecvBuffered( void* buf, int len, const timeval* tv );
int Recv( void* buf, int len, const timeval* tv );
int RecvBuffered( void* buf, int len, int timeout );
int Recv( void* buf, int len, int timeout );
int m_sock;
bool ReadImpl( char*& buf, int& len, int timeout );
char* m_buf;
char* m_bufPtr;
std::atomic<int> m_sock;
int m_bufLeft;
struct addrinfo *m_res;
struct addrinfo *m_ptr;
int m_connSock;
};
class ListenSocket
@@ -52,7 +74,7 @@ public:
ListenSocket();
~ListenSocket();
bool Listen( const char* port, int backlog );
bool Listen( int port, int backlog );
Socket* Accept();
void Close();
@@ -65,6 +87,67 @@ private:
int m_sock;
};
class UdpBroadcast
{
public:
UdpBroadcast();
~UdpBroadcast();
bool Open( const char* addr, int port );
void Close();
int Send( int port, const void* data, int len );
UdpBroadcast( const UdpBroadcast& ) = delete;
UdpBroadcast( UdpBroadcast&& ) = delete;
UdpBroadcast& operator=( const UdpBroadcast& ) = delete;
UdpBroadcast& operator=( UdpBroadcast&& ) = delete;
private:
int m_sock;
};
class IpAddress
{
public:
IpAddress();
~IpAddress();
void Set( const struct sockaddr& addr );
uint32_t GetNumber() const { return m_number; }
const char* GetText() const { return m_text; }
IpAddress( const IpAddress& ) = delete;
IpAddress( IpAddress&& ) = delete;
IpAddress& operator=( const IpAddress& ) = delete;
IpAddress& operator=( IpAddress&& ) = delete;
private:
uint32_t m_number;
char m_text[17];
};
class UdpListen
{
public:
UdpListen();
~UdpListen();
bool Listen( int port );
void Close();
const char* Read( size_t& len, IpAddress& addr );
UdpListen( const UdpListen& ) = delete;
UdpListen( UdpListen&& ) = delete;
UdpListen& operator=( const UdpListen& ) = delete;
UdpListen& operator=( UdpListen&& ) = delete;
private:
int m_sock;
};
}
#endif

View File

@@ -6,7 +6,10 @@
# define NOMINMAX
# endif
#endif
#ifdef _WIN32
#ifdef _MSC_VER
# pragma warning(disable:4996)
#endif
#if defined _WIN32 || defined __CYGWIN__
# include <windows.h>
#else
# include <pthread.h>
@@ -15,18 +18,33 @@
#endif
#ifdef __linux__
# ifndef __ANDROID__
# include <syscall.h>
# endif
# include <fcntl.h>
# ifdef __ANDROID__
# include <sys/types.h>
# else
# include <sys/syscall.h>
# endif
# include <fcntl.h>
#elif defined __FreeBSD__
# include <sys/thr.h>
#elif defined __NetBSD__ || defined __DragonFly__
# include <sys/lwp.h>
#endif
#ifdef __MINGW32__
# define __STDC_FORMAT_MACROS
#endif
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include "TracySystem.hpp"
#ifdef TRACY_COLLECT_THREAD_NAMES
#if defined _WIN32 || defined __CYGWIN__
extern "C" typedef HRESULT (WINAPI *t_SetThreadDescription)( HANDLE, PCWSTR );
extern "C" typedef HRESULT (WINAPI *t_GetThreadDescription)( HANDLE, PWSTR* );
#endif
#ifdef TRACY_ENABLE
# include <atomic>
# include "TracyAlloc.hpp"
#endif
@@ -34,98 +52,129 @@
namespace tracy
{
#ifdef TRACY_COLLECT_THREAD_NAMES
namespace detail
{
TRACY_API uint64_t GetThreadHandleImpl()
{
#if defined _WIN32 || defined __CYGWIN__
static_assert( sizeof( decltype( GetCurrentThreadId() ) ) <= sizeof( uint64_t ), "Thread handle too big to fit in protocol" );
return uint64_t( GetCurrentThreadId() );
#elif defined __APPLE__
uint64_t id;
pthread_threadid_np( pthread_self(), &id );
return id;
#elif defined __ANDROID__
return (uint64_t)gettid();
#elif defined __linux__
return (uint64_t)syscall( SYS_gettid );
#elif defined __FreeBSD__
long id;
thr_self( &id );
return id;
#elif defined __NetBSD__
return _lwp_self();
#elif defined __DragonFly__
return lwp_gettid();
#elif defined __OpenBSD__
return getthrid();
#else
static_assert( sizeof( decltype( pthread_self() ) ) <= sizeof( uint64_t ), "Thread handle too big to fit in protocol" );
return uint64_t( pthread_self() );
#endif
}
}
#ifdef TRACY_ENABLE
struct ThreadNameData
{
uint64_t id;
const char* name;
ThreadNameData* next;
};
extern std::atomic<ThreadNameData*>& s_threadNameData;
std::atomic<ThreadNameData*>& GetThreadNameData();
TRACY_API void InitRPMallocThread();
#endif
void SetThreadName( std::thread& thread, const char* name )
TRACY_API void SetThreadName( const char* name )
{
SetThreadName( thread.native_handle(), name );
}
void SetThreadName( std::thread::native_handle_type handle, const char* name )
{
#ifdef _WIN32
# if defined NTDDI_WIN10_RS2 && NTDDI_VERSION >= NTDDI_WIN10_RS2
wchar_t buf[256];
mbstowcs( buf, name, 256 );
SetThreadDescription( static_cast<HANDLE>( handle ), buf );
# else
const DWORD MS_VC_EXCEPTION=0x406D1388;
# pragma pack( push, 8 )
struct THREADNAME_INFO
#if defined _WIN32 || defined __CYGWIN__
static auto _SetThreadDescription = (t_SetThreadDescription)GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "SetThreadDescription" );
if( _SetThreadDescription )
{
DWORD dwType;
LPCSTR szName;
DWORD dwThreadID;
DWORD dwFlags;
};
wchar_t buf[256];
mbstowcs( buf, name, 256 );
_SetThreadDescription( GetCurrentThread(), buf );
}
else
{
# if defined _MSC_VER
const DWORD MS_VC_EXCEPTION=0x406D1388;
# pragma pack( push, 8 )
struct THREADNAME_INFO
{
DWORD dwType;
LPCSTR szName;
DWORD dwThreadID;
DWORD dwFlags;
};
# pragma pack(pop)
DWORD ThreadId = GetThreadId( static_cast<HANDLE>( handle ) );
THREADNAME_INFO info;
info.dwType = 0x1000;
info.szName = name;
info.dwThreadID = ThreadId;
info.dwFlags = 0;
DWORD ThreadId = GetCurrentThreadId();
THREADNAME_INFO info;
info.dwType = 0x1000;
info.szName = name;
info.dwThreadID = ThreadId;
info.dwFlags = 0;
__try
{
RaiseException( MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(ULONG_PTR), (ULONG_PTR*)&info );
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
}
__try
{
RaiseException( MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(ULONG_PTR), (ULONG_PTR*)&info );
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
}
# endif
#elif defined _GNU_SOURCE && !defined __EMSCRIPTEN__
}
#elif defined _GNU_SOURCE && !defined __EMSCRIPTEN__ && !defined __CYGWIN__
{
const auto sz = strlen( name );
if( sz <= 15 )
{
pthread_setname_np( handle, name );
pthread_setname_np( pthread_self(), name );
}
else
{
char buf[16];
memcpy( buf, name, 15 );
buf[15] = '\0';
pthread_setname_np( handle, buf );
pthread_setname_np( pthread_self(), buf );
}
}
#endif
#ifdef TRACY_COLLECT_THREAD_NAMES
#ifdef TRACY_ENABLE
{
rpmalloc_thread_initialize();
InitRPMallocThread();
const auto sz = strlen( name );
char* buf = (char*)tracy_malloc( sz+1 );
memcpy( buf, name, sz );
buf[sz+1] = '\0';
buf[sz] = '\0';
auto data = (ThreadNameData*)tracy_malloc( sizeof( ThreadNameData ) );
# ifdef _WIN32
data->id = GetThreadId( static_cast<HANDLE>( handle ) );
# elif defined __APPLE__
pthread_threadid_np( handle, &data->id );
# else
data->id = (uint64_t)handle;
# endif
data->id = detail::GetThreadHandleImpl();
data->name = buf;
data->next = s_threadNameData.load( std::memory_order_relaxed );
while( !s_threadNameData.compare_exchange_weak( data->next, data, std::memory_order_release, std::memory_order_relaxed ) ) {}
data->next = GetThreadNameData().load( std::memory_order_relaxed );
while( !GetThreadNameData().compare_exchange_weak( data->next, data, std::memory_order_release, std::memory_order_relaxed ) ) {}
}
#endif
}
const char* GetThreadName( uint64_t id )
TRACY_API const char* GetThreadName( uint64_t id )
{
static char buf[256];
#ifdef TRACY_COLLECT_THREAD_NAMES
auto ptr = s_threadNameData.load( std::memory_order_relaxed );
#ifdef TRACY_ENABLE
auto ptr = GetThreadNameData().load( std::memory_order_relaxed );
while( ptr )
{
if( ptr->id == id )
@@ -135,26 +184,23 @@ const char* GetThreadName( uint64_t id )
ptr = ptr->next;
}
#else
# ifdef _WIN32
# if defined NTDDI_WIN10_RS2 && NTDDI_VERSION >= NTDDI_WIN10_RS2
auto hnd = OpenThread( THREAD_QUERY_LIMITED_INFORMATION, FALSE, (DWORD)id );
if( hnd != 0 )
# if defined _WIN32 || defined __CYGWIN__
static auto _GetThreadDescription = (t_GetThreadDescription)GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "GetThreadDescription" );
if( _GetThreadDescription )
{
PWSTR tmp;
GetThreadDescription( hnd, &tmp );
auto ret = wcstombs( buf, tmp, 256 );
CloseHandle( hnd );
if( ret != 0 )
auto hnd = OpenThread( THREAD_QUERY_LIMITED_INFORMATION, FALSE, (DWORD)id );
if( hnd != 0 )
{
return buf;
PWSTR tmp;
_GetThreadDescription( hnd, &tmp );
auto ret = wcstombs( buf, tmp, 256 );
CloseHandle( hnd );
if( ret != 0 )
{
return buf;
}
}
}
# endif
# elif defined __GLIBC__ && !defined __ANDROID__ && !defined __EMSCRIPTEN__
if( pthread_getname_np( (pthread_t)id, buf, 256 ) == 0 )
{
return buf;
}
# elif defined __linux__
int cs, fd;
char path[32];
@@ -170,8 +216,14 @@ const char* GetThreadName( uint64_t id )
# endif
if ( ( fd = open( path, O_RDONLY ) ) > 0) {
int len = read( fd, buf, 255 );
if ( len > 0 )
if( len > 0 )
{
buf[len] = 0;
if( len > 1 && buf[len-1] == '\n' )
{
buf[len-1] = 0;
}
}
close( fd );
}
# ifndef __ANDROID__

View File

@@ -1,44 +1,29 @@
#ifndef __TRACYSYSTEM_HPP__
#define __TRACYSYSTEM_HPP__
#ifdef TRACY_ENABLE
# if defined __ANDROID__ || defined __CYGWIN__ || defined __APPLE__ || defined _GNU_SOURCE || ( defined _WIN32 && ( !defined NTDDI_WIN10_RS2 || NTDDI_VERSION < NTDDI_WIN10_RS2 ) )
# define TRACY_COLLECT_THREAD_NAMES
# endif
#endif
#ifdef _WIN32
#ifndef _WINDOWS_
extern "C" __declspec(dllimport) unsigned long __stdcall GetCurrentThreadId(void);
#endif
#else
# include <pthread.h>
#endif
#include <stdint.h>
#include <thread>
#include "TracyApi.h"
namespace tracy
{
namespace detail
{
TRACY_API uint64_t GetThreadHandleImpl();
}
#ifdef TRACY_ENABLE
TRACY_API uint64_t GetThreadHandle();
#else
static inline uint64_t GetThreadHandle()
{
#ifdef _WIN32
static_assert( sizeof( decltype( GetCurrentThreadId() ) ) <= sizeof( uint64_t ), "Thread handle too big to fit in protocol" );
return uint64_t( GetCurrentThreadId() );
#elif defined __APPLE__
uint64_t id;
pthread_threadid_np( pthread_self(), &id );
return id;
#else
static_assert( sizeof( decltype( pthread_self() ) ) <= sizeof( uint64_t ), "Thread handle too big to fit in protocol" );
return uint64_t( pthread_self() );
#endif
return detail::GetThreadHandleImpl();
}
#endif
void SetThreadName( std::thread& thread, const char* name );
void SetThreadName( std::thread::native_handle_type handle, const char* name );
const char* GetThreadName( uint64_t id );
TRACY_API void SetThreadName( const char* name );
TRACY_API const char* GetThreadName( uint64_t id );
}

View File

@@ -1,68 +0,0 @@
// Copyright (c) 2015 Jeff Preshing
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#ifndef __TRACY_CPP11OM_BENAPHORE_H__
#define __TRACY_CPP11OM_BENAPHORE_H__
#include <cassert>
#include <thread>
#include <atomic>
#include "tracy_sema.h"
namespace tracy
{
class NonRecursiveBenaphore
{
private:
std::atomic<int> m_contentionCount;
DefaultSemaphoreType m_sema;
public:
NonRecursiveBenaphore() : m_contentionCount(0) {}
void lock()
{
if (m_contentionCount.fetch_add(1, std::memory_order_acquire) > 0)
{
m_sema.wait();
}
}
bool try_lock()
{
if (m_contentionCount.load(std::memory_order_relaxed) != 0)
return false;
int expected = 0;
return m_contentionCount.compare_exchange_strong(expected, 1, std::memory_order_acquire);
}
void unlock()
{
int oldCount = m_contentionCount.fetch_sub(1, std::memory_order_release);
assert(oldCount > 0);
if (oldCount > 1)
{
m_sema.signal();
}
}
};
}
#endif // __CPP11OM_BENAPHORE_H__

File diff suppressed because it is too large Load Diff

View File

@@ -52,19 +52,23 @@ namespace tracy
multiple GB/s per core, typically reaching RAM speed limits on multi-core systems.
The LZ4 compression library provides in-memory compression and decompression functions.
It gives full buffer control to user.
Compression can be done in:
- a single step (described as Simple Functions)
- a single step, reusing a context (described in Advanced Functions)
- unbounded multiple steps (described as Streaming compression)
lz4.h provides block compression functions. It gives full buffer control to user.
Decompressing an lz4-compressed block also requires metadata (such as compressed size).
Each application is free to encode such metadata in whichever way it wants.
lz4.h generates and decodes LZ4-compressed blocks (doc/lz4_Block_format.md).
Decompressing a block requires additional metadata, such as its compressed size.
Each application is free to encode and pass such metadata in whichever way it wants.
An additional format, called LZ4 frame specification (doc/lz4_Frame_format.md),
take care of encoding standard metadata alongside LZ4-compressed blocks.
Frame format is required for interoperability.
It is delivered through a companion API, declared in lz4frame.h.
lz4.h only handle blocks, it can not generate Frames.
Blocks are different from Frames (doc/lz4_Frame_format.md).
Frames bundle both blocks and metadata in a specified manner.
This are required for compressed data to be self-contained and portable.
Frame format is delivered through a companion API, declared in lz4frame.h.
Note that the `lz4` CLI can only manage frames.
*/
/*^***************************************************************
@@ -93,8 +97,8 @@ namespace tracy
/*------ Version ------*/
#define LZ4_VERSION_MAJOR 1 /* for breaking interface changes */
#define LZ4_VERSION_MINOR 8 /* for new (non-breaking) interface capabilities */
#define LZ4_VERSION_RELEASE 3 /* for tweaks, bug-fixes, or development */
#define LZ4_VERSION_MINOR 9 /* for new (non-breaking) interface capabilities */
#define LZ4_VERSION_RELEASE 1 /* for tweaks, bug-fixes, or development */
#define LZ4_VERSION_NUMBER (LZ4_VERSION_MAJOR *100*100 + LZ4_VERSION_MINOR *100 + LZ4_VERSION_RELEASE)
@@ -104,7 +108,7 @@ namespace tracy
#define LZ4_VERSION_STRING LZ4_EXPAND_AND_QUOTE(LZ4_LIB_VERSION)
LZ4LIB_API int LZ4_versionNumber (void); /**< library version number; useful to check dll version */
LZ4LIB_API const char* LZ4_versionString (void); /**< library version string; unseful to check dll version */
LZ4LIB_API const char* LZ4_versionString (void); /**< library version string; useful to check dll version */
/*-************************************
@@ -113,14 +117,15 @@ LZ4LIB_API const char* LZ4_versionString (void); /**< library version string;
/*!
* LZ4_MEMORY_USAGE :
* Memory usage formula : N->2^N Bytes (examples : 10 -> 1KB; 12 -> 4KB ; 16 -> 64KB; 20 -> 1MB; etc.)
* Increasing memory usage improves compression ratio
* Reduced memory usage may improve speed, thanks to cache effect
* Increasing memory usage improves compression ratio.
* Reduced memory usage may improve speed, thanks to better cache locality.
* Default value is 14, for 16KB, which nicely fits into Intel x86 L1 cache
*/
#ifndef LZ4_MEMORY_USAGE
# define LZ4_MEMORY_USAGE 12
# define LZ4_MEMORY_USAGE 14
#endif
/*-************************************
* Simple Functions
**************************************/
@@ -131,21 +136,22 @@ LZ4LIB_API const char* LZ4_versionString (void); /**< library version string;
It also runs faster, so it's a recommended setting.
If the function cannot compress 'src' into a more limited 'dst' budget,
compression stops *immediately*, and the function result is zero.
Note : as a consequence, 'dst' content is not valid.
Note 2 : This function is protected against buffer overflow scenarios (never writes outside 'dst' buffer, nor read outside 'source' buffer).
In which case, 'dst' content is undefined (invalid).
srcSize : max supported value is LZ4_MAX_INPUT_SIZE.
dstCapacity : size of buffer 'dst' (which must be already allocated)
return : the number of bytes written into buffer 'dst' (necessarily <= dstCapacity)
or 0 if compression fails */
@return : the number of bytes written into buffer 'dst' (necessarily <= dstCapacity)
or 0 if compression fails
Note : This function is protected against buffer overflow scenarios (never writes outside 'dst' buffer, nor read outside 'source' buffer).
*/
LZ4LIB_API int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity);
/*! LZ4_decompress_safe() :
compressedSize : is the exact complete size of the compressed block.
dstCapacity : is the size of destination buffer, which must be already allocated.
return : the number of bytes decompressed into destination buffer (necessarily <= dstCapacity)
@return : the number of bytes decompressed into destination buffer (necessarily <= dstCapacity)
If destination buffer is not large enough, decoding will stop and output an error code (negative value).
If the source stream is detected malformed, the function will stop decoding and return a negative result.
This function is protected against malicious data packets.
Note : This function is protected against malicious data packets (never writes outside 'dst' buffer, nor read outside 'source' buffer).
*/
LZ4LIB_API int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);
@@ -156,8 +162,7 @@ LZ4LIB_API int LZ4_decompress_safe (const char* src, char* dst, int compressedSi
#define LZ4_MAX_INPUT_SIZE 0x7E000000 /* 2 113 929 216 bytes */
#define LZ4_COMPRESSBOUND(isize) ((unsigned)(isize) > (unsigned)LZ4_MAX_INPUT_SIZE ? 0 : (isize) + ((isize)/255) + 16)
/*!
LZ4_compressBound() :
/*! LZ4_compressBound() :
Provides the maximum size that LZ4 compression may output in a "worst case" scenario (input data not compressible)
This function is primarily useful for memory allocation purposes (destination buffer size).
Macro LZ4_COMPRESSBOUND() is also provided for compilation-time evaluation (stack memory allocation for example).
@@ -168,8 +173,7 @@ LZ4_compressBound() :
*/
LZ4LIB_API int LZ4_compressBound(int inputSize);
/*!
LZ4_compress_fast() :
/*! LZ4_compress_fast() :
Same as LZ4_compress_default(), but allows selection of "acceleration" factor.
The larger the acceleration value, the faster the algorithm, but also the lesser the compression.
It's a trade-off. It can be fine tuned, with each successive value providing roughly +~3% to speed.
@@ -179,13 +183,12 @@ LZ4_compress_fast() :
LZ4LIB_API int LZ4_compress_fast (const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
/*!
LZ4_compress_fast_extState() :
Same compression function, just using an externally allocated memory space to store compression state.
Use LZ4_sizeofState() to know how much memory must be allocated,
and allocate it on 8-bytes boundaries (using malloc() typically).
Then, provide this buffer as 'void* state' to compression function.
*/
/*! LZ4_compress_fast_extState() :
* Same as LZ4_compress_fast(), using an externally allocated memory space for its state.
* Use LZ4_sizeofState() to know how much memory must be allocated,
* and allocate it on 8-bytes boundaries (using `malloc()` typically).
* Then, provide this buffer as `void* state` to compression function.
*/
LZ4LIB_API int LZ4_sizeofState(void);
LZ4LIB_API int LZ4_compress_fast_extState (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
@@ -205,27 +208,6 @@ LZ4LIB_API int LZ4_compress_fast_extState (void* state, const char* src, char* d
LZ4LIB_API int LZ4_compress_destSize (const char* src, char* dst, int* srcSizePtr, int targetDstSize);
/*! LZ4_decompress_fast() : **unsafe!**
* This function used to be a bit faster than LZ4_decompress_safe(),
* though situation has changed in recent versions,
* and now `LZ4_decompress_safe()` can be as fast and sometimes faster than `LZ4_decompress_fast()`.
* Moreover, LZ4_decompress_fast() is not protected vs malformed input, as it doesn't perform full validation of compressed data.
* As a consequence, this function is no longer recommended, and may be deprecated in future versions.
* It's only remaining specificity is that it can decompress data without knowing its compressed size.
*
* originalSize : is the uncompressed size to regenerate.
* `dst` must be already allocated, its size must be >= 'originalSize' bytes.
* @return : number of bytes read from source buffer (== compressed size).
* If the source stream is detected malformed, the function stops decoding and returns a negative result.
* note : This function requires uncompressed originalSize to be known in advance.
* The function never writes past the output buffer.
* However, since it doesn't know its 'src' size, it may read past the intended input.
* Also, because match offsets are not validated during decoding,
* reads from 'src' may underflow.
* Use this function in trusted environment **only**.
*/
LZ4LIB_API int LZ4_decompress_fast (const char* src, char* dst, int originalSize);
/*! LZ4_decompress_safe_partial() :
* Decompress an LZ4 compressed block, of size 'srcSize' at position 'src',
* into destination buffer 'dst' of size 'dstCapacity'.
@@ -258,30 +240,49 @@ LZ4LIB_API int LZ4_decompress_safe_partial (const char* src, char* dst, int srcS
***********************************************/
typedef union LZ4_stream_u LZ4_stream_t; /* incomplete type (defined later) */
/*! LZ4_createStream() and LZ4_freeStream() :
* LZ4_createStream() will allocate and initialize an `LZ4_stream_t` structure.
* LZ4_freeStream() releases its memory.
*/
LZ4LIB_API LZ4_stream_t* LZ4_createStream(void);
LZ4LIB_API int LZ4_freeStream (LZ4_stream_t* streamPtr);
/*! LZ4_resetStream() :
* An LZ4_stream_t structure can be allocated once and re-used multiple times.
* Use this function to start compressing a new stream.
/*! LZ4_resetStream_fast() : v1.9.0+
* Use this to prepare an LZ4_stream_t for a new chain of dependent blocks
* (e.g., LZ4_compress_fast_continue()).
*
* An LZ4_stream_t must be initialized once before usage.
* This is automatically done when created by LZ4_createStream().
* However, should the LZ4_stream_t be simply declared on stack (for example),
* it's necessary to initialize it first, using LZ4_initStream().
*
* After init, start any new stream with LZ4_resetStream_fast().
* A same LZ4_stream_t can be re-used multiple times consecutively
* and compress multiple streams,
* provided that it starts each new stream with LZ4_resetStream_fast().
*
* LZ4_resetStream_fast() is much faster than LZ4_initStream(),
* but is not compatible with memory regions containing garbage data.
*
* Note: it's only useful to call LZ4_resetStream_fast()
* in the context of streaming compression.
* The *extState* functions perform their own resets.
* Invoking LZ4_resetStream_fast() before is redundant, and even counterproductive.
*/
LZ4LIB_API void LZ4_resetStream (LZ4_stream_t* streamPtr);
LZ4LIB_API void LZ4_resetStream_fast (LZ4_stream_t* streamPtr);
/*! LZ4_loadDict() :
* Use this function to load a static dictionary into LZ4_stream_t.
* Any previous data will be forgotten, only 'dictionary' will remain in memory.
* Use this function to reference a static dictionary into LZ4_stream_t.
* The dictionary must remain available during compression.
* LZ4_loadDict() triggers a reset, so any previous data will be forgotten.
* The same dictionary will have to be loaded on decompression side for successful decoding.
* Dictionary are useful for better compression of small data (KB range).
* While LZ4 accept any input as dictionary,
* results are generally better when using Zstandard's Dictionary Builder.
* Loading a size of 0 is allowed, and is the same as reset.
* @return : dictionary size, in bytes (necessarily <= 64 KB)
* @return : loaded dictionary size, in bytes (necessarily <= 64 KB)
*/
LZ4LIB_API int LZ4_loadDict (LZ4_stream_t* streamPtr, const char* dictionary, int dictSize);
/*! LZ4_compress_fast_continue() :
* Compress 'src' content using data from previously compressed blocks, for better compression ratio.
* 'dst' buffer must be already allocated.
* 'dst' buffer must be already allocated.
* If dstCapacity >= LZ4_compressBound(srcSize), compression is guaranteed to succeed, and runs faster.
*
* @return : size of compressed block
@@ -289,10 +290,10 @@ LZ4LIB_API int LZ4_loadDict (LZ4_stream_t* streamPtr, const char* dictionary, in
*
* Note 1 : Each invocation to LZ4_compress_fast_continue() generates a new block.
* Each block has precise boundaries.
* Each block must be decompressed separately, calling LZ4_decompress_*() with relevant metadata.
* It's not possible to append blocks together and expect a single invocation of LZ4_decompress_*() to decompress them together.
* Each block must be decompressed separately, calling LZ4_decompress_*() with associated metadata.
*
* Note 2 : The previous 64KB of source data is __assumed__ to remain present, unmodified, at same address in memory!
* Note 2 : The previous 64KB of source data is __assumed__ to remain present, unmodified, at same address in memory !
*
* Note 3 : When input is structured as a double-buffer, each buffer can have any size, including < 64 KB.
* Make sure that buffers are separated, by at least one byte.
@@ -300,7 +301,7 @@ LZ4LIB_API int LZ4_loadDict (LZ4_stream_t* streamPtr, const char* dictionary, in
*
* Note 4 : If input buffer is a ring-buffer, it can have any size, including < 64 KB.
*
* Note 5 : After an error, the stream status is invalid, it can only be reset or freed.
* Note 5 : After an error, the stream status is undefined (invalid), it can only be reset or freed.
*/
LZ4LIB_API int LZ4_compress_fast_continue (LZ4_stream_t* streamPtr, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
@@ -336,7 +337,7 @@ LZ4LIB_API int LZ4_freeStreamDecode (LZ4_streamDecode_t* LZ4_str
*/
LZ4LIB_API int LZ4_setStreamDecode (LZ4_streamDecode_t* LZ4_streamDecode, const char* dictionary, int dictSize);
/*! LZ4_decoderRingBufferSize() : v1.8.2
/*! LZ4_decoderRingBufferSize() : v1.8.2+
* Note : in a ring buffer scenario (optional),
* blocks are presumed decompressed next to each other
* up to the moment there is not enough remaining space for next block (remainingSize < maxBlockSize),
@@ -348,7 +349,7 @@ LZ4LIB_API int LZ4_setStreamDecode (LZ4_streamDecode_t* LZ4_streamDecode, const
* or 0 if there is an error (invalid maxBlockSize).
*/
LZ4LIB_API int LZ4_decoderRingBufferSize(int maxBlockSize);
#define LZ4_DECODER_RING_BUFFER_SIZE(mbs) (65536 + 14 + (mbs)) /* for static allocation; mbs presumed valid */
#define LZ4_DECODER_RING_BUFFER_SIZE(maxBlockSize) (65536 + 14 + (maxBlockSize)) /* for static allocation; maxBlockSize presumed valid */
/*! LZ4_decompress_*_continue() :
* These decoding functions allow decompression of consecutive blocks in "streaming" mode.
@@ -376,83 +377,67 @@ LZ4LIB_API int LZ4_decoderRingBufferSize(int maxBlockSize);
* then indicate where this data is saved using LZ4_setStreamDecode(), before decompressing next block.
*/
LZ4LIB_API int LZ4_decompress_safe_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* src, char* dst, int srcSize, int dstCapacity);
LZ4LIB_API int LZ4_decompress_fast_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* src, char* dst, int originalSize);
/*! LZ4_decompress_*_usingDict() :
* These decoding functions work the same as
* a combination of LZ4_setStreamDecode() followed by LZ4_decompress_*_continue()
* They are stand-alone, and don't need an LZ4_streamDecode_t structure.
* Dictionary is presumed stable : it must remain accessible and unmodified during next decompression.
* Dictionary is presumed stable : it must remain accessible and unmodified during decompression.
* Performance tip : Decompression speed can be substantially increased
* when dst == dictStart + dictSize.
*/
LZ4LIB_API int LZ4_decompress_safe_usingDict (const char* src, char* dst, int srcSize, int dstCapcity, const char* dictStart, int dictSize);
LZ4LIB_API int LZ4_decompress_fast_usingDict (const char* src, char* dst, int originalSize, const char* dictStart, int dictSize);
/*^**********************************************
/*^*************************************
* !!!!!! STATIC LINKING ONLY !!!!!!
***********************************************/
***************************************/
/*-************************************
* Unstable declarations
**************************************
* Declarations in this section should be considered unstable.
* Use at your own peril, etc., etc.
* They may be removed in the future.
* Their signatures may change.
**************************************/
/*-****************************************************************************
* Experimental section
*
* Symbols declared in this section must be considered unstable. Their
* signatures or semantics may change, or they may be removed altogether in the
* future. They are therefore only safe to depend on when the caller is
* statically linked against the library.
*
* To protect against unsafe usage, not only are the declarations guarded,
* the definitions are hidden by default
* when building LZ4 as a shared/dynamic library.
*
* In order to access these declarations,
* define LZ4_STATIC_LINKING_ONLY in your application
* before including LZ4's headers.
*
* In order to make their implementations accessible dynamically, you must
* define LZ4_PUBLISH_STATIC_FUNCTIONS when building the LZ4 library.
******************************************************************************/
#ifdef LZ4_PUBLISH_STATIC_FUNCTIONS
#define LZ4LIB_STATIC_API LZ4LIB_API
#else
#define LZ4LIB_STATIC_API
#endif
#ifdef LZ4_STATIC_LINKING_ONLY
/*! LZ4_resetStream_fast() :
* Use this, like LZ4_resetStream(), to prepare a context for a new chain of
* calls to a streaming API (e.g., LZ4_compress_fast_continue()).
*
* Note:
* Using this in advance of a non- streaming-compression function is redundant,
* and potentially bad for performance, since they all perform their own custom
* reset internally.
*
* Differences from LZ4_resetStream():
* When an LZ4_stream_t is known to be in a internally coherent state,
* it can often be prepared for a new compression with almost no work, only
* sometimes falling back to the full, expensive reset that is always required
* when the stream is in an indeterminate state (i.e., the reset performed by
* LZ4_resetStream()).
*
* LZ4_streams are guaranteed to be in a valid state when:
* - returned from LZ4_createStream()
* - reset by LZ4_resetStream()
* - memset(stream, 0, sizeof(LZ4_stream_t)), though this is discouraged
* - the stream was in a valid state and was reset by LZ4_resetStream_fast()
* - the stream was in a valid state and was then used in any compression call
* that returned success
* - the stream was in an indeterminate state and was used in a compression
* call that fully reset the state (e.g., LZ4_compress_fast_extState()) and
* that returned success
*
* When a stream isn't known to be in a valid state, it is not safe to pass to
* any fastReset or streaming function. It must first be cleansed by the full
* LZ4_resetStream().
*/
LZ4LIB_API void LZ4_resetStream_fast (LZ4_stream_t* streamPtr);
/*! LZ4_compress_fast_extState_fastReset() :
* A variant of LZ4_compress_fast_extState().
*
* Using this variant avoids an expensive initialization step. It is only safe
* to call if the state buffer is known to be correctly initialized already
* (see above comment on LZ4_resetStream_fast() for a definition of "correctly
* initialized"). From a high level, the difference is that this function
* initializes the provided state with a call to something like
* LZ4_resetStream_fast() while LZ4_compress_fast_extState() starts with a
* call to LZ4_resetStream().
* Using this variant avoids an expensive initialization step.
* It is only safe to call if the state buffer is known to be correctly initialized already
* (see above comment on LZ4_resetStream_fast() for a definition of "correctly initialized").
* From a high level, the difference is that
* this function initializes the provided state with a call to something like LZ4_resetStream_fast()
* while LZ4_compress_fast_extState() starts with a call to LZ4_resetStream().
*/
LZ4LIB_API int LZ4_compress_fast_extState_fastReset (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
LZ4LIB_STATIC_API int LZ4_compress_fast_extState_fastReset (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
/*! LZ4_attach_dictionary() :
* This is an experimental API that allows for the efficient use of a
* static dictionary many times.
* This is an experimental API that allows
* efficient use of a static dictionary many times.
*
* Rather than re-loading the dictionary buffer into a working context before
* each compression, or copying a pre-loaded dictionary's LZ4_stream_t into a
@@ -463,8 +448,8 @@ LZ4LIB_API int LZ4_compress_fast_extState_fastReset (void* state, const char* sr
* Currently, only streams which have been prepared by LZ4_loadDict() should
* be expected to work.
*
* Alternatively, the provided dictionary stream pointer may be NULL, in which
* case any existing dictionary stream is unset.
* Alternatively, the provided dictionaryStream may be NULL,
* in which case any existing dictionary stream is unset.
*
* If a dictionary is provided, it replaces any pre-existing stream history.
* The dictionary contents are the only history that can be referenced and
@@ -476,17 +461,18 @@ LZ4LIB_API int LZ4_compress_fast_extState_fastReset (void* state, const char* sr
* stream (and source buffer) must remain in-place / accessible / unchanged
* through the completion of the first compression call on the stream.
*/
LZ4LIB_API void LZ4_attach_dictionary(LZ4_stream_t *working_stream, const LZ4_stream_t *dictionary_stream);
LZ4LIB_STATIC_API void LZ4_attach_dictionary(LZ4_stream_t* workingStream, const LZ4_stream_t* dictionaryStream);
#endif
/*-************************************
* Private definitions
**************************************
* Do not use these definitions.
* They are exposed to allow static allocation of `LZ4_stream_t` and `LZ4_streamDecode_t`.
* Using these definitions will expose code to API and/or ABI break in future versions of the library.
**************************************/
/*-************************************************************
* PRIVATE DEFINITIONS
**************************************************************
* Do not use these definitions directly.
* They are only exposed to allow static allocation of `LZ4_stream_t` and `LZ4_streamDecode_t`.
* Accessing members will expose code to API and/or ABI break in future versions of the library.
**************************************************************/
#define LZ4_HASHLOG (LZ4_MEMORY_USAGE-2)
#define LZ4_HASHTABLESIZE (1 << LZ4_MEMORY_USAGE)
#define LZ4_HASH_SIZE_U32 (1 << LZ4_HASHLOG) /* required as macro for static allocation */
@@ -498,7 +484,7 @@ typedef struct LZ4_stream_t_internal LZ4_stream_t_internal;
struct LZ4_stream_t_internal {
uint32_t hashTable[LZ4_HASH_SIZE_U32];
uint32_t currentOffset;
uint16_t initCheck;
uint16_t dirty;
uint16_t tableType;
const uint8_t* dictionary;
const LZ4_stream_t_internal* dictCtx;
@@ -518,7 +504,7 @@ typedef struct LZ4_stream_t_internal LZ4_stream_t_internal;
struct LZ4_stream_t_internal {
unsigned int hashTable[LZ4_HASH_SIZE_U32];
unsigned int currentOffset;
unsigned short initCheck;
unsigned short dirty;
unsigned short tableType;
const unsigned char* dictionary;
const LZ4_stream_t_internal* dictCtx;
@@ -527,38 +513,54 @@ struct LZ4_stream_t_internal {
typedef struct {
const unsigned char* externalDict;
size_t extDictSize;
const unsigned char* prefixEnd;
size_t extDictSize;
size_t prefixSize;
} LZ4_streamDecode_t_internal;
#endif
/*!
* LZ4_stream_t :
* information structure to track an LZ4 stream.
* init this structure before first use.
* note : only use in association with static linking !
* this definition is not API/ABI safe,
* it may change in a future version !
/*! LZ4_stream_t :
* information structure to track an LZ4 stream.
* LZ4_stream_t can also be created using LZ4_createStream(), which is recommended.
* The structure definition can be convenient for static allocation
* (on stack, or as part of larger structure).
* Init this structure with LZ4_initStream() before first use.
* note : only use this definition in association with static linking !
* this definition is not API/ABI safe, and may change in a future version.
*/
#define LZ4_STREAMSIZE_U64 ((1 << (LZ4_MEMORY_USAGE-3)) + 4)
#define LZ4_STREAMSIZE_U64 ((1 << (LZ4_MEMORY_USAGE-3)) + 4 + ((sizeof(void*)==16) ? 4 : 0) /*AS-400*/ )
#define LZ4_STREAMSIZE (LZ4_STREAMSIZE_U64 * sizeof(unsigned long long))
union LZ4_stream_u {
unsigned long long table[LZ4_STREAMSIZE_U64];
LZ4_stream_t_internal internal_donotuse;
} ; /* previously typedef'd to LZ4_stream_t */
/*!
* LZ4_streamDecode_t :
* information structure to track an LZ4 stream during decompression.
* init this structure using LZ4_setStreamDecode (or memset()) before first use
* note : only use in association with static linking !
* this definition is not API/ABI safe,
* and may change in a future version !
/*! LZ4_initStream() : v1.9.0+
* An LZ4_stream_t structure must be initialized at least once.
* This is automatically done when invoking LZ4_createStream(),
* but it's not when the structure is simply declared on stack (for example).
*
* Use LZ4_initStream() to properly initialize a newly declared LZ4_stream_t.
* It can also initialize any arbitrary buffer of sufficient size,
* and will @return a pointer of proper type upon initialization.
*
* Note : initialization fails if size and alignment conditions are not respected.
* In which case, the function will @return NULL.
* Note2: An LZ4_stream_t structure guarantees correct alignment and size.
* Note3: Before v1.9.0, use LZ4_resetStream() instead
*/
#define LZ4_STREAMDECODESIZE_U64 4
LZ4LIB_API LZ4_stream_t* LZ4_initStream (void* buffer, size_t size);
/*! LZ4_streamDecode_t :
* information structure to track an LZ4 stream during decompression.
* init this structure using LZ4_setStreamDecode() before first use.
* note : only use in association with static linking !
* this definition is not API/ABI safe,
* and may change in a future version !
*/
#define LZ4_STREAMDECODESIZE_U64 (4 + ((sizeof(void*)==16) ? 2 : 0) /*AS-400*/ )
#define LZ4_STREAMDECODESIZE (LZ4_STREAMDECODESIZE_U64 * sizeof(unsigned long long))
union LZ4_streamDecode_u {
unsigned long long table[LZ4_STREAMDECODESIZE_U64];
@@ -571,11 +573,16 @@ union LZ4_streamDecode_u {
**************************************/
/*! Deprecation warnings
Should deprecation warnings be a problem,
it is generally possible to disable them,
typically with -Wno-deprecated-declarations for gcc
or _CRT_SECURE_NO_WARNINGS in Visual.
Otherwise, it's also possible to define LZ4_DISABLE_DEPRECATE_WARNINGS */
*
* Deprecated functions make the compiler generate a warning when invoked.
* This is meant to invite users to update their source code.
* Should deprecation warnings be a problem, it is generally possible to disable them,
* typically with -Wno-deprecated-declarations for gcc
* or _CRT_SECURE_NO_WARNINGS in Visual.
*
* Another method is to define LZ4_DISABLE_DEPRECATE_WARNINGS
* before including the header file.
*/
#ifdef LZ4_DISABLE_DEPRECATE_WARNINGS
# define LZ4_DEPRECATED(message) /* disable deprecation warnings */
#else
@@ -595,8 +602,8 @@ union LZ4_streamDecode_u {
#endif /* LZ4_DISABLE_DEPRECATE_WARNINGS */
/* Obsolete compression functions */
LZ4_DEPRECATED("use LZ4_compress_default() instead") LZ4LIB_API int LZ4_compress (const char* source, char* dest, int sourceSize);
LZ4_DEPRECATED("use LZ4_compress_default() instead") LZ4LIB_API int LZ4_compress_limitedOutput (const char* source, char* dest, int sourceSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_default() instead") LZ4LIB_API int LZ4_compress (const char* source, char* dest, int sourceSize);
LZ4_DEPRECATED("use LZ4_compress_default() instead") LZ4LIB_API int LZ4_compress_limitedOutput (const char* source, char* dest, int sourceSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_fast_extState() instead") LZ4LIB_API int LZ4_compress_withState (void* state, const char* source, char* dest, int inputSize);
LZ4_DEPRECATED("use LZ4_compress_fast_extState() instead") LZ4LIB_API int LZ4_compress_limitedOutput_withState (void* state, const char* source, char* dest, int inputSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_fast_continue() instead") LZ4LIB_API int LZ4_compress_continue (LZ4_stream_t* LZ4_streamPtr, const char* source, char* dest, int inputSize);
@@ -617,13 +624,56 @@ LZ4_DEPRECATED("use LZ4_decompress_safe() instead") LZ4LIB_API int LZ4_uncompres
*/
LZ4_DEPRECATED("Use LZ4_createStream() instead") LZ4LIB_API void* LZ4_create (char* inputBuffer);
LZ4_DEPRECATED("Use LZ4_createStream() instead") LZ4LIB_API int LZ4_sizeofStreamState(void);
LZ4_DEPRECATED("Use LZ4_resetStream() instead") LZ4LIB_API int LZ4_resetStreamState(void* state, char* inputBuffer);
LZ4_DEPRECATED("Use LZ4_saveDict() instead") LZ4LIB_API char* LZ4_slideInputBuffer (void* state);
LZ4_DEPRECATED("Use LZ4_resetStream() instead") LZ4LIB_API int LZ4_resetStreamState(void* state, char* inputBuffer);
LZ4_DEPRECATED("Use LZ4_saveDict() instead") LZ4LIB_API char* LZ4_slideInputBuffer (void* state);
/* Obsolete streaming decoding functions */
LZ4_DEPRECATED("use LZ4_decompress_safe_usingDict() instead") LZ4LIB_API int LZ4_decompress_safe_withPrefix64k (const char* src, char* dst, int compressedSize, int maxDstSize);
LZ4_DEPRECATED("use LZ4_decompress_fast_usingDict() instead") LZ4LIB_API int LZ4_decompress_fast_withPrefix64k (const char* src, char* dst, int originalSize);
/*! LZ4_decompress_fast() : **unsafe!**
* These functions used to be faster than LZ4_decompress_safe(),
* but it has changed, and they are now slower than LZ4_decompress_safe().
* This is because LZ4_decompress_fast() doesn't know the input size,
* and therefore must progress more cautiously in the input buffer to not read beyond the end of block.
* On top of that `LZ4_decompress_fast()` is not protected vs malformed or malicious inputs, making it a security liability.
* As a consequence, LZ4_decompress_fast() is strongly discouraged, and deprecated.
*
* The last remaining LZ4_decompress_fast() specificity is that
* it can decompress a block without knowing its compressed size.
* Such functionality could be achieved in a more secure manner,
* by also providing the maximum size of input buffer,
* but it would require new prototypes, and adaptation of the implementation to this new use case.
*
* Parameters:
* originalSize : is the uncompressed size to regenerate.
* `dst` must be already allocated, its size must be >= 'originalSize' bytes.
* @return : number of bytes read from source buffer (== compressed size).
* The function expects to finish at block's end exactly.
* If the source stream is detected malformed, the function stops decoding and returns a negative result.
* note : LZ4_decompress_fast*() requires originalSize. Thanks to this information, it never writes past the output buffer.
* However, since it doesn't know its 'src' size, it may read an unknown amount of input, past input buffer bounds.
* Also, since match offsets are not validated, match reads from 'src' may underflow too.
* These issues never happen if input (compressed) data is correct.
* But they may happen if input data is invalid (error or intentional tampering).
* As a consequence, use these functions in trusted environments with trusted data **only**.
*/
LZ4_DEPRECATED("This function is deprecated and unsafe. Consider using LZ4_decompress_safe() instead")
LZ4LIB_API int LZ4_decompress_fast (const char* src, char* dst, int originalSize);
LZ4_DEPRECATED("This function is deprecated and unsafe. Consider using LZ4_decompress_safe_continue() instead")
LZ4LIB_API int LZ4_decompress_fast_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* src, char* dst, int originalSize);
LZ4_DEPRECATED("This function is deprecated and unsafe. Consider using LZ4_decompress_safe_usingDict() instead")
LZ4LIB_API int LZ4_decompress_fast_usingDict (const char* src, char* dst, int originalSize, const char* dictStart, int dictSize);
/*! LZ4_resetStream() :
* An LZ4_stream_t structure must be initialized at least once.
* This is done with LZ4_initStream(), or LZ4_resetStream().
* Consider switching to LZ4_initStream(),
* invoking LZ4_resetStream() will trigger deprecation warnings in the future.
*/
LZ4LIB_API void LZ4_resetStream (LZ4_stream_t* streamPtr);
}
#endif /* LZ4_H_2983827168210 */

View File

@@ -61,8 +61,18 @@
# pragma clang diagnostic ignored "-Wunused-function"
#endif
namespace tracy
{
/*=== Enums ===*/
typedef enum { noDictCtx, usingDictCtxHc } dictCtx_directive;
}
#define LZ4_COMMONDEFS_ONLY
#ifndef LZ4_SRC_INCLUDED
#include "tracy_lz4.cpp" /* LZ4_count, constants, mem */
#endif
namespace tracy
{
@@ -78,12 +88,11 @@ namespace tracy
#define HASH_FUNCTION(i) (((i) * 2654435761U) >> ((MINMATCH*8)-LZ4HC_HASH_LOG))
#define DELTANEXTMAXD(p) chainTable[(p) & LZ4HC_MAXD_MASK] /* flexible, LZ4HC_MAXD dependent */
#define DELTANEXTU16(table, pos) table[(U16)(pos)] /* faster */
/* Make fields passed to, and updated by LZ4HC_encodeSequence explicit */
#define UPDATABLE(ip, op, anchor) &ip, &op, &anchor
static U32 LZ4HC_hashPtr(const void* ptr) { return HASH_FUNCTION(LZ4_read32(ptr)); }
/*=== Enums ===*/
typedef enum { noDictCtx, usingDictCtx } dictCtx_directive;
/**************************************
* HC Compression
@@ -94,9 +103,9 @@ static void LZ4HC_clearTables (LZ4HC_CCtx_internal* hc4)
MEM_INIT(hc4->chainTable, 0xFF, sizeof(hc4->chainTable));
}
static void LZ4HC_init (LZ4HC_CCtx_internal* hc4, const BYTE* start)
static void LZ4HC_init_internal (LZ4HC_CCtx_internal* hc4, const BYTE* start)
{
uptrval startingOffset = hc4->end - hc4->base;
uptrval startingOffset = (uptrval)(hc4->end - hc4->base);
if (startingOffset > 1 GB) {
LZ4HC_clearTables(hc4);
startingOffset = 0;
@@ -123,7 +132,7 @@ LZ4_FORCE_INLINE void LZ4HC_Insert (LZ4HC_CCtx_internal* hc4, const BYTE* ip)
while (idx < target) {
U32 const h = LZ4HC_hashPtr(base+idx);
size_t delta = idx - hashTable[h];
if (delta>MAX_DISTANCE) delta = MAX_DISTANCE;
if (delta>LZ4_DISTANCE_MAX) delta = LZ4_DISTANCE_MAX;
DELTANEXTU16(chainTable, idx) = (U16)delta;
hashTable[h] = idx;
idx++;
@@ -226,14 +235,13 @@ LZ4HC_InsertAndGetWiderMatch (
const U32 dictLimit = hc4->dictLimit;
const BYTE* const lowPrefixPtr = base + dictLimit;
const U32 ipIndex = (U32)(ip - base);
const U32 lowestMatchIndex = (hc4->lowLimit + 64 KB > ipIndex) ? hc4->lowLimit : ipIndex - MAX_DISTANCE;
const U32 lowestMatchIndex = (hc4->lowLimit + 64 KB > ipIndex) ? hc4->lowLimit : ipIndex - LZ4_DISTANCE_MAX;
const BYTE* const dictBase = hc4->dictBase;
int const lookBackLength = (int)(ip-iLowLimit);
int nbAttempts = maxNbAttempts;
int matchChainPos = 0;
U32 matchChainPos = 0;
U32 const pattern = LZ4_read32(ip);
U32 matchIndex;
U32 dictMatchIndex;
repeat_state_e repeat = rep_untested;
size_t srcPatternLength = 0;
@@ -258,7 +266,7 @@ LZ4HC_InsertAndGetWiderMatch (
if (LZ4_read16(iLowLimit + longest - 1) == LZ4_read16(matchPtr - lookBackLength + longest - 1)) {
if (LZ4_read32(matchPtr) == pattern) {
int const back = lookBackLength ? LZ4HC_countBack(ip, matchPtr, iLowLimit, lowPrefixPtr) : 0;
matchLength = MINMATCH + LZ4_count(ip+MINMATCH, matchPtr+MINMATCH, iHighLimit);
matchLength = MINMATCH + (int)LZ4_count(ip+MINMATCH, matchPtr+MINMATCH, iHighLimit);
matchLength -= back;
if (matchLength > longest) {
longest = matchLength;
@@ -272,7 +280,7 @@ LZ4HC_InsertAndGetWiderMatch (
int back = 0;
const BYTE* vLimit = ip + (dictLimit - matchIndex);
if (vLimit > iHighLimit) vLimit = iHighLimit;
matchLength = LZ4_count(ip+MINMATCH, matchPtr+MINMATCH, vLimit) + MINMATCH;
matchLength = (int)LZ4_count(ip+MINMATCH, matchPtr+MINMATCH, vLimit) + MINMATCH;
if ((ip+matchLength == vLimit) && (vLimit < iHighLimit))
matchLength += LZ4_count(ip+matchLength, lowPrefixPtr, iHighLimit);
back = lookBackLength ? LZ4HC_countBack(ip, matchPtr, iLowLimit, dictStart) : 0;
@@ -285,14 +293,14 @@ LZ4HC_InsertAndGetWiderMatch (
if (chainSwap && matchLength==longest) { /* better match => select a better chain */
assert(lookBackLength==0); /* search forward only */
if (matchIndex + longest <= ipIndex) {
if (matchIndex + (U32)longest <= ipIndex) {
U32 distanceToNextMatch = 1;
int pos;
for (pos = 0; pos <= longest - MINMATCH; pos++) {
U32 const candidateDist = DELTANEXTU16(chainTable, matchIndex + pos);
U32 const candidateDist = DELTANEXTU16(chainTable, matchIndex + (U32)pos);
if (candidateDist > distanceToNextMatch) {
distanceToNextMatch = candidateDist;
matchChainPos = pos;
matchChainPos = (U32)pos;
} }
if (distanceToNextMatch > 1) {
if (distanceToNextMatch > matchIndex) break; /* avoid overflow */
@@ -317,7 +325,7 @@ LZ4HC_InsertAndGetWiderMatch (
const BYTE* const matchPtr = base + matchCandidateIdx;
if (LZ4_read32(matchPtr) == pattern) { /* good candidate */
size_t const forwardPatternLength = LZ4HC_countPattern(matchPtr+sizeof(pattern), iHighLimit, pattern) + sizeof(pattern);
const BYTE* const lowestMatchPtr = (lowPrefixPtr + MAX_DISTANCE >= ip) ? lowPrefixPtr : ip - MAX_DISTANCE;
const BYTE* const lowestMatchPtr = (lowPrefixPtr + LZ4_DISTANCE_MAX >= ip) ? lowPrefixPtr : ip - LZ4_DISTANCE_MAX;
size_t const backLength = LZ4HC_reverseCountPattern(matchPtr, lowestMatchPtr, pattern);
size_t const currentSegmentLength = backLength + forwardPatternLength;
@@ -330,7 +338,7 @@ LZ4HC_InsertAndGetWiderMatch (
size_t const maxML = MIN(currentSegmentLength, srcPatternLength);
if ((size_t)longest < maxML) {
assert(base + matchIndex < ip);
if (ip - (base+matchIndex) > MAX_DISTANCE) break;
if (ip - (base+matchIndex) > LZ4_DISTANCE_MAX) break;
assert(maxML < 2 GB);
longest = (int)maxML;
*matchpos = base + matchIndex; /* virtual pos, relative to ip, to retrieve offset */
@@ -345,16 +353,18 @@ LZ4HC_InsertAndGetWiderMatch (
} } /* PA optimization */
/* follow current chain */
matchIndex -= DELTANEXTU16(chainTable, matchIndex+matchChainPos);
matchIndex -= DELTANEXTU16(chainTable, matchIndex + matchChainPos);
} /* while ((matchIndex>=lowestMatchIndex) && (nbAttempts)) */
if (dict == usingDictCtx && nbAttempts && ipIndex - lowestMatchIndex < MAX_DISTANCE) {
size_t const dictEndOffset = dictCtx->end - dictCtx->base;
if ( dict == usingDictCtxHc
&& nbAttempts
&& ipIndex - lowestMatchIndex < LZ4_DISTANCE_MAX) {
size_t const dictEndOffset = (size_t)(dictCtx->end - dictCtx->base);
U32 dictMatchIndex = dictCtx->hashTable[LZ4HC_hashPtr(ip)];
assert(dictEndOffset <= 1 GB);
dictMatchIndex = dictCtx->hashTable[LZ4HC_hashPtr(ip)];
matchIndex = dictMatchIndex + lowestMatchIndex - (U32)dictEndOffset;
while (ipIndex - matchIndex <= MAX_DISTANCE && nbAttempts--) {
while (ipIndex - matchIndex <= LZ4_DISTANCE_MAX && nbAttempts--) {
const BYTE* const matchPtr = dictCtx->base + dictMatchIndex;
if (LZ4_read32(matchPtr) == pattern) {
@@ -362,22 +372,19 @@ LZ4HC_InsertAndGetWiderMatch (
int back = 0;
const BYTE* vLimit = ip + (dictEndOffset - dictMatchIndex);
if (vLimit > iHighLimit) vLimit = iHighLimit;
mlt = LZ4_count(ip+MINMATCH, matchPtr+MINMATCH, vLimit) + MINMATCH;
mlt = (int)LZ4_count(ip+MINMATCH, matchPtr+MINMATCH, vLimit) + MINMATCH;
back = lookBackLength ? LZ4HC_countBack(ip, matchPtr, iLowLimit, dictCtx->base + dictCtx->dictLimit) : 0;
mlt -= back;
if (mlt > longest) {
longest = mlt;
*matchpos = base + matchIndex + back;
*startpos = ip + back;
}
}
} }
{ U32 const nextOffset = DELTANEXTU16(dictCtx->chainTable, dictMatchIndex);
dictMatchIndex -= nextOffset;
matchIndex -= nextOffset;
}
}
}
} } }
return longest;
}
@@ -397,14 +404,6 @@ int LZ4HC_InsertAndFindBestMatch(LZ4HC_CCtx_internal* const hc4, /* Index tabl
return LZ4HC_InsertAndGetWiderMatch(hc4, ip, ip, iLimit, MINMATCH-1, matchpos, &uselessPtr, maxNbAttempts, patternAnalysis, 0 /*chainSwap*/, dict, favorCompressionRatio);
}
typedef enum {
noLimit = 0,
limitedOutput = 1,
limitedDestSize = 2,
} limitedOutput_directive;
/* LZ4HC_encodeSequence() :
* @return : 0 if ok,
* 1 if buffer issue detected */
@@ -439,7 +438,7 @@ LZ4_FORCE_INLINE int LZ4HC_encodeSequence (
/* Encode Literal length */
length = (size_t)(*ip - *anchor);
if ((limit) && ((*op + (length >> 8) + length + (2 + 1 + LASTLITERALS)) > oend)) return 1; /* Check output limit */
if ((limit) && ((*op + (length / 255) + length + (2 + 1 + LASTLITERALS)) > oend)) return 1; /* Check output limit */
if (length >= RUN_MASK) {
size_t len = length - RUN_MASK;
*token = (RUN_MASK << ML_BITS);
@@ -450,17 +449,17 @@ LZ4_FORCE_INLINE int LZ4HC_encodeSequence (
}
/* Copy Literals */
LZ4_wildCopy(*op, *anchor, (*op) + length);
LZ4_wildCopy8(*op, *anchor, (*op) + length);
*op += length;
/* Encode Offset */
assert( (*ip - match) <= MAX_DISTANCE ); /* note : consider providing offset as a value, rather than as a pointer difference */
assert( (*ip - match) <= LZ4_DISTANCE_MAX ); /* note : consider providing offset as a value, rather than as a pointer difference */
LZ4_writeLE16(*op, (U16)(*ip-match)); *op += 2;
/* Encode MatchLength */
assert(matchLength >= MINMATCH);
length = (size_t)(matchLength - MINMATCH);
if ((limit) && (*op + (length >> 8) + (1 + LASTLITERALS) > oend)) return 1; /* Check output limit */
length = (size_t)matchLength - MINMATCH;
if ((limit) && (*op + (length / 255) + (1 + LASTLITERALS) > oend)) return 1; /* Check output limit */
if (length >= ML_MASK) {
*token += ML_MASK;
length -= ML_MASK;
@@ -513,12 +512,12 @@ LZ4_FORCE_INLINE int LZ4HC_compress_hashChain (
/* init */
*srcSizePtr = 0;
if (limit == limitedDestSize) oend -= LASTLITERALS; /* Hack for support LZ4 format restriction */
if (limit == fillOutput) oend -= LASTLITERALS; /* Hack for support LZ4 format restriction */
if (inputSize < LZ4_minLength) goto _last_literals; /* Input too small, no compression (all literals) */
/* Main Loop */
while (ip <= mflimit) {
ml = LZ4HC_InsertAndFindBestMatch (ctx, ip, matchlimit, &ref, maxNbAttempts, patternAnalysis, dict);
ml = LZ4HC_InsertAndFindBestMatch(ctx, ip, matchlimit, &ref, maxNbAttempts, patternAnalysis, dict);
if (ml<MINMATCH) { ip++; continue; }
/* saved, in case we would skip too much */
@@ -535,7 +534,7 @@ _Search2:
if (ml2 == ml) { /* No better match => encode ML1 */
optr = op;
if (LZ4HC_encodeSequence(&ip, &op, &anchor, ml, ref, limit, oend)) goto _dest_overflow;
if (LZ4HC_encodeSequence(UPDATABLE(ip, op, anchor), ml, ref, limit, oend)) goto _dest_overflow;
continue;
}
@@ -583,10 +582,10 @@ _Search3:
if (start2 < ip+ml) ml = (int)(start2 - ip);
/* Now, encode 2 sequences */
optr = op;
if (LZ4HC_encodeSequence(&ip, &op, &anchor, ml, ref, limit, oend)) goto _dest_overflow;
if (LZ4HC_encodeSequence(UPDATABLE(ip, op, anchor), ml, ref, limit, oend)) goto _dest_overflow;
ip = start2;
optr = op;
if (LZ4HC_encodeSequence(&ip, &op, &anchor, ml2, ref2, limit, oend)) goto _dest_overflow;
if (LZ4HC_encodeSequence(UPDATABLE(ip, op, anchor), ml2, ref2, limit, oend)) goto _dest_overflow;
continue;
}
@@ -605,7 +604,7 @@ _Search3:
}
optr = op;
if (LZ4HC_encodeSequence(&ip, &op, &anchor, ml, ref, limit, oend)) goto _dest_overflow;
if (LZ4HC_encodeSequence(UPDATABLE(ip, op, anchor), ml, ref, limit, oend)) goto _dest_overflow;
ip = start3;
ref = ref3;
ml = ml3;
@@ -643,7 +642,7 @@ _Search3:
}
}
optr = op;
if (LZ4HC_encodeSequence(&ip, &op, &anchor, ml, ref, limit, oend)) goto _dest_overflow;
if (LZ4HC_encodeSequence(UPDATABLE(ip, op, anchor), ml, ref, limit, oend)) goto _dest_overflow;
/* ML2 becomes ML1 */
ip = start2; ref = ref2; ml = ml2;
@@ -660,7 +659,7 @@ _last_literals:
{ size_t lastRunSize = (size_t)(iend - anchor); /* literals */
size_t litLength = (lastRunSize + 255 - RUN_MASK) / 255;
size_t const totalSize = 1 + litLength + lastRunSize;
if (limit == limitedDestSize) oend += LASTLITERALS; /* restore correct value */
if (limit == fillOutput) oend += LASTLITERALS; /* restore correct value */
if (limit && (op + totalSize > oend)) {
if (limit == limitedOutput) return 0; /* Check output limit */
/* adapt lastRunSize to fill 'dest' */
@@ -687,7 +686,7 @@ _last_literals:
return (int) (((char*)op)-dest);
_dest_overflow:
if (limit == limitedDestSize) {
if (limit == fillOutput) {
op = optr; /* restore correct out pointer */
goto _last_literals;
}
@@ -737,56 +736,64 @@ LZ4_FORCE_INLINE int LZ4HC_compress_generic_internal (
{ lz4opt,16384,LZ4_OPT_NUM }, /* 12==LZ4HC_CLEVEL_MAX */
};
DEBUGLOG(4, "LZ4HC_compress_generic(%p, %p, %d)", ctx, src, *srcSizePtr);
DEBUGLOG(4, "LZ4HC_compress_generic(ctx=%p, src=%p, srcSize=%d)", ctx, src, *srcSizePtr);
if (limit == limitedDestSize && dstCapacity < 1) return 0; /* Impossible to store anything */
if ((U32)*srcSizePtr > (U32)LZ4_MAX_INPUT_SIZE) return 0; /* Unsupported input size (too large or negative) */
if (limit == fillOutput && dstCapacity < 1) return 0; /* Impossible to store anything */
if ((U32)*srcSizePtr > (U32)LZ4_MAX_INPUT_SIZE) return 0; /* Unsupported input size (too large or negative) */
ctx->end += *srcSizePtr;
if (cLevel < 1) cLevel = LZ4HC_CLEVEL_DEFAULT; /* note : convention is different from lz4frame, maybe something to review */
cLevel = MIN(LZ4HC_CLEVEL_MAX, cLevel);
{ cParams_t const cParam = clTable[cLevel];
HCfavor_e const favor = ctx->favorDecSpeed ? favorDecompressionSpeed : favorCompressionRatio;
if (cParam.strat == lz4hc)
return LZ4HC_compress_hashChain(ctx,
int result;
if (cParam.strat == lz4hc) {
result = LZ4HC_compress_hashChain(ctx,
src, dst, srcSizePtr, dstCapacity,
cParam.nbSearches, limit, dict);
assert(cParam.strat == lz4opt);
return LZ4HC_compress_optimal(ctx,
src, dst, srcSizePtr, dstCapacity,
cParam.nbSearches, cParam.targetLength, limit,
cLevel == LZ4HC_CLEVEL_MAX, /* ultra mode */
dict, favor);
} else {
assert(cParam.strat == lz4opt);
result = LZ4HC_compress_optimal(ctx,
src, dst, srcSizePtr, dstCapacity,
(int)cParam.nbSearches, cParam.targetLength, limit,
cLevel == LZ4HC_CLEVEL_MAX, /* ultra mode */
dict, favor);
}
if (result <= 0) ctx->dirty = 1;
return result;
}
}
static void LZ4HC_setExternalDict(LZ4HC_CCtx_internal* ctxPtr, const BYTE* newBlock);
static int LZ4HC_compress_generic_noDictCtx (
LZ4HC_CCtx_internal* const ctx,
const char* const src,
char* const dst,
int* const srcSizePtr,
int const dstCapacity,
int cLevel,
limitedOutput_directive limit
)
static int
LZ4HC_compress_generic_noDictCtx (
LZ4HC_CCtx_internal* const ctx,
const char* const src,
char* const dst,
int* const srcSizePtr,
int const dstCapacity,
int cLevel,
limitedOutput_directive limit
)
{
assert(ctx->dictCtx == NULL);
return LZ4HC_compress_generic_internal(ctx, src, dst, srcSizePtr, dstCapacity, cLevel, limit, noDictCtx);
}
static int LZ4HC_compress_generic_dictCtx (
LZ4HC_CCtx_internal* const ctx,
const char* const src,
char* const dst,
int* const srcSizePtr,
int const dstCapacity,
int cLevel,
limitedOutput_directive limit
)
static int
LZ4HC_compress_generic_dictCtx (
LZ4HC_CCtx_internal* const ctx,
const char* const src,
char* const dst,
int* const srcSizePtr,
int const dstCapacity,
int cLevel,
limitedOutput_directive limit
)
{
const size_t position = ctx->end - ctx->base - ctx->lowLimit;
const size_t position = (size_t)(ctx->end - ctx->base) - ctx->lowLimit;
assert(ctx->dictCtx != NULL);
if (position >= 64 KB) {
ctx->dictCtx = NULL;
@@ -797,19 +804,20 @@ static int LZ4HC_compress_generic_dictCtx (
ctx->compressionLevel = (short)cLevel;
return LZ4HC_compress_generic_noDictCtx(ctx, src, dst, srcSizePtr, dstCapacity, cLevel, limit);
} else {
return LZ4HC_compress_generic_internal(ctx, src, dst, srcSizePtr, dstCapacity, cLevel, limit, usingDictCtx);
return LZ4HC_compress_generic_internal(ctx, src, dst, srcSizePtr, dstCapacity, cLevel, limit, usingDictCtxHc);
}
}
static int LZ4HC_compress_generic (
LZ4HC_CCtx_internal* const ctx,
const char* const src,
char* const dst,
int* const srcSizePtr,
int const dstCapacity,
int cLevel,
limitedOutput_directive limit
)
static int
LZ4HC_compress_generic (
LZ4HC_CCtx_internal* const ctx,
const char* const src,
char* const dst,
int* const srcSizePtr,
int const dstCapacity,
int cLevel,
limitedOutput_directive limit
)
{
if (ctx->dictCtx == NULL) {
return LZ4HC_compress_generic_noDictCtx(ctx, src, dst, srcSizePtr, dstCapacity, cLevel, limit);
@@ -819,24 +827,41 @@ static int LZ4HC_compress_generic (
}
int LZ4_sizeofStateHC(void) { return sizeof(LZ4_streamHC_t); }
int LZ4_sizeofStateHC(void) { return (int)sizeof(LZ4_streamHC_t); }
#ifndef _MSC_VER /* for some reason, Visual fails the aligment test on 32-bit x86 :
* it reports an aligment of 8-bytes,
* while actually aligning LZ4_streamHC_t on 4 bytes. */
static size_t LZ4_streamHC_t_alignment(void)
{
struct { char c; LZ4_streamHC_t t; } t_a;
return sizeof(t_a) - sizeof(t_a.t);
}
#endif
/* state is presumed correctly initialized,
* in which case its size and alignment have already been validate */
int LZ4_compress_HC_extStateHC_fastReset (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int compressionLevel)
{
LZ4HC_CCtx_internal* const ctx = &((LZ4_streamHC_t*)state)->internal_donotuse;
#ifndef _MSC_VER /* for some reason, Visual fails the aligment test on 32-bit x86 :
* it reports an aligment of 8-bytes,
* while actually aligning LZ4_streamHC_t on 4 bytes. */
assert(((size_t)state & (LZ4_streamHC_t_alignment() - 1)) == 0); /* check alignment */
#endif
if (((size_t)(state)&(sizeof(void*)-1)) != 0) return 0; /* Error : state is not aligned for pointers (32 or 64 bits) */
LZ4_resetStreamHC_fast((LZ4_streamHC_t*)state, compressionLevel);
LZ4HC_init (ctx, (const BYTE*)src);
LZ4HC_init_internal (ctx, (const BYTE*)src);
if (dstCapacity < LZ4_compressBound(srcSize))
return LZ4HC_compress_generic (ctx, src, dst, &srcSize, dstCapacity, compressionLevel, limitedOutput);
else
return LZ4HC_compress_generic (ctx, src, dst, &srcSize, dstCapacity, compressionLevel, noLimit);
return LZ4HC_compress_generic (ctx, src, dst, &srcSize, dstCapacity, compressionLevel, notLimited);
}
int LZ4_compress_HC_extStateHC (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int compressionLevel)
{
if (((size_t)(state)&(sizeof(void*)-1)) != 0) return 0; /* Error : state is not aligned for pointers (32 or 64 bits) */
LZ4_resetStreamHC ((LZ4_streamHC_t*)state, compressionLevel);
LZ4_streamHC_t* const ctx = LZ4_initStreamHC(state, sizeof(*ctx));
if (ctx==NULL) return 0; /* init failure */
return LZ4_compress_HC_extStateHC_fastReset(state, src, dst, srcSize, dstCapacity, compressionLevel);
}
@@ -850,19 +875,19 @@ int LZ4_compress_HC(const char* src, char* dst, int srcSize, int dstCapacity, in
#endif
int const cSize = LZ4_compress_HC_extStateHC(statePtr, src, dst, srcSize, dstCapacity, compressionLevel);
#if defined(LZ4HC_HEAPMODE) && LZ4HC_HEAPMODE==1
free(statePtr);
FREEMEM(statePtr);
#endif
return cSize;
}
/* LZ4_compress_HC_destSize() :
* only compatible with regular HC parser */
int LZ4_compress_HC_destSize(void* LZ4HC_Data, const char* source, char* dest, int* sourceSizePtr, int targetDestSize, int cLevel)
/* state is presumed sized correctly (>= sizeof(LZ4_streamHC_t)) */
int LZ4_compress_HC_destSize(void* state, const char* source, char* dest, int* sourceSizePtr, int targetDestSize, int cLevel)
{
LZ4HC_CCtx_internal* const ctx = &((LZ4_streamHC_t*)LZ4HC_Data)->internal_donotuse;
LZ4_resetStreamHC((LZ4_streamHC_t*)LZ4HC_Data, cLevel);
LZ4HC_init(ctx, (const BYTE*) source);
return LZ4HC_compress_generic(ctx, source, dest, sourceSizePtr, targetDestSize, cLevel, limitedDestSize);
LZ4_streamHC_t* const ctx = LZ4_initStreamHC(state, sizeof(*ctx));
if (ctx==NULL) return 0; /* init failure */
LZ4HC_init_internal(&ctx->internal_donotuse, (const BYTE*) source);
LZ4_setCompressionLevel(ctx, cLevel);
return LZ4HC_compress_generic(&ctx->internal_donotuse, source, dest, sourceSizePtr, targetDestSize, cLevel, fillOutput);
}
@@ -871,44 +896,70 @@ int LZ4_compress_HC_destSize(void* LZ4HC_Data, const char* source, char* dest, i
* Streaming Functions
**************************************/
/* allocation */
LZ4_streamHC_t* LZ4_createStreamHC(void) {
LZ4_streamHC_t* LZ4_createStreamHC(void)
{
LZ4_streamHC_t* const LZ4_streamHCPtr = (LZ4_streamHC_t*)ALLOC(sizeof(LZ4_streamHC_t));
if (LZ4_streamHCPtr==NULL) return NULL;
LZ4_resetStreamHC(LZ4_streamHCPtr, LZ4HC_CLEVEL_DEFAULT);
LZ4_initStreamHC(LZ4_streamHCPtr, sizeof(*LZ4_streamHCPtr)); /* full initialization, malloc'ed buffer can be full of garbage */
return LZ4_streamHCPtr;
}
int LZ4_freeStreamHC (LZ4_streamHC_t* LZ4_streamHCPtr) {
int LZ4_freeStreamHC (LZ4_streamHC_t* LZ4_streamHCPtr)
{
DEBUGLOG(4, "LZ4_freeStreamHC(%p)", LZ4_streamHCPtr);
if (!LZ4_streamHCPtr) return 0; /* support free on NULL */
free(LZ4_streamHCPtr);
FREEMEM(LZ4_streamHCPtr);
return 0;
}
/* initialization */
void LZ4_resetStreamHC (LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel)
LZ4_streamHC_t* LZ4_initStreamHC (void* buffer, size_t size)
{
LZ4_STATIC_ASSERT(sizeof(LZ4HC_CCtx_internal) <= sizeof(size_t) * LZ4_STREAMHCSIZE_SIZET); /* if compilation fails here, LZ4_STREAMHCSIZE must be increased */
DEBUGLOG(4, "LZ4_resetStreamHC(%p, %d)", LZ4_streamHCPtr, compressionLevel);
LZ4_streamHC_t* const LZ4_streamHCPtr = (LZ4_streamHC_t*)buffer;
if (buffer == NULL) return NULL;
if (size < sizeof(LZ4_streamHC_t)) return NULL;
#ifndef _MSC_VER /* for some reason, Visual fails the aligment test on 32-bit x86 :
* it reports an aligment of 8-bytes,
* while actually aligning LZ4_streamHC_t on 4 bytes. */
if (((size_t)buffer) & (LZ4_streamHC_t_alignment() - 1)) return NULL; /* alignment check */
#endif
/* if compilation fails here, LZ4_STREAMHCSIZE must be increased */
LZ4_STATIC_ASSERT(sizeof(LZ4HC_CCtx_internal) <= LZ4_STREAMHCSIZE);
DEBUGLOG(4, "LZ4_initStreamHC(%p, %u)", LZ4_streamHCPtr, (unsigned)size);
/* end-base will trigger a clearTable on starting compression */
LZ4_streamHCPtr->internal_donotuse.end = (const BYTE *)(ptrdiff_t)-1;
LZ4_streamHCPtr->internal_donotuse.base = NULL;
LZ4_streamHCPtr->internal_donotuse.dictCtx = NULL;
LZ4_streamHCPtr->internal_donotuse.favorDecSpeed = 0;
LZ4_streamHCPtr->internal_donotuse.dirty = 0;
LZ4_setCompressionLevel(LZ4_streamHCPtr, LZ4HC_CLEVEL_DEFAULT);
return LZ4_streamHCPtr;
}
/* just a stub */
void LZ4_resetStreamHC (LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel)
{
LZ4_initStreamHC(LZ4_streamHCPtr, sizeof(*LZ4_streamHCPtr));
LZ4_setCompressionLevel(LZ4_streamHCPtr, compressionLevel);
}
void LZ4_resetStreamHC_fast (LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel)
{
DEBUGLOG(4, "LZ4_resetStreamHC_fast(%p, %d)", LZ4_streamHCPtr, compressionLevel);
LZ4_streamHCPtr->internal_donotuse.end -= (uptrval)LZ4_streamHCPtr->internal_donotuse.base;
LZ4_streamHCPtr->internal_donotuse.base = NULL;
LZ4_streamHCPtr->internal_donotuse.dictCtx = NULL;
if (LZ4_streamHCPtr->internal_donotuse.dirty) {
LZ4_initStreamHC(LZ4_streamHCPtr, sizeof(*LZ4_streamHCPtr));
} else {
/* preserve end - base : can trigger clearTable's threshold */
LZ4_streamHCPtr->internal_donotuse.end -= (uptrval)LZ4_streamHCPtr->internal_donotuse.base;
LZ4_streamHCPtr->internal_donotuse.base = NULL;
LZ4_streamHCPtr->internal_donotuse.dictCtx = NULL;
}
LZ4_setCompressionLevel(LZ4_streamHCPtr, compressionLevel);
}
void LZ4_setCompressionLevel(LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel)
{
DEBUGLOG(5, "LZ4_setCompressionLevel(%p, %d)", LZ4_streamHCPtr, compressionLevel);
if (compressionLevel < 1) compressionLevel = LZ4HC_CLEVEL_DEFAULT;
if (compressionLevel > LZ4HC_CLEVEL_MAX) compressionLevel = LZ4HC_CLEVEL_MAX;
LZ4_streamHCPtr->internal_donotuse.compressionLevel = (short)compressionLevel;
@@ -919,16 +970,24 @@ void LZ4_favorDecompressionSpeed(LZ4_streamHC_t* LZ4_streamHCPtr, int favor)
LZ4_streamHCPtr->internal_donotuse.favorDecSpeed = (favor!=0);
}
int LZ4_loadDictHC (LZ4_streamHC_t* LZ4_streamHCPtr, const char* dictionary, int dictSize)
/* LZ4_loadDictHC() :
* LZ4_streamHCPtr is presumed properly initialized */
int LZ4_loadDictHC (LZ4_streamHC_t* LZ4_streamHCPtr,
const char* dictionary, int dictSize)
{
LZ4HC_CCtx_internal* const ctxPtr = &LZ4_streamHCPtr->internal_donotuse;
DEBUGLOG(4, "LZ4_loadDictHC(%p, %p, %d)", LZ4_streamHCPtr, dictionary, dictSize);
assert(LZ4_streamHCPtr != NULL);
if (dictSize > 64 KB) {
dictionary += dictSize - 64 KB;
dictionary += (size_t)dictSize - 64 KB;
dictSize = 64 KB;
}
LZ4_resetStreamHC(LZ4_streamHCPtr, ctxPtr->compressionLevel);
LZ4HC_init (ctxPtr, (const BYTE*)dictionary);
/* need a full initialization, there are bad side-effects when using resetFast() */
{ int const cLevel = ctxPtr->compressionLevel;
LZ4_initStreamHC(LZ4_streamHCPtr, sizeof(*LZ4_streamHCPtr));
LZ4_setCompressionLevel(LZ4_streamHCPtr, cLevel);
}
LZ4HC_init_internal (ctxPtr, (const BYTE*)dictionary);
ctxPtr->end = (const BYTE*)dictionary + dictSize;
if (dictSize >= 4) LZ4HC_Insert (ctxPtr, ctxPtr->end-3);
return dictSize;
@@ -961,9 +1020,11 @@ static int LZ4_compressHC_continue_generic (LZ4_streamHC_t* LZ4_streamHCPtr,
limitedOutput_directive limit)
{
LZ4HC_CCtx_internal* const ctxPtr = &LZ4_streamHCPtr->internal_donotuse;
DEBUGLOG(4, "LZ4_compressHC_continue_generic(%p, %p, %d)", LZ4_streamHCPtr, src, *srcSizePtr);
DEBUGLOG(4, "LZ4_compressHC_continue_generic(ctx=%p, src=%p, srcSize=%d)",
LZ4_streamHCPtr, src, *srcSizePtr);
assert(ctxPtr != NULL);
/* auto-init if forgotten */
if (ctxPtr->base == NULL) LZ4HC_init (ctxPtr, (const BYTE*) src);
if (ctxPtr->base == NULL) LZ4HC_init_internal (ctxPtr, (const BYTE*) src);
/* Check overflow */
if ((size_t)(ctxPtr->end - ctxPtr->base) > 2 GB) {
@@ -973,7 +1034,8 @@ static int LZ4_compressHC_continue_generic (LZ4_streamHC_t* LZ4_streamHCPtr,
}
/* Check if blocks follow each other */
if ((const BYTE*)src != ctxPtr->end) LZ4HC_setExternalDict(ctxPtr, (const BYTE*)src);
if ((const BYTE*)src != ctxPtr->end)
LZ4HC_setExternalDict(ctxPtr, (const BYTE*)src);
/* Check overlapping input/dictionary space */
{ const BYTE* sourceEnd = (const BYTE*) src + *srcSizePtr;
@@ -994,12 +1056,12 @@ int LZ4_compress_HC_continue (LZ4_streamHC_t* LZ4_streamHCPtr, const char* src,
if (dstCapacity < LZ4_compressBound(srcSize))
return LZ4_compressHC_continue_generic (LZ4_streamHCPtr, src, dst, &srcSize, dstCapacity, limitedOutput);
else
return LZ4_compressHC_continue_generic (LZ4_streamHCPtr, src, dst, &srcSize, dstCapacity, noLimit);
return LZ4_compressHC_continue_generic (LZ4_streamHCPtr, src, dst, &srcSize, dstCapacity, notLimited);
}
int LZ4_compress_HC_continue_destSize (LZ4_streamHC_t* LZ4_streamHCPtr, const char* src, char* dst, int* srcSizePtr, int targetDestSize)
{
return LZ4_compressHC_continue_generic(LZ4_streamHCPtr, src, dst, srcSizePtr, targetDestSize, limitedDestSize);
return LZ4_compressHC_continue_generic(LZ4_streamHCPtr, src, dst, srcSizePtr, targetDestSize, fillOutput);
}
@@ -1018,19 +1080,21 @@ int LZ4_saveDictHC (LZ4_streamHC_t* LZ4_streamHCPtr, char* safeBuffer, int dictS
{ U32 const endIndex = (U32)(streamPtr->end - streamPtr->base);
streamPtr->end = (const BYTE*)safeBuffer + dictSize;
streamPtr->base = streamPtr->end - endIndex;
streamPtr->dictLimit = endIndex - dictSize;
streamPtr->lowLimit = endIndex - dictSize;
streamPtr->dictLimit = endIndex - (U32)dictSize;
streamPtr->lowLimit = endIndex - (U32)dictSize;
if (streamPtr->nextToUpdate < streamPtr->dictLimit) streamPtr->nextToUpdate = streamPtr->dictLimit;
}
return dictSize;
}
/***********************************
/***************************************************
* Deprecated Functions
***********************************/
***************************************************/
/* These functions currently generate deprecation warnings */
/* Deprecated compression functions */
/* Wrappers for deprecated compression functions */
int LZ4_compressHC(const char* src, char* dst, int srcSize) { return LZ4_compress_HC (src, dst, srcSize, LZ4_compressBound(srcSize), 0); }
int LZ4_compressHC_limitedOutput(const char* src, char* dst, int srcSize, int maxDstSize) { return LZ4_compress_HC(src, dst, srcSize, maxDstSize, 0); }
int LZ4_compressHC2(const char* src, char* dst, int srcSize, int cLevel) { return LZ4_compress_HC (src, dst, srcSize, LZ4_compressBound(srcSize), cLevel); }
@@ -1046,25 +1110,26 @@ int LZ4_compressHC_limitedOutput_continue (LZ4_streamHC_t* ctx, const char* src,
/* Deprecated streaming functions */
int LZ4_sizeofStreamStateHC(void) { return LZ4_STREAMHCSIZE; }
/* state is presumed correctly sized, aka >= sizeof(LZ4_streamHC_t)
* @return : 0 on success, !=0 if error */
int LZ4_resetStreamStateHC(void* state, char* inputBuffer)
{
LZ4HC_CCtx_internal *ctx = &((LZ4_streamHC_t*)state)->internal_donotuse;
if ((((size_t)state) & (sizeof(void*)-1)) != 0) return 1; /* Error : pointer is not aligned for pointer (32 or 64 bits) */
LZ4_resetStreamHC((LZ4_streamHC_t*)state, ((LZ4_streamHC_t*)state)->internal_donotuse.compressionLevel);
LZ4HC_init(ctx, (const BYTE*)inputBuffer);
LZ4_streamHC_t* const hc4 = LZ4_initStreamHC(state, sizeof(*hc4));
if (hc4 == NULL) return 1; /* init failed */
LZ4HC_init_internal (&hc4->internal_donotuse, (const BYTE*)inputBuffer);
return 0;
}
void* LZ4_createHC (const char* inputBuffer)
{
LZ4_streamHC_t* hc4 = (LZ4_streamHC_t*)ALLOC(sizeof(LZ4_streamHC_t));
LZ4_streamHC_t* const hc4 = LZ4_createStreamHC();
if (hc4 == NULL) return NULL; /* not enough memory */
LZ4_resetStreamHC(hc4, 0 /* compressionLevel */);
LZ4HC_init (&hc4->internal_donotuse, (const BYTE*)inputBuffer);
LZ4HC_init_internal (&hc4->internal_donotuse, (const BYTE*)inputBuffer);
return hc4;
}
int LZ4_freeHC (void* LZ4HC_Data) {
int LZ4_freeHC (void* LZ4HC_Data)
{
if (!LZ4HC_Data) return 0; /* support free on NULL */
FREEMEM(LZ4HC_Data);
return 0;
@@ -1072,7 +1137,7 @@ int LZ4_freeHC (void* LZ4HC_Data) {
int LZ4_compressHC2_continue (void* LZ4HC_Data, const char* src, char* dst, int srcSize, int cLevel)
{
return LZ4HC_compress_generic (&((LZ4_streamHC_t*)LZ4HC_Data)->internal_donotuse, src, dst, &srcSize, 0, cLevel, noLimit);
return LZ4HC_compress_generic (&((LZ4_streamHC_t*)LZ4HC_Data)->internal_donotuse, src, dst, &srcSize, 0, cLevel, notLimited);
}
int LZ4_compressHC2_limitedOutput_continue (void* LZ4HC_Data, const char* src, char* dst, int srcSize, int dstCapacity, int cLevel)
@@ -1091,7 +1156,7 @@ char* LZ4_slideInputBufferHC(void* LZ4HC_Data)
/* ================================================
* LZ4 Optimal parser (levels 10-12)
* LZ4 Optimal parser (levels [LZ4HC_CLEVEL_OPT_MIN - LZ4HC_CLEVEL_MAX])
* ===============================================*/
typedef struct {
int price;
@@ -1104,8 +1169,9 @@ typedef struct {
LZ4_FORCE_INLINE int LZ4HC_literalsPrice(int const litlen)
{
int price = litlen;
assert(litlen >= 0);
if (litlen >= (int)RUN_MASK)
price += 1 + (litlen-RUN_MASK)/255;
price += 1 + ((litlen-(int)RUN_MASK) / 255);
return price;
}
@@ -1114,11 +1180,13 @@ LZ4_FORCE_INLINE int LZ4HC_literalsPrice(int const litlen)
LZ4_FORCE_INLINE int LZ4HC_sequencePrice(int litlen, int mlen)
{
int price = 1 + 2 ; /* token + 16-bit offset */
assert(litlen >= 0);
assert(mlen >= MINMATCH);
price += LZ4HC_literalsPrice(litlen);
if (mlen >= (int)(ML_MASK+MINMATCH))
price += 1 + (mlen-(ML_MASK+MINMATCH))/255;
price += 1 + ((mlen-(int)(ML_MASK+MINMATCH)) / 255);
return price;
}
@@ -1177,9 +1245,9 @@ static int LZ4HC_compress_optimal ( LZ4HC_CCtx_internal* ctx,
BYTE* oend = op + dstCapacity;
/* init */
DEBUGLOG(5, "LZ4HC_compress_optimal");
DEBUGLOG(5, "LZ4HC_compress_optimal(dst=%p, dstCapa=%u)", dst, (unsigned)dstCapacity);
*srcSizePtr = 0;
if (limit == limitedDestSize) oend -= LASTLITERALS; /* Hack for support LZ4 format restriction */
if (limit == fillOutput) oend -= LASTLITERALS; /* Hack for support LZ4 format restriction */
if (sufficient_len >= LZ4_OPT_NUM) sufficient_len = LZ4_OPT_NUM-1;
/* Main Loop */
@@ -1197,7 +1265,7 @@ static int LZ4HC_compress_optimal ( LZ4HC_CCtx_internal* ctx,
int const firstML = firstMatch.len;
const BYTE* const matchPos = ip - firstMatch.off;
opSaved = op;
if ( LZ4HC_encodeSequence(&ip, &op, &anchor, firstML, matchPos, limit, oend) ) /* updates ip, op and anchor */
if ( LZ4HC_encodeSequence(UPDATABLE(ip, op, anchor), firstML, matchPos, limit, oend) ) /* updates ip, op and anchor */
goto _dest_overflow;
continue;
}
@@ -1335,6 +1403,7 @@ static int LZ4HC_compress_optimal ( LZ4HC_CCtx_internal* ctx,
} }
} /* for (cur = 1; cur <= last_match_pos; cur++) */
assert(last_match_pos < LZ4_OPT_NUM + TRAILING_LITERALS);
best_mlen = opt[last_match_pos].mlen;
best_off = opt[last_match_pos].off;
cur = last_match_pos - best_mlen;
@@ -1367,9 +1436,9 @@ static int LZ4HC_compress_optimal ( LZ4HC_CCtx_internal* ctx,
if (ml == 1) { ip++; rPos++; continue; } /* literal; note: can end up with several literals, in which case, skip them */
rPos += ml;
assert(ml >= MINMATCH);
assert((offset >= 1) && (offset <= MAX_DISTANCE));
assert((offset >= 1) && (offset <= LZ4_DISTANCE_MAX));
opSaved = op;
if ( LZ4HC_encodeSequence(&ip, &op, &anchor, ml, ip - offset, limit, oend) ) /* updates ip, op and anchor */
if ( LZ4HC_encodeSequence(UPDATABLE(ip, op, anchor), ml, ip - offset, limit, oend) ) /* updates ip, op and anchor */
goto _dest_overflow;
} }
} /* while (ip <= mflimit) */
@@ -1379,7 +1448,7 @@ static int LZ4HC_compress_optimal ( LZ4HC_CCtx_internal* ctx,
{ size_t lastRunSize = (size_t)(iend - anchor); /* literals */
size_t litLength = (lastRunSize + 255 - RUN_MASK) / 255;
size_t const totalSize = 1 + litLength + lastRunSize;
if (limit == limitedDestSize) oend += LASTLITERALS; /* restore correct value */
if (limit == fillOutput) oend += LASTLITERALS; /* restore correct value */
if (limit && (op + totalSize > oend)) {
if (limit == limitedOutput) return 0; /* Check output limit */
/* adapt lastRunSize to fill 'dst' */
@@ -1406,7 +1475,7 @@ static int LZ4HC_compress_optimal ( LZ4HC_CCtx_internal* ctx,
return (int) ((char*)op-dst);
_dest_overflow:
if (limit == limitedDestSize) {
if (limit == fillOutput) {
op = opSaved; /* restore correct out pointer */
goto _last_literals;
}

View File

@@ -31,8 +31,8 @@
- LZ4 source repository : https://github.com/lz4/lz4
- LZ4 public forum : https://groups.google.com/forum/#!forum/lz4c
*/
#ifndef LZ4_HC_H_19834876238432
#define LZ4_HC_H_19834876238432
#ifndef TRACY_LZ4_HC_H_19834876238432
#define TRACY_LZ4_HC_H_19834876238432
/* --- Dependency --- */
/* note : lz4hc requires lz4.h/lz4.c for compilation */
@@ -52,7 +52,7 @@ namespace tracy
* Block Compression
**************************************/
/*! LZ4_compress_HC() :
* Compress data from `src` into `dst`, using the more powerful but slower "HC" algorithm.
* Compress data from `src` into `dst`, using the powerful but slower "HC" algorithm.
* `dst` must be already allocated.
* Compression is guaranteed to succeed if `dstCapacity >= LZ4_compressBound(srcSize)` (see "lz4.h")
* Max supported `srcSize` value is LZ4_MAX_INPUT_SIZE (see "lz4.h")
@@ -75,7 +75,21 @@ LZ4LIB_API int LZ4_compress_HC (const char* src, char* dst, int srcSize, int dst
* Memory segment must be aligned on 8-bytes boundaries (which a normal malloc() should do properly).
*/
LZ4LIB_API int LZ4_sizeofStateHC(void);
LZ4LIB_API int LZ4_compress_HC_extStateHC(void* state, const char* src, char* dst, int srcSize, int maxDstSize, int compressionLevel);
LZ4LIB_API int LZ4_compress_HC_extStateHC(void* stateHC, const char* src, char* dst, int srcSize, int maxDstSize, int compressionLevel);
/*! LZ4_compress_HC_destSize() : v1.9.0+
* Will compress as much data as possible from `src`
* to fit into `targetDstSize` budget.
* Result is provided in 2 parts :
* @return : the number of bytes written into 'dst' (necessarily <= targetDstSize)
* or 0 if compression fails.
* `srcSizePtr` : on success, *srcSizePtr is updated to indicate how much bytes were read from `src`
*/
LZ4LIB_API int LZ4_compress_HC_destSize(void* stateHC,
const char* src, char* dst,
int* srcSizePtr, int targetDstSize,
int compressionLevel);
/*-************************************
@@ -87,46 +101,92 @@ LZ4LIB_API int LZ4_compress_HC_extStateHC(void* state, const char* src, char* ds
/*! LZ4_createStreamHC() and LZ4_freeStreamHC() :
* These functions create and release memory for LZ4 HC streaming state.
* Newly created states are automatically initialized.
* Existing states can be re-used several times, using LZ4_resetStreamHC().
* These methods are API and ABI stable, they can be used in combination with a DLL.
* A same state can be used multiple times consecutively,
* starting with LZ4_resetStreamHC_fast() to start a new stream of blocks.
*/
LZ4LIB_API LZ4_streamHC_t* LZ4_createStreamHC(void);
LZ4LIB_API int LZ4_freeStreamHC (LZ4_streamHC_t* streamHCPtr);
LZ4LIB_API void LZ4_resetStreamHC (LZ4_streamHC_t* streamHCPtr, int compressionLevel);
/*
These functions compress data in successive blocks of any size,
using previous blocks as dictionary, to improve compression ratio.
One key assumption is that previous blocks (up to 64 KB) remain read-accessible while compressing next blocks.
There is an exception for ring buffers, which can be smaller than 64 KB.
Ring-buffer scenario is automatically detected and handled within LZ4_compress_HC_continue().
Before starting compression, state must be allocated and properly initialized.
LZ4_createStreamHC() does both, though compression level is set to LZ4HC_CLEVEL_DEFAULT.
Selecting the compression level can be done with LZ4_resetStreamHC_fast() (starts a new stream)
or LZ4_setCompressionLevel() (anytime, between blocks in the same stream) (experimental).
LZ4_resetStreamHC_fast() only works on states which have been properly initialized at least once,
which is automatically the case when state is created using LZ4_createStreamHC().
After reset, a first "fictional block" can be designated as initial dictionary,
using LZ4_loadDictHC() (Optional).
Invoke LZ4_compress_HC_continue() to compress each successive block.
The number of blocks is unlimited.
Previous input blocks, including initial dictionary when present,
must remain accessible and unmodified during compression.
It's allowed to update compression level anytime between blocks,
using LZ4_setCompressionLevel() (experimental).
'dst' buffer should be sized to handle worst case scenarios
(see LZ4_compressBound(), it ensures compression success).
In case of failure, the API does not guarantee recovery,
so the state _must_ be reset.
To ensure compression success
whenever `dst` buffer size cannot be made >= LZ4_compressBound(),
consider using LZ4_compress_HC_continue_destSize().
Whenever previous input blocks can't be preserved unmodified in-place during compression of next blocks,
it's possible to copy the last blocks into a more stable memory space, using LZ4_saveDictHC().
Return value of LZ4_saveDictHC() is the size of dictionary effectively saved into 'safeBuffer' (<= 64 KB)
After completing a streaming compression,
it's possible to start a new stream of blocks, using the same LZ4_streamHC_t state,
just by resetting it, using LZ4_resetStreamHC_fast().
*/
LZ4LIB_API void LZ4_resetStreamHC_fast(LZ4_streamHC_t* streamHCPtr, int compressionLevel); /* v1.9.0+ */
LZ4LIB_API int LZ4_loadDictHC (LZ4_streamHC_t* streamHCPtr, const char* dictionary, int dictSize);
LZ4LIB_API int LZ4_compress_HC_continue (LZ4_streamHC_t* streamHCPtr, const char* src, char* dst, int srcSize, int maxDstSize);
LZ4LIB_API int LZ4_compress_HC_continue (LZ4_streamHC_t* streamHCPtr,
const char* src, char* dst,
int srcSize, int maxDstSize);
/*! LZ4_compress_HC_continue_destSize() : v1.9.0+
* Similar to LZ4_compress_HC_continue(),
* but will read as much data as possible from `src`
* to fit into `targetDstSize` budget.
* Result is provided into 2 parts :
* @return : the number of bytes written into 'dst' (necessarily <= targetDstSize)
* or 0 if compression fails.
* `srcSizePtr` : on success, *srcSizePtr will be updated to indicate how much bytes were read from `src`.
* Note that this function may not consume the entire input.
*/
LZ4LIB_API int LZ4_compress_HC_continue_destSize(LZ4_streamHC_t* LZ4_streamHCPtr,
const char* src, char* dst,
int* srcSizePtr, int targetDstSize);
LZ4LIB_API int LZ4_saveDictHC (LZ4_streamHC_t* streamHCPtr, char* safeBuffer, int maxDictSize);
/*
These functions compress data in successive blocks of any size, using previous blocks as dictionary.
One key assumption is that previous blocks (up to 64 KB) remain read-accessible while compressing next blocks.
There is an exception for ring buffers, which can be smaller than 64 KB.
Ring buffers scenario is automatically detected and handled by LZ4_compress_HC_continue().
Before starting compression, state must be properly initialized, using LZ4_resetStreamHC().
A first "fictional block" can then be designated as initial dictionary, using LZ4_loadDictHC() (Optional).
Then, use LZ4_compress_HC_continue() to compress each successive block.
Previous memory blocks (including initial dictionary when present) must remain accessible and unmodified during compression.
'dst' buffer should be sized to handle worst case scenarios (see LZ4_compressBound()), to ensure operation success.
Because in case of failure, the API does not guarantee context recovery, and context will have to be reset.
If `dst` buffer budget cannot be >= LZ4_compressBound(), consider using LZ4_compress_HC_continue_destSize() instead.
If, for any reason, previous data block can't be preserved unmodified in memory for next compression block,
you can save it to a more stable memory space, using LZ4_saveDictHC().
Return value of LZ4_saveDictHC() is the size of dictionary effectively saved into 'safeBuffer'.
*/
/*-**************************************************************
/*^**********************************************
* !!!!!! STATIC LINKING ONLY !!!!!!
***********************************************/
/*-******************************************************************
* PRIVATE DEFINITIONS :
* Do not use these definitions.
* They are exposed to allow static allocation of `LZ4_streamHC_t`.
* Using these definitions makes the code vulnerable to potential API break when upgrading LZ4
****************************************************************/
* Do not use these definitions directly.
* They are merely exposed to allow static allocation of `LZ4_streamHC_t`.
* Declare an `LZ4_streamHC_t` directly, rather than any type below.
* Even then, only do so in the context of static linking, as definitions may change between versions.
********************************************************************/
#define LZ4HC_DICTIONARY_LOGSIZE 16
#define LZ4HC_MAXD (1<<LZ4HC_DICTIONARY_LOGSIZE)
#define LZ4HC_MAXD_MASK (LZ4HC_MAXD - 1)
@@ -151,7 +211,9 @@ struct LZ4HC_CCtx_internal
uint32_t lowLimit; /* below that point, no more dict */
uint32_t nextToUpdate; /* index from which to continue dictionary update */
short compressionLevel;
short favorDecSpeed;
int8_t favorDecSpeed; /* favor decompression speed if this flag set,
otherwise, favor compression ratio */
int8_t dirty; /* stream has to be fully reset if this flag is set */
const LZ4HC_CCtx_internal* dictCtx;
};
@@ -169,26 +231,43 @@ struct LZ4HC_CCtx_internal
unsigned int lowLimit; /* below that point, no more dict */
unsigned int nextToUpdate; /* index from which to continue dictionary update */
short compressionLevel;
short favorDecSpeed;
char favorDecSpeed; /* favor decompression speed if this flag set,
otherwise, favor compression ratio */
char dirty; /* stream has to be fully reset if this flag is set */
const LZ4HC_CCtx_internal* dictCtx;
};
#endif
#define LZ4_STREAMHCSIZE (4*LZ4HC_HASHTABLESIZE + 2*LZ4HC_MAXD + 56) /* 262200 */
/* Do not use these definitions directly !
* Declare or allocate an LZ4_streamHC_t instead.
*/
#define LZ4_STREAMHCSIZE (4*LZ4HC_HASHTABLESIZE + 2*LZ4HC_MAXD + 56 + ((sizeof(void*)==16) ? 56 : 0) /* AS400*/ ) /* 262200 or 262256*/
#define LZ4_STREAMHCSIZE_SIZET (LZ4_STREAMHCSIZE / sizeof(size_t))
union LZ4_streamHC_u {
size_t table[LZ4_STREAMHCSIZE_SIZET];
LZ4HC_CCtx_internal internal_donotuse;
}; /* previously typedef'd to LZ4_streamHC_t */
/*
LZ4_streamHC_t :
This structure allows static allocation of LZ4 HC streaming state.
State must be initialized using LZ4_resetStreamHC() before first use.
}; /* previously typedef'd to LZ4_streamHC_t */
Static allocation shall only be used in combination with static linking.
When invoking LZ4 from a DLL, use create/free functions instead, which are API and ABI stable.
*/
/* LZ4_streamHC_t :
* This structure allows static allocation of LZ4 HC streaming state.
* This can be used to allocate statically, on state, or as part of a larger structure.
*
* Such state **must** be initialized using LZ4_initStreamHC() before first use.
*
* Note that invoking LZ4_initStreamHC() is not required when
* the state was created using LZ4_createStreamHC() (which is recommended).
* Using the normal builder, a newly created state is automatically initialized.
*
* Static allocation shall only be used in combination with static linking.
*/
/* LZ4_initStreamHC() : v1.9.0+
* Required before first use of a statically allocated LZ4_streamHC_t.
* Before v1.9.0 : use LZ4_resetStreamHC() instead
*/
LZ4LIB_API LZ4_streamHC_t* LZ4_initStreamHC (void* buffer, size_t size);
/*-************************************
@@ -199,11 +278,11 @@ union LZ4_streamHC_u {
/* deprecated compression functions */
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC (const char* source, char* dest, int inputSize);
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC_limitedOutput (const char* source, char* dest, int inputSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC2 (const char* source, char* dest, int inputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC2_limitedOutput (const char* source, char* dest, int inputSize, int maxOutputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC2 (const char* source, char* dest, int inputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC() instead") LZ4LIB_API int LZ4_compressHC2_limitedOutput(const char* source, char* dest, int inputSize, int maxOutputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_extStateHC() instead") LZ4LIB_API int LZ4_compressHC_withStateHC (void* state, const char* source, char* dest, int inputSize);
LZ4_DEPRECATED("use LZ4_compress_HC_extStateHC() instead") LZ4LIB_API int LZ4_compressHC_limitedOutput_withStateHC (void* state, const char* source, char* dest, int inputSize, int maxOutputSize);
LZ4_DEPRECATED("use LZ4_compress_HC_extStateHC() instead") LZ4LIB_API int LZ4_compressHC2_withStateHC (void* state, const char* source, char* dest, int inputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_extStateHC() instead") LZ4LIB_API int LZ4_compressHC2_withStateHC (void* state, const char* source, char* dest, int inputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_extStateHC() instead") LZ4LIB_API int LZ4_compressHC2_limitedOutput_withStateHC(void* state, const char* source, char* dest, int inputSize, int maxOutputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_compressHC_continue (LZ4_streamHC_t* LZ4_streamHCPtr, const char* source, char* dest, int inputSize);
LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_compressHC_limitedOutput_continue (LZ4_streamHC_t* LZ4_streamHCPtr, const char* source, char* dest, int inputSize, int maxOutputSize);
@@ -219,10 +298,22 @@ LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_comp
LZ4_DEPRECATED("use LZ4_createStreamHC() instead") LZ4LIB_API void* LZ4_createHC (const char* inputBuffer);
LZ4_DEPRECATED("use LZ4_saveDictHC() instead") LZ4LIB_API char* LZ4_slideInputBufferHC (void* LZ4HC_Data);
LZ4_DEPRECATED("use LZ4_freeStreamHC() instead") LZ4LIB_API int LZ4_freeHC (void* LZ4HC_Data);
LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_compressHC2_continue (void* LZ4HC_Data, const char* source, char* dest, int inputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_compressHC2_continue (void* LZ4HC_Data, const char* source, char* dest, int inputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_compress_HC_continue() instead") LZ4LIB_API int LZ4_compressHC2_limitedOutput_continue (void* LZ4HC_Data, const char* source, char* dest, int inputSize, int maxOutputSize, int compressionLevel);
LZ4_DEPRECATED("use LZ4_createStreamHC() instead") LZ4LIB_API int LZ4_sizeofStreamStateHC(void);
LZ4_DEPRECATED("use LZ4_resetStreamHC() instead") LZ4LIB_API int LZ4_resetStreamStateHC(void* state, char* inputBuffer);
LZ4_DEPRECATED("use LZ4_initStreamHC() instead") LZ4LIB_API int LZ4_resetStreamStateHC(void* state, char* inputBuffer);
/* LZ4_resetStreamHC() is now replaced by LZ4_initStreamHC().
* The intention is to emphasize the difference with LZ4_resetStreamHC_fast(),
* which is now the recommended function to start a new stream of blocks,
* but cannot be used to initialize a memory segment containing arbitrary garbage data.
*
* It is recommended to switch to LZ4_initStreamHC().
* LZ4_resetStreamHC() will generate deprecation warnings in a future version.
*/
LZ4LIB_API void LZ4_resetStreamHC (LZ4_streamHC_t* streamHCPtr, int compressionLevel);
}
@@ -244,44 +335,22 @@ LZ4_DEPRECATED("use LZ4_resetStreamHC() instead") LZ4LIB_API int LZ4_resetStr
namespace tracy
{
/*! LZ4_compress_HC_destSize() : v1.8.0 (experimental)
* Will try to compress as much data from `src` as possible
* that can fit into `targetDstSize` budget.
* Result is provided in 2 parts :
* @return : the number of bytes written into 'dst'
* or 0 if compression fails.
* `srcSizePtr` : value will be updated to indicate how much bytes were read from `src`
/*! LZ4_setCompressionLevel() : v1.8.0+ (experimental)
* It's possible to change compression level
* between successive invocations of LZ4_compress_HC_continue*()
* for dynamic adaptation.
*/
int LZ4_compress_HC_destSize(void* LZ4HC_Data,
const char* src, char* dst,
int* srcSizePtr, int targetDstSize,
int compressionLevel);
LZ4LIB_STATIC_API void LZ4_setCompressionLevel(
LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel);
/*! LZ4_compress_HC_continue_destSize() : v1.8.0 (experimental)
* Similar as LZ4_compress_HC_continue(),
* but will read a variable nb of bytes from `src`
* to fit into `targetDstSize` budget.
* Result is provided in 2 parts :
* @return : the number of bytes written into 'dst'
* or 0 if compression fails.
* `srcSizePtr` : value will be updated to indicate how much bytes were read from `src`.
/*! LZ4_favorDecompressionSpeed() : v1.8.2+ (experimental)
* Opt. Parser will favor decompression speed over compression ratio.
* Only applicable to levels >= LZ4HC_CLEVEL_OPT_MIN.
*/
int LZ4_compress_HC_continue_destSize(LZ4_streamHC_t* LZ4_streamHCPtr,
const char* src, char* dst,
int* srcSizePtr, int targetDstSize);
LZ4LIB_STATIC_API void LZ4_favorDecompressionSpeed(
LZ4_streamHC_t* LZ4_streamHCPtr, int favor);
/*! LZ4_setCompressionLevel() : v1.8.0 (experimental)
* It's possible to change compression level between 2 invocations of LZ4_compress_HC_continue*()
*/
void LZ4_setCompressionLevel(LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel);
/*! LZ4_favorDecompressionSpeed() : v1.8.2 (experimental)
* Parser will select decisions favoring decompression over compression ratio.
* Only work at highest compression settings (level >= LZ4HC_CLEVEL_OPT_MIN)
*/
void LZ4_favorDecompressionSpeed(LZ4_streamHC_t* LZ4_streamHCPtr, int favor);
/*! LZ4_resetStreamHC_fast() :
/*! LZ4_resetStreamHC_fast() : v1.9.0+
* When an LZ4_streamHC_t is known to be in a internally coherent state,
* it can often be prepared for a new compression with almost no work, only
* sometimes falling back to the full, expensive reset that is always required
@@ -298,8 +367,14 @@ void LZ4_favorDecompressionSpeed(LZ4_streamHC_t* LZ4_streamHCPtr, int favor);
* - the stream was in an indeterminate state and was used in a compression
* call that fully reset the state (LZ4_compress_HC_extStateHC()) and that
* returned success
*
* Note:
* A stream that was last used in a compression call that returned an error
* may be passed to this function. However, it will be fully reset, which will
* clear any existing history and settings from the context.
*/
void LZ4_resetStreamHC_fast(LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel);
LZ4LIB_STATIC_API void LZ4_resetStreamHC_fast(
LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLevel);
/*! LZ4_compress_HC_extStateHC_fastReset() :
* A variant of LZ4_compress_HC_extStateHC().
@@ -312,7 +387,11 @@ void LZ4_resetStreamHC_fast(LZ4_streamHC_t* LZ4_streamHCPtr, int compressionLeve
* LZ4_resetStreamHC_fast() while LZ4_compress_HC_extStateHC() starts with a
* call to LZ4_resetStreamHC().
*/
int LZ4_compress_HC_extStateHC_fastReset (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int compressionLevel);
LZ4LIB_STATIC_API int LZ4_compress_HC_extStateHC_fastReset (
void* state,
const char* src, char* dst,
int srcSize, int dstCapacity,
int compressionLevel);
/*! LZ4_attach_HC_dictionary() :
* This is an experimental API that allows for the efficient use of a
@@ -339,7 +418,9 @@ int LZ4_compress_HC_extStateHC_fastReset (void* state, const char* src, char* ds
* stream (and source buffer) must remain in-place / accessible / unchanged
* through the lifetime of the stream session.
*/
LZ4LIB_API void LZ4_attach_HC_dictionary(LZ4_streamHC_t *working_stream, const LZ4_streamHC_t *dictionary_stream);
LZ4LIB_STATIC_API void LZ4_attach_HC_dictionary(
LZ4_streamHC_t *working_stream,
const LZ4_streamHC_t *dictionary_stream);
}

View File

@@ -1,255 +0,0 @@
// Copyright (c) 2015 Jeff Preshing
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#ifndef __TRACY_CPP11OM_SEMAPHORE_H__
#define __TRACY_CPP11OM_SEMAPHORE_H__
#include <atomic>
#include <cassert>
#if defined(__MACH__)
#include <mach/mach.h>
#elif defined(__unix__)
#include <semaphore.h>
#endif
namespace tracy
{
#if defined(_WIN32)
//---------------------------------------------------------
// Semaphore (Windows)
//---------------------------------------------------------
#ifndef MAXLONG
enum { MAXLONG = 0x7fffffff };
#endif
#ifndef INFINITE
enum { INFINITE = 0xFFFFFFFF };
#endif
#ifndef _WINDOWS_
typedef void* HANDLE;
extern "C" __declspec(dllimport) HANDLE __stdcall CreateSemaphoreA( void*, long, long, const char* );
extern "C" __declspec(dllimport) int __stdcall CloseHandle( HANDLE );
extern "C" __declspec(dllimport) unsigned long __stdcall WaitForSingleObject( HANDLE, unsigned long );
extern "C" __declspec(dllimport) int __stdcall ReleaseSemaphore( HANDLE, long, long* );
#endif
class Semaphore
{
private:
HANDLE m_hSema;
Semaphore(const Semaphore& other) = delete;
Semaphore& operator=(const Semaphore& other) = delete;
public:
Semaphore(int initialCount = 0)
{
assert(initialCount >= 0);
m_hSema = CreateSemaphoreA(NULL, initialCount, MAXLONG, NULL);
}
~Semaphore()
{
CloseHandle(m_hSema);
}
void wait()
{
WaitForSingleObject(m_hSema, INFINITE);
}
void signal(int count = 1)
{
ReleaseSemaphore(m_hSema, count, NULL);
}
};
#elif defined(__MACH__)
//---------------------------------------------------------
// Semaphore (Apple iOS and OSX)
// Can't use POSIX semaphores due to http://lists.apple.com/archives/darwin-kernel/2009/Apr/msg00010.html
//---------------------------------------------------------
class Semaphore
{
private:
semaphore_t m_sema;
Semaphore(const Semaphore& other) = delete;
Semaphore& operator=(const Semaphore& other) = delete;
public:
Semaphore(int initialCount = 0)
{
assert(initialCount >= 0);
semaphore_create(mach_task_self(), &m_sema, SYNC_POLICY_FIFO, initialCount);
}
~Semaphore()
{
semaphore_destroy(mach_task_self(), m_sema);
}
void wait()
{
semaphore_wait(m_sema);
}
void signal()
{
semaphore_signal(m_sema);
}
void signal(int count)
{
while (count-- > 0)
{
semaphore_signal(m_sema);
}
}
};
#elif defined(__unix__)
//---------------------------------------------------------
// Semaphore (POSIX, Linux)
//---------------------------------------------------------
class Semaphore
{
private:
sem_t m_sema;
Semaphore(const Semaphore& other) = delete;
Semaphore& operator=(const Semaphore& other) = delete;
public:
Semaphore(int initialCount = 0)
{
assert(initialCount >= 0);
sem_init(&m_sema, 0, initialCount);
}
~Semaphore()
{
sem_destroy(&m_sema);
}
void wait()
{
// http://stackoverflow.com/questions/2013181/gdb-causes-sem-wait-to-fail-with-eintr-error
int rc;
do
{
rc = sem_wait(&m_sema);
}
while (rc == -1 && errno == EINTR);
}
void signal()
{
sem_post(&m_sema);
}
void signal(int count)
{
while (count-- > 0)
{
sem_post(&m_sema);
}
}
};
#else
#error Unsupported platform!
#endif
//---------------------------------------------------------
// LightweightSemaphore
//---------------------------------------------------------
class LightweightSemaphore
{
private:
std::atomic<int> m_count;
Semaphore m_sema;
void waitWithPartialSpinning()
{
int oldCount;
// Is there a better way to set the initial spin count?
// If we lower it to 1000, testBenaphore becomes 15x slower on my Core i7-5930K Windows PC,
// as threads start hitting the kernel semaphore.
int spin = 10000;
while (spin--)
{
oldCount = m_count.load(std::memory_order_relaxed);
if ((oldCount > 0) && m_count.compare_exchange_strong(oldCount, oldCount - 1, std::memory_order_acquire))
return;
std::atomic_signal_fence(std::memory_order_acquire); // Prevent the compiler from collapsing the loop.
}
oldCount = m_count.fetch_sub(1, std::memory_order_acquire);
if (oldCount <= 0)
{
m_sema.wait();
}
}
public:
LightweightSemaphore(int initialCount = 0) : m_count(initialCount)
{
assert(initialCount >= 0);
}
bool tryWait()
{
int oldCount = m_count.load(std::memory_order_relaxed);
return (oldCount > 0 && m_count.compare_exchange_strong(oldCount, oldCount - 1, std::memory_order_acquire));
}
void wait()
{
if (!tryWait())
waitWithPartialSpinning();
}
void signal(int count = 1)
{
int oldCount = m_count.fetch_add(count, std::memory_order_release);
int toRelease = -oldCount < count ? -oldCount : count;
if (toRelease > 0)
{
m_sema.signal(toRelease);
}
}
};
typedef LightweightSemaphore DefaultSemaphoreType;
}
#endif // __CPP11OM_SEMAPHORE_H__

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.4 KiB

BIN
doc/issues/dxt1+alpha.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 17 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 8.3 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 4.6 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 9.3 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 72 KiB

After

Width:  |  Height:  |  Size: 284 KiB

BIN
doc/profiler2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 162 KiB

View File

@@ -0,0 +1,14 @@
cmake_minimum_required(VERSION 3.0)
project(OpenCLVectorAdd)
find_package(OpenCL REQUIRED)
add_executable(OpenCLVectorAdd OpenCLVectorAdd.cpp)
add_library(TracyClient STATIC ../../TracyClient.cpp
../../TracyOpenCL.hpp)
target_include_directories(TracyClient PUBLIC ../../)
target_compile_definitions(TracyClient PUBLIC TRACY_ENABLE=1)
target_link_libraries(OpenCLVectorAdd PUBLIC OpenCL::OpenCL TracyClient)

View File

@@ -0,0 +1,190 @@
#include <iostream>
#include <cassert>
#include <string>
#include <vector>
#include <numeric>
#include <CL/cl.h>
#include <Tracy.hpp>
#include <TracyOpenCL.hpp>
#define CL_ASSERT(err) \
if((err) != CL_SUCCESS) \
{ \
std::cerr << "OpenCL Call Returned " << err << std::endl; \
assert(false); \
}
const char kernelSource[] =
" void __kernel vectorAdd(global float* C, global float* A, global float* B, int N) "
" { "
" int i = get_global_id(0); "
" if (i < N) { "
" C[i] = A[i] + B[i]; "
" } "
" } ";
int main()
{
cl_platform_id platform;
cl_device_id device;
cl_context context;
cl_command_queue commandQueue;
cl_kernel vectorAddKernel;
cl_program program;
cl_int err;
cl_mem bufferA, bufferB, bufferC;
TracyCLCtx tracyCLCtx;
{
ZoneScopedN("OpenCL Init");
cl_uint numPlatforms = 0;
CL_ASSERT(clGetPlatformIDs(0, nullptr, &numPlatforms));
if (numPlatforms == 0)
{
std::cerr << "Cannot find OpenCL platform to run this application" << std::endl;
return 1;
}
CL_ASSERT(clGetPlatformIDs(1, &platform, nullptr));
size_t platformNameBufferSize = 0;
CL_ASSERT(clGetPlatformInfo(platform, CL_PLATFORM_NAME, 0, nullptr, &platformNameBufferSize));
std::string platformName(platformNameBufferSize, '\0');
CL_ASSERT(clGetPlatformInfo(platform, CL_PLATFORM_NAME, platformNameBufferSize, &platformName[0], nullptr));
std::cout << "OpenCL Platform: " << platformName << std::endl;
CL_ASSERT(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, nullptr));
size_t deviceNameBufferSize = 0;
CL_ASSERT(clGetDeviceInfo(device, CL_DEVICE_NAME, 0, nullptr, &deviceNameBufferSize));
std::string deviceName(deviceNameBufferSize, '\0');
CL_ASSERT(clGetDeviceInfo(device, CL_DEVICE_NAME, deviceNameBufferSize, &deviceName[0], nullptr));
std::cout << "OpenCL Device: " << deviceName << std::endl;
err = CL_SUCCESS;
context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
CL_ASSERT(err);
size_t kernelSourceLength = sizeof(kernelSource);
const char* kernelSourceArray = { kernelSource };
program = clCreateProgramWithSource(context, 1, &kernelSourceArray, &kernelSourceLength, &err);
CL_ASSERT(err);
if (clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr) != CL_SUCCESS)
{
size_t programBuildLogBufferSize = 0;
CL_ASSERT(clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, nullptr, &programBuildLogBufferSize));
std::string programBuildLog(programBuildLogBufferSize, '\0');
CL_ASSERT(clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, programBuildLogBufferSize, &programBuildLog[0], nullptr));
std::clog << programBuildLog << std::endl;
return 1;
}
vectorAddKernel = clCreateKernel(program, "vectorAdd", &err);
CL_ASSERT(err);
commandQueue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);
CL_ASSERT(err);
}
tracyCLCtx = TracyCLContext(context, device);
size_t N = 10 * 1024 * 1024 / sizeof(float); // 10MB of floats
std::vector<float> hostA, hostB, hostC;
{
ZoneScopedN("Host Data Init");
hostA.resize(N);
hostB.resize(N);
hostC.resize(N);
std::iota(std::begin(hostA), std::end(hostA), 0);
std::iota(std::begin(hostB), std::end(hostB), 0);
}
{
ZoneScopedN("Host to Device Memory Copy");
bufferA = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(float), nullptr, &err);
CL_ASSERT(err);
bufferB = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(float), nullptr, &err);
CL_ASSERT(err);
bufferC = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(float), nullptr, &err);
CL_ASSERT(err);
cl_event writeBufferAEvent, writeBufferBEvent;
{
ZoneScopedN("Write Buffer A");
TracyCLZoneS(tracyCLCtx, "Write BufferA", 5);
CL_ASSERT(clEnqueueWriteBuffer(commandQueue, bufferA, CL_TRUE, 0, N * sizeof(float), hostA.data(), 0, nullptr, &writeBufferAEvent));
TracyCLZoneSetEvent(writeBufferAEvent);
}
{
ZoneScopedN("Write Buffer B");
TracyCLZone(tracyCLCtx, "Write BufferB");
CL_ASSERT(clEnqueueWriteBuffer(commandQueue, bufferB, CL_TRUE, 0, N * sizeof(float), hostB.data(), 0, nullptr, &writeBufferBEvent));
TracyCLZoneSetEvent(writeBufferBEvent);
}
}
for (int i = 0; i < 10; ++i)
{
ZoneScopedN("VectorAdd Kernel Launch");
TracyCLZoneC(tracyCLCtx, "VectorAdd Kernel", tracy::Color::Blue4);
CL_ASSERT(clSetKernelArg(vectorAddKernel, 0, sizeof(cl_mem), &bufferC));
CL_ASSERT(clSetKernelArg(vectorAddKernel, 1, sizeof(cl_mem), &bufferA));
CL_ASSERT(clSetKernelArg(vectorAddKernel, 2, sizeof(cl_mem), &bufferB));
CL_ASSERT(clSetKernelArg(vectorAddKernel, 3, sizeof(int), &static_cast<int>(N)));
cl_event vectorAddKernelEvent;
CL_ASSERT(clEnqueueNDRangeKernel(commandQueue, vectorAddKernel, 1, nullptr, &N, nullptr, 0, nullptr, &vectorAddKernelEvent));
CL_ASSERT(clWaitForEvents(1, &vectorAddKernelEvent));
TracyCLZoneSetEvent(vectorAddKernelEvent);
cl_ulong kernelStartTime, kernelEndTime;
CL_ASSERT(clGetEventProfilingInfo(vectorAddKernelEvent, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &kernelStartTime, nullptr));
CL_ASSERT(clGetEventProfilingInfo(vectorAddKernelEvent, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &kernelEndTime, nullptr));
std::cout << "VectorAdd Kernel Elapsed: " << ((kernelEndTime - kernelStartTime) / 1000) << " us" << std::endl;
}
{
ZoneScopedN("Device to Host Memory Copy");
TracyCLZone(tracyCLCtx, "Read Buffer C");
cl_event readbufferCEvent;
CL_ASSERT(clEnqueueReadBuffer(commandQueue, bufferC, CL_TRUE, 0, N * sizeof(float), hostC.data(), 0, nullptr, &readbufferCEvent));
TracyCLZoneSetEvent(readbufferCEvent);
}
CL_ASSERT(clFinish(commandQueue));
TracyCLCollect(tracyCLCtx);
{
ZoneScopedN("Checking results");
for (int i = 0; i < N; ++i)
{
assert(hostC[i] == hostA[i] + hostB[i]);
}
}
std::cout << "Results are correct!" << std::endl;
TracyCLDestroy(tracyCLCtx);
return 0;
}

1
examples/ToyPathTracer/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
Windows/Compiled*Shader.h

View File

@@ -0,0 +1,4 @@
https://github.com/aras-p/ToyPathTracer
Modified to render only 10 frames. Client part requires 12 GB, server part
requires 6.4 GB.

View File

@@ -0,0 +1,33 @@
#if defined(__APPLE__) && !defined(__METAL_VERSION__)
#include <TargetConditionals.h>
#endif
#define kBackbufferWidth 1280
#define kBackbufferHeight 720
#if defined(__EMSCRIPTEN__)
#define CPU_CAN_DO_SIMD 0
#define CPU_CAN_DO_THREADS 0
#else
#define CPU_CAN_DO_SIMD 1
#define CPU_CAN_DO_THREADS 1
#endif
#define DO_SAMPLES_PER_PIXEL 4
#define DO_ANIMATE_SMOOTHING 0.9f
#define DO_LIGHT_SAMPLING 1
#define DO_MITSUBA_COMPARE 0
// Should path tracing be done on the GPU with a compute shader?
#define DO_COMPUTE_GPU 0
#define kCSGroupSizeX 8
#define kCSGroupSizeY 8
#define kCSMaxObjects 64
// Should float3 struct use SSE/NEON?
#define DO_FLOAT3_WITH_SIMD (!(DO_COMPUTE_GPU) && CPU_CAN_DO_SIMD && 1)
// Should HitSpheres function use SSE/NEON?
#define DO_HIT_SPHERES_SIMD (CPU_CAN_DO_SIMD && 1)

View File

@@ -0,0 +1,192 @@
#pragma once
#if defined(_MSC_VER)
#define VM_INLINE __forceinline
#else
#define VM_INLINE __attribute__((unused, always_inline, nodebug)) inline
#endif
#define kSimdWidth 4
#if !defined(__arm__) && !defined(__arm64__) && !defined(__EMSCRIPTEN__)
// ---- SSE implementation
#include <xmmintrin.h>
#include <emmintrin.h>
#include <smmintrin.h>
#define SHUFFLE4(V, X,Y,Z,W) float4(_mm_shuffle_ps((V).m, (V).m, _MM_SHUFFLE(W,Z,Y,X)))
struct float4
{
VM_INLINE float4() {}
VM_INLINE explicit float4(const float *p) { m = _mm_loadu_ps(p); }
VM_INLINE explicit float4(float x, float y, float z, float w) { m = _mm_set_ps(w, z, y, x); }
VM_INLINE explicit float4(float v) { m = _mm_set_ps1(v); }
VM_INLINE explicit float4(__m128 v) { m = v; }
VM_INLINE float getX() const { return _mm_cvtss_f32(m); }
VM_INLINE float getY() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 1, 1, 1))); }
VM_INLINE float getZ() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 2, 2, 2))); }
VM_INLINE float getW() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(3, 3, 3, 3))); }
__m128 m;
};
typedef float4 bool4;
VM_INLINE float4 operator+ (float4 a, float4 b) { a.m = _mm_add_ps(a.m, b.m); return a; }
VM_INLINE float4 operator- (float4 a, float4 b) { a.m = _mm_sub_ps(a.m, b.m); return a; }
VM_INLINE float4 operator* (float4 a, float4 b) { a.m = _mm_mul_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator==(float4 a, float4 b) { a.m = _mm_cmpeq_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator!=(float4 a, float4 b) { a.m = _mm_cmpneq_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator< (float4 a, float4 b) { a.m = _mm_cmplt_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator> (float4 a, float4 b) { a.m = _mm_cmpgt_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator<=(float4 a, float4 b) { a.m = _mm_cmple_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator>=(float4 a, float4 b) { a.m = _mm_cmpge_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator&(bool4 a, bool4 b) { a.m = _mm_and_ps(a.m, b.m); return a; }
VM_INLINE bool4 operator|(bool4 a, bool4 b) { a.m = _mm_or_ps(a.m, b.m); return a; }
VM_INLINE float4 operator- (float4 a) { a.m = _mm_xor_ps(a.m, _mm_set1_ps(-0.0f)); return a; }
VM_INLINE float4 min(float4 a, float4 b) { a.m = _mm_min_ps(a.m, b.m); return a; }
VM_INLINE float4 max(float4 a, float4 b) { a.m = _mm_max_ps(a.m, b.m); return a; }
VM_INLINE float hmin(float4 v)
{
v = min(v, SHUFFLE4(v, 2, 3, 0, 0));
v = min(v, SHUFFLE4(v, 1, 0, 0, 0));
return v.getX();
}
// Returns a 4-bit code where bit0..bit3 is X..W
VM_INLINE unsigned mask(float4 v) { return _mm_movemask_ps(v.m); }
// Once we have a comparison, we can branch based on its results:
VM_INLINE bool any(bool4 v) { return mask(v) != 0; }
VM_INLINE bool all(bool4 v) { return mask(v) == 15; }
// "select", i.e. hibit(cond) ? b : a
// on SSE4.1 and up this can be done easily via "blend" instruction;
// on older SSEs has to do a bunch of hoops, see
// https://fgiesen.wordpress.com/2016/04/03/sse-mind-the-gap/
VM_INLINE float4 select(float4 a, float4 b, bool4 cond)
{
#if defined(__SSE4_1__) || defined(_MSC_VER) // on windows assume we always have SSE4.1
a.m = _mm_blendv_ps(a.m, b.m, cond.m);
#else
__m128 d = _mm_castsi128_ps(_mm_srai_epi32(_mm_castps_si128(cond.m), 31));
a.m = _mm_or_ps(_mm_and_ps(d, b.m), _mm_andnot_ps(d, a.m));
#endif
return a;
}
VM_INLINE __m128i select(__m128i a, __m128i b, bool4 cond)
{
#if defined(__SSE4_1__) || defined(_MSC_VER) // on windows assume we always have SSE4.1
return _mm_blendv_epi8(a, b, _mm_castps_si128(cond.m));
#else
__m128i d = _mm_srai_epi32(_mm_castps_si128(cond.m), 31);
return _mm_or_si128(_mm_and_si128(d, b), _mm_andnot_si128(d, a));
#endif
}
VM_INLINE float4 sqrtf(float4 v) { return float4(_mm_sqrt_ps(v.m)); }
#elif !defined(__EMSCRIPTEN__)
// ---- NEON implementation
#define USE_NEON 1
#include <arm_neon.h>
struct float4
{
VM_INLINE float4() {}
VM_INLINE explicit float4(const float *p) { m = vld1q_f32(p); }
VM_INLINE explicit float4(float x, float y, float z, float w) { float v[4] = {x, y, z, w}; m = vld1q_f32(v); }
VM_INLINE explicit float4(float v) { m = vdupq_n_f32(v); }
VM_INLINE explicit float4(float32x4_t v) { m = v; }
VM_INLINE float getX() const { return vgetq_lane_f32(m, 0); }
VM_INLINE float getY() const { return vgetq_lane_f32(m, 1); }
VM_INLINE float getZ() const { return vgetq_lane_f32(m, 2); }
VM_INLINE float getW() const { return vgetq_lane_f32(m, 3); }
float32x4_t m;
};
typedef float4 bool4;
VM_INLINE float4 operator+ (float4 a, float4 b) { a.m = vaddq_f32(a.m, b.m); return a; }
VM_INLINE float4 operator- (float4 a, float4 b) { a.m = vsubq_f32(a.m, b.m); return a; }
VM_INLINE float4 operator* (float4 a, float4 b) { a.m = vmulq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator==(float4 a, float4 b) { a.m = vceqq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator!=(float4 a, float4 b) { a.m = a.m = vmvnq_u32(vceqq_f32(a.m, b.m)); return a; }
VM_INLINE bool4 operator< (float4 a, float4 b) { a.m = vcltq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator> (float4 a, float4 b) { a.m = vcgtq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator<=(float4 a, float4 b) { a.m = vcleq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator>=(float4 a, float4 b) { a.m = vcgeq_f32(a.m, b.m); return a; }
VM_INLINE bool4 operator&(bool4 a, bool4 b) { a.m = vandq_u32(a.m, b.m); return a; }
VM_INLINE bool4 operator|(bool4 a, bool4 b) { a.m = vorrq_u32(a.m, b.m); return a; }
VM_INLINE float4 operator- (float4 a) { a.m = vnegq_f32(a.m); return a; }
VM_INLINE float4 min(float4 a, float4 b) { a.m = vminq_f32(a.m, b.m); return a; }
VM_INLINE float4 max(float4 a, float4 b) { a.m = vmaxq_f32(a.m, b.m); return a; }
VM_INLINE float hmin(float4 v)
{
float32x2_t minOfHalfs = vpmin_f32(vget_low_f32(v.m), vget_high_f32(v.m));
float32x2_t minOfMinOfHalfs = vpmin_f32(minOfHalfs, minOfHalfs);
return vget_lane_f32(minOfMinOfHalfs, 0);
}
// Returns a 4-bit code where bit0..bit3 is X..W
VM_INLINE unsigned mask(float4 v)
{
static const uint32x4_t movemask = { 1, 2, 4, 8 };
static const uint32x4_t highbit = { 0x80000000, 0x80000000, 0x80000000, 0x80000000 };
uint32x4_t t0 = vreinterpretq_u32_f32(v.m);
uint32x4_t t1 = vtstq_u32(t0, highbit);
uint32x4_t t2 = vandq_u32(t1, movemask);
uint32x2_t t3 = vorr_u32(vget_low_u32(t2), vget_high_u32(t2));
return vget_lane_u32(t3, 0) | vget_lane_u32(t3, 1);
}
// Once we have a comparison, we can branch based on its results:
VM_INLINE bool any(bool4 v) { return mask(v) != 0; }
VM_INLINE bool all(bool4 v) { return mask(v) == 15; }
// "select", i.e. hibit(cond) ? b : a
// on SSE4.1 and up this can be done easily via "blend" instruction;
// on older SSEs has to do a bunch of hoops, see
// https://fgiesen.wordpress.com/2016/04/03/sse-mind-the-gap/
VM_INLINE float4 select(float4 a, float4 b, bool4 cond)
{
a.m = vbslq_f32(cond.m, b.m, a.m);
return a;
}
VM_INLINE int32x4_t select(int32x4_t a, int32x4_t b, bool4 cond)
{
return vbslq_f32(cond.m, b, a);
}
VM_INLINE float4 sqrtf(float4 v)
{
float32x4_t V = v.m;
float32x4_t S0 = vrsqrteq_f32(V);
float32x4_t P0 = vmulq_f32( V, S0 );
float32x4_t R0 = vrsqrtsq_f32( P0, S0 );
float32x4_t S1 = vmulq_f32( S0, R0 );
float32x4_t P1 = vmulq_f32( V, S1 );
float32x4_t R1 = vrsqrtsq_f32( P1, S1 );
float32x4_t S2 = vmulq_f32( S1, R1 );
float32x4_t P2 = vmulq_f32( V, S2 );
float32x4_t R2 = vrsqrtsq_f32( P2, S2 );
float32x4_t S3 = vmulq_f32( S2, R2 );
return float4(vmulq_f32(V, S3));
}
VM_INLINE float4 splatX(float32x4_t v) { return float4(vdupq_lane_f32(vget_low_f32(v), 0)); }
VM_INLINE float4 splatY(float32x4_t v) { return float4(vdupq_lane_f32(vget_low_f32(v), 1)); }
VM_INLINE float4 splatZ(float32x4_t v) { return float4(vdupq_lane_f32(vget_high_f32(v), 0)); }
VM_INLINE float4 splatW(float32x4_t v) { return float4(vdupq_lane_f32(vget_high_f32(v), 1)); }
#endif

View File

@@ -0,0 +1,203 @@
#include "Maths.h"
#include <stdlib.h>
#include <stdint.h>
static uint32_t XorShift32(uint32_t& state)
{
uint32_t x = state;
x ^= x << 13;
x ^= x >> 17;
x ^= x << 15;
state = x;
return x;
}
float RandomFloat01(uint32_t& state)
{
return (XorShift32(state) & 0xFFFFFF) / 16777216.0f;
}
float3 RandomInUnitDisk(uint32_t& state)
{
float3 p;
do
{
p = 2.0 * float3(RandomFloat01(state),RandomFloat01(state),0) - float3(1,1,0);
} while (dot(p,p) >= 1.0);
return p;
}
float3 RandomInUnitSphere(uint32_t& state)
{
float3 p;
do {
p = 2.0*float3(RandomFloat01(state),RandomFloat01(state),RandomFloat01(state)) - float3(1,1,1);
} while (sqLength(p) >= 1.0);
return p;
}
float3 RandomUnitVector(uint32_t& state)
{
float z = RandomFloat01(state) * 2.0f - 1.0f;
float a = RandomFloat01(state) * 2.0f * kPI;
float r = sqrtf(1.0f - z * z);
float x = r * cosf(a);
float y = r * sinf(a);
return float3(x, y, z);
}
int HitSpheres(const Ray& r, const SpheresSoA& spheres, float tMin, float tMax, Hit& outHit)
{
#if DO_HIT_SPHERES_SIMD
float4 hitT = float4(tMax);
#if USE_NEON
int32x4_t id = vdupq_n_s32(-1);
#else
__m128i id = _mm_set1_epi32(-1);
#endif
#if DO_FLOAT3_WITH_SIMD && !USE_NEON
float4 rOrigX = SHUFFLE4(r.orig, 0, 0, 0, 0);
float4 rOrigY = SHUFFLE4(r.orig, 1, 1, 1, 1);
float4 rOrigZ = SHUFFLE4(r.orig, 2, 2, 2, 2);
float4 rDirX = SHUFFLE4(r.dir, 0, 0, 0, 0);
float4 rDirY = SHUFFLE4(r.dir, 1, 1, 1, 1);
float4 rDirZ = SHUFFLE4(r.dir, 2, 2, 2, 2);
#elif DO_FLOAT3_WITH_SIMD
float4 rOrigX = splatX(r.orig.m);
float4 rOrigY = splatY(r.orig.m);
float4 rOrigZ = splatZ(r.orig.m);
float4 rDirX = splatX(r.dir.m);
float4 rDirY = splatY(r.dir.m);
float4 rDirZ = splatZ(r.dir.m);
#else
float4 rOrigX = float4(r.orig.x);
float4 rOrigY = float4(r.orig.y);
float4 rOrigZ = float4(r.orig.z);
float4 rDirX = float4(r.dir.x);
float4 rDirY = float4(r.dir.y);
float4 rDirZ = float4(r.dir.z);
#endif
float4 tMin4 = float4(tMin);
#if USE_NEON
int32x4_t curId = vcombine_u32(vcreate_u32(0ULL | (1ULL<<32)), vcreate_u32(2ULL | (3ULL<<32)));
#else
__m128i curId = _mm_set_epi32(3, 2, 1, 0);
#endif
// process 4 spheres at once
for (int i = 0; i < spheres.simdCount; i += kSimdWidth)
{
// load data for 4 spheres
float4 sCenterX = float4(spheres.centerX + i);
float4 sCenterY = float4(spheres.centerY + i);
float4 sCenterZ = float4(spheres.centerZ + i);
float4 sSqRadius = float4(spheres.sqRadius + i);
// note: we flip this vector and calculate -b (nb) since that happens to be slightly preferable computationally
float4 coX = sCenterX - rOrigX;
float4 coY = sCenterY - rOrigY;
float4 coZ = sCenterZ - rOrigZ;
float4 nb = coX * rDirX + coY * rDirY + coZ * rDirZ;
float4 c = coX * coX + coY * coY + coZ * coZ - sSqRadius;
float4 discr = nb * nb - c;
bool4 discrPos = discr > float4(0.0f);
// if ray hits any of the 4 spheres
if (any(discrPos))
{
float4 discrSq = sqrtf(discr);
// ray could hit spheres at t0 & t1
float4 t0 = nb - discrSq;
float4 t1 = nb + discrSq;
float4 t = select(t1, t0, t0 > tMin4); // if t0 is above min, take it (since it's the earlier hit); else try t1.
bool4 msk = discrPos & (t > tMin4) & (t < hitT);
// if hit, take it
id = select(id, curId, msk);
hitT = select(hitT, t, msk);
}
#if USE_NEON
curId = vaddq_s32(curId, vdupq_n_s32(kSimdWidth));
#else
curId = _mm_add_epi32(curId, _mm_set1_epi32(kSimdWidth));
#endif
}
// now we have up to 4 hits, find and return closest one
float minT = hmin(hitT);
if (minT < tMax) // any actual hits?
{
int minMask = mask(hitT == float4(minT));
if (minMask != 0)
{
int id_scalar[4];
float hitT_scalar[4];
#if USE_NEON
vst1q_s32(id_scalar, id);
vst1q_f32(hitT_scalar, hitT.m);
#else
_mm_storeu_si128((__m128i *)id_scalar, id);
_mm_storeu_ps(hitT_scalar, hitT.m);
#endif
// In general, you would do this with a bit scan (first set/trailing zero count).
// But who cares, it's only 16 options.
static const int laneId[16] =
{
0, 0, 1, 0, // 00xx
2, 0, 1, 0, // 01xx
3, 0, 1, 0, // 10xx
2, 0, 1, 0, // 11xx
};
int lane = laneId[minMask];
int hitId = id_scalar[lane];
float finalHitT = hitT_scalar[lane];
outHit.pos = r.pointAt(finalHitT);
outHit.normal = (outHit.pos - float3(spheres.centerX[hitId], spheres.centerY[hitId], spheres.centerZ[hitId])) * spheres.invRadius[hitId];
outHit.t = finalHitT;
return hitId;
}
}
return -1;
#else // #if DO_HIT_SPHERES_SIMD
float hitT = tMax;
int id = -1;
for (int i = 0; i < spheres.count; ++i)
{
float coX = spheres.centerX[i] - r.orig.getX();
float coY = spheres.centerY[i] - r.orig.getY();
float coZ = spheres.centerZ[i] - r.orig.getZ();
float nb = coX * r.dir.getX() + coY * r.dir.getY() + coZ * r.dir.getZ();
float c = coX * coX + coY * coY + coZ * coZ - spheres.sqRadius[i];
float discr = nb * nb - c;
if (discr > 0)
{
float discrSq = sqrtf(discr);
// Try earlier t
float t = nb - discrSq;
if (t <= tMin) // before min, try later t!
t = nb + discrSq;
if (t > tMin && t < hitT)
{
id = i;
hitT = t;
}
}
}
if (id != -1)
{
outHit.pos = r.pointAt(hitT);
outHit.normal = (outHit.pos - float3(spheres.centerX[id], spheres.centerY[id], spheres.centerZ[id])) * spheres.invRadius[id];
outHit.t = hitT;
return id;
}
else
return -1;
#endif // #else of #if DO_HIT_SPHERES_SIMD
}

View File

@@ -0,0 +1,436 @@
#pragma once
#include <math.h>
#include <assert.h>
#include <stdint.h>
#include "Config.h"
#include "MathSimd.h"
#define kPI 3.1415926f
// SSE/SIMD vector largely based on http://www.codersnotes.com/notes/maths-lib-2016/
#if DO_FLOAT3_WITH_SIMD
#if !defined(__arm__) && !defined(__arm64__)
// ---- SSE implementation
// SHUFFLE3(v, 0,1,2) leaves the vector unchanged (v.xyz).
// SHUFFLE3(v, 0,0,0) splats the X (v.xxx).
#define SHUFFLE3(V, X,Y,Z) float3(_mm_shuffle_ps((V).m, (V).m, _MM_SHUFFLE(Z,Z,Y,X)))
struct float3
{
VM_INLINE float3() {}
VM_INLINE explicit float3(const float *p) { m = _mm_set_ps(p[2], p[2], p[1], p[0]); }
VM_INLINE explicit float3(float x, float y, float z) { m = _mm_set_ps(z, z, y, x); }
VM_INLINE explicit float3(float v) { m = _mm_set1_ps(v); }
VM_INLINE explicit float3(__m128 v) { m = v; }
VM_INLINE float getX() const { return _mm_cvtss_f32(m); }
VM_INLINE float getY() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 1, 1, 1))); }
VM_INLINE float getZ() const { return _mm_cvtss_f32(_mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 2, 2, 2))); }
VM_INLINE float3 yzx() const { return SHUFFLE3(*this, 1, 2, 0); }
VM_INLINE float3 zxy() const { return SHUFFLE3(*this, 2, 0, 1); }
VM_INLINE void store(float *p) const { p[0] = getX(); p[1] = getY(); p[2] = getZ(); }
void setX(float x)
{
m = _mm_move_ss(m, _mm_set_ss(x));
}
void setY(float y)
{
__m128 t = _mm_move_ss(m, _mm_set_ss(y));
t = _mm_shuffle_ps(t, t, _MM_SHUFFLE(3, 2, 0, 0));
m = _mm_move_ss(t, m);
}
void setZ(float z)
{
__m128 t = _mm_move_ss(m, _mm_set_ss(z));
t = _mm_shuffle_ps(t, t, _MM_SHUFFLE(3, 0, 1, 0));
m = _mm_move_ss(t, m);
}
__m128 m;
};
typedef float3 bool3;
VM_INLINE float3 operator+ (float3 a, float3 b) { a.m = _mm_add_ps(a.m, b.m); return a; }
VM_INLINE float3 operator- (float3 a, float3 b) { a.m = _mm_sub_ps(a.m, b.m); return a; }
VM_INLINE float3 operator* (float3 a, float3 b) { a.m = _mm_mul_ps(a.m, b.m); return a; }
VM_INLINE float3 operator/ (float3 a, float3 b) { a.m = _mm_div_ps(a.m, b.m); return a; }
VM_INLINE float3 operator* (float3 a, float b) { a.m = _mm_mul_ps(a.m, _mm_set1_ps(b)); return a; }
VM_INLINE float3 operator/ (float3 a, float b) { a.m = _mm_div_ps(a.m, _mm_set1_ps(b)); return a; }
VM_INLINE float3 operator* (float a, float3 b) { b.m = _mm_mul_ps(_mm_set1_ps(a), b.m); return b; }
VM_INLINE float3 operator/ (float a, float3 b) { b.m = _mm_div_ps(_mm_set1_ps(a), b.m); return b; }
VM_INLINE float3& operator+= (float3 &a, float3 b) { a = a + b; return a; }
VM_INLINE float3& operator-= (float3 &a, float3 b) { a = a - b; return a; }
VM_INLINE float3& operator*= (float3 &a, float3 b) { a = a * b; return a; }
VM_INLINE float3& operator/= (float3 &a, float3 b) { a = a / b; return a; }
VM_INLINE float3& operator*= (float3 &a, float b) { a = a * b; return a; }
VM_INLINE float3& operator/= (float3 &a, float b) { a = a / b; return a; }
VM_INLINE bool3 operator==(float3 a, float3 b) { a.m = _mm_cmpeq_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator!=(float3 a, float3 b) { a.m = _mm_cmpneq_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator< (float3 a, float3 b) { a.m = _mm_cmplt_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator> (float3 a, float3 b) { a.m = _mm_cmpgt_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator<=(float3 a, float3 b) { a.m = _mm_cmple_ps(a.m, b.m); return a; }
VM_INLINE bool3 operator>=(float3 a, float3 b) { a.m = _mm_cmpge_ps(a.m, b.m); return a; }
VM_INLINE float3 min(float3 a, float3 b) { a.m = _mm_min_ps(a.m, b.m); return a; }
VM_INLINE float3 max(float3 a, float3 b) { a.m = _mm_max_ps(a.m, b.m); return a; }
VM_INLINE float3 operator- (float3 a) { return float3(_mm_setzero_ps()) - a; }
VM_INLINE float hmin(float3 v)
{
v = min(v, SHUFFLE3(v, 1, 0, 2));
return min(v, SHUFFLE3(v, 2, 0, 1)).getX();
}
VM_INLINE float hmax(float3 v)
{
v = max(v, SHUFFLE3(v, 1, 0, 2));
return max(v, SHUFFLE3(v, 2, 0, 1)).getX();
}
VM_INLINE float3 cross(float3 a, float3 b)
{
// x <- a.y*b.z - a.z*b.y
// y <- a.z*b.x - a.x*b.z
// z <- a.x*b.y - a.y*b.x
// We can save a shuffle by grouping it in this wacky order:
return (a.zxy()*b - a*b.zxy()).zxy();
}
// Returns a 3-bit code where bit0..bit2 is X..Z
VM_INLINE unsigned mask(float3 v) { return _mm_movemask_ps(v.m) & 7; }
// Once we have a comparison, we can branch based on its results:
VM_INLINE bool any(bool3 v) { return mask(v) != 0; }
VM_INLINE bool all(bool3 v) { return mask(v) == 7; }
VM_INLINE float3 clamp(float3 t, float3 a, float3 b) { return min(max(t, a), b); }
VM_INLINE float sum(float3 v) { return v.getX() + v.getY() + v.getZ(); }
VM_INLINE float dot(float3 a, float3 b) { return sum(a*b); }
#else // #if !defined(__arm__) && !defined(__arm64__)
// ---- NEON implementation
#include <arm_neon.h>
struct float3
{
VM_INLINE float3() {}
VM_INLINE explicit float3(const float *p) { float v[4] = {p[0], p[1], p[2], 0}; m = vld1q_f32(v); }
VM_INLINE explicit float3(float x, float y, float z) { float v[4] = {x, y, z, 0}; m = vld1q_f32(v); }
VM_INLINE explicit float3(float v) { m = vdupq_n_f32(v); }
VM_INLINE explicit float3(float32x4_t v) { m = v; }
VM_INLINE float getX() const { return vgetq_lane_f32(m, 0); }
VM_INLINE float getY() const { return vgetq_lane_f32(m, 1); }
VM_INLINE float getZ() const { return vgetq_lane_f32(m, 2); }
VM_INLINE float3 yzx() const
{
float32x2_t low = vget_low_f32(m);
float32x4_t yzx = vcombine_f32(vext_f32(low, vget_high_f32(m), 1), low);
return float3(yzx);
}
VM_INLINE float3 zxy() const
{
float32x4_t p = m;
p = vuzpq_f32(vreinterpretq_f32_s32(vextq_s32(vreinterpretq_s32_f32(p), vreinterpretq_s32_f32(p), 1)), p).val[1];
return float3(p);
}
VM_INLINE void store(float *p) const { p[0] = getX(); p[1] = getY(); p[2] = getZ(); }
void setX(float x)
{
m = vsetq_lane_f32(x, m, 0);
}
void setY(float y)
{
m = vsetq_lane_f32(y, m, 1);
}
void setZ(float z)
{
m = vsetq_lane_f32(z, m, 2);
}
float32x4_t m;
};
typedef float3 bool3;
VM_INLINE float32x4_t rcp_2(float32x4_t v)
{
float32x4_t e = vrecpeq_f32(v);
e = vmulq_f32(vrecpsq_f32(e, v), e);
e = vmulq_f32(vrecpsq_f32(e, v), e);
return e;
}
VM_INLINE float3 operator+ (float3 a, float3 b) { a.m = vaddq_f32(a.m, b.m); return a; }
VM_INLINE float3 operator- (float3 a, float3 b) { a.m = vsubq_f32(a.m, b.m); return a; }
VM_INLINE float3 operator* (float3 a, float3 b) { a.m = vmulq_f32(a.m, b.m); return a; }
VM_INLINE float3 operator/ (float3 a, float3 b) { float32x4_t recip = rcp_2(b.m); a.m = vmulq_f32(a.m, recip); return a; }
VM_INLINE float3 operator* (float3 a, float b) { a.m = vmulq_f32(a.m, vdupq_n_f32(b)); return a; }
VM_INLINE float3 operator/ (float3 a, float b) { float32x4_t recip = rcp_2(vdupq_n_f32(b)); a.m = vmulq_f32(a.m, recip); return a; }
VM_INLINE float3 operator* (float a, float3 b) { b.m = vmulq_f32(vdupq_n_f32(a), b.m); return b; }
VM_INLINE float3 operator/ (float a, float3 b) { float32x4_t recip = rcp_2(b.m); b.m = vmulq_f32(vdupq_n_f32(a), recip); return b; }
VM_INLINE float3& operator+= (float3 &a, float3 b) { a = a + b; return a; }
VM_INLINE float3& operator-= (float3 &a, float3 b) { a = a - b; return a; }
VM_INLINE float3& operator*= (float3 &a, float3 b) { a = a * b; return a; }
VM_INLINE float3& operator/= (float3 &a, float3 b) { a = a / b; return a; }
VM_INLINE float3& operator*= (float3 &a, float b) { a = a * b; return a; }
VM_INLINE float3& operator/= (float3 &a, float b) { a = a / b; return a; }
VM_INLINE bool3 operator==(float3 a, float3 b) { a.m = vceqq_f32(a.m, b.m); return a; }
VM_INLINE bool3 operator!=(float3 a, float3 b) { a.m = vmvnq_u32(vceqq_f32(a.m, b.m)); return a; }
VM_INLINE bool3 operator< (float3 a, float3 b) { a.m = vcltq_f32(a.m, b.m); return a; }
VM_INLINE bool3 operator> (float3 a, float3 b) { a.m = vcgtq_f32(a.m, b.m); return a; }
VM_INLINE bool3 operator<=(float3 a, float3 b) { a.m = vcleq_f32(a.m, b.m); return a; }
VM_INLINE bool3 operator>=(float3 a, float3 b) { a.m = vcgeq_f32(a.m, b.m); return a; }
VM_INLINE float3 min(float3 a, float3 b) { a.m = vminq_f32(a.m, b.m); return a; }
VM_INLINE float3 max(float3 a, float3 b) { a.m = vmaxq_f32(a.m, b.m); return a; }
VM_INLINE float3 operator- (float3 a) { a.m = vnegq_f32(a.m); return a; }
VM_INLINE float hmin(float3 v)
{
float32x2_t minOfHalfs = vpmin_f32(vget_low_f32(v.m), vget_high_f32(v.m));
float32x2_t minOfMinOfHalfs = vpmin_f32(minOfHalfs, minOfHalfs);
return vget_lane_f32(minOfMinOfHalfs, 0);
}
VM_INLINE float hmax(float3 v)
{
float32x2_t maxOfHalfs = vpmax_f32(vget_low_f32(v.m), vget_high_f32(v.m));
float32x2_t maxOfMaxOfHalfs = vpmax_f32(maxOfHalfs, maxOfHalfs);
return vget_lane_f32(maxOfMaxOfHalfs, 0);
}
VM_INLINE float3 cross(float3 a, float3 b)
{
// x <- a.y*b.z - a.z*b.y
// y <- a.z*b.x - a.x*b.z
// z <- a.x*b.y - a.y*b.x
// We can save a shuffle by grouping it in this wacky order:
return (a.zxy()*b - a*b.zxy()).zxy();
}
// Returns a 3-bit code where bit0..bit2 is X..Z
VM_INLINE unsigned mask(float3 v)
{
static const uint32x4_t movemask = { 1, 2, 4, 8 };
static const uint32x4_t highbit = { 0x80000000, 0x80000000, 0x80000000, 0x80000000 };
uint32x4_t t0 = vreinterpretq_u32_f32(v.m);
uint32x4_t t1 = vtstq_u32(t0, highbit);
uint32x4_t t2 = vandq_u32(t1, movemask);
uint32x2_t t3 = vorr_u32(vget_low_u32(t2), vget_high_u32(t2));
return vget_lane_u32(t3, 0) | vget_lane_u32(t3, 1);
}
// Once we have a comparison, we can branch based on its results:
VM_INLINE bool any(bool3 v) { return mask(v) != 0; }
VM_INLINE bool all(bool3 v) { return mask(v) == 7; }
VM_INLINE float3 clamp(float3 t, float3 a, float3 b) { return min(max(t, a), b); }
VM_INLINE float sum(float3 v) { return v.getX() + v.getY() + v.getZ(); }
VM_INLINE float dot(float3 a, float3 b) { return sum(a*b); }
#endif // #else of #if !defined(__arm__) && !defined(__arm64__)
#else // #if DO_FLOAT3_WITH_SIMD
// ---- Simple scalar C implementation
struct float3
{
float3() : x(0), y(0), z(0) {}
float3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
float3 operator-() const { return float3(-x, -y, -z); }
float3& operator+=(const float3& o) { x+=o.x; y+=o.y; z+=o.z; return *this; }
float3& operator-=(const float3& o) { x-=o.x; y-=o.y; z-=o.z; return *this; }
float3& operator*=(const float3& o) { x*=o.x; y*=o.y; z*=o.z; return *this; }
float3& operator*=(float o) { x*=o; y*=o; z*=o; return *this; }
VM_INLINE float getX() const { return x; }
VM_INLINE float getY() const { return y; }
VM_INLINE float getZ() const { return z; }
VM_INLINE void setX(float x_) { x = x_; }
VM_INLINE void setY(float y_) { y = y_; }
VM_INLINE void setZ(float z_) { z = z_; }
VM_INLINE void store(float *p) const { p[0] = getX(); p[1] = getY(); p[2] = getZ(); }
float x, y, z;
};
VM_INLINE float3 operator+(const float3& a, const float3& b) { return float3(a.x+b.x,a.y+b.y,a.z+b.z); }
VM_INLINE float3 operator-(const float3& a, const float3& b) { return float3(a.x-b.x,a.y-b.y,a.z-b.z); }
VM_INLINE float3 operator*(const float3& a, const float3& b) { return float3(a.x*b.x,a.y*b.y,a.z*b.z); }
VM_INLINE float3 operator*(const float3& a, float b) { return float3(a.x*b,a.y*b,a.z*b); }
VM_INLINE float3 operator*(float a, const float3& b) { return float3(a*b.x,a*b.y,a*b.z); }
VM_INLINE float dot(const float3& a, const float3& b) { return a.x*b.x+a.y*b.y+a.z*b.z; }
VM_INLINE float3 cross(const float3& a, const float3& b)
{
return float3(
a.y*b.z - a.z*b.y,
-(a.x*b.z - a.z*b.x),
a.x*b.y - a.y*b.x
);
}
#endif // #else of #if DO_FLOAT3_WITH_SIMD
VM_INLINE float length(float3 v) { return sqrtf(dot(v, v)); }
VM_INLINE float sqLength(float3 v) { return dot(v, v); }
VM_INLINE float3 normalize(float3 v) { return v * (1.0f / length(v)); }
VM_INLINE float3 lerp(float3 a, float3 b, float t) { return a + (b-a)*t; }
inline void AssertUnit(float3 v)
{
assert(fabsf(sqLength(v) - 1.0f) < 0.01f);
}
inline float3 reflect(float3 v, float3 n)
{
return v - 2*dot(v,n)*n;
}
inline bool refract(float3 v, float3 n, float nint, float3& outRefracted)
{
AssertUnit(v);
float dt = dot(v, n);
float discr = 1.0f - nint*nint*(1-dt*dt);
if (discr > 0)
{
outRefracted = nint * (v - n*dt) - n*sqrtf(discr);
return true;
}
return false;
}
inline float schlick(float cosine, float ri)
{
float r0 = (1-ri) / (1+ri);
r0 = r0*r0;
return r0 + (1-r0)*powf(1-cosine, 5);
}
struct Ray
{
Ray() {}
Ray(float3 orig_, float3 dir_) : orig(orig_), dir(dir_) { AssertUnit(dir); }
float3 pointAt(float t) const { return orig + dir * t; }
float3 orig;
float3 dir;
};
struct Hit
{
float3 pos;
float3 normal;
float t;
};
struct Sphere
{
Sphere() : radius(1.0f), invRadius(0.0f) {}
Sphere(float3 center_, float radius_) : center(center_), radius(radius_), invRadius(0.0f) {}
void UpdateDerivedData() { invRadius = 1.0f/radius; }
float3 center;
float radius;
float invRadius;
};
// data for all spheres in a "structure of arrays" layout
struct SpheresSoA
{
SpheresSoA(int c)
{
count = c;
// we'll be processing spheres in kSimdWidth chunks, so make sure to allocate
// enough space
simdCount = (c + (kSimdWidth - 1)) / kSimdWidth * kSimdWidth;
centerX = new float[simdCount];
centerY = new float[simdCount];
centerZ = new float[simdCount];
sqRadius = new float[simdCount];
invRadius = new float[simdCount];
// set all data to "impossible sphere" state
for (int i = count; i < simdCount; ++i)
{
centerX[i] = centerY[i] = centerZ[i] = 10000.0f;
sqRadius[i] = 0.0f;
invRadius[i] = 0.0f;
}
}
~SpheresSoA()
{
delete[] centerX;
delete[] centerY;
delete[] centerZ;
delete[] sqRadius;
delete[] invRadius;
}
float* centerX;
float* centerY;
float* centerZ;
float* sqRadius;
float* invRadius;
int simdCount;
int count;
};
int HitSpheres(const Ray& r, const SpheresSoA& spheres, float tMin, float tMax, Hit& outHit);
float RandomFloat01(uint32_t& state);
float3 RandomInUnitDisk(uint32_t& state);
float3 RandomInUnitSphere(uint32_t& state);
float3 RandomUnitVector(uint32_t& state);
struct Camera
{
Camera() {}
// vfov is top to bottom in degrees
Camera(const float3& lookFrom, const float3& lookAt, const float3& vup, float vfov, float aspect, float aperture, float focusDist)
{
lensRadius = aperture / 2;
float theta = vfov*kPI/180;
float halfHeight = tanf(theta/2);
float halfWidth = aspect * halfHeight;
origin = lookFrom;
w = normalize(lookFrom - lookAt);
u = normalize(cross(vup, w));
v = cross(w, u);
lowerLeftCorner = origin - halfWidth*focusDist*u - halfHeight*focusDist*v - focusDist*w;
horizontal = 2*halfWidth*focusDist*u;
vertical = 2*halfHeight*focusDist*v;
}
Ray GetRay(float s, float t, uint32_t& state) const
{
float3 rd = lensRadius * RandomInUnitDisk(state);
float3 offset = u * rd.getX() + v * rd.getY();
return Ray(origin + offset, normalize(lowerLeftCorner + s*horizontal + t*vertical - origin - offset));
}
float3 origin;
float3 lowerLeftCorner;
float3 horizontal;
float3 vertical;
float3 u, v, w;
float lensRadius;
};

View File

@@ -0,0 +1,392 @@
#include "Config.h"
#include "Test.h"
#include "Maths.h"
#include <algorithm>
#if CPU_CAN_DO_THREADS
#include "enkiTS/TaskScheduler_c.h"
#include <thread>
#endif
#include <atomic>
#include "../../../Tracy.hpp"
// 46 spheres (2 emissive) when enabled; 9 spheres (1 emissive) when disabled
#define DO_BIG_SCENE 1
static Sphere s_Spheres[] =
{
{float3(0,-100.5,-1), 100},
{float3(2,0,-1), 0.5f},
{float3(0,0,-1), 0.5f},
{float3(-2,0,-1), 0.5f},
{float3(2,0,1), 0.5f},
{float3(0,0,1), 0.5f},
{float3(-2,0,1), 0.5f},
{float3(0.5f,1,0.5f), 0.5f},
{float3(-1.5f,1.5f,0.f), 0.3f},
#if DO_BIG_SCENE
{float3(4,0,-3), 0.5f}, {float3(3,0,-3), 0.5f}, {float3(2,0,-3), 0.5f}, {float3(1,0,-3), 0.5f}, {float3(0,0,-3), 0.5f}, {float3(-1,0,-3), 0.5f}, {float3(-2,0,-3), 0.5f}, {float3(-3,0,-3), 0.5f}, {float3(-4,0,-3), 0.5f},
{float3(4,0,-4), 0.5f}, {float3(3,0,-4), 0.5f}, {float3(2,0,-4), 0.5f}, {float3(1,0,-4), 0.5f}, {float3(0,0,-4), 0.5f}, {float3(-1,0,-4), 0.5f}, {float3(-2,0,-4), 0.5f}, {float3(-3,0,-4), 0.5f}, {float3(-4,0,-4), 0.5f},
{float3(4,0,-5), 0.5f}, {float3(3,0,-5), 0.5f}, {float3(2,0,-5), 0.5f}, {float3(1,0,-5), 0.5f}, {float3(0,0,-5), 0.5f}, {float3(-1,0,-5), 0.5f}, {float3(-2,0,-5), 0.5f}, {float3(-3,0,-5), 0.5f}, {float3(-4,0,-5), 0.5f},
{float3(4,0,-6), 0.5f}, {float3(3,0,-6), 0.5f}, {float3(2,0,-6), 0.5f}, {float3(1,0,-6), 0.5f}, {float3(0,0,-6), 0.5f}, {float3(-1,0,-6), 0.5f}, {float3(-2,0,-6), 0.5f}, {float3(-3,0,-6), 0.5f}, {float3(-4,0,-6), 0.5f},
{float3(1.5f,1.5f,-2), 0.3f},
#endif // #if DO_BIG_SCENE
};
const int kSphereCount = sizeof(s_Spheres) / sizeof(s_Spheres[0]);
static SpheresSoA s_SpheresSoA(kSphereCount);
struct Material
{
enum Type { Lambert, Metal, Dielectric };
Type type;
float3 albedo;
float3 emissive;
float roughness;
float ri;
};
static Material s_SphereMats[kSphereCount] =
{
{ Material::Lambert, float3(0.8f, 0.8f, 0.8f), float3(0,0,0), 0, 0, },
{ Material::Lambert, float3(0.8f, 0.4f, 0.4f), float3(0,0,0), 0, 0, },
{ Material::Lambert, float3(0.4f, 0.8f, 0.4f), float3(0,0,0), 0, 0, },
{ Material::Metal, float3(0.4f, 0.4f, 0.8f), float3(0,0,0), 0, 0 },
{ Material::Metal, float3(0.4f, 0.8f, 0.4f), float3(0,0,0), 0, 0 },
{ Material::Metal, float3(0.4f, 0.8f, 0.4f), float3(0,0,0), 0.2f, 0 },
{ Material::Metal, float3(0.4f, 0.8f, 0.4f), float3(0,0,0), 0.6f, 0 },
{ Material::Dielectric, float3(0.4f, 0.4f, 0.4f), float3(0,0,0), 0, 1.5f },
{ Material::Lambert, float3(0.8f, 0.6f, 0.2f), float3(30,25,15), 0, 0 },
#if DO_BIG_SCENE
{ Material::Lambert, float3(0.1f, 0.1f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.2f, 0.2f, 0.2f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.3f, 0.3f, 0.3f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.4f, 0.4f, 0.4f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.5f, 0.5f, 0.5f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.6f, 0.6f, 0.6f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.7f, 0.7f, 0.7f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.8f, 0.8f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.9f, 0.9f, 0.9f), float3(0,0,0), 0, 0, },
{ Material::Metal, float3(0.1f, 0.1f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.2f, 0.2f, 0.2f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.3f, 0.3f, 0.3f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.4f, 0.4f, 0.4f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.5f, 0.5f, 0.5f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.6f, 0.6f, 0.6f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.7f, 0.7f, 0.7f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.8f, 0.8f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.9f, 0.9f, 0.9f), float3(0,0,0), 0, 0, },
{ Material::Metal, float3(0.8f, 0.1f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.8f, 0.5f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.8f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.4f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.1f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.1f, 0.8f, 0.5f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.1f, 0.8f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.1f, 0.1f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.5f, 0.1f, 0.8f), float3(0,0,0), 0, 0, },
{ Material::Lambert, float3(0.8f, 0.1f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.8f, 0.5f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.8f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.4f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.1f, 0.8f, 0.1f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.1f, 0.8f, 0.5f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.1f, 0.8f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Lambert, float3(0.1f, 0.1f, 0.8f), float3(0,0,0), 0, 0, }, { Material::Metal, float3(0.5f, 0.1f, 0.8f), float3(0,0,0), 0, 0, },
{ Material::Lambert, float3(0.1f, 0.2f, 0.5f), float3(3,10,20), 0, 0 },
#endif
};
static int s_EmissiveSpheres[kSphereCount];
static int s_EmissiveSphereCount;
static Camera s_Cam;
const float kMinT = 0.001f;
const float kMaxT = 1.0e7f;
const int kMaxDepth = 10;
bool HitWorld(const Ray& r, float tMin, float tMax, Hit& outHit, int& outID)
{
outID = HitSpheres(r, s_SpheresSoA, tMin, tMax, outHit);
return outID != -1;
}
static bool Scatter(const Material& mat, const Ray& r_in, const Hit& rec, float3& attenuation, Ray& scattered, float3& outLightE, int& inoutRayCount, uint32_t& state)
{
ZoneScoped;
outLightE = float3(0,0,0);
if (mat.type == Material::Lambert)
{
// random point on unit sphere that is tangent to the hit point
float3 target = rec.pos + rec.normal + RandomUnitVector(state);
scattered = Ray(rec.pos, normalize(target - rec.pos));
attenuation = mat.albedo;
// sample lights
#if DO_LIGHT_SAMPLING
for (int j = 0; j < s_EmissiveSphereCount; ++j)
{
int i = s_EmissiveSpheres[j];
const Material& smat = s_SphereMats[i];
if (&mat == &smat)
continue; // skip self
const Sphere& s = s_Spheres[i];
// create a random direction towards sphere
// coord system for sampling: sw, su, sv
float3 sw = normalize(s.center - rec.pos);
float3 su = normalize(cross(fabs(sw.getX())>0.01f ? float3(0,1,0):float3(1,0,0), sw));
float3 sv = cross(sw, su);
// sample sphere by solid angle
float cosAMax = sqrtf(1.0f - s.radius*s.radius / sqLength(rec.pos-s.center));
float eps1 = RandomFloat01(state), eps2 = RandomFloat01(state);
float cosA = 1.0f - eps1 + eps1 * cosAMax;
float sinA = sqrtf(1.0f - cosA*cosA);
float phi = 2 * kPI * eps2;
float3 l = su * (cosf(phi) * sinA) + sv * (sinf(phi) * sinA) + sw * cosA;
//l = normalize(l); // NOTE(fg): This is already normalized, by construction.
// shoot shadow ray
Hit lightHit;
int hitID;
++inoutRayCount;
if (HitWorld(Ray(rec.pos, l), kMinT, kMaxT, lightHit, hitID) && hitID == i)
{
float omega = 2 * kPI * (1-cosAMax);
float3 rdir = r_in.dir;
AssertUnit(rdir);
float3 nl = dot(rec.normal, rdir) < 0 ? rec.normal : -rec.normal;
outLightE += (mat.albedo * smat.emissive) * (std::max(0.0f, dot(l, nl)) * omega / kPI);
}
}
#endif
return true;
}
else if (mat.type == Material::Metal)
{
AssertUnit(r_in.dir); AssertUnit(rec.normal);
float3 refl = reflect(r_in.dir, rec.normal);
// reflected ray, and random inside of sphere based on roughness
float roughness = mat.roughness;
#if DO_MITSUBA_COMPARE
roughness = 0; // until we get better BRDF for metals
#endif
scattered = Ray(rec.pos, normalize(refl + roughness*RandomInUnitSphere(state)));
attenuation = mat.albedo;
return dot(scattered.dir, rec.normal) > 0;
}
else if (mat.type == Material::Dielectric)
{
AssertUnit(r_in.dir); AssertUnit(rec.normal);
float3 outwardN;
float3 rdir = r_in.dir;
float3 refl = reflect(rdir, rec.normal);
float nint;
attenuation = float3(1,1,1);
float3 refr;
float reflProb;
float cosine;
if (dot(rdir, rec.normal) > 0)
{
outwardN = -rec.normal;
nint = mat.ri;
cosine = mat.ri * dot(rdir, rec.normal);
}
else
{
outwardN = rec.normal;
nint = 1.0f / mat.ri;
cosine = -dot(rdir, rec.normal);
}
if (refract(rdir, outwardN, nint, refr))
{
reflProb = schlick(cosine, mat.ri);
}
else
{
reflProb = 1;
}
if (RandomFloat01(state) < reflProb)
scattered = Ray(rec.pos, normalize(refl));
else
scattered = Ray(rec.pos, normalize(refr));
}
else
{
attenuation = float3(1,0,1);
return false;
}
return true;
}
static float3 Trace(const Ray& r, int depth, int& inoutRayCount, uint32_t& state, bool doMaterialE = true)
{
ZoneScoped;
Hit rec;
int id = 0;
++inoutRayCount;
if (HitWorld(r, kMinT, kMaxT, rec, id))
{
Ray scattered;
float3 attenuation;
float3 lightE;
const Material& mat = s_SphereMats[id];
float3 matE = mat.emissive;
if (depth < kMaxDepth && Scatter(mat, r, rec, attenuation, scattered, lightE, inoutRayCount, state))
{
#if DO_LIGHT_SAMPLING
if (!doMaterialE) matE = float3(0,0,0); // don't add material emission if told so
// dor Lambert materials, we just did explicit light (emissive) sampling and already
// for their contribution, so if next ray bounce hits the light again, don't add
// emission
doMaterialE = (mat.type != Material::Lambert);
#endif
return matE + lightE + attenuation * Trace(scattered, depth+1, inoutRayCount, state, doMaterialE);
}
else
{
return matE;
}
}
else
{
// sky
#if DO_MITSUBA_COMPARE
return float3(0.15f,0.21f,0.3f); // easier compare with Mitsuba's constant environment light
#else
float3 unitDir = r.dir;
float t = 0.5f*(unitDir.getY() + 1.0f);
return ((1.0f-t)*float3(1.0f, 1.0f, 1.0f) + t*float3(0.5f, 0.7f, 1.0f)) * 0.3f;
#endif
}
}
#if CPU_CAN_DO_THREADS
static enkiTaskScheduler* g_TS;
#endif
void InitializeTest()
{
ZoneScoped;
#if CPU_CAN_DO_THREADS
g_TS = enkiNewTaskScheduler();
enkiInitTaskSchedulerNumThreads(g_TS, std::max<int>( 2, std::thread::hardware_concurrency() - 2));
#endif
}
void ShutdownTest()
{
ZoneScoped;
#if CPU_CAN_DO_THREADS
enkiDeleteTaskScheduler(g_TS);
#endif
}
struct JobData
{
float time;
int frameCount;
int screenWidth, screenHeight;
float* backbuffer;
Camera* cam;
std::atomic<int> rayCount;
unsigned testFlags;
};
static void TraceRowJob(uint32_t start, uint32_t end, uint32_t threadnum, void* data_)
{
ZoneScoped;
JobData& data = *(JobData*)data_;
float* backbuffer = data.backbuffer + start * data.screenWidth * 4;
float invWidth = 1.0f / data.screenWidth;
float invHeight = 1.0f / data.screenHeight;
float lerpFac = float(data.frameCount) / float(data.frameCount+1);
if (data.testFlags & kFlagAnimate)
lerpFac *= DO_ANIMATE_SMOOTHING;
if (!(data.testFlags & kFlagProgressive))
lerpFac = 0;
int rayCount = 0;
for (uint32_t y = start; y < end; ++y)
{
uint32_t state = (y * 9781 + data.frameCount * 6271) | 1;
for (int x = 0; x < data.screenWidth; ++x)
{
float3 col(0, 0, 0);
for (int s = 0; s < DO_SAMPLES_PER_PIXEL; s++)
{
float u = float(x + RandomFloat01(state)) * invWidth;
float v = float(y + RandomFloat01(state)) * invHeight;
Ray r = data.cam->GetRay(u, v, state);
col += Trace(r, 0, rayCount, state);
}
col *= 1.0f / float(DO_SAMPLES_PER_PIXEL);
float3 prev(backbuffer[0], backbuffer[1], backbuffer[2]);
col = prev * lerpFac + col * (1-lerpFac);
col.store(backbuffer);
backbuffer += 4;
}
}
data.rayCount += rayCount;
}
void UpdateTest(float time, int frameCount, int screenWidth, int screenHeight, unsigned testFlags)
{
ZoneScoped;
if (testFlags & kFlagAnimate)
{
s_Spheres[1].center.setY(cosf(time) + 1.0f);
s_Spheres[8].center.setZ(sinf(time)*0.3f);
}
float3 lookfrom(0, 2, 3);
float3 lookat(0, 0, 0);
float distToFocus = 3;
#if DO_MITSUBA_COMPARE
float aperture = 0.0f;
#else
float aperture = 0.1f;
#endif
#if DO_BIG_SCENE
aperture *= 0.2f;
#endif
s_EmissiveSphereCount = 0;
for (int i = 0; i < kSphereCount; ++i)
{
Sphere& s = s_Spheres[i];
s.UpdateDerivedData();
s_SpheresSoA.centerX[i] = s.center.getX();
s_SpheresSoA.centerY[i] = s.center.getY();
s_SpheresSoA.centerZ[i] = s.center.getZ();
s_SpheresSoA.sqRadius[i] = s.radius * s.radius;
s_SpheresSoA.invRadius[i] = s.invRadius;
// Remember IDs of emissive spheres (light sources)
const Material& smat = s_SphereMats[i];
if (smat.emissive.getX() > 0 || smat.emissive.getY() > 0 || smat.emissive.getZ() > 0)
{
s_EmissiveSpheres[s_EmissiveSphereCount] = i;
s_EmissiveSphereCount++;
}
}
s_Cam = Camera(lookfrom, lookat, float3(0, 1, 0), 60, float(screenWidth) / float(screenHeight), aperture, distToFocus);
}
void DrawTest(float time, int frameCount, int screenWidth, int screenHeight, float* backbuffer, int& outRayCount, unsigned testFlags)
{
ZoneScoped;
JobData args;
args.time = time;
args.frameCount = frameCount;
args.screenWidth = screenWidth;
args.screenHeight = screenHeight;
args.backbuffer = backbuffer;
args.cam = &s_Cam;
args.testFlags = testFlags;
args.rayCount = 0;
#if CPU_CAN_DO_THREADS
enkiTaskSet* task = enkiCreateTaskSet(g_TS, TraceRowJob);
bool threaded = true;
enkiAddTaskSetToPipeMinRange(g_TS, task, &args, screenHeight, threaded ? 4 : screenHeight);
enkiWaitForTaskSet(g_TS, task);
enkiDeleteTaskSet(task);
#else
TraceRowJob(0, screenHeight, 0, &args);
#endif
outRayCount = args.rayCount;
}
void GetObjectCount(int& outCount, int& outObjectSize, int& outMaterialSize, int& outCamSize)
{
ZoneScoped;
outCount = kSphereCount;
outObjectSize = sizeof(Sphere);
outMaterialSize = sizeof(Material);
outCamSize = sizeof(Camera);
}
void GetSceneDesc(void* outObjects, void* outMaterials, void* outCam, void* outEmissives, int* outEmissiveCount)
{
ZoneScoped;
memcpy(outObjects, s_Spheres, kSphereCount * sizeof(s_Spheres[0]));
memcpy(outMaterials, s_SphereMats, kSphereCount * sizeof(s_SphereMats[0]));
memcpy(outCam, &s_Cam, sizeof(s_Cam));
memcpy(outEmissives, s_EmissiveSpheres, s_EmissiveSphereCount * sizeof(s_EmissiveSpheres[0]));
*outEmissiveCount = s_EmissiveSphereCount;
}

View File

@@ -0,0 +1,17 @@
#pragma once
#include <stdint.h>
enum TestFlags
{
kFlagAnimate = (1 << 0),
kFlagProgressive = (1 << 1),
};
void InitializeTest();
void ShutdownTest();
void UpdateTest(float time, int frameCount, int screenWidth, int screenHeight, unsigned testFlags);
void DrawTest(float time, int frameCount, int screenWidth, int screenHeight, float* backbuffer, int& outRayCount, unsigned testFlags);
void GetObjectCount(int& outCount, int& outObjectSize, int& outMaterialSize, int& outCamSize);
void GetSceneDesc(void* outObjects, void* outMaterials, void* outCam, void* outEmissives, int* outEmissiveCount);

View File

@@ -0,0 +1,79 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#include <stdint.h>
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#undef GetObject
#include <intrin.h>
extern "C" void _ReadWriteBarrier();
#pragma intrinsic(_ReadWriteBarrier)
#pragma intrinsic(_InterlockedCompareExchange)
#pragma intrinsic(_InterlockedExchangeAdd)
// Memory Barriers to prevent CPU and Compiler re-ordering
#define BASE_MEMORYBARRIER_ACQUIRE() _ReadWriteBarrier()
#define BASE_MEMORYBARRIER_RELEASE() _ReadWriteBarrier()
#define BASE_ALIGN(x) __declspec( align( x ) )
#else
#define BASE_MEMORYBARRIER_ACQUIRE() __asm__ __volatile__("": : :"memory")
#define BASE_MEMORYBARRIER_RELEASE() __asm__ __volatile__("": : :"memory")
#define BASE_ALIGN(x) __attribute__ ((aligned( x )))
#endif
namespace enki
{
// Atomically performs: if( *pDest == compareWith ) { *pDest = swapTo; }
// returns old *pDest (so if successfull, returns compareWith)
inline uint32_t AtomicCompareAndSwap( volatile uint32_t* pDest, uint32_t swapTo, uint32_t compareWith )
{
#ifdef _WIN32
// assumes two's complement - unsigned / signed conversion leads to same bit pattern
return _InterlockedCompareExchange( (volatile long*)pDest,swapTo, compareWith );
#else
return __sync_val_compare_and_swap( pDest, compareWith, swapTo );
#endif
}
inline uint64_t AtomicCompareAndSwap( volatile uint64_t* pDest, uint64_t swapTo, uint64_t compareWith )
{
#ifdef _WIN32
// assumes two's complement - unsigned / signed conversion leads to same bit pattern
return _InterlockedCompareExchange64( (__int64 volatile*)pDest, swapTo, compareWith );
#else
return __sync_val_compare_and_swap( pDest, compareWith, swapTo );
#endif
}
// Atomically performs: tmp = *pDest; *pDest += value; return tmp;
inline int32_t AtomicAdd( volatile int32_t* pDest, int32_t value )
{
#ifdef _WIN32
return _InterlockedExchangeAdd( (long*)pDest, value );
#else
return __sync_fetch_and_add( pDest, value );
#endif
}
}

View File

@@ -0,0 +1,240 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#include <stdint.h>
#include <assert.h>
#include "Atomics.h"
#include <string.h>
namespace enki
{
// LockLessMultiReadPipe - Single writer, multiple reader thread safe pipe using (semi) lockless programming
// Readers can only read from the back of the pipe
// The single writer can write to the front of the pipe, and read from both ends (a writer can be a reader)
// for many of the principles used here, see http://msdn.microsoft.com/en-us/library/windows/desktop/ee418650(v=vs.85).aspx
// Note: using log2 sizes so we do not need to clamp (multi-operation)
// T is the contained type
// Note this is not true lockless as the use of flags as a form of lock state.
template<uint8_t cSizeLog2, typename T> class LockLessMultiReadPipe
{
public:
LockLessMultiReadPipe();
~LockLessMultiReadPipe() {}
// ReaderTryReadBack returns false if we were unable to read
// This is thread safe for both multiple readers and the writer
bool ReaderTryReadBack( T* pOut );
// WriterTryReadFront returns false if we were unable to read
// This is thread safe for the single writer, but should not be called by readers
bool WriterTryReadFront( T* pOut );
// WriterTryWriteFront returns false if we were unable to write
// This is thread safe for the single writer, but should not be called by readers
bool WriterTryWriteFront( const T& in );
// IsPipeEmpty() is a utility function, not intended for general use
// Should only be used very prudently.
bool IsPipeEmpty() const
{
return 0 == m_WriteIndex - m_ReadCount;
}
void Clear()
{
m_WriteIndex = 0;
m_ReadIndex = 0;
m_ReadCount = 0;
memset( (void*)m_Flags, 0, sizeof( m_Flags ) );
}
private:
const static uint32_t ms_cSize = ( 1 << cSizeLog2 );
const static uint32_t ms_cIndexMask = ms_cSize - 1;
const static uint32_t FLAG_INVALID = 0xFFFFFFFF; // 32bit for CAS
const static uint32_t FLAG_CAN_WRITE = 0x00000000; // 32bit for CAS
const static uint32_t FLAG_CAN_READ = 0x11111111; // 32bit for CAS
T m_Buffer[ ms_cSize ];
// read and write indexes allow fast access to the pipe, but actual access
// controlled by the access flags.
volatile uint32_t BASE_ALIGN(4) m_WriteIndex;
volatile uint32_t BASE_ALIGN(4) m_ReadCount;
volatile uint32_t m_Flags[ ms_cSize ];
volatile uint32_t BASE_ALIGN(4) m_ReadIndex;
};
template<uint8_t cSizeLog2, typename T> inline
LockLessMultiReadPipe<cSizeLog2,T>::LockLessMultiReadPipe()
: m_WriteIndex(0)
, m_ReadIndex(0)
, m_ReadCount(0)
{
assert( cSizeLog2 < 32 );
memset( (void*)m_Flags, 0, sizeof( m_Flags ) );
}
template<uint8_t cSizeLog2, typename T> inline
bool LockLessMultiReadPipe<cSizeLog2,T>::ReaderTryReadBack( T* pOut )
{
uint32_t actualReadIndex;
uint32_t readCount = m_ReadCount;
// We get hold of read index for consistency,
// and do first pass starting at read count
uint32_t readIndexToUse = readCount;
while(true)
{
uint32_t writeIndex = m_WriteIndex;
// power of two sizes ensures we can use a simple calc without modulus
uint32_t numInPipe = writeIndex - readCount;
if( 0 == numInPipe )
{
return false;
}
if( readIndexToUse >= writeIndex )
{
// move back to start
readIndexToUse = m_ReadIndex;
}
// power of two sizes ensures we can perform AND for a modulus
actualReadIndex = readIndexToUse & ms_cIndexMask;
// Multiple potential readers mean we should check if the data is valid,
// using an atomic compare exchange
uint32_t previous = AtomicCompareAndSwap( &m_Flags[ actualReadIndex ], FLAG_INVALID, FLAG_CAN_READ );
if( FLAG_CAN_READ == previous )
{
break;
}
++readIndexToUse;
//update known readcount
readCount = m_ReadCount;
}
// we update the read index using an atomic add, as we've only read one piece of data.
// this ensure consistency of the read index, and the above loop ensures readers
// only read from unread data
AtomicAdd( (volatile int32_t*)&m_ReadCount, 1 );
BASE_MEMORYBARRIER_ACQUIRE();
// now read data, ensuring we do so after above reads & CAS
*pOut = m_Buffer[ actualReadIndex ];
m_Flags[ actualReadIndex ] = FLAG_CAN_WRITE;
return true;
}
template<uint8_t cSizeLog2, typename T> inline
bool LockLessMultiReadPipe<cSizeLog2,T>::WriterTryReadFront( T* pOut )
{
uint32_t writeIndex = m_WriteIndex;
uint32_t frontReadIndex = writeIndex;
// Multiple potential readers mean we should check if the data is valid,
// using an atomic compare exchange - which acts as a form of lock (so not quite lockless really).
uint32_t previous = FLAG_INVALID;
uint32_t actualReadIndex = 0;
while( true )
{
// power of two sizes ensures we can use a simple calc without modulus
uint32_t readCount = m_ReadCount;
uint32_t numInPipe = writeIndex - readCount;
if( 0 == numInPipe || 0 == frontReadIndex )
{
// frontReadIndex can get to 0 here if that item was just being read by another thread.
m_ReadIndex = readCount;
return false;
}
--frontReadIndex;
actualReadIndex = frontReadIndex & ms_cIndexMask;
previous = AtomicCompareAndSwap( &m_Flags[ actualReadIndex ], FLAG_INVALID, FLAG_CAN_READ );
if( FLAG_CAN_READ == previous )
{
break;
}
else if( m_ReadIndex >= frontReadIndex )
{
return false;
}
}
// now read data, ensuring we do so after above reads & CAS
*pOut = m_Buffer[ actualReadIndex ];
m_Flags[ actualReadIndex ] = FLAG_CAN_WRITE;
BASE_MEMORYBARRIER_RELEASE();
// 32-bit aligned stores are atomic, and writer owns the write index
// we only move one back as this is as many as we have read, not where we have read from.
--m_WriteIndex;
return true;
}
template<uint8_t cSizeLog2, typename T> inline
bool LockLessMultiReadPipe<cSizeLog2,T>::WriterTryWriteFront( const T& in )
{
// The writer 'owns' the write index, and readers can only reduce
// the amount of data in the pipe.
// We get hold of both values for consistency and to reduce false sharing
// impacting more than one access
uint32_t writeIndex = m_WriteIndex;
// power of two sizes ensures we can perform AND for a modulus
uint32_t actualWriteIndex = writeIndex & ms_cIndexMask;
// a reader may still be reading this item, as there are multiple readers
if( m_Flags[ actualWriteIndex ] != FLAG_CAN_WRITE )
{
return false; // still being read, so have caught up with tail.
}
// as we are the only writer we can update the data without atomics
// whilst the write index has not been updated
m_Buffer[ actualWriteIndex ] = in;
m_Flags[ actualWriteIndex ] = FLAG_CAN_READ;
// We need to ensure the above writes occur prior to updating the write index,
// otherwise another thread might read before it's finished
BASE_MEMORYBARRIER_RELEASE();
// 32-bit aligned stores are atomic, and the writer controls the write index
++writeIndex;
m_WriteIndex = writeIndex;
return true;
}
}

View File

@@ -0,0 +1,437 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#include <assert.h>
#include "TaskScheduler.h"
#include "LockLessMultiReadPipe.h"
using namespace enki;
static const uint32_t PIPESIZE_LOG2 = 8;
static const uint32_t SPIN_COUNT = 100;
static const uint32_t SPIN_BACKOFF_MULTIPLIER = 10;
static const uint32_t MAX_NUM_INITIAL_PARTITIONS = 8;
// each software thread gets it's own copy of gtl_threadNum, so this is safe to use as a static variable
static THREAD_LOCAL uint32_t gtl_threadNum = 0;
namespace enki
{
struct SubTaskSet
{
ITaskSet* pTask;
TaskSetPartition partition;
};
// we derive class TaskPipe rather than typedef to get forward declaration working easily
class TaskPipe : public LockLessMultiReadPipe<PIPESIZE_LOG2,enki::SubTaskSet> {};
struct ThreadArgs
{
uint32_t threadNum;
TaskScheduler* pTaskScheduler;
};
}
namespace
{
SubTaskSet SplitTask( SubTaskSet& subTask_, uint32_t rangeToSplit_ )
{
SubTaskSet splitTask = subTask_;
uint32_t rangeLeft = subTask_.partition.end - subTask_.partition.start;
if( rangeToSplit_ > rangeLeft )
{
rangeToSplit_ = rangeLeft;
}
splitTask.partition.end = subTask_.partition.start + rangeToSplit_;
subTask_.partition.start = splitTask.partition.end;
return splitTask;
}
#if defined _WIN32
#if defined _M_IX86 || defined _M_X64
#pragma intrinsic(_mm_pause)
inline void Pause() { _mm_pause(); }
#endif
#elif defined __i386__ || defined __x86_64__
inline void Pause() { __asm__ __volatile__("pause;"); }
#else
inline void Pause() { ;} // may have NOP or yield equiv
#endif
}
static void SafeCallback(ProfilerCallbackFunc func_, uint32_t threadnum_)
{
if( func_ )
{
func_(threadnum_);
}
}
ProfilerCallbacks* TaskScheduler::GetProfilerCallbacks()
{
return &m_ProfilerCallbacks;
}
THREADFUNC_DECL TaskScheduler::TaskingThreadFunction( void* pArgs )
{
ThreadArgs args = *(ThreadArgs*)pArgs;
uint32_t threadNum = args.threadNum;
TaskScheduler* pTS = args.pTaskScheduler;
gtl_threadNum = threadNum;
SafeCallback( pTS->m_ProfilerCallbacks.threadStart, threadNum );
uint32_t spinCount = 0;
uint32_t hintPipeToCheck_io = threadNum + 1; // does not need to be clamped.
while( pTS->m_bRunning )
{
if(!pTS->TryRunTask( threadNum, hintPipeToCheck_io ) )
{
// no tasks, will spin then wait
++spinCount;
if( spinCount > SPIN_COUNT )
{
pTS->WaitForTasks( threadNum );
spinCount = 0;
}
else
{
uint32_t spinBackoffCount = spinCount * SPIN_BACKOFF_MULTIPLIER;
while( spinBackoffCount )
{
Pause();
--spinBackoffCount;
}
}
}
else
{
spinCount = 0;
}
}
AtomicAdd( &pTS->m_NumThreadsRunning, -1 );
SafeCallback( pTS->m_ProfilerCallbacks.threadStop, threadNum );
return 0;
}
void TaskScheduler::StartThreads()
{
if( m_bHaveThreads )
{
return;
}
m_bRunning = true;
SemaphoreCreate( m_NewTaskSemaphore );
// we create one less thread than m_NumThreads as the main thread counts as one
m_pThreadNumStore = new ThreadArgs[m_NumThreads];
m_pThreadIDs = new threadid_t[m_NumThreads];
m_pThreadNumStore[0].threadNum = 0;
m_pThreadNumStore[0].pTaskScheduler = this;
m_pThreadIDs[0] = 0;
m_NumThreadsWaiting = 0;
m_NumThreadsRunning = 1;// acount for main thread
for( uint32_t thread = 1; thread < m_NumThreads; ++thread )
{
m_pThreadNumStore[thread].threadNum = thread;
m_pThreadNumStore[thread].pTaskScheduler = this;
ThreadCreate( &m_pThreadIDs[thread], TaskingThreadFunction, &m_pThreadNumStore[thread] );
++m_NumThreadsRunning;
}
// ensure we have sufficient tasks to equally fill either all threads including main
// or just the threads we've launched, this is outside the firstinit as we want to be able
// to runtime change it
if( 1 == m_NumThreads )
{
m_NumPartitions = 1;
m_NumInitialPartitions = 1;
}
else
{
m_NumPartitions = m_NumThreads * (m_NumThreads - 1);
m_NumInitialPartitions = m_NumThreads - 1;
if( m_NumInitialPartitions > MAX_NUM_INITIAL_PARTITIONS )
{
m_NumInitialPartitions = MAX_NUM_INITIAL_PARTITIONS;
}
}
m_bHaveThreads = true;
}
void TaskScheduler::StopThreads( bool bWait_ )
{
if( m_bHaveThreads )
{
// wait for them threads quit before deleting data
m_bRunning = false;
while( bWait_ && m_NumThreadsRunning > 1 )
{
// keep firing event to ensure all threads pick up state of m_bRunning
SemaphoreSignal( m_NewTaskSemaphore, m_NumThreadsRunning );
}
for( uint32_t thread = 1; thread < m_NumThreads; ++thread )
{
ThreadTerminate( m_pThreadIDs[thread] );
}
m_NumThreads = 0;
delete[] m_pThreadNumStore;
delete[] m_pThreadIDs;
m_pThreadNumStore = 0;
m_pThreadIDs = 0;
SemaphoreClose( m_NewTaskSemaphore );
m_bHaveThreads = false;
m_NumThreadsWaiting = 0;
m_NumThreadsRunning = 0;
}
}
bool TaskScheduler::TryRunTask( uint32_t threadNum, uint32_t& hintPipeToCheck_io_ )
{
// check for tasks
SubTaskSet subTask;
bool bHaveTask = m_pPipesPerThread[ threadNum ].WriterTryReadFront( &subTask );
uint32_t threadToCheck = hintPipeToCheck_io_;
uint32_t checkCount = 0;
while( !bHaveTask && checkCount < m_NumThreads )
{
threadToCheck = ( hintPipeToCheck_io_ + checkCount ) % m_NumThreads;
if( threadToCheck != threadNum )
{
bHaveTask = m_pPipesPerThread[ threadToCheck ].ReaderTryReadBack( &subTask );
}
++checkCount;
}
if( bHaveTask )
{
// update hint, will preserve value unless actually got task from another thread.
hintPipeToCheck_io_ = threadToCheck;
uint32_t partitionSize = subTask.partition.end - subTask.partition.start;
if( subTask.pTask->m_RangeToRun < partitionSize )
{
SubTaskSet taskToRun = SplitTask( subTask, subTask.pTask->m_RangeToRun );
SplitAndAddTask( gtl_threadNum, subTask, subTask.pTask->m_RangeToRun, 0 );
taskToRun.pTask->ExecuteRange( taskToRun.partition, threadNum );
AtomicAdd( &taskToRun.pTask->m_RunningCount, -1 );
}
else
{
// the task has already been divided up by AddTaskSetToPipe, so just run it
subTask.pTask->ExecuteRange( subTask.partition, threadNum );
AtomicAdd( &subTask.pTask->m_RunningCount, -1 );
}
}
return bHaveTask;
}
void TaskScheduler::WaitForTasks( uint32_t threadNum )
{
// We incrememt the number of threads waiting here in order
// to ensure that the check for tasks occurs after the increment
// to prevent a task being added after a check, then the thread waiting.
// This will occasionally result in threads being mistakenly awoken,
// but they will then go back to sleep.
AtomicAdd( &m_NumThreadsWaiting, 1 );
bool bHaveTasks = false;
for( uint32_t thread = 0; thread < m_NumThreads; ++thread )
{
if( !m_pPipesPerThread[ thread ].IsPipeEmpty() )
{
bHaveTasks = true;
break;
}
}
if( !bHaveTasks )
{
SafeCallback( m_ProfilerCallbacks.waitStart, threadNum );
SemaphoreWait( m_NewTaskSemaphore );
SafeCallback( m_ProfilerCallbacks.waitStop, threadNum );
}
int32_t prev = AtomicAdd( &m_NumThreadsWaiting, -1 );
assert( prev != 0 );
}
void TaskScheduler::WakeThreads()
{
SemaphoreSignal( m_NewTaskSemaphore, m_NumThreadsWaiting );
}
void TaskScheduler::SplitAndAddTask( uint32_t threadNum_, SubTaskSet subTask_,
uint32_t rangeToSplit_, int32_t runningCountOffset_ )
{
int32_t numAdded = 0;
while( subTask_.partition.start != subTask_.partition.end )
{
SubTaskSet taskToAdd = SplitTask( subTask_, rangeToSplit_ );
// add the partition to the pipe
++numAdded;
if( !m_pPipesPerThread[ gtl_threadNum ].WriterTryWriteFront( taskToAdd ) )
{
if( numAdded > 1 )
{
WakeThreads();
}
// alter range to run the appropriate fraction
if( taskToAdd.pTask->m_RangeToRun < rangeToSplit_ )
{
taskToAdd.partition.end = taskToAdd.partition.start + taskToAdd.pTask->m_RangeToRun;
subTask_.partition.start = taskToAdd.partition.end;
}
taskToAdd.pTask->ExecuteRange( taskToAdd.partition, threadNum_ );
--numAdded;
}
}
// increment running count by number added
AtomicAdd( &subTask_.pTask->m_RunningCount, numAdded + runningCountOffset_ );
WakeThreads();
}
void TaskScheduler::AddTaskSetToPipe( ITaskSet* pTaskSet )
{
// set running count to -1 to guarantee it won't be found complete until all subtasks added
pTaskSet->m_RunningCount = -1;
// divide task up and add to pipe
pTaskSet->m_RangeToRun = pTaskSet->m_SetSize / m_NumPartitions;
if( pTaskSet->m_RangeToRun < pTaskSet->m_MinRange ) { pTaskSet->m_RangeToRun = pTaskSet->m_MinRange; }
uint32_t rangeToSplit = pTaskSet->m_SetSize / m_NumInitialPartitions;
if( rangeToSplit < pTaskSet->m_MinRange ) { rangeToSplit = pTaskSet->m_MinRange; }
SubTaskSet subTask;
subTask.pTask = pTaskSet;
subTask.partition.start = 0;
subTask.partition.end = pTaskSet->m_SetSize;
SplitAndAddTask( gtl_threadNum, subTask, rangeToSplit, 1 );
}
void TaskScheduler::WaitforTaskSet( const ITaskSet* pTaskSet )
{
uint32_t hintPipeToCheck_io = gtl_threadNum + 1; // does not need to be clamped.
if( pTaskSet )
{
while( pTaskSet->m_RunningCount )
{
TryRunTask( gtl_threadNum, hintPipeToCheck_io );
// should add a spin then wait for task completion event.
}
}
else
{
TryRunTask( gtl_threadNum, hintPipeToCheck_io );
}
}
void TaskScheduler::WaitforAll()
{
bool bHaveTasks = true;
uint32_t hintPipeToCheck_io = gtl_threadNum + 1; // does not need to be clamped.
int32_t threadsRunning = m_NumThreadsRunning - 1;
while( bHaveTasks || m_NumThreadsWaiting < threadsRunning )
{
TryRunTask( gtl_threadNum, hintPipeToCheck_io );
bHaveTasks = false;
for( uint32_t thread = 0; thread < m_NumThreads; ++thread )
{
if( !m_pPipesPerThread[ thread ].IsPipeEmpty() )
{
bHaveTasks = true;
break;
}
}
}
}
void TaskScheduler::WaitforAllAndShutdown()
{
WaitforAll();
StopThreads(true);
delete[] m_pPipesPerThread;
m_pPipesPerThread = 0;
}
uint32_t TaskScheduler::GetNumTaskThreads() const
{
return m_NumThreads;
}
TaskScheduler::TaskScheduler()
: m_pPipesPerThread(NULL)
, m_NumThreads(0)
, m_pThreadNumStore(NULL)
, m_pThreadIDs(NULL)
, m_bRunning(false)
, m_NumThreadsRunning(0)
, m_NumThreadsWaiting(0)
, m_NumPartitions(0)
, m_bHaveThreads(false)
{
memset(&m_ProfilerCallbacks, 0, sizeof(m_ProfilerCallbacks));
}
TaskScheduler::~TaskScheduler()
{
StopThreads( true ); // Stops threads, waiting for them.
delete[] m_pPipesPerThread;
m_pPipesPerThread = 0;
}
void TaskScheduler::Initialize( uint32_t numThreads_ )
{
assert( numThreads_ );
StopThreads( true ); // Stops threads, waiting for them.
delete[] m_pPipesPerThread;
m_NumThreads = numThreads_;
m_pPipesPerThread = new TaskPipe[ m_NumThreads ];
StartThreads();
}
void TaskScheduler::Initialize()
{
Initialize( GetNumHardwareThreads() );
}

View File

@@ -0,0 +1,177 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#include <stdint.h>
#include "Threads.h"
namespace enki
{
struct TaskSetPartition
{
uint32_t start;
uint32_t end;
};
class TaskScheduler;
class TaskPipe;
struct ThreadArgs;
struct SubTaskSet;
// Subclass ITaskSet to create tasks.
// TaskSets can be re-used, but check
class ITaskSet
{
public:
ITaskSet()
: m_SetSize(1)
, m_MinRange(1)
, m_RunningCount(0)
, m_RangeToRun(1)
{}
ITaskSet( uint32_t setSize_ )
: m_SetSize( setSize_ )
, m_MinRange(1)
, m_RunningCount(0)
, m_RangeToRun(1)
{}
ITaskSet( uint32_t setSize_, uint32_t minRange_ )
: m_SetSize( setSize_ )
, m_MinRange( minRange_ )
, m_RunningCount(0)
, m_RangeToRun(minRange_)
{}
// Execute range should be overloaded to process tasks. It will be called with a
// range_ where range.start >= 0; range.start < range.end; and range.end < m_SetSize;
// The range values should be mapped so that linearly processing them in order is cache friendly
// i.e. neighbouring values should be close together.
// threadnum should not be used for changing processing of data, it's intended purpose
// is to allow per-thread data buckets for output.
virtual void ExecuteRange( TaskSetPartition range, uint32_t threadnum ) = 0;
// Size of set - usually the number of data items to be processed, see ExecuteRange. Defaults to 1
uint32_t m_SetSize;
// Minimum size of of TaskSetPartition range when splitting a task set into partitions.
// This should be set to a value which results in computation effort of at least 10k
// clock cycles to minimize tast scheduler overhead.
// NOTE: The last partition will be smaller than m_MinRange if m_SetSize is not a multiple
// of m_MinRange.
// Also known as grain size in literature.
uint32_t m_MinRange;
bool GetIsComplete()
{
return 0 == m_RunningCount;
}
private:
friend class TaskScheduler;
volatile int32_t m_RunningCount;
uint32_t m_RangeToRun;
};
// TaskScheduler implements several callbacks intended for profilers
typedef void (*ProfilerCallbackFunc)( uint32_t threadnum_ );
struct ProfilerCallbacks
{
ProfilerCallbackFunc threadStart;
ProfilerCallbackFunc threadStop;
ProfilerCallbackFunc waitStart;
ProfilerCallbackFunc waitStop;
};
class TaskScheduler
{
public:
TaskScheduler();
~TaskScheduler();
// Call either Initialize() or Initialize( numThreads_ ) before adding tasks.
// Initialize() will create GetNumHardwareThreads()-1 threads, which is
// sufficient to fill the system when including the main thread.
// Initialize can be called multiple times - it will wait for completion
// before re-initializing.
void Initialize();
// Initialize( numThreads_ ) - numThreads_ (must be > 0)
// will create numThreads_-1 threads, as thread 0 is
// the thread on which the initialize was called.
void Initialize( uint32_t numThreads_ );
// Adds the TaskSet to pipe and returns if the pipe is not full.
// If the pipe is full, pTaskSet is run.
// should only be called from main thread, or within a task
void AddTaskSetToPipe( ITaskSet* pTaskSet );
// Runs the TaskSets in pipe until true == pTaskSet->GetIsComplete();
// should only be called from thread which created the taskscheduler , or within a task
// if called with 0 it will try to run tasks, and return if none available.
void WaitforTaskSet( const ITaskSet* pTaskSet );
// Waits for all task sets to complete - not guaranteed to work unless we know we
// are in a situation where tasks aren't being continuosly added.
void WaitforAll();
// Waits for all task sets to complete and shutdown threads - not guaranteed to work unless we know we
// are in a situation where tasks aren't being continuosly added.
void WaitforAllAndShutdown();
// Returns the number of threads created for running tasks + 1
// to account for the main thread.
uint32_t GetNumTaskThreads() const;
// Returns the ProfilerCallbacks structure so that it can be modified to
// set the callbacks.
ProfilerCallbacks* GetProfilerCallbacks();
private:
static THREADFUNC_DECL TaskingThreadFunction( void* pArgs );
void WaitForTasks( uint32_t threadNum );
bool TryRunTask( uint32_t threadNum, uint32_t& hintPipeToCheck_io_ );
void StartThreads();
void StopThreads( bool bWait_ );
void SplitAndAddTask( uint32_t threadNum_, SubTaskSet subTask_,
uint32_t rangeToSplit_, int32_t runningCountOffset_ );
void WakeThreads();
TaskPipe* m_pPipesPerThread;
uint32_t m_NumThreads;
ThreadArgs* m_pThreadNumStore;
threadid_t* m_pThreadIDs;
volatile bool m_bRunning;
volatile int32_t m_NumThreadsRunning;
volatile int32_t m_NumThreadsWaiting;
uint32_t m_NumPartitions;
uint32_t m_NumInitialPartitions;
semaphoreid_t m_NewTaskSemaphore;
bool m_bHaveThreads;
ProfilerCallbacks m_ProfilerCallbacks;
TaskScheduler( const TaskScheduler& nocopy );
TaskScheduler& operator=( const TaskScheduler& nocopy );
};
}

View File

@@ -0,0 +1,122 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#include "TaskScheduler_c.h"
#include "TaskScheduler.h"
#include <assert.h>
using namespace enki;
struct enkiTaskScheduler : TaskScheduler
{
};
struct enkiTaskSet : ITaskSet
{
enkiTaskSet( enkiTaskExecuteRange taskFun_ ) : taskFun(taskFun_), pArgs(NULL) {}
virtual void ExecuteRange( TaskSetPartition range, uint32_t threadnum )
{
taskFun( range.start, range.end, threadnum, pArgs );
}
enkiTaskExecuteRange taskFun;
void* pArgs;
};
enkiTaskScheduler* enkiNewTaskScheduler()
{
enkiTaskScheduler* pETS = new enkiTaskScheduler();
return pETS;
}
void enkiInitTaskScheduler( enkiTaskScheduler* pETS_ )
{
pETS_->Initialize();
}
void enkiInitTaskSchedulerNumThreads( enkiTaskScheduler* pETS_, uint32_t numThreads_ )
{
pETS_->Initialize( numThreads_ );
}
void enkiDeleteTaskScheduler( enkiTaskScheduler* pETS_ )
{
delete pETS_;
}
enkiTaskSet* enkiCreateTaskSet( enkiTaskScheduler* pETS_, enkiTaskExecuteRange taskFunc_ )
{
return new enkiTaskSet( taskFunc_ );
}
void enkiDeleteTaskSet( enkiTaskSet* pTaskSet_ )
{
delete pTaskSet_;
}
void enkiAddTaskSetToPipe( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_, void* pArgs_, uint32_t setSize_ )
{
assert( pTaskSet_ );
assert( pTaskSet_->taskFun );
pTaskSet_->m_SetSize = setSize_;
pTaskSet_->pArgs = pArgs_;
pETS_->AddTaskSetToPipe( pTaskSet_ );
}
void enkiAddTaskSetToPipeMinRange(enkiTaskScheduler * pETS_, enkiTaskSet * pTaskSet_, void * pArgs_, uint32_t setSize_, uint32_t minRange_)
{
assert( pTaskSet_ );
assert( pTaskSet_->taskFun );
pTaskSet_->m_SetSize = setSize_;
pTaskSet_->m_MinRange = minRange_;
pTaskSet_->pArgs = pArgs_;
pETS_->AddTaskSetToPipe( pTaskSet_ );
}
int enkiIsTaskSetComplete( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_ )
{
assert( pTaskSet_ );
return ( pTaskSet_->GetIsComplete() ) ? 1 : 0;
}
void enkiWaitForTaskSet( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_ )
{
pETS_->WaitforTaskSet( pTaskSet_ );
}
void enkiWaitForAll( enkiTaskScheduler* pETS_ )
{
pETS_->WaitforAll();
}
uint32_t enkiGetNumTaskThreads( enkiTaskScheduler* pETS_ )
{
return pETS_->GetNumTaskThreads();
}
enkiProfilerCallbacks* enkiGetProfilerCallbacks( enkiTaskScheduler* pETS_ )
{
assert( sizeof(enkiProfilerCallbacks) == sizeof(enki::ProfilerCallbacks) );
return (enkiProfilerCallbacks*)pETS_->GetProfilerCallbacks();
}

View File

@@ -0,0 +1,104 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#ifdef __cplusplus
extern "C" {
#endif
#include <stdint.h>
typedef struct enkiTaskScheduler enkiTaskScheduler;
typedef struct enkiTaskSet enkiTaskSet;
typedef void (* enkiTaskExecuteRange)( uint32_t start_, uint32_t end, uint32_t threadnum_, void* pArgs_ );
// Create a new task scheduler
enkiTaskScheduler* enkiNewTaskScheduler();
// Initialize task scheduler - will create GetNumHardwareThreads()-1 threads, which is
// sufficient to fill the system when including the main thread.
// Initialize can be called multiple times - it will wait for completion
// before re-initializing.
void enkiInitTaskScheduler( enkiTaskScheduler* pETS_ );
// Initialize a task scheduler with numThreads_ (must be > 0)
// will create numThreads_-1 threads, as thread 0 is
// the thread on which the initialize was called.
void enkiInitTaskSchedulerNumThreads( enkiTaskScheduler* pETS_, uint32_t numThreads_ );
// Delete a task scheduler
void enkiDeleteTaskScheduler( enkiTaskScheduler* pETS_ );
// Create a task set.
enkiTaskSet* enkiCreateTaskSet( enkiTaskScheduler* pETS_, enkiTaskExecuteRange taskFunc_ );
// Delete a task set.
void enkiDeleteTaskSet( enkiTaskSet* pTaskSet_ );
// Schedule the task
void enkiAddTaskSetToPipe( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_,
void* pArgs_, uint32_t setSize_ );
// Schedule the task with a minimum range.
// This should be set to a value which results in computation effort of at least 10k
// clock cycles to minimize tast scheduler overhead.
// NOTE: The last partition will be smaller than m_MinRange if m_SetSize is not a multiple
// of m_MinRange.
// Also known as grain size in literature.
void enkiAddTaskSetToPipeMinRange( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_,
void* pArgs_, uint32_t setSize_, uint32_t minRange_ );
// Check if TaskSet is complete. Doesn't wait. Returns 1 if complete, 0 if not.
int enkiIsTaskSetComplete( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_ );
// Wait for a given task.
// should only be called from thread which created the taskscheduler , or within a task
// if called with 0 it will try to run tasks, and return if none available.
void enkiWaitForTaskSet( enkiTaskScheduler* pETS_, enkiTaskSet* pTaskSet_ );
// Waits for all task sets to complete - not guaranteed to work unless we know we
// are in a situation where tasks aren't being continuosly added.
void enkiWaitForAll( enkiTaskScheduler* pETS_ );
// get number of threads
uint32_t enkiGetNumTaskThreads( enkiTaskScheduler* pETS_ );
// TaskScheduler implements several callbacks intended for profilers
typedef void (*enkiProfilerCallbackFunc)( uint32_t threadnum_ );
struct enkiProfilerCallbacks
{
enkiProfilerCallbackFunc threadStart;
enkiProfilerCallbackFunc threadStop;
enkiProfilerCallbackFunc waitStart;
enkiProfilerCallbackFunc waitStop;
};
// Get the callback structure so it can be set
struct enkiProfilerCallbacks* enkiGetProfilerCallbacks( enkiTaskScheduler* pETS_ );
#ifdef __cplusplus
}
#endif

View File

@@ -0,0 +1,210 @@
// Copyright (c) 2013 Doug Binks
//
// This software is provided 'as-is', without any express or implied
// warranty. In no event will the authors be held liable for any damages
// arising from the use of this software.
//
// Permission is granted to anyone to use this software for any purpose,
// including commercial applications, and to alter it and redistribute it
// freely, subject to the following restrictions:
//
// 1. The origin of this software must not be misrepresented; you must not
// claim that you wrote the original software. If you use this software
// in a product, an acknowledgement in the product documentation would be
// appreciated but is not required.
// 2. Altered source versions must be plainly marked as such, and must not be
// misrepresented as being the original software.
// 3. This notice may not be removed or altered from any source distribution.
#pragma once
#include <stdint.h>
#include <assert.h>
#ifdef _WIN32
#include "Atomics.h"
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#define THREADFUNC_DECL DWORD WINAPI
#define THREAD_LOCAL __declspec( thread )
namespace enki
{
typedef HANDLE threadid_t;
// declare the thread start function as:
// THREADFUNC_DECL MyThreadStart( void* pArg );
inline bool ThreadCreate( threadid_t* returnid, DWORD ( WINAPI *StartFunc) (void* ), void* pArg )
{
// posix equiv pthread_create
DWORD threadid;
*returnid = CreateThread( 0, 0, StartFunc, pArg, 0, &threadid );
return *returnid != NULL;
}
inline bool ThreadTerminate( threadid_t threadid )
{
// posix equiv pthread_cancel
return CloseHandle( threadid ) == 0;
}
inline uint32_t GetNumHardwareThreads()
{
SYSTEM_INFO sysInfo;
GetSystemInfo(&sysInfo);
return sysInfo.dwNumberOfProcessors;
}
}
#else // posix
#include <pthread.h>
#include <unistd.h>
#define THREADFUNC_DECL void*
#define THREAD_LOCAL __thread
namespace enki
{
typedef pthread_t threadid_t;
// declare the thread start function as:
// THREADFUNC_DECL MyThreadStart( void* pArg );
inline bool ThreadCreate( threadid_t* returnid, void* ( *StartFunc) (void* ), void* pArg )
{
// posix equiv pthread_create
int32_t retval = pthread_create( returnid, NULL, StartFunc, pArg );
return retval == 0;
}
inline bool ThreadTerminate( threadid_t threadid )
{
// posix equiv pthread_cancel
return pthread_cancel( threadid ) == 0;
}
inline uint32_t GetNumHardwareThreads()
{
return (uint32_t)sysconf( _SC_NPROCESSORS_ONLN );
}
}
#endif // posix
// Semaphore implementation
#ifdef _WIN32
namespace enki
{
struct semaphoreid_t
{
HANDLE sem;
};
inline void SemaphoreCreate( semaphoreid_t& semaphoreid )
{
semaphoreid.sem = CreateSemaphore(NULL, 0, MAXLONG, NULL );
}
inline void SemaphoreClose( semaphoreid_t& semaphoreid )
{
CloseHandle( semaphoreid.sem );
}
inline void SemaphoreWait( semaphoreid_t& semaphoreid )
{
DWORD retval = WaitForSingleObject( semaphoreid.sem, INFINITE );
assert( retval != WAIT_FAILED );
}
inline void SemaphoreSignal( semaphoreid_t& semaphoreid, int32_t countWaiting )
{
if( countWaiting )
{
ReleaseSemaphore( semaphoreid.sem, countWaiting, NULL );
}
}
}
#elif defined(__MACH__)
// OS X does not have POSIX semaphores
// see https://developer.apple.com/library/content/documentation/Darwin/Conceptual/KernelProgramming/synchronization/synchronization.html
#include <mach/mach.h>
namespace enki
{
struct semaphoreid_t
{
semaphore_t sem;
};
inline void SemaphoreCreate( semaphoreid_t& semaphoreid )
{
semaphore_create( mach_task_self(), &semaphoreid.sem, SYNC_POLICY_FIFO, 0 );
}
inline void SemaphoreClose( semaphoreid_t& semaphoreid )
{
semaphore_destroy( mach_task_self(), semaphoreid.sem );
}
inline void SemaphoreWait( semaphoreid_t& semaphoreid )
{
semaphore_wait( semaphoreid.sem );
}
inline void SemaphoreSignal( semaphoreid_t& semaphoreid, int32_t countWaiting )
{
while( countWaiting-- > 0 )
{
semaphore_signal( semaphoreid.sem );
}
}
}
#else // POSIX
#include <semaphore.h>
namespace enki
{
struct semaphoreid_t
{
sem_t sem;
};
inline void SemaphoreCreate( semaphoreid_t& semaphoreid )
{
int err = sem_init( &semaphoreid.sem, 0, 0 );
assert( err == 0 );
}
inline void SemaphoreClose( semaphoreid_t& semaphoreid )
{
sem_destroy( &semaphoreid.sem );
}
inline void SemaphoreWait( semaphoreid_t& semaphoreid )
{
int err = sem_wait( &semaphoreid.sem );
assert( err == 0 );
}
inline void SemaphoreSignal( semaphoreid_t& semaphoreid, int32_t countWaiting )
{
while( countWaiting-- > 0 )
{
sem_post( &semaphoreid.sem );
}
}
}
#endif

View File

@@ -0,0 +1,395 @@
#include "../Source/Config.h"
inline uint RNG(inout uint state)
{
uint x = state;
x ^= x << 13;
x ^= x >> 17;
x ^= x << 15;
state = x;
return x;
}
float RandomFloat01(inout uint state)
{
return (RNG(state) & 0xFFFFFF) / 16777216.0f;
}
float3 RandomInUnitDisk(inout uint state)
{
float a = RandomFloat01(state) * 2.0f * 3.1415926f;
float2 xy = float2(cos(a), sin(a));
xy *= sqrt(RandomFloat01(state));
return float3(xy, 0);
}
float3 RandomInUnitSphere(inout uint state)
{
float z = RandomFloat01(state) * 2.0f - 1.0f;
float t = RandomFloat01(state) * 2.0f * 3.1415926f;
float r = sqrt(max(0.0, 1.0f - z * z));
float x = r * cos(t);
float y = r * sin(t);
float3 res = float3(x, y, z);
res *= pow(RandomFloat01(state), 1.0 / 3.0);
return res;
}
float3 RandomUnitVector(inout uint state)
{
float z = RandomFloat01(state) * 2.0f - 1.0f;
float a = RandomFloat01(state) * 2.0f * 3.1415926f;
float r = sqrt(1.0f - z * z);
float x = r * cos(a);
float y = r * sin(a);
return float3(x, y, z);
}
struct Ray
{
float3 orig;
float3 dir;
};
Ray MakeRay(float3 orig_, float3 dir_) { Ray r; r.orig = orig_; r.dir = dir_; return r; }
float3 RayPointAt(Ray r, float t) { return r.orig + r.dir * t; }
inline bool refract(float3 v, float3 n, float nint, out float3 outRefracted)
{
float dt = dot(v, n);
float discr = 1.0f - nint * nint*(1 - dt * dt);
if (discr > 0)
{
outRefracted = nint * (v - n * dt) - n * sqrt(discr);
return true;
}
return false;
}
inline float schlick(float cosine, float ri)
{
float r0 = (1 - ri) / (1 + ri);
r0 = r0 * r0;
// note: saturate to guard against possible tiny negative numbers
return r0 + (1 - r0)*pow(saturate(1 - cosine), 5);
}
struct Hit
{
float3 pos;
float3 normal;
float t;
};
struct Sphere
{
float3 center;
float radius;
float invRadius;
};
#define MatLambert 0
#define MatMetal 1
#define MatDielectric 2
struct Material
{
int type;
float3 albedo;
float3 emissive;
float roughness;
float ri;
};
groupshared Sphere s_GroupSpheres[kCSMaxObjects];
groupshared Material s_GroupMaterials[kCSMaxObjects];
groupshared int s_GroupEmissives[kCSMaxObjects];
struct Camera
{
float3 origin;
float3 lowerLeftCorner;
float3 horizontal;
float3 vertical;
float3 u, v, w;
float lensRadius;
};
Ray CameraGetRay(Camera cam, float s, float t, inout uint state)
{
float3 rd = cam.lensRadius * RandomInUnitDisk(state);
float3 offset = cam.u * rd.x + cam.v * rd.y;
return MakeRay(cam.origin + offset, normalize(cam.lowerLeftCorner + s * cam.horizontal + t * cam.vertical - cam.origin - offset));
}
int HitSpheres(Ray r, int sphereCount, float tMin, float tMax, inout Hit outHit)
{
float hitT = tMax;
int id = -1;
for (int i = 0; i < sphereCount; ++i)
{
Sphere s = s_GroupSpheres[i];
float3 co = s.center - r.orig;
float nb = dot(co, r.dir);
float c = dot(co, co) - s.radius*s.radius;
float discr = nb * nb - c;
if (discr > 0)
{
float discrSq = sqrt(discr);
// Try earlier t
float t = nb - discrSq;
if (t <= tMin) // before min, try later t!
t = nb + discrSq;
if (t > tMin && t < hitT)
{
id = i;
hitT = t;
}
}
}
if (id != -1)
{
outHit.pos = RayPointAt(r, hitT);
outHit.normal = (outHit.pos - s_GroupSpheres[id].center) * s_GroupSpheres[id].invRadius;
outHit.t = hitT;
}
return id;
}
struct Params
{
Camera cam;
int sphereCount;
int screenWidth;
int screenHeight;
int frames;
float invWidth;
float invHeight;
float lerpFac;
int emissiveCount;
};
#define kMinT 0.001f
#define kMaxT 1.0e7f
#define kMaxDepth 10
static int HitWorld(int sphereCount, Ray r, float tMin, float tMax, inout Hit outHit)
{
return HitSpheres(r, sphereCount, tMin, tMax, outHit);
}
static bool Scatter(int sphereCount, int emissiveCount, int matID, Ray r_in, Hit rec, out float3 attenuation, out Ray scattered, out float3 outLightE, inout int inoutRayCount, inout uint state)
{
outLightE = float3(0, 0, 0);
Material mat = s_GroupMaterials[matID];
if (mat.type == MatLambert)
{
// random point on unit sphere that is tangent to the hit point
float3 target = rec.pos + rec.normal + RandomUnitVector(state);
scattered = MakeRay(rec.pos, normalize(target - rec.pos));
attenuation = mat.albedo;
// sample lights
#if DO_LIGHT_SAMPLING
for (int j = 0; j < emissiveCount; ++j)
{
int i = s_GroupEmissives[j];
if (matID == i)
continue; // skip self
Material smat = s_GroupMaterials[i];
Sphere s = s_GroupSpheres[i];
// create a random direction towards sphere
// coord system for sampling: sw, su, sv
float3 sw = normalize(s.center - rec.pos);
float3 su = normalize(cross(abs(sw.x)>0.01f ? float3(0, 1, 0) : float3(1, 0, 0), sw));
float3 sv = cross(sw, su);
// sample sphere by solid angle
float cosAMax = sqrt(1.0f - s.radius*s.radius / dot(rec.pos - s.center, rec.pos - s.center));
float eps1 = RandomFloat01(state), eps2 = RandomFloat01(state);
float cosA = 1.0f - eps1 + eps1 * cosAMax;
float sinA = sqrt(1.0f - cosA * cosA);
float phi = 2 * 3.1415926 * eps2;
float3 l = su * cos(phi) * sinA + sv * sin(phi) * sinA + sw * cosA;
// shoot shadow ray
Hit lightHit;
++inoutRayCount;
int hitID = HitWorld(sphereCount, MakeRay(rec.pos, l), kMinT, kMaxT, lightHit);
if (hitID == i)
{
float omega = 2 * 3.1415926 * (1 - cosAMax);
float3 rdir = r_in.dir;
float3 nl = dot(rec.normal, rdir) < 0 ? rec.normal : -rec.normal;
outLightE += (mat.albedo * smat.emissive) * (max(0.0f, dot(l, nl)) * omega / 3.1415926);
}
}
#endif
return true;
}
else if (mat.type == MatMetal)
{
float3 refl = reflect(r_in.dir, rec.normal);
// reflected ray, and random inside of sphere based on roughness
float roughness = mat.roughness;
#if DO_MITSUBA_COMPARE
roughness = 0; // until we get better BRDF for metals
#endif
scattered = MakeRay(rec.pos, normalize(refl + roughness*RandomInUnitSphere(state)));
attenuation = mat.albedo;
return dot(scattered.dir, rec.normal) > 0;
}
else if (mat.type == MatDielectric)
{
float3 outwardN;
float3 rdir = r_in.dir;
float3 refl = reflect(rdir, rec.normal);
float nint;
attenuation = float3(1, 1, 1);
float3 refr;
float reflProb;
float cosine;
if (dot(rdir, rec.normal) > 0)
{
outwardN = -rec.normal;
nint = mat.ri;
cosine = mat.ri * dot(rdir, rec.normal);
}
else
{
outwardN = rec.normal;
nint = 1.0f / mat.ri;
cosine = -dot(rdir, rec.normal);
}
if (refract(rdir, outwardN, nint, refr))
{
reflProb = schlick(cosine, mat.ri);
}
else
{
reflProb = 1;
}
if (RandomFloat01(state) < reflProb)
scattered = MakeRay(rec.pos, normalize(refl));
else
scattered = MakeRay(rec.pos, normalize(refr));
}
else
{
attenuation = float3(1, 0, 1);
scattered = MakeRay(float3(0,0,0), float3(0, 0, 1));
return false;
}
return true;
}
static float3 Trace(int sphereCount, int emissiveCount, Ray r, inout int inoutRayCount, inout uint state)
{
float3 col = 0;
float3 curAtten = 1;
bool doMaterialE = true;
// GPUs don't support recursion, so do tracing iterations in a loop up to max depth
for (int depth = 0; depth < kMaxDepth; ++depth)
{
Hit rec;
++inoutRayCount;
int id = HitWorld(sphereCount, r, kMinT, kMaxT, rec);
if (id >= 0)
{
Ray scattered;
float3 attenuation;
float3 lightE;
Material mat = s_GroupMaterials[id];
float3 matE = mat.emissive;
if (Scatter(sphereCount, emissiveCount, id, r, rec, attenuation, scattered, lightE, inoutRayCount, state))
{
#if DO_LIGHT_SAMPLING
if (!doMaterialE) matE = 0;
doMaterialE = (mat.type != MatLambert);
#endif
col += curAtten * (matE + lightE);
curAtten *= attenuation;
r = scattered;
}
else
{
col += curAtten * matE;
break;
}
}
else
{
// sky
#if DO_MITSUBA_COMPARE
col += curAtten * float3(0.15f, 0.21f, 0.3f); // easier compare with Mitsuba's constant environment light
#else
float3 unitDir = r.dir;
float t = 0.5f*(unitDir.y + 1.0f);
float3 skyCol = ((1.0f - t)*float3(1.0f, 1.0f, 1.0f) + t * float3(0.5f, 0.7f, 1.0f)) * 0.3f;
col += curAtten * skyCol;
#endif
break;
}
}
return col;
}
Texture2D srcImage : register(t0);
RWTexture2D<float4> dstImage : register(u0);
StructuredBuffer<Sphere> g_Spheres : register(t1);
StructuredBuffer<Material> g_Materials : register(t2);
StructuredBuffer<Params> g_Params : register(t3);
StructuredBuffer<int> g_Emissives : register(t4);
RWByteAddressBuffer g_OutRayCount : register(u1);
[numthreads(kCSGroupSizeX, kCSGroupSizeY, 1)]
void main(uint3 gid : SV_DispatchThreadID, uint3 tid : SV_GroupThreadID)
{
// First, move scene data (spheres, materials, emissive indices) into group shared
// memory. Do this in parallel; each thread in group copies its own chunk of data.
uint threadID = tid.y * kCSGroupSizeX + tid.x;
uint groupSize = kCSGroupSizeX * kCSGroupSizeY;
uint objCount = g_Params[0].sphereCount;
uint myObjCount = (objCount + groupSize - 1) / groupSize;
uint myObjStart = threadID * myObjCount;
for (uint io = myObjStart; io < myObjStart + myObjCount; ++io)
{
if (io < objCount)
{
s_GroupSpheres[io] = g_Spheres[io];
s_GroupMaterials[io] = g_Materials[io];
}
if (io < g_Params[0].emissiveCount)
{
s_GroupEmissives[io] = g_Emissives[io];
}
}
GroupMemoryBarrierWithGroupSync();
int rayCount = 0;
float3 col = 0;
Params params = g_Params[0];
uint rngState = (gid.x * 1973 + gid.y * 9277 + params.frames * 26699) | 1;
for (int s = 0; s < DO_SAMPLES_PER_PIXEL; s++)
{
float u = float(gid.x + RandomFloat01(rngState)) * params.invWidth;
float v = float(gid.y + RandomFloat01(rngState)) * params.invHeight;
Ray r = CameraGetRay(params.cam, u, v, rngState);
col += Trace(params.sphereCount, params.emissiveCount, r, rayCount, rngState);
}
col *= 1.0f / float(DO_SAMPLES_PER_PIXEL);
float3 prev = srcImage.Load(int3(gid.xy,0)).rgb;
col = lerp(col, prev, params.lerpFac);
dstImage[gid.xy] = float4(col, 1);
g_OutRayCount.InterlockedAdd(0, rayCount);
}

View File

@@ -0,0 +1,15 @@
float3 LinearToSRGB(float3 rgb)
{
rgb = max(rgb, float3(0, 0, 0));
return max(1.055 * pow(rgb, 0.416666667) - 0.055, 0.0);
}
Texture2D tex : register(t0);
SamplerState smp : register(s0);
float4 main(float2 uv : TEXCOORD0) : SV_Target
{
float3 col = tex.Sample(smp, uv).rgb;
col = LinearToSRGB(col);
return float4(col, 1.0f);
}

View File

@@ -0,0 +1,31 @@
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 15
VisualStudioVersion = 15.0.27130.2036
MinimumVisualStudioVersion = 10.0.40219.1
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "TestCpu", "TestCpu.vcxproj", "{4F84B756-87F5-4B92-827B-DA087DAE1900}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|x64 = Debug|x64
Debug|x86 = Debug|x86
Release|x64 = Release|x64
Release|x86 = Release|x86
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Debug|x64.ActiveCfg = Debug|x64
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Debug|x64.Build.0 = Debug|x64
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Debug|x86.ActiveCfg = Debug|Win32
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Debug|x86.Build.0 = Debug|Win32
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Release|x64.ActiveCfg = Release|x64
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Release|x64.Build.0 = Release|x64
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Release|x86.ActiveCfg = Release|Win32
{4F84B756-87F5-4B92-827B-DA087DAE1900}.Release|x86.Build.0 = Release|Win32
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {067FB780-37B8-465E-AD7E-E7B238B9C04F}
EndGlobalSection
EndGlobal

View File

@@ -0,0 +1,243 @@
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Label="ProjectConfigurations">
<ProjectConfiguration Include="Debug|Win32">
<Configuration>Debug</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|Win32">
<Configuration>Release</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Debug|x64">
<Configuration>Debug</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|x64">
<Configuration>Release</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<PropertyGroup Label="Globals">
<VCProjectVersion>15.0</VCProjectVersion>
<ProjectGuid>{4F84B756-87F5-4B92-827B-DA087DAE1900}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<RootNamespace>TestCpu</RootNamespace>
<WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
</ImportGroup>
<ImportGroup Label="Shared">
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<LinkIncremental>true</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<LinkIncremental>true</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<LinkIncremental>false</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<LinkIncremental>false</LinkIncremental>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<SDLCheck>true</SDLCheck>
<PreprocessorDefinitions>WIN32;_DEBUG;_WINDOWS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<CallingConvention>VectorCall</CallingConvention>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>d3d11.lib;kernel32.lib;user32.lib;gdi32.lib;%(AdditionalDependencies)</AdditionalDependencies>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<SDLCheck>true</SDLCheck>
<PreprocessorDefinitions>TRACY_ENABLE;_DEBUG;_WINDOWS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<CallingConvention>VectorCall</CallingConvention>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>d3d11.lib;kernel32.lib;user32.lib;gdi32.lib;%(AdditionalDependencies)</AdditionalDependencies>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_WINDOWS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<ExceptionHandling>false</ExceptionHandling>
<RuntimeLibrary>MultiThreaded</RuntimeLibrary>
<BufferSecurityCheck>false</BufferSecurityCheck>
<CallingConvention>VectorCall</CallingConvention>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>d3d11.lib;kernel32.lib;user32.lib;gdi32.lib;%(AdditionalDependencies)</AdditionalDependencies>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>TRACY_ENABLE;NDEBUG;_WINDOWS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<ExceptionHandling>false</ExceptionHandling>
<RuntimeLibrary>MultiThreaded</RuntimeLibrary>
<BufferSecurityCheck>false</BufferSecurityCheck>
<CallingConvention>VectorCall</CallingConvention>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>d3d11.lib;kernel32.lib;user32.lib;gdi32.lib;%(AdditionalDependencies)</AdditionalDependencies>
</Link>
</ItemDefinitionGroup>
<ItemGroup>
<ClCompile Include="..\..\..\TracyClient.cpp" />
<ClCompile Include="..\Source\enkiTS\TaskScheduler.cpp" />
<ClCompile Include="..\Source\enkiTS\TaskScheduler_c.cpp" />
<ClCompile Include="..\Source\Maths.cpp" />
<ClCompile Include="..\Source\Test.cpp" />
<ClCompile Include="TestWin.cpp" />
</ItemGroup>
<ItemGroup>
<ClInclude Include="..\Source\Config.h" />
<ClInclude Include="..\Source\enkiTS\Atomics.h" />
<ClInclude Include="..\Source\enkiTS\LockLessMultiReadPipe.h" />
<ClInclude Include="..\Source\enkiTS\TaskScheduler.h" />
<ClInclude Include="..\Source\enkiTS\TaskScheduler_c.h" />
<ClInclude Include="..\Source\enkiTS\Threads.h" />
<ClInclude Include="..\Source\Maths.h" />
<ClInclude Include="..\Source\MathSimd.h" />
<ClInclude Include="..\Source\Test.h" />
<ClInclude Include="..\Source\stb_image.h" />
</ItemGroup>
<ItemGroup>
<None Include="..\.editorconfig" />
</ItemGroup>
<ItemGroup>
<FxCompile Include="ComputeShader.hlsl">
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|x64'">5.0</ShaderModel>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">5.0</ShaderModel>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">5.0</ShaderModel>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">5.0</ShaderModel>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">g_CSBytecode</VariableName>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">CompiledComputeShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">g_CSBytecode</VariableName>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">CompiledComputeShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">g_CSBytecode</VariableName>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">CompiledComputeShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|x64'">g_CSBytecode</VariableName>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|x64'">CompiledComputeShader.h</HeaderFileOutput>
</FxCompile>
<FxCompile Include="PixelShader.hlsl">
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Pixel</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Pixel</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Pixel</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Pixel</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|x64'">5.0</ShaderModel>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">CompiledPixelShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">CompiledPixelShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">CompiledPixelShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|x64'">CompiledPixelShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">g_PSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">g_PSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">g_PSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|x64'">g_PSBytecode</VariableName>
</FxCompile>
<FxCompile Include="VertexShader.hlsl">
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Vertex</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Vertex</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Vertex</ShaderType>
<ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Vertex</ShaderType>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">5.0</ShaderModel>
<ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|x64'">5.0</ShaderModel>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">CompiledVertexShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">CompiledVertexShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">CompiledVertexShader.h</HeaderFileOutput>
<HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|x64'">CompiledVertexShader.h</HeaderFileOutput>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">g_VSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">g_VSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">g_VSBytecode</VariableName>
<VariableName Condition="'$(Configuration)|$(Platform)'=='Release|x64'">g_VSBytecode</VariableName>
</FxCompile>
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
</Project>

View File

@@ -0,0 +1,67 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup>
<ClCompile Include="TestWin.cpp" />
<ClCompile Include="..\Source\Test.cpp">
<Filter>Source</Filter>
</ClCompile>
<ClCompile Include="..\Source\enkiTS\TaskScheduler.cpp">
<Filter>Source\enkiTS</Filter>
</ClCompile>
<ClCompile Include="..\Source\enkiTS\TaskScheduler_c.cpp">
<Filter>Source\enkiTS</Filter>
</ClCompile>
<ClCompile Include="..\Source\Maths.cpp">
<Filter>Source</Filter>
</ClCompile>
<ClCompile Include="..\..\..\TracyClient.cpp" />
</ItemGroup>
<ItemGroup>
<Filter Include="Source">
<UniqueIdentifier>{5f19f217-c1c7-4eeb-be61-8b986fee9375}</UniqueIdentifier>
</Filter>
<Filter Include="Source\enkiTS">
<UniqueIdentifier>{38c448a8-1dcc-4116-9410-a9f8d068caff}</UniqueIdentifier>
</Filter>
</ItemGroup>
<ItemGroup>
<ClInclude Include="..\Source\Test.h">
<Filter>Source</Filter>
</ClInclude>
<ClInclude Include="..\Source\stb_image.h">
<Filter>Source</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\Atomics.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\LockLessMultiReadPipe.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\TaskScheduler.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\TaskScheduler_c.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\enkiTS\Threads.h">
<Filter>Source\enkiTS</Filter>
</ClInclude>
<ClInclude Include="..\Source\Maths.h">
<Filter>Source</Filter>
</ClInclude>
<ClInclude Include="..\Source\Config.h">
<Filter>Source</Filter>
</ClInclude>
<ClInclude Include="..\Source\MathSimd.h">
<Filter>Source</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<None Include="..\.editorconfig" />
</ItemGroup>
<ItemGroup>
<FxCompile Include="VertexShader.hlsl" />
<FxCompile Include="PixelShader.hlsl" />
<FxCompile Include="ComputeShader.hlsl" />
</ItemGroup>
</Project>

View File

@@ -0,0 +1,557 @@
#include <stdint.h>
#define WIN32_LEAN_AND_MEAN
#define NOMINMAX
#include <windows.h>
#include <d3d11_1.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <algorithm>
#include "../Source/Config.h"
#include "../Source/Maths.h"
#include "../Source/Test.h"
#include "CompiledVertexShader.h"
#include "CompiledPixelShader.h"
#include "../../../Tracy.hpp"
static HINSTANCE g_HInstance;
static HWND g_Wnd;
ATOM MyRegisterClass(HINSTANCE hInstance);
BOOL InitInstance(HINSTANCE, int);
LRESULT CALLBACK WndProc(HWND, UINT, WPARAM, LPARAM);
INT_PTR CALLBACK About(HWND, UINT, WPARAM, LPARAM);
static HRESULT InitD3DDevice();
static void ShutdownD3DDevice();
static void RenderFrame();
static float* g_Backbuffer;
static D3D_FEATURE_LEVEL g_D3D11FeatureLevel = D3D_FEATURE_LEVEL_11_0;
static ID3D11Device* g_D3D11Device = nullptr;
static ID3D11DeviceContext* g_D3D11Ctx = nullptr;
static IDXGISwapChain* g_D3D11SwapChain = nullptr;
static ID3D11RenderTargetView* g_D3D11RenderTarget = nullptr;
static ID3D11VertexShader* g_VertexShader;
static ID3D11PixelShader* g_PixelShader;
static ID3D11Texture2D *g_BackbufferTexture, *g_BackbufferTexture2;
static ID3D11ShaderResourceView *g_BackbufferSRV, *g_BackbufferSRV2;
static ID3D11UnorderedAccessView *g_BackbufferUAV, *g_BackbufferUAV2;
static ID3D11SamplerState* g_SamplerLinear;
static ID3D11RasterizerState* g_RasterState;
static int g_BackbufferIndex;
#if DO_COMPUTE_GPU
#include "CompiledComputeShader.h"
struct ComputeParams
{
Camera cam;
int sphereCount;
int screenWidth;
int screenHeight;
int frames;
float invWidth;
float invHeight;
float lerpFac;
int emissiveCount;
};
static ID3D11ComputeShader* g_ComputeShader;
static ID3D11Buffer* g_DataSpheres; static ID3D11ShaderResourceView* g_SRVSpheres;
static ID3D11Buffer* g_DataMaterials; static ID3D11ShaderResourceView* g_SRVMaterials;
static ID3D11Buffer* g_DataParams; static ID3D11ShaderResourceView* g_SRVParams;
static ID3D11Buffer* g_DataEmissives; static ID3D11ShaderResourceView* g_SRVEmissives;
static ID3D11Buffer* g_DataCounter; static ID3D11UnorderedAccessView* g_UAVCounter;
static int g_SphereCount, g_ObjSize, g_MatSize;
static ID3D11Query *g_QueryBegin, *g_QueryEnd, *g_QueryDisjoint;
#endif // #if DO_COMPUTE_GPU
int APIENTRY wWinMain(_In_ HINSTANCE hInstance, _In_opt_ HINSTANCE, _In_ LPWSTR, _In_ int nCmdShow)
{
g_Backbuffer = new float[kBackbufferWidth * kBackbufferHeight * 4];
memset(g_Backbuffer, 0, kBackbufferWidth * kBackbufferHeight * 4 * sizeof(g_Backbuffer[0]));
InitializeTest();
MyRegisterClass(hInstance);
if (!InitInstance (hInstance, nCmdShow))
{
return FALSE;
}
if (FAILED(InitD3DDevice()))
{
ShutdownD3DDevice();
return 0;
}
g_D3D11Device->CreateVertexShader(g_VSBytecode, ARRAYSIZE(g_VSBytecode), NULL, &g_VertexShader);
g_D3D11Device->CreatePixelShader(g_PSBytecode, ARRAYSIZE(g_PSBytecode), NULL, &g_PixelShader);
#if DO_COMPUTE_GPU
g_D3D11Device->CreateComputeShader(g_CSBytecode, ARRAYSIZE(g_CSBytecode), NULL, &g_ComputeShader);
#endif
D3D11_TEXTURE2D_DESC texDesc = {};
texDesc.Width = kBackbufferWidth;
texDesc.Height = kBackbufferHeight;
texDesc.MipLevels = 1;
texDesc.ArraySize = 1;
texDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
texDesc.SampleDesc.Count = 1;
texDesc.SampleDesc.Quality = 0;
#if DO_COMPUTE_GPU
texDesc.Usage = D3D11_USAGE_DEFAULT;
texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;
texDesc.CPUAccessFlags = 0;
#else
texDesc.Usage = D3D11_USAGE_DYNAMIC;
texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
texDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
#endif
texDesc.MiscFlags = 0;
g_D3D11Device->CreateTexture2D(&texDesc, NULL, &g_BackbufferTexture);
g_D3D11Device->CreateTexture2D(&texDesc, NULL, &g_BackbufferTexture2);
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = texDesc.Format;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
srvDesc.Texture2D.MipLevels = 1;
srvDesc.Texture2D.MostDetailedMip = 0;
g_D3D11Device->CreateShaderResourceView(g_BackbufferTexture, &srvDesc, &g_BackbufferSRV);
g_D3D11Device->CreateShaderResourceView(g_BackbufferTexture2, &srvDesc, &g_BackbufferSRV2);
D3D11_SAMPLER_DESC smpDesc = {};
smpDesc.Filter = D3D11_FILTER_MIN_MAG_LINEAR_MIP_POINT;
smpDesc.AddressU = smpDesc.AddressV = smpDesc.AddressW = D3D11_TEXTURE_ADDRESS_CLAMP;
g_D3D11Device->CreateSamplerState(&smpDesc, &g_SamplerLinear);
D3D11_RASTERIZER_DESC rasterDesc = {};
rasterDesc.FillMode = D3D11_FILL_SOLID;
rasterDesc.CullMode = D3D11_CULL_NONE;
g_D3D11Device->CreateRasterizerState(&rasterDesc, &g_RasterState);
#if DO_COMPUTE_GPU
D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
int camSize;
GetObjectCount(g_SphereCount, g_ObjSize, g_MatSize, camSize);
assert(g_ObjSize == 20);
assert(g_MatSize == 36);
assert(camSize == 88);
D3D11_BUFFER_DESC bdesc = {};
bdesc.ByteWidth = g_SphereCount * g_ObjSize;
bdesc.Usage = D3D11_USAGE_DEFAULT;
bdesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
bdesc.CPUAccessFlags = 0;
bdesc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
bdesc.StructureByteStride = g_ObjSize;
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataSpheres);
srvDesc.Format = DXGI_FORMAT_UNKNOWN;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
srvDesc.Buffer.FirstElement = 0;
srvDesc.Buffer.NumElements = g_SphereCount;
g_D3D11Device->CreateShaderResourceView(g_DataSpheres, &srvDesc, &g_SRVSpheres);
bdesc.ByteWidth = g_SphereCount * g_MatSize;
bdesc.StructureByteStride = g_MatSize;
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataMaterials);
srvDesc.Buffer.NumElements = g_SphereCount;
g_D3D11Device->CreateShaderResourceView(g_DataMaterials, &srvDesc, &g_SRVMaterials);
bdesc.ByteWidth = sizeof(ComputeParams);
bdesc.StructureByteStride = sizeof(ComputeParams);
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataParams);
srvDesc.Buffer.NumElements = 1;
g_D3D11Device->CreateShaderResourceView(g_DataParams, &srvDesc, &g_SRVParams);
bdesc.ByteWidth = g_SphereCount * 4;
bdesc.StructureByteStride = 4;
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataEmissives);
srvDesc.Buffer.NumElements = g_SphereCount;
g_D3D11Device->CreateShaderResourceView(g_DataEmissives, &srvDesc, &g_SRVEmissives);
bdesc.ByteWidth = 4;
bdesc.BindFlags |= D3D11_BIND_UNORDERED_ACCESS;
bdesc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
bdesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
g_D3D11Device->CreateBuffer(&bdesc, NULL, &g_DataCounter);
uavDesc.Format = DXGI_FORMAT_R32_TYPELESS;
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements = 1;
uavDesc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
g_D3D11Device->CreateUnorderedAccessView(g_DataCounter, &uavDesc, &g_UAVCounter);
uavDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE2D;
uavDesc.Texture2D.MipSlice = 0;
g_D3D11Device->CreateUnorderedAccessView(g_BackbufferTexture, &uavDesc, &g_BackbufferUAV);
g_D3D11Device->CreateUnorderedAccessView(g_BackbufferTexture2, &uavDesc, &g_BackbufferUAV2);
D3D11_QUERY_DESC qDesc = {};
qDesc.Query = D3D11_QUERY_TIMESTAMP;
g_D3D11Device->CreateQuery(&qDesc, &g_QueryBegin);
g_D3D11Device->CreateQuery(&qDesc, &g_QueryEnd);
qDesc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
g_D3D11Device->CreateQuery(&qDesc, &g_QueryDisjoint);
#endif // #if DO_COMPUTE_GPU
static int framesLeft = 10;
// Main message loop
MSG msg = { 0 };
while (msg.message != WM_QUIT)
{
if (PeekMessage(&msg, NULL, 0U, 0U, PM_REMOVE))
{
TranslateMessage(&msg);
DispatchMessage(&msg);
}
else
{
RenderFrame();
if( --framesLeft == 0 ) break;
}
}
ShutdownTest();
ShutdownD3DDevice();
return (int) msg.wParam;
}
ATOM MyRegisterClass(HINSTANCE hInstance)
{
ZoneScoped;
WNDCLASSEXW wcex;
memset(&wcex, 0, sizeof(wcex));
wcex.cbSize = sizeof(WNDCLASSEX);
wcex.style = CS_HREDRAW | CS_VREDRAW;
wcex.lpfnWndProc = WndProc;
wcex.cbClsExtra = 0;
wcex.cbWndExtra = 0;
wcex.hInstance = hInstance;
wcex.hCursor = LoadCursor(nullptr, IDC_ARROW);
wcex.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
wcex.lpszClassName = L"TestClass";
return RegisterClassExW(&wcex);
}
BOOL InitInstance(HINSTANCE hInstance, int nCmdShow)
{
ZoneScoped;
g_HInstance = hInstance;
RECT rc = { 0, 0, kBackbufferWidth, kBackbufferHeight };
DWORD style = WS_OVERLAPPED | WS_CAPTION | WS_SYSMENU | WS_MINIMIZEBOX;
AdjustWindowRect(&rc, style, FALSE);
HWND hWnd = CreateWindowW(L"TestClass", L"Test", style, CW_USEDEFAULT, CW_USEDEFAULT, rc.right-rc.left, rc.bottom-rc.top, nullptr, nullptr, hInstance, nullptr);
if (!hWnd)
return FALSE;
g_Wnd = hWnd;
ShowWindow(hWnd, nCmdShow);
UpdateWindow(hWnd);
return TRUE;
}
static uint64_t s_Time;
static int s_Count;
static char s_Buffer[200];
static unsigned s_Flags = kFlagProgressive;
static int s_FrameCount = 0;
static void RenderFrame()
{
ZoneScoped;
LARGE_INTEGER time1;
#if DO_COMPUTE_GPU
QueryPerformanceCounter(&time1);
float t = float(clock()) / CLOCKS_PER_SEC;
UpdateTest(t, s_FrameCount, kBackbufferWidth, kBackbufferHeight, s_Flags);
g_BackbufferIndex = 1 - g_BackbufferIndex;
void* dataSpheres = alloca(g_SphereCount * g_ObjSize);
void* dataMaterials = alloca(g_SphereCount * g_MatSize);
void* dataEmissives = alloca(g_SphereCount * 4);
ComputeParams dataParams;
GetSceneDesc(dataSpheres, dataMaterials, &dataParams.cam, dataEmissives, &dataParams.emissiveCount);
dataParams.sphereCount = g_SphereCount;
dataParams.screenWidth = kBackbufferWidth;
dataParams.screenHeight = kBackbufferHeight;
dataParams.frames = s_FrameCount;
dataParams.invWidth = 1.0f / kBackbufferWidth;
dataParams.invHeight = 1.0f / kBackbufferHeight;
float lerpFac = float(s_FrameCount) / float(s_FrameCount + 1);
if (s_Flags & kFlagAnimate)
lerpFac *= DO_ANIMATE_SMOOTHING;
if (!(s_Flags & kFlagProgressive))
lerpFac = 0;
dataParams.lerpFac = lerpFac;
g_D3D11Ctx->UpdateSubresource(g_DataSpheres, 0, NULL, dataSpheres, 0, 0);
g_D3D11Ctx->UpdateSubresource(g_DataMaterials, 0, NULL, dataMaterials, 0, 0);
g_D3D11Ctx->UpdateSubresource(g_DataParams, 0, NULL, &dataParams, 0, 0);
g_D3D11Ctx->UpdateSubresource(g_DataEmissives, 0, NULL, dataEmissives, 0, 0);
ID3D11ShaderResourceView* srvs[] = {
g_BackbufferIndex == 0 ? g_BackbufferSRV2 : g_BackbufferSRV,
g_SRVSpheres,
g_SRVMaterials,
g_SRVParams,
g_SRVEmissives
};
g_D3D11Ctx->CSSetShaderResources(0, ARRAYSIZE(srvs), srvs);
ID3D11UnorderedAccessView* uavs[] = {
g_BackbufferIndex == 0 ? g_BackbufferUAV : g_BackbufferUAV2,
g_UAVCounter
};
g_D3D11Ctx->CSSetUnorderedAccessViews(0, ARRAYSIZE(uavs), uavs, NULL);
g_D3D11Ctx->CSSetShader(g_ComputeShader, NULL, 0);
g_D3D11Ctx->Begin(g_QueryDisjoint);
g_D3D11Ctx->End(g_QueryBegin);
g_D3D11Ctx->Dispatch(kBackbufferWidth/kCSGroupSizeX, kBackbufferHeight/kCSGroupSizeY, 1);
g_D3D11Ctx->End(g_QueryEnd);
uavs[0] = NULL;
g_D3D11Ctx->CSSetUnorderedAccessViews(0, ARRAYSIZE(uavs), uavs, NULL);
++s_FrameCount;
#else
QueryPerformanceCounter(&time1);
float t = float(clock()) / CLOCKS_PER_SEC;
static size_t s_RayCounter = 0;
int rayCount;
UpdateTest(t, s_FrameCount, kBackbufferWidth, kBackbufferHeight, s_Flags);
DrawTest(t, s_FrameCount, kBackbufferWidth, kBackbufferHeight, g_Backbuffer, rayCount, s_Flags);
s_FrameCount++;
s_RayCounter += rayCount;
LARGE_INTEGER time2;
QueryPerformanceCounter(&time2);
uint64_t dt = time2.QuadPart - time1.QuadPart;
++s_Count;
s_Time += dt;
if (s_Count > 10)
{
LARGE_INTEGER frequency;
QueryPerformanceFrequency(&frequency);
double s = double(s_Time) / double(frequency.QuadPart) / s_Count;
sprintf_s(s_Buffer, sizeof(s_Buffer), "%.2fms (%.1f FPS) %.1fMrays/s %.2fMrays/frame frames %i\n", s * 1000.0f, 1.f / s, s_RayCounter / s_Count / s * 1.0e-6f, s_RayCounter / s_Count * 1.0e-6f, s_FrameCount);
SetWindowTextA(g_Wnd, s_Buffer);
OutputDebugStringA(s_Buffer);
s_Count = 0;
s_Time = 0;
s_RayCounter = 0;
}
D3D11_MAPPED_SUBRESOURCE mapped;
g_D3D11Ctx->Map(g_BackbufferTexture, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
const uint8_t* src = (const uint8_t*)g_Backbuffer;
uint8_t* dst = (uint8_t*)mapped.pData;
for (int y = 0; y < kBackbufferHeight; ++y)
{
memcpy(dst, src, kBackbufferWidth * 16);
src += kBackbufferWidth * 16;
dst += mapped.RowPitch;
}
g_D3D11Ctx->Unmap(g_BackbufferTexture, 0);
#endif
g_D3D11Ctx->VSSetShader(g_VertexShader, NULL, 0);
g_D3D11Ctx->PSSetShader(g_PixelShader, NULL, 0);
g_D3D11Ctx->PSSetShaderResources(0, 1, g_BackbufferIndex == 0 ? &g_BackbufferSRV : &g_BackbufferSRV2);
g_D3D11Ctx->PSSetSamplers(0, 1, &g_SamplerLinear);
g_D3D11Ctx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
g_D3D11Ctx->RSSetState(g_RasterState);
g_D3D11Ctx->Draw(3, 0);
g_D3D11SwapChain->Present(0, 0);
FrameMark;
#if DO_COMPUTE_GPU
g_D3D11Ctx->End(g_QueryDisjoint);
// get GPU times
while (g_D3D11Ctx->GetData(g_QueryDisjoint, NULL, 0, 0) == S_FALSE) { Sleep(0); }
D3D10_QUERY_DATA_TIMESTAMP_DISJOINT tsDisjoint;
g_D3D11Ctx->GetData(g_QueryDisjoint, &tsDisjoint, sizeof(tsDisjoint), 0);
if (!tsDisjoint.Disjoint)
{
UINT64 tsBegin, tsEnd;
// Note: on some GPUs/drivers, even when the disjoint query above already said "yeah I have data",
// might still not return "I have data" for timestamp queries before it.
while (g_D3D11Ctx->GetData(g_QueryBegin, &tsBegin, sizeof(tsBegin), 0) == S_FALSE) { Sleep(0); }
while (g_D3D11Ctx->GetData(g_QueryEnd, &tsEnd, sizeof(tsEnd), 0) == S_FALSE) { Sleep(0); }
float s = float(tsEnd - tsBegin) / float(tsDisjoint.Frequency);
static uint64_t s_RayCounter;
D3D11_MAPPED_SUBRESOURCE mapped;
g_D3D11Ctx->Map(g_DataCounter, 0, D3D11_MAP_READ, 0, &mapped);
s_RayCounter += *(const int*)mapped.pData;
g_D3D11Ctx->Unmap(g_DataCounter, 0);
int zeroCount = 0;
g_D3D11Ctx->UpdateSubresource(g_DataCounter, 0, NULL, &zeroCount, 0, 0);
static float s_Time;
++s_Count;
s_Time += s;
if (s_Count > 150)
{
s = s_Time / s_Count;
sprintf_s(s_Buffer, sizeof(s_Buffer), "%.2fms (%.1f FPS) %.1fMrays/s %.2fMrays/frame frames %i\n", s * 1000.0f, 1.f / s, s_RayCounter / s_Count / s * 1.0e-6f, s_RayCounter / s_Count * 1.0e-6f, s_FrameCount);
SetWindowTextA(g_Wnd, s_Buffer);
s_Count = 0;
s_Time = 0;
s_RayCounter = 0;
}
}
#endif // #if DO_COMPUTE_GPU
}
LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
switch (message)
{
case WM_PAINT:
{
PAINTSTRUCT ps;
HDC hdc = BeginPaint(hWnd, &ps);
EndPaint(hWnd, &ps);
}
break;
case WM_DESTROY:
PostQuitMessage(0);
break;
case WM_CHAR:
if (wParam == 'a')
s_Flags = s_Flags ^ kFlagAnimate;
if (wParam == 'p')
{
s_Flags = s_Flags ^ kFlagProgressive;
s_FrameCount = 0;
}
break;
default:
return DefWindowProc(hWnd, message, wParam, lParam);
}
return 0;
}
static HRESULT InitD3DDevice()
{
ZoneScoped;
HRESULT hr = S_OK;
RECT rc;
GetClientRect(g_Wnd, &rc);
UINT width = rc.right - rc.left;
UINT height = rc.bottom - rc.top;
UINT createDeviceFlags = 0;
#ifdef _DEBUG
createDeviceFlags |= D3D11_CREATE_DEVICE_DEBUG;
#endif
D3D_FEATURE_LEVEL featureLevels[] =
{
D3D_FEATURE_LEVEL_11_0,
};
UINT numFeatureLevels = ARRAYSIZE(featureLevels);
hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, createDeviceFlags, featureLevels, numFeatureLevels, D3D11_SDK_VERSION, &g_D3D11Device, &g_D3D11FeatureLevel, &g_D3D11Ctx);
if (FAILED(hr))
return hr;
// Get DXGI factory
IDXGIFactory1* dxgiFactory = nullptr;
{
IDXGIDevice* dxgiDevice = nullptr;
hr = g_D3D11Device->QueryInterface(__uuidof(IDXGIDevice), reinterpret_cast<void**>(&dxgiDevice));
if (SUCCEEDED(hr))
{
IDXGIAdapter* adapter = nullptr;
hr = dxgiDevice->GetAdapter(&adapter);
if (SUCCEEDED(hr))
{
hr = adapter->GetParent(__uuidof(IDXGIFactory1), reinterpret_cast<void**>(&dxgiFactory));
adapter->Release();
}
dxgiDevice->Release();
}
}
if (FAILED(hr))
return hr;
// Create swap chain
DXGI_SWAP_CHAIN_DESC sd;
ZeroMemory(&sd, sizeof(sd));
sd.BufferCount = 1;
sd.BufferDesc.Width = width;
sd.BufferDesc.Height = height;
sd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
sd.BufferDesc.RefreshRate.Numerator = 60;
sd.BufferDesc.RefreshRate.Denominator = 1;
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
sd.OutputWindow = g_Wnd;
sd.SampleDesc.Count = 1;
sd.SampleDesc.Quality = 0;
sd.Windowed = TRUE;
hr = dxgiFactory->CreateSwapChain(g_D3D11Device, &sd, &g_D3D11SwapChain);
// Prevent Alt-Enter
dxgiFactory->MakeWindowAssociation(g_Wnd, DXGI_MWA_NO_ALT_ENTER);
dxgiFactory->Release();
if (FAILED(hr))
return hr;
// RTV
ID3D11Texture2D* pBackBuffer = nullptr;
hr = g_D3D11SwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), reinterpret_cast<void**>(&pBackBuffer));
if (FAILED(hr))
return hr;
hr = g_D3D11Device->CreateRenderTargetView(pBackBuffer, nullptr, &g_D3D11RenderTarget);
pBackBuffer->Release();
if (FAILED(hr))
return hr;
g_D3D11Ctx->OMSetRenderTargets(1, &g_D3D11RenderTarget, nullptr);
// Viewport
D3D11_VIEWPORT vp;
vp.Width = (float)width;
vp.Height = (float)height;
vp.MinDepth = 0.0f;
vp.MaxDepth = 1.0f;
vp.TopLeftX = 0;
vp.TopLeftY = 0;
g_D3D11Ctx->RSSetViewports(1, &vp);
return S_OK;
}
static void ShutdownD3DDevice()
{
ZoneScoped;
if (g_D3D11Ctx) g_D3D11Ctx->ClearState();
if (g_D3D11RenderTarget) g_D3D11RenderTarget->Release();
if (g_D3D11SwapChain) g_D3D11SwapChain->Release();
if (g_D3D11Ctx) g_D3D11Ctx->Release();
if (g_D3D11Device) g_D3D11Device->Release();
}

View File

@@ -0,0 +1,13 @@
struct vs2ps
{
float2 uv : TEXCOORD0;
float4 pos : SV_Position;
};
vs2ps main(uint vid : SV_VertexID)
{
vs2ps o;
o.uv = float2((vid << 1) & 2, vid & 2);
o.pos = float4(o.uv * float2(2, 2) + float2(-1, -1), 0, 1);
return o;
}

View File

@@ -0,0 +1,24 @@
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <http://unlicense.org>

22
extra/dxt1divtable.c Normal file
View File

@@ -0,0 +1,22 @@
#include <stdint.h>
#include <stdio.h>
int main()
{
for( int i=0; i<255*3+1; i++ )
{
// replace 4 with 2 for ARM NEON table
uint32_t range = ( 4 << 16 ) / ( 1+i );
if( range > 0xFFFF ) range = 0xFFFF;
if( i % 16 == 15 )
{
printf( "0x%04x,\n", range );
}
else
{
printf( "0x%04x, ", range );
}
}
printf( "\n" );
return 0;
}

36
extra/dxt1table.c Normal file
View File

@@ -0,0 +1,36 @@
#include <stdint.h>
#include <stdio.h>
static const uint8_t IndexTable[4] = { 1, 3, 2, 0 };
int convert( int v )
{
int v0 = v & 0x3;
int v1 = ( v >> 2 ) & 0x3;
int v2 = ( v >> 4 ) & 0x3;
int v3 = ( v >> 6 );
int t0 = IndexTable[v0];
int t1 = IndexTable[v1];
int t2 = IndexTable[v2];
int t3 = IndexTable[v3];
return t0 | ( t1 << 2 ) | ( t2 << 4 ) | ( t3 << 6 );
}
int main()
{
for( int i=0; i<256; i++ )
{
if( i % 16 == 15 )
{
printf( "%i,\n", convert( i ) );
}
else
{
printf( "%i,\t", convert( i ) );
}
}
printf( "\n" );
return 0;
}

24
extra/rdotbl.c Normal file
View File

@@ -0,0 +1,24 @@
#include <stdio.h>
int main()
{
//int a = 16, b = 44, s = 4;
//int av = 12, bv = 6, cv = 3;
//int a = 32, b = 48, s = 16;
//int av = 12, bv = 6, cv = 3;
int a = 48, b = 64, s = 16;
int av = 48, bv = 32, cv = 24;
printf( "int TrTbl[] = { " );
int first = 1;
for( int i=0; i<256; i+=s )
{
if( first ) first = 0; else printf( ", " );
if( i < a ) printf( "%i", av );
else if( i < b ) printf( "%i", bv );
else printf( "%i", cv );
}
printf( " };\n" );
}

4
extra/systrace/build.sh Executable file
View File

@@ -0,0 +1,4 @@
#!/bin/sh
clang tracy_systrace.c -s -Os -ffunction-sections -fdata-sections -Wl,--gc-sections -fno-stack-protector -Wl,-z,norelro -Wl,--build-id=none -nostdlib -ldl -o tracy_systrace
strip --strip-all -R .note.gnu.gold-version -R .comment -R .note -R .note.gnu.build-id -R .note.ABI-tag -R .eh_frame -R .eh_frame_hdr -R .gnu.hash -R .gnu.version -R .got tracy_systrace
sstrip -z tracy_systrace

View File

@@ -0,0 +1,53 @@
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <dlfcn.h>
enum { BufSize = 64*1024 };
typedef int (*open_t)( const char*, int, ... );
typedef void (*exit_t)( int );
typedef int (*poll_t)( struct pollfd*, nfds_t, int timeout );
typedef int (*nanosleep_t)( const struct timespec*, struct timespec* );
typedef ssize_t (*read_t)( int, void*, size_t );
typedef ssize_t (*write_t)( int, const void*, size_t );
void _start()
{
void* libc = dlopen( "libc.so", RTLD_LAZY );
open_t sym_open = dlsym( libc, "open" );
exit_t sym_exit = dlsym( libc, "exit" );
poll_t sym_poll = dlsym( libc, "poll" );
nanosleep_t sym_nanosleep = dlsym( libc, "nanosleep" );
read_t sym_read = dlsym( libc, "read" );
write_t sym_write = dlsym( libc, "write" );
char buf[BufSize];
int kernelFd = sym_open( "/sys/kernel/debug/tracing/trace_pipe", O_RDONLY );
if( kernelFd < 0 ) sym_exit( 0 );
struct pollfd pfd;
pfd.fd = kernelFd;
pfd.events = POLLIN | POLLERR;
struct timespec sleepTime;
sleepTime.tv_sec = 0;
sleepTime.tv_nsec = 1000 * 1000 * 10;
for(;;)
{
while( sym_poll( &pfd, 1, 0 ) <= 0 ) sym_nanosleep( &sleepTime, NULL );
const int rd = sym_read( kernelFd, buf, BufSize );
if( rd <= 0 ) break;
sym_write( STDOUT_FILENO, buf, rd );
}
sym_exit( 0 );
}

Some files were not shown because too many files have changed in this diff Show More