filament

Author	SHA1	Message	Date
Mathias Agopian	bbc4f4f21d	JobSystem: remove the DONT_SIGNAL flag. A side effect of this flag was that we always had to use notify_all when starting a job, instead of notify_one -- in practice, we found experimentally that more calls to notify_one are cheaper than fewer calls to notify_all.	2021-06-08 09:47:16 -07:00
Mathias Agopian	c5ef1d6252	JobSystem: signal condition with lock held This seems to improve parallelism on multicores and at least on Android, doesn't seem to impact performance.	2021-06-08 09:47:16 -07:00
Mathias Agopian	4d73583f86	Better systrace logging and small improvements to JobSystem Tweak how we signal/wakeup threads so we get better parallelism on Android.	2021-06-02 16:40:40 -07:00
Mathias Agopian	e8b16d600e	Fix a hang in JobSystem. The hang was caused by a subtle race. When a job is completed, its thread must signal all the threads that might be waiting on this job. The signaling code was attempting to signal only the minimum number of threads -- this was especially important in the case where no threads were waiting, because the call to notify() could then be avoided. Unfortunately, for performance reasons we're not calling notify() with the condition lock held. This meant that between the time the number of waiting threads was latched and the time of the notify() call, more threads could enter their condition variable wait(), and it would then be possible for these threads to wake up instead of the thread we were trying to wake up (the one waiting on the job). That thread would then get stuck forever. This bug was introduced in `2df639133b`. Also add some debugging code for this kind of failure (disabled).	2021-03-17 13:44:30 -07:00
Philip Rideout	b82dca4fac	JobSystem: work around hang on 2-CPU machines. If hardware_concurrency() returned 2 while UTILS_HAS_HYPER_THREADING was enabled, `mThreadCount` ended up as zero, so jobs (such as texture decoding) would simply never start. This problem was noticed with GitHub Actions. I tested this fix locally by replacing `hardware_concurrency()` with fixed values like 1 or 2, and verified that the hang went away.	2020-10-09 16:14:50 -07:00
Pixelflinger	ec9fd58fc0	use more inclusive language	2020-07-22 15:20:54 -07:00
Pixelflinger	fb0e89d969	fix a race-condition in JobSystem mThreadMap needs to be protected as it's accessed from multiple threads.	2020-04-03 17:25:17 -07:00
Philip Rideout	c79f5a1a29	JobSystem: replace TLS with tid mapping.	2020-04-02 14:53:20 -07:00
Pixelflinger	9bcf10cf44	fix thread count initialization: if hw thread count was 1, we'd end up with 32 threads	2019-07-22 13:01:53 -07:00
Pixelflinger	261dafa924	Fix a possible infinite loop. In the case where we have 2 cores, we would spawn only one thread in the thread pool. If that thread tried to steal() from another thread before the main thread was adopted, it would end up always trying to steal from itself and enter an infinite loop. This seems to happen during Windows builds.	2019-07-19 10:56:41 -07:00
Mathias Agopian	665703dbd8	parallel_for now creates jobs in reverse order. This is because the JobSystem's queue works as a LIFO; by creating jobs in reverse (memory) order, we attempt to help streaming to the d-cache on that thread -- until the point where jobs are stolen. We also execute the last job immediately instead of creating a job for it, since we're already in a job.	2019-07-09 16:12:15 -07:00
Pixelflinger	8170ca7cd1	improve JobSystem::parallel_for + minor optimizations - parallel_for doesn't use recursion anymore to create the "leaf" jobs; this is now done linearly on N threads (one thread per CPU). This uses less stack space and reduces mispredicted branches. - remove almost all SYSTRACE calls because they have a huge impact on things like parallel_for() and are misleading. They can be enabled again by setting HEAVY_SYSTRACE to true.	2019-07-09 16:12:15 -07:00
Mathias Agopian	2df639133b	improvements to JobSystem - we simplify the waiting code by using only a single condition variable instead of two. - wait() now behaves just like a looper, it will process jobs until the one it's waiting for finishes -- before it could just sit there (the idea was that the job would finish quickly, but that's not always the case). - we also make sure to never call notify_n() when it's not needed. We track how many waiters we have and use that to decide if we need to notify(). notify is pretty slow on all architectures, even on linux it's always a syscall, so it's better to avoid it. - don't use stand-alone fences, makes things ugly for no real benefit - refactored the code a bit, hopefully it's more clear.	2019-07-09 16:12:15 -07:00
Mathias Agopian	76027cab85	fix a JobSystem bug in waitAndRelease(). We were not checking for jobs to execute during waitAndRelease(), so this thread would essentially not participate in the work pool.	2019-07-01 14:25:26 -07:00
Romain Guy	826d52bca2	Fix Windows build and warnings	2019-06-27 09:18:31 -07:00
Mathias Agopian	921c2bcd61	mitigate overhead of jobsystem. For jobs that do very little work, the jobsystem can introduce a lot of overhead. We mitigate this by: - don't wake up worker threads when scheduling several very small jobs, like when scheduling the per-face jobs. - don't wait for per-face jobs to finish -- we only did that to avoid a copy of the job's data. - don't use multi-threading at all if the job has too little work. We evaluate the work using the scanline length and number of samples.	2019-06-26 16:05:30 -07:00
Mathias Agopian	ae6ab66af4	fix #755: a race condition causing a deadlock. This reverts a JobSystem optimization that attempted to avoid signaling a condition when there were no waiters. Unfortunately, there was a race that caused the signaling thread to miss that the waiter flag was set, thus not signaling.	2019-01-29 15:40:25 -08:00
Mathias Agopian	cfb9c03226	Improve JobSystem, especially under contention - only signal waitAndRelease() when the corresponding job finishes and only if there is waitAndRelease() active -- instead of signaling every time a job ends. - don't surrender time slice when attempting to steal a job and it fails as long as some queue has jobs. - check that we have to wait, because taking the lock - add a benchmark This change more than doubles the amount of jobs we can handle per second (~965,000 jobs/s on Pixel3)	2018-12-14 16:01:17 -08:00
Mathias Agopian	d6de2bf426	get rid of JobSystem::reset() It was only used to clear the master job, instead the master job is cleared when waited on.	2018-12-14 14:48:33 -08:00
Mathias Agopian	e38870ebf5	Improve JobSystem::wait. JobSystem::waitAndRelease used to spin to wait for the job to finish. Usually this wasn't a problem because the spinning thread was able to handle other jobs. However, in cases where no job was available it would actually spin and burn CPU cycles. We now use a (separate) condition variable to handle that case.	2018-12-13 10:50:52 -08:00
Pixelflinger	e624dff037	set cpu affinity of JobSystem's threads this is to try to prevent threads from bouncing between cores	2018-12-11 13:46:05 -08:00
Mathias Agopian	f9cc118bdb	Always wake up a job queue when a job is run. We used to only wake up a job-queue if there were already some jobs running; the idea was that the current thread would handle the new job as soon as it called wait(). However, there is no guarantee that wait() will be called anytime soon. cv.signal() is not very expensive on Android/Linux, as we're using a custom implementation.	2018-12-10 21:33:38 -08:00
Mathias Agopian	bc3aee118a	code size and performance optimizations (#472) * Clean-up EntityManager a bit - use tsl::robin_set instead of std::set (which should have been std::unordered_set anyway). - getListeners() now returns a vector, which avoids traversing a set twice. Turns out that copying the set wasn't as efficient as I thought. * Improve jobsystem a bit: We recently added a job reference counting mechanism, but we were a bit too aggressive about taking/releasing references. Also make the API more complete by adding explicit retain/release, which is needed to allow several threads to wait on the same job. Also improve futex code by inlining it.	2018-11-12 14:11:31 -08:00
Mathias Agopian	68c6afa72e	Fix reuse after free and API inconsistency in job system (#446) * Fix a reuse after free in the job system: Jobs were destroyed and recycled while still in use by wait() or run(). To fix this we introduce reference-counting of jobs. Jobs start with a ref-count of 1, which is decremented when a job naturally finishes. Additionally, all user-facing methods acquire a reference for the duration of the call. * Fix an API inconsistency with JobSystem: JobSystem's API lets the user create jobs but not destroy them. Jobs are destroyed automatically, without a way for the caller to know when that happens. We now explicitly enforce that jobs are no longer valid when wait() returns. Multiple concurrent wait() calls are allowed, however. This is enforced by clearing the job pointer upon returning from JobSystem::wait(Job* job). * Rename linked-list put/get to push/pop * Better fix for Job use after free: There was still a race condition where a run()'ed job could be destroyed before wait() was called; wait would then use a destroyed object. The available APIs now are: run() - runs and destroys a job; runAndWait() - runs, then waits for and destroys a job; runAndRetain() - runs and keeps a reference to the job; wait() - waits for and destroys a job. wait() can only be used with a job obtained with runAndRetain(). * Get rid of unused code: This version of parallel_for has use-after-free issues anyway, since we changed the semantics of run/wait/etc... * Fix decRef() memory order: decRef() must ensure that all accesses to the object have happened before destroying it. * Fix memory order in atomic linked list's pop(): It needs acquire semantics, since we want to make sure that no reads/writes are reordered before the pop() -- which returns an object to the caller. * Fix memory order on runningJobCount: we needed acquire semantics when about to destroy the last job -- it's similar to decRef. * Comment usages of std::memory_order_* * Fix AtomicFreeList A-B-A bug: Turns out AtomicFreeList was not immune to the ABA bug. We're fixing it here by using a 64-bit CAS, which is available on aarch64 and armv7.	2018-11-05 19:12:13 -08:00
prideout	3a7d80f29b	Restore "Add single-threaded config to Filament. (#130)" This reverts commit `24022010f9`.	2018-08-27 08:51:46 -07:00
Philip Rideout	24022010f9	Revert "Add single-threaded config to Filament. (#130)" (#152) This reverts commit `e3457aa0a7`.	2018-08-26 18:40:49 -07:00
Philip Rideout	e3457aa0a7	Add single-threaded config to Filament. (#130) * Add single-threaded config to Filament. This adds a tick method to Engine and disables a couple components in Renderer (FrameSkipper and FrameInfoManager). This will make it easier to support WebGL, and will allow us to remove some of the command buffer debugging stuff that we added for Vulkan. * tick => execute, and other review feedback * Restore the ASSERT for FFence::wait.	2018-08-24 16:37:12 -07:00
Mathias Agopian	d9ba3998d2	JobSystem now automatically frees Jobs (#91) * JobSystem now automatically frees Jobs: Until now Job allocation used a linear allocator strategy, which required “resetting” the JobSystem periodically -- typically once per frame in filament. This is no longer required. We use a pool allocator now, which doesn’t add much overhead. It does use a spin-lock for thread-safety though; since we assume very little contention, this shouldn’t be a problem. * Thread Safe Object Pool Allocator: A lock-less, thread-safe object pool allocator, now used for storing JobSystem’s job allocations. This gets rid of the spin-lock introduced in the previous CL.	2018-08-13 22:25:15 -07:00
Mathias Agopian	94c3623c1c	clean-up formatting Change-Id: I2071e53cceb93cbe02d6bdfa238aa6ce770b0534	2018-08-06 18:14:11 -07:00
Tact Yoshida	ad49986245	Remove execute permissions	2018-08-06 10:36:54 -07:00
Romain Guy	b3d758f3b3	Initial commit	2018-08-03 10:38:22 -07:00

31 Commits