47 Commits

Author SHA1 Message Date
Mathias Agopian
bef849b0e3 large code cleanup (#8364)
* Remove redundant qualifiers in filament public headers

* remove redundant qualifiers in filament implementation

* remove redundant qualifiers in libutils public headers

* remove redundant qualifier for libutils implementation

* remove redundant qualifiers for libmath

* use is_same_v<> instead of is_same<>

* bring back Builder::name()

we keep Builder::name() on all objects, and forward to the MixIn class
that does the implementation, so that we have correct documentation, and
better IDE completion.

* add missing const parameters in filament's implementation

* various source cleanup

- missing includes
- missing const
- C cast style
- superfluous inline keyword
2025-01-17 17:50:16 -08:00
Ben Doherty
e8f3fc5a46 Implement setThreadPriority for Apple devices (#8200) 2024-10-16 12:55:25 -04:00
Mathias Agopian
be4f287b07 More improvements to the JobSystem (#7988)
* improve parallel_for a bit

We get about 40% performance increase. The gain comes from not having
to copy the JobData structure each time we create a job, by using a
new emplaceJob() method, we can create the structure directly into
its destination.

* avoid calling wakeAll() when possible

wakeAll() is very expensive and not always needed when a job finishes
because there may not be anyone waiting on that job.

We now maintain a waiter count per job, and use that to determine if
we need to notify or not.

And now that the JobSystem overhead is lower, we can decrease the size
of the jobs, which improves the load balancing.

* mActiveJobs fixes

some comments claimed mActiveJobs needed to be modified before or after
accessing the WorkQueue; this couldn't be correct because there was no
guaranteed global ordering with the workQueue.
2024-07-24 15:34:36 -07:00
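The waiter-count idea from be4f287b07 can be illustrated with a minimal sketch. This is not Filament's code: `WaitableJob` and its members are hypothetical names, and the sketch assumes one mutex and condition variable per job, with the waiter count read under the lock, which may differ from what JobSystem actually does.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical job that can be waited on; not Filament's Job type.
struct WaitableJob {
    std::mutex lock;
    std::condition_variable cv;
    int waiters = 0;    // number of threads inside wait(), guarded by lock
    bool done = false;  // guarded by lock

    // Blocks until finish() has been called.
    void wait() {
        std::unique_lock<std::mutex> guard(lock);
        ++waiters;
        cv.wait(guard, [this] { return done; });
        --waiters;
    }

    // Marks the job finished. Returns true if a notification was sent.
    bool finish() {
        bool notify;
        {
            std::lock_guard<std::mutex> guard(lock);
            done = true;
            // Decide under the lock, so a thread that registers as a waiter
            // later will see done == true and never block.
            notify = waiters > 0;
        }
        if (notify) {
            cv.notify_all();  // only pay for the wake-up when someone waits
        }
        return notify;
    }
};
```

When nobody is waiting, `finish()` skips the notification entirely, which is the saving the commit message describes.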
Mathias Agopian
a60fe41681 performance improvements to JobSystem
- reduce the number of calls to notify_one() and notify_all().
  notify_one() is now only called when running a new job, and
  notify_all() only when a job finishes.

- don't hold the condition lock while calling notify_*(), as it is not
  strictly needed, and because notify_*() can be very slow, there can
  be a lot of contention on this lock as a result; blocking the whole
  jobsystem thread pool.

- add a new version of run() that takes an opaque thread id that can
  be retrieved from a job's execute function; this is especially
  intended to be used by parallel_for(); it's just a more efficient
  version of run() that avoids a hashmap lookup.


Overall these changes yield a significant performance boost:
- running + waiting a job: +200%
- running many jobs: +150%
- running many jobs in parallel: +50%
2024-07-22 11:12:51 -07:00
Ben Doherty
cf91e42847 Switch ASSERT macros to new stream API (#7881) 2024-05-24 20:46:34 +00:00
Mathias Agopian
8af2d7512d cleanup and small improvements to JobSystem
- add missing includes
- wake-up one or several threads based on the number of jobs available
2024-04-16 12:11:15 -07:00
Mathias Agopian
785592293c don't use thread affinity for the thread-pool
using thread affinity naively on big.little architectures is very flaky,
for now it's better to simplify and not use it at all, let the kernel
figure things out.

BUGS=[333582569]
2024-04-16 12:11:15 -07:00
Mathias Agopian
6ee20a57aa remove all uses of our custom spinlock
This has caused issues and over time we have reduced the use of
spinlocks; it was only used in a few places and we still have evidence
that it's causing ANRs.

We use utils::Mutex instead which is a low overhead mutex implementation
on Linux systems.

FIXES=[321101014]
2024-01-19 12:05:07 -08:00
Mathias Agopian
f537f62adf Use 4 background threads for shader compiler on PowerVR
Since powervr supports parallel shader compilation well, we use 
4 background threads for shader compilation.
2023-08-11 15:22:19 -07:00
Balazs Vegh
6a28d2b81c Implement setThreadName() on Windows (#5999) 2022-08-26 13:29:42 -04:00
Mathias Agopian
19b0ad2605 fix a race in jobsystem (2nd attempt)
We were decrementing activeJobCount after removing the job from the
queue, which could cause other threads in the pool to preempt us before
the decrement, causing them to spin forever trying to get a nonexistent
job, until the decrement actually happened.

Now we always decrement first and fix-up the count if we couldn't get
a job from the queues. The race is inverted, and doesn't cause threads
to spin a long time.


fixes b/201100123
2022-03-14 16:40:02 -07:00
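The "decrement first, then fix up" ordering described in 19b0ad2605 can be sketched as follows. This is an illustration only: `CountedQueue` is a hypothetical mutex-protected queue of integers, not Filament's work-stealing queue, and `activeJobs` merely stands in for the counter named in the message.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a job queue plus an "active jobs" counter.
struct CountedQueue {
    std::mutex lock;
    std::deque<int> jobs;            // guarded by lock
    std::atomic<int> activeJobs{0};  // never exceeds the number of queued jobs

    void push(int job) {
        {
            std::lock_guard<std::mutex> guard(lock);
            jobs.push_back(job);
        }
        activeJobs.fetch_add(1);
    }

    // Returns true and writes the job to `out` if one was taken.
    bool pop(int& out) {
        // Decrement first, so the counter never advertises a job that this
        // thread has already removed from the queue.
        activeJobs.fetch_sub(1);
        bool found = false;
        {
            std::lock_guard<std::mutex> guard(lock);
            if (!jobs.empty()) {
                out = jobs.front();
                jobs.pop_front();
                found = true;
            }
        }
        if (!found) {
            // Fix-up: nothing was taken, so give the count back.
            activeJobs.fetch_add(1);
        }
        return found;
    }
};
```

With this ordering the counter can briefly under-report the available work, but it does not over-report it, so a thread is not left spinning on a job that is already gone.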
Romain Guy
dd4853bcc5 Replace ANDROID with __ANDROID__ (#4909)
__ANDROID__ is always set by the toolchain and less likely to cause
conflicts than ANDROID. This change also removes the -DANDROID flag
we set ourselves in our toolchain CMake files since we don't need
it anymore.
2021-11-30 16:27:56 -08:00
Benjamin Doherty
1f403fdae0 Revert "fix a race in jobsystem"
This reverts commit 2feb0ad325.
2021-09-24 11:33:14 -07:00
Mathias Agopian
2feb0ad325 fix a race in jobsystem
We were decrementing activeJobCount after removing the job from the
queue, which could cause other threads in the pool to preempt us before
the decrement, causing them to spin forever trying to get a nonexistent
job, until the decrement actually happened.

Now we always decrement first and fix-up the count if we couldn't get
a job from the queues. The race is inverted, and doesn't cause threads
to spin a long time.
2021-09-14 16:05:30 -07:00
Ben Doherty
4e33e9c3d1 Attempt to fix TSAN failure in ColorGrading.cpp (#4447) 2021-08-05 11:53:35 -07:00
Mathias Agopian
3eceaf20a2 JobSystem: set thread pool count to ncores - 2
We need to reserve at least a core for the user and another one for the
backend thread.
2021-06-08 09:47:16 -07:00
Mathias Agopian
bbc4f4f21d JobSystem: remove the DONT_SIGNAL flag
A side effect of this flag was that we had to always use notify_all
when starting a job, instead of notify_one -- in practice, we found
experimentally that more calls to notify_one are cheaper than fewer
calls to notify_all.
2021-06-08 09:47:16 -07:00
Mathias Agopian
c5ef1d6252 JobSystem: signal condition with lock held
This seems to improve parallelism on multicores and at least on
Android, doesn't seem to impact performance.
2021-06-08 09:47:16 -07:00
Mathias Agopian
4d73583f86 Better systrace logging and small improvements to JobSystem
Tweak how we signal/wakeup threads so we get better parallelism 
on Android.
2021-06-02 16:40:40 -07:00
Mathias Agopian
e8b16d600e Fix a hang in JobSystem
The hang was caused by a subtle race. When a job is completed, its 
thread must signal all the threads that might be waiting on this job.
The signaling code was attempting to signal only the minimum number
of threads -- this was important especially in the case where no threads
were waiting, then the call to notify() could be avoided.

Unfortunately, for performance reasons we're not calling notify() with
the condition lock held; this meant that between the time the number of
waiting threads was latched and the time of the notify() call, more
threads could enter their condition variable wait(), and it would
then be possible for these threads to wake up, instead of the thread
we were trying to wake up (the one waiting on the job).

It would then get stuck forever.

This bug was introduced in 2df639133b


Also add some debugging code for this kind of failure (disabled)
2021-03-17 13:44:30 -07:00
Philip Rideout
b82dca4fac JobSystem: work around hang on 2-CPU machines
If hardware_concurrency() returned 2 while UTILS_HAS_HYPER_THREADING was
enabled, `mThreadCount` was resulting in zero, so jobs (such as texture
decoding) would simply never start. This problem was noticed with
GitHub Actions.

I tested this fix locally by replacing `hardware_concurrency()` with
fixed values like 1 or 2, and verified that the hang went away.
2020-10-09 16:14:50 -07:00
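Several entries in this log (b82dca4fac above, 9bcf10cf44, 3eceaf20a2) concern the thread-pool size ending up at zero or at a huge value on machines with few cores. A minimal sketch of a clamped computation is shown below; `poolThreadCount` and its `reserved` parameter are hypothetical and do not reproduce Filament's actual formula.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of worker threads for a pool, given the value
// reported by std::thread::hardware_concurrency() (which may be 0 when the
// count is unknown) and the number of cores to leave for other threads.
inline unsigned poolThreadCount(unsigned hardwareThreads, unsigned reserved) {
    // Signed arithmetic, so that "cores - reserved" cannot wrap around to a
    // very large unsigned value on machines with few cores.
    long long const available =
            static_cast<long long>(hardwareThreads) - static_cast<long long>(reserved);
    // Always keep at least one worker, otherwise queued jobs never start.
    return static_cast<unsigned>(std::max(1LL, available));
}
```

For example, `poolThreadCount(std::thread::hardware_concurrency(), 2)` would reserve two cores while still guaranteeing one worker on 1- and 2-core machines.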
Pixelflinger
ec9fd58fc0 use more inclusive language 2020-07-22 15:20:54 -07:00
Pixelflinger
fb0e89d969 fix a race-condition in JobSystem
mThreadMap needs to be protected as it's accessed from multiple threads.
2020-04-03 17:25:17 -07:00
Philip Rideout
c79f5a1a29 JobSystem: replace TLS with tid mapping. 2020-04-02 14:53:20 -07:00
Pixelflinger
9bcf10cf44 fix thread count initialization
if hw thread count was 1, we'd end-up with 32 threads
2019-07-22 13:01:53 -07:00
Pixelflinger
261dafa924 Fix a possible infinite loop
In the case where we have 2 cores, we would spawn only one thread in
the thread pool. If that thread got to try to steal() from another
thread before the main thread was adopted, it would end-up always
trying to steal from itself and enter an infinite loop.

This seems to happen during windows builds.
2019-07-19 10:56:41 -07:00
Mathias Agopian
665703dbd8 parallel_for now creates jobs in reverse order
this is because the JobSystem's queue works as a LIFO, by creating
jobs in reverse (memory) order, we attempt to help streaming to
the d-cache on that thread -- until the point where
jobs are stolen. 

we also execute the last job immediately instead of creating a job
for it -- since we're already in a job.
2019-07-09 16:12:15 -07:00
Pixelflinger
8170ca7cd1 improve JobSystem::parallel_for + minor optimizations
- parallel_for doesn't use recursion anymore to create the "leaf"
jobs, this is now done linearly on N threads (one thread per CPU).
This uses less stack space, and reduces mispredicted branches.

- remove almost all SYSTRACE calls because they have a huge impact
on things like parallel_for() and are misleading. They can be
enabled again by setting HEAVY_SYSTRACE to true.
2019-07-09 16:12:15 -07:00
Mathias Agopian
2df639133b improvements to JobSystem
- we simplify the waiting code by using only a single
condition variable instead of two.

- wait() now behaves just like a looper, it will process jobs until
the one it's waiting for finishes -- before it could just sit there
(the idea was that the job would finish quickly, but that's not always
the case).

- we also make sure to never call notify_n() when it's not needed.
We track how many waiters we have and use that to decide if we need
to notify().

notify is pretty slow on all architectures, even on linux it's always
a syscall, so it's better to avoid it.

- don't use stand-alone fences, makes things ugly for no real benefit

- refactored the code a bit, hopefully it's more clear.
2019-07-09 16:12:15 -07:00
Mathias Agopian
76027cab85 fix a JobSystem bug in waitAndRelease()
We were not checking for jobs to execute during waitAndRelease(), so
this thread would essentially not participate in the work pool.
2019-07-01 14:25:26 -07:00
Romain Guy
826d52bca2 Fix Windows build and warnings 2019-06-27 09:18:31 -07:00
Mathias Agopian
921c2bcd61 mitigate overhead of jobsystem
For jobs that do very little work, the jobsystem can introduce
a lot of overhead, we mitigate this by:

- don't wake-up worker threads when scheduling several very small jobs,
like when scheduling the per-face jobs. 

- don't wait for per-face jobs to finish -- we only did that to avoid
a copy of the job's data.

- don't use multi-threading at all if the job has too little work. We
evaluate the work using the scanline length and number of samples.
2019-06-26 16:05:30 -07:00
Mathias Agopian
ae6ab66af4 fix #755: a race condition causing a deadlock
This reverts a JobSystem optimization that attempted to avoid signaling
a condition when there were no waiters. Unfortunately, there was a
race that caused the signaling thread to miss that the waiter flag
was set, thus not signaling.
2019-01-29 15:40:25 -08:00
Mathias Agopian
cfb9c03226 Improve JobSystem, especially under contention
- only signal waitAndRelease() when the corresponding job finishes and
only if there is waitAndRelease() active -- instead of signaling 
every time a job ends.

- don't surrender time slice when attempting to steal a job and it fails
as long as some queue has jobs.

- check that we have to wait, because taking the lock

- add a benchmark

This change more than doubles the number of jobs we can handle per
second (~965,000 jobs/s on Pixel3)
2018-12-14 16:01:17 -08:00
Mathias Agopian
d6de2bf426 get rid of JobSystem::reset()
It was only used to clear the master job, instead the master job is
cleared when waited on.
2018-12-14 14:48:33 -08:00
Mathias Agopian
e38870ebf5 Improve JobSystem::wait
JobSystem::waitAndRelease used to spin to wait for the job to finish,
usually this wasn't a problem because the spinning thread was
able to handle other jobs. However, in cases where no job was
available it would actually spin and burn CPU cycles.

we now use a (separate) condition variable to handle that case.
2018-12-13 10:50:52 -08:00
Pixelflinger
e624dff037 set cpu affinity of JobSystem's threads
this is to try to prevent threads from bouncing
between cores
2018-12-11 13:46:05 -08:00
Mathias Agopian
f9cc118bdb Always wake up a job queue when a job is run
We used to only wake up a job-queue if there were already some jobs
running, the idea was that the current thread would handle the new job
as soon as calling wait(). However, there is no guarantee that wait() 
will be called anytime soon.

cv.signal() is not very expensive on Android/Linux, as we're using
a custom implementation.
2018-12-10 21:33:38 -08:00
Mathias Agopian
bc3aee118a code size and performance optimizations (#472)
* Clean-up EntityManager a bit

- use tsl::robin_set instead of std::set (which should have been
  std::unordered_set anyways).

- getListeners() now returns a vector, which avoids traversing a set twice.
  Turns out that copying the set wasn't as efficient as I thought.

* Improve jobsystem a bit

We recently added a job reference counting mechanism, but we were a bit
too aggressive about taking/releasing references.

Also make the API more complete by adding explicit retain/release,
which is needed to allow several threads to wait on the same job.

Also improve futex code by inlining it.
2018-11-12 14:11:31 -08:00
Mathias Agopian
68c6afa72e Fix reuse after free and API inconsistency in job system (#446)
* Fix a reuse after free in the job system

Jobs were destroyed and recycled while still in use
by wait() or run(). To fix this we introduce reference-counting of
jobs.

Jobs start with a ref-count of 1, which is decremented when a job
naturally finishes. Additionally, all user-facing methods acquire
a reference for the duration of the call.

* Fix an API inconsistency with JobSystem

JobSystem's API lets the user create jobs but not destroy them.
Jobs are destroyed automatically, without a way for the caller to
know when that happens.

We now explicitly enforce that jobs are no longer valid when
wait() returns. Multiple concurrent wait() are allowed however.

This is enforced by clearing the job pointer upon returning
from JobSystem::wait(Job* job).

* Rename linked-list put/get to push/pop

* Better fix for Job use after free

There was still a race condition where a run()'ed
job could be destroyed before wait() was called,
wait would then use a destroyed object.

The available APIs now are:

run() - runs and destroys a job
runAndWait() - run, then waits for and destroys a job
runAndRetain() - runs and keeps a reference to the job
wait() - waits and destroys a job

wait() can only be used with a job obtained with runAndRetain().

* Get rid of unused code

This version of parallel_for has use-after-free issues anyways,
since we changed the semantics of run/wait/etc...

* Fix decRef() memory order

decRef() must ensure that all accesses to the
object have happened before destroying it.

* Fix memory order in atomic linked list's pop()

It needs acquire semantics, since we want to make
sure that no reads/writes are reordered before the
pop() -- which returns an object to the caller.

* Fix memory order on runningJobCount

we needed acquire semantics when about to destroy
the last job -- it's similar to decRef.

* Comment usages of std::memory_order_*

* Fix AtomicFreeList A-B-A bug

Turns out AtomicFreeList was not immune to the ABA bug. We're fixing
it here by using a 64-bit CAS, which is available on aarch64 and armv7.
2018-11-05 19:12:13 -08:00
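The reference-counting scheme and the decRef() memory-order requirement described in 68c6afa72e follow a common pattern, sketched below. `RefCounted` is a hypothetical type, not Filament's Job; the only details taken from the message are that the count starts at 1 and that all accesses to the object must have happened before it is destroyed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted object; the count starts at 1.
struct RefCounted {
    std::atomic<int> refs{1};

    void incRef() {
        // Taking another reference needs no ordering: the caller already
        // holds one, so the object cannot be destroyed concurrently.
        refs.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true when the caller dropped the last reference and is
    // therefore the one allowed to destroy or recycle the object.
    bool decRef() {
        // Release publishes this thread's accesses to the object; acquire
        // makes every other thread's accesses visible to whichever thread
        // ends up destroying it.
        int const previous = refs.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0);
        return previous == 1;
    }
};
```

A caller would destroy or recycle the object only when `decRef()` returns true.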
prideout
3a7d80f29b Restore "Add single-threaded config to Filament. (#130)"
This reverts commit 24022010f9.
2018-08-27 08:51:46 -07:00
Philip Rideout
24022010f9 Revert "Add single-threaded config to Filament. (#130)" (#152)
This reverts commit e3457aa0a7.
2018-08-26 18:40:49 -07:00
Philip Rideout
e3457aa0a7 Add single-threaded config to Filament. (#130)
* Add single-threaded config to Filament.

This adds a tick method to Engine and disables a couple components
in Renderer (FrameSkipper and FrameInfoManager).

This will make it easier to support WebGL, and will allow us to remove
some of the command buffer debugging stuff that we added for Vulkan.

* tick => execute, and other review feedback

* Restore the ASSERT for FFence::wait.
2018-08-24 16:37:12 -07:00
Mathias Agopian
d9ba3998d2 JobSystem now automatically frees Jobs (#91)
* JobSystem now automatically frees Jobs

Until now Job allocation used a linear allocator
strategy, which required “resetting” the JobSystem
periodically — typically once per frame in
filament.

This is no longer required. We use a pool allocator
now, which doesn’t add much overhead. It does
use a spin-lock for thread-safety though, since
we assume very little contention, this shouldn’t
be a problem.

* Thread Safe Object Pool Allocator

A lock-less, thread-safe object pool allocator,
now used for storing JobSystem’s jobs allocations.
This gets rid of the spin-lock introduced in the
previous cl.
2018-08-13 22:25:15 -07:00
Mathias Agopian
94c3623c1c clean-up formatting
Change-Id: I2071e53cceb93cbe02d6bdfa238aa6ce770b0534
2018-08-06 18:14:11 -07:00
Tact Yoshida
ad49986245 Remove execute permissions 2018-08-06 10:36:54 -07:00
Romain Guy
b3d758f3b3 Initial commit 2018-08-03 10:38:22 -07:00