Spinlocks have caused issues, and over time we have reduced their use.
They were only used in a few places, and we still have evidence that
they are causing ANRs.
We now use utils::Mutex instead, which is a low-overhead mutex
implementation on Linux systems.
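For context, one common way to build a low-overhead mutex on Linux is a futex. The uncontended path is a single atomic compare-and-swap, and contended threads sleep in the kernel instead of spinning, which is what makes it safer than a spinlock when a lock holder gets descheduled. The sketch below assumes that approach (the three-state design described in Ulrich Drepper's "Futexes Are Tricky"). FutexMutex and countWithThreads are hypothetical names, and this is not the source of utils::Mutex.

```cpp
// Minimal futex-based mutex sketch (Linux only). Illustrative, not utils::Mutex.
// States: 0 = unlocked, 1 = locked with no waiters, 2 = locked, waiters possible.
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

class FutexMutex {  // hypothetical name
public:
    void lock() {
        int c = 0;
        // Fast path: uncontended acquire, no system call.
        if (state_.compare_exchange_strong(c, 1, std::memory_order_acquire)) {
            return;
        }
        // Contended: mark the lock as "may have waiters" and sleep in the
        // kernel instead of spinning, until we observe it unlocked (0).
        if (c != 2) {
            c = state_.exchange(2, std::memory_order_acquire);
        }
        while (c != 0) {
            futexWait(2);
            c = state_.exchange(2, std::memory_order_acquire);
        }
    }

    void unlock() {
        int prev = state_.exchange(0, std::memory_order_release);
        assert(prev != 0 && "unlock() of an unlocked mutex");
        // Only pay for the wake-up syscall if someone may be sleeping.
        if (prev == 2) {
            futexWake(1);
        }
    }

private:
    int* addr() { return reinterpret_cast<int*>(&state_); }

    void futexWait(int expected) {
        // Sleeps only while the word still equals `expected`; early returns are fine.
        syscall(SYS_futex, addr(), FUTEX_WAIT_PRIVATE, expected, nullptr, nullptr, 0);
    }

    void futexWake(int count) {
        syscall(SYS_futex, addr(), FUTEX_WAKE_PRIVATE, count, nullptr, nullptr, 0);
    }

    std::atomic<int> state_{0};
};

// Increments a shared counter from several threads under the mutex.
long countWithThreads(int threads, int perThread) {
    FutexMutex m;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < perThread; ++i) {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

For example, countWithThreads(4, 100000) returns 400000 because every increment happens under the lock.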
FIXES=[321101014]
A side effect of this flag was that we always had to use notify_all
when starting a job, instead of notify_one. We found experimentally
that more calls to notify_one are cheaper than fewer calls to
notify_all.
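The notify_one pattern can be sketched as a queue in which every pushed job wakes at most one worker. JobQueue and runJobs are hypothetical, and the flag mentioned above is not modeled. notify_one is sufficient here because all workers wait on the same predicate and each push adds exactly one job; shutdown still uses notify_all because every worker has to exit.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class JobQueue {  // hypothetical, not the real JobSystem
public:
    explicit JobQueue(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    ~JobQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            exit_ = true;
        }
        cv_.notify_all();  // shutdown: every worker must wake up once
        for (auto& t : threads_) t.join();
    }

    void push(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();  // one new job: wake at most one worker
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return exit_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // exit_ is set and nothing is left
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    std::vector<std::thread> threads_;
    bool exit_ = false;
};

// Pushes n jobs; the destructor drains the queue before joining the workers.
int runJobs(int n) {
    std::atomic<int> done{0};
    {
        JobQueue q(4);
        for (int i = 0; i < n; ++i) {
            q.push([&done] { done.fetch_add(1); });
        }
    }
    return done.load();
}
```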
- parallel_for no longer uses recursion to create the "leaf"
jobs; they are now created linearly on N threads (one thread per CPU).
This uses less stack space and reduces mispredicted branches.
- remove almost all SYSTRACE calls because they have a huge impact
on things like parallel_for() and are misleading. They can be
enabled again by setting HEAVY_SYSTRACE to true.
- only signal waitAndRelease() when the corresponding job finishes, and
only if a waitAndRelease() call is active, instead of signaling
every time a job ends.
- don't surrender the time slice when an attempt to steal a job fails,
as long as some queue still has jobs.
- check whether we actually have to wait before taking the lock
- add a benchmark
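A minimal sketch of splitting a parallel_for range linearly, with no recursion: one loop computes N contiguous chunks and hands each to its own thread. parallelForLinear and sumOfIndices are hypothetical, and plain std::threads stand in for the job system's jobs. N would typically be std::thread::hardware_concurrency(), which may return 0, hence the clamp to 1.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs fn(begin, end) over [0, count) by cutting the range into `parts`
// contiguous chunks with a simple loop, one thread per chunk.
void parallelForLinear(std::size_t count, unsigned parts,
                       const std::function<void(std::size_t, std::size_t)>& fn) {
    parts = std::max(1u, parts);
    std::vector<std::thread> threads;
    std::size_t begin = 0;
    for (unsigned p = 0; p < parts; ++p) {
        // The first (count % parts) chunks take one extra element each.
        std::size_t size = count / parts + (p < count % parts ? 1 : 0);
        if (size == 0) break;  // all remaining chunks would be empty too
        threads.emplace_back(fn, begin, begin + size);
        begin += size;
    }
    for (auto& t : threads) t.join();
}

// Sums 0 + 1 + ... + (n - 1) using per-chunk partial sums.
long long sumOfIndices(std::size_t n, unsigned parts) {
    std::atomic<long long> total{0};
    parallelForLinear(n, parts, [&total](std::size_t b, std::size_t e) {
        long long local = 0;
        for (std::size_t i = b; i < e; ++i) local += static_cast<long long>(i);
        total.fetch_add(local);
    });
    return total.load();
}
```

For example, sumOfIndices(1000, 4) splits the range into four chunks of 250 and returns 499500. Compared with recursive halving, the split depth is constant, so no stack grows with N.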
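One way to keep heavy tracing out of hot paths by default, while still allowing it to be turned back on, is to gate the trace macros on a compile-time constant. The macro names and the trace backend below are hypothetical stand-ins, not the real SYSTRACE API; only the HEAVY_SYSTRACE name comes from this change, and treating it as a preprocessor constant is an assumption.

```cpp
#include <cassert>

// Assumption: HEAVY_SYSTRACE is a compile-time switch that defaults to false.
#ifndef HEAVY_SYSTRACE
#define HEAVY_SYSTRACE false
#endif

// Hypothetical trace backend: counts events and checks begin/end nesting.
static int g_traceEvents = 0;
static int g_traceDepth = 0;

inline void traceBegin(const char* /*name*/) {
    ++g_traceEvents;
    ++g_traceDepth;
}

inline void traceEnd() {
    assert(g_traceDepth > 0 && "traceEnd() without traceBegin()");
    --g_traceDepth;
}

// With HEAVY_SYSTRACE == false the condition is a constant, so the calls never
// run and an optimizing compiler removes them from loops like parallel_for().
#define HEAVY_TRACE_BEGIN(name) do { if (HEAVY_SYSTRACE) traceBegin(name); } while (0)
#define HEAVY_TRACE_END()       do { if (HEAVY_SYSTRACE) traceEnd(); } while (0)

// Runs a hot loop and reports how many trace events were recorded.
int runHotLoop(int iterations) {
    g_traceEvents = 0;
    long sum = 0;
    for (int i = 0; i < iterations; ++i) {
        HEAVY_TRACE_BEGIN("job");
        sum += i;
        HEAVY_TRACE_END();
    }
    (void)sum;
    return g_traceEvents;
}
```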
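The waitAndRelease() signaling rule and the check-before-locking fast path can be sketched together for a single job. JobCompletion, finish(), wait(), and the helper functions are hypothetical. A finishing job only takes the lock and notifies when a waiter has registered, and a waiter skips the lock entirely when the job is already done. The default sequentially consistent atomics matter here: they guarantee that either the finisher sees the waiter count or the waiter sees done_, so no wakeup is lost.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class JobCompletion {  // hypothetical
public:
    // Worker side: mark the job done and signal only if someone is waiting.
    void finish() {
        done_.store(true);             // seq_cst
        if (waiters_.load() > 0) {     // seq_cst: pairs with fetch_add in wait()
            std::lock_guard<std::mutex> lock(mutex_);
            notifications_.fetch_add(1);
            cv_.notify_all();
        }
    }

    // waitAndRelease()-style side: returns once the job is done.
    void wait() {
        if (done_.load()) {
            return;                    // fast path: no lock, no sleep
        }
        std::unique_lock<std::mutex> lock(mutex_);
        waiters_.fetch_add(1);         // register before re-checking done_
        cv_.wait(lock, [this] { return done_.load(); });
        waiters_.fetch_sub(1);
    }

    bool isDone() const { return done_.load(); }
    int notifications() const { return notifications_.load(); }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::atomic<bool> done_{false};
    std::atomic<int> waiters_{0};
    std::atomic<int> notifications_{0};
};

// Nobody waits while the job finishes, so no notification is sent.
int notificationsWithoutWaiter() {
    JobCompletion c;
    c.finish();
    c.wait();  // returns immediately through the fast path
    return c.notifications();
}

// A waiter and a finishing worker race; wait() must still return.
bool waitReturnsAfterFinish() {
    JobCompletion c;
    std::thread worker([&c] { c.finish(); });
    c.wait();
    worker.join();
    return c.isDone();
}
```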
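The stealing rule can be sketched as a loop that keeps retrying while a shared counter says jobs are still queued somewhere, and only gives up (so the caller can sleep or yield) once every queue is empty. WorkQueue, stealOrGiveUp, drainAll, and the pending counter are hypothetical; a real job system would more likely use lock-free work-stealing deques than a mutex per queue.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical per-worker queue protected by a mutex (kept simple on purpose).
struct WorkQueue {
    std::mutex m;
    std::deque<int> jobs;

    std::optional<int> tryPop() {
        std::lock_guard<std::mutex> lock(m);
        if (jobs.empty()) return std::nullopt;
        int job = jobs.front();
        jobs.pop_front();
        return job;
    }
};

// Tries every queue, starting after `self`. `pending` counts jobs still sitting
// in any queue; producers increment it before pushing so it never under-counts.
std::optional<int> stealOrGiveUp(std::vector<WorkQueue>& queues,
                                 std::atomic<int>& pending, std::size_t self) {
    const std::size_t n = queues.size();
    for (;;) {
        for (std::size_t i = 1; i <= n; ++i) {
            if (auto job = queues[(self + i) % n].tryPop()) {
                pending.fetch_sub(1);
                return job;
            }
        }
        // Every attempt failed. If no queue has jobs, let the caller sleep.
        if (pending.load() == 0) return std::nullopt;
        // Some queue still has jobs (we lost a race): retry right away
        // instead of calling std::this_thread::yield().
    }
}

// Fills three queues and drains all of them from worker 0.
int drainAll() {
    std::vector<WorkQueue> queues(3);
    std::atomic<int> pending{0};
    for (int q = 0; q < 3; ++q) {
        for (int j = 0; j < 5; ++j) {
            pending.fetch_add(1);
            queues[q].jobs.push_back(q * 10 + j);
        }
    }
    int taken = 0;
    while (stealOrGiveUp(queues, pending, 0)) {
        ++taken;
    }
    return taken;
}
```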
This change more than doubles the number of jobs we can handle per
second (~965,000 jobs/s on Pixel 3).
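As a rough reading of these numbers (derived only from the figures quoted above; the previous throughput is not stated here), the average wall-clock cost per job across the whole system, and the bound implied by "more than doubles", are as follows. This is an amortized throughput figure, not the latency of an individual job.

```latex
\frac{1\,\text{s}}{965{,}000\ \text{jobs}} \approx 1.04\,\mu\text{s per job},
\qquad
\text{previous rate} < \frac{965{,}000}{2} = 482{,}500\ \text{jobs/s}
\;\Rightarrow\;
\text{previous cost} > \frac{1\,\text{s}}{482{,}500\ \text{jobs}} \approx 2.07\,\mu\text{s per job}.
```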