- only signal waitAndRelease() when the corresponding job finishes, and
only if a waitAndRelease() is active -- instead of signaling
every time a job ends.
- don't surrender the time slice when an attempt to steal a job fails,
as long as some queue still has jobs.
- check that we actually have to wait before taking the lock, because
taking the lock is costly.
- add a benchmark
This change more than doubles the number of jobs we can handle per
second (~965,000 jobs/s on Pixel 3).
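
For illustration only, the conditional signaling and the check before locking described above could be sketched roughly as follows. This is a minimal sketch, not the code in this change: `Job`, `JobWaiter`, `finish()`, `wait()`, and `demo()` are hypothetical names, and the sketch assumes a per-job waiter count guarded by sequentially consistent atomics.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a job; not the real job structure.
struct Job {
    std::atomic<bool> done{false};
    std::atomic<int> waiters{0};   // number of threads currently waiting on this job
};

// Hypothetical helper showing the two ideas:
//  - signal only when the job finishes AND somebody is waiting on it
//  - check whether we need to wait before paying for the lock
class JobWaiter {
public:
    // Called by the worker that completes `job`.
    void finish(Job& job) {
        job.done.store(true);                  // seq_cst store
        if (job.waiters.load() > 0) {          // skip lock + notify when nobody waits
            std::lock_guard<std::mutex> lock(mMutex);
            mCondition.notify_all();
        }
    }

    // Blocks until `job` is done.
    void wait(Job& job) {
        if (job.done.load()) {
            return;                            // fast path: no lock taken
        }
        job.waiters.fetch_add(1);              // announce ourselves before re-checking
        {
            std::unique_lock<std::mutex> lock(mMutex);
            // The predicate re-checks `done` under the lock, so a finish()
            // that ran before we registered is not missed.
            mCondition.wait(lock, [&job] { return job.done.load(); });
        }
        job.waiters.fetch_sub(1);
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondition;
};

// Usage sketch: one thread finishes the job while the caller waits on it.
bool demo() {
    Job job;
    JobWaiter waiter;
    std::thread worker([&] { waiter.finish(job); });
    waiter.wait(job);
    worker.join();
    return job.done.load() && job.waiters.load() == 0;
}
```

The sketch relies on the default sequentially consistent ordering: either the finishing thread observes a non-zero waiter count and notifies, or the waiting thread observes `done` and never sleeps.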