Capture command callstacks for debugging

Adjust NEW_RELEASE_NOTES to reflect cherry-picks
Update FrameCompletedCallback using directive (#7128)
2023-10-30 14:42:24 -07:00 · 2023-08-30 16:30:10 -07:00 · 2023-08-30 16:26:15 -07:00 · 2023-08-30 16:22:37 -07:00 · 2023-08-30 13:34:49 -07:00 · 2023-08-30 08:46:53 -07:00
205 changed files with 5962 additions and 2707 deletions
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -4,6 +4,8 @@ about: Create a report to help us improve

 ---

+⚠️ **Issues not using this template will be systematically closed.**
+
 **Describe the bug**
 A clear and concise description of what the bug is.

@@ -18,8 +20,8 @@ A clear and concise description of what you expected to happen.
 If applicable, add screenshots to help explain your problem.

 **Logs**
-If applicable, copy logs from your console here. Please *do not*
-use screenshots of logs, copy them as text.
+If applicable, copy **full** logs from your console here. Please *do not*
+use screenshots of logs, copy them as text, use gist or attach an *uncompressed* file.

 **Desktop (please complete the following information):**
 - OS: [e.g. iOS]
--- a/BUILDING.md
+++ b/BUILDING.md
@@ -5,7 +5,7 @@
 To build Filament, you must first install the following tools:

 - CMake 3.19 (or more recent)
-- clang 7.0 (or more recent)
+- clang 14.0 (or more recent)
 - [ninja 1.10](https://github.com/ninja-build/ninja/wiki/Pre-built-Ninja-packages) (or more recent)

 Additional dependencies may be required for your operating system. Please refer to the appropriate
@@ -87,10 +87,10 @@ Options can also be set with the CMake GUI.

 Make sure you've installed the following dependencies:

-- `clang-7` or higher
+- `clang-14` or higher
 - `libglu1-mesa-dev`
-- `libc++-7-dev` (`libcxx-devel` and `libcxx-static` on Fedora) or higher
-- `libc++abi-7-dev` (`libcxxabi-static` on Fedora) or higher
+- `libc++-14-dev` (`libcxx-devel` and `libcxx-static` on Fedora) or higher
+- `libc++abi-14-dev` (`libcxxabi-static` on Fedora) or higher
 - `ninja-build`
 - `libxi-dev`
 - `libxcomposite-dev` (`libXcomposite-devel` on Fedora)
@@ -114,7 +114,7 @@ Your Linux distribution might default to `gcc` instead of `clang`, if that's the
 ```
 $ mkdir out/cmake-release
 $ cd out/cmake-release
-# Or use a specific version of clang, for instance /usr/bin/clang-7
+# Or use a specific version of clang, for instance /usr/bin/clang-14
 $ CC=/usr/bin/clang CXX=/usr/bin/clang++ CXXFLAGS=-stdlib=libc++ \
    cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../release/filament ../..
 ```
@@ -124,8 +124,8 @@ solution is to use `update-alternatives` to both change the default compiler, an
 specific version of clang:

 ```
-$ update-alternatives --install /usr/bin/clang clang /usr/bin/clang-7 100
-$ update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-7 100
+$ update-alternatives --install /usr/bin/clang clang /usr/bin/clang-14 100
+$ update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-14 100
 $ update-alternatives --install /usr/bin/cc cc /usr/bin/clang 100
 $ update-alternatives --install /usr/bin/c++ c++ /usr/bin/clang++ 100
 ```
--- a/NEW_RELEASE_NOTES.md
+++ b/NEW_RELEASE_NOTES.md
@@ -7,3 +7,9 @@ for next branch cut* header.
 appropriate header in [RELEASE_NOTES.md](./RELEASE_NOTES.md).

 ## Release notes for next branch cut
+
+- Fix possible NPE when updating fog options from Java/Kotlin
+- The `emissive` property was not applied properly to `MASKED` materials, and could cause
+  dark fringes to appear (recompile materials)
+- Allow glTF materials with transmission/volume extensions to choose their alpha mode
+  instead of forcing `MASKED`
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ repositories {
 }

 dependencies {
-    implementation 'com.google.android.filament:filament-android:1.40.3'
+    implementation 'com.google.android.filament:filament-android:1.42.0'
 }
 ```

@@ -40,6 +40,7 @@ Here are all the libraries available in the group `com.google.android.filament`:
 | Artifact      | Description   |
 | ------------- | ------------- |
 | [![filament-android](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/filament-android/badge.svg?subject=filament-android)](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/filament-android)  | The Filament rendering engine itself. |
+| [![filament-android-debug](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/filament-android-debug/badge.svg?subject=filament-android-debug)](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/filament-android-debug)  | Debug version of `filament-android`. |
 | [![gltfio-android](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/gltfio-android/badge.svg?subject=gltfio-android)](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/gltfio-android) | A glTF 2.0 loader for Filament, depends on `filament-android`. |
 | [![filament-utils-android](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/filament-utils-android/badge.svg?subject=filament-utils-android)](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/filament-utils-android) | KTX loading, Kotlin math, and camera utilities, depends on `gltfio-android`. |
 | [![filamat-android](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/filamat-android/badge.svg?subject=filamat-android)](https://maven-badges.herokuapp.com/maven-central/com.google.android.filament/filamat-android) | A runtime material builder/compiler. This library is large but contains a full shader compiler/validator/optimizer and supports both OpenGL and Vulkan. |
@@ -50,7 +51,7 @@ Here are all the libraries available in the group `com.google.android.filament`:
 iOS projects can use CocoaPods to install the latest release:

 ```
-pod 'Filament', '~> 1.40.3'
+pod 'Filament', '~> 1.42.0'
 ```

 ### Snapshots
--- a/RELEASE_NOTES.md
+++ b/RELEASE_NOTES.md
@@ -7,6 +7,37 @@ A new header is inserted each time a *tag* is created.
 Instead, if you are authoring a PR for the main branch, add your release note to
 [NEW_RELEASE_NOTES.md](./NEW_RELEASE_NOTES.md).

+## v1.42.1
+
+## v1.42.0
+
+- engine: add preliminary support for instanced stereoscopic rendering [⚠️ **Recompile materials**]
+
+## v1.41.0
+
+- backend: fix #6997 : picking can fail on Adreno [⚠️ **New Material Version**]
+- backend: A partial workaround for PowerVR devices (#5118, b/190221124) [⚠️ **Recompile Materials**]
+
+## v1.40.5
+
+- backend: Disable timer queries on all Mali GPUs (fixes b/233754398)
+- engine: Add a way to query the validity of most filament objects (see `Engine::isValid`)
+- opengl: fix b/290388359 : possible crash when shutting down the engine
+- engine: Improve precision of frame time measurement when using emulated TimerQueries
+- backend: Improve frame pacing on Android and Vulkan.
+- backend: workaround b/291140208 (gltf_viewer crashes on Nexus 6P)
+- engine: support `setDepthFunc` for `MaterialInstance`
+- web: Added setDepthFunc()/getDepthFunc() to MaterialInstance
+- android: Added setDepthFunc()/getDepthFunc() to MaterialInstance
+
+## v1.40.4
+
+- gltfio: fix crash when compute morph target without material
+- matc: fix buggy `variant-filter` flag
+- web: Added missing setMat3Parameter()/setMat4Parameter() to MaterialInstance
+- opengl: fix b/290670707 : crash when using the blob cache
+- engine: fix a crash with `Material::compile()` when a callback is specified
+
 ## v1.40.3

 ## v1.40.2
--- a/android/Windows.md
+++ b/android/Windows.md
@@ -135,7 +135,7 @@ gradlew -Pcom.google.android.filament.dist-dir=..\out\android-release\filament a
 If you're only interested in building SDK, you may skip samples build by passing a `com.google.android.filament.skip-samples` flag:

 ```
-gradlew -Pcom.google.android.filament.dist-dir=..\out\android-release\filament assembleRelease -Pfilament_skip_samples
+gradlew -Pcom.google.android.filament.dist-dir=..\out\android-release\filament assembleRelease -Pcom.google.android.filament.skip-samples
 ```


--- a/android/build.gradle
+++ b/android/build.gradle
@@ -83,12 +83,12 @@ buildscript {
        'minSdk': 19,
        'targetSdk': 33,
        'compileSdk': 33,
-        'kotlin': '1.8.20',
-        'kotlin_coroutines': '1.7.1',
-        'buildTools': '33.0.2',
+        'kotlin': '1.9.0',
+        'kotlin_coroutines': '1.7.2',
+        'buildTools': '34.0.0',
        'ndk': '25.1.8937393',
-        'androidx_core': '1.10.0',
-        'androidx_annotations': '1.3.0'
+        'androidx_core': '1.10.1',
+        'androidx_annotations': '1.6.0'
    ]

    ext.deps = [
@@ -104,7 +104,7 @@ buildscript {
    ]

    dependencies {
-        classpath 'com.android.tools.build:gradle:8.0.2'
+        classpath 'com.android.tools.build:gradle:8.1.0'
        classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:${versions.kotlin}"
    }

@@ -152,7 +152,7 @@ buildscript {
 }

 plugins {
-    id "io.github.gradle-nexus.publish-plugin" version "1.1.0"
+    id "io.github.gradle-nexus.publish-plugin" version "1.3.0"
 }

 // See https://github.com/gradle-nexus/publish-plugin
--- a/android/filament-android/src/main/cpp/Engine.cpp
+++ b/android/filament-android/src/main/cpp/Engine.cpp
@@ -278,6 +278,112 @@ Java_com_google_android_filament_Engine_nDestroyEntity(JNIEnv*, jclass,
    engine->destroy(entity);
 }

+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidRenderer(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeRenderer) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((Renderer*)nativeRenderer);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidView(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeView) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((View*)nativeView);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidScene(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeScene) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((Scene*)nativeScene);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidFence(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeFence) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((Fence*)nativeFence);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidStream(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeStream) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((Stream*)nativeStream);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidIndexBuffer(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeIndexBuffer) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((IndexBuffer*)nativeIndexBuffer);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidVertexBuffer(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeVertexBuffer) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((VertexBuffer*)nativeVertexBuffer);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidSkinningBuffer(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeSkinningBuffer) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((SkinningBuffer*)nativeSkinningBuffer);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidIndirectLight(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeIndirectLight) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((IndirectLight*)nativeIndirectLight);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidMaterial(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeMaterial) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((Material*)nativeMaterial);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidSkybox(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeSkybox) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((Skybox*)nativeSkybox);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidColorGrading(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeColorGrading) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((ColorGrading*)nativeColorGrading);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidTexture(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeTexture) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((Texture*)nativeTexture);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidRenderTarget(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeTarget) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((RenderTarget*)nativeTarget);
+}
+
+extern "C" JNIEXPORT jboolean JNICALL
+Java_com_google_android_filament_Engine_nIsValidSwapChain(JNIEnv*, jclass,
+        jlong nativeEngine, jlong nativeSwapChain) {
+    Engine* engine = (Engine *)nativeEngine;
+    return (jboolean)engine->isValid((SwapChain*)nativeSwapChain);
+}
+
 extern "C" JNIEXPORT void JNICALL
 Java_com_google_android_filament_Engine_nFlushAndWait(JNIEnv*, jclass,
        jlong nativeEngine) {
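
The `nIsValid*` bindings above all forward to `Engine::isValid`, which reports whether an object is still known to the engine. The sketch below shows one plausible way such a check could work, using a registry of live pointers. It is an illustration only: `TinyEngine`, its `Texture` type, and the set-based bookkeeping are hypothetical and are not taken from Filament's implementation.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical resource type, standing in for any engine-owned object.
struct Texture {};

// Hypothetical engine that remembers every object it has created.
class TinyEngine {
public:
    ~TinyEngine() {
        for (Texture* t : mTextures) delete t;
    }

    Texture* createTexture() {
        Texture* t = new Texture();
        mTextures.insert(t);
        return t;
    }

    // Destroys the object only if this engine still owns it.
    void destroy(Texture* t) {
        if (mTextures.erase(t) > 0) delete t;
    }

    // True while the pointer refers to an object created by this engine
    // that has not been destroyed yet.
    bool isValid(Texture* t) const {
        return mTextures.count(t) > 0;
    }

private:
    std::unordered_set<Texture*> mTextures;
};
```

A check of this kind only compares addresses, so a later allocation that happens to reuse the same address would be reported as valid again. Treat it as a debugging aid rather than a lifetime guarantee.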
--- a/android/filament-android/src/main/cpp/MaterialInstance.cpp
+++ b/android/filament-android/src/main/cpp/MaterialInstance.cpp
@@ -246,17 +246,21 @@ Java_com_google_android_filament_MaterialInstance_nSetFloatParameterArray(JNIEnv
    env->ReleaseStringUTFChars(name_, name);
 }

+// defined in TextureSampler.cpp
+namespace filament::JniUtils {
+    TextureSampler from_long(jlong params) noexcept;
+} // TextureSamplerJniUtils
+
 extern "C"
 JNIEXPORT void JNICALL
 Java_com_google_android_filament_MaterialInstance_nSetParameterTexture(
        JNIEnv *env, jclass, jlong nativeMaterialInstance, jstring name_,
-        jlong nativeTexture, jint sampler_) {
+        jlong nativeTexture, jlong sampler_) {
    MaterialInstance* instance = (MaterialInstance*) nativeMaterialInstance;
    Texture* texture = (Texture*) nativeTexture;
-    TextureSampler& sampler = reinterpret_cast<TextureSampler&>(sampler_);

    const char *name = env->GetStringUTFChars(name_, 0);
-    instance->setParameter(name, texture, sampler);
+    instance->setParameter(name, texture, JniUtils::from_long(sampler_));
    env->ReleaseStringUTFChars(name_, name);
 }

@@ -357,6 +361,14 @@ Java_com_google_android_filament_MaterialInstance_nSetDepthCulling(JNIEnv*,
    instance->setDepthCulling(enable);
 }

+extern "C"
+JNIEXPORT void JNICALL
+Java_com_google_android_filament_MaterialInstance_nSetDepthFunc(JNIEnv*,
+        jclass, jlong nativeMaterialInstance, jlong function) {
+    MaterialInstance* instance = (MaterialInstance*) nativeMaterialInstance;
+    instance->setDepthFunc(static_cast<MaterialInstance::DepthFunc>(function));
+}
+
 extern "C"
 JNIEXPORT void JNICALL
 Java_com_google_android_filament_MaterialInstance_nSetStencilCompareFunction(JNIEnv*, jclass,
@@ -524,3 +536,11 @@ Java_com_google_android_filament_MaterialInstance_nIsDepthCullingEnabled(JNIEnv*
    MaterialInstance* instance = (MaterialInstance*)nativeMaterialInstance;
    return instance->isDepthCullingEnabled();
 }
+
+extern "C"
+JNIEXPORT jint JNICALL
+Java_com_google_android_filament_MaterialInstance_nGetDepthFunc(JNIEnv* env, jclass clazz,
+        jlong nativeMaterialInstance) {
+    MaterialInstance* instance = (MaterialInstance*)nativeMaterialInstance;
+    return (jint)instance->getDepthFunc();
+}
--- a/android/filament-android/src/main/cpp/SwapChain.cpp
+++ b/android/filament-android/src/main/cpp/SwapChain.cpp
@@ -27,11 +27,10 @@ extern "C" JNIEXPORT void JNICALL
 Java_com_google_android_filament_SwapChain_nSetFrameCompletedCallback(JNIEnv* env, jclass,
        jlong nativeSwapChain, jobject handler, jobject runnable) {
    SwapChain* swapChain = (SwapChain*) nativeSwapChain;
-    auto *callback = JniCallback::make(env, handler, runnable);
-    swapChain->setFrameCompletedCallback([](void* user) {
-        JniCallback* callback = (JniCallback*)user;
+    auto* callback = JniCallback::make(env, handler, runnable);
+    swapChain->setFrameCompletedCallback(nullptr, [callback](SwapChain* swapChain) {
        JniCallback::postToJavaAndDestroy(callback);
-    }, callback);
+    });
 }

 extern "C" JNIEXPORT jboolean JNICALL
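
The hunk above replaces a C-style callback (a function pointer plus a `void* user` argument) with a callable that captures its state directly. A minimal sketch of the capturing style follows. `FakeSwapChain` and `countCompletedFrames` are hypothetical names, and the handler argument that the real call receives as `nullptr` is omitted here for brevity.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical swap chain that stores a capturing callback.
class FakeSwapChain {
public:
    using FrameCompletedCallback = std::function<void(FakeSwapChain*)>;

    void setFrameCompletedCallback(FrameCompletedCallback callback) {
        mCallback = std::move(callback);
    }

    // Simulates the backend signalling that a frame has been presented.
    void frameCompleted() {
        if (mCallback) mCallback(this);
    }

private:
    FrameCompletedCallback mCallback;
};

// Counts how many times the callback fires for n simulated frames.
int countCompletedFrames(int n) {
    int completed = 0;
    FakeSwapChain swapChain;
    // State travels in the lambda capture, so no void* cast is needed.
    swapChain.setFrameCompletedCallback(
            [&completed](FakeSwapChain*) { ++completed; });
    for (int i = 0; i < n; ++i) swapChain.frameCompleted();
    return completed;
}
```

Because the state lives in the capture, the compiler checks its type, which the earlier `(JniCallback*)user` cast could not offer.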
--- a/android/filament-android/src/main/cpp/TextureSampler.cpp
+++ b/android/filament-android/src/main/cpp/TextureSampler.cpp
@@ -18,142 +18,139 @@

 #include <filament/TextureSampler.h>

+#include <utils/algorithm.h>
+
 using namespace filament;

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nCreateSampler(JNIEnv *env, jclass type, jint min,
+namespace filament::JniUtils {
+
+jlong to_long(TextureSampler const& sampler) noexcept {
+    return jlong(utils::bit_cast<uint32_t>(sampler.getSamplerParams()));
+}
+
+TextureSampler from_long(jlong params) noexcept {
+    return TextureSampler{
+            utils::bit_cast<backend::SamplerParams>(
+                    static_cast<uint32_t>(params))};
+}
+
+} // namespace filament::JniUtils
+
+using namespace JniUtils;
+
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nCreateSampler(JNIEnv *, jclass, jint min,
        jint max, jint s, jint t, jint r) {
-    return TextureSampler(static_cast<TextureSampler::MinFilter>(min),
-            static_cast<TextureSampler::MagFilter>(max), static_cast<TextureSampler::WrapMode>(s),
-            static_cast<TextureSampler::WrapMode>(t),
-            static_cast<TextureSampler::WrapMode>(r)).getSamplerParams().u;
+    TextureSampler sampler(static_cast<TextureSampler::MinFilter>(min),
+                           static_cast<TextureSampler::MagFilter>(max),
+                           static_cast<TextureSampler::WrapMode>(s),
+                           static_cast<TextureSampler::WrapMode>(t),
+                           static_cast<TextureSampler::WrapMode>(r));
+    return to_long(sampler);
 }

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nCreateCompareSampler(JNIEnv *env, jclass type,
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nCreateCompareSampler(JNIEnv *, jclass,
        jint mode, jint function) {
-    return TextureSampler(static_cast<TextureSampler::CompareMode>(mode),
-            static_cast<TextureSampler::CompareFunc>(function)).getSamplerParams().u;
+    TextureSampler sampler(static_cast<TextureSampler::CompareMode>(mode),
+                           static_cast<TextureSampler::CompareFunc>(function));
+    return to_long(sampler);
 }

 extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nGetMinFilter(JNIEnv *env, jclass type,
-        jint sampler_) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
-    return static_cast<jint>(sampler.getMinFilter());
+Java_com_google_android_filament_TextureSampler_nGetMinFilter(JNIEnv *, jclass, jlong sampler) {
+    return static_cast<jint>(from_long(sampler).getMinFilter());
 }

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nSetMinFilter(JNIEnv *env, jclass type,
-        jint sampler_, jint filter) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nSetMinFilter(JNIEnv *, jclass, jlong sampler_, jint filter) {
+    TextureSampler sampler{from_long(sampler_)};
    sampler.setMinFilter(static_cast<TextureSampler::MinFilter>(filter));
-    return sampler.getSamplerParams().u;
+    return to_long(sampler);
 }

 extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nGetMagFilter(JNIEnv *env, jclass type,
-        jint sampler_) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
-    return static_cast<jint>(sampler.getMagFilter());
+Java_com_google_android_filament_TextureSampler_nGetMagFilter(JNIEnv *, jclass, jlong sampler) {
+    return static_cast<jint>(from_long(sampler).getMagFilter());
 }

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nSetMagFilter(JNIEnv *env, jclass type,
-        jint sampler_, jint filter) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nSetMagFilter(JNIEnv *, jclass, jlong sampler_, jint filter) {
+    TextureSampler sampler{from_long(sampler_)};
    sampler.setMagFilter(static_cast<TextureSampler::MagFilter>(filter));
-    return sampler.getSamplerParams().u;
+    return to_long(sampler);
 }

 extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nGetWrapModeS(JNIEnv *env, jclass type,
-        jint sampler_) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
-    return static_cast<jint>(sampler.getWrapModeS());
+Java_com_google_android_filament_TextureSampler_nGetWrapModeS(JNIEnv *, jclass, jlong sampler) {
+    return static_cast<jint>(from_long(sampler).getWrapModeS());
 }

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nSetWrapModeS(JNIEnv *env, jclass type,
-        jint sampler_, jint mode) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nSetWrapModeS(JNIEnv *, jclass, jlong sampler_, jint mode) {
+    TextureSampler sampler{from_long(sampler_)};
    sampler.setWrapModeS(static_cast<TextureSampler::WrapMode>(mode));
-    return sampler.getSamplerParams().u;
+    return to_long(sampler);
 }

 extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nGetWrapModeT(JNIEnv *env, jclass type,
-        jint sampler_) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
-    return static_cast<jint>(sampler.getWrapModeT());
+Java_com_google_android_filament_TextureSampler_nGetWrapModeT(JNIEnv *, jclass, jlong sampler) {
+    return static_cast<jint>(from_long(sampler).getWrapModeT());
 }

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nSetWrapModeT(JNIEnv *env, jclass type,
-        jint sampler_, jint mode) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nSetWrapModeT(JNIEnv *, jclass, jlong sampler_, jint mode) {
+    TextureSampler sampler{from_long(sampler_)};
    sampler.setWrapModeT(static_cast<TextureSampler::WrapMode>(mode));
-    return sampler.getSamplerParams().u;
+    return to_long(sampler);
 }

 extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nGetWrapModeR(JNIEnv *env, jclass type,
-        jint sampler_) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
-    return static_cast<jint>(sampler.getWrapModeR());
+Java_com_google_android_filament_TextureSampler_nGetWrapModeR(JNIEnv *, jclass, jlong sampler) {
+    return static_cast<jint>(from_long(sampler).getWrapModeR());
 }

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nSetWrapModeR(JNIEnv *env, jclass type,
-        jint sampler_, jint mode) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nSetWrapModeR(JNIEnv *, jclass, jlong sampler_, jint mode) {
+    TextureSampler sampler{from_long(sampler_)};
    sampler.setWrapModeR(static_cast<TextureSampler::WrapMode>(mode));
-    return sampler.getSamplerParams().u;
+    return to_long(sampler);
 }

 extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nGetCompareMode(JNIEnv *env, jclass type,
-        jint sampler_) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
-    return static_cast<jint>(sampler.getCompareMode());
+Java_com_google_android_filament_TextureSampler_nGetCompareMode(JNIEnv *, jclass, jlong sampler) {
+    return static_cast<jint>(from_long(sampler).getCompareMode());
 }

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nSetCompareMode(JNIEnv *env, jclass type,
-        jint sampler_, jint mode) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nSetCompareMode(JNIEnv *, jclass, jlong sampler_, jint mode) {
+    TextureSampler sampler{from_long(sampler_)};
    sampler.setCompareMode(static_cast<TextureSampler::CompareMode>(mode),
            sampler.getCompareFunc());
-    return sampler.getSamplerParams().u;
+    return to_long(sampler);
 }

 extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nGetCompareFunction(JNIEnv *env, jclass type,
-        jint sampler_) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
-    return static_cast<jint>(sampler.getCompareFunc());
+Java_com_google_android_filament_TextureSampler_nGetCompareFunction(JNIEnv *, jclass, jlong sampler) {
+    return static_cast<jint>(from_long(sampler).getCompareFunc());
 }

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nSetCompareFunction(JNIEnv *env, jclass type,
-        jint sampler_, jint function) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nSetCompareFunction(JNIEnv *, jclass, jlong sampler_, jint function) {
+    TextureSampler sampler{from_long(sampler_)};
    sampler.setCompareMode(sampler.getCompareMode(),
            static_cast<TextureSampler::CompareFunc>(function));
-    return sampler.getSamplerParams().u;
+    return to_long(sampler);
 }

 extern "C" JNIEXPORT jfloat JNICALL
-Java_com_google_android_filament_TextureSampler_nGetAnisotropy(JNIEnv *env, jclass type,
-        jint sampler_) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
-    return sampler.getAnisotropy();
+Java_com_google_android_filament_TextureSampler_nGetAnisotropy(JNIEnv *, jclass, jlong sampler) {
+    return from_long(sampler).getAnisotropy();
 }

-extern "C" JNIEXPORT jint JNICALL
-Java_com_google_android_filament_TextureSampler_nSetAnisotropy(JNIEnv *env, jclass type,
-        jint sampler_, jfloat anisotropy) {
-    TextureSampler &sampler = reinterpret_cast<TextureSampler &>(sampler_);
+extern "C" JNIEXPORT jlong JNICALL
+Java_com_google_android_filament_TextureSampler_nSetAnisotropy(JNIEnv *, jclass, jlong sampler_, jfloat anisotropy) {
+    TextureSampler sampler{from_long(sampler_)};
    sampler.setAnisotropy(anisotropy);
-    return sampler.getSamplerParams().u;
+    return to_long(sampler);
 }
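
The rewritten bindings stop reinterpreting a `jint` argument as a `TextureSampler&` and instead round-trip the packed 32-bit sampler parameters through a 64-bit integer by value. The sketch below illustrates that pattern with a `memcpy`-based bit cast. `PackedSampler`, `bitCast`, `toLong`, and `fromLong` are hypothetical stand-ins written for this example, not Filament's `utils::bit_cast` or `backend::SamplerParams`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical 32-bit packed sampler description.
struct PackedSampler {
    uint8_t minFilter;
    uint8_t magFilter;
    uint8_t wrapS;
    uint8_t wrapT;
};
static_assert(sizeof(PackedSampler) == sizeof(uint32_t), "must pack into 32 bits");
static_assert(std::is_trivially_copyable<PackedSampler>::value, "must be memcpy-safe");

// Copies the object representation instead of aliasing it through a cast.
template <typename To, typename From>
To bitCast(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "size mismatch");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Widens the 32-bit pattern into a 64-bit value (a stand-in for jlong).
int64_t toLong(const PackedSampler& sampler) noexcept {
    return static_cast<int64_t>(bitCast<uint32_t>(sampler));
}

// Narrows back to 32 bits and rebuilds the struct by value.
PackedSampler fromLong(int64_t params) noexcept {
    return bitCast<PackedSampler>(static_cast<uint32_t>(params));
}
```

Returning the sampler by value means each setter works on its own copy and hands back a fresh packed value, which matches the shape of the `nSet*` functions in the diff.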
--- a/android/filament-android/src/main/java/com/google/android/filament/Engine.java
+++ b/android/filament-android/src/main/java/com/google/android/filament/Engine.java
@@ -449,6 +449,141 @@ public class Engine {
        swapChain.clearNativeObject();
    }

+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidRenderer(@NonNull Renderer object) {
+        return nIsValidRenderer(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidView(@NonNull View object) {
+        return nIsValidView(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidScene(@NonNull Scene object) {
+        return nIsValidScene(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidFence(@NonNull Fence object) {
+        return nIsValidFence(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidStream(@NonNull Stream object) {
+        return nIsValidStream(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidIndexBuffer(@NonNull IndexBuffer object) {
+        return nIsValidIndexBuffer(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidVertexBuffer(@NonNull VertexBuffer object) {
+        return nIsValidVertexBuffer(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidSkinningBuffer(@NonNull SkinningBuffer object) {
+        return nIsValidSkinningBuffer(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidIndirectLight(@NonNull IndirectLight object) {
+        return nIsValidIndirectLight(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidMaterial(@NonNull Material object) {
+        return nIsValidMaterial(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidSkybox(@NonNull Skybox object) {
+        return nIsValidSkybox(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidColorGrading(@NonNull ColorGrading object) {
+        return nIsValidColorGrading(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidTexture(@NonNull Texture object) {
+        return nIsValidTexture(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidRenderTarget(@NonNull RenderTarget object) {
+        return nIsValidRenderTarget(getNativeObject(), object.getNativeObject());
+    }
+
+    /**
+     * Returns whether the object is valid.
+     * @param object Object to check for validity
+     * @return returns true if the specified object is valid.
+     */
+    public boolean isValidSwapChain(@NonNull SwapChain object) {
+        return nIsValidSwapChain(getNativeObject(), object.getNativeObject());
+    }
+
    // View

    /**
@@ -785,17 +920,17 @@ public class Engine {
    private static native long nCreateSwapChain(long nativeEngine, Object nativeWindow, long flags);
    private static native long nCreateSwapChainHeadless(long nativeEngine, int width, int height, long flags);
    private static native long nCreateSwapChainFromRawPointer(long nativeEngine, long pointer, long flags);
-    private static native boolean nDestroySwapChain(long nativeEngine, long nativeSwapChain);
    private static native long nCreateView(long nativeEngine);
-    private static native boolean nDestroyView(long nativeEngine, long nativeView);
    private static native long nCreateRenderer(long nativeEngine);
-    private static native boolean nDestroyRenderer(long nativeEngine, long nativeRenderer);
    private static native long nCreateCamera(long nativeEngine, int entity);
    private static native long nGetCameraComponent(long nativeEngine, int entity);
    private static native void nDestroyCameraComponent(long nativeEngine, int entity);
    private static native long nCreateScene(long nativeEngine);
-    private static native boolean nDestroyScene(long nativeEngine, long nativeScene);
    private static native long nCreateFence(long nativeEngine);
+
+    private static native boolean nDestroyRenderer(long nativeEngine, long nativeRenderer);
+    private static native boolean nDestroyView(long nativeEngine, long nativeView);
+    private static native boolean nDestroyScene(long nativeEngine, long nativeScene);
    private static native boolean nDestroyFence(long nativeEngine, long nativeFence);
    private static native boolean nDestroyStream(long nativeEngine, long nativeStream);
    private static native boolean nDestroyIndexBuffer(long nativeEngine, long nativeIndexBuffer);
@@ -808,6 +943,22 @@ public class Engine {
    private static native boolean nDestroyColorGrading(long nativeEngine, long nativeColorGrading);
    private static native boolean nDestroyTexture(long nativeEngine, long nativeTexture);
    private static native boolean nDestroyRenderTarget(long nativeEngine, long nativeTarget);
+    private static native boolean nDestroySwapChain(long nativeEngine, long nativeSwapChain);
+    private static native boolean nIsValidRenderer(long nativeEngine, long nativeRenderer);
+    private static native boolean nIsValidView(long nativeEngine, long nativeView);
+    private static native boolean nIsValidScene(long nativeEngine, long nativeScene);
+    private static native boolean nIsValidFence(long nativeEngine, long nativeFence);
+    private static native boolean nIsValidStream(long nativeEngine, long nativeStream);
+    private static native boolean nIsValidIndexBuffer(long nativeEngine, long nativeIndexBuffer);
+    private static native boolean nIsValidVertexBuffer(long nativeEngine, long nativeVertexBuffer);
+    private static native boolean nIsValidSkinningBuffer(long nativeEngine, long nativeSkinningBuffer);
+    private static native boolean nIsValidIndirectLight(long nativeEngine, long nativeIndirectLight);
+    private static native boolean nIsValidMaterial(long nativeEngine, long nativeMaterial);
+    private static native boolean nIsValidSkybox(long nativeEngine, long nativeSkybox);
+    private static native boolean nIsValidColorGrading(long nativeEngine, long nativeColorGrading);
+    private static native boolean nIsValidTexture(long nativeEngine, long nativeTexture);
+    private static native boolean nIsValidRenderTarget(long nativeEngine, long nativeTarget);
+    private static native boolean nIsValidSwapChain(long nativeEngine, long nativeSwapChain);
    private static native void nDestroyEntity(long nativeEngine, int entity);
    private static native void nFlushAndWait(long nativeEngine);
    private static native long nGetTransformManager(long nativeEngine);
--- a/android/filament-android/src/main/java/com/google/android/filament/MaterialInstance.java
+++ b/android/filament-android/src/main/java/com/google/android/filament/MaterialInstance.java
@@ -625,6 +625,15 @@ public class MaterialInstance {
        nSetDepthCulling(getNativeObject(), enable);
    }

+    /**
+     * Sets the depth comparison function (default is {@link TextureSampler.CompareFunction#GE}).
+     *
+     * @param func the depth comparison function
+     */
+    public void setDepthFunc(TextureSampler.CompareFunction func) {
+        nSetDepthFunc(getNativeObject(), func.ordinal());
+    }
+
    /**
     * Returns whether depth culling is enabled.
     */
@@ -632,6 +641,13 @@ public class MaterialInstance {
        return nIsDepthCullingEnabled(getNativeObject());
    }

+    /**
+     * Returns the depth comparison function.
+     */
+    public TextureSampler.CompareFunction getDepthFunc() {
+        return TextureSampler.EnumCache.sCompareFunctionValues[nGetDepthFunc(getNativeObject())];
+    }
+
    /**
     * Sets the stencil comparison function (default is {@link TextureSampler.CompareFunction#ALWAYS}).
     *
@@ -884,7 +900,7 @@ public class MaterialInstance {
            @IntRange(from = 0) int offset, @IntRange(from = 1) int count);

    private static native void nSetParameterTexture(long nativeMaterialInstance,
-            @NonNull String name, long nativeTexture, int sampler);
+            @NonNull String name, long nativeTexture, long sampler);

    private static native void nSetScissor(long nativeMaterialInstance,
            @IntRange(from = 0) int left, @IntRange(from = 0) int bottom,
@@ -908,6 +924,7 @@ public class MaterialInstance {
    private static native void nSetDepthWrite(long nativeMaterialInstance, boolean enable);
    private static native void nSetStencilWrite(long nativeMaterialInstance, boolean enable);
    private static native void nSetDepthCulling(long nativeMaterialInstance, boolean enable);
+    private static native void nSetDepthFunc(long nativeMaterialInstance, long function);

    private static native void nSetStencilCompareFunction(long nativeMaterialInstance,
            long function, long face);
@@ -939,4 +956,5 @@ public class MaterialInstance {
    private static native boolean nIsDepthWriteEnabled(long nativeMaterialInstance);
    private static native boolean nIsStencilWriteEnabled(long nativeMaterialInstance);
    private static native boolean nIsDepthCullingEnabled(long nativeMaterialInstance);
+    private static native int nGetDepthFunc(long nativeMaterialInstance);
 }
--- a/android/filament-android/src/main/java/com/google/android/filament/RenderTarget.java
+++ b/android/filament-android/src/main/java/com/google/android/filament/RenderTarget.java
@@ -81,8 +81,6 @@ public class RenderTarget {
        /**
         * Sets a texture to a given attachment point.
         *
-         * <p>All RenderTargets must have a non-null <code>COLOR</code> attachment.</p>
-         *
         * @param attachment The attachment point of the texture.
         * @param texture The associated texture object.
         * @return A reference to this Builder for chaining calls.
--- a/android/filament-android/src/main/java/com/google/android/filament/SwapChain.java
+++ b/android/filament-android/src/main/java/com/google/android/filament/SwapChain.java
@@ -137,10 +137,6 @@ public class SwapChain {
     * </p>
     *
     * <p>
-     * The FrameCompletedCallback is guaranteed to be called on the main Filament thread.
-     * </p>
-     *
-     * <p>
     * Warning: Only Filament's Metal backend supports frame callbacks. Other backends ignore the
     * callback (which will never be called) and proceed normally.
     * </p>
--- a/android/filament-android/src/main/java/com/google/android/filament/TextureSampler.java
+++ b/android/filament-android/src/main/java/com/google/android/filament/TextureSampler.java
@@ -126,7 +126,7 @@ public class TextureSampler {
        NEVER
    }

-    int mSampler = 0; // bit field used by native
+    long mSampler = 0; // bit field used by native

    /**
     * Initializes the <code>TextureSampler</code> with default values.
@@ -342,26 +342,26 @@ public class TextureSampler {
        }
    }

-    private static native int nCreateSampler(int min, int max, int s, int t, int r);
-    private static native int nCreateCompareSampler(int mode, int function);
+    private static native long nCreateSampler(int min, int max, int s, int t, int r);
+    private static native long nCreateCompareSampler(int mode, int function);

-    private static native int nGetMinFilter(int sampler);
-    private static native int nSetMinFilter(int sampler, int filter);
-    private static native int nGetMagFilter(int sampler);
-    private static native int nSetMagFilter(int sampler, int filter);
+    private static native int nGetMinFilter(long sampler);
+    private static native long nSetMinFilter(long sampler, int filter);
+    private static native int nGetMagFilter(long sampler);
+    private static native long nSetMagFilter(long sampler, int filter);

-    private static native int nGetWrapModeS(int sampler);
-    private static native int nSetWrapModeS(int sampler, int mode);
-    private static native int nGetWrapModeT(int sampler);
-    private static native int nSetWrapModeT(int sampler, int mode);
-    private static native int nGetWrapModeR(int sampler);
-    private static native int nSetWrapModeR(int sampler, int mode);
+    private static native int nGetWrapModeS(long sampler);
+    private static native long nSetWrapModeS(long sampler, int mode);
+    private static native int nGetWrapModeT(long sampler);
+    private static native long nSetWrapModeT(long sampler, int mode);
+    private static native int nGetWrapModeR(long sampler);
+    private static native long nSetWrapModeR(long sampler, int mode);

-    private static native int nGetCompareMode(int sampler);
-    private static native int nSetCompareMode(int sampler, int mode);
-    private static native int nGetCompareFunction(int sampler);
-    private static native int nSetCompareFunction(int sampler, int function);
+    private static native int nGetCompareMode(long sampler);
+    private static native long nSetCompareMode(long sampler, int mode);
+    private static native int nGetCompareFunction(long sampler);
+    private static native long nSetCompareFunction(long sampler, int function);

-    private static native float nGetAnisotropy(int sampler);
-    private static native int nSetAnisotropy(int sampler, float anisotropy);
+    private static native float nGetAnisotropy(long sampler);
+    private static native long nSetAnisotropy(long sampler, float anisotropy);
 }
--- a/android/filament-android/src/main/java/com/google/android/filament/View.java
+++ b/android/filament-android/src/main/java/com/google/android/filament/View.java
@@ -27,6 +27,8 @@ import static com.google.android.filament.Asserts.assertFloat3In;
 import static com.google.android.filament.Asserts.assertFloat4In;
 import static com.google.android.filament.Colors.LinearColor;

+import com.google.android.filament.proguard.UsedByNative;
+
 /**
 * Encompasses all the state needed for rendering a {@link Scene}.
 *
@@ -965,7 +967,8 @@ public class View {
                options.heightFalloff, options.cutOffDistance,
                options.color[0], options.color[1], options.color[2],
                options.density, options.inScatteringStart, options.inScatteringSize,
-                options.fogColorFromIbl, options.skyColor.getNativeObject(),
+                options.fogColorFromIbl,
+                options.skyColor == null ? 0 : options.skyColor.getNativeObject(),
                options.enabled);
    }

@@ -1095,10 +1098,29 @@ public class View {
        nPick(getNativeObject(), x, y, handler, internalCallback);
    }

+    @UsedByNative("View.cpp")
    private static class InternalOnPickCallback implements Runnable {
+        private final OnPickCallback mUserCallback;
+        private final PickingQueryResult mPickingQueryResult = new PickingQueryResult();
+
+        @UsedByNative("View.cpp")
+        @Entity
+        int mRenderable;
+
+        @UsedByNative("View.cpp")
+        float mDepth;
+
+        @UsedByNative("View.cpp")
+        float mFragCoordsX;
+        @UsedByNative("View.cpp")
+        float mFragCoordsY;
+        @UsedByNative("View.cpp")
+        float mFragCoordsZ;
+
        public InternalOnPickCallback(OnPickCallback mUserCallback) {
            this.mUserCallback = mUserCallback;
        }
+
        @Override
        public void run() {
            mPickingQueryResult.renderable = mRenderable;
@@ -1108,13 +1130,6 @@ public class View {
            mPickingQueryResult.fragCoords[2] = mFragCoordsZ;
            mUserCallback.onPick(mPickingQueryResult);
        }
-        private final OnPickCallback mUserCallback;
-        private final PickingQueryResult mPickingQueryResult = new PickingQueryResult();
-        @Entity int mRenderable;
-        float mDepth;
-        float mFragCoordsX;
-        float mFragCoordsY;
-        float mFragCoordsZ;
    }

    /**
@@ -1377,13 +1392,13 @@ public class View {
        /**
         * resolution of vertical axis (2^levels to 2048)
         */
-        public int resolution = 360;
+        public int resolution = 384;
        /**
         * bloom x/y aspect-ratio (1/32 to 32)
         */
        public float anamorphism = 1.0f;
        /**
-         * number of blur levels (3 to 11)
+         * number of blur levels (1 to 11)
         */
        public int levels = 6;
        /**
@@ -1971,4 +1986,11 @@ public class View {
         */
        public float penumbraRatioScale = 1.0f;
    }
+
+    /**
+     * Options for stereoscopic (multi-eye) rendering.
+     */
+    public static class StereoscopicOptions {
+        public boolean enabled = false;
+    }
 }
--- a/android/gradle.properties
+++ b/android/gradle.properties
@@ -1,5 +1,5 @@
 GROUP=com.google.android.filament
-VERSION_NAME=1.40.3
+VERSION_NAME=1.42.0

 POM_DESCRIPTION=Real-time physically based rendering engine for Android.

--- a/android/samples/sample-gltf-viewer/src/main/java/com/google/android/filament/gltf/MainActivity.kt
+++ b/android/samples/sample-gltf-viewer/src/main/java/com/google/android/filament/gltf/MainActivity.kt
@@ -28,6 +28,7 @@ import com.google.android.filament.Fence
 import com.google.android.filament.IndirectLight
 import com.google.android.filament.Skybox
 import com.google.android.filament.View
+import com.google.android.filament.View.OnPickCallback
 import com.google.android.filament.utils.*
 import kotlinx.coroutines.CoroutineScope
 import kotlinx.coroutines.Dispatchers
@@ -56,7 +57,9 @@ class MainActivity : Activity() {
    private lateinit var modelViewer: ModelViewer
    private lateinit var titlebarHint: TextView
    private val doubleTapListener = DoubleTapListener()
+    private val singleTapListener = SingleTapListener()
    private lateinit var doubleTapDetector: GestureDetector
+    private lateinit var singleTapDetector: GestureDetector
    private var remoteServer: RemoteServer? = null
    private var statusToast: Toast? = null
    private var statusText: String? = null
@@ -77,6 +80,7 @@ class MainActivity : Activity() {
        choreographer = Choreographer.getInstance()

        doubleTapDetector = GestureDetector(applicationContext, doubleTapListener)
+        singleTapDetector = GestureDetector(applicationContext, singleTapListener)

        modelViewer = ModelViewer(surfaceView)
        viewerContent.view = modelViewer.view
@@ -88,6 +92,7 @@ class MainActivity : Activity() {
        surfaceView.setOnTouchListener { _, event ->
            modelViewer.onTouchEvent(event)
            doubleTapDetector.onTouchEvent(event)
+            singleTapDetector.onTouchEvent(event)
            true
        }

@@ -229,6 +234,7 @@ class MainActivity : Activity() {
                modelViewer.scene.skybox = sky
                modelViewer.scene.indirectLight = ibl
                viewerContent.indirectLight = ibl
+
            }
        }
    }
@@ -337,6 +343,11 @@ class MainActivity : Activity() {
        remoteServer?.close()
    }

+    override fun onBackPressed() {
+        super.onBackPressed()
+        finish()
+    }
+
    fun loadModelData(message: RemoteServer.ReceivedMessage) {
        Log.i(TAG, "Downloaded model ${message.label} (${message.buffer.capacity()} bytes)")
        clearStatusText()
@@ -425,4 +436,19 @@ class MainActivity : Activity() {
            return super.onDoubleTap(e)
        }
    }
+
+    // Just for testing purposes
+    inner class SingleTapListener : GestureDetector.SimpleOnGestureListener() {
+        override fun onSingleTapUp(event: MotionEvent): Boolean {
+            modelViewer.view.pick(
+                event.x.toInt(),
+                surfaceView.height - event.y.toInt(),
+                surfaceView.handler, {
+                    val name = modelViewer.asset?.getName(it.renderable)
+                    Log.v("Filament", "Picked ${it.renderable}: $name")
+                },
+            )
+            return super.onSingleTapUp(event)
+        }
+    }
 }
--- a/android/samples/sample-image-based-lighting/src/main/java/com/google/android/filament/ibl/MainActivity.kt
+++ b/android/samples/sample-image-based-lighting/src/main/java/com/google/android/filament/ibl/MainActivity.kt
@@ -118,9 +118,10 @@ class MainActivity : Activity() {
    }

    private fun setupView() {
-        val ssaoOptions = view.ambientOcclusionOptions
-        ssaoOptions.enabled = true
-        view.ambientOcclusionOptions = ssaoOptions
+        // ambient occlusion is the cheapest effect that adds a lot of quality
+        view.ambientOcclusionOptions = view.ambientOcclusionOptions.apply {
+            enabled = true
+        }

        // NOTE: Try to disable post-processing (tone-mapping, etc.) to see the difference
        // view.isPostProcessingEnabled = false
--- a/docs/Materials.md.html
+++ b/docs/Materials.md.html
@@ -1139,7 +1139,8 @@ Type
 :    array of `string`

 Value
-:     Each entry must be any of `dynamicLighting`, `directionalLighting`, `shadowReceiver`,`skinning` or `ssr`.
+:     Each entry must be any of `dynamicLighting`, `directionalLighting`, `shadowReceiver`,
+      `skinning`, `ssr`, or `stereo`.

 Description
 :     Used to specify a list of shader variants that the application guarantees will never be
@@ -1158,6 +1159,7 @@ Description of the variants:
 - `fog`, used when global fog is applied to the scene
 - `vsm`, used when VSM shadows are enabled and the object is a shadow receiver
 - `ssr`, used when screen-space reflections are enabled in the View
+- `stereo`, used when stereoscopic rendering is enabled in the View

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ JSON
 material {
--- a/docs/remote/filament.js
+++ b/docs/remote/filament.js
--- a/docs/remote/filament.wasm
+++ b/docs/remote/filament.wasm
--- a/docs/webgl/filament.js
+++ b/docs/webgl/filament.js
--- a/docs/webgl/filament.wasm
+++ b/docs/webgl/filament.wasm
--- a/docs/webgl/parquet.filamat
+++ b/docs/webgl/parquet.filamat
--- a/docs/webgl/plastic.filamat
+++ b/docs/webgl/plastic.filamat
--- a/docs/webgl/textured.filamat
+++ b/docs/webgl/textured.filamat
--- a/docs/webgl/triangle.filamat
+++ b/docs/webgl/triangle.filamat
--- a/filament/backend/CMakeLists.txt
+++ b/filament/backend/CMakeLists.txt
@@ -27,10 +27,10 @@ set(SRCS
        src/BackendUtils.cpp
        src/BlobCacheKey.cpp
        src/Callable.cpp
-        src/CallbackHandler.cpp
        src/CircularBuffer.cpp
        src/CommandBufferQueue.cpp
        src/CommandStream.cpp
+        src/CompilerThreadPool.cpp
        src/Driver.cpp
        src/Handle.cpp
        src/HandleAllocator.cpp
@@ -55,6 +55,7 @@ set(PRIVATE_HDRS
        include/private/backend/PlatformFactory.h
        include/private/backend/SamplerGroup.h
        src/CommandStreamDispatcher.h
+        src/CompilerThreadPool.h
        src/DataReshaper.h
        src/DriverBase.h
 )
@@ -66,6 +67,8 @@ set(PRIVATE_HDRS
 if (FILAMENT_SUPPORTS_OPENGL AND NOT FILAMENT_USE_EXTERNAL_GLES3 AND NOT FILAMENT_USE_SWIFTSHADER)
    list(APPEND SRCS
            include/backend/platforms/OpenGLPlatform.h
+            src/opengl/CallbackManager.h
+            src/opengl/CallbackManager.cpp
            src/opengl/gl_headers.cpp
            src/opengl/gl_headers.h
            src/opengl/GLUtils.cpp
@@ -174,8 +177,6 @@ if (FILAMENT_SUPPORTS_VULKAN)
            src/vulkan/VulkanConstants.h
            src/vulkan/VulkanContext.cpp
            src/vulkan/VulkanContext.h
-            src/vulkan/VulkanDisposer.cpp
-            src/vulkan/VulkanDisposer.h
            src/vulkan/VulkanDriver.cpp
            src/vulkan/VulkanDriver.h
            src/vulkan/VulkanDriverFactory.h
@@ -195,6 +196,11 @@ if (FILAMENT_SUPPORTS_VULKAN)
            src/vulkan/VulkanStagePool.h
            src/vulkan/VulkanSwapChain.cpp
            src/vulkan/VulkanSwapChain.h
+            src/vulkan/VulkanReadPixels.cpp
+            src/vulkan/VulkanReadPixels.h
+            src/vulkan/VulkanResourceAllocator.h
+            src/vulkan/VulkanResources.cpp
+            src/vulkan/VulkanResources.h
            src/vulkan/VulkanTexture.cpp
            src/vulkan/VulkanTexture.h
            src/vulkan/VulkanUtility.cpp
--- a/filament/backend/include/backend/CallbackHandler.h
+++ b/filament/backend/include/backend/CallbackHandler.h
@@ -66,7 +66,7 @@ public:
    virtual void post(void* user, Callback callback) = 0;

 protected:
-    virtual ~CallbackHandler();
+    virtual ~CallbackHandler() = default;
 };

 } // namespace filament::backend
--- a/filament/backend/include/backend/DriverEnums.h
+++ b/filament/backend/include/backend/DriverEnums.h
@@ -796,32 +796,53 @@ enum class SamplerCompareFunc : uint8_t {

 //! Sampler parameters
 struct SamplerParams { // NOLINT
-    union {
-        struct {
-            SamplerMagFilter filterMag      : 1;    //!< magnification filter (NEAREST)
-            SamplerMinFilter filterMin      : 3;    //!< minification filter  (NEAREST)
-            SamplerWrapMode wrapS           : 2;    //!< s-coordinate wrap mode (CLAMP_TO_EDGE)
-            SamplerWrapMode wrapT           : 2;    //!< t-coordinate wrap mode (CLAMP_TO_EDGE)
+    SamplerMagFilter filterMag      : 1;    //!< magnification filter (NEAREST)
+    SamplerMinFilter filterMin      : 3;    //!< minification filter  (NEAREST)
+    SamplerWrapMode wrapS           : 2;    //!< s-coordinate wrap mode (CLAMP_TO_EDGE)
+    SamplerWrapMode wrapT           : 2;    //!< t-coordinate wrap mode (CLAMP_TO_EDGE)

-            SamplerWrapMode wrapR           : 2;    //!< r-coordinate wrap mode (CLAMP_TO_EDGE)
-            uint8_t anisotropyLog2          : 3;    //!< anisotropy level (0)
-            SamplerCompareMode compareMode  : 1;    //!< sampler compare mode (NONE)
-            uint8_t padding0                : 2;    //!< reserved. must be 0.
+    SamplerWrapMode wrapR           : 2;    //!< r-coordinate wrap mode (CLAMP_TO_EDGE)
+    uint8_t anisotropyLog2          : 3;    //!< anisotropy level (0)
+    SamplerCompareMode compareMode  : 1;    //!< sampler compare mode (NONE)
+    uint8_t padding0                : 2;    //!< reserved. must be 0.

-            SamplerCompareFunc compareFunc  : 3;    //!< sampler comparison function (LE)
-            uint8_t padding1                : 5;    //!< reserved. must be 0.
+    SamplerCompareFunc compareFunc  : 3;    //!< sampler comparison function (LE)
+    uint8_t padding1                : 5;    //!< reserved. must be 0.
+    uint8_t padding2                : 8;    //!< reserved. must be 0.

-            uint8_t padding2                : 8;    //!< reserved. must be 0.
-        };
-        uint32_t u;
+    struct Hasher {
+        size_t operator()(SamplerParams p) const noexcept {
+            // we don't use std::hash<> here, so we don't have to include <functional>
+            return *reinterpret_cast<uint32_t const*>(reinterpret_cast<char const*>(&p));
+        }
    };
+
+    struct EqualTo {
+        bool operator()(SamplerParams lhs, SamplerParams rhs) const noexcept {
+            auto* pLhs = reinterpret_cast<uint32_t const*>(reinterpret_cast<char const*>(&lhs));
+            auto* pRhs = reinterpret_cast<uint32_t const*>(reinterpret_cast<char const*>(&rhs));
+            return *pLhs == *pRhs;
+        }
+    };
+
+    struct LessThan {
+        bool operator()(SamplerParams lhs, SamplerParams rhs) const noexcept {
+            auto* pLhs = reinterpret_cast<uint32_t const*>(reinterpret_cast<char const*>(&lhs));
+            auto* pRhs = reinterpret_cast<uint32_t const*>(reinterpret_cast<char const*>(&rhs));
+            return *pLhs < *pRhs;
+        }
+    };
+
 private:
-    friend inline bool operator < (SamplerParams lhs, SamplerParams rhs) {
-        return lhs.u < rhs.u;
+    friend inline bool operator < (SamplerParams lhs, SamplerParams rhs) noexcept {
+        return SamplerParams::LessThan{}(lhs, rhs);
    }
 };

-static_assert(sizeof(SamplerParams) == sizeof(uint32_t), "SamplerParams must be 32 bits");
+// The limitation to 64-bits max comes from how we store a SamplerParams in our JNI code
+// see android/.../TextureSampler.cpp
+static_assert(sizeof(SamplerParams) <= sizeof(uint64_t),
+        "SamplerParams must be no more than 64 bits");

 //! blending equation function
 enum class BlendEquation : uint8_t {
@@ -1126,8 +1147,6 @@ static_assert(sizeof(StencilState) == 12u,

 using FrameScheduledCallback = void(*)(PresentCallable callable, void* user);

-using FrameCompletedCallback = void(*)(void* user);
-
 enum class Workaround : uint16_t {
    // The EASU pass must split because shader compiler flattens early-exit branch
    SPLIT_EASU,
@@ -1141,6 +1160,11 @@ enum class Workaround : uint16_t {
    A8X_STATIC_TEXTURE_TARGET_ERROR,
    // Adreno drivers sometimes aren't able to blit into a layer of a texture array.
    DISABLE_BLIT_INTO_TEXTURE_ARRAY,
+    // Multiple workarounds needed for PowerVR GPUs
+    POWER_VR_SHADER_WORKAROUNDS,
+    // The driver has some threads pinned, and we can't easily know on which core. Performance
+    // can suffer further if we end up pinned on the same one.
+    DISABLE_THREAD_AFFINITY
 };

 } // namespace filament::backend
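The `Hasher`, `EqualTo` and `LessThan` helpers added to `SamplerParams` treat the packed bit-fields as a single integer. A minimal sketch of that idea follows, assuming a hypothetical `PackedSampler` stand-in (not the real `SamplerParams`) and using `std::memcpy` in place of the diff's `reinterpret_cast` to read the bits. The `static_assert` guards the layout assumption, since bit-field packing is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>

// Hypothetical stand-in for SamplerParams: 32 bits of packed sampler state.
struct PackedSampler {
    uint8_t filterMag      : 1;
    uint8_t filterMin      : 3;
    uint8_t wrapS          : 2;
    uint8_t wrapT          : 2;

    uint8_t wrapR          : 2;
    uint8_t anisotropyLog2 : 3;
    uint8_t compareMode    : 1;
    uint8_t padding0       : 2;

    uint8_t compareFunc    : 3;
    uint8_t padding1       : 5;

    uint8_t padding2;
};
static_assert(sizeof(PackedSampler) == sizeof(uint32_t), "sketch assumes a 32-bit layout");

// Copies the object representation into an integer of exactly the same size,
// so no bytes outside the struct are ever read.
inline uint32_t toBits(PackedSampler const& p) noexcept {
    uint32_t bits;
    std::memcpy(&bits, &p, sizeof(bits));
    return bits;
}

struct Hasher {
    std::size_t operator()(PackedSampler p) const noexcept { return toBits(p); }
};

struct EqualTo {
    bool operator()(PackedSampler lhs, PackedSampler rhs) const noexcept {
        return toBits(lhs) == toBits(rhs);
    }
};

// Must be a strict weak ordering (irreflexive, asymmetric) to be usable as a
// std::map or std::set comparator, hence '<' rather than '=='.
struct LessThan {
    bool operator()(PackedSampler lhs, PackedSampler rhs) const noexcept {
        return toBits(lhs) < toBits(rhs);
    }
};

// Example container keyed on the packed state.
using SamplerCache = std::map<PackedSampler, int, LessThan>;
```

With this comparator two samplers that differ in any field land in distinct map slots, while value-initialized samplers (`PackedSampler{}`) compare equal.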
--- a/filament/backend/include/backend/Handle.h
+++ b/filament/backend/include/backend/Handle.h
@@ -39,7 +39,6 @@ struct HwRenderTarget;
 struct HwSamplerGroup;
 struct HwStream;
 struct HwSwapChain;
-struct HwSync;
 struct HwTexture;
 struct HwTimerQuery;
 struct HwVertexBuffer;
@@ -126,7 +125,6 @@ using RenderTargetHandle    = Handle<HwRenderTarget>;
 using SamplerGroupHandle    = Handle<HwSamplerGroup>;
 using StreamHandle          = Handle<HwStream>;
 using SwapChainHandle       = Handle<HwSwapChain>;
-using SyncHandle            = Handle<HwSync>;
 using TextureHandle         = Handle<HwTexture>;
 using TimerQueryHandle      = Handle<HwTimerQuery>;
 using VertexBufferHandle    = Handle<HwVertexBuffer>;
--- a/filament/backend/include/backend/platforms/OpenGLPlatform.h
+++ b/filament/backend/include/backend/platforms/OpenGLPlatform.h
@@ -288,6 +288,12 @@ public:
     * @see terminate()
     */
    virtual void createContext(bool shared);
+
+    /**
+     * Detaches and destroys the current context, if any, and releases all resources associated
+     * with this thread.
+     */
+    virtual void releaseContext() noexcept;
 };

 } // namespace filament
--- a/filament/backend/include/backend/platforms/PlatformEGL.h
+++ b/filament/backend/include/backend/platforms/PlatformEGL.h
@@ -40,6 +40,7 @@ public:
    PlatformEGL() noexcept;
    bool isExtraContextSupported() const noexcept override;
    void createContext(bool shared) override;
+    void releaseContext() noexcept override;

 protected:

@@ -139,6 +140,7 @@ protected:
            bool KHR_create_context = false;
            bool KHR_gl_colorspace = false;
            bool KHR_no_config_context = false;
+            bool KHR_surfaceless_context = false;
        } egl;
    } ext;

--- a/filament/backend/include/private/backend/CommandStream.h
+++ b/filament/backend/include/private/backend/CommandStream.h
@@ -73,14 +73,32 @@ public:
        // a cost here (writing and reading the stack at each iteration), in the end it's
        // probably better to pay the cost at just one location.
        intptr_t next;
+        driver.mCurrentExecutingCommand = this;
        mExecute(driver, this, &next);
        return reinterpret_cast<CommandBase*>(reinterpret_cast<intptr_t>(this) + next);
    }

+    inline void captureCallstack() noexcept {
+        auto c = utils::CallStack::unwind(4);
+        size_t i = 0;
+        for (; i < c.getFrameCount() && i < 16; i++) {
+            mCallstack[i] = c[i];
+        }
+        for (; i < 16; i++) {
+            mCallstack[i] = 0;
+        }
+    }
+
+    void printCallstack() noexcept {
+        auto c = utils::CallStack(mCallstack);
+        utils::slog.d << c << utils::io::endl;
+    }
+
    inline ~CommandBase() noexcept = default;

 private:
    Execute mExecute;
+    std::array<intptr_t, 16> mCallstack = {0};
 };

 // ------------------------------------------------------------------------------------------------
@@ -218,6 +236,7 @@ public:
        using Cmd = COMMAND_TYPE(methodName);                                                   \
        void* const p = allocateCommand(CommandBase::align(sizeof(Cmd)));                       \
        new(p) Cmd(mDispatcher.methodName##_, APPLY(std::move, params));                        \
+        ((Cmd*)p)->captureCallstack();                                                          \
        DEBUG_COMMAND_END(methodName, false);                                                   \
    }

@@ -237,6 +256,7 @@ public:
        using Cmd = COMMAND_TYPE(methodName##R);                                                \
        void* const p = allocateCommand(CommandBase::align(sizeof(Cmd)));                       \
        new(p) Cmd(mDispatcher.methodName##_, RetType(result), APPLY(std::move, params));       \
+        ((Cmd*)p)->captureCallstack();                                                          \
        DEBUG_COMMAND_END(methodName, false);                                                   \
        return result;                                                                          \
    }
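`captureCallstack()` above copies at most 16 unwound frames into a fixed-size array and zero-fills the remainder, so every command carries a constant-size record. A minimal sketch of that copy-and-pad step follows, assuming a hypothetical `Frames` vector in place of `utils::CallStack` (the names `kMaxFrames` and `captureFrames` are illustrative only and are not part of Filament).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same capacity as the mCallstack member in the diff.
constexpr std::size_t kMaxFrames = 16;

// Hypothetical stand-in for the frames an unwinder would return.
using Frames = std::vector<intptr_t>;

// Copies up to kMaxFrames entries and pads the rest with zeros, so that a
// later reader can treat 0 as the end-of-stack marker.
inline std::array<intptr_t, kMaxFrames> captureFrames(Frames const& unwound) noexcept {
    std::array<intptr_t, kMaxFrames> out{};
    std::size_t i = 0;
    for (; i < unwound.size() && i < kMaxFrames; i++) {
        out[i] = unwound[i];
    }
    for (; i < kMaxFrames; i++) {
        out[i] = 0;
    }
    return out;
}
```

A stack deeper than 16 frames is silently truncated. That trade-off keeps the per-command overhead bounded.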
--- a/filament/backend/include/private/backend/Driver.h
+++ b/filament/backend/include/private/backend/Driver.h
@@ -53,6 +53,7 @@ template<typename T>
 class ConcreteDispatcher;
 class Dispatcher;
 class CommandStream;
+class CommandBase;

 class Driver {
 public:
@@ -83,6 +84,8 @@ public:
    virtual void debugCommandEnd(CommandStream* cmds,
            bool synchronous, const char* methodName) noexcept = 0;

+    CommandBase* mCurrentExecutingCommand = nullptr;
+
    /*
     * Asynchronous calls here only to provide a type to CommandStream. They must be non-virtual
     * so that calling the concrete implementation won't go through a vtable.
--- a/filament/backend/include/private/backend/DriverAPI.inc
+++ b/filament/backend/include/private/backend/DriverAPI.inc
@@ -142,7 +142,8 @@ DECL_DRIVER_API_N(setFrameScheduledCallback,

 DECL_DRIVER_API_N(setFrameCompletedCallback,
        backend::SwapChainHandle, sch,
-        backend::FrameCompletedCallback, callback,
+        backend::CallbackHandler*, handler,
+        backend::CallbackHandler::Callback, callback,
        void*, user)

 DECL_DRIVER_API_N(setPresentationTime,
@@ -245,8 +246,6 @@ DECL_DRIVER_API_R_N(backend::RenderTargetHandle, createRenderTarget,

 DECL_DRIVER_API_R_0(backend::FenceHandle, createFence)

-DECL_DRIVER_API_R_0(backend::SyncHandle, createSync)
-
 DECL_DRIVER_API_R_N(backend::SwapChainHandle, createSwapChain,
        void*, nativeWindow,
        uint64_t, flags)
@@ -275,7 +274,7 @@ DECL_DRIVER_API_N(destroyRenderTarget,    backend::RenderTargetHandle, rth)
 DECL_DRIVER_API_N(destroySwapChain,       backend::SwapChainHandle, sch)
 DECL_DRIVER_API_N(destroyStream,          backend::StreamHandle, sh)
 DECL_DRIVER_API_N(destroyTimerQuery,      backend::TimerQueryHandle, sh)
-DECL_DRIVER_API_N(destroySync,            backend::SyncHandle, sh)
+DECL_DRIVER_API_N(destroyFence,           backend::FenceHandle, fh)

 /*
 * Synchronous APIs
@@ -289,8 +288,7 @@ DECL_DRIVER_API_SYNCHRONOUS_N(void, setAcquiredImage, backend::StreamHandle, str
 DECL_DRIVER_API_SYNCHRONOUS_N(void, setStreamDimensions, backend::StreamHandle, stream, uint32_t, width, uint32_t, height)
 DECL_DRIVER_API_SYNCHRONOUS_N(int64_t, getStreamTimestamp, backend::StreamHandle, stream)
 DECL_DRIVER_API_SYNCHRONOUS_N(void, updateStreams, backend::DriverApi*, driver)
-DECL_DRIVER_API_SYNCHRONOUS_N(void, destroyFence, backend::FenceHandle, fh)
-DECL_DRIVER_API_SYNCHRONOUS_N(backend::FenceStatus, wait, backend::FenceHandle, fh, uint64_t, timeout)
+DECL_DRIVER_API_SYNCHRONOUS_N(backend::FenceStatus, getFenceStatus, backend::FenceHandle, fh)
 DECL_DRIVER_API_SYNCHRONOUS_N(bool, isTextureFormatSupported, backend::TextureFormat, format)
 DECL_DRIVER_API_SYNCHRONOUS_0(bool, isTextureSwizzleSupported)
 DECL_DRIVER_API_SYNCHRONOUS_N(bool, isTextureFormatMipmappable, backend::TextureFormat, format)
@@ -300,13 +298,14 @@ DECL_DRIVER_API_SYNCHRONOUS_0(bool, isFrameBufferFetchMultiSampleSupported)
 DECL_DRIVER_API_SYNCHRONOUS_0(bool, isFrameTimeSupported)
 DECL_DRIVER_API_SYNCHRONOUS_0(bool, isAutoDepthResolveSupported)
 DECL_DRIVER_API_SYNCHRONOUS_0(bool, isSRGBSwapChainSupported)
+DECL_DRIVER_API_SYNCHRONOUS_0(bool, isStereoSupported)
+DECL_DRIVER_API_SYNCHRONOUS_0(bool, isParallelShaderCompileSupported)
 DECL_DRIVER_API_SYNCHRONOUS_0(uint8_t, getMaxDrawBuffers)
 DECL_DRIVER_API_SYNCHRONOUS_0(size_t, getMaxUniformBufferSize)
 DECL_DRIVER_API_SYNCHRONOUS_0(math::float2, getClipSpaceParams)
 DECL_DRIVER_API_SYNCHRONOUS_0(bool, canGenerateMipmaps)
 DECL_DRIVER_API_SYNCHRONOUS_N(void, setupExternalImage, void*, image)
 DECL_DRIVER_API_SYNCHRONOUS_N(bool, getTimerQueryValue, backend::TimerQueryHandle, query, uint64_t*, elapsedTime)
-DECL_DRIVER_API_SYNCHRONOUS_N(backend::SyncStatus, getSyncStatus, backend::SyncHandle, sh)
 DECL_DRIVER_API_SYNCHRONOUS_N(bool, isWorkaroundNeeded, backend::Workaround, workaround)
 DECL_DRIVER_API_SYNCHRONOUS_0(backend::FeatureLevel, getFeatureLevel)

@@ -389,6 +388,7 @@ DECL_DRIVER_API_N(endTimerQuery,
        backend::TimerQueryHandle, query)

 DECL_DRIVER_API_N(compilePrograms,
+        backend::CompilerPriorityQueue, priority,
        backend::CallbackHandler*, handler,
        backend::CallbackHandler::Callback, callback,
        void*, user)
--- a/filament/backend/src/CallbackHandler.cpp
+++ b/filament/backend/src/CallbackHandler.cpp
@@ -1,23 +0,0 @@
-/*
- * Copyright (C) 2021 The Android Open Source Project
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *      http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include <backend/CallbackHandler.h>
-
-namespace filament::backend {
-
-CallbackHandler::~CallbackHandler() = default;
-
-} // namespace filament::backend
--- a/filament/backend/src/CompilerThreadPool.cpp
+++ b/filament/backend/src/CompilerThreadPool.cpp
@@ -0,0 +1,137 @@
+/*
+ * Copyright (C) 2023 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "CompilerThreadPool.h"
+
+#include <utils/Systrace.h>
+
+#include <memory>
+
+namespace filament::backend {
+
+using namespace utils;
+
+ProgramToken::~ProgramToken() = default;
+
+CompilerThreadPool::CompilerThreadPool() noexcept = default;
+
+CompilerThreadPool::~CompilerThreadPool() noexcept {
+    assert_invariant(mCompilerThreads.empty());
+    assert_invariant(mQueues[0].empty());
+    assert_invariant(mQueues[1].empty());
+}
+
+void CompilerThreadPool::init(uint32_t threadCount,
+        ThreadSetup&& threadSetup, ThreadCleanup&& threadCleanup) noexcept {
+    auto setup = std::make_shared<ThreadSetup>(std::move(threadSetup));
+    auto cleanup = std::make_shared<ThreadCleanup>(std::move(threadCleanup));
+
+    for (size_t i = 0; i < threadCount; i++) {
+        mCompilerThreads.emplace_back([this, setup, cleanup]() {
+            SYSTRACE_CONTEXT();
+
+            (*setup)();
+
+            // process jobs from the queue until we're asked to exit
+            while (!mExitRequested) {
+                std::unique_lock lock(mQueueLock);
+                mQueueCondition.wait(lock, [this]() {
+                    return  mExitRequested ||
+                            (!std::all_of( std::begin(mQueues), std::end(mQueues),
+                                    [](auto&& q) { return q.empty(); }));
+                });
+
+                SYSTRACE_VALUE32("CompilerThreadPool Jobs",
+                        mQueues[0].size() + mQueues[1].size());
+
+                if (UTILS_LIKELY(!mExitRequested)) {
+                    Job job;
+                    // use the first queue that's not empty
+                    auto& queue = [this]() -> auto& {
+                        for (auto& q: mQueues) {
+                            if (!q.empty()) {
+                                return q;
+                            }
+                        }
+                        return mQueues[0]; // we should never end up here.
+                    }();
+                    assert_invariant(!queue.empty());
+                    std::swap(job, queue.front().second);
+                    queue.pop_front();
+
+                    // execute the job without holding any locks
+                    lock.unlock();
+                    job();
+                }
+            }
+
+            (*cleanup)();
+        });
+
+    }
+}
+
+auto CompilerThreadPool::find(program_token_t const& token) -> std::pair<Queue&, Queue::iterator> {
+    for (auto&& q: mQueues) {
+        auto pos = std::find_if(q.begin(), q.end(), [&token](auto&& item) {
+            return item.first == token;
+        });
+        if (pos != q.end()) {
+            return { q, pos };
+        }
+    }
+    // this can happen if the program is being processed right now
+    return { mQueues[0], mQueues[0].end() };
+}
+
+auto CompilerThreadPool::dequeue(program_token_t const& token) -> Job {
+    std::unique_lock const lock(mQueueLock);
+    Job job;
+    auto&& [q, pos] = find(token);
+    if (pos != q.end()) {
+        std::swap(job, pos->second);
+        q.erase(pos);
+    }
+    return job;
+}
+
+void CompilerThreadPool::queue(CompilerPriorityQueue priorityQueue,
+        program_token_t const& token, Job&& job) {
+    std::unique_lock const lock(mQueueLock);
+    mQueues[size_t(priorityQueue)].emplace_back(token, std::move(job));
+    mQueueCondition.notify_one();
+}
+
+void CompilerThreadPool::terminate() noexcept {
+    std::unique_lock lock(mQueueLock);
+    mExitRequested = true;
+    mQueueCondition.notify_all();
+    lock.unlock();
+
+    for (auto& thread: mCompilerThreads) {
+        if (thread.joinable()) {
+            thread.join();
+        }
+    }
+    mCompilerThreads.clear();
+
+    // Clear all the queues, dropping the remaining jobs. This relies on the jobs being cancelable.
+    for (auto&& q : mQueues) {
+        q.clear();
+    }
+}
+
+} // namespace filament::backend
--- a/filament/backend/src/CompilerThreadPool.h
+++ b/filament/backend/src/CompilerThreadPool.h
@@ -0,0 +1,70 @@
+/*
+ * Copyright (C) 2023 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef TNT_FILAMENT_BACKEND_COMPILERTHREADPOOL_H
+#define TNT_FILAMENT_BACKEND_COMPILERTHREADPOOL_H
+
+#include <backend/DriverEnums.h>
+
+#include <utils/Invocable.h>
+#include <utils/Mutex.h>
+#include <utils/Condition.h>
+
+#include <array>
+#include <deque>
+#include <memory>
+#include <thread>
+#include <utility>
+#include <vector>
+
+namespace filament::backend {
+
+struct ProgramToken {
+    virtual ~ProgramToken();
+};
+
+using program_token_t = std::shared_ptr<ProgramToken>;
+
+class Platform;
+
+class CompilerThreadPool {
+public:
+    CompilerThreadPool() noexcept;
+    ~CompilerThreadPool() noexcept;
+    using Job = utils::Invocable<void()>;
+    using ThreadSetup = utils::Invocable<void()>;
+    using ThreadCleanup = utils::Invocable<void()>;
+    void init(uint32_t threadCount,
+            ThreadSetup&& threadSetup, ThreadCleanup&& threadCleanup) noexcept;
+    void terminate() noexcept;
+    void queue(CompilerPriorityQueue priorityQueue, program_token_t const& token, Job&& job);
+    Job dequeue(program_token_t const& token);
+
+private:
+    using Queue = std::deque<std::pair<program_token_t, Job>>;
+    std::vector<std::thread> mCompilerThreads;
+    bool mExitRequested{ false };
+    utils::Mutex mQueueLock;
+    utils::Condition mQueueCondition;
+    std::array<Queue, 2> mQueues;
+    // lock must be held for methods below
+    std::pair<Queue&, Queue::iterator> find(program_token_t const& token);
+};
+
+} // namespace filament::backend
+
+#endif  // TNT_FILAMENT_BACKEND_COMPILERTHREADPOOL_H
+
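The selection rule used by CompilerThreadPool above (take the front job of the first non-empty queue, and let a pending job be removed by its token) can be sketched in isolation. This is a minimal single-threaded model, not the patch's implementation: `MiniJobQueues`, `Priority`, the `int` tokens and `demoPriorityOrder` are hypothetical names, `std::function` stands in for `utils::Invocable`, and it assumes index 0 is the higher-priority queue. Locking, the condition variable and the worker threads are deliberately left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for CompilerPriorityQueue; assumes 0 is served first.
enum class Priority : std::size_t { HIGH = 0, LOW = 1 };

class MiniJobQueues {
public:
    using Job = std::function<void()>;

    // Appends a job to the queue selected by its priority.
    void queue(Priority p, int token, Job job) {
        mQueues[std::size_t(p)].emplace_back(token, std::move(job));
    }

    // Returns the front job of the first non-empty queue, or an empty Job.
    Job next() {
        for (auto& q : mQueues) {
            if (!q.empty()) {
                Job job = std::move(q.front().second);
                q.pop_front();
                return job;
            }
        }
        return {};
    }

    // Removes a pending job by token and hands it back to the caller.
    // Returns an empty Job if the token is not pending (e.g. already taken).
    Job dequeue(int token) {
        for (auto& q : mQueues) {
            auto pos = std::find_if(q.begin(), q.end(),
                    [token](auto const& item) { return item.first == token; });
            if (pos != q.end()) {
                Job job = std::move(pos->second);
                q.erase(pos);
                return job;
            }
        }
        return {};
    }

private:
    std::array<std::deque<std::pair<int, Job>>, 2> mQueues;
};

// Queues three jobs, cancels one, then drains the queues in priority order.
std::vector<std::string> demoPriorityOrder() {
    std::vector<std::string> log;
    MiniJobQueues queues;
    queues.queue(Priority::LOW, 1, [&log] { log.push_back("low-1"); });
    queues.queue(Priority::HIGH, 2, [&log] { log.push_back("high-2"); });
    queues.queue(Priority::LOW, 3, [&log] { log.push_back("low-3"); });
    queues.dequeue(3); // the returned job is discarded here, so it never runs
    while (auto job = queues.next()) {
        job();
    }
    return log;
}
```

In this sketch `demoPriorityOrder()` runs the high-priority job before the remaining low-priority one, even though the low-priority job was queued first. In the patch, the same lookup happens under `mQueueLock`, and the job is executed only after the lock is released.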
--- a/filament/backend/src/Driver.cpp
+++ b/filament/backend/src/Driver.cpp
@@ -63,6 +63,8 @@ DriverBase::DriverBase() noexcept {
 }

 DriverBase::~DriverBase() noexcept {
+    assert_invariant(mCallbacks.empty());
+    assert_invariant(mServiceThreadCallbackQueue.empty());
    if constexpr (UTILS_HAS_THREADING) {
        // quit our service thread
        std::unique_lock<std::mutex> lock(mServiceThreadLock);
--- a/filament/backend/src/DriverBase.h
+++ b/filament/backend/src/DriverBase.h
@@ -135,9 +135,6 @@ struct HwFence : public HwBase {
    Platform::Fence* fence = nullptr;
 };

-struct HwSync : public HwBase {
-};
-
 struct HwSwapChain : public HwBase {
    Platform::SwapChain* swapChain = nullptr;
 };
@@ -168,13 +165,6 @@ public:

    void purge() noexcept final;

-    // --------------------------------------------------------------------------------------------
-    // Privates
-    // --------------------------------------------------------------------------------------------
-
-protected:
-    class CallbackDataDetails;
-
    // Helpers...
    struct CallbackData {
        CallbackData(CallbackData const &) = delete;
@@ -205,6 +195,13 @@ protected:

    void scheduleCallback(CallbackHandler* handler, void* user, CallbackHandler::Callback callback);

+    // --------------------------------------------------------------------------------------------
+    // Privates
+    // --------------------------------------------------------------------------------------------
+
+protected:
+    class CallbackDataDetails;
+
    inline void scheduleDestroy(BufferDescriptor&& buffer) noexcept {
        if (buffer.hasCallback()) {
            scheduleDestroySlow(std::move(buffer));
--- a/filament/backend/src/Handle.cpp
+++ b/filament/backend/src/Handle.cpp
@@ -67,7 +67,6 @@ template io::ostream& operator<<(io::ostream& out, const Handle<HwFence>& h) noe
 template io::ostream& operator<<(io::ostream& out, const Handle<HwSwapChain>& h) noexcept;
 template io::ostream& operator<<(io::ostream& out, const Handle<HwStream>& h) noexcept;
 template io::ostream& operator<<(io::ostream& out, const Handle<HwTimerQuery>& h) noexcept;
-template io::ostream& operator<<(io::ostream& out, const Handle<HwSync>& h) noexcept;
 template io::ostream& operator<<(io::ostream& out, const Handle<HwBufferObject>& h) noexcept;

 #endif
--- a/filament/backend/src/metal/MetalDriver.mm
+++ b/filament/backend/src/metal/MetalDriver.mm
@@ -176,9 +176,9 @@ void MetalDriver::setFrameScheduledCallback(Handle<HwSwapChain> sch,
 }

 void MetalDriver::setFrameCompletedCallback(Handle<HwSwapChain> sch,
-        FrameCompletedCallback callback, void* user) {
+        CallbackHandler* handler, CallbackHandler::Callback callback, void* user) {
    auto* swapChain = handle_cast<MetalSwapChain>(sch);
-    swapChain->setFrameCompletedCallback(callback, user);
+    swapChain->setFrameCompletedCallback(handler, callback, user);
 }

 void MetalDriver::execute(std::function<void(void)> const& fn) noexcept {
@@ -380,11 +380,6 @@ void MetalDriver::createFenceR(Handle<HwFence> fh, int dummy) {
    fence->encode();
 }

-void MetalDriver::createSyncR(Handle<HwSync> sh, int) {
-    auto* fence = handle_cast<MetalFence>(sh);
-    fence->encode();
-}
-
 void MetalDriver::createSwapChainR(Handle<HwSwapChain> sch, void* nativeWindow, uint64_t flags) {
    if (UTILS_UNLIKELY(flags & SWAP_CHAIN_CONFIG_APPLE_CVPIXELBUFFER)) {
        CVPixelBufferRef pixelBuffer = (CVPixelBufferRef) nativeWindow;
@@ -454,12 +449,6 @@ Handle<HwFence> MetalDriver::createFenceS() noexcept {
    return alloc_and_construct_handle<MetalFence, HwFence>(*mContext);
 }

-Handle<HwSync> MetalDriver::createSyncS() noexcept {
-    // The handle must be constructed here, as a synchronous call to getSyncStatus might happen
-    // before createSyncR is executed.
-    return alloc_and_construct_handle<MetalFence, HwSync>(*mContext);
-}
-
 Handle<HwSwapChain> MetalDriver::createSwapChainS() noexcept {
    return alloc_handle<MetalSwapChain>();
 }
@@ -567,13 +556,6 @@ void MetalDriver::destroyTimerQuery(Handle<HwTimerQuery> tqh) {
    }
 }

-void MetalDriver::destroySync(Handle<HwSync> sh) {
-    if (sh) {
-        destruct_handle<MetalFence>(sh);
-    }
-}
-
-
 void MetalDriver::terminate() {
    // finish() will flush the pending command buffer and will ensure all GPU work has finished.
    // This must be done before calling bufferPool->reset() to ensure no buffers are in flight.
@@ -625,12 +607,12 @@ void MetalDriver::destroyFence(Handle<HwFence> fh) {
    }
 }

-FenceStatus MetalDriver::wait(Handle<HwFence> fh, uint64_t timeout) {
+FenceStatus MetalDriver::getFenceStatus(Handle<HwFence> fh) {
    auto* fence = handle_cast<MetalFence>(fh);
    if (!fence) {
        return FenceStatus::ERROR;
    }
-    return fence->wait(timeout);
+    return fence->wait(0);
 }

 bool MetalDriver::isTextureFormatSupported(TextureFormat format) {
@@ -714,6 +696,14 @@ bool MetalDriver::isSRGBSwapChainSupported() {
    return false;
 }

+bool MetalDriver::isStereoSupported() {
+    return true;
+}
+
+bool MetalDriver::isParallelShaderCompileSupported() {
+    return false;
+}
+
 bool MetalDriver::isWorkaroundNeeded(Workaround workaround) {
    switch (workaround) {
        case Workaround::SPLIT_EASU:
@@ -726,6 +716,8 @@ bool MetalDriver::isWorkaroundNeeded(Workaround workaround) {
            return mContext->bugs.a8xStaticTextureTargetError;
        case Workaround::DISABLE_BLIT_INTO_TEXTURE_ARRAY:
            return false;
+        default:
+            return false;
    }
    return false;
 }
@@ -841,17 +833,6 @@ bool MetalDriver::getTimerQueryValue(Handle<HwTimerQuery> tqh, uint64_t* elapsed
    return mContext->timerQueryImpl->getQueryResult(tq, elapsedTime);
 }

-SyncStatus MetalDriver::getSyncStatus(Handle<HwSync> sh) {
-    auto* fence = handle_cast<MetalFence>(sh);
-    FenceStatus status = fence->wait(0);
-    if (status == FenceStatus::TIMEOUT_EXPIRED) {
-        return SyncStatus::NOT_SIGNALED;
-    } else if (status == FenceStatus::CONDITION_SATISFIED) {
-        return SyncStatus::SIGNALED;
-    }
-    return SyncStatus::ERROR;
-}
-
 void MetalDriver::generateMipmaps(Handle<HwTexture> th) {
    ASSERT_PRECONDITION(!isInRenderPass(mContext),
                        "generateMipmaps must be called outside of a render pass.");
@@ -975,8 +956,8 @@ void MetalDriver::updateSamplerGroup(Handle<HwSamplerGroup> sbh, BufferDescripto
    scheduleDestroy(std::move(data));
 }

-void MetalDriver::compilePrograms(CallbackHandler* handler,
-        CallbackHandler::Callback callback, void* user) {
+void MetalDriver::compilePrograms(CompilerPriorityQueue priority,
+        CallbackHandler* handler, CallbackHandler::Callback callback, void* user) {
    if (callback) {
        scheduleCallback(handler, user, callback);
    }
--- a/filament/backend/src/metal/MetalHandles.h
+++ b/filament/backend/src/metal/MetalHandles.h
@@ -70,7 +70,8 @@ public:
    void releaseDrawable();

    void setFrameScheduledCallback(FrameScheduledCallback callback, void* user);
-    void setFrameCompletedCallback(FrameCompletedCallback callback, void* user);
+    void setFrameCompletedCallback(CallbackHandler* handler,
+            CallbackHandler::Callback callback, void* user);

    // For CAMetalLayer-backed SwapChains, presents the drawable or schedules a
    // FrameScheduledCallback.
@@ -112,8 +113,11 @@ private:
    FrameScheduledCallback frameScheduledCallback = nullptr;
    void* frameScheduledUserData = nullptr;

-    FrameCompletedCallback frameCompletedCallback = nullptr;
-    void* frameCompletedUserData = nullptr;
+    struct {
+        CallbackHandler* handler = nullptr;
+        CallbackHandler::Callback callback = {};
+        void* user = nullptr;
+    } frameCompleted;
 };

 class MetalBufferObject : public HwBufferObject {
@@ -446,9 +450,7 @@ private:
 };

 // MetalFence is used to implement both Fences and Syncs.
-// There's no diamond problem, because HwBase (superclass of HwFence and HwSync) is empty.
-static_assert(std::is_empty_v<HwBase>);
-class MetalFence : public HwFence, public HwSync {
+class MetalFence : public HwFence {
 public:

    // MetalFence is special, as it gets constructed on the Filament thread. We must delay inserting
--- a/filament/backend/src/metal/MetalHandles.mm
+++ b/filament/backend/src/metal/MetalHandles.mm
@@ -194,13 +194,15 @@ void MetalSwapChain::setFrameScheduledCallback(FrameScheduledCallback callback,
    frameScheduledUserData = user;
 }

-void MetalSwapChain::setFrameCompletedCallback(FrameCompletedCallback callback, void* user) {
-    frameCompletedCallback = callback;
-    frameCompletedUserData = user;
+void MetalSwapChain::setFrameCompletedCallback(CallbackHandler* handler,
+        CallbackHandler::Callback callback, void* user) {
+    frameCompleted.handler = handler;
+    frameCompleted.callback = callback;
+    frameCompleted.user = user;
 }

 void MetalSwapChain::present() {
-    if (frameCompletedCallback) {
+    if (frameCompleted.callback) {
        scheduleFrameCompletedCallback();
    }
    if (drawable) {
@@ -244,30 +246,17 @@ void MetalSwapChain::scheduleFrameScheduledCallback() {
 }

 void MetalSwapChain::scheduleFrameCompletedCallback() {
-    if (!frameCompletedCallback) {
+    if (!frameCompleted.callback) {
        return;
    }

-    FrameCompletedCallback callback = frameCompletedCallback;
-    void* userData = frameCompletedUserData;
-    [getPendingCommandBuffer(&context) addCompletedHandler:^(id<MTLCommandBuffer> cb) {
-        struct CallbackData {
-            void* userData;
-            FrameCompletedCallback callback;
-        };
-        CallbackData* data = new CallbackData();
-        data->userData = userData;
-        data->callback = callback;
+    CallbackHandler* handler = frameCompleted.handler;
+    void* user = frameCompleted.user;
+    CallbackHandler::Callback callback = frameCompleted.callback;

-        // Instantiate a BufferDescriptor with a callback for the sole purpose of passing it to
-        // scheduleDestroy. This forces the BufferDescriptor callback (and thus the
-        // FrameCompletedCallback) to be called on the user thread.
-        BufferDescriptor b(nullptr, 0u, [](void* buffer, size_t size, void* user) {
-            CallbackData* data = (CallbackData*) user;
-            data->callback(data->userData);
-            free(data);
-        }, data);
-        context.driver->scheduleDestroy(std::move(b));
+    MetalDriver* driver = context.driver;
+    [getPendingCommandBuffer(&context) addCompletedHandler:^(id<MTLCommandBuffer> cb) {
+        driver->scheduleCallback(handler, user, callback);
    }];
 }

--- a/filament/backend/src/metal/MetalState.h
+++ b/filament/backend/src/metal/MetalState.h
@@ -34,7 +34,7 @@ namespace filament {
 namespace backend {

 inline bool operator==(const SamplerParams& lhs, const SamplerParams& rhs) {
-    return lhs.u == rhs.u;
+    return SamplerParams::EqualTo{}(lhs, rhs);
 }

 //   Rasterization Bindings
--- a/filament/backend/src/noop/NoopDriver.cpp
+++ b/filament/backend/src/noop/NoopDriver.cpp
@@ -58,7 +58,7 @@ void NoopDriver::setFrameScheduledCallback(Handle<HwSwapChain> sch,
 }

 void NoopDriver::setFrameCompletedCallback(Handle<HwSwapChain> sch,
-        FrameCompletedCallback callback, void* user) {
+        CallbackHandler* handler, CallbackHandler::Callback callback, void* user) {

 }

@@ -107,9 +107,6 @@ void NoopDriver::destroyStream(Handle<HwStream> sh) {
 void NoopDriver::destroyTimerQuery(Handle<HwTimerQuery> tqh) {
 }

-void NoopDriver::destroySync(Handle<HwSync> fh) {
-}
-
 Handle<HwStream> NoopDriver::createStreamNative(void* nativeStream) {
    return {};
 }
@@ -135,7 +132,7 @@ void NoopDriver::updateStreams(CommandStream* driver) {
 void NoopDriver::destroyFence(Handle<HwFence> fh) {
 }

-FenceStatus NoopDriver::wait(Handle<HwFence> fh, uint64_t timeout) {
+FenceStatus NoopDriver::getFenceStatus(Handle<HwFence> fh) {
    return FenceStatus::CONDITION_SATISFIED;
 }

@@ -177,6 +174,14 @@ bool NoopDriver::isSRGBSwapChainSupported() {
    return false;
 }

+bool NoopDriver::isStereoSupported() {
+    return false;
+}
+
+bool NoopDriver::isParallelShaderCompileSupported() {
+    return false;
+}
+
 bool NoopDriver::isWorkaroundNeeded(Workaround) {
    return false;
 }
@@ -236,10 +241,6 @@ bool NoopDriver::getTimerQueryValue(Handle<HwTimerQuery> tqh, uint64_t* elapsedT
    return false;
 }

-SyncStatus NoopDriver::getSyncStatus(Handle<HwSync> sh) {
-    return SyncStatus::SIGNALED;
-}
-
 void NoopDriver::setExternalImage(Handle<HwTexture> th, void* image) {
 }

@@ -260,8 +261,8 @@ void NoopDriver::updateSamplerGroup(Handle<HwSamplerGroup> sbh,
    scheduleDestroy(std::move(data));
 }

-void NoopDriver::compilePrograms(CallbackHandler* handler,
-        CallbackHandler::Callback callback, void* user) {
+void NoopDriver::compilePrograms(CompilerPriorityQueue priority,
+        CallbackHandler* handler, CallbackHandler::Callback callback, void* user) {
    if (callback) {
        scheduleCallback(handler, user, callback);
    }
--- a/filament/backend/src/opengl/CallbackManager.cpp
+++ b/filament/backend/src/opengl/CallbackManager.cpp
@@ -0,0 +1,69 @@
+/*
+ * Copyright (C) 2023 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "CallbackManager.h"
+
+#include "DriverBase.h"
+
+namespace filament::backend {
+
+CallbackManager::CallbackManager(DriverBase& driver) noexcept
+    : mDriver(driver), mCallbacks(1) {
+}
+
+CallbackManager::~CallbackManager() noexcept = default;
+
+void CallbackManager::terminate() noexcept {
+    for (auto&& item: mCallbacks) {
+        if (item.func) {
+            mDriver.scheduleCallback(
+                    item.handler, item.user, item.func);
+        }
+    }
+}
+
+CallbackManager::Handle CallbackManager::get() const noexcept {
+    Container::const_iterator const curr = getCurrent();
+    curr->count.fetch_add(1);
+    return curr;
+}
+
+void CallbackManager::put(Handle& curr) noexcept {
+    if (curr->count.fetch_sub(1) == 1) {
+        if (curr->func) {
+            mDriver.scheduleCallback(
+                    curr->handler, curr->user, curr->func);
+            destroySlot(curr);
+        }
+    }
+    curr = {};
+}
+
+void CallbackManager::setCallback(
+        CallbackHandler* handler, CallbackHandler::Callback func, void* user) {
+    assert_invariant(func);
+    Container::iterator const curr = allocateNewSlot();
+    curr->handler = handler;
+    curr->func = func;
+    curr->user = user;
+    if (curr->count == 0) {
+        mDriver.scheduleCallback(
+                curr->handler, curr->user, curr->func);
+        destroySlot(curr);
+    }
+}
+
+} // namespace filament::backend
--- a/filament/backend/src/opengl/CallbackManager.h
+++ b/filament/backend/src/opengl/CallbackManager.h
@@ -0,0 +1,98 @@
+/*
+ * Copyright (C) 2023 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef TNT_FILAMENT_BACKEND_OPENGL_CALLBACKMANAGER_H
+#define TNT_FILAMENT_BACKEND_OPENGL_CALLBACKMANAGER_H
+
+#include <backend/CallbackHandler.h>
+
+#include <utils/Mutex.h>
+
+#include <atomic>
+#include <mutex>
+#include <list>
+
+namespace filament::backend {
+
+class DriverBase;
+class CallbackHandler;
+
+/*
+ * CallbackManager schedules user callbacks once all previous conditions are met.
+ * A "Condition" is created by calling "get" and is met by calling "put". These
+ * are typically called from different threads.
+ * The callback is specified with "setCallback", which atomically creates a new set of
+ * conditions to be met.
+ */
+class CallbackManager {
+    struct Callback {
+        mutable std::atomic_int count{};
+        CallbackHandler* handler = nullptr;
+        CallbackHandler::Callback func = {};
+        void* user = nullptr;
+    };
+
+    using Container = std::list<Callback>;
+
+public:
+    using Handle = Container::const_iterator;
+
+    explicit CallbackManager(DriverBase& driver) noexcept;
+
+    ~CallbackManager() noexcept;
+
+    // Calls all the pending callbacks regardless of remaining conditions to be met. This is to
+    // avoid leaking resources for instance. It also doesn't matter if the conditions are met
+    // because we're shutting down.
+    void terminate() noexcept;
+
+    // Creates a condition and returns a handle for it.
+    Handle get() const noexcept;
+
+    // Announces the specified condition is met. If a callback was specified and all conditions
+    // prior to setting the callback are met, the callback is scheduled.
+    void put(Handle& curr) noexcept;
+
+    // Sets a callback to be called when all previously created (get) conditions are met (put).
+    // If there were no conditions created, or they're all already met, the callback is scheduled
+    // immediately.
+    void setCallback(CallbackHandler* handler, CallbackHandler::Callback func, void* user);
+
+private:
+    Container::const_iterator getCurrent() const noexcept {
+        std::lock_guard const lock(mLock);
+        return --mCallbacks.end();
+    }
+
+    Container::iterator allocateNewSlot() noexcept {
+        std::lock_guard const lock(mLock);
+        auto curr = --mCallbacks.end();
+        mCallbacks.emplace_back();
+        return curr;
+    }
+    void destroySlot(Container::const_iterator curr) noexcept {
+        std::lock_guard const lock(mLock);
+        mCallbacks.erase(curr);
+    }
+
+    DriverBase& mDriver;
+    mutable utils::Mutex mLock;
+    Container mCallbacks;
+};
+
+} // namespace filament::backend
+
+#endif // TNT_FILAMENT_BACKEND_OPENGL_CALLBACKMANAGER_H
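The get/put/setCallback contract described in the CallbackManager header comment can be illustrated with a reduced model. This is a sketch under stated assumptions, not the class from the patch: `MiniCallbackManager` and `demoCallbackOrder` are hypothetical names, callbacks are `std::function<void()>` invoked inline rather than passed to `DriverBase::scheduleCallback()`, and a single mutex guards everything, so a plain `int` replaces the atomic counter.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <mutex>
#include <utility>
#include <vector>

class MiniCallbackManager {
    struct Slot {
        int count = 0;              // conditions created but not yet met
        std::function<void()> func; // callback waiting on those conditions
    };
    using Container = std::list<Slot>;

public:
    using Handle = Container::iterator;

    // Start with one open slot that collects conditions.
    MiniCallbackManager() : mSlots(1) {}

    // Creates a condition on the current (last) slot.
    Handle get() {
        std::lock_guard<std::mutex> lock(mLock);
        Handle curr = std::prev(mSlots.end());
        ++curr->count;
        return curr;
    }

    // Marks a condition as met; runs the slot's callback once all are met.
    void put(Handle curr) {
        std::function<void()> ready;
        {
            std::lock_guard<std::mutex> lock(mLock);
            if (--curr->count == 0 && curr->func) {
                ready = std::move(curr->func);
                mSlots.erase(curr);
            }
        }
        if (ready) {
            ready(); // invoked outside the lock
        }
    }

    // Attaches a callback to the conditions created so far and opens a new
    // slot for later conditions. Runs at once if nothing is pending.
    void setCallback(std::function<void()> func) {
        std::function<void()> ready;
        {
            std::lock_guard<std::mutex> lock(mLock);
            Handle curr = std::prev(mSlots.end());
            mSlots.emplace_back();
            if (curr->count == 0) {
                ready = std::move(func);
                mSlots.erase(curr);
            } else {
                curr->func = std::move(func);
            }
        }
        if (ready) {
            ready();
        }
    }

private:
    std::mutex mLock;
    Container mSlots;
};

// Records the order in which two callbacks fire.
std::vector<int> demoCallbackOrder() {
    std::vector<int> log;
    MiniCallbackManager cm;
    auto a = cm.get();
    auto b = cm.get();
    cm.setCallback([&log] { log.push_back(1); }); // waits for a and b
    cm.setCallback([&log] { log.push_back(2); }); // nothing pending: runs now
    cm.put(a);
    cm.put(b); // last condition met: callback 1 runs
    return log;
}
```

Here the second callback fires first because no condition was created between the two `setCallback` calls, which matches the "scheduled immediately" case in the header comment. The first callback fires only when the last outstanding `put` arrives.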
--- a/filament/backend/src/opengl/OpenGLContext.cpp
+++ b/filament/backend/src/opengl/OpenGLContext.cpp
@@ -49,6 +49,7 @@ bool OpenGLContext::queryOpenGLVersion(GLint* major, GLint* minor) noexcept {
 }

 OpenGLContext::OpenGLContext() noexcept {
+
    state.vao.p = &mDefaultVAO;

    // These queries work with all GL/GLES versions!
@@ -61,264 +62,74 @@ OpenGLContext::OpenGLContext() noexcept {
              "[" << state.version << "], [" << state.shader << "]" << io::endl;

    /*
-     * Figure out GL / GLES version and available features
+     * Figure out GL / GLES version, extensions and capabilities we need to
+     * determine the feature level
     */

    queryOpenGLVersion(&state.major, &state.minor);

-    glGetIntegerv(GL_MAX_RENDERBUFFER_SIZE, &gets.max_renderbuffer_size);
-    glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS, &gets.max_texture_image_units);
-    glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &gets.max_combined_texture_image_units);
+    OpenGLContext::initExtensions(&ext, state.major, state.minor);

-    if (state.major > 2) { // this check works for both GL and GLES, but is intended for GLES
+    OpenGLContext::initProcs(&procs, ext, state.major, state.minor);
+
+    OpenGLContext::initBugs(&bugs, ext, state.major, state.minor,
+            state.vendor, state.renderer, state.version, state.shader);
+
+    glGetIntegerv(GL_MAX_RENDERBUFFER_SIZE,             &gets.max_renderbuffer_size);
+    glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS,           &gets.max_texture_image_units);
+    glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS,  &gets.max_combined_texture_image_units);
+
+    mFeatureLevel = OpenGLContext::resolveFeatureLevel(state.major, state.minor, ext, gets, bugs);
+
+#ifdef BACKEND_OPENGL_VERSION_GLES
+    mShaderModel = ShaderModel::MOBILE;
+#else
+    mShaderModel = ShaderModel::DESKTOP;
+#endif
+
+#ifdef BACKEND_OPENGL_VERSION_GLES
+    if (mFeatureLevel >= FeatureLevel::FEATURE_LEVEL_2) {
+        features.multisample_texture = true;
+    }
+#else
+    if (mFeatureLevel >= FeatureLevel::FEATURE_LEVEL_1) {
+        features.multisample_texture = true;
+    }
+#endif
+
+    if (mFeatureLevel >= FeatureLevel::FEATURE_LEVEL_1) {
 #ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
-        glGetIntegerv(GL_MAX_UNIFORM_BLOCK_SIZE, &gets.max_uniform_block_size);
-        glGetIntegerv(GL_MAX_UNIFORM_BUFFER_BINDINGS, &gets.max_uniform_buffer_bindings);
-        glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &gets.uniform_buffer_offset_alignment);
-        glGetIntegerv(GL_MAX_SAMPLES, &gets.max_samples);
-        glGetIntegerv(GL_MAX_DRAW_BUFFERS, &gets.max_draw_buffers);
+        glGetIntegerv(GL_MAX_UNIFORM_BLOCK_SIZE,
+                &gets.max_uniform_block_size);
+        glGetIntegerv(GL_MAX_UNIFORM_BUFFER_BINDINGS,
+                &gets.max_uniform_buffer_bindings);
+        glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT,
+                &gets.uniform_buffer_offset_alignment);
+        glGetIntegerv(GL_MAX_SAMPLES,
+                &gets.max_samples);
+        glGetIntegerv(GL_MAX_DRAW_BUFFERS,
+                &gets.max_draw_buffers);
        glGetIntegerv(GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS,
                &gets.max_transform_feedback_separate_attribs);
+#ifdef GL_EXT_texture_filter_anisotropic
+        if (ext.EXT_texture_filter_anisotropic) {
+            glGetFloatv(GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT, &gets.max_anisotropy);
+        }
 #endif
-    } else {
+#endif
+    }
+#ifdef BACKEND_OPENGL_VERSION_GLES
+    else {
        gets.max_uniform_block_size = 0;
        gets.max_uniform_buffer_bindings = 0;
        gets.uniform_buffer_offset_alignment = 0;
        gets.max_samples = 1;
        gets.max_draw_buffers = 1;
        gets.max_transform_feedback_separate_attribs = 0;
-    }
-
-    constexpr auto const caps3 = FEATURE_LEVEL_CAPS[+FeatureLevel::FEATURE_LEVEL_3];
-    constexpr GLint MAX_VERTEX_SAMPLER_COUNT = caps3.MAX_VERTEX_SAMPLER_COUNT;
-    constexpr GLint MAX_FRAGMENT_SAMPLER_COUNT = caps3.MAX_FRAGMENT_SAMPLER_COUNT;
-
-    // default procs that can be overridden based on runtime version
-#ifdef BACKEND_OPENGL_LEVEL_GLES30
-    procs.genVertexArrays = glGenVertexArrays;
-    procs.bindVertexArray = glBindVertexArray;
-    procs.deleteVertexArrays = glDeleteVertexArrays;
-
-    // these are core in GL and GLES 3.x
-    procs.genQueries = glGenQueries;
-    procs.deleteQueries = glDeleteQueries;
-    procs.beginQuery = glBeginQuery;
-    procs.endQuery = glEndQuery;
-    procs.getQueryObjectuiv = glGetQueryObjectuiv;
-#   ifdef BACKEND_OPENGL_VERSION_GL
-        procs.getQueryObjectui64v = glGetQueryObjectui64v; // only core in GL
-#   elif defined(GL_EXT_disjoint_timer_query)
-        procs.getQueryObjectui64v = glGetQueryObjectui64vEXT;
-#   endif // BACKEND_OPENGL_VERSION_GL
-
-     // core in ES 3.0 and GL 4.3
-    procs.invalidateFramebuffer = glInvalidateFramebuffer;
-#endif // BACKEND_OPENGL_LEVEL_GLES30
-
-    // no-op if not supported
-    procs.maxShaderCompilerThreadsKHR = +[](GLuint) {};
-
-#ifdef BACKEND_OPENGL_VERSION_GLES
-    initExtensionsGLES();
-    if (state.major == 3) {
-        // Runtime OpenGL version is ES 3.x
-        assert_invariant(gets.max_texture_image_units >= 16);
-        assert_invariant(gets.max_combined_texture_image_units >= 32);
-        if (state.minor >= 1) {
-            features.multisample_texture = true;
-            // figure out our feature level
-            if (ext.EXT_texture_cube_map_array) {
-                mFeatureLevel = FeatureLevel::FEATURE_LEVEL_2;
-                if (gets.max_texture_image_units >= MAX_FRAGMENT_SAMPLER_COUNT &&
-                    gets.max_combined_texture_image_units >=
-                            (MAX_FRAGMENT_SAMPLER_COUNT + MAX_VERTEX_SAMPLER_COUNT)) {
-                    mFeatureLevel = FeatureLevel::FEATURE_LEVEL_3;
-                }
-            }
-        }
-    }
-#ifndef IOS // IOS is guaranteed to have ES3.x
-    else if (UTILS_UNLIKELY(state.major == 2)) {
-        // Runtime OpenGL version is ES 2.x
-
-#if defined(BACKEND_OPENGL_LEVEL_GLES30)
-        // mandatory extensions (all supported by Mali-400 and Adreno 304)
-        assert_invariant(ext.OES_depth_texture);
-        assert_invariant(ext.OES_depth24);
-        assert_invariant(ext.OES_packed_depth_stencil);
-        assert_invariant(ext.OES_rgb8_rgba8);
-        assert_invariant(ext.OES_standard_derivatives);
-        assert_invariant(ext.OES_texture_npot);
-#endif
-
-        if (UTILS_LIKELY(ext.OES_vertex_array_object)) {
-            procs.genVertexArrays = glGenVertexArraysOES;
-            procs.bindVertexArray = glBindVertexArrayOES;
-            procs.deleteVertexArrays = glDeleteVertexArraysOES;
-        } else {
-            // if we don't have OES_vertex_array_object, just don't do anything with real VAOs,
-            // we'll just rebind everything each time. Most Mali-400 support this extension, but
-            // a few don't.
-            procs.genVertexArrays = +[](GLsizei, GLuint*) {};
-            procs.bindVertexArray = +[](GLuint) {};
-            procs.deleteVertexArrays = +[](GLsizei, GLuint const*) {};
-            // we activate this workaround path, which does the reset of array buffer
-            bugs.vao_doesnt_store_element_array_buffer_binding = true;
-        }
-
-        // EXT_disjoint_timer_query is optional -- pointers will be null if not available
-        procs.genQueries = glGenQueriesEXT;
-        procs.deleteQueries = glDeleteQueriesEXT;
-        procs.beginQuery = glBeginQueryEXT;
-        procs.endQuery = glEndQueryEXT;
-        procs.getQueryObjectuiv = glGetQueryObjectuivEXT;
-        procs.getQueryObjectui64v = glGetQueryObjectui64vEXT;
-
-        procs.invalidateFramebuffer = glDiscardFramebufferEXT;
-
-        procs.maxShaderCompilerThreadsKHR = glMaxShaderCompilerThreadsKHR;
-
-        mFeatureLevel = FeatureLevel::FEATURE_LEVEL_0;
-    }
-#endif // IOS
-#else
-    initExtensionsGL();
-    if (state.major == 4) {
-        assert_invariant(state.minor >= 1);
-        mShaderModel = ShaderModel::DESKTOP;
-        if (state.minor >= 3) {
-            // cubemap arrays are available as of OpenGL 4.0
-            mFeatureLevel = FeatureLevel::FEATURE_LEVEL_2;
-            // figure out our feature level
-            if (gets.max_texture_image_units >= MAX_FRAGMENT_SAMPLER_COUNT &&
-                gets.max_combined_texture_image_units >=
-                        (MAX_FRAGMENT_SAMPLER_COUNT + MAX_VERTEX_SAMPLER_COUNT)) {
-                mFeatureLevel = FeatureLevel::FEATURE_LEVEL_3;
-            }
-        }
-        features.multisample_texture = true;
-    }
-    // feedback loops are allowed on GL desktop as long as writes are disabled
-    bugs.allow_read_only_ancillary_feedback_loop = true;
-    assert_invariant(gets.max_texture_image_units >= 16);
-    assert_invariant(gets.max_combined_texture_image_units >= 32);
-
-    procs.maxShaderCompilerThreadsKHR = glMaxShaderCompilerThreadsARB;
-#endif
-
-#ifdef GL_EXT_texture_filter_anisotropic
-    if (ext.EXT_texture_filter_anisotropic) {
-        glGetFloatv(GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT, &gets.max_anisotropy);
+        gets.max_anisotropy = 1;
    }
 #endif

-    /*
-     * Figure out which driver bugs we need to workaround
-     */
-
-    const bool isAngle = strstr(state.renderer, "ANGLE");
-    if (!isAngle) {
-        if (strstr(state.renderer, "Adreno")) {
-            // Qualcomm GPU
-            bugs.invalidate_end_only_if_invalidate_start = true;
-
-            // On Adreno (As of 3/20) timer query seem to return the CPU time, not the GPU time.
-            bugs.dont_use_timer_query = true;
-
-            // Blits to texture arrays are failing
-            //   This bug continues to reproduce, though at times we've seen it appear to "go away".
-            //   The standalone sample app that was written to show this problem still reproduces.
-            //   The working hypothesis is that some other state affects this behavior.
-            bugs.disable_blit_into_texture_array = true;
-
-            // early exit condition is flattened in EASU code
-            bugs.split_easu = true;
-
-            // initialize the non-used uniform array for Adreno drivers.
-            bugs.enable_initialize_non_used_uniform_array = true;
-
-            int maj, min, driverMajor, driverMinor;
-            int const c = sscanf(state.version, "OpenGL ES %d.%d V@%d.%d", // NOLINT(cert-err34-c)
-                    &maj, &min, &driverMajor, &driverMinor);
-            if (c == 4) {
-                // Workarounds based on version here.
-                // notes:
-                //  bugs.invalidate_end_only_if_invalidate_start
-                //  - appeared at least in
-                //      "OpenGL ES 3.2 V@0490.0 (GIT@85da404, I46ff5fc46f, 1606794520) (Date:11/30/20)"
-                //  - wasn't present in
-                //      "OpenGL ES 3.2 V@0490.0 (GIT@0905e9f, Ia11ce2d146, 1599072951) (Date:09/02/20)"
-                //  - has been confirmed fixed in V@570.1 by Qualcomm
-                if (driverMajor < 490 || driverMajor > 570 ||
-                    (driverMajor == 570 && driverMinor >= 1)) {
-                    bugs.invalidate_end_only_if_invalidate_start = false;
-                }
-            }
-
-            // qualcomm seems to have no problem with this (which is good for us)
-            bugs.allow_read_only_ancillary_feedback_loop = true;
-        } else if (strstr(state.renderer, "Mali")) {
-            // ARM GPU
-            bugs.vao_doesnt_store_element_array_buffer_binding = true;
-            if (strstr(state.renderer, "Mali-T")) {
-                bugs.disable_glFlush = true;
-                bugs.disable_shared_context_draws = true;
-                bugs.texture_external_needs_rebind = true;
-                // We have not verified that timer queries work on Mali-T, so we disable to be safe.
-                bugs.dont_use_timer_query = true;
-            }
-            if (strstr(state.renderer, "Mali-G")) {
-                // assume we don't have working timer queries
-                bugs.dont_use_timer_query = true;
-
-                int maj, min, driverVersion, driverRevision, driverPatch;
-                int const c = sscanf(state.version, "OpenGL ES %d.%d v%d.r%dp%d", // NOLINT(cert-err34-c)
-                        &maj, &min, &driverVersion, &driverRevision, &driverPatch);
-                if (c == 5) {
-                    // Workarounds based on version here.
-                    // notes:
-                    //  bugs.dont_use_timer_query : on some Mali-Gxx drivers timer query seems
-                    //  to cause memory corruptions in some cases on some devices (see b/233754398).
-                    //  - appeared at least in
-                    //      "OpenGL ES 3.2 v1.r26p0-01eac0"
-                    //  - wasn't present in
-                    //      "OpenGL ES 3.2 v1.r32p1-00pxl1"
-                    if (driverVersion >= 2 || (driverVersion == 1 && driverRevision >= 32)) {
-                        bugs.dont_use_timer_query = false;
-                    }
-                }
-            }
-            // Mali seems to have no problem with this (which is good for us)
-            bugs.allow_read_only_ancillary_feedback_loop = true;
-        } else if (strstr(state.renderer, "Intel")) {
-            // Intel GPU
-            bugs.vao_doesnt_store_element_array_buffer_binding = true;
-        } else if (strstr(state.renderer, "PowerVR")) {
-            // PowerVR GPU
-        } else if (strstr(state.renderer, "Apple")) {
-            // Apple GPU
-        } else if (strstr(state.renderer, "Tegra") ||
-                   strstr(state.renderer, "GeForce") ||
-                   strstr(state.renderer, "NV")) {
-            // NVIDIA GPU
-        } else if (strstr(state.renderer, "Vivante")) {
-            // Vivante GPU
-        } else if (strstr(state.renderer, "AMD") ||
-                   strstr(state.renderer, "ATI")) {
-            // AMD/ATI GPU
-        } else if (strstr(state.renderer, "Mozilla")) {
-            bugs.disable_invalidate_framebuffer = true;
-        }
-    } else {
-        // When running under ANGLE, it's a different set of workaround that we need.
-        if (strstr(state.renderer, "Adreno")) {
-            // Qualcomm GPU
-            // early exit condition is flattened in EASU code
-            // (that should be regardless of ANGLE, but we should double-check)
-            bugs.split_easu = true;
-        }
-        // TODO: see if we could use `bugs.allow_read_only_ancillary_feedback_loop = true`
-    }

    slog.v << "Feature level: " << +mFeatureLevel << '\n';
    slog.v << "Active workarounds: " << '\n';
@@ -344,14 +155,14 @@ OpenGLContext::OpenGLContext() noexcept {
 #endif

 #ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
-    assert_invariant(state.major <= 2 || gets.max_draw_buffers >= 4); // minspec
+    assert_invariant(mFeatureLevel == FeatureLevel::FEATURE_LEVEL_0 || gets.max_draw_buffers >= 4); // minspec
 #endif

    setDefaultState();

 #ifdef GL_EXT_texture_filter_anisotropic
 #ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
-    if (state.major > 2 && ext.EXT_texture_filter_anisotropic) {
+    if (mFeatureLevel >= FeatureLevel::FEATURE_LEVEL_1 && ext.EXT_texture_filter_anisotropic) {
        // make sure we don't have any error flag
        while (glGetError() != GL_NO_ERROR) { }

@@ -451,11 +262,293 @@ void OpenGLContext::setDefaultState() noexcept {
        glClipControlEXT(GL_LOWER_LEFT_EXT, GL_ZERO_TO_ONE_EXT);
 #endif
    }
+
+    if (ext.EXT_clip_cull_distance) {
+        glEnable(GL_CLIP_DISTANCE0);
+    }
+}
+
+
+void OpenGLContext::initProcs(Procs* procs,
+        Extensions const& ext, GLint major, GLint) noexcept {
+    (void)ext;
+    (void)major;
+
+    // default procs that can be overridden based on runtime version
+#ifdef BACKEND_OPENGL_LEVEL_GLES30
+    procs->genVertexArrays = glGenVertexArrays;
+    procs->bindVertexArray = glBindVertexArray;
+    procs->deleteVertexArrays = glDeleteVertexArrays;
+
+    // these are core in GL and GLES 3.x
+    procs->genQueries = glGenQueries;
+    procs->deleteQueries = glDeleteQueries;
+    procs->beginQuery = glBeginQuery;
+    procs->endQuery = glEndQuery;
+    procs->getQueryObjectuiv = glGetQueryObjectuiv;
+#   ifdef BACKEND_OPENGL_VERSION_GL
+    procs->getQueryObjectui64v = glGetQueryObjectui64v; // only core in GL
+#   elif defined(GL_EXT_disjoint_timer_query)
+    procs->getQueryObjectui64v = glGetQueryObjectui64vEXT;
+#   endif // BACKEND_OPENGL_VERSION_GL
+
+    // core in ES 3.0 and GL 4.3
+    procs->invalidateFramebuffer = glInvalidateFramebuffer;
+#endif // BACKEND_OPENGL_LEVEL_GLES30
+
+    // no-op if not supported
+    procs->maxShaderCompilerThreadsKHR = +[](GLuint) {};
+
+#ifdef BACKEND_OPENGL_VERSION_GLES
+#   ifndef IOS // IOS is guaranteed to have ES3.x
+    if (UTILS_UNLIKELY(major == 2)) {
+        // Runtime OpenGL version is ES 2.x
+        if (UTILS_LIKELY(ext.OES_vertex_array_object)) {
+            procs->genVertexArrays = glGenVertexArraysOES;
+            procs->bindVertexArray = glBindVertexArrayOES;
+            procs->deleteVertexArrays = glDeleteVertexArraysOES;
+        } else {
+            // if we don't have OES_vertex_array_object, just don't do anything with real VAOs,
+            // we'll just rebind everything each time. Most Mali-400 support this extension, but
+            // a few don't.
+            procs->genVertexArrays = +[](GLsizei, GLuint*) {};
+            procs->bindVertexArray = +[](GLuint) {};
+            procs->deleteVertexArrays = +[](GLsizei, GLuint const*) {};
+        }
+
+        // EXT_disjoint_timer_query is optional -- pointers will be null if not available
+        procs->genQueries = glGenQueriesEXT;
+        procs->deleteQueries = glDeleteQueriesEXT;
+        procs->beginQuery = glBeginQueryEXT;
+        procs->endQuery = glEndQueryEXT;
+        procs->getQueryObjectuiv = glGetQueryObjectuivEXT;
+        procs->getQueryObjectui64v = glGetQueryObjectui64vEXT;
+
+        procs->invalidateFramebuffer = glDiscardFramebufferEXT;
+
+        procs->maxShaderCompilerThreadsKHR = glMaxShaderCompilerThreadsKHR;
+    }
+#   endif // IOS
+#else
+    procs->maxShaderCompilerThreadsKHR = glMaxShaderCompilerThreadsARB;
+#endif
+}
+
+void OpenGLContext::initBugs(Bugs* bugs, Extensions const& exts,
+        GLint major, GLint minor,
+        char const* vendor,
+        char const* renderer,
+        char const* version,
+        char const* shader) {
+
+    (void)major;
+    (void)minor;
+    (void)vendor;
+    (void)renderer;
+    (void)version;
+    (void)shader;
+
+    const bool isAngle = strstr(renderer, "ANGLE");
+    if (!isAngle) {
+        if (strstr(renderer, "Adreno")) {
+            // Qualcomm GPU
+            bugs->invalidate_end_only_if_invalidate_start = true;
+
+            // On Adreno (as of 3/20) timer queries seem to return the CPU time, not the GPU time.
+            bugs->dont_use_timer_query = true;
+
+            // Blits to texture arrays are failing
+            //   This bug continues to reproduce, though at times we've seen it appear to "go away".
+            //   The standalone sample app that was written to show this problem still reproduces.
+            //   The working hypothesis is that some other state affects this behavior.
+            bugs->disable_blit_into_texture_array = true;
+
+            // early exit condition is flattened in EASU code
+            bugs->split_easu = true;
+
+            // initialize the non-used uniform array for Adreno drivers.
+            bugs->enable_initialize_non_used_uniform_array = true;
+
+            int maj, min, driverMajor, driverMinor;
+            int const c = sscanf(version, "OpenGL ES %d.%d V@%d.%d", // NOLINT(cert-err34-c)
+                    &maj, &min, &driverMajor, &driverMinor);
+            if (c == 4) {
+                // Workarounds based on version here.
+                // Notes:
+                //  bugs.invalidate_end_only_if_invalidate_start
+                //  - appeared at least in
+                //      "OpenGL ES 3.2 V@0490.0 (GIT@85da404, I46ff5fc46f, 1606794520) (Date:11/30/20)"
+                //  - wasn't present in
+                //      "OpenGL ES 3.2 V@0490.0 (GIT@0905e9f, Ia11ce2d146, 1599072951) (Date:09/02/20)"
+                //  - has been confirmed fixed in V@570.1 by Qualcomm
+                if (driverMajor < 490 || driverMajor > 570 ||
+                    (driverMajor == 570 && driverMinor >= 1)) {
+                    bugs->invalidate_end_only_if_invalidate_start = false;
+                }
+            }
+
+            // qualcomm seems to have no problem with this (which is good for us)
+            bugs->allow_read_only_ancillary_feedback_loop = true;
+
+            // Older Adreno devices that support ES3.0 only tend to be extremely buggy, so we
+            // fall back to ES2.0.
+            if (major == 3 && minor == 0) {
+                bugs->force_feature_level0 = true;
+            }
+        } else if (strstr(renderer, "Mali")) {
+            // ARM GPU
+            bugs->vao_doesnt_store_element_array_buffer_binding = true;
+            if (strstr(renderer, "Mali-T")) {
+                bugs->disable_glFlush = true;
+                bugs->disable_shared_context_draws = true;
+                bugs->texture_external_needs_rebind = true;
+                // We have not verified that timer queries work on Mali-T, so we disable to be safe.
+                bugs->dont_use_timer_query = true;
+            }
+            if (strstr(renderer, "Mali-G")) {
+                // We have run into several problems with timer queries on Mali-Gxx:
+                // - timer queries seem to cause memory corruptions in some cases on some devices
+                //   (see b/233754398)
+                //          - appeared at least in: "OpenGL ES 3.2 v1.r26p0-01eac0"
+                //          - wasn't present in: "OpenGL ES 3.2 v1.r32p1-00pxl1"
+                // - timer queries sometimes crash with an NPE (see b/273759031)
+                bugs->dont_use_timer_query = true;
+            }
+            // Mali seems to have no problem with this (which is good for us)
+            bugs->allow_read_only_ancillary_feedback_loop = true;
+        } else if (strstr(renderer, "Intel")) {
+            // Intel GPU
+            bugs->vao_doesnt_store_element_array_buffer_binding = true;
+        } else if (strstr(renderer, "PowerVR")) {
+            // PowerVR GPU
+            // On PowerVR (Rogue GE8320) glFlush doesn't seem to do anything, in particular,
+            // it doesn't kick the GPU earlier, so don't issue these calls as they seem to slow
+            // things down.
+            bugs->disable_glFlush = true;
+            // On PowerVR (Rogue GE8320) using gl_InstanceID too early in the shader doesn't work.
+            bugs->powervr_shader_workarounds = true;
+            // On PowerVR (Rogue GE8320) destroying a fbo after glBlitFramebuffer is effectively
+            // equivalent to glFinish.
+            bugs->delay_fbo_destruction = true;
+            // PowerVR seems to have no problem with this (which is good for us)
+            bugs->allow_read_only_ancillary_feedback_loop = true;
+            // PowerVR has a shader compiler thread pinned on the last core
+            bugs->disable_thread_affinity = true;
+        } else if (strstr(renderer, "Apple")) {
+            // Apple GPU
+        } else if (strstr(renderer, "Tegra") ||
+                   strstr(renderer, "GeForce") ||
+                   strstr(renderer, "NV")) {
+            // NVIDIA GPU
+        } else if (strstr(renderer, "Vivante")) {
+            // Vivante GPU
+        } else if (strstr(renderer, "AMD") ||
+                   strstr(renderer, "ATI")) {
+            // AMD/ATI GPU
+        } else if (strstr(renderer, "Mozilla")) {
+            bugs->disable_invalidate_framebuffer = true;
+        }
+    } else {
+        // When running under ANGLE, we need a different set of workarounds.
+        if (strstr(renderer, "Adreno")) {
+            // Qualcomm GPU
+            // early exit condition is flattened in EASU code
+            // (that should be regardless of ANGLE, but we should double-check)
+            bugs->split_easu = true;
+        }
+        // TODO: see if we could use `bugs.allow_read_only_ancillary_feedback_loop = true`
+    }
+
+#ifdef BACKEND_OPENGL_VERSION_GLES
+#   ifndef IOS // IOS is guaranteed to have ES3.x
+    if (UTILS_UNLIKELY(major == 2)) {
+        if (UTILS_UNLIKELY(!exts.OES_vertex_array_object)) {
+            // we activate this workaround path, which does the reset of array buffer
+            bugs->vao_doesnt_store_element_array_buffer_binding = true;
+        }
+    }
+#   endif // IOS
+#else
+    // feedback loops are allowed on GL desktop as long as writes are disabled
+    bugs->allow_read_only_ancillary_feedback_loop = true;
+#endif
+}
+
+FeatureLevel OpenGLContext::resolveFeatureLevel(GLint major, GLint minor,
+        Extensions const& exts,
+        Gets const& gets,
+        Bugs const& bugs) noexcept {
+
+    constexpr auto const caps3 = FEATURE_LEVEL_CAPS[+FeatureLevel::FEATURE_LEVEL_3];
+    constexpr GLint MAX_VERTEX_SAMPLER_COUNT = caps3.MAX_VERTEX_SAMPLER_COUNT;
+    constexpr GLint MAX_FRAGMENT_SAMPLER_COUNT = caps3.MAX_FRAGMENT_SAMPLER_COUNT;
+
+    (void)exts;
+    (void)gets;
+    (void)bugs;
+
+    FeatureLevel featureLevel = FeatureLevel::FEATURE_LEVEL_1;
+
+#ifdef BACKEND_OPENGL_VERSION_GLES
+    if (major == 3) {
+        // Runtime OpenGL version is ES 3.x
+        assert_invariant(gets.max_texture_image_units >= 16);
+        assert_invariant(gets.max_combined_texture_image_units >= 32);
+        if (minor >= 1) {
+            // figure out our feature level
+            if (exts.EXT_texture_cube_map_array) {
+                featureLevel = FeatureLevel::FEATURE_LEVEL_2;
+                if (gets.max_texture_image_units >= MAX_FRAGMENT_SAMPLER_COUNT &&
+                    gets.max_combined_texture_image_units >=
+                    (MAX_FRAGMENT_SAMPLER_COUNT + MAX_VERTEX_SAMPLER_COUNT)) {
+                    featureLevel = FeatureLevel::FEATURE_LEVEL_3;
+                }
+            }
+        }
+    }
+#   ifndef IOS // IOS is guaranteed to have ES3.x
+    else if (UTILS_UNLIKELY(major == 2)) {
+        // Runtime OpenGL version is ES 2.x
+#       if defined(BACKEND_OPENGL_LEVEL_GLES30)
+        // mandatory extensions (all supported by Mali-400 and Adreno 304)
+        assert_invariant(exts.OES_depth_texture);
+        assert_invariant(exts.OES_depth24);
+        assert_invariant(exts.OES_packed_depth_stencil);
+        assert_invariant(exts.OES_rgb8_rgba8);
+        assert_invariant(exts.OES_standard_derivatives);
+        assert_invariant(exts.OES_texture_npot);
+#       endif
+        featureLevel = FeatureLevel::FEATURE_LEVEL_0;
+    }
+#   endif // IOS
+#else
+    assert_invariant(gets.max_texture_image_units >= 16);
+    assert_invariant(gets.max_combined_texture_image_units >= 32);
+    if (major == 4) {
+        assert_invariant(minor >= 1);
+        if (minor >= 3) {
+            // cubemap arrays are available as of OpenGL 4.0
+            featureLevel = FeatureLevel::FEATURE_LEVEL_2;
+            // figure out our feature level
+            if (gets.max_texture_image_units >= MAX_FRAGMENT_SAMPLER_COUNT &&
+                gets.max_combined_texture_image_units >=
+                (MAX_FRAGMENT_SAMPLER_COUNT + MAX_VERTEX_SAMPLER_COUNT)) {
+                featureLevel = FeatureLevel::FEATURE_LEVEL_3;
+            }
+        }
+    }
+#endif
+
+    if (bugs.force_feature_level0) {
+        featureLevel = FeatureLevel::FEATURE_LEVEL_0;
+    }
+
+    return featureLevel;
 }

 #ifdef BACKEND_OPENGL_VERSION_GLES

-void OpenGLContext::initExtensionsGLES() noexcept {
+void OpenGLContext::initExtensionsGLES(Extensions* ext, GLint major, GLint minor) noexcept {
    const char * const extensions = (const char*)glGetString(GL_EXTENSIONS);
    GLUtils::unordered_string_set const exts = GLUtils::split(extensions);
    if constexpr (DEBUG_PRINT_EXTENSIONS) {
@@ -467,50 +560,50 @@ void OpenGLContext::initExtensionsGLES() noexcept {

    // figure out and initialize the extensions we need
    using namespace std::literals;
-    ext.APPLE_color_buffer_packed_float = exts.has("GL_APPLE_color_buffer_packed_float"sv);
-    ext.EXT_clip_control = exts.has("GL_EXT_clip_control"sv);
-    ext.EXT_color_buffer_float = exts.has("GL_EXT_color_buffer_float"sv);
-    ext.EXT_color_buffer_half_float = exts.has("GL_EXT_color_buffer_half_float"sv);
-    ext.EXT_debug_marker = exts.has("GL_EXT_debug_marker"sv);
-    ext.EXT_discard_framebuffer = exts.has("GL_EXT_discard_framebuffer"sv);
-    ext.EXT_disjoint_timer_query = exts.has("GL_EXT_disjoint_timer_query"sv);
-    ext.EXT_multisampled_render_to_texture = exts.has("GL_EXT_multisampled_render_to_texture"sv);
-    ext.EXT_multisampled_render_to_texture2 = exts.has("GL_EXT_multisampled_render_to_texture2"sv);
-    ext.EXT_shader_framebuffer_fetch = exts.has("GL_EXT_shader_framebuffer_fetch"sv);
+    ext->APPLE_color_buffer_packed_float = exts.has("GL_APPLE_color_buffer_packed_float"sv);
+    ext->EXT_clip_control = exts.has("GL_EXT_clip_control"sv);
+    ext->EXT_clip_cull_distance = exts.has("GL_EXT_clip_cull_distance"sv);
+    ext->EXT_color_buffer_float = exts.has("GL_EXT_color_buffer_float"sv);
+    ext->EXT_color_buffer_half_float = exts.has("GL_EXT_color_buffer_half_float"sv);
+    ext->EXT_debug_marker = exts.has("GL_EXT_debug_marker"sv);
+    ext->EXT_discard_framebuffer = exts.has("GL_EXT_discard_framebuffer"sv);
+    ext->EXT_disjoint_timer_query = exts.has("GL_EXT_disjoint_timer_query"sv);
+    ext->EXT_multisampled_render_to_texture = exts.has("GL_EXT_multisampled_render_to_texture"sv);
+    ext->EXT_multisampled_render_to_texture2 = exts.has("GL_EXT_multisampled_render_to_texture2"sv);
+    ext->EXT_shader_framebuffer_fetch = exts.has("GL_EXT_shader_framebuffer_fetch"sv);
 #if !defined(__EMSCRIPTEN__)
-    ext.EXT_texture_compression_etc2 = true;
+    ext->EXT_texture_compression_etc2 = true;
 #endif
-    ext.EXT_texture_compression_s3tc = exts.has("GL_EXT_texture_compression_s3tc"sv);
-    ext.EXT_texture_compression_s3tc_srgb = exts.has("GL_EXT_texture_compression_s3tc_srgb"sv);
-    ext.EXT_texture_compression_rgtc = exts.has("GL_EXT_texture_compression_rgtc"sv);
-    ext.EXT_texture_compression_bptc = exts.has("GL_EXT_texture_compression_bptc"sv);
-    ext.EXT_texture_cube_map_array = exts.has("GL_EXT_texture_cube_map_array"sv) || exts.has("GL_OES_texture_cube_map_array"sv);
-    ext.GOOGLE_cpp_style_line_directive = exts.has("GL_GOOGLE_cpp_style_line_directive"sv);
-    ext.KHR_debug = exts.has("GL_KHR_debug"sv);
-    ext.KHR_parallel_shader_compile = exts.has("GL_KHR_parallel_shader_compile"sv);
-    ext.KHR_texture_compression_astc_hdr = exts.has("GL_KHR_texture_compression_astc_hdr"sv);
-    ext.KHR_texture_compression_astc_ldr = exts.has("GL_KHR_texture_compression_astc_ldr"sv);
-    ext.OES_depth_texture = exts.has("GL_OES_depth_texture"sv);
-    ext.OES_depth24 = exts.has("GL_OES_depth24"sv);
-    ext.OES_packed_depth_stencil = exts.has("GL_OES_packed_depth_stencil"sv);
-    ext.OES_EGL_image_external_essl3 = exts.has("GL_OES_EGL_image_external_essl3"sv);
-    ext.OES_rgb8_rgba8 = exts.has("GL_OES_rgb8_rgba8"sv);
-    ext.OES_standard_derivatives = exts.has("GL_OES_standard_derivatives"sv);
-    ext.OES_texture_npot = exts.has("GL_OES_texture_npot"sv);
-    ext.OES_vertex_array_object = exts.has("GL_OES_vertex_array_object"sv);
-    ext.WEBGL_compressed_texture_etc = exts.has("WEBGL_compressed_texture_etc"sv);
-    ext.WEBGL_compressed_texture_s3tc = exts.has("WEBGL_compressed_texture_s3tc"sv);
-    ext.WEBGL_compressed_texture_s3tc_srgb = exts.has("WEBGL_compressed_texture_s3tc_srgb"sv);
+    ext->EXT_texture_compression_s3tc = exts.has("GL_EXT_texture_compression_s3tc"sv);
+    ext->EXT_texture_compression_s3tc_srgb = exts.has("GL_EXT_texture_compression_s3tc_srgb"sv);
+    ext->EXT_texture_compression_rgtc = exts.has("GL_EXT_texture_compression_rgtc"sv);
+    ext->EXT_texture_compression_bptc = exts.has("GL_EXT_texture_compression_bptc"sv);
+    ext->EXT_texture_cube_map_array = exts.has("GL_EXT_texture_cube_map_array"sv) || exts.has("GL_OES_texture_cube_map_array"sv);
+    ext->GOOGLE_cpp_style_line_directive = exts.has("GL_GOOGLE_cpp_style_line_directive"sv);
+    ext->KHR_debug = exts.has("GL_KHR_debug"sv);
+    ext->KHR_parallel_shader_compile = exts.has("GL_KHR_parallel_shader_compile"sv);
+    ext->KHR_texture_compression_astc_hdr = exts.has("GL_KHR_texture_compression_astc_hdr"sv);
+    ext->KHR_texture_compression_astc_ldr = exts.has("GL_KHR_texture_compression_astc_ldr"sv);
+    ext->OES_depth_texture = exts.has("GL_OES_depth_texture"sv);
+    ext->OES_depth24 = exts.has("GL_OES_depth24"sv);
+    ext->OES_packed_depth_stencil = exts.has("GL_OES_packed_depth_stencil"sv);
+    ext->OES_EGL_image_external_essl3 = exts.has("GL_OES_EGL_image_external_essl3"sv);
+    ext->OES_rgb8_rgba8 = exts.has("GL_OES_rgb8_rgba8"sv);
+    ext->OES_standard_derivatives = exts.has("GL_OES_standard_derivatives"sv);
+    ext->OES_texture_npot = exts.has("GL_OES_texture_npot"sv);
+    ext->OES_vertex_array_object = exts.has("GL_OES_vertex_array_object"sv);
+    ext->WEBGL_compressed_texture_etc = exts.has("WEBGL_compressed_texture_etc"sv);
+    ext->WEBGL_compressed_texture_s3tc = exts.has("WEBGL_compressed_texture_s3tc"sv);
+    ext->WEBGL_compressed_texture_s3tc_srgb = exts.has("WEBGL_compressed_texture_s3tc_srgb"sv);

    // ES 3.2 implies EXT_color_buffer_float
-    if (state.major > 3 || (state.major == 3 && state.minor >= 2)) {
-        ext.EXT_color_buffer_float = true;
+    if (major > 3 || (major == 3 && minor >= 2)) {
+        ext->EXT_color_buffer_float = true;
    }
-
    // ES 3.x implies EXT_discard_framebuffer and OES_vertex_array_object
-    if (state.major >= 3) {
-        ext.EXT_color_buffer_float = true;
-        ext.OES_vertex_array_object = true;
+    if (major >= 3) {
+        ext->EXT_discard_framebuffer = true;
+        ext->OES_vertex_array_object = true;
    }
 }

@@ -518,7 +611,7 @@ void OpenGLContext::initExtensionsGLES() noexcept {

 #ifdef BACKEND_OPENGL_VERSION_GL

-void OpenGLContext::initExtensionsGL() noexcept {
+void OpenGLContext::initExtensionsGL(Extensions* ext, GLint major, GLint minor) noexcept {
    GLUtils::unordered_string_set exts;
    GLint n = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &n);
@@ -533,54 +626,52 @@ void OpenGLContext::initExtensionsGL() noexcept {
    }

    using namespace std::literals;
-    ext.APPLE_color_buffer_packed_float = true;  // Assumes core profile.
-    ext.ARB_shading_language_packing = exts.has("GL_ARB_shading_language_packing"sv);
-    ext.EXT_color_buffer_float = true;  // Assumes core profile.
-    ext.EXT_color_buffer_half_float = true;  // Assumes core profile.
-    ext.EXT_debug_marker = exts.has("GL_EXT_debug_marker"sv);
-    ext.EXT_discard_framebuffer = false;
-    ext.EXT_disjoint_timer_query = true;
-    ext.EXT_multisampled_render_to_texture = false;
-    ext.EXT_multisampled_render_to_texture2 = false;
-    ext.EXT_shader_framebuffer_fetch = exts.has("GL_EXT_shader_framebuffer_fetch"sv);
-    ext.EXT_texture_compression_bptc = exts.has("GL_EXT_texture_compression_bptc"sv);
-    ext.EXT_texture_compression_etc2 = exts.has("GL_ARB_ES3_compatibility"sv);
-    ext.EXT_texture_compression_rgtc = exts.has("GL_EXT_texture_compression_rgtc"sv);
-    ext.EXT_texture_compression_s3tc = exts.has("GL_EXT_texture_compression_s3tc"sv);
-    ext.EXT_texture_compression_s3tc_srgb = exts.has("GL_EXT_texture_compression_s3tc_srgb"sv);
-    ext.EXT_texture_cube_map_array = true;
-    ext.EXT_texture_filter_anisotropic = exts.has("GL_EXT_texture_filter_anisotropic"sv);
-    ext.EXT_texture_sRGB = exts.has("GL_EXT_texture_sRGB"sv);
-    ext.GOOGLE_cpp_style_line_directive = exts.has("GL_GOOGLE_cpp_style_line_directive"sv);
-    ext.KHR_parallel_shader_compile = exts.has("GL_KHR_parallel_shader_compile"sv);
-    ext.KHR_texture_compression_astc_hdr = exts.has("GL_KHR_texture_compression_astc_hdr"sv);
-    ext.KHR_texture_compression_astc_ldr = exts.has("GL_KHR_texture_compression_astc_ldr"sv);
-    ext.OES_depth_texture = true;
-    ext.OES_depth24 = true;
-    ext.OES_EGL_image_external_essl3 = false;
-    ext.OES_rgb8_rgba8 = true;
-    ext.OES_standard_derivatives = true;
-    ext.OES_texture_npot = true;
-    ext.OES_vertex_array_object = true;
-    ext.WEBGL_compressed_texture_etc = false;
-    ext.WEBGL_compressed_texture_s3tc = false;
-    ext.WEBGL_compressed_texture_s3tc_srgb = false;
-
-    auto const major = state.major;
-    auto const minor = state.minor;
+    ext->APPLE_color_buffer_packed_float = true;  // Assumes core profile.
+    ext->ARB_shading_language_packing = exts.has("GL_ARB_shading_language_packing"sv);
+    ext->EXT_color_buffer_float = true;  // Assumes core profile.
+    ext->EXT_color_buffer_half_float = true;  // Assumes core profile.
+    ext->EXT_clip_cull_distance = true;
+    ext->EXT_debug_marker = exts.has("GL_EXT_debug_marker"sv);
+    ext->EXT_discard_framebuffer = false;
+    ext->EXT_disjoint_timer_query = true;
+    ext->EXT_multisampled_render_to_texture = false;
+    ext->EXT_multisampled_render_to_texture2 = false;
+    ext->EXT_shader_framebuffer_fetch = exts.has("GL_EXT_shader_framebuffer_fetch"sv);
+    ext->EXT_texture_compression_bptc = exts.has("GL_EXT_texture_compression_bptc"sv);
+    ext->EXT_texture_compression_etc2 = exts.has("GL_ARB_ES3_compatibility"sv);
+    ext->EXT_texture_compression_rgtc = exts.has("GL_EXT_texture_compression_rgtc"sv);
+    ext->EXT_texture_compression_s3tc = exts.has("GL_EXT_texture_compression_s3tc"sv);
+    ext->EXT_texture_compression_s3tc_srgb = exts.has("GL_EXT_texture_compression_s3tc_srgb"sv);
+    ext->EXT_texture_cube_map_array = true;
+    ext->EXT_texture_filter_anisotropic = exts.has("GL_EXT_texture_filter_anisotropic"sv);
+    ext->EXT_texture_sRGB = exts.has("GL_EXT_texture_sRGB"sv);
+    ext->GOOGLE_cpp_style_line_directive = exts.has("GL_GOOGLE_cpp_style_line_directive"sv);
+    ext->KHR_parallel_shader_compile = exts.has("GL_KHR_parallel_shader_compile"sv);
+    ext->KHR_texture_compression_astc_hdr = exts.has("GL_KHR_texture_compression_astc_hdr"sv);
+    ext->KHR_texture_compression_astc_ldr = exts.has("GL_KHR_texture_compression_astc_ldr"sv);
+    ext->OES_depth_texture = true;
+    ext->OES_depth24 = true;
+    ext->OES_EGL_image_external_essl3 = false;
+    ext->OES_rgb8_rgba8 = true;
+    ext->OES_standard_derivatives = true;
+    ext->OES_texture_npot = true;
+    ext->OES_vertex_array_object = true;
+    ext->WEBGL_compressed_texture_etc = false;
+    ext->WEBGL_compressed_texture_s3tc = false;
+    ext->WEBGL_compressed_texture_s3tc_srgb = false;

    // OpenGL 4.2 implies ARB_shading_language_packing
    if (major > 4 || (major == 4 && minor >= 2)) {
-        ext.ARB_shading_language_packing = true;
+        ext->ARB_shading_language_packing = true;
    }
    // OpenGL 4.3 implies EXT_discard_framebuffer
    if (major > 4 || (major == 4 && minor >= 3)) {
-        ext.EXT_discard_framebuffer = true;
-        ext.KHR_debug = true;
+        ext->EXT_discard_framebuffer = true;
+        ext->KHR_debug = true;
    }
    // OpenGL 4.5 implies EXT_clip_control
    if (major > 4 || (major == 4 && minor >= 5)) {
-        ext.EXT_clip_control = true;
+        ext->EXT_clip_control = true;
    }
 }

@@ -676,7 +767,7 @@ void OpenGLContext::deleteBuffers(GLsizei n, const GLuint* buffers, GLenum targe
    }

 #ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
-    assert_invariant(state.major > 2 ||
+    assert_invariant(mFeatureLevel >= FeatureLevel::FEATURE_LEVEL_1 ||
            (target != GL_UNIFORM_BUFFER && target != GL_TRANSFORM_FEEDBACK_BUFFER));

    if (target == GL_UNIFORM_BUFFER || target == GL_TRANSFORM_FEEDBACK_BUFFER) {
@@ -888,63 +979,4 @@ void OpenGLContext::resetState() noexcept {
    
 }

-OpenGLContext::FenceSync OpenGLContext::createFenceSync(
-        OpenGLPlatform& platform) noexcept {
-
-    if (UTILS_UNLIKELY(isES2())) {
-        assert_invariant(platform.canCreateFence());
-        return { .fence = platform.createFence() };
-    }
-
-#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
-    auto sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
-    CHECK_GL_ERROR(utils::slog.e)
-    return { .sync = sync };
-#else
-    return {};
-#endif
-}
-
-void OpenGLContext::destroyFenceSync(
-        OpenGLPlatform& platform, FenceSync sync) noexcept {
-
-    if (UTILS_UNLIKELY(isES2())) {
-        platform.destroyFence(static_cast<Platform::Fence*>(sync.fence));
-        return;
-    }
-
-#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
-    glDeleteSync(sync.sync);
-    CHECK_GL_ERROR(utils::slog.e)
-#endif
-}
-
-OpenGLContext::FenceSync::Status OpenGLContext::clientWaitSync(
-        OpenGLPlatform& platform, FenceSync sync) const noexcept {
-
-    if (UTILS_UNLIKELY(isES2())) {
-        using Status = OpenGLContext::FenceSync::Status;
-        auto const status = platform.waitFence(static_cast<Platform::Fence*>(sync.fence), 0u);
-        switch (status) {
-            case FenceStatus::ERROR:                return Status::FAILURE;
-            case FenceStatus::CONDITION_SATISFIED:  return Status::CONDITION_SATISFIED;
-            case FenceStatus::TIMEOUT_EXPIRED:      return Status ::TIMEOUT_EXPIRED;
-        }
-    }
-
-#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
-    GLenum const status = glClientWaitSync(sync.sync, 0, 0u);
-    CHECK_GL_ERROR(utils::slog.e)
-    using Status = OpenGLContext::FenceSync::Status;
-    switch (status) {
-        case GL_ALREADY_SIGNALED:       return Status::ALREADY_SIGNALED;
-        case GL_TIMEOUT_EXPIRED:        return Status::TIMEOUT_EXPIRED;
-        case GL_CONDITION_SATISFIED:    return Status::CONDITION_SATISFIED;
-        default:                        return Status::FAILURE;
-    }
-#else
-    return FenceSync::Status::FAILURE;
-#endif
-}
-
 } // namespace filament
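
The hunks above turn `initExtensionsGL()` and `initExtensionsGLES()` into static functions that take the `Extensions` struct and the GL version as parameters instead of reading `state.major` and `state.minor`. The pattern can be sketched on its own as below. This is an illustration only, not the Filament implementation: the trimmed `Extensions` struct, the `atLeast()` helper, `initExtensionsSketch()`, and the `advertised` set (standing in for `GLUtils::unordered_string_set`) are all hypothetical names.

```cpp
#include <cassert>
#include <string_view>
#include <unordered_set>

// Hypothetical, trimmed-down stand-in for OpenGLContext::Extensions.
struct Extensions {
    bool ARB_shading_language_packing = false;
    bool EXT_discard_framebuffer = false;
    bool KHR_debug = false;
    bool EXT_clip_control = false;
};

// True when (major, minor) is at least (reqMajor, reqMinor). Same test as the
// repeated `major > 4 || (major == 4 && minor >= N)` expressions in the diff.
constexpr bool atLeast(int major, int minor, int reqMajor, int reqMinor) noexcept {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// Fills `ext` from the advertised extension strings, then applies the
// version implications. No member state is read: everything is a parameter,
// which is what allows the real functions to become static.
void initExtensionsSketch(Extensions* ext,
        std::unordered_set<std::string_view> const& advertised,
        int major, int minor) noexcept {
    assert(ext != nullptr);
    ext->ARB_shading_language_packing =
            advertised.count("GL_ARB_shading_language_packing") > 0;

    // OpenGL 4.2 implies ARB_shading_language_packing
    if (atLeast(major, minor, 4, 2)) {
        ext->ARB_shading_language_packing = true;
    }
    // OpenGL 4.3 implies EXT_discard_framebuffer and KHR_debug
    if (atLeast(major, minor, 4, 3)) {
        ext->EXT_discard_framebuffer = true;
        ext->KHR_debug = true;
    }
    // OpenGL 4.5 implies EXT_clip_control
    if (atLeast(major, minor, 4, 5)) {
        ext->EXT_clip_control = true;
    }
}
```

Because the sketch depends only on its arguments, it can be exercised without a GL context, for example by passing an empty set and version 4.3 and checking which flags come back set.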
--- a/filament/backend/src/opengl/OpenGLContext.h
+++ b/filament/backend/src/opengl/OpenGLContext.h
@@ -92,7 +92,7 @@ public:
 #   ifndef BACKEND_OPENGL_LEVEL_GLES30
            return true;
 #   else
-            return state.major == 2;
+            return mFeatureLevel == FeatureLevel::FEATURE_LEVEL_0;
 #   endif
 #else
        return false;
@@ -150,28 +150,8 @@ public:
    void deleteBuffers(GLsizei n, const GLuint* buffers, GLenum target) noexcept;
    void deleteVertexArrays(GLsizei n, const GLuint* arrays) noexcept;

-    // we abstract GL's sync because it's not available in ES2, but we can use EGL's sync
-    // instead, if available.
-    struct FenceSync {
-        enum class Status {
-            ALREADY_SIGNALED,
-            TIMEOUT_EXPIRED,
-            CONDITION_SATISFIED,
-            FAILURE
-        };
-        union {
-            void* fence;
-            GLsync sync;
-        };
-    };
-
-    FenceSync createFenceSync(OpenGLPlatform& platform) noexcept;
-    void destroyFenceSync(OpenGLPlatform& platform, FenceSync sync) noexcept;
-    FenceSync::Status clientWaitSync(OpenGLPlatform& platform, FenceSync sync) const noexcept;
-
-
    // glGet*() values
-    struct {
+    struct Gets {
        GLfloat max_anisotropy;
        GLint max_draw_buffers;
        GLint max_renderbuffer_size;
@@ -190,10 +170,11 @@ public:
    } features = {};

    // supported extensions detected at runtime
-    struct {
+    struct Extensions {
        bool APPLE_color_buffer_packed_float;
        bool ARB_shading_language_packing;
        bool EXT_clip_control;
+        bool EXT_clip_cull_distance;
        bool EXT_color_buffer_float;
        bool EXT_color_buffer_half_float;
        bool EXT_debug_marker;
@@ -228,7 +209,7 @@ public:
        bool WEBGL_compressed_texture_s3tc_srgb;
    } ext = {};

-    struct {
+    struct Bugs {
        // Some drivers have issues with UBOs in the fragment shader when
        // glFlush() is called between draw calls.
        bool disable_glFlush;
@@ -280,6 +261,24 @@ public:
        // Some Adreno drivers crash in glDrawXXX() when there's an uninitialized uniform block,
        // even when the shader doesn't access it.
        bool enable_initialize_non_used_uniform_array;
+
+        // Workarounds specific to PowerVR GPUs affecting shaders (currently, we lump them all
+        // under one specialization constant).
+        // - gl_InstanceID is invalid when used first in the vertex shader
+        bool powervr_shader_workarounds;
+
+        // On PowerVR destroying the destination of a glBlitFramebuffer operation is equivalent to
+        // a glFinish. So we must delay the destruction until we know the GPU is finished.
+        bool delay_fbo_destruction;
+
+        // The driver has some threads pinned, and we can't easily know on which core. Setting our
+        // own affinity can hurt performance more if we end up pinned on the same core.
+        bool disable_thread_affinity;
+
+        // Force feature level 0. Typically used for low end ES3 devices with significant driver
+        // bugs or performance issues.
+        bool force_feature_level0;
+
    } bugs = {};

    // state getters -- as needed.
@@ -402,7 +401,7 @@ public:
        } window;
    } state;

-    struct {
+    struct Procs {
        void (* bindVertexArray)(GLuint array);
        void (* deleteVertexArrays)(GLsizei n, const GLuint* arrays);
        void (* genVertexArrays)(GLsizei n, GLuint* arrays);
@@ -463,18 +462,55 @@ private:
            {   bugs.enable_initialize_non_used_uniform_array,
                    "enable_initialize_non_used_uniform_array",
                    ""},
+            {   bugs.powervr_shader_workarounds,
+                    "powervr_shader_workarounds",
+                    ""},
+            {   bugs.delay_fbo_destruction,
+                    "delay_fbo_destruction",
+                    ""},
+            {   bugs.disable_thread_affinity,
+                    "disable_thread_affinity",
+                    ""},
+            {   bugs.force_feature_level0,
+                    "force_feature_level0",
+                    ""},
    }};

    RenderPrimitive mDefaultVAO;

    // this is chosen to minimize code size
 #if defined(BACKEND_OPENGL_VERSION_GLES)
-    void initExtensionsGLES() noexcept;
+    static void initExtensionsGLES(Extensions* ext, GLint major, GLint minor) noexcept;
 #endif
 #if defined(BACKEND_OPENGL_VERSION_GL)
-    void initExtensionsGL() noexcept;
+    static void initExtensionsGL(Extensions* ext, GLint major, GLint minor) noexcept;
 #endif

+    static void initExtensions(Extensions* ext, GLint major, GLint minor) noexcept {
+#if defined(BACKEND_OPENGL_VERSION_GLES)
+        initExtensionsGLES(ext, major, minor);
+#endif
+#if defined(BACKEND_OPENGL_VERSION_GL)
+        initExtensionsGL(ext, major, minor);
+#endif
+    }
+
+    static void initBugs(Bugs* bugs, Extensions const& exts,
+            GLint major, GLint minor,
+            char const* vendor,
+            char const* renderer,
+            char const* version,
+            char const* shader
+    );
+
+    static void initProcs(Procs* procs,
+            Extensions const& exts, GLint major, GLint minor) noexcept;
+
+    static FeatureLevel resolveFeatureLevel(GLint major, GLint minor,
+            Extensions const& exts,
+            Gets const& gets,
+            Bugs const& bugs) noexcept;
+
    template <typename T, typename F>
    static inline void update_state(T& state, T const& expected, F functor, bool force = false) noexcept {
        if (UTILS_UNLIKELY(force || state != expected)) {
@@ -567,7 +603,7 @@ void OpenGLContext::activeTexture(GLuint unit) noexcept {

 void OpenGLContext::bindSampler(GLuint unit, GLuint sampler) noexcept {
    assert_invariant(unit < MAX_TEXTURE_UNIT_COUNT);
-    assert_invariant(state.major > 2);
+    assert_invariant(mFeatureLevel >= FeatureLevel::FEATURE_LEVEL_1);
 #ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
    update_state(state.textures.units[unit].sampler, sampler, [&]() {
        glBindSampler(unit, sampler);
@@ -613,7 +649,7 @@ void OpenGLContext::bindVertexArray(RenderPrimitive const* p) noexcept {

 void OpenGLContext::bindBufferRange(GLenum target, GLuint index, GLuint buffer,
        GLintptr offset, GLsizeiptr size) noexcept {
-    assert_invariant(state.major > 2);
+    assert_invariant(mFeatureLevel >= FeatureLevel::FEATURE_LEVEL_1);

 #ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
 #   ifdef BACKEND_OPENGL_LEVEL_GLES31
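
The header changes above replace the `state.major` checks with comparisons against `mFeatureLevel`, and declare `resolveFeatureLevel()` together with a `force_feature_level0` bug flag. The body of `resolveFeatureLevel()` is not part of this diff, so the following is only a guess at how such a resolution could work. The two-value enum, the reduced `Bugs` struct, and both function names are assumptions; the real function also takes `minor`, `Extensions` and `Gets`, and the real enum may define more levels.

```cpp
// Hypothetical two-level enum; the real FeatureLevel may have more values.
enum class FeatureLevel { FEATURE_LEVEL_0, FEATURE_LEVEL_1 };

// Hypothetical subset of OpenGLContext::Bugs.
struct Bugs {
    bool force_feature_level0 = false;
};

// Assumed logic: a context below version 3, or a driver flagged with
// force_feature_level0, is treated as feature level 0.
constexpr FeatureLevel resolveFeatureLevelSketch(int major, Bugs const& bugs) noexcept {
    if (major < 3 || bugs.force_feature_level0) {
        return FeatureLevel::FEATURE_LEVEL_0;
    }
    return FeatureLevel::FEATURE_LEVEL_1;
}

// Mirrors the new isES2() body: the check is on the feature level, not on
// the GL major version.
constexpr bool isES2Sketch(FeatureLevel level) noexcept {
    return level == FeatureLevel::FEATURE_LEVEL_0;
}
```

With a scheme like this, an ES3 device with `force_feature_level0` set takes the ES2 code paths. The old `state.major == 2` test could not express that.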
--- a/filament/backend/src/opengl/OpenGLDriver.cpp
+++ b/filament/backend/src/opengl/OpenGLDriver.cpp
@@ -191,27 +191,7 @@ OpenGLDriver::OpenGLDriver(OpenGLPlatform* platform, const Platform::DriverConfi
    assert_invariant(mContext.ext.EXT_disjoint_timer_query);
 #endif

-#if defined(BACKEND_OPENGL_VERSION_GL) || defined(GL_EXT_disjoint_timer_query)
-    if (mContext.ext.EXT_disjoint_timer_query) {
-        // timer queries are available
-        if (mContext.bugs.dont_use_timer_query && mPlatform.canCreateFence()) {
-            // however, they don't work well, revert to using fences if we can.
-            mTimerQueryImpl = new OpenGLTimerQueryFence(mPlatform);
-        } else {
-            mTimerQueryImpl = new TimerQueryNative(mContext);
-        }
-        mFrameTimeSupported = true;
-    } else
-#endif
-    if (mPlatform.canCreateFence()) {
-        // no timer queries, but we can use fences
-        mTimerQueryImpl = new OpenGLTimerQueryFence(mPlatform);
-        mFrameTimeSupported = true;
-    } else {
-        // no queries, no fences -- that's a problem
-        mTimerQueryImpl = new TimerQueryFallback();
-        mFrameTimeSupported = false;
-    }
+    mTimerQueryImpl = OpenGLTimerQueryFactory::init(mPlatform, *this);

    mShaderCompilerService.init();
 }
@@ -231,13 +211,23 @@ void OpenGLDriver::terminate() {
    // wait for the GPU to finish executing all commands
    glFinish();

+    mShaderCompilerService.terminate();
+
+#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
    // and make sure to execute all the GpuCommandCompleteOps callbacks
    executeGpuCommandsCompleteOps();

+    // as well as the FrameCompleteOps callbacks
+    if (UTILS_UNLIKELY(!mFrameCompleteOps.empty())) {
+        for (auto&& op: mFrameCompleteOps) {
+            op();
+        }
+        mFrameCompleteOps.clear();
+    }
+
    // because we called glFinish(), all callbacks should have been executed
    assert_invariant(mGpuCommandCompleteOps.empty());

-#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
    if (!getContext().isES2()) {
        for (auto& item: mSamplerMap) {
            mContext.unbindSampler(item.second);
@@ -249,8 +239,6 @@ void OpenGLDriver::terminate() {

    delete mTimerQueryImpl;

-    mShaderCompilerService.terminate();
-
    mPlatform.terminate();
 }

@@ -436,11 +424,7 @@ Handle<HwRenderTarget> OpenGLDriver::createRenderTargetS() noexcept {
 }

 Handle<HwFence> OpenGLDriver::createFenceS() noexcept {
-    return initHandle<HwFence>();
-}
-
-Handle<HwSync> OpenGLDriver::createSyncS() noexcept {
-    return initHandle<GLSync>();
+    return initHandle<GLFence>();
 }

 Handle<HwSwapChain> OpenGLDriver::createSwapChainS() noexcept {
@@ -1352,28 +1336,25 @@ void OpenGLDriver::createRenderTargetR(Handle<HwRenderTarget> rth,
 void OpenGLDriver::createFenceR(Handle<HwFence> fh, int) {
    DEBUG_MARKER()

-    HwFence* f = handle_cast<HwFence*>(fh);
-    f->fence = mPlatform.createFence();
-}
+    GLFence* f = handle_cast<GLFence*>(fh);

-void OpenGLDriver::createSyncR(Handle<HwSync> fh, int) {
-    DEBUG_MARKER()
-
-    GLSync* f = handle_cast<GLSync *>(fh);
-    f->handle = mContext.createFenceSync(mPlatform);
-
-    // check the status of the sync once a frame, since we must do this from our thread
-    std::weak_ptr<GLSync::State> const weak = f->result;
-    runEveryNowAndThen(
-            [&platform = mPlatform, context = mContext, handle = f->handle, weak]() -> bool {
-        auto result = weak.lock();
-        if (result) {
-            auto const status = context.clientWaitSync(platform, handle);
-            result->status.store(status, std::memory_order_relaxed);
-            return (status != OpenGLContext::FenceSync::Status::TIMEOUT_EXPIRED);
-        }
-        return true;
-    });
+    if (mPlatform.canCreateFence() || mContext.isES2()) {
+        assert_invariant(mPlatform.canCreateFence());
+        f->fence = mPlatform.createFence();
+    }
+#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
+    else {
+        std::weak_ptr<GLFence::State> const weak = f->state;
+        whenGpuCommandsComplete([weak](){
+            auto state = weak.lock();
+            if (state) {
+                std::lock_guard const lock(state->lock);
+                state->status = FenceStatus::CONDITION_SATISFIED;
+                state->cond.notify_all();
+            }
+        });
+    }
+#endif
 }

 void OpenGLDriver::createSwapChainR(Handle<HwSwapChain> sch, void* nativeWindow, uint64_t flags) {
@@ -1405,10 +1386,8 @@ void OpenGLDriver::createSwapChainHeadlessR(Handle<HwSwapChain> sch,

 void OpenGLDriver::createTimerQueryR(Handle<HwTimerQuery> tqh, int) {
    DEBUG_MARKER()
-
    GLTimerQuery* tq = handle_cast<GLTimerQuery*>(tqh);
-    mContext.procs.genQueries(1u, &tq->gl.query);
-    CHECK_GL_ERROR(utils::slog.e)
+    mTimerQueryImpl->createTimerQuery(tq);
 }

 // ------------------------------------------------------------------------------------------------
@@ -1513,12 +1492,33 @@ void OpenGLDriver::destroyRenderTarget(Handle<HwRenderTarget> rth) {
        if (rt->gl.fbo) {
            // first unbind this framebuffer if needed
            gl.bindFramebuffer(GL_FRAMEBUFFER, 0);
-            glDeleteFramebuffers(1, &rt->gl.fbo);
        }
        if (rt->gl.fbo_read) {
            // first unbind this framebuffer if needed
            gl.bindFramebuffer(GL_FRAMEBUFFER, 0);
-            glDeleteFramebuffers(1, &rt->gl.fbo_read);
+        }
+
+#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
+        if (UTILS_UNLIKELY(gl.bugs.delay_fbo_destruction)) {
+            if (rt->gl.fbo) {
+                whenFrameComplete([fbo = rt->gl.fbo]() {
+                    glDeleteFramebuffers(1, &fbo);
+                });
+            }
+            if (rt->gl.fbo_read) {
+                whenFrameComplete([fbo_read = rt->gl.fbo_read]() {
+                    glDeleteFramebuffers(1, &fbo_read);
+                });
+            }
+        } else
+#endif
+        {
+            if (rt->gl.fbo) {
+                glDeleteFramebuffers(1, &rt->gl.fbo);
+            }
+            if (rt->gl.fbo_read) {
+                glDeleteFramebuffers(1, &rt->gl.fbo_read);
+            }
        }
        destruct(rth, rt);
    }
@@ -1567,20 +1567,11 @@ void OpenGLDriver::destroyTimerQuery(Handle<HwTimerQuery> tqh) {

    if (tqh) {
        GLTimerQuery* tq = handle_cast<GLTimerQuery*>(tqh);
-        getContext().procs.deleteQueries(1u, &tq->gl.query);
+        mTimerQueryImpl->destroyTimerQuery(tq);
        destruct(tqh, tq);
    }
 }

-void OpenGLDriver::destroySync(Handle<HwSync> sh) {
-    DEBUG_MARKER()
-    if (sh) {
-        GLSync* s = handle_cast<GLSync*>(sh);
-        mContext.destroyFenceSync(mPlatform, s->handle);
-        destruct(sh, s);
-    }
-}
-
 // ------------------------------------------------------------------------------------------------
 // Synchronous APIs
 // These are called on the application's thread
@@ -1683,24 +1674,39 @@ int64_t OpenGLDriver::getStreamTimestamp(Handle<HwStream> sh) {

 void OpenGLDriver::destroyFence(Handle<HwFence> fh) {
    if (fh) {
-        HwFence* f = handle_cast<HwFence*>(fh);
-        mPlatform.destroyFence(f->fence);
+        GLFence* f = handle_cast<GLFence*>(fh);
+        if (mPlatform.canCreateFence() || mContext.isES2()) {
+            mPlatform.destroyFence(f->fence);
+        }
        destruct(fh, f);
    }
 }

-FenceStatus OpenGLDriver::wait(Handle<HwFence> fh, uint64_t timeout) {
+FenceStatus OpenGLDriver::getFenceStatus(Handle<HwFence> fh) {
    if (fh) {
-        HwFence* f = handle_cast<HwFence*>(fh);
-        if (f->fence == nullptr) {
-            // we can end-up here if:
-            // - the platform doesn't support h/w fences
-            // - wait() was called before the fence was asynchronously created.
-            //   This case is not handled in OpenGLDriver but is handled by FFence.
-            //   TODO: move FFence logic into the backend.
-            return FenceStatus::ERROR;
+        GLFence* f = handle_cast<GLFence*>(fh);
+        if (mPlatform.canCreateFence() || mContext.isES2()) {
+            if (f->fence == nullptr) {
+                // we can end up here if:
+                // - the platform doesn't support h/w fences
+                if (UTILS_UNLIKELY(!mPlatform.canCreateFence())) {
+                    return FenceStatus::ERROR;
+                }
+                // - getFenceStatus() was called before the fence was asynchronously created.
+                return FenceStatus::TIMEOUT_EXPIRED;
+            }
+            return mPlatform.waitFence(f->fence, 0);
        }
-        return mPlatform.waitFence(f->fence, timeout);
+#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
+        else {
+            assert_invariant(f->state);
+            std::unique_lock lock(f->state->lock);
+            f->state->cond.wait_for(lock, std::chrono::nanoseconds(0), [&state = f->state]() {
+                return state->status != FenceStatus::TIMEOUT_EXPIRED;
+            });
+            return f->state->status;
+        }
+#endif
    }
    return FenceStatus::ERROR;
 }
@@ -1848,7 +1854,7 @@ bool OpenGLDriver::isFrameBufferFetchMultiSampleSupported() {
 }

 bool OpenGLDriver::isFrameTimeSupported() {
-    return mFrameTimeSupported;
+    return OpenGLTimerQueryFactory::isGpuTimeSupported();
 }

 bool OpenGLDriver::isAutoDepthResolveSupported() {
@@ -1866,6 +1872,18 @@ bool OpenGLDriver::isSRGBSwapChainSupported() {
    return mPlatform.isSRGBSwapChainSupported();
 }

+bool OpenGLDriver::isStereoSupported() {
+    // Stereo requires instancing and EXT_clip_cull_distance.
+    if (UTILS_UNLIKELY(mContext.isES2())) {
+        return false;
+    }
+    return mContext.ext.EXT_clip_cull_distance;
+}
+
+bool OpenGLDriver::isParallelShaderCompileSupported() {
+    return mShaderCompilerService.isParallelShaderCompileSupported();
+}
+
 bool OpenGLDriver::isWorkaroundNeeded(Workaround workaround) {
    switch (workaround) {
        case Workaround::SPLIT_EASU:
@@ -1876,6 +1894,10 @@ bool OpenGLDriver::isWorkaroundNeeded(Workaround workaround) {
            return mContext.bugs.enable_initialize_non_used_uniform_array;
        case Workaround::DISABLE_BLIT_INTO_TEXTURE_ARRAY:
            return mContext.bugs.disable_blit_into_texture_array;
+        case Workaround::POWER_VR_SHADER_WORKAROUNDS:
+            return mContext.bugs.powervr_shader_workarounds;
+        case Workaround::DISABLE_THREAD_AFFINITY:
+            return mContext.bugs.disable_thread_affinity;
        default:
            return false;
    }
@@ -1912,6 +1934,16 @@ void OpenGLDriver::commit(Handle<HwSwapChain> sch) {

    GLSwapChain* sc = handle_cast<GLSwapChain*>(sch);
    mPlatform.commit(sc->swapChain);
+
+#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
+    if (UTILS_UNLIKELY(!mFrameCompleteOps.empty())) {
+        whenGpuCommandsComplete([ops = std::move(mFrameCompleteOps)]() {
+            for (auto&& op: ops) {
+                op();
+            }
+        });
+    }
+#endif
 }

 void OpenGLDriver::makeCurrent(Handle<HwSwapChain> schDraw, Handle<HwSwapChain> schRead) {
@@ -2556,8 +2588,6 @@ void OpenGLDriver::replaceStream(GLTexture* texture, GLStream* newStream) noexce
 void OpenGLDriver::beginTimerQuery(Handle<HwTimerQuery> tqh) {
    DEBUG_MARKER()
    GLTimerQuery* tq = handle_cast<GLTimerQuery*>(tqh);
-    // reset the state of the result availability
-    tq->elapsed.store(0, std::memory_order_relaxed);
    mTimerQueryImpl->beginTimeElapsedQuery(tq);
 }

@@ -2565,50 +2595,15 @@ void OpenGLDriver::endTimerQuery(Handle<HwTimerQuery> tqh) {
    DEBUG_MARKER()
    GLTimerQuery* tq = handle_cast<GLTimerQuery*>(tqh);
    mTimerQueryImpl->endTimeElapsedQuery(tq);
-
-    runEveryNowAndThen([this, tq]() -> bool {
-        if (!mTimerQueryImpl->queryResultAvailable(tq)) {
-            // we need to try this one again later
-            return false;
-        }
-        tq->elapsed.store(mTimerQueryImpl->queryResult(tq), std::memory_order_relaxed);
-        return true;
-    });
 }

 bool OpenGLDriver::getTimerQueryValue(Handle<HwTimerQuery> tqh, uint64_t* elapsedTime) {
    GLTimerQuery* tq = handle_cast<GLTimerQuery*>(tqh);
-    uint64_t const d = tq->elapsed.load(std::memory_order_relaxed);
-    if (!d) {
-        return false;
-    }
-    if (elapsedTime) {
-        *elapsedTime = d;
-    }
-    return true;
+    return OpenGLTimerQueryInterface::getTimerQueryValue(tq, elapsedTime);
 }

-SyncStatus OpenGLDriver::getSyncStatus(Handle<HwSync> sh) {
-    GLSync* s = handle_cast<GLSync*>(sh);
-    if (!s->result) {
-        return SyncStatus::NOT_SIGNALED;
-    }
-    auto status = s->result->status.load(std::memory_order_relaxed);
-    using Status = OpenGLContext::FenceSync::Status;
-    switch (status) {
-        case Status::CONDITION_SATISFIED:
-        case Status::ALREADY_SIGNALED:
-            return SyncStatus::SIGNALED;
-        case Status::TIMEOUT_EXPIRED:
-            return SyncStatus::NOT_SIGNALED;
-        case Status::FAILURE:
-        default:
-            return SyncStatus::ERROR;
-    }
-}
-
-void OpenGLDriver::compilePrograms(CallbackHandler* handler,
-        CallbackHandler::Callback callback, void* user) {
+void OpenGLDriver::compilePrograms(CompilerPriorityQueue priority,
+        CallbackHandler* handler, CallbackHandler::Callback callback, void* user) {
    if (callback) {
        getShaderCompilerService().notifyWhenAllProgramsAreReady(handler, callback, user);
    }
@@ -2922,7 +2917,7 @@ void OpenGLDriver::bindSamplers(uint32_t index, Handle<HwSamplerGroup> sbh) {

 #ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
 GLuint OpenGLDriver::getSamplerSlow(SamplerParams params) const noexcept {
-    assert_invariant(mSamplerMap.find(params.u) == mSamplerMap.end());
+    assert_invariant(mSamplerMap.find(params) == mSamplerMap.end());

    GLuint s;
    glGenSamplers(1, &s);
@@ -2944,7 +2939,7 @@ GLuint OpenGLDriver::getSamplerSlow(SamplerParams params) const noexcept {
    }
 #endif
    CHECK_GL_ERROR(utils::slog.e)
-    mSamplerMap[params.u] = s;
+    mSamplerMap[params] = s;
    return s;
 }
 #endif
@@ -3180,37 +3175,11 @@ void OpenGLDriver::readBufferSubData(backend::BufferObjectHandle boh,
 #endif
 }

-void OpenGLDriver::whenGpuCommandsComplete(std::function<void()> fn) noexcept {
-    OpenGLContext::FenceSync sync = mContext.createFenceSync(mPlatform);
-    mGpuCommandCompleteOps.emplace_back(sync, std::move(fn));
-    CHECK_GL_ERROR(utils::slog.e)
-}

 void OpenGLDriver::runEveryNowAndThen(std::function<bool()> fn) noexcept {
    mEveryNowAndThenOps.push_back(std::move(fn));
 }

-void OpenGLDriver::executeGpuCommandsCompleteOps() noexcept {
-    auto& v = mGpuCommandCompleteOps;
-    auto it = v.begin();
-    while (it != v.end()) {
-        using Status = OpenGLContext::FenceSync::Status;
-        auto const status = mContext.clientWaitSync(mPlatform, it->first);
-        if (status == Status::ALREADY_SIGNALED || status == Status::CONDITION_SATISFIED) {
-            it->second();
-            mContext.destroyFenceSync(mPlatform, it->first);
-            it = v.erase(it);
-        } else if (UTILS_UNLIKELY(status == Status::FAILURE)) {
-            // This should never happen, but is very problematic if it does, as we might leak
-            // some data depending on what the callback does. However, we clean up our own state.
-            mContext.destroyFenceSync(mPlatform, it->first);
-            it = v.erase(it);
-        } else {
-            ++it;
-        }
-    }
-}
-
 void OpenGLDriver::executeEveryNowAndThenOps() noexcept {
    auto& v = mEveryNowAndThenOps;
    auto it = v.begin();
@@ -3223,6 +3192,46 @@ void OpenGLDriver::executeEveryNowAndThenOps() noexcept {
    }
 }

+#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
+void OpenGLDriver::whenFrameComplete(const std::function<void()>& fn) noexcept {
+    mFrameCompleteOps.push_back(fn);
+}
+
+void OpenGLDriver::whenGpuCommandsComplete(const std::function<void()>& fn) noexcept {
+    GLsync sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
+    mGpuCommandCompleteOps.emplace_back(sync, fn);
+    CHECK_GL_ERROR(utils::slog.e)
+}
+
+void OpenGLDriver::executeGpuCommandsCompleteOps() noexcept {
+    auto& v = mGpuCommandCompleteOps;
+    auto it = v.begin();
+    while (it != v.end()) {
+        auto const& [sync, fn] = *it;
+        GLenum const syncStatus = glClientWaitSync(sync, 0, 0u);
+        switch (syncStatus) {
+            case GL_TIMEOUT_EXPIRED:
+                // not ready
+                ++it;
+                break;
+            case GL_ALREADY_SIGNALED:
+            case GL_CONDITION_SATISFIED:
+                // ready
+                it->second();
+                glDeleteSync(sync);
+                it = v.erase(it);
+                break;
+            default:
+                // This should never happen, but is very problematic if it does, as we might leak
+                // some data depending on what the callback does. However, we clean up our own state.
+                glDeleteSync(sync);
+                it = v.erase(it);
+                break;
+        }
+    }
+}
+#endif
+
 // ------------------------------------------------------------------------------------------------
 // Rendering ops
 // ------------------------------------------------------------------------------------------------
@@ -3261,7 +3270,7 @@ void OpenGLDriver::setFrameScheduledCallback(Handle<HwSwapChain> sch,
 }

 void OpenGLDriver::setFrameCompletedCallback(Handle<HwSwapChain> sch,
-        FrameCompletedCallback callback, void* user) {
+        CallbackHandler* handler, CallbackHandler::Callback callback, void* user) {
    DEBUG_MARKER()
 }

@@ -3296,19 +3305,19 @@ void OpenGLDriver::flush(int) {
    if (!gl.bugs.disable_glFlush) {
        glFlush();
    }
-    mTimerQueryImpl->flush();
 }

 void OpenGLDriver::finish(int) {
    DEBUG_MARKER()
    glFinish();
-    mTimerQueryImpl->flush();
+#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
    executeGpuCommandsCompleteOps();
+    assert_invariant(mGpuCommandCompleteOps.empty());
+#endif
    executeEveryNowAndThenOps();
    // Note: since we executed a glFinish(), all pending tasks should be done
-    assert_invariant(mGpuCommandCompleteOps.empty());

-    // however, some tasks rely on a separated thread to publish their result (e.g.
+    // However, some tasks rely on a separate thread to publish their result (e.g.
    // endTimerQuery), so the result could very well not be ready, and the task will
    // linger a bit longer, this is only true for mEveryNowAndThenOps tasks.
    // The fallout of this is that we can't assert that mEveryNowAndThenOps is empty.
--- a/filament/backend/src/opengl/OpenGLDriver.h
+++ b/filament/backend/src/opengl/OpenGLDriver.h
@@ -151,15 +151,12 @@ public:

    struct GLTimerQuery : public HwTimerQuery {
        struct State {
-            std::atomic<uint64_t> elapsed{};
-            std::atomic_bool available{};
+            struct {
+                GLuint query;
+            } gl;
+            std::atomic<int64_t> elapsed{};
        };
-        struct {
-            GLuint query = 0;
-            std::shared_ptr<State> emulation;
-        } gl;
-        // 0 means not available, otherwise query result in ns.
-        std::atomic<uint64_t> elapsed{};
+        std::shared_ptr<State> state;
    };

    struct GLStream : public HwStream {
@@ -196,14 +193,14 @@ public:
        TargetBufferFlags targets = {};
    };

-    struct GLSync : public HwSync {
-        using HwSync::HwSync;
+    struct GLFence : public HwFence {
+        using HwFence::HwFence;
        struct State {
-            std::atomic<OpenGLContext::FenceSync::Status> status{
-                OpenGLContext::FenceSync::Status::TIMEOUT_EXPIRED };
+            std::mutex lock;
+            std::condition_variable cond;
+            FenceStatus status{ FenceStatus::TIMEOUT_EXPIRED };
        };
-        OpenGLContext::FenceSync handle{};
-        std::shared_ptr<State> result{ std::make_shared<GLSync::State>() };
+        std::shared_ptr<State> state{ std::make_shared<GLFence::State>() };
    };

    OpenGLDriver(OpenGLDriver const&) = delete;
@@ -214,6 +211,8 @@ private:
    OpenGLContext mContext;
    ShaderCompilerService mShaderCompilerService;

+    friend class OpenGLTimerQueryFactory;
+    friend class TimerQueryNative;
    OpenGLContext& getContext() noexcept { return mContext; }

    ShaderCompilerService& getShaderCompilerService() noexcept {
@@ -335,7 +334,7 @@ private:
        assert_invariant(!sp.padding1);
        assert_invariant(!sp.padding2);
        auto& samplerMap = mSamplerMap;
-        auto pos = samplerMap.find(sp.u);
+        auto pos = samplerMap.find(sp);
        if (UTILS_UNLIKELY(pos == samplerMap.end())) {
            return getSamplerSlow(sp);
        }
@@ -369,7 +368,8 @@ private:
    // sampler buffer binding points (nullptr if not used)
    std::array<GLSamplerGroup*, Program::SAMPLER_BINDING_COUNT> mSamplerBindings = {};   // 4 pointers

-    mutable tsl::robin_map<uint32_t, GLuint> mSamplerMap;
+    mutable tsl::robin_map<SamplerParams, GLuint,
+            SamplerParams::Hasher, SamplerParams::EqualTo> mSamplerMap;

    // this must be accessed from the driver thread only
    std::vector<GLTexture*> mTexturesWithStreamsAttached;
@@ -383,10 +383,15 @@ private:

    void updateTextureLodRange(GLTexture* texture, int8_t targetLevel) noexcept;

+#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
    // tasks executed on the main thread after the fence signaled
-    void whenGpuCommandsComplete(std::function<void()> fn) noexcept;
+    void whenGpuCommandsComplete(const std::function<void()>& fn) noexcept;
    void executeGpuCommandsCompleteOps() noexcept;
-    std::vector<std::pair<OpenGLContext::FenceSync, std::function<void()>>> mGpuCommandCompleteOps;
+    std::vector<std::pair<GLsync, std::function<void()>>> mGpuCommandCompleteOps;
+
+    void whenFrameComplete(const std::function<void()>& fn) noexcept;
+    std::vector<std::function<void()>> mFrameCompleteOps;
+#endif

    // tasks regularly executed on the main thread at until they return true
    void runEveryNowAndThen(std::function<bool()> fn) noexcept;
@@ -395,7 +400,6 @@ private:

    // timer query implementation
    OpenGLTimerQueryInterface* mTimerQueryImpl = nullptr;
-    bool mFrameTimeSupported = false;

    // for ES2 sRGB support
    GLSwapChain* mCurrentDrawSwapChain = nullptr;
--- a/filament/backend/src/opengl/OpenGLPlatform.cpp
+++ b/filament/backend/src/opengl/OpenGLPlatform.cpp
@@ -116,4 +116,7 @@ bool OpenGLPlatform::isExtraContextSupported() const noexcept {
 void OpenGLPlatform::createContext(bool) {
 }

+void OpenGLPlatform::releaseContext() noexcept {
+}
+
 } // namespace filament::backend
--- a/filament/backend/src/opengl/OpenGLTimerQuery.cpp
+++ b/filament/backend/src/opengl/OpenGLTimerQuery.cpp
@@ -19,6 +19,7 @@
 #include <backend/platforms/OpenGLPlatform.h>

 #include <utils/compiler.h>
+#include <utils/JobSystem.h>
 #include <utils/Log.h>
 #include <utils/Systrace.h>
 #include <utils/debug.h>
@@ -30,43 +31,111 @@ using namespace GLUtils;

 // ------------------------------------------------------------------------------------------------

+bool OpenGLTimerQueryFactory::mGpuTimeSupported = false;
+
+OpenGLTimerQueryInterface* OpenGLTimerQueryFactory::init(
+        OpenGLPlatform& platform, OpenGLDriver& driver) noexcept {
+    (void)driver;
+
+    OpenGLTimerQueryInterface* impl;
+
+#if defined(BACKEND_OPENGL_VERSION_GL) || defined(GL_EXT_disjoint_timer_query)
+    auto& context = driver.getContext();
+    if (context.ext.EXT_disjoint_timer_query) {
+        // timer queries are available
+        if (context.bugs.dont_use_timer_query && platform.canCreateFence()) {
+            // however, they don't work well, revert to using fences if we can.
+            impl = new(std::nothrow) OpenGLTimerQueryFence(platform);
+        } else {
+            impl = new(std::nothrow) TimerQueryNative(driver);
+        }
+        mGpuTimeSupported = true;
+    } else
+#endif
+    if (platform.canCreateFence()) {
+        // no timer queries, but we can use fences
+        impl = new(std::nothrow) OpenGLTimerQueryFence(platform);
+        mGpuTimeSupported = true;
+    } else {
+        // no queries, no fences -- that's a problem
+        impl = new(std::nothrow) TimerQueryFallback();
+        mGpuTimeSupported = false;
+    }
+    return impl;
+}
+
+// ------------------------------------------------------------------------------------------------
+
 OpenGLTimerQueryInterface::~OpenGLTimerQueryInterface() = default;

+// This is a backend synchronous call
+bool OpenGLTimerQueryInterface::getTimerQueryValue(GLTimerQuery* tq, uint64_t* elapsedTime) noexcept {
+    if (UTILS_LIKELY(tq->state)) {
+        int64_t const elapsed = tq->state->elapsed.load(std::memory_order_relaxed);
+        bool const available = elapsed > 0;
+        if (available) {
+            *elapsedTime = elapsed;
+        }
+        return available;
+    }
+    return false;
+}
+
 // ------------------------------------------------------------------------------------------------

 #if defined(BACKEND_OPENGL_VERSION_GL) || defined(GL_EXT_disjoint_timer_query)

-TimerQueryNative::TimerQueryNative(OpenGLContext& context) : mContext(context) {
+TimerQueryNative::TimerQueryNative(OpenGLDriver& driver)
+        : mDriver(driver) {
 }

 TimerQueryNative::~TimerQueryNative() = default;

-void TimerQueryNative::flush() {
-}
-
-void TimerQueryNative::beginTimeElapsedQuery(GLTimerQuery* query) {
-    mContext.procs.beginQuery(GL_TIME_ELAPSED, query->gl.query);
+void TimerQueryNative::createTimerQuery(GLTimerQuery* tq) {
+    if (UTILS_UNLIKELY(!tq->state)) {
+        tq->state = std::make_shared<GLTimerQuery::State>();
+    }
+    mDriver.getContext().procs.genQueries(1u, &tq->state->gl.query);
    CHECK_GL_ERROR(utils::slog.e)
 }

-void TimerQueryNative::endTimeElapsedQuery(GLTimerQuery*) {
-    mContext.procs.endQuery(GL_TIME_ELAPSED);
+void TimerQueryNative::destroyTimerQuery(GLTimerQuery* tq) {
+    assert_invariant(tq->state);
+    mDriver.getContext().procs.deleteQueries(1u, &tq->state->gl.query);
    CHECK_GL_ERROR(utils::slog.e)
 }

-bool TimerQueryNative::queryResultAvailable(GLTimerQuery* query) {
-    GLuint available = 0;
-    mContext.procs.getQueryObjectuiv(query->gl.query, GL_QUERY_RESULT_AVAILABLE, &available);
+void TimerQueryNative::beginTimeElapsedQuery(GLTimerQuery* tq) {
+    assert_invariant(tq->state);
+    tq->state->elapsed.store(0);
+    mDriver.getContext().procs.beginQuery(GL_TIME_ELAPSED, tq->state->gl.query);
    CHECK_GL_ERROR(utils::slog.e)
-    return available != 0;
 }

-uint64_t TimerQueryNative::queryResult(GLTimerQuery* query) {
-    GLuint64 elapsedTime = 0;
-    // we won't end-up here if we're on ES and don't have GL_EXT_disjoint_timer_query
-    mContext.procs.getQueryObjectui64v(query->gl.query, GL_QUERY_RESULT, &elapsedTime);
+void TimerQueryNative::endTimeElapsedQuery(GLTimerQuery* tq) {
+    assert_invariant(tq->state);
+    mDriver.getContext().procs.endQuery(GL_TIME_ELAPSED);
    CHECK_GL_ERROR(utils::slog.e)
-    return elapsedTime;
+
+    std::weak_ptr<GLTimerQuery::State> const weak = tq->state;
+
+    mDriver.runEveryNowAndThen([context = mDriver.getContext(), weak]() -> bool {
+        auto state = weak.lock();
+        if (state) {
+            GLuint available = 0;
+            context.procs.getQueryObjectuiv(state->gl.query, GL_QUERY_RESULT_AVAILABLE, &available);
+            CHECK_GL_ERROR(utils::slog.e)
+            if (!available) {
+                // we need to try this one again later
+                return false;
+            }
+            GLuint64 elapsedTime = 0;
+            // we won't end-up here if we're on ES and don't have GL_EXT_disjoint_timer_query
+            context.procs.getQueryObjectui64v(state->gl.query, GL_QUERY_RESULT, &elapsedTime);
+            state->elapsed.store((int64_t)elapsedTime, std::memory_order_relaxed);
+        }
+        return true;
+    });
 }

 #endif
@@ -77,6 +146,8 @@ OpenGLTimerQueryFence::OpenGLTimerQueryFence(OpenGLPlatform& platform)
        : mPlatform(platform) {
    mQueue.reserve(2);
    mThread = std::thread([this]() {
+        utils::JobSystem::setThreadName("OpenGLTimerQueryFence");
+        utils::JobSystem::setThreadPriority(utils::JobSystem::Priority::URGENT_DISPLAY);
        auto& queue = mQueue;
        bool exitRequested;
        do {
@@ -101,7 +172,9 @@ OpenGLTimerQueryFence::~OpenGLTimerQueryFence() {
        mExitRequested = true;
        mCondition.notify_one();
        lock.unlock();
-        mThread.join();
+        if (mThread.joinable()) {
+            mThread.join();
+        }
    }
 }

@@ -111,59 +184,60 @@ void OpenGLTimerQueryFence::enqueue(OpenGLTimerQueryFence::Job&& job) {
    mCondition.notify_one();
 }

-void OpenGLTimerQueryFence::flush() {
-    // Use calls to flush() as a proxy for when the GPU work started.
-    GLTimerQuery* query = mActiveQuery;
-    if (query) {
-        uint64_t const elapsed = query->gl.emulation->elapsed.load(std::memory_order_relaxed);
-        if (!elapsed) {
-            uint64_t const now = clock::now().time_since_epoch().count();
-            query->gl.emulation->elapsed.store(now, std::memory_order_relaxed);
-            //SYSTRACE_CONTEXT();
-            //SYSTRACE_ASYNC_BEGIN("gpu", query->gl.query);
-        }
+void OpenGLTimerQueryFence::createTimerQuery(GLTimerQuery* tq) {
+    if (UTILS_UNLIKELY(!tq->state)) {
+        tq->state = std::make_shared<GLTimerQuery::State>();
    }
 }

-void OpenGLTimerQueryFence::beginTimeElapsedQuery(GLTimerQuery* query) {
-    assert_invariant(!mActiveQuery);
-    // We can't use a fence to figure out when a GPU operation starts (only when it finishes)
-    // so instead, we use when glFlush() was issued as a proxy.
-    if (UTILS_UNLIKELY(!query->gl.emulation)) {
-        query->gl.emulation = std::make_shared<GLTimerQuery::State>();
-    }
-    query->gl.emulation->elapsed.store(0, std::memory_order_relaxed);
-    query->gl.emulation->available.store(false);
-    mActiveQuery = query;
+void OpenGLTimerQueryFence::destroyTimerQuery(GLTimerQuery* tq) {
+    assert_invariant(tq->state);
 }

-void OpenGLTimerQueryFence::endTimeElapsedQuery(GLTimerQuery* query) {
-    assert_invariant(mActiveQuery);
+void OpenGLTimerQueryFence::beginTimeElapsedQuery(GLTimerQuery* tq) {
+    assert_invariant(tq->state);
+    tq->state->elapsed.store(0);
+
    Platform::Fence* fence = mPlatform.createFence();
-    std::weak_ptr<GLTimerQuery::State> const weak = query->gl.emulation;
-    mActiveQuery = nullptr;
-    //uint32_t cookie = cookie = query->gl.query;
+    std::weak_ptr<GLTimerQuery::State> const weak = tq->state;
+
+    // FIXME: this implementation of beginTimeElapsedQuery is usually wrong; it ends up
+    //    measuring the current CPU time because the fence signals immediately (usually there is
+    //    no work on the GPU at this point). We could work around this by sending a small glClear
+    //    on a dummy target for instance, or somehow latch the begin time at the next renderpass
+    //    start.
+
    push([&platform = mPlatform, fence, weak]() {
-        auto emulation = weak.lock();
-        if (emulation) {
+        auto state = weak.lock();
+        if (state) {
            platform.waitFence(fence, FENCE_WAIT_FOR_EVER);
-            auto now = clock::now().time_since_epoch().count();
-            auto then = emulation->elapsed.load(std::memory_order_relaxed);
-            emulation->elapsed.store(now - then, std::memory_order_relaxed);
-            emulation->available.store(true);
-            //SYSTRACE_CONTEXT();
-            //SYSTRACE_ASYNC_END("gpu", cookie);
+            int64_t const then = clock::now().time_since_epoch().count();
+            state->elapsed.store(-then, std::memory_order_relaxed);
+            SYSTRACE_CONTEXT();
+            SYSTRACE_ASYNC_BEGIN("OpenGLTimerQueryFence", intptr_t(state.get()));
        }
        platform.destroyFence(fence);
    });
 }

-bool OpenGLTimerQueryFence::queryResultAvailable(GLTimerQuery* query) {
-    return query->gl.emulation->available.load();
-}
+void OpenGLTimerQueryFence::endTimeElapsedQuery(GLTimerQuery* tq) {
+    assert_invariant(tq->state);
+    Platform::Fence* fence = mPlatform.createFence();
+    std::weak_ptr<GLTimerQuery::State> const weak = tq->state;

-uint64_t OpenGLTimerQueryFence::queryResult(GLTimerQuery* query) {
-    return query->gl.emulation->elapsed;
+    push([&platform = mPlatform, fence, weak]() {
+        auto state = weak.lock();
+        if (state) {
+            platform.waitFence(fence, FENCE_WAIT_FOR_EVER);
+            int64_t const now = clock::now().time_since_epoch().count();
+            int64_t const then = state->elapsed.load(std::memory_order_relaxed);
+            assert_invariant(then < 0);
+            state->elapsed.store(now + then, std::memory_order_relaxed);
+            SYSTRACE_CONTEXT();
+            SYSTRACE_ASYNC_END("OpenGLTimerQueryFence", intptr_t(state.get()));
+        }
+        platform.destroyFence(fence);
+    });
 }

 // ------------------------------------------------------------------------------------------------
@@ -172,30 +246,30 @@ TimerQueryFallback::TimerQueryFallback() = default;

 TimerQueryFallback::~TimerQueryFallback() = default;

-void TimerQueryFallback::flush() {
-}
-
-void TimerQueryFallback::beginTimeElapsedQuery(OpenGLTimerQueryInterface::GLTimerQuery* query) {
-    if (!query->gl.emulation) {
-        query->gl.emulation = std::make_shared<GLTimerQuery::State>();
+void TimerQueryFallback::createTimerQuery(GLTimerQuery* tq) {
+    if (UTILS_UNLIKELY(!tq->state)) {
+        tq->state = std::make_shared<GLTimerQuery::State>();
    }
-    // this implementation clearly doesn't work at all, but we have no h/w support
-    query->gl.emulation->available.store(false, std::memory_order_relaxed);
-    query->gl.emulation->elapsed = clock::now().time_since_epoch().count();
 }

-void TimerQueryFallback::endTimeElapsedQuery(OpenGLTimerQueryInterface::GLTimerQuery* query) {
-    // this implementation clearly doesn't work at all, but we have no h/w support
-    query->gl.emulation->elapsed = clock::now().time_since_epoch().count() - query->gl.emulation->elapsed;
-    query->gl.emulation->available.store(true, std::memory_order_relaxed);
+void TimerQueryFallback::destroyTimerQuery(GLTimerQuery* tq) {
+    assert_invariant(tq->state);
 }

-bool TimerQueryFallback::queryResultAvailable(OpenGLTimerQueryInterface::GLTimerQuery* query) {
-    return query->gl.emulation->available.load(std::memory_order_relaxed);
+void TimerQueryFallback::beginTimeElapsedQuery(OpenGLTimerQueryInterface::GLTimerQuery* tq) {
+    assert_invariant(tq->state);
+    // this implementation measures the CPU time, but we have no h/w support
+    int64_t const then = clock::now().time_since_epoch().count();
+    tq->state->elapsed.store(-then, std::memory_order_relaxed);
 }

-uint64_t TimerQueryFallback::queryResult(OpenGLTimerQueryInterface::GLTimerQuery* query) {
-    return query->gl.emulation->elapsed;
+void TimerQueryFallback::endTimeElapsedQuery(OpenGLTimerQueryInterface::GLTimerQuery* tq) {
+    assert_invariant(tq->state);
+    // this implementation measures the CPU time, but we have no h/w support
+    int64_t const now = clock::now().time_since_epoch().count();
+    int64_t const then = tq->state->elapsed.load(std::memory_order_relaxed);
+    assert_invariant(then < 0);
+    tq->state->elapsed.store(now + then, std::memory_order_relaxed);
 }

 } // namespace filament::backend
--- a/filament/backend/src/opengl/OpenGLTimerQuery.h
+++ b/filament/backend/src/opengl/OpenGLTimerQuery.h
@@ -20,20 +20,36 @@
 #include "OpenGLDriver.h"

 #include <utils/Condition.h>
+#include <utils/Mutex.h>

 #include <thread>
 #include <vector>

 namespace filament::backend {

+class OpenGLPlatform;
+class OpenGLTimerQueryInterface;
+
 /*
- * we need two implementation of timer queries (only elapsed time), because
+ * We need two implementations of timer queries (only elapsed time), because
 * on some gpu disjoint_timer_query/arb_timer_query is much less accurate than
 * using fences.
 *
 * These classes implement the various strategies...
 */

+
+class OpenGLTimerQueryFactory {
+    static bool mGpuTimeSupported;
+public:
+    static OpenGLTimerQueryInterface* init(
+            OpenGLPlatform& platform, OpenGLDriver& driver) noexcept;
+
+    static bool isGpuTimeSupported() noexcept {
+        return mGpuTimeSupported;
+    }
+};
+
 class OpenGLTimerQueryInterface {
 protected:
    using GLTimerQuery = OpenGLDriver::GLTimerQuery;
@@ -41,26 +57,26 @@ protected:

 public:
    virtual ~OpenGLTimerQueryInterface();
-    virtual void flush() = 0;
+    virtual void createTimerQuery(GLTimerQuery* query) = 0;
+    virtual void destroyTimerQuery(GLTimerQuery* query) = 0;
    virtual void beginTimeElapsedQuery(GLTimerQuery* query) = 0;
    virtual void endTimeElapsedQuery(GLTimerQuery* query) = 0;
-    virtual bool queryResultAvailable(GLTimerQuery* query) = 0;
-    virtual uint64_t queryResult(GLTimerQuery* query) = 0;
+
+    static bool getTimerQueryValue(GLTimerQuery* tq, uint64_t* elapsedTime) noexcept;
 };

 #if defined(BACKEND_OPENGL_VERSION_GL) || defined(GL_EXT_disjoint_timer_query)

 class TimerQueryNative : public OpenGLTimerQueryInterface {
 public:
-    explicit TimerQueryNative(OpenGLContext& context);
+    explicit TimerQueryNative(OpenGLDriver& driver);
    ~TimerQueryNative() override;
 private:
-    void flush() override;
+    void createTimerQuery(GLTimerQuery* query) override;
+    void destroyTimerQuery(GLTimerQuery* query) override;
    void beginTimeElapsedQuery(GLTimerQuery* query) override;
    void endTimeElapsedQuery(GLTimerQuery* query) override;
-    bool queryResultAvailable(GLTimerQuery* query) override;
-    uint64_t queryResult(GLTimerQuery* query) override;
-    OpenGLContext& mContext;
+    OpenGLDriver& mDriver;
 };

 #endif
@@ -71,13 +87,12 @@ public:
    ~OpenGLTimerQueryFence() override;
 private:
    using Job = std::function<void()>;
-    void flush() override;
-    void beginTimeElapsedQuery(GLTimerQuery* query) override;
-    void endTimeElapsedQuery(GLTimerQuery* query) override;
-    bool queryResultAvailable(GLTimerQuery* query) override;
-    uint64_t queryResult(GLTimerQuery* query) override;
-    void enqueue(Job&& job);
+    void createTimerQuery(GLTimerQuery* query) override;
+    void destroyTimerQuery(GLTimerQuery* query) override;
+    void beginTimeElapsedQuery(GLTimerQuery* tq) override;
+    void endTimeElapsedQuery(GLTimerQuery* tq) override;

+    void enqueue(Job&& job);
    template<typename CALLABLE, typename ... ARGS>
    void push(CALLABLE&& func, ARGS&& ... args) {
        enqueue(Job(std::bind(std::forward<CALLABLE>(func), std::forward<ARGS>(args)...)));
@@ -89,7 +104,6 @@ private:
    mutable utils::Condition mCondition;
    std::vector<Job> mQueue;
    bool mExitRequested = false;
-    GLTimerQuery* mActiveQuery = nullptr;
 };

 class TimerQueryFallback : public OpenGLTimerQueryInterface {
@@ -97,11 +111,10 @@ public:
    explicit TimerQueryFallback();
    ~TimerQueryFallback() override;
 private:
-    void flush() override;
+    void createTimerQuery(GLTimerQuery* query) override;
+    void destroyTimerQuery(GLTimerQuery* query) override;
    void beginTimeElapsedQuery(GLTimerQuery* query) override;
    void endTimeElapsedQuery(GLTimerQuery* query) override;
-    bool queryResultAvailable(GLTimerQuery* query) override;
-    uint64_t queryResult(GLTimerQuery* query) override;
 };

 } // namespace filament::backend
--- a/filament/backend/src/opengl/ShaderCompilerService.cpp
+++ b/filament/backend/src/opengl/ShaderCompilerService.cpp
@@ -32,7 +32,6 @@
 #include <utils/Systrace.h>

 #include <chrono>
-#include <future>
 #include <string>
 #include <string_view>
 #include <variant>
@@ -64,17 +63,18 @@ static inline std::string to_string(float f) noexcept {

 // ------------------------------------------------------------------------------------------------

-struct ShaderCompilerService::ProgramToken {
-    struct ProgramBinary {
-        GLenum format{};
+struct ShaderCompilerService::OpenGLProgramToken : ProgramToken {
+    struct ProgramData {
        GLuint program{};
        std::array<GLuint, Program::SHADER_TYPE_COUNT> shaders{};
-        std::vector<char> blob;
    };

-    ProgramToken(ShaderCompilerService& compiler, utils::CString const& name) noexcept
+    ~OpenGLProgramToken() override;
+
+    OpenGLProgramToken(ShaderCompilerService& compiler, utils::CString const& name) noexcept
            : compiler(compiler), name(name) {
    }
+
    ShaderCompilerService& compiler;
    utils::CString const& name;
    utils::FixedCapacityVector<std::pair<utils::CString, uint8_t>> attributes;
@@ -85,12 +85,44 @@ struct ShaderCompilerService::ProgramToken {
        GLuint program = 0;
    } gl; // 12 bytes

+
+    // Sets the programData, typically from the compiler thread, and signals the main thread.
+    // This is similar to std::promise::set_value.
+    void set(ProgramData const& data) noexcept {
+        std::unique_lock const l(lock);
+        programData = data;
+        signaled = true;
+        cond.notify_one();
+    }
+
+    // Gets the programData, waiting if necessary.
+    // This is similar to std::future::get
+    ProgramData const& get() const noexcept {
+        std::unique_lock l(lock);
+        cond.wait(l, [this](){ return signaled; });
+        return programData;
+    }
+
+    // Checks if the programData is ready.
+    // This is similar to std::future::wait_for(0s)
+    bool isReady() const noexcept {
+        std::unique_lock l(lock);
+        using namespace std::chrono_literals;
+        return cond.wait_for(l, 0s, [this](){ return signaled; });
+    }
+
+    CallbackManager::Handle handle{};
    BlobCacheKey key;
-    std::future<ProgramBinary> binary;
-    CompilerPriorityQueue priorityQueue = CompilerPriorityQueue::HIGH;
-    bool canceled = false;
+    mutable utils::Mutex lock;
+    mutable utils::Condition cond;
+    ProgramData programData;
+    bool signaled = false;
+
+    bool canceled = false; // not part of the signaling
 };

+ShaderCompilerService::OpenGLProgramToken::~OpenGLProgramToken() = default;
+
 void ShaderCompilerService::setUserData(const program_token_t& token, void* user) noexcept {
    token->user = user;
 }
@@ -101,261 +133,181 @@ void* ShaderCompilerService::getUserData(const program_token_t& token) noexcept

 // ------------------------------------------------------------------------------------------------

-void ShaderCompilerService::CompilerThreadPool::init(
-        bool useSharedContexts, uint32_t threadCount, OpenGLPlatform& platform) noexcept {
-
-    for (size_t i = 0; i < threadCount; i++) {
-        mCompilerThreads.emplace_back([this, useSharedContexts, &platform]() {
-            // give the thread a name
-            JobSystem::setThreadName("CompilerThreadPool");
-
-            // create a gl context current to this thread
-            platform.createContext(useSharedContexts);
-
-            // process jobs from the queue until we're asked to exit
-            while (!mExitRequested) {
-                std::unique_lock lock(mQueueLock);
-                mQueueCondition.wait(lock, [this]() {
-                        return  mExitRequested ||
-                                mUrgentJob ||
-                                (!std::all_of( std::begin(mQueues), std::end(mQueues),
-                                        [](auto&& q) { return q.empty(); }));
-                });
-                if (!mExitRequested) {
-                    Job job{ std::move(mUrgentJob) };
-                    if (!job) {
-                        // use the first queue that's not empty
-                        auto& queue = [this]() -> auto& {
-                            for (auto& q: mQueues) {
-                                if (!q.empty()) {
-                                    return q;
-                                }
-                            }
-                            return mQueues[0]; // we should never end-up here.
-                        }();
-                        assert_invariant(!queue.empty());
-                        std::swap(job, queue.front().second);
-                        queue.pop_front();
-                    }
-
-                    // execute the job without holding any locks
-                    lock.unlock();
-                    job();
-                }
-            }
-        });
-
-    }
-}
-
-auto ShaderCompilerService::CompilerThreadPool::dequeue(program_token_t const& token) -> Job {
-    auto& q = mQueues[size_t(token->priorityQueue)];
-    auto pos = std::find_if(q.begin(), q.end(), [&token](auto&& item) {
-        return item.first == token;
-    });
-    Job job;
-    if (pos != q.end()) {
-        std::swap(job, pos->second);
-        q.erase(pos);
-    }
-    return job;
-}
-
-void ShaderCompilerService::CompilerThreadPool::makeUrgent(program_token_t const& token) {
-    std::unique_lock const lock(mQueueLock);
-    assert_invariant(!mUrgentJob);
-    Job job{ dequeue(token) };
-    std::swap(job, mUrgentJob);
-    mQueueCondition.notify_one();
-}
-
-void ShaderCompilerService::CompilerThreadPool::queue(program_token_t const& token, Job&& job) {
-    std::unique_lock const lock(mQueueLock);
-    mQueues[size_t(token->priorityQueue)].emplace_back(token, std::move(job));
-    mQueueCondition.notify_one();
-}
-
-void ShaderCompilerService::CompilerThreadPool::exit() noexcept {
-    std::unique_lock lock(mQueueLock);
-    mExitRequested = true;
-    mQueueCondition.notify_all();
-    lock.unlock();
-    for (auto& thread: mCompilerThreads) {
-        if (thread.joinable()) {
-            thread.join();
-        }
-    }
-}
-
-// ------------------------------------------------------------------------------------------------
-
 ShaderCompilerService::ShaderCompilerService(OpenGLDriver& driver)
        : mDriver(driver),
+          mCallbackManager(driver),
          KHR_parallel_shader_compile(driver.getContext().ext.KHR_parallel_shader_compile) {
 }

 ShaderCompilerService::~ShaderCompilerService() noexcept = default;

+bool ShaderCompilerService::isParallelShaderCompileSupported() const noexcept {
+    return KHR_parallel_shader_compile || mShaderCompilerThreadCount;
+}
+
 void ShaderCompilerService::init() noexcept {
    // If we have KHR_parallel_shader_compile, we always use it, it should be more resource
    // friendly.
    if (!KHR_parallel_shader_compile) {
        // - on Adreno there is a single compiler object. We can't use a pool > 1
        //   also glProgramBinary blocks if other threads are compiling.
-        // - on Mali shader compilation can be multithreaded, but program linking happens on
+        // - on Mali shader compilation can be multi-threaded, but program linking happens on
        //   a single service thread, so we don't bother using more than one thread either.
-        // - on desktop we could use more threads, tbd.
+        // - on PowerVR shader compilation and linking can be multi-threaded.
+        //   How many threads should we use?
+        // - on macOS (M1 MacBook Pro/Ventura) there is a global lock around all GL APIs when using
+        //   a shared context, so parallel shader compilation yields no benefit.
+        // - on windows/linux we could use more threads, tbd.
        if (mDriver.mPlatform.isExtraContextSupported()) {
-            mShaderCompilerThreadCount = 1;
-            mCompilerThreadPool.init(mUseSharedContext,
-                    mShaderCompilerThreadCount, mDriver.mPlatform);
+            // By default, we use one thread at the same priority as the gl thread. This is the
+            // safest choice that avoids priority inversions.
+            uint32_t poolSize = 1;
+            JobSystem::Priority priority = JobSystem::Priority::DISPLAY;
+
+            auto const& renderer = mDriver.getContext().state.renderer;
+            if (UTILS_UNLIKELY(strstr(renderer, "PowerVR"))) {
+                // The PowerVR driver supports parallel shader compilation well, so we use 2
+                // threads. We can use lower priority threads here because urgent compilations
+                // will most likely happen on the main gl thread. Using too many threads can
+                // increase memory pressure significantly.
+                poolSize = 2;
+                priority = JobSystem::Priority::BACKGROUND;
+            }
+
+            mShaderCompilerThreadCount = poolSize;
+            mCompilerThreadPool.init(mShaderCompilerThreadCount,
+                    [&platform = mDriver.mPlatform, priority]() {
+                        // give the thread a name
+                        JobSystem::setThreadName("CompilerThreadPool");
+                        // run at the priority selected above (DISPLAY by default, BACKGROUND on PowerVR)
+                        JobSystem::setThreadPriority(priority);
+                        // create a gl context current to this thread
+                        platform.createContext(true);
+                    },
+                    [&platform = mDriver.mPlatform]() {
+                        // release context and thread state
+                        platform.releaseContext();
+                    });
        }
    }
 }

 void ShaderCompilerService::terminate() noexcept {
-    // FIXME: could we have some user callbacks pending here?
-    mCompilerThreadPool.exit();
+    // Finally stop the thread pool immediately. Pending jobs will be discarded. We guarantee by
+    // construction that nobody is waiting on a token (because waiting is only done on the main
+    // backend thread, and if we're here, we're on the backend main thread).
+    mCompilerThreadPool.terminate();
+
+    mRunAtNextTickOps.clear();
+
+    // We could have some pending callbacks here; we need to execute them.
+    // This is equivalent to calling cancelTickOp() on all active tokens.
+    mCallbackManager.terminate();
 }

 ShaderCompilerService::program_token_t ShaderCompilerService::createProgram(
        utils::CString const& name, Program&& program) {
    auto& gl = mDriver.getContext();

-    auto token = std::make_shared<ProgramToken>(*this, name);
-
+    auto token = std::make_shared<OpenGLProgramToken>(*this, name);
    if (UTILS_UNLIKELY(gl.isES2())) {
        token->attributes = std::move(program.getAttributes());
    }

    token->gl.program = OpenGLBlobCache::retrieve(&token->key, mDriver.mPlatform, program);
-    if (!token->gl.program) {
-        if (mShaderCompilerThreadCount) {
-            // set the future in the token and pass the promise to the worker thread
-            std::promise<ProgramToken::ProgramBinary> promise;
-            token->binary = promise.get_future();
-            token->priorityQueue = program.getPriorityQueue();
-            // queue a compile job
-            mCompilerThreadPool.queue(token,
-                    [this, &gl, promise = std::move(promise),
-                            program = std::move(program), token]() mutable {
+    if (token->gl.program) {
+        return token;
+    }

-                        // compile the shaders
-                        std::array<GLuint, Program::SHADER_TYPE_COUNT> shaders{};
-                        std::array<utils::CString, Program::SHADER_TYPE_COUNT> shaderSourceCode;
-                        compileShaders(gl,
-                                std::move(program.getShadersSource()),
-                                program.getSpecializationConstants(),
-                                shaders,
-                                shaderSourceCode);
+    token->handle = mCallbackManager.get();

-                        // link the program
-                        GLuint const glProgram = linkProgram(gl, shaders, token->attributes);
+    CompilerPriorityQueue const priorityQueue = program.getPriorityQueue();
+    if (mShaderCompilerThreadCount) {
+        // queue a compile job
+        mCompilerThreadPool.queue(priorityQueue, token,
+                [this, &gl, program = std::move(program), token]() mutable {
+                    // compile the shaders
+                    std::array<GLuint, Program::SHADER_TYPE_COUNT> shaders{};
+                    std::array<utils::CString, Program::SHADER_TYPE_COUNT> shaderSourceCode;
+                    compileShaders(gl,
+                            std::move(program.getShadersSource()),
+                            program.getSpecializationConstants(),
+                            shaders,
+                            shaderSourceCode);

-                        ProgramToken::ProgramBinary binary;
-                        binary.shaders = shaders;
+                    // link the program
+                    GLuint const glProgram = linkProgram(gl, shaders, token->attributes);

-                        if (UTILS_LIKELY(mUseSharedContext)) {
-                            // We need to query the link status here to guarantee that the
-                            // program is compiled and linked now (we don't want this to be
-                            // deferred to later). We don't care about the result at this point.
-                            GLint status;
-                            glGetProgramiv(glProgram, GL_LINK_STATUS, &status);
-                            binary.program = glProgram;
-                            if (token->key) {
-                                // Attempt to cache. This calls glGetProgramBinary.
-                                OpenGLBlobCache::insert(mDriver.mPlatform,
-                                        token->key, token->gl.program);
-                            }
-                        }
-#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
-                        else {
-                            // retrieve the program binary
-                            GLsizei programBinarySize = 0;
-                            glGetProgramiv(glProgram, GL_PROGRAM_BINARY_LENGTH, &programBinarySize);
-                            assert_invariant(programBinarySize);
-                            if (programBinarySize) {
-                                binary.blob.resize(programBinarySize);
-                                glGetProgramBinary(glProgram, programBinarySize,
-                                        &programBinarySize, &binary.format, binary.blob.data());
-                            }
-                            // and we can destroy the program
-                            glDeleteProgram(glProgram);
-                            if (token->key) {
-                                // attempt to cache
-                                OpenGLBlobCache::insert(mDriver.mPlatform, token->key,
-                                        binary.format,
-                                        binary.blob.data(), GLsizei(binary.blob.size()));
-                            }
-                        }
-#endif
-                        // we don't need to check for success here, it'll be done on the
-                        // main thread side.
-                        promise.set_value(binary);
-                    });
-        } else
-        {
-            // this cannot fail because we check compilation status after linking the program
-            // shaders[] is filled with id of shader stages present.
-            compileShaders(gl,
-                    std::move(program.getShadersSource()),
-                    program.getSpecializationConstants(),
-                    token->gl.shaders,
-                    token->shaderSourceCode);
+                    OpenGLProgramToken::ProgramData programData;
+                    programData.shaders = shaders;

-        }
-
-        runAtNextTick(token, [this, token]() {
-            if (mShaderCompilerThreadCount) {
-                if (!token->gl.program) {
-                    // TODO: see if we could completely eliminate this callback here
-                    //       and instead just rely on token->gl.program being atomically
-                    //       set by the compiler thread.
-                    assert_invariant(token->binary.valid());
-                    // we're using the compiler thread, check if the program is ready, no-op if not.
-                    using namespace std::chrono_literals;
-                    if (token->binary.wait_for(0s) != std::future_status::ready) {
-                        return false;
-                    }
-                    // program binary is ready, retrieve it without blocking
-                    ShaderCompilerService::getProgramFromCompilerPool(
-                            const_cast<program_token_t&>(token));
-                }
-            } else {
-                if (KHR_parallel_shader_compile) {
-                    // don't attempt to link this program if all shaders are not done compiling
+                    // We need to query the link status here to guarantee that the
+                    // program is compiled and linked now (we don't want this to be
+                    // deferred to later). We don't care about the result at this point.
                    GLint status;
-                    if (token->gl.program) {
-                        glGetProgramiv(token->gl.program, GL_COMPLETION_STATUS, &status);
-                        if (status == GL_FALSE) {
-                            return false;
-                        }
-                    } else {
-                        for (auto shader: token->gl.shaders) {
-                            if (shader) {
-                                glGetShaderiv(shader, GL_COMPLETION_STATUS, &status);
-                                if (status == GL_FALSE) {
-                                    return false;
-                                }
+                    glGetProgramiv(glProgram, GL_LINK_STATUS, &status);
+                    programData.program = glProgram;
+
+                    token->gl.program = programData.program;
+
+                    // we don't need to check for success here, it'll be done on the
+                    // main thread side.
+                    token->set(programData);
+
+                    mCallbackManager.put(token->handle);
+
+                    // caching must be the last thing we do
+                    if (token->key) {
+                        // Attempt to cache. This calls glGetProgramBinary.
+                        OpenGLBlobCache::insert(mDriver.mPlatform, token->key, glProgram);
+                    }
+                });
+
+    } else {
+        // this cannot fail because we check compilation status after linking the program
+        // shaders[] is filled with id of shader stages present.
+        compileShaders(gl,
+                std::move(program.getShadersSource()),
+                program.getSpecializationConstants(),
+                token->gl.shaders,
+                token->shaderSourceCode);
+
+        runAtNextTick(priorityQueue, token, [this, token](Job const&) {
+            if (KHR_parallel_shader_compile) {
+                // don't attempt to link this program if all shaders are not done compiling
+                GLint status;
+                if (token->gl.program) {
+                    glGetProgramiv(token->gl.program, GL_COMPLETION_STATUS, &status);
+                    if (status == GL_FALSE) {
+                        return false;
+                    }
+                } else {
+                    for (auto shader: token->gl.shaders) {
+                        if (shader) {
+                            glGetShaderiv(shader, GL_COMPLETION_STATUS, &status);
+                            if (status == GL_FALSE) {
+                                return false;
                            }
                        }
                    }
                }
+            }

-                if (!token->gl.program) {
-                    // link the program, this also cannot fail because status is checked later.
-                    token->gl.program = linkProgram(mDriver.getContext(),
-                            token->gl.shaders, token->attributes);
-                    if (KHR_parallel_shader_compile) {
-                        // wait until the link finishes...
-                        return false;
-                    }
+            if (!token->gl.program) {
+                // link the program, this also cannot fail because status is checked later.
+                token->gl.program = linkProgram(mDriver.getContext(),
+                        token->gl.shaders, token->attributes);
+                if (KHR_parallel_shader_compile) {
+                    // wait until the link finishes...
+                    return false;
                }
            }

            assert_invariant(token->gl.program);

-            if (token->key && !mShaderCompilerThreadCount) {
+            mCallbackManager.put(token->handle);
+
+            if (token->key) {
                // TODO: technically we don't have to cache right now. Is it advantageous to
                //       do this later, maybe depending on CPU usage?
                // attempt to cache if we don't have a thread pool (otherwise it's done
@@ -370,31 +322,12 @@ ShaderCompilerService::program_token_t ShaderCompilerService::createProgram(
    return token;
 }

-bool ShaderCompilerService::isProgramReady(
-        const ShaderCompilerService::program_token_t& token) const noexcept {
-
-    assert_invariant(token);
-
-    if (!token->gl.program) {
-        return false;
-    }
-
-    if (KHR_parallel_shader_compile) {
-        GLint status = GL_FALSE;
-        glGetProgramiv(token->gl.program, GL_COMPLETION_STATUS, &status);
-        return (bool)status;
-    }
-
-    // If gl.program is set, this means the program was linked. Some drivers may defer the link
-    // in which case we might block in getProgram() when we check the program status.
-    // Unfortunately, this is nothing we can do about that.
-    return bool(token->gl.program);
-}
-
 GLuint ShaderCompilerService::getProgram(ShaderCompilerService::program_token_t& token) {
    GLuint const program = initialize(token);
    assert_invariant(token == nullptr);
+#ifndef FILAMENT_ENABLE_MATDBG
    assert_invariant(program);
+#endif
    return program;
 }

@@ -418,74 +351,30 @@ GLuint ShaderCompilerService::getProgram(ShaderCompilerService::program_token_t&
        glDeleteProgram(token->gl.program);
    }

-    token = nullptr;
+    token.reset();
 }

 void ShaderCompilerService::tick() {
-    executeTickOps();
+    // we don't need to run executeTickOps() if we're using the thread-pool
+    if (UTILS_UNLIKELY(!mShaderCompilerThreadCount)) {
+        executeTickOps();
+    }
 }

-void ShaderCompilerService::notifyWhenAllProgramsAreReady(CallbackHandler* handler,
-        CallbackHandler::Callback callback, void* user) {
-
-    if (KHR_parallel_shader_compile || mShaderCompilerThreadCount) {
-        // list all programs up to this point
-        utils::FixedCapacityVector<program_token_t, std::allocator<program_token_t>, false> tokens;
-        tokens.reserve(mRunAtNextTickOps.size());
-        for (auto& [token, _] : mRunAtNextTickOps) {
-            if (token) {
-                tokens.push_back(token);
-            }
-        }
-
-        runAtNextTick(nullptr, [this, tokens = std::move(tokens), handler, user, callback]() {
-            for (auto const& token : tokens) {
-                assert_invariant(token);
-                if (!isProgramReady(token)) {
-                    // one of the program is not ready, try next time
-                    return false;
-                }
-            }
-            if (callback) {
-                // all programs are ready, we can call the callbacks
-                mDriver.scheduleCallback(handler, user, callback);
-            }
-            // and we're done
-            return true;
-        });
-
-        return;
+void ShaderCompilerService::notifyWhenAllProgramsAreReady(
+        CallbackHandler* handler, CallbackHandler::Callback callback, void* user) {
+    if (callback) {
+        mCallbackManager.setCallback(handler, callback, user);
    }
-
-    // we don't have KHR_parallel_shader_compile
-
-    runAtNextTick(nullptr, [this, handler, user, callback]() {
-        mDriver.scheduleCallback(handler, user, callback);
-        return true;
-    });
-
-    // TODO: we could spread the compiles over several frames, the tick() below then is not
-    //       needed here. We keep it for now as to not change the current behavior too much.
-    // this will block until all programs are linked
-    tick();
 }

 // ------------------------------------------------------------------------------------------------

 void ShaderCompilerService::getProgramFromCompilerPool(program_token_t& token) noexcept {
-    ProgramToken::ProgramBinary const binary{ token->binary.get() };
+    OpenGLProgramToken::ProgramData const& programData{ token->get() };
    if (!token->canceled) {
-        token->gl.shaders = binary.shaders;
-        if (UTILS_LIKELY(mUseSharedContext)) {
-            token->gl.program = binary.program;
-        }
-#ifndef FILAMENT_SILENCE_NOT_SUPPORTED_BY_ES2
-        else {
-            token->gl.program = glCreateProgram();
-            glProgramBinary(token->gl.program, binary.format,
-                    binary.blob.data(), GLsizei(binary.blob.size()));
-        }
-#endif
+        token->gl.shaders = programData.shaders;
+        token->gl.program = programData.program;
    }
 }

@@ -493,24 +382,36 @@ GLuint ShaderCompilerService::initialize(program_token_t& token) noexcept {
    SYSTRACE_CALL();
    if (!token->gl.program) {
        if (mShaderCompilerThreadCount) {
-            // Block until the program is ready. This could take a very long time.
-            assert_invariant(token->binary.valid());
-
-            // we need this program right now, so move it to the head of the queue.
-            mCompilerThreadPool.makeUrgent(token);
+            // we need this program right now, remove it from the queue
+            auto job = mCompilerThreadPool.dequeue(token);
+            if (job) {
+                // if we were able to remove it, we execute the job now, otherwise it means
+                // it's being executed right now.
+                job();
+            }

            if (!token->canceled) {
                token->compiler.cancelTickOp(token);
            }

-            // block until we get the program from the pool
+            // Block until we get the program from the pool. Generally this wouldn't block
+            // because we just compiled the program above, when executing job.
            ShaderCompilerService::getProgramFromCompilerPool(token);
        } else if (KHR_parallel_shader_compile) {
            // we force the program link -- which might stall, either here or below in
            // checkProgramStatus(), but we don't have a choice, we need to use the program now.
            token->compiler.cancelTickOp(token);
+
            token->gl.program = linkProgram(mDriver.getContext(),
                    token->gl.shaders, token->attributes);
+
+            assert_invariant(token->gl.program);
+
+            mCallbackManager.put(token->handle);
+
+            if (token->key) {
+                OpenGLBlobCache::insert(mDriver.mPlatform, token->key, token->gl.program);
+            }
        } else {
            // if we don't have a program yet, block until we get it.
            tick();
@@ -661,8 +562,8 @@ float u16tofp32(highp uint v) {
    v <<= 16u;
    highp uint s = v & 0x80000000u;
    highp uint n = v & 0x7FFFFFFFu;
-    highp uint nz = n == 0u ? 0u : 0xFFFFFFFF;
-    return uintBitsToFloat(s | ((((n >> 3u) + (0x70u << 23))) & nz));
+    highp uint nz = (n == 0u) ? 0u : 0xFFFFFFFFu;
+    return uintBitsToFloat(s | ((((n >> 3u) + (0x70u << 23u))) & nz));
 }
 vec2 unpackHalf2x16(highp uint v) {
    return vec2(u16tofp32(v&0xFFFFu), u16tofp32(v>>16u));
@@ -670,11 +571,11 @@ vec2 unpackHalf2x16(highp uint v) {
 uint fp32tou16(float val) {
    uint f32 = floatBitsToUint(val);
    uint f16 = 0u;
-    uint sign = (f32 >> 16) & 0x8000u;
-    int exponent = int((f32 >> 23) & 0xFFu) - 127;
+    uint sign = (f32 >> 16u) & 0x8000u;
+    int exponent = int((f32 >> 23u) & 0xFFu) - 127;
    uint mantissa = f32 & 0x007FFFFFu;
    if (exponent > 15) {
-        f16 = sign | (0x1Fu << 10);
+        f16 = sign | (0x1Fu << 10u);
    } else if (exponent > -15) {
        exponent += 15;
        mantissa >>= 13;
@@ -687,7 +588,7 @@ uint fp32tou16(float val) {
 highp uint packHalf2x16(vec2 v) {
    highp uint x = fp32tou16(v.x);
    highp uint y = fp32tou16(v.y);
-    return (y << 16) | x;
+    return (y << 16u) | x;
 }
 )"sv;
    }
@@ -747,17 +648,15 @@ GLuint ShaderCompilerService::linkProgram(OpenGLContext& context,

 // ------------------------------------------------------------------------------------------------

-void ShaderCompilerService::runAtNextTick(
-        const program_token_t& token, std::function<bool()> fn) noexcept {
+void ShaderCompilerService::runAtNextTick(CompilerPriorityQueue priority,
+        const program_token_t& token, Job job) noexcept {
    // insert items in order of priority and at the end of the range
    auto& ops = mRunAtNextTickOps;
-    using ContainerType = std::pair<program_token_t, std::function<bool()>>;
-    auto const pos = std::lower_bound(ops.begin(), ops.end(),
-            token->priorityQueue,
+    auto const pos = std::lower_bound(ops.begin(), ops.end(), priority,
            [](ContainerType const& lhs, CompilerPriorityQueue priorityQueue) {
-                return lhs.first->priorityQueue < priorityQueue;
+                return std::get<0>(lhs) < priorityQueue;
            });
-    ops.emplace(pos, token, std::move(fn));
+    ops.emplace(pos, priority, token, std::move(job));

    SYSTRACE_CONTEXT();
    SYSTRACE_VALUE32("ShaderCompilerService Jobs", mRunAtNextTickOps.size());
@@ -766,9 +665,8 @@ void ShaderCompilerService::runAtNextTick(
 void ShaderCompilerService::cancelTickOp(program_token_t token) noexcept {
    // We do a linear search here, but this is rare, and we know the list is pretty small.
    auto& ops = mRunAtNextTickOps;
-    auto pos = std::find_if(ops.begin(), ops.end(),
-            [&](const auto& item) {
-        return item.first == token;
+    auto pos = std::find_if(ops.begin(), ops.end(), [&](const auto& item) {
+        return std::get<1>(item) == token;
    });
    if (pos != ops.end()) {
        ops.erase(pos);
@@ -781,7 +679,8 @@ void ShaderCompilerService::executeTickOps() noexcept {
    auto& ops = mRunAtNextTickOps;
    auto it = ops.begin();
    while (it != ops.end()) {
-        bool const remove = it->second();
+        Job const& job = std::get<2>(*it);
+        bool const remove = job.fn(job);
        if (remove) {
            it = ops.erase(it);
        } else {
--- a/filament/backend/src/opengl/ShaderCompilerService.h
+++ b/filament/backend/src/opengl/ShaderCompilerService.h
@@ -19,12 +19,16 @@

 #include "gl_headers.h"

+#include "CallbackManager.h"
+#include "CompilerThreadPool.h"
+
 #include <backend/CallbackHandler.h>
 #include <backend/Program.h>

 #include <utils/CString.h>
-#include <utils/Invocable.h>
 #include <utils/FixedCapacityVector.h>
+#include <utils/Invocable.h>
+#include <utils/JobSystem.h>

 #include <atomic>
 #include <condition_variable>
@@ -33,6 +37,7 @@
 #include <memory>
 #include <mutex>
 #include <thread>
+#include <utility>
 #include <vector>

 namespace filament::backend {
@@ -47,10 +52,10 @@ class CallbackHandler;
 * A class handling shader compilation that supports asynchronous compilation.
 */
 class ShaderCompilerService {
-    struct ProgramToken;
+    struct OpenGLProgramToken;

 public:
-    using program_token_t = std::shared_ptr<ProgramToken>;
+    using program_token_t = std::shared_ptr<OpenGLProgramToken>;

    explicit ShaderCompilerService(OpenGLDriver& driver);

@@ -61,16 +66,14 @@ public:

    ~ShaderCompilerService() noexcept;

+    bool isParallelShaderCompileSupported() const noexcept;
+
    void init() noexcept;
    void terminate() noexcept;

    // creates a program (compile + link) asynchronously if supported
    program_token_t createProgram(utils::CString const& name, Program&& program);

-    // Returns true if the program is linked (successfully or not). Guarantees that
-    // getProgram() won't block. Does not block.
-    bool isProgramReady(const program_token_t& token) const noexcept;
-
    // Return the GL program, blocks if necessary. The Token is destroyed and becomes invalid.
    GLuint getProgram(program_token_t& token);

@@ -87,38 +90,17 @@ public:
    static void* getUserData(const program_token_t& token) noexcept;

    // call the callback when all active programs are ready
-    void notifyWhenAllProgramsAreReady(CallbackHandler* handler,
-            CallbackHandler::Callback callback, void* user);
+    void notifyWhenAllProgramsAreReady(
+            CallbackHandler* handler, CallbackHandler::Callback callback, void* user);

 private:
-    class CompilerThreadPool {
-    public:
-        using Job = utils::Invocable<void()>;
-        void init(bool useSharedContexts, uint32_t threadCount, OpenGLPlatform& platform) noexcept;
-        void exit() noexcept;
-        void queue(program_token_t const& token, Job&& job);
-        void makeUrgent(program_token_t const& token);
-
-    private:
-        std::vector<std::thread> mCompilerThreads;
-        std::atomic_bool mExitRequested{ false };
-        std::mutex mQueueLock;
-        std::condition_variable mQueueCondition;
-        std::array<std::deque<std::pair<program_token_t, Job>>, 2> mQueues;
-        Job mUrgentJob;
-        Job dequeue(program_token_t const& token); // lock must be held
-    };
-
    OpenGLDriver& mDriver;
+    CallbackManager mCallbackManager;
    CompilerThreadPool mCompilerThreadPool;

    const bool KHR_parallel_shader_compile;
    uint32_t mShaderCompilerThreadCount = 0u;

-    // For now, we assume shared contexts are supported everywhere. If they are not,
-    // we don't use the shader compiler pool. However, the code supports it.
-    static constexpr bool mUseSharedContext = true;
-
    GLuint initialize(ShaderCompilerService::program_token_t& token) noexcept;

    static void getProgramFromCompilerPool(program_token_t& token) noexcept;
@@ -143,11 +125,27 @@ private:

    static bool checkProgramStatus(program_token_t const& token) noexcept;

-    void runAtNextTick(const program_token_t& token, std::function<bool()> fn) noexcept;
+    struct Job {
+        template<typename FUNC>
+        Job(FUNC&& fn) : fn(std::forward<FUNC>(fn)) {}
+        Job(std::function<bool(Job const& job)> fn,
+                CallbackHandler* handler, void* user, CallbackHandler::Callback callback)
+                : fn(std::move(fn)), handler(handler), user(user), callback(callback) {
+        }
+        std::function<bool(Job const& job)> fn;
+        CallbackHandler* handler = nullptr;
+        void* user = nullptr;
+        CallbackHandler::Callback callback{};
+    };
+
+    void runAtNextTick(CompilerPriorityQueue priority,
+            const program_token_t& token, Job job) noexcept;
    void executeTickOps() noexcept;
    void cancelTickOp(program_token_t token) noexcept;
    // order of insertion is important
-    std::vector<std::pair<program_token_t, std::function<bool()>>> mRunAtNextTickOps;
+
+    using ContainerType = std::tuple<CompilerPriorityQueue, program_token_t, Job>;
+    std::vector<ContainerType> mRunAtNextTickOps;
 };

 } // namespace filament::backend
--- a/filament/backend/src/opengl/gl_headers.h
+++ b/filament/backend/src/opengl/gl_headers.h
@@ -188,6 +188,12 @@ using namespace glext;
 #   define GL_TEXTURE_CUBE_MAP_ARRAY                0x9009
 #endif

+#if defined(GL_EXT_clip_cull_distance)
+#   define GL_CLIP_DISTANCE0                        GL_CLIP_DISTANCE0_EXT
+#else
+#   define GL_CLIP_DISTANCE0                        0x3000
+#endif
+
 #if defined(GL_KHR_debug)
 #   define GL_DEBUG_OUTPUT                          GL_DEBUG_OUTPUT_KHR
 #   define GL_DEBUG_OUTPUT_SYNCHRONOUS              GL_DEBUG_OUTPUT_SYNCHRONOUS_KHR
--- a/filament/backend/src/opengl/platforms/PlatformCocoaGL.mm
+++ b/filament/backend/src/opengl/platforms/PlatformCocoaGL.mm
@@ -173,6 +173,9 @@ Driver* PlatformCocoaGL::createDriver(void* sharedContext, const Platform::Drive
 }

 bool PlatformCocoaGL::isExtraContextSupported() const noexcept {
+    // macOS supports shared contexts; however, it looks like the implementation uses a global
+    // lock around all GL APIs. This is a problem for API calls that take a long time to
+    // execute, e.g. glCompileShader or glLinkProgram.
    return true;
 }

--- a/filament/backend/src/opengl/platforms/PlatformEGL.cpp
+++ b/filament/backend/src/opengl/platforms/PlatformEGL.cpp
@@ -115,9 +115,14 @@ Driver* PlatformEGL::createDriver(void* sharedContext, const Platform::DriverCon

    auto extensions = GLUtils::split(eglQueryString(mEGLDisplay, EGL_EXTENSIONS));
    ext.egl.ANDROID_recordable = extensions.has("EGL_ANDROID_recordable");
-    ext.egl.KHR_create_context = extensions.has("EGL_KHR_create_context");
    ext.egl.KHR_gl_colorspace = extensions.has("EGL_KHR_gl_colorspace");
+    ext.egl.KHR_create_context = extensions.has("EGL_KHR_create_context");
    ext.egl.KHR_no_config_context = extensions.has("EGL_KHR_no_config_context");
+    ext.egl.KHR_surfaceless_context = extensions.has("EGL_KHR_surfaceless_context");
+    if (ext.egl.KHR_create_context) {
+        // KHR_create_context implies KHR_surfaceless_context for ES3.x contexts
+        ext.egl.KHR_surfaceless_context = true;
+    }

    eglCreateSyncKHR = (PFNEGLCREATESYNCKHRPROC) eglGetProcAddress("eglCreateSyncKHR");
    eglDestroySyncKHR = (PFNEGLDESTROYSYNCKHRPROC) eglGetProcAddress("eglDestroySyncKHR");
@@ -181,13 +186,6 @@ Driver* PlatformEGL::createDriver(void* sharedContext, const Platform::DriverCon
        eglConfig = mEGLConfig;
    }

-    // create the dummy surface, just for being able to make the context current.
-    mEGLDummySurface = eglCreatePbufferSurface(mEGLDisplay, mEGLConfig, pbufferAttribs);
-    if (UTILS_UNLIKELY(mEGLDummySurface == EGL_NO_SURFACE)) {
-        logEglError("eglCreatePbufferSurface");
-        goto error;
-    }
-
    for (size_t tries = 0; tries < 3; tries++) {
        mEGLContext = eglCreateContext(mEGLDisplay, eglConfig,
                (EGLContext)sharedContext, contextAttribs.data());
@@ -220,6 +218,26 @@ Driver* PlatformEGL::createDriver(void* sharedContext, const Platform::DriverCon
        goto error;
    }

+    if (ext.egl.KHR_surfaceless_context) {
+        // Adreno 306 driver advertises KHR_create_context but doesn't support passing
+        // EGL_NO_SURFACE to eglMakeCurrent with a 3.0 context.
+        if (UTILS_UNLIKELY(!eglMakeCurrent(mEGLDisplay,
+                EGL_NO_SURFACE, EGL_NO_SURFACE, mEGLContext))) {
+            if (eglGetError() == EGL_BAD_MATCH) {
+                ext.egl.KHR_surfaceless_context = false;
+            }
+        }
+    }
+
+    if (UTILS_UNLIKELY(!ext.egl.KHR_surfaceless_context)) {
+        // create the dummy surface, just for being able to make the context current.
+        mEGLDummySurface = eglCreatePbufferSurface(mEGLDisplay, mEGLConfig, pbufferAttribs);
+        if (UTILS_UNLIKELY(mEGLDummySurface == EGL_NO_SURFACE)) {
+            logEglError("eglCreatePbufferSurface");
+            goto error;
+        }
+    }
+
    if (UTILS_UNLIKELY(!makeCurrent(mEGLDummySurface, mEGLDummySurface))) {
        // eglMakeCurrent failed
        logEglError("eglMakeCurrent");
@@ -255,11 +273,13 @@ error:
 }

 bool PlatformEGL::isExtraContextSupported() const noexcept {
-    return ext.egl.KHR_no_config_context;
+    return ext.egl.KHR_surfaceless_context;
 }

 void PlatformEGL::createContext(bool shared) {
-    EGLContext context = eglCreateContext(mEGLDisplay, EGL_NO_CONFIG_KHR,
+    EGLConfig config = ext.egl.KHR_no_config_context ? EGL_NO_CONFIG_KHR : mEGLConfig;
+
+    EGLContext context = eglCreateContext(mEGLDisplay, config,
            shared ? mEGLContext : EGL_NO_CONTEXT, mContextAttribs.data());

    if (UTILS_UNLIKELY(context == EGL_NO_CONTEXT)) {
@@ -274,6 +294,22 @@ void PlatformEGL::createContext(bool shared) {
    mAdditionalContexts.push_back(context);
 }

+void PlatformEGL::releaseContext() noexcept {
+    EGLContext context = eglGetCurrentContext();
+    eglMakeCurrent(mEGLDisplay, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
+    if (context != EGL_NO_CONTEXT) {
+        eglDestroyContext(mEGLDisplay, context);
+    }
+
+    mAdditionalContexts.erase(
+            std::remove_if(mAdditionalContexts.begin(), mAdditionalContexts.end(),
+                    [context](EGLContext c) {
+                        return c == context;
+                    }), mAdditionalContexts.end());
+
+    eglReleaseThread();
+}
+
 EGLBoolean PlatformEGL::makeCurrent(EGLSurface drawSurface, EGLSurface readSurface) noexcept {
    if (UTILS_UNLIKELY((drawSurface != mCurrentDrawSurface || readSurface != mCurrentReadSurface))) {
        mCurrentDrawSurface = drawSurface;
@@ -284,8 +320,11 @@ EGLBoolean PlatformEGL::makeCurrent(EGLSurface drawSurface, EGLSurface readSurfa
 }

 void PlatformEGL::terminate() noexcept {
+    // Binding EGL_NO_SURFACE together with EGL_NO_CONTEXT is always allowed.
    eglMakeCurrent(mEGLDisplay, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
-    eglDestroySurface(mEGLDisplay, mEGLDummySurface);
+    if (mEGLDummySurface) {
+        eglDestroySurface(mEGLDisplay, mEGLDummySurface);
+    }
    eglDestroyContext(mEGLDisplay, mEGLContext);
    for (auto context : mAdditionalContexts) {
        eglDestroyContext(mEGLDisplay, context);
--- a/filament/backend/src/vulkan/VulkanBlitter.cpp
+++ b/filament/backend/src/vulkan/VulkanBlitter.cpp
@@ -16,7 +16,10 @@

 #include "VulkanBlitter.h"
 #include "VulkanContext.h"
+#include "VulkanFboCache.h"
 #include "VulkanHandles.h"
+#include "VulkanSamplerCache.h"
+#include "VulkanTexture.h"

 #include <utils/FixedCapacityVector.h>
 #include <utils/Panic.h>
@@ -134,16 +137,20 @@ void VulkanBlitter::blitColor(BlitArgs args) {
    VkFormatProperties info;
    vkGetPhysicalDeviceFormatProperties(gpu, src.getFormat(), &info);
    if (!ASSERT_POSTCONDITION_NON_FATAL(info.optimalTilingFeatures & VK_FORMAT_FEATURE_BLIT_SRC_BIT,
-            "Source format is not blittable")) {
+                "Source format is not blittable %d", src.getFormat())) {
        return;
    }
    vkGetPhysicalDeviceFormatProperties(gpu, dst.getFormat(), &info);
    if (!ASSERT_POSTCONDITION_NON_FATAL(info.optimalTilingFeatures & VK_FORMAT_FEATURE_BLIT_DST_BIT,
-            "Destination format is not blittable")) {
+                "Destination format is not blittable %d", dst.getFormat())) {
        return;
    }
 #endif
-    VkCommandBuffer const cmdbuffer = mCommands->get().cmdbuffer;
+    VulkanCommandBuffer& commands = mCommands->get();
+    VkCommandBuffer const cmdbuffer = commands.cmdbuffer;
+    commands.acquire(src.texture);
+    commands.acquire(dst.texture);
+
    blitFast(cmdbuffer, aspect, args.filter, args.srcTarget->getExtent(), src, dst,
            args.srcRectPair, args.dstRectPair);
 }
@@ -158,12 +165,12 @@ void VulkanBlitter::blitDepth(BlitArgs args) {
    VkFormatProperties info;
    vkGetPhysicalDeviceFormatProperties(gpu, src.getFormat(), &info);
    if (!ASSERT_POSTCONDITION_NON_FATAL(info.optimalTilingFeatures & VK_FORMAT_FEATURE_BLIT_SRC_BIT,
-            "Depth format is not blittable")) {
+                "Depth src format is not blittable %d", src.getFormat())) {
        return;
    }
    vkGetPhysicalDeviceFormatProperties(gpu, dst.getFormat(), &info);
    if (!ASSERT_POSTCONDITION_NON_FATAL(info.optimalTilingFeatures & VK_FORMAT_FEATURE_BLIT_DST_BIT,
-            "Depth format is not blittable")) {
+                "Depth dst format is not blittable %d", dst.getFormat())) {
        return;
    }
 #endif
@@ -175,7 +182,11 @@ void VulkanBlitter::blitDepth(BlitArgs args) {
                args.dstRectPair);
        return;
    }
-    VkCommandBuffer const cmdbuffer = mCommands->get().cmdbuffer;
+
+    VulkanCommandBuffer& commands = mCommands->get();
+    VkCommandBuffer const cmdbuffer = commands.cmdbuffer;
+    commands.acquire(src.texture);
+    commands.acquire(dst.texture);
    blitFast(cmdbuffer, aspect, args.filter, args.srcTarget->getExtent(), src, dst, args.srcRectPair,
            args.dstRectPair);
 }
@@ -245,13 +256,16 @@ void VulkanBlitter::lazyInit() noexcept {
        +1.0f, +1.0f,
    };

-    mTriangleBuffer = new VulkanBuffer(mAllocator, mCommands, mStagePool,
-            VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, sizeof(kTriangleVertices));
+    VulkanCommandBuffer& commands = mCommands->get();
+    VkCommandBuffer const cmdbuffer = commands.cmdbuffer;

-    mTriangleBuffer->loadFromCpu(kTriangleVertices, 0, sizeof(kTriangleVertices));
+    mTriangleBuffer = new VulkanBuffer(mAllocator, mStagePool, VK_BUFFER_USAGE_VERTEX_BUFFER_BIT,
+            sizeof(kTriangleVertices));

-    mParamsBuffer = new VulkanBuffer(mAllocator, mCommands, mStagePool,
-            VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, sizeof(BlitterUniforms));
+    mTriangleBuffer->loadFromCpu(cmdbuffer, kTriangleVertices, 0, sizeof(kTriangleVertices));
+
+    mParamsBuffer = new VulkanBuffer(mAllocator, mStagePool, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT,
+            sizeof(BlitterUniforms));
 }

 // At a high level, the procedure for resolving depth looks like this:
@@ -263,11 +277,16 @@ void VulkanBlitter::blitSlowDepth(VkFilter filter, const VkExtent2D srcExtent, V
        VulkanAttachment dst, const VkOffset3D srcRect[2], const VkOffset3D dstRect[2]) {
    lazyInit();

+    VulkanCommandBuffer* commands = &mCommands->get();
+    VkCommandBuffer const cmdbuffer = commands->cmdbuffer;
+    commands->acquire(src.texture);
+    commands->acquire(dst.texture);
+
    BlitterUniforms const uniforms = {
-        .sampleCount = src.texture->samples,
-        .inverseSampleCount = 1.0f / float(src.texture->samples),
+            .sampleCount = src.texture->samples,
+            .inverseSampleCount = 1.0f / float(src.texture->samples),
    };
-    mParamsBuffer->loadFromCpu(&uniforms, 0, sizeof(uniforms));
+    mParamsBuffer->loadFromCpu(cmdbuffer, &uniforms, 0, sizeof(uniforms));

    VkImageAspectFlags const aspect = VK_IMAGE_ASPECT_DEPTH_BIT;

@@ -314,8 +333,6 @@ void VulkanBlitter::blitSlowDepth(VkFilter filter, const VkExtent2D srcExtent, V
    renderPassInfo.renderArea.extent.width = dstRect[1].x - dstRect[0].x;
    renderPassInfo.renderArea.extent.height = dstRect[1].y - dstRect[0].y;

-    const VkCommandBuffer cmdbuffer = mCommands->get().cmdbuffer;
-
    // We need to transition the source into a sampler since it'll be sampled in the shader.
    const VkImageSubresourceRange srcRange = {
            .aspectMask = aspect,
@@ -376,6 +393,7 @@ void VulkanBlitter::blitSlowDepth(VkFilter filter, const VkExtent2D srcExtent, V
    VkSampler vksampler = mSamplerCache.getSampler({});

    VkDescriptorImageInfo samplers[VulkanPipelineCache::SAMPLER_BINDING_COUNT];
+    VulkanTexture* textures[VulkanPipelineCache::SAMPLER_BINDING_COUNT] = {nullptr};
    for (auto& sampler : samplers) {
        sampler = {
            .sampler = vksampler,
@@ -389,8 +407,9 @@ void VulkanBlitter::blitSlowDepth(VkFilter filter, const VkExtent2D srcExtent, V
        .imageView = src.getImageView(VK_IMAGE_ASPECT_DEPTH_BIT),
        .imageLayout = ImgUtil::getVkLayout(samplerLayout),
    };
+    textures[0] = src.texture;

-    mPipelineCache.bindSamplers(samplers,
+    mPipelineCache.bindSamplers(samplers, textures,
            VulkanPipelineCache::getUsageFlags(0, ShaderStageFlags::FRAGMENT));

    auto previousUbo = mPipelineCache.getUniformBufferBinding(0);
@@ -407,7 +426,7 @@ void VulkanBlitter::blitSlowDepth(VkFilter filter, const VkExtent2D srcExtent, V

    mPipelineCache.bindScissor(cmdbuffer, scissor);

-    if (!mPipelineCache.bindPipeline(cmdbuffer)) {
+    if (!mPipelineCache.bindPipeline(commands)) {
        assert_invariant(false);
    }

--- a/filament/backend/src/vulkan/VulkanBuffer.cpp
+++ b/filament/backend/src/vulkan/VulkanBuffer.cpp
@@ -23,9 +23,11 @@ using namespace bluevk;

 namespace filament::backend {

-VulkanBuffer::VulkanBuffer(VmaAllocator allocator, VulkanCommands* commands,
-        VulkanStagePool& stagePool, VkBufferUsageFlags usage, uint32_t numBytes)
-    : mAllocator(allocator), mCommands(commands), mStagePool(stagePool), mUsage(usage) {
+VulkanBuffer::VulkanBuffer(VmaAllocator allocator, VulkanStagePool& stagePool,
+        VkBufferUsageFlags usage, uint32_t numBytes)
+    : mAllocator(allocator),
+      mStagePool(stagePool),
+      mUsage(usage) {

    // for now make sure that only 1 bit is set in usage
    // (because loadFromCpu() assumes that somewhat)
@@ -53,7 +55,8 @@ void VulkanBuffer::terminate() {
    mGpuBuffer = VK_NULL_HANDLE;
 }

-void VulkanBuffer::loadFromCpu(const void* cpuData, uint32_t byteOffset, uint32_t numBytes) const {
+void VulkanBuffer::loadFromCpu(VkCommandBuffer cmdbuf, const void* cpuData, uint32_t byteOffset,
+        uint32_t numBytes) const {
    assert_invariant(byteOffset == 0);
    VulkanStage const* stage = mStagePool.acquireStage(numBytes);
    void* mapped;
@@ -62,10 +65,8 @@ void VulkanBuffer::loadFromCpu(const void* cpuData, uint32_t byteOffset, uint32_
    vmaUnmapMemory(mAllocator, stage->memory);
    vmaFlushAllocation(mAllocator, stage->memory, byteOffset, numBytes);

-    VkCommandBuffer const cmdbuffer = mCommands->get(true).cmdbuffer;
-
    VkBufferCopy region{ .size = numBytes };
-    vkCmdCopyBuffer(cmdbuffer, stage->buffer, mGpuBuffer, 1, &region);
+    vkCmdCopyBuffer(cmdbuf, stage->buffer, mGpuBuffer, 1, &region);

    // Firstly, ensure that the copy finishes before the next draw call.
    // Secondly, in case the user decides to upload another chunk (without ever using the first one)
@@ -99,7 +100,7 @@ void VulkanBuffer::loadFromCpu(const void* cpuData, uint32_t byteOffset, uint32_
 	    .size = VK_WHOLE_SIZE,
    };

-    vkCmdPipelineBarrier(cmdbuffer, VK_PIPELINE_STAGE_TRANSFER_BIT, dstStageMask, 0, 0, nullptr, 1,
+    vkCmdPipelineBarrier(cmdbuf, VK_PIPELINE_STAGE_TRANSFER_BIT, dstStageMask, 0, 0, nullptr, 1,
 	    &barrier, 0, nullptr);
 }

--- a/filament/backend/src/vulkan/VulkanBuffer.h
+++ b/filament/backend/src/vulkan/VulkanBuffer.h
@@ -25,15 +25,18 @@ namespace filament::backend {
 // Encapsulates a Vulkan buffer, its attached DeviceMemory and a staging area.
 class VulkanBuffer {
 public:
-    VulkanBuffer(VmaAllocator allocator, VulkanCommands* commands, VulkanStagePool& stagePool,
-            VkBufferUsageFlags usage, uint32_t numBytes);
+    VulkanBuffer(VmaAllocator allocator, VulkanStagePool& stagePool, VkBufferUsageFlags usage,
+            uint32_t numBytes);
    ~VulkanBuffer();
    void terminate();
-    void loadFromCpu(const void* cpuData, uint32_t byteOffset, uint32_t numBytes) const;
-    VkBuffer getGpuBuffer() const { return mGpuBuffer; }
+    void loadFromCpu(VkCommandBuffer cmdbuf, const void* cpuData, uint32_t byteOffset,
+            uint32_t numBytes) const;
+    VkBuffer getGpuBuffer() const {
+        return mGpuBuffer;
+    }
+
 private:
    VmaAllocator mAllocator;
-    VulkanCommands* mCommands;
    VulkanStagePool& mStagePool;

    VmaAllocation mGpuMemory = VK_NULL_HANDLE;
--- a/filament/backend/src/vulkan/VulkanCommands.cpp
+++ b/filament/backend/src/vulkan/VulkanCommands.cpp
@@ -23,7 +23,6 @@

 #include "VulkanConstants.h"
 #include "VulkanContext.h"
-#include "VulkanDriver.h"

 #include <utils/Log.h>
 #include <utils/Panic.h>
@@ -69,22 +68,35 @@ static VkCommandPool createPool(VkDevice device, uint32_t queueFamilyIndex) {
 }

 void VulkanGroupMarkers::push(std::string const& marker, Timestamp start) noexcept {
-    mMarkers.push(marker);
+    mMarkers.push_back(marker);
 #if FILAMENT_VULKAN_VERBOSE
-    mTimestamps.push(start.time_since_epoch().count() > 0.0
+    mTimestamps.push_back(start.time_since_epoch().count() > 0.0
                                  ? start
                                  : std::chrono::high_resolution_clock::now());
 #endif
 }

 std::pair<std::string, Timestamp> VulkanGroupMarkers::pop() noexcept {
-    auto const marker = mMarkers.top();
-    mMarkers.pop();
+    auto const marker = mMarkers.back();
+    mMarkers.pop_back();

 #if FILAMENT_VULKAN_VERBOSE
-    auto const topTimestamp = mTimestamps.top();
-    mTimestamps.pop();
-    return std::make_pair(marker, topTimestamp);
+    auto const timestamp = mTimestamps.back();
+    mTimestamps.pop_back();
+    return std::make_pair(marker, timestamp);
+#else
+    return std::make_pair(marker, Timestamp{});
+#endif
+}
+
+std::pair<std::string, Timestamp> VulkanGroupMarkers::pop_bottom() noexcept {
+    auto const marker = mMarkers.front();
+    mMarkers.pop_front();
+
+#if FILAMENT_VULKAN_VERBOSE
+    auto const timestamp = mTimestamps.front();
+    mTimestamps.pop_front();
+    return std::make_pair(marker, timestamp);
 #else
    return std::make_pair(marker, Timestamp{});
 #endif
@@ -92,7 +104,7 @@ std::pair<std::string, Timestamp> VulkanGroupMarkers::pop() noexcept {

 std::pair<std::string, Timestamp> VulkanGroupMarkers::top() const {
    assert_invariant(!empty());
-    auto const marker = mMarkers.top();
+    auto const marker = mMarkers.back();
 #if FILAMENT_VULKAN_VERBOSE
    auto const topTimestamp = mTimestamps.top();
    return std::make_pair(marker, topTimestamp);
@@ -106,15 +118,20 @@ bool VulkanGroupMarkers::empty() const noexcept {
 }

 VulkanCommands::VulkanCommands(VkDevice device, VkQueue queue, uint32_t queueFamilyIndex,
-        VulkanContext* context)
+        VulkanContext* context, VulkanResourceAllocator* allocator)
    : mDevice(device),
      mQueue(queue),
      mPool(createPool(mDevice, queueFamilyIndex)),
-      mContext(context) {
+      mContext(context),
+      mStorage(CAPACITY) {
    VkSemaphoreCreateInfo sci{.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO};
    for (auto& semaphore: mSubmissionSignals) {
        vkCreateSemaphore(mDevice, &sci, nullptr, &semaphore);
    }
+
+    for (size_t i = 0; i < CAPACITY; ++i) {
+        mStorage[i] = std::make_unique<VulkanCommandBuffer>(allocator);
+    }
 }

 VulkanCommands::~VulkanCommands() {
@@ -126,10 +143,9 @@ VulkanCommands::~VulkanCommands() {
    }
 }

-VulkanCommandBuffer const& VulkanCommands::get(bool blockOnGC) {
-    if (mCurrent) {
-        mCurrent->blockOnGC = mCurrent->blockOnGC || blockOnGC;
-        return *mCurrent;
+VulkanCommandBuffer& VulkanCommands::get() {
+    if (mCurrentCommandBufferIndex >= 0) {
+        return *mStorage[mCurrentCommandBufferIndex].get();
    }

    // If we ran out of available command buffers, stall until one finishes. This is very rare.
@@ -145,15 +161,18 @@ VulkanCommandBuffer const& VulkanCommands::get(bool blockOnGC) {
        gc();
    }

+    VulkanCommandBuffer* currentbuf = nullptr;
    // Find an available slot.
-    for (VulkanCommandBuffer& wrapper : mStorage) {
-        if (wrapper.cmdbuffer == VK_NULL_HANDLE) {
-            mCurrent = &wrapper;
+    for (size_t i = 0; i < CAPACITY; ++i) {
+        auto wrapper = mStorage[i].get();
+        if (wrapper->cmdbuffer == VK_NULL_HANDLE) {
+            mCurrentCommandBufferIndex = static_cast<int8_t>(i);
+            currentbuf = wrapper;
            break;
        }
    }

-    assert_invariant(mCurrent);
+    assert_invariant(currentbuf);
    --mAvailableCount;

    // Create the low-level command buffer.
@@ -163,47 +182,46 @@ VulkanCommandBuffer const& VulkanCommands::get(bool blockOnGC) {
        .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
        .commandBufferCount = 1
    };
-    vkAllocateCommandBuffers(mDevice, &allocateInfo, &mCurrent->cmdbuffer);
-
-    mCurrent->blockOnGC = blockOnGC;
+    vkAllocateCommandBuffers(mDevice, &allocateInfo, &currentbuf->cmdbuffer);

    // Note that the fence wrapper uses shared_ptr because a DriverAPI fence can also have ownership
    // over it.  The destruction of the low-level fence occurs either in VulkanCommands::gc(), or in
    // VulkanDriver::destroyFence(), both of which are safe spots.
-    mCurrent->fence = std::make_shared<VulkanCmdFence>(mDevice);
+    currentbuf->fence = std::make_shared<VulkanCmdFence>(mDevice);

    // Begin writing into the command buffer.
    const VkCommandBufferBeginInfo binfo {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        .flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,
    };
-    vkBeginCommandBuffer(mCurrent->cmdbuffer, &binfo);
+    vkBeginCommandBuffer(currentbuf->cmdbuffer, &binfo);

    // Notify the observer that a new command buffer has been activated.
    if (mObserver) {
-        mObserver->onCommandBuffer(*mCurrent);
+        mObserver->onCommandBuffer(*currentbuf);
    }

-    // We push the current markers onto a temporary stack. This must be placed after mCurrent is set
-    // to the new command buffer since pushGroupMarker also calls get().
+    // We push the current markers onto a temporary stack. This must be placed after currentbuf is
+    // set to the new command buffer since pushGroupMarker also calls get().
    while (mCarriedOverMarkers && !mCarriedOverMarkers->empty()) {
        auto [marker, time] = mCarriedOverMarkers->pop();
        pushGroupMarker(marker.c_str(), time);
    }

-    return *mCurrent;
+    return *currentbuf;
 }

 bool VulkanCommands::flush() {
    // It's perfectly fine to call flush when no commands have been written.
-    if (mCurrent == nullptr) {
+    if (mCurrentCommandBufferIndex < 0) {
        return false;
    }

-    const int64_t index = mCurrent - &mStorage[0];
-    VkSemaphore renderingFinished = mSubmissionSignals[index];
+    int8_t const index = mCurrentCommandBufferIndex;
+    VulkanCommandBuffer const* currentbuf = mStorage[index].get();
+    VkSemaphore const renderingFinished = mSubmissionSignals[index];

-    vkEndCommandBuffer(mCurrent->cmdbuffer);
+    vkEndCommandBuffer(currentbuf->cmdbuffer);

    // If the injected semaphore is an "image available" semaphore that has not yet been signaled,
    // it is sometimes fine to start executing commands anyway, as along as we stall the GPU at the
@@ -227,7 +245,7 @@ bool VulkanCommands::flush() {
        .pWaitSemaphores = signals,
        .pWaitDstStageMask = waitDestStageMasks,
        .commandBufferCount = 1,
-        .pCommandBuffers = &mCurrent->cmdbuffer,
+        .pCommandBuffers = &currentbuf->cmdbuffer,
        .signalSemaphoreCount = 1u,
        .pSignalSemaphores = &renderingFinished,
    };
@@ -240,13 +258,18 @@ bool VulkanCommands::flush() {
        signals[submitInfo.waitSemaphoreCount++] = mInjectedSignal;
    }

-    if (FILAMENT_VULKAN_VERBOSE) {
-        slog.i << "Submitting cmdbuffer=" << mCurrent->cmdbuffer
-            << " wait=(" << signals[0] << ", " << signals[1] << ") "
-            << " signal=" << renderingFinished
-            << io::endl;
+    // To address a validation warning.
+    if (submitInfo.waitSemaphoreCount == 0) {
+        submitInfo.pWaitSemaphores = nullptr;
    }

+#if FILAMENT_VULKAN_VERBOSE
+    slog.i << "Submitting cmdbuffer=" << currentbuf->cmdbuffer
+        << " wait=(" << signals[0] << ", " << signals[1] << ") "
+        << " signal=" << renderingFinished
+        << io::endl;
+#endif
+
    // Before actually submitting, we need to pop any leftover group markers.
    while (mGroupMarkers && !mGroupMarkers->empty()) {
        if (!mCarriedOverMarkers) {
@@ -258,44 +281,51 @@ bool VulkanCommands::flush() {
        popGroupMarker();
    }

-    auto& cmdfence = mCurrent->fence;
+    auto& cmdfence = currentbuf->fence;
    std::unique_lock<utils::Mutex> lock(cmdfence->mutex);
    cmdfence->status.store(VK_NOT_READY);
    UTILS_UNUSED_IN_RELEASE VkResult result = vkQueueSubmit(mQueue, 1, &submitInfo, cmdfence->fence);
    cmdfence->condition.notify_all();
    lock.unlock();

+#if FILAMENT_VULKAN_VERBOSE
+    if (result != VK_SUCCESS) {
+        utils::slog.d << "Failed command buffer submission result: " << result << utils::io::endl;
+    }
+#endif
    assert_invariant(result == VK_SUCCESS);

    mSubmissionSignal = renderingFinished;
    mInjectedSignal = VK_NULL_HANDLE;
-    mCurrent = nullptr;
+    mCurrentCommandBufferIndex = -1;
    return true;
 }

 VkSemaphore VulkanCommands::acquireFinishedSignal() {
    VkSemaphore semaphore = mSubmissionSignal;
    mSubmissionSignal = VK_NULL_HANDLE;
-    if (FILAMENT_VULKAN_VERBOSE) {
-        slog.i << "Acquiring " << semaphore << " (e.g. for vkQueuePresentKHR)" << io::endl;
-    }
+#if FILAMENT_VULKAN_VERBOSE
+    slog.i << "Acquiring " << semaphore << " (e.g. for vkQueuePresentKHR)" << io::endl;
+#endif
    return semaphore;
 }

 void VulkanCommands::injectDependency(VkSemaphore next) {
    assert_invariant(mInjectedSignal == VK_NULL_HANDLE);
    mInjectedSignal = next;
-    if (FILAMENT_VULKAN_VERBOSE) {
-        slog.i << "Injecting " << next << " (e.g. due to vkAcquireNextImageKHR)" << io::endl;
-    }
+#if FILAMENT_VULKAN_VERBOSE
+    slog.i << "Injecting " << next << " (e.g. due to vkAcquireNextImageKHR)" << io::endl;
+#endif
 }

 void VulkanCommands::wait() {
    VkFence fences[CAPACITY];
-    uint32_t count = 0;
-    for (auto& wrapper : mStorage) {
-        if (wrapper.cmdbuffer != VK_NULL_HANDLE && mCurrent != &wrapper) {
-            fences[count++] = wrapper.fence->fence;
+    size_t count = 0;
+    for (size_t i = 0; i < CAPACITY; i++) {
+        auto wrapper = mStorage[i].get();
+        if (wrapper->cmdbuffer != VK_NULL_HANDLE
+                && mCurrentCommandBufferIndex != static_cast<int8_t>(i)) {
+            fences[count++] = wrapper->fence->fence;
        }
    }
    if (count > 0) {
@@ -304,26 +334,34 @@ void VulkanCommands::wait() {
 }

 void VulkanCommands::gc() {
-    for (auto& wrapper : mStorage) {
-        if (wrapper.cmdbuffer != VK_NULL_HANDLE) {
-            uint64_t const timeout = wrapper.blockOnGC ? UINT64_MAX : 0;
-            VkResult const result
-                    = vkWaitForFences(mDevice, 1, &wrapper.fence->fence, VK_TRUE, timeout);
-            if (result == VK_SUCCESS) {
-                vkFreeCommandBuffers(mDevice, mPool, 1, &wrapper.cmdbuffer);
-                wrapper.cmdbuffer = VK_NULL_HANDLE;
-                wrapper.fence->status.store(VK_SUCCESS);
-                wrapper.fence.reset();
-                ++mAvailableCount;
-            }
+    VkCommandBuffer buffers[CAPACITY];
+    size_t count = 0;
+    for (size_t i = 0; i < CAPACITY; i++) {
+        auto wrapper = mStorage[i].get();
+        if (wrapper->cmdbuffer == VK_NULL_HANDLE) {
+            continue;
        }
+        VkResult const result = vkWaitForFences(mDevice, 1, &wrapper->fence->fence, VK_TRUE, 0);
+        if (result != VK_SUCCESS) {
+            continue;
+        }
+        buffers[count++] = wrapper->cmdbuffer;
+        wrapper->cmdbuffer = VK_NULL_HANDLE;
+        wrapper->fence->status.store(VK_SUCCESS);
+        wrapper->fence.reset();
+        wrapper->clearResources();
+        ++mAvailableCount;
+    }
+    if (count > 0) {
+        vkFreeCommandBuffers(mDevice, mPool, count, buffers);
    }
 }

 void VulkanCommands::updateFences() {
-    for (auto& wrapper : mStorage) {
-        if (wrapper.cmdbuffer != VK_NULL_HANDLE) {
-            VulkanCmdFence* fence = wrapper.fence.get();
+    for (size_t i = 0; i < CAPACITY; i++) {
+        auto wrapper = mStorage[i].get();
+        if (wrapper->cmdbuffer != VK_NULL_HANDLE) {
+            VulkanCmdFence* fence = wrapper->fence.get();
            if (fence) {
                VkResult status = vkGetFenceStatus(mDevice, fence->fence);
                // This is either VK_SUCCESS, VK_NOT_READY, or VK_ERROR_DEVICE_LOST.
@@ -389,8 +427,9 @@ void VulkanCommands::popGroupMarker() {
        }
    } else if (mCarriedOverMarkers && !mCarriedOverMarkers->empty()) {
        // It could be that pop is called between flush() and get() (new command buffer), in which
-        // case the marker is in "carried over" state. We'd just remove that
-        mCarriedOverMarkers->pop();
+        // case the marker is in the "carried over" state and we just remove it. Because
+        // mCarriedOverMarkers holds the markers in reverse order, we pop the bottom, not the top.
+        mCarriedOverMarkers->pop_bottom();
    }
 }

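Reviewer note on the marker ordering: the switch from `std::stack` to `std::list` exists so that a carried-over marker can be removed from the opposite end. The sketch below is a simplified illustration, not the Filament implementation. `MarkerStackSketch` and `carryOver` are hypothetical names, timestamps are left out, and `carryOver` only approximates what the loop in `flush()` appears to do with leftover markers.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for VulkanGroupMarkers, without timestamps.
class MarkerStackSketch {
public:
    void push(std::string const& marker) { mMarkers.push_back(marker); }

    // Removes and returns the most recently pushed marker.
    std::string pop() {
        assert(!mMarkers.empty());
        std::string marker = mMarkers.back();
        mMarkers.pop_back();
        return marker;
    }

    // Removes and returns the oldest marker still stored.
    std::string pop_bottom() {
        assert(!mMarkers.empty());
        std::string marker = mMarkers.front();
        mMarkers.pop_front();
        return marker;
    }

    bool empty() const { return mMarkers.empty(); }

private:
    std::list<std::string> mMarkers;
};

// Moves every marker from `active` to `carried`. Because markers are taken
// from the top of `active`, they end up in reverse order in `carried`.
inline void carryOver(MarkerStackSketch& active, MarkerStackSketch& carried) {
    while (!active.empty()) {
        carried.push(active.pop());
    }
}
```

After `carryOver`, the innermost (most recently opened) marker sits at the bottom of the carried-over container, which is why a pop that arrives between `flush()` and the next `get()` has to take the bottom element.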
--- a/filament/backend/src/vulkan/VulkanCommands.h
+++ b/filament/backend/src/vulkan/VulkanCommands.h
@@ -19,15 +19,19 @@

 #include <bluevk/BlueVK.h>

+#include "DriverBase.h"
+
 #include "VulkanConstants.h"
+#include "VulkanResources.h"

 #include <utils/Condition.h>
+#include <utils/FixedCapacityVector.h>
 #include <utils/Mutex.h>

 #include <atomic>

 #include <chrono>
-#include <stack>
+#include <list>
 #include <string>
 #include <utility>

@@ -41,13 +45,14 @@ public:

    void push(std::string const& marker, Timestamp start = {}) noexcept;
    std::pair<std::string, Timestamp> pop() noexcept;
+    std::pair<std::string, Timestamp> pop_bottom() noexcept;
    std::pair<std::string, Timestamp> top() const;
    bool empty() const noexcept;

 private:
-    std::stack<std::string> mMarkers;
+    std::list<std::string> mMarkers;
 #if FILAMENT_VULKAN_VERBOSE
-    std::stack<Timestamp> mTimestamps;
+    std::list<Timestamp> mTimestamps;
 #endif
 };

@@ -66,12 +71,28 @@ struct VulkanCmdFence {
 // DriverApi fence object and should not be destroyed until both the DriverApi object is freed and
 // we're done waiting on the most recent submission of the given command buffer.
 struct VulkanCommandBuffer {
-    VulkanCommandBuffer() {}
+    VulkanCommandBuffer(VulkanResourceAllocator* allocator)
+        : mResourceManager(allocator) {}
+
    VulkanCommandBuffer(VulkanCommandBuffer const&) = delete;
    VulkanCommandBuffer& operator=(VulkanCommandBuffer const&) = delete;
    VkCommandBuffer cmdbuffer = VK_NULL_HANDLE;
    std::shared_ptr<VulkanCmdFence> fence;
-    bool blockOnGC = false;
+
+    inline void acquire(VulkanResource* resource) {
+        mResourceManager.acquire(resource);
+    }
+
+    inline void acquire(VulkanAcquireOnlyResourceManager* srcResources) {
+        mResourceManager.acquire(srcResources);
+    }
+
+    inline void clearResources() {
+        mResourceManager.clear();
+    }
+
+private:
+    VulkanAcquireOnlyResourceManager mResourceManager;
 };

 // Allows classes to be notified after a new command buffer has been activated.
@@ -110,14 +131,11 @@ public:
 class VulkanCommands {
    public:
        VulkanCommands(VkDevice device, VkQueue queue, uint32_t queueFamilyIndex,
-                VulkanContext* context);
+                VulkanContext* context, VulkanResourceAllocator* allocator);
        ~VulkanCommands();

        // Creates a "current" command buffer if none exists, otherwise returns the current one.
-        // `blockOnGC` guarrantees that this buffer will be waited on when gc() is called on it so
-        // that dependent resources can be gc'd safetly after the buffer is sumbitted, completed,
-        // and gc'd.
-        VulkanCommandBuffer const& get(bool blockOnGC = false);
+        VulkanCommandBuffer& get();

        // Submits the current command buffer if it exists, then sets "current" to null.
        // If there are no outstanding commands then nothing happens and this returns false.
@@ -160,10 +178,12 @@ class VulkanCommands {
        VkCommandPool const mPool;
        VulkanContext const* mContext;

-        VulkanCommandBuffer* mCurrent = nullptr;
+        // int8 only goes up to 127, therefore capacity must be less than that.
+        static_assert(CAPACITY < 128);
+        int8_t mCurrentCommandBufferIndex = -1;
        VkSemaphore mSubmissionSignal = {};
        VkSemaphore mInjectedSignal = {};
-        VulkanCommandBuffer mStorage[CAPACITY] = {};
+        utils::FixedCapacityVector<std::unique_ptr<VulkanCommandBuffer>> mStorage;
        VkSemaphore mSubmissionSignals[CAPACITY] = {};
        size_t mAvailableCount = CAPACITY;
        CommandBufferObserver* mObserver = nullptr;
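Reviewer note on resource lifetime: with `blockOnGC` gone, each command buffer now holds on to the resources it uses through `acquire()` and lets go of them in `clearResources()`, which `gc()` calls once the buffer's fence has signaled. The sketch below shows one way such tracking could work. It is an assumption-laden illustration only: `ResourceSketch` and `CommandBufferSketch` are hypothetical, and the real `VulkanResource` and `VulkanAcquireOnlyResourceManager` are not shown in this part of the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical ref-counted resource; the real VulkanResource is not shown here.
struct ResourceSketch {
    int refCount = 0;
    bool destroyed = false;

    void ref() { ++refCount; }

    // Marks the resource destroyed when the last reference goes away.
    void unref() {
        assert(refCount > 0);
        if (--refCount == 0) {
            destroyed = true;
        }
    }
};

// Hypothetical stand-in for the per-command-buffer resource manager.
class CommandBufferSketch {
public:
    // Takes one reference per distinct resource, however often it is acquired.
    void acquire(ResourceSketch* resource) {
        if (resource && mHeld.insert(resource).second) {
            resource->ref();
        }
    }

    // Intended to run after the GPU has finished with this command buffer.
    void clearResources() {
        for (ResourceSketch* resource : mHeld) {
            resource->unref();
        }
        mHeld.clear();
    }

    std::size_t heldCount() const { return mHeld.size(); }

private:
    std::unordered_set<ResourceSketch*> mHeld;
};
```

Under these assumptions a texture released by its owner while a blit is still in flight stays alive until the command buffer that recorded the blit is collected, without the fixed frame countdown the removed `VulkanDisposer` relied on.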
--- a/filament/backend/src/vulkan/VulkanContext.h
+++ b/filament/backend/src/vulkan/VulkanContext.h
@@ -97,8 +97,7 @@ public:
            }
            flags >>= 1;
        }
-        ASSERT_POSTCONDITION(false, "Unable to find a memory type that meets requirements.");
-        return (uint32_t) ~0ul;
+        return (uint32_t) VK_MAX_MEMORY_TYPES;
    }

    inline VkFormat getDepthFormat() const {
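Reviewer note on the new return value: the memory type lookup no longer asserts when nothing matches and instead returns `VK_MAX_MEMORY_TYPES`, so callers presumably have to compare the result against that sentinel. A minimal, Vulkan-free sketch of the pattern follows. `selectMemoryTypeSketch` and `kMaxMemoryTypes` are hypothetical names, and the plain flag array stands in for `VkPhysicalDeviceMemoryProperties`.

```cpp
#include <array>
#include <cstdint>

// Same value as VK_MAX_MEMORY_TYPES in the Vulkan headers.
constexpr uint32_t kMaxMemoryTypes = 32;

// Returns the index of the first memory type that is allowed by `typeBits`
// and whose property flags contain every bit in `required`. Returns
// kMaxMemoryTypes when no type qualifies, so the caller must check for it.
inline uint32_t selectMemoryTypeSketch(
        std::array<uint32_t, kMaxMemoryTypes> const& propertyFlags,
        uint32_t typeBits, uint32_t required) {
    for (uint32_t i = 0; i < kMaxMemoryTypes; ++i) {
        if ((typeBits & 1u) && (propertyFlags[i] & required) == required) {
            return i;
        }
        typeBits >>= 1;
    }
    return kMaxMemoryTypes;
}
```

Returning an out-of-range index rather than aborting lets a caller fall back to a different set of required flags before giving up.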
--- a/filament/backend/src/vulkan/VulkanDisposer.cpp
+++ b/filament/backend/src/vulkan/VulkanDisposer.cpp
@@ -1,97 +0,0 @@
-/*
- * Copyright (C) 2019 The Android Open Source Project
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *      http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#include "VulkanDisposer.h"
-#include "VulkanConstants.h"
-
-#include <utils/debug.h>
-#include <utils/Log.h>
-
-namespace filament::backend {
-
-// Always wait at least 3 frames after a DriverAPI-level resource has been destroyed for safe
-// destruction, due to potential usage by outstanding command buffers and triple buffering.
-static constexpr uint32_t FRAMES_BEFORE_EVICTION = VK_MAX_COMMAND_BUFFERS;
-
-void VulkanDisposer::createDisposable(Key resource, std::function<void()> destructor) noexcept {
-    mDisposables[resource].destructor = destructor;
-}
-
-void VulkanDisposer::removeReference(Key resource) noexcept {
-    // Null can be passed in as a no-op, this is not an error.
-    if (resource == nullptr) {
-        return;
-    }
-    assert_invariant(mDisposables[resource].refcount > 0);
-    --mDisposables[resource].refcount;
-}
-
-void VulkanDisposer::acquire(Key resource) noexcept {
-    // It's fine to "acquire" a non-managed resource, it's just a no-op.
-    if (resource == nullptr) {
-        return;
-    }
-    auto iter = mDisposables.find(resource);
-    if (iter == mDisposables.end()) {
-        return;
-    }
-    Disposable& disposable = iter.value();
-    assert_invariant(disposable.refcount > 0 && disposable.refcount < 65535);
-
-    // If an auto-decrement is already in place, do not increase the ref count.
-    if (disposable.remainingFrames == 0) {
-        ++disposable.refcount;
-    }
-
-    disposable.remainingFrames = FRAMES_BEFORE_EVICTION;
-}
-
-void VulkanDisposer::gc() noexcept {
-    // First decrement the frame count of all resources that were held by a command buffer.
-    // If any of these reaches zero, decrement its reference count.
-    for (auto iter = mDisposables.begin(); iter != mDisposables.end(); ++iter) {
-        Disposable& disposable = iter.value();
-        if (disposable.refcount > 0 && disposable.remainingFrames > 0) {
-            if (--disposable.remainingFrames == 0) {
-                removeReference(iter.key());
-            }
-        }
-    }
-
-    // Next, destroy all resources with a zero refcount.
-    decltype(mDisposables) disposables;
-    for (auto iter : mDisposables) {
-        Disposable& disposable = iter.second;
-        if (disposable.refcount == 0) {
-            disposable.destructor();
-        } else {
-            disposables.insert({iter.first, disposable});
-        }
-    }
-    disposables.swap(mDisposables);
-}
-
-void VulkanDisposer::terminate() noexcept {
-#ifndef NDEBUG
-    utils::slog.i << mDisposables.size() << " disposables are outstanding." << utils::io::endl;
-#endif
-    for (auto iter : mDisposables) {
-        iter.second.destructor();
-    }
-    mDisposables.clear();
-}
-
-} // namespace filament::backend
--- a/filament/backend/src/vulkan/VulkanDisposer.h
+++ b/filament/backend/src/vulkan/VulkanDisposer.h
@@ -1,60 +0,0 @@
-/*
- * Copyright (C) 2019 The Android Open Source Project
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *      http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#ifndef TNT_FILAMENT_BACKEND_VULKANDISPOSER_H
-#define TNT_FILAMENT_BACKEND_VULKANDISPOSER_H
-
-#include <tsl/robin_map.h>
-
-#include <functional>
-
-namespace filament::backend {
-
-// VulkanDisposer tracks resources (such as textures or vertex buffers) that need deferred
-// destruction due to potential use by a Vulkan command buffer. Resources are represented with void*
-// to allow callers to use any type of handle.
-class VulkanDisposer {
-public:
-    using Key = const void*;
-
-    // Adds the given resource to the disposer and sets its reference count to 1.
-    void createDisposable(Key resource, std::function<void()> destructor) noexcept;
-
-    // Decrements the reference count.
-    void removeReference(Key resource) noexcept;
-
-    // Increments the reference count and auto-decrements it after FRAMES_BEFORE_EVICTION frames.
-    // This is used to indicate that the current command buffer has a reference to the resource.
-    void acquire(Key resource) noexcept;
-
-    // Invokes the destructor function for each disposable with a 0 refcount.
-    void gc() noexcept;
-
-    // Invokes the destructor function for all disposables, regardless of reference count.
-    void terminate() noexcept;
-
-private:
-    struct Disposable {
-        uint16_t refcount = 1;
-        uint16_t remainingFrames = 0;
-        std::function<void()> destructor = []() {};
-    };
-    tsl::robin_map<Key, Disposable> mDisposables;
-};
-
-} // namespace filament::backend
-
-#endif // TNT_FILAMENT_BACKEND_VULKANDISPOSER_H
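
Review note: the removed VulkanDisposer implemented deferred destruction with a per-resource reference count plus a frame countdown. The sketch below is a simplified illustration of that scheme, not the removed code. It uses `std::unordered_map` in place of `tsl::robin_map`, and the names `DeferredDisposer` and `kFramesBeforeEviction` (including its value of 3) are hypothetical. The behavior of `acquire()` on a resource that is already acquired is an assumption, since its body is not shown in this part of the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the real eviction constant, whose value is not shown here.
constexpr uint16_t kFramesBeforeEviction = 3;

class DeferredDisposer {
public:
    using Key = const void*;

    // Registers a resource; Disposable starts with a reference count of 1.
    void createDisposable(Key resource, std::function<void()> destructor) {
        mDisposables[resource].destructor = std::move(destructor);
    }

    // Drops one reference. The destructor runs on a later gc(), never here.
    void removeReference(Key resource) {
        auto it = mDisposables.find(resource);
        assert(it != mDisposables.end() && it->second.refcount > 0);
        --it->second.refcount;
    }

    // Marks the resource as used by in-flight work. Assumption: a resource that is
    // already acquired only has its countdown reset and gains no extra reference.
    void acquire(Key resource) {
        auto it = mDisposables.find(resource);
        assert(it != mDisposables.end());
        Disposable& d = it->second;
        if (d.remainingFrames == 0) {
            ++d.refcount;
        }
        d.remainingFrames = kFramesBeforeEviction;
    }

    // Call once per frame: age acquired resources, then destroy unreferenced ones.
    void gc() {
        for (auto& entry : mDisposables) {
            Disposable& d = entry.second;
            if (d.refcount > 0 && d.remainingFrames > 0 && --d.remainingFrames == 0) {
                --d.refcount;  // the implicit reference taken by acquire() expires
            }
        }
        for (auto it = mDisposables.begin(); it != mDisposables.end();) {
            if (it->second.refcount == 0) {
                it->second.destructor();
                it = mDisposables.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t size() const { return mDisposables.size(); }

private:
    struct Disposable {
        uint16_t refcount = 1;
        uint16_t remainingFrames = 0;
        std::function<void()> destructor = [] {};
    };
    std::unordered_map<Key, Disposable> mDisposables;
};
```

In this sketch, a resource whose owner has already called `removeReference()` stays alive until `gc()` has run `kFramesBeforeEviction` times after the last `acquire()`. That delay is what gives a command buffer that still references the resource time to finish.
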
--- a/filament/backend/src/vulkan/VulkanDriver.cpp
+++ b/filament/backend/src/vulkan/VulkanDriver.cpp
--- a/filament/backend/src/vulkan/VulkanDriver.h
+++ b/filament/backend/src/vulkan/VulkanDriver.h
@@ -17,22 +17,23 @@
 #ifndef TNT_FILAMENT_BACKEND_VULKANDRIVER_H
 #define TNT_FILAMENT_BACKEND_VULKANDRIVER_H

-#include "VulkanPipelineCache.h"
 #include "VulkanBlitter.h"
-#include "VulkanDisposer.h"
 #include "VulkanConstants.h"
 #include "VulkanContext.h"
 #include "VulkanFboCache.h"
+#include "VulkanHandles.h"
+#include "VulkanPipelineCache.h"
+#include "VulkanReadPixels.h"
+#include "VulkanResourceAllocator.h"
 #include "VulkanSamplerCache.h"
 #include "VulkanStagePool.h"
 #include "VulkanUtility.h"

-#include "private/backend/Driver.h"
-#include "private/backend/HandleAllocator.h"
 #include "DriverBase.h"
+#include "private/backend/Driver.h"

-#include <utils/compiler.h>
 #include <utils/Allocator.h>
+#include <utils/compiler.h>

 namespace filament::backend {

@@ -45,8 +46,8 @@ public:
            Platform::DriverConfig const& driverConfig) noexcept;

 private:
-
-    void debugCommandBegin(CommandStream* cmds, bool synchronous, const char* methodName) noexcept override;
+    void debugCommandBegin(CommandStream* cmds, bool synchronous,
+            const char* methodName) noexcept override;

    inline VulkanDriver(VulkanPlatform* platform, VulkanContext const& context,
            Platform::DriverConfig const& driverConfig) noexcept;
@@ -60,77 +61,24 @@ private:
    template<typename T>
    friend class ConcreteDispatcher;

-#define DECL_DRIVER_API(methodName, paramsDecl, params) \
+#define DECL_DRIVER_API(methodName, paramsDecl, params)                                            \
    UTILS_ALWAYS_INLINE inline void methodName(paramsDecl);

-#define DECL_DRIVER_API_SYNCHRONOUS(RetType, methodName, paramsDecl, params) \
+#define DECL_DRIVER_API_SYNCHRONOUS(RetType, methodName, paramsDecl, params)                       \
    RetType methodName(paramsDecl) override;

-#define DECL_DRIVER_API_RETURN(RetType, methodName, paramsDecl, params) \
-    RetType methodName##S() noexcept override; \
+#define DECL_DRIVER_API_RETURN(RetType, methodName, paramsDecl, params)                            \
+    RetType methodName##S() noexcept override;                                                     \
    UTILS_ALWAYS_INLINE inline void methodName##R(RetType, paramsDecl);

 #include "private/backend/DriverAPI.inc"

    VulkanDriver(VulkanDriver const&) = delete;
-    VulkanDriver& operator = (VulkanDriver const&) = delete;
+    VulkanDriver& operator=(VulkanDriver const&) = delete;

 private:
-
-    template<typename D, typename ... ARGS>
-    Handle<D> initHandle(ARGS&& ... args) noexcept {
-        return mHandleAllocator.allocateAndConstruct<D>(std::forward<ARGS>(args) ...);
-    }
-
-    template<typename D>
-    Handle<D> allocHandle() noexcept {
-        return mHandleAllocator.allocate<D>();
-    }
-
-    template<typename D, typename B, typename ... ARGS>
-    typename std::enable_if<std::is_base_of<B, D>::value, D>::type*
-    construct(Handle<B> const& handle, ARGS&& ... args) noexcept {
-        return mHandleAllocator.construct<D, B>(handle, std::forward<ARGS>(args) ...);
-    }
-
-    template<typename B, typename D,
-            typename = typename std::enable_if<std::is_base_of<B, D>::value, D>::type>
-    void destruct(Handle<B> handle, D const* p) noexcept {
-        return mHandleAllocator.deallocate(handle, p);
-    }
-
-    template<typename Dp, typename B>
-    typename std::enable_if_t<
-            std::is_pointer_v<Dp> &&
-            std::is_base_of_v<B, typename std::remove_pointer_t<Dp>>, Dp>
-    handle_cast(Handle<B>& handle) noexcept {
-        return mHandleAllocator.handle_cast<Dp, B>(handle);
-    }
-
-    template<typename Dp, typename B>
-    inline typename std::enable_if_t<
-            std::is_pointer_v<Dp> &&
-            std::is_base_of_v<B, typename std::remove_pointer_t<Dp>>, Dp>
-    handle_cast(Handle<B> const& handle) noexcept {
-        return mHandleAllocator.handle_cast<Dp, B>(handle);
-    }
-
-    template<typename D, typename B>
-    void destruct(Handle<B> handle) noexcept {
-        destruct(handle, handle_cast<D const*>(handle));
-    }
-
-    // This version of destruct takes a VulkanContext and calls a terminate(VulkanContext&)
-    // on the handle before calling the dtor
-    template<typename Dp, typename B>
-    void destructBuffer(Handle<B> handle) noexcept {
-        auto ptr = handle_cast<Dp*>(handle);
-        ptr->terminate();
-        mHandleAllocator.deallocate(handle, ptr);
-    }
-
-    inline void setRenderPrimitiveBuffer(Handle<HwRenderPrimitive> rph,
-            Handle<HwVertexBuffer> vbh, Handle<HwIndexBuffer> ibh);
+    inline void setRenderPrimitiveBuffer(Handle<HwRenderPrimitive> rph, Handle<HwVertexBuffer> vbh,
+            Handle<HwIndexBuffer> ibh);

    inline void setRenderPrimitiveRange(Handle<HwRenderPrimitive> rph, PrimitiveType pt,
            uint32_t offset, uint32_t minIndex, uint32_t maxIndex, uint32_t count);
@@ -150,14 +98,20 @@ private:
    VkDebugUtilsMessengerEXT mDebugMessenger = VK_NULL_HANDLE;

    VulkanContext mContext = {};
-    HandleAllocatorVK mHandleAllocator;
+    VulkanResourceAllocator mResourceAllocator;
+    VulkanResourceManager mResourceManager;
+
+    // Used for resources that are created synchronously and used and destroyed on the backend
+    // thread.
+    VulkanThreadSafeResourceManager mThreadSafeResourceManager;
+
    VulkanPipelineCache mPipelineCache;
-    VulkanDisposer mDisposer;
    VulkanStagePool mStagePool;
    VulkanFboCache mFramebufferCache;
    VulkanSamplerCache mSamplerCache;
    VulkanBlitter mBlitter;
    VulkanSamplerGroup* mSamplerBindings[VulkanPipelineCache::SAMPLER_BINDING_COUNT] = {};
+    VulkanReadPixels mReadPixels;
 };

 } // namespace filament::backend
--- a/filament/backend/src/vulkan/VulkanHandles.cpp
+++ b/filament/backend/src/vulkan/VulkanHandles.cpp
@@ -46,10 +46,12 @@ static void clampToFramebuffer(VkRect2D* rect, uint32_t fbWidth, uint32_t fbHeig
    rect->extent.height = std::max(top - y, 0);
 }

-VulkanProgram::VulkanProgram(VkDevice device, const Program& builder) noexcept :
-        HwProgram(builder.getName()), mDevice(device) {
+VulkanProgram::VulkanProgram(VkDevice device, const Program& builder) noexcept
+    : HwProgram(builder.getName()),
+      VulkanResource(VulkanResourceType::PROGRAM),
+      mDevice(device) {
    auto const& blobs = builder.getShadersSource();
-    VkShaderModule* modules[2] = { &bundle.vertex, &bundle.fragment };
+    VkShaderModule* modules[2] = {&bundle.vertex, &bundle.fragment};
    // TODO: handle compute shaders.
    for (size_t i = 0; i < 2; i++) {
        const auto& blob = blobs[i];
@@ -113,8 +115,9 @@ VulkanProgram::VulkanProgram(VkDevice device, const Program& builder) noexcept :
    }
 }

-VulkanProgram::VulkanProgram(VkDevice device, VkShaderModule vs, VkShaderModule fs) noexcept :
-        mDevice(device) {
+VulkanProgram::VulkanProgram(VkDevice device, VkShaderModule vs, VkShaderModule fs) noexcept
+    : VulkanResource(VulkanResourceType::PROGRAM),
+      mDevice(device) {
    bundle.vertex = vs;
    bundle.fragment = fs;
 }
@@ -126,7 +129,10 @@ VulkanProgram::~VulkanProgram() {
 }

 // Creates a special "default" render target (i.e. associated with the swap chain)
-VulkanRenderTarget::VulkanRenderTarget() : HwRenderTarget(0, 0), mOffscreen(false), mSamples(1) {}
+VulkanRenderTarget::VulkanRenderTarget() :
+    HwRenderTarget(0, 0),
+    VulkanResource(VulkanResourceType::RENDER_TARGET),
+    mOffscreen(false), mSamples(1) {}

 void VulkanRenderTarget::bindToSwapChain(VulkanSwapChain& swapChain) {
    assert_invariant(!mOffscreen);
@@ -138,11 +144,14 @@ void VulkanRenderTarget::bindToSwapChain(VulkanSwapChain& swapChain) {
 }

 VulkanRenderTarget::VulkanRenderTarget(VkDevice device, VkPhysicalDevice physicalDevice,
-        VulkanContext const& context, VmaAllocator allocator,
-        VulkanCommands* commands, uint32_t width, uint32_t height, uint8_t samples,
+        VulkanContext const& context, VmaAllocator allocator, VulkanCommands* commands,
+        uint32_t width, uint32_t height, uint8_t samples,
        VulkanAttachment color[MRT::MAX_SUPPORTED_RENDER_TARGET_COUNT],
        VulkanAttachment depthStencil[2], VulkanStagePool& stagePool)
-    : HwRenderTarget(width, height), mOffscreen(true), mSamples(samples) {
+    : HwRenderTarget(width, height),
+      VulkanResource(VulkanResourceType::RENDER_TARGET),
+      mOffscreen(true),
+      mSamples(samples) {
    for (int index = 0; index < MRT::MAX_SUPPORTED_RENDER_TARGET_COUNT; index++) {
        mColor[index] = color[index];
    }
@@ -166,10 +175,11 @@ VulkanRenderTarget::VulkanRenderTarget(VkDevice device, VkPhysicalDevice physica
        if (texture && texture->samples == 1) {
            auto msTexture = texture->getSidecar();
            if (UTILS_UNLIKELY(!msTexture)) {
-                msTexture = new VulkanTexture(device, physicalDevice, context,
-                        allocator, commands, texture->target,
-                        ((VulkanTexture const*) texture)->levels, texture->format, samples,
-                        texture->width, texture->height, texture->depth, texture->usage, stagePool);
+                // TODO: This should be allocated with the ResourceAllocator.
+                msTexture = new VulkanTexture(device, physicalDevice, context, allocator, commands,
+                        texture->target, ((VulkanTexture const*) texture)->levels, texture->format,
+                        samples, texture->width, texture->height, texture->depth, texture->usage,
+                        stagePool, true /* heap allocated */);
                texture->setSidecar(msTexture);
            }
            mMsaaAttachments[index] = {.texture = msTexture};
@@ -198,7 +208,7 @@ VulkanRenderTarget::VulkanRenderTarget(VkDevice device, VkPhysicalDevice physica
        msTexture = new VulkanTexture(device, physicalDevice, context, allocator,
                commands, depthTexture->target, msLevel, depthTexture->format, samples,
                depthTexture->width, depthTexture->height, depthTexture->depth, depthTexture->usage,
-                stagePool);
+                stagePool, true /* heap allocated */);
        depthTexture->setSidecar(msTexture);
    }

@@ -257,16 +267,23 @@ uint8_t VulkanRenderTarget::getColorTargetCount(const VulkanRenderPass& pass) co
 }

 VulkanVertexBuffer::VulkanVertexBuffer(VulkanContext& context, VulkanStagePool& stagePool,
-        uint8_t bufferCount, uint8_t attributeCount,
-        uint32_t elementCount, AttributeArray const& attribs) :
-        HwVertexBuffer(bufferCount, attributeCount, elementCount, attribs),
-        buffers(bufferCount, nullptr) {}
+        VulkanResourceAllocator* allocator, uint8_t bufferCount, uint8_t attributeCount,
+        uint32_t elementCount, AttributeArray const& attribs)
+    : HwVertexBuffer(bufferCount, attributeCount, elementCount, attribs),
+      VulkanResource(VulkanResourceType::VERTEX_BUFFER),
+      buffers(bufferCount, nullptr),
+      mResources(allocator) {}

-VulkanBufferObject::VulkanBufferObject(VmaAllocator allocator,
-        VulkanCommands* commands, VulkanStagePool& stagePool, uint32_t byteCount,
-        BufferObjectBinding bindingType, BufferUsage usage)
+void VulkanVertexBuffer::setBuffer(VulkanBufferObject* bufferObject, uint32_t index) {
+    buffers[index] = &bufferObject->buffer;
+    mResources.acquire(bufferObject);
+}
+
+VulkanBufferObject::VulkanBufferObject(VmaAllocator allocator, VulkanStagePool& stagePool,
+        uint32_t byteCount, BufferObjectBinding bindingType, BufferUsage usage)
    : HwBufferObject(byteCount),
-      buffer(allocator, commands, stagePool, getBufferObjectUsage(bindingType), byteCount),
+      VulkanResource(VulkanResourceType::BUFFER_OBJECT),
+      buffer(allocator, stagePool, getBufferObjectUsage(bindingType), byteCount),
      bindingType(bindingType) {}

 void VulkanRenderPrimitive::setPrimitiveType(PrimitiveType pt) {
@@ -294,10 +311,14 @@ void VulkanRenderPrimitive::setBuffers(VulkanVertexBuffer* vertexBuffer,
        VulkanIndexBuffer* indexBuffer) {
    this->vertexBuffer = vertexBuffer;
    this->indexBuffer = indexBuffer;
+    mResources.acquire(vertexBuffer);
+    mResources.acquire(indexBuffer);
 }

 VulkanTimerQuery::VulkanTimerQuery(std::tuple<uint32_t, uint32_t> indices)
-    : mStartingQueryIndex(std::get<0>(indices)), mStoppingQueryIndex(std::get<1>(indices)) {}
+    : VulkanThreadSafeResource(VulkanResourceType::TIMER_QUERY),
+      mStartingQueryIndex(std::get<0>(indices)),
+      mStoppingQueryIndex(std::get<1>(indices)) {}

 void VulkanTimerQuery::setFence(std::shared_ptr<VulkanCmdFence> fence) noexcept {
    std::unique_lock<utils::Mutex> lock(mFenceMutex);
--- a/filament/backend/src/vulkan/VulkanHandles.h
+++ b/filament/backend/src/vulkan/VulkanHandles.h
@@ -14,25 +14,28 @@
 * limitations under the License.
 */

- #ifndef TNT_FILAMENT_BACKEND_VULKANHANDLES_H
- #define TNT_FILAMENT_BACKEND_VULKANHANDLES_H
+#ifndef TNT_FILAMENT_BACKEND_VULKANHANDLES_H
+#define TNT_FILAMENT_BACKEND_VULKANHANDLES_H
+
+// This needs to be at the top
+#include "DriverBase.h"

-#include "VulkanDriver.h"
-#include "VulkanPipelineCache.h"
 #include "VulkanBuffer.h"
+#include "VulkanPipelineCache.h"
+#include "VulkanResources.h"
 #include "VulkanSwapChain.h"
 #include "VulkanTexture.h"
 #include "VulkanUtility.h"

 #include "private/backend/SamplerGroup.h"

-#include "utils/Mutex.h"
+#include <utils/Mutex.h>

 namespace filament::backend {

 class VulkanTimestamps;

-struct VulkanProgram : public HwProgram {
+struct VulkanProgram : public HwProgram, VulkanResource {
    VulkanProgram(VkDevice device, const Program& builder) noexcept;
    VulkanProgram(VkDevice device, VkShaderModule vs, VkShaderModule fs) noexcept;
    ~VulkanProgram();
@@ -51,7 +54,7 @@ private:
 //
 // We use private inheritance to shield clients from the width / height fields in HwRenderTarget,
 // which are not representative when this is the default render target.
-struct VulkanRenderTarget : private HwRenderTarget {
+struct VulkanRenderTarget : private HwRenderTarget, VulkanResource {
    // Creates an offscreen render target.
    VulkanRenderTarget(VkDevice device, VkPhysicalDevice physicalDevice,
            VulkanContext const& context, VmaAllocator allocator,
@@ -84,29 +87,43 @@ private:
    uint8_t mSamples : 7;
 };

-struct VulkanVertexBuffer : public HwVertexBuffer {
+struct VulkanBufferObject;
+
+struct VulkanVertexBuffer : public HwVertexBuffer, VulkanResource {
    VulkanVertexBuffer(VulkanContext& context, VulkanStagePool& stagePool,
-            uint8_t bufferCount, uint8_t attributeCount, uint32_t elementCount,
-            AttributeArray const& attributes);
+            VulkanResourceAllocator* allocator, uint8_t bufferCount, uint8_t attributeCount,
+            uint32_t elementCount, AttributeArray const& attributes);
+
+    void setBuffer(VulkanBufferObject* bufferObject, uint32_t index);
+
+    inline void terminate() {
+        mResources.clear();
+    }
+
    utils::FixedCapacityVector<VulkanBuffer const*> buffers;
+
+private:
+    FixedSizeVulkanResourceManager mResources;
 };

-struct VulkanIndexBuffer : public HwIndexBuffer {
-    VulkanIndexBuffer(VmaAllocator allocator, VulkanCommands* commands,
-            VulkanStagePool& stagePool, uint8_t elementSize, uint32_t indexCount)
+struct VulkanIndexBuffer : public HwIndexBuffer, VulkanResource {
+    VulkanIndexBuffer(VmaAllocator allocator, VulkanStagePool& stagePool, uint8_t elementSize,
+            uint32_t indexCount)
        : HwIndexBuffer(elementSize, indexCount),
-          buffer(allocator, commands, stagePool, VK_BUFFER_USAGE_INDEX_BUFFER_BIT,
-                  elementSize * indexCount),
+          VulkanResource(VulkanResourceType::INDEX_BUFFER),
+          buffer(allocator, stagePool, VK_BUFFER_USAGE_INDEX_BUFFER_BIT, elementSize * indexCount),
          indexType(elementSize == 2 ? VK_INDEX_TYPE_UINT16 : VK_INDEX_TYPE_UINT32) {}
-    void terminate() { buffer.terminate(); }
+
+    void terminate() {
+        buffer.terminate();
+    }
    VulkanBuffer buffer;
    const VkIndexType indexType;
 };

-struct VulkanBufferObject : public HwBufferObject {
-    VulkanBufferObject(VmaAllocator allocator, VulkanCommands* commands,
-            VulkanStagePool& stagePool, uint32_t byteCount, BufferObjectBinding bindingType,
-            BufferUsage usage);
+struct VulkanBufferObject : public HwBufferObject, VulkanResource {
+    VulkanBufferObject(VmaAllocator allocator, VulkanStagePool& stagePool, uint32_t byteCount,
+            BufferObjectBinding bindingType, BufferUsage usage);
    void terminate() {
        buffer.terminate();
    }
@@ -114,32 +131,45 @@ struct VulkanBufferObject : public HwBufferObject {
    const BufferObjectBinding bindingType;
 };

-struct VulkanSamplerGroup : public HwSamplerGroup {
+struct VulkanSamplerGroup : public HwSamplerGroup, VulkanResource {
    // NOTE: we have to use out-of-line allocation here because the size of a Handle<> is limited
-    std::unique_ptr<SamplerGroup> sb; // FIXME: this shouldn't depend on filament::SamplerGroup
-    explicit VulkanSamplerGroup(size_t size) noexcept : sb(new SamplerGroup(size)) { }
+    std::unique_ptr<SamplerGroup> sb;// FIXME: this shouldn't depend on filament::SamplerGroup
+    explicit VulkanSamplerGroup(size_t size) noexcept
+        : VulkanResource(VulkanResourceType::SAMPLER_GROUP),
+          sb(new SamplerGroup(size)) {}
 };

-struct VulkanRenderPrimitive : public HwRenderPrimitive {
+struct VulkanRenderPrimitive : public HwRenderPrimitive, VulkanResource {
+    VulkanRenderPrimitive(VulkanResourceAllocator* allocator)
+        : VulkanResource(VulkanResourceType::RENDER_PRIMITIVE),
+          mResources(allocator) {}
+
+    ~VulkanRenderPrimitive() {
+        mResources.clear();
+    }
+
    void setPrimitiveType(PrimitiveType pt);
    void setBuffers(VulkanVertexBuffer* vertexBuffer, VulkanIndexBuffer* indexBuffer);
    VulkanVertexBuffer* vertexBuffer = nullptr;
    VulkanIndexBuffer* indexBuffer = nullptr;
    VkPrimitiveTopology primitiveTopology;
+
+private:
+    FixedSizeVulkanResourceManager mResources;
 };

-struct VulkanFence : public HwFence {
-    explicit VulkanFence(const VulkanCommandBuffer& commands) : fence(commands.fence) {}
+struct VulkanFence : public HwFence, VulkanResource {
+    VulkanFence()
+        : VulkanResource(VulkanResourceType::FENCE) {}
+
+    explicit VulkanFence(std::shared_ptr<VulkanCmdFence> fence)
+        : VulkanResource(VulkanResourceType::FENCE),
+          fence(fence) {}
+
    std::shared_ptr<VulkanCmdFence> fence;
 };

-struct VulkanSync : public HwSync {
-    VulkanSync() = default;
-    explicit VulkanSync(const VulkanCommandBuffer& commands) : fence(commands.fence) {}
-    std::shared_ptr<VulkanCmdFence> fence;
-};
-
-struct VulkanTimerQuery : public HwTimerQuery {
+struct VulkanTimerQuery : public HwTimerQuery, VulkanThreadSafeResource {
    explicit VulkanTimerQuery(std::tuple<uint32_t, uint32_t> indices);
    ~VulkanTimerQuery();

--- a/filament/backend/src/vulkan/VulkanPipelineCache.cpp
+++ b/filament/backend/src/vulkan/VulkanPipelineCache.cpp
@@ -22,6 +22,7 @@

 #include "VulkanConstants.h"
 #include "VulkanHandles.h"
+#include "VulkanTexture.h"
 #include "VulkanUtility.h"

 // Vulkan functions often immediately dereference pointers, so it's fine to pass in a pointer
@@ -64,7 +65,10 @@ VulkanPipelineCache::getUsageFlags(uint16_t binding, ShaderStageFlags flags, Usa
    return src;
 }

-VulkanPipelineCache::VulkanPipelineCache() : mCurrentRasterState(createDefaultRasterState()) {
+VulkanPipelineCache::VulkanPipelineCache(VulkanResourceAllocator* allocator)
+    : mCurrentRasterState(createDefaultRasterState()),
+      mResourceAllocator(allocator),
+      mPipelineBoundResources(allocator) {
    mDummyBufferWriteInfo.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    mDummyBufferWriteInfo.pNext = nullptr;
    mDummyBufferWriteInfo.dstArrayElement = 0;
@@ -144,6 +148,15 @@ bool VulkanPipelineCache::bindDescriptors(VkCommandBuffer cmdbuffer) noexcept {

    cacheEntry->lastUsed = mCurrentTime;
    mBoundDescriptor = mDescriptorRequirements;
+    // This passes the currently "bound" uniform buffer objects to the pipeline that will be used
+    // in the draw call.
+    auto resourceEntry = mDescriptorResources.find(cacheEntry->id);
+    if (resourceEntry == mDescriptorResources.end()) {
+        mDescriptorResources[cacheEntry->id]
+                = std::make_unique<VulkanAcquireOnlyResourceManager>(mResourceAllocator);
+        resourceEntry = mDescriptorResources.find(cacheEntry->id);
+    }
+    resourceEntry->second->acquire(&mPipelineBoundResources);

    vkCmdBindDescriptorSets(cmdbuffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
            getOrCreatePipelineLayout()->handle, 0, VulkanPipelineCache::DESCRIPTOR_TYPE_COUNT,
@@ -152,7 +165,9 @@ bool VulkanPipelineCache::bindDescriptors(VkCommandBuffer cmdbuffer) noexcept {
    return true;
 }

-bool VulkanPipelineCache::bindPipeline(VkCommandBuffer cmdbuffer) noexcept {
+bool VulkanPipelineCache::bindPipeline(VulkanCommandBuffer* commands) noexcept {
+    VkCommandBuffer const cmdbuffer = commands->cmdbuffer;
+
    PipelineMap::iterator pipelineIter = mPipelines.find(mPipelineRequirements);

    // Check if the required pipeline is already bound.
@@ -191,7 +206,10 @@ void VulkanPipelineCache::bindScissor(VkCommandBuffer cmdbuffer, VkRect2D scisso
 VulkanPipelineCache::DescriptorCacheEntry* VulkanPipelineCache::createDescriptorSets() noexcept {
    PipelineLayoutCacheEntry* layoutCacheEntry = getOrCreatePipelineLayout();

-    DescriptorCacheEntry descriptorCacheEntry = { .pipelineLayout = mPipelineRequirements.layout };
+    DescriptorCacheEntry descriptorCacheEntry = {
+        .pipelineLayout = mPipelineRequirements.layout,
+        .id = mDescriptorCacheEntryCount++,
+    };

    // Each of the arenas for this particular layout are guaranteed to have the same size. Check
    // the first arena to see if any descriptor sets are available that can be re-claimed. If not,
@@ -234,9 +252,9 @@ VulkanPipelineCache::DescriptorCacheEntry* VulkanPipelineCache::createDescriptor
    // Rewrite every binding in the new descriptor sets.
    VkDescriptorBufferInfo descriptorBuffers[UBUFFER_BINDING_COUNT];
    VkDescriptorImageInfo descriptorSamplers[SAMPLER_BINDING_COUNT];
-    VkDescriptorImageInfo descriptorInputAttachments[TARGET_BINDING_COUNT];
+    VkDescriptorImageInfo descriptorInputAttachments[INPUT_ATTACHMENT_COUNT];
    VkWriteDescriptorSet descriptorWrites[UBUFFER_BINDING_COUNT + SAMPLER_BINDING_COUNT +
-            TARGET_BINDING_COUNT];
+            INPUT_ATTACHMENT_COUNT];
    uint32_t nwrites = 0;
    VkWriteDescriptorSet* writes = descriptorWrites;
    nwrites = 0;
@@ -286,9 +304,9 @@ VulkanPipelineCache::DescriptorCacheEntry* VulkanPipelineCache::createDescriptor
            writeInfo.dstBinding = binding;
        }
    }
-    for (uint32_t binding = 0; binding < TARGET_BINDING_COUNT; binding++) {
-        VkWriteDescriptorSet& writeInfo = writes[nwrites++];
+    for (uint32_t binding = 0; binding < INPUT_ATTACHMENT_COUNT; binding++) {
        if (mDescriptorRequirements.inputAttachments[binding].imageView) {
+            VkWriteDescriptorSet& writeInfo = writes[nwrites++];
            VkDescriptorImageInfo& imageInfo = descriptorInputAttachments[binding];
            imageInfo = mDescriptorRequirements.inputAttachments[binding];
            writeInfo.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
@@ -299,13 +317,11 @@ VulkanPipelineCache::DescriptorCacheEntry* VulkanPipelineCache::createDescriptor
            writeInfo.pImageInfo = &imageInfo;
            writeInfo.pBufferInfo = nullptr;
            writeInfo.pTexelBufferView = nullptr;
-        } else {
-            writeInfo = mDummyTargetWriteInfo;
-            assert_invariant(mDummyTargetInfo.imageView);
+            writeInfo.dstSet = descriptorCacheEntry.handles[2];
+            writeInfo.dstBinding = binding;
        }
-        writeInfo.dstSet = descriptorCacheEntry.handles[2];
-        writeInfo.dstBinding = binding;
    }
+
    vkUpdateDescriptorSets(mDevice, nwrites, writes, 0, nullptr);

    return &mDescriptorSets.emplace(mDescriptorRequirements, descriptorCacheEntry).first.value();
@@ -518,14 +534,14 @@ VulkanPipelineCache::PipelineLayoutCacheEntry* VulkanPipelineCache::getOrCreateP
    vkCreateDescriptorSetLayout(mDevice, &dlinfo, VKALLOC, &cacheEntry.descriptorSetLayouts[1]);

    // Next create the descriptor set layout for input attachments.
-    VkDescriptorSetLayoutBinding tbindings[TARGET_BINDING_COUNT];
+    VkDescriptorSetLayoutBinding tbindings[INPUT_ATTACHMENT_COUNT];
    binding.descriptorType = VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT;
    binding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;
-    for (uint32_t i = 0; i < TARGET_BINDING_COUNT; i++) {
+    for (uint32_t i = 0; i < INPUT_ATTACHMENT_COUNT; i++) {
        binding.binding = i;
        tbindings[i] = binding;
    }
-    dlinfo.bindingCount = TARGET_BINDING_COUNT;
+    dlinfo.bindingCount = INPUT_ATTACHMENT_COUNT;
    dlinfo.pBindings = tbindings;
    vkCreateDescriptorSetLayout(mDevice, &dlinfo, VKALLOC, &cacheEntry.descriptorSetLayouts[2]);

@@ -604,13 +620,19 @@ void VulkanPipelineCache::unbindImageView(VkImageView imageView) noexcept {
    }
 }

-void VulkanPipelineCache::bindUniformBuffer(uint32_t bindingIndex, VkBuffer uniformBuffer,
+void VulkanPipelineCache::bindUniformBufferObject(uint32_t bindingIndex,
+        VulkanBufferObject* bufferObject, VkDeviceSize offset, VkDeviceSize size) noexcept {
+    bindUniformBuffer(bindingIndex, bufferObject->buffer.getGpuBuffer(), offset, size);
+    mPipelineBoundResources.acquire(bufferObject);
+}
+
+void VulkanPipelineCache::bindUniformBuffer(uint32_t bindingIndex, VkBuffer buffer,
        VkDeviceSize offset, VkDeviceSize size) noexcept {
    ASSERT_POSTCONDITION(bindingIndex < UBUFFER_BINDING_COUNT,
-            "Uniform bindings overflow: index = %d, capacity = %d.",
-            bindingIndex, UBUFFER_BINDING_COUNT);
+            "Uniform bindings overflow: index = %d, capacity = %d.", bindingIndex,
+            UBUFFER_BINDING_COUNT);
    auto& key = mDescriptorRequirements;
-    key.uniformBuffers[bindingIndex] = uniformBuffer;
+    key.uniformBuffers[bindingIndex] = buffer;

    if (size == VK_WHOLE_SIZE) {
        size = WHOLE_SIZE;
@@ -624,18 +646,21 @@ void VulkanPipelineCache::bindUniformBuffer(uint32_t bindingIndex, VkBuffer unif
 }

 void VulkanPipelineCache::bindSamplers(VkDescriptorImageInfo samplers[SAMPLER_BINDING_COUNT],
-        UsageFlags flags) noexcept {
+        VulkanTexture* textures[SAMPLER_BINDING_COUNT], UsageFlags flags) noexcept {
    for (uint32_t bindingIndex = 0; bindingIndex < SAMPLER_BINDING_COUNT; bindingIndex++) {
        mDescriptorRequirements.samplers[bindingIndex] = samplers[bindingIndex];
+        if (textures[bindingIndex]) {
+            mPipelineBoundResources.acquire(textures[bindingIndex]);
+        }
    }
    mPipelineRequirements.layout = flags;
 }

 void VulkanPipelineCache::bindInputAttachment(uint32_t bindingIndex,
        VkDescriptorImageInfo targetInfo) noexcept {
-    ASSERT_POSTCONDITION(bindingIndex < TARGET_BINDING_COUNT,
+    ASSERT_POSTCONDITION(bindingIndex < INPUT_ATTACHMENT_COUNT,
            "Input attachment bindings overflow: index = %d, capacity = %d.",
-            bindingIndex, TARGET_BINDING_COUNT);
+            bindingIndex, INPUT_ATTACHMENT_COUNT);
    mDescriptorRequirements.inputAttachments[bindingIndex] = targetInfo;
 }

@@ -645,6 +670,7 @@ void VulkanPipelineCache::terminate() noexcept {
    for (auto& iter : mPipelines) {
        vkDestroyPipeline(mDevice, iter.second.handle, VKALLOC);
    }
+    mPipelineBoundResources.clear();
    mPipelines.clear();
    mBoundPipeline = {};
    vmaDestroyBuffer(mAllocator, mDummyBuffer, mDummyMemory);
@@ -678,6 +704,7 @@ void VulkanPipelineCache::onCommandBuffer(const VulkanCommandBuffer& cmdbuffer)
                arenas[i].push_back(cacheEntry.handles[i]);
            }
            ++mDescriptorArenasCount;
+            mDescriptorResources.erase(cacheEntry.id);
            iter = mDescriptorSets.erase(iter);
        } else {
            ++iter;
@@ -738,6 +765,10 @@ void VulkanPipelineCache::onCommandBuffer(const VulkanCommandBuffer& cmdbuffer)
            vkDestroyDescriptorPool(mDevice, pool, VKALLOC);
        }
        mExtinctDescriptorPools.clear();
+
+        for (auto const& entry : mExtinctDescriptorBundles) {
+            mDescriptorResources.erase(entry.id);
+        }
        mExtinctDescriptorBundles.clear();
    }
 }
@@ -757,7 +788,7 @@ VkDescriptorPool VulkanPipelineCache::createDescriptorPool(uint32_t size) const
    poolSizes[1].type = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    poolSizes[1].descriptorCount = poolInfo.maxSets * SAMPLER_BINDING_COUNT;
    poolSizes[2].type = VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT;
-    poolSizes[2].descriptorCount = poolInfo.maxSets * TARGET_BINDING_COUNT;
+    poolSizes[2].descriptorCount = poolInfo.maxSets * INPUT_ATTACHMENT_COUNT;

    VkDescriptorPool pool;
    const UTILS_UNUSED VkResult result = vkCreateDescriptorPool(mDevice, &poolInfo, VKALLOC, &pool);
@@ -800,6 +831,10 @@ void VulkanPipelineCache::destroyLayoutsAndDescriptors() noexcept {
    mExtinctDescriptorPools.clear();
    mExtinctDescriptorBundles.clear();

+    // Both mDescriptorSets and mExtinctDescriptorBundles have been cleared, so it's safe to call
+    // clear() on mDescriptorResources.
+    mDescriptorResources.clear();
+
    mBoundDescriptor = {};
 }

@@ -864,7 +899,7 @@ bool VulkanPipelineCache::DescEqual::operator()(const DescriptorKey& k1,
            return false;
        }
    }
-    for (uint32_t i = 0; i < TARGET_BINDING_COUNT; i++) {
+    for (uint32_t i = 0; i < INPUT_ATTACHMENT_COUNT; i++) {
        if (k1.inputAttachments[i].imageView != k2.inputAttachments[i].imageView ||
            k1.inputAttachments[i].imageLayout != k2.inputAttachments[i].imageLayout) {
            return false;
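
Review note: the VulkanPipelineCache.cpp changes tie the lifetime of bound buffers and textures to the descriptor-set cache entry that references them. Each entry gets an `id`, a per-id holder is created on first use in `bindDescriptors()`, and the holder is erased when the entry is evicted. The sketch below illustrates that pattern under stated assumptions. `std::shared_ptr` stands in for the backend's own reference counting, and `Resource`, `DescriptorResourceTracker`, and all method names are hypothetical. The real `VulkanAcquireOnlyResourceManager` is not shown in this part of the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical placeholder for a GPU resource such as a buffer object or texture.
struct Resource {};
using ResourceRef = std::shared_ptr<Resource>;

class DescriptorResourceTracker {
public:
    // Called when a resource is bound for upcoming draw calls.
    void bind(ResourceRef resource) {
        assert(resource != nullptr);
        mBound.push_back(std::move(resource));
    }

    // Called when a descriptor set is bound: the cache entry takes a reference to
    // everything currently bound. operator[] creates the holder on first use.
    void onBindDescriptors(uint32_t entryId) {
        std::vector<ResourceRef>& held = mHeld[entryId];
        held.insert(held.end(), mBound.begin(), mBound.end());
    }

    // Called when the descriptor set is evicted: its references are released.
    void onEvict(uint32_t entryId) { mHeld.erase(entryId); }

    // Drops the "currently bound" references, as terminate() does in the diff.
    void clearBound() { mBound.clear(); }

    std::size_t entryCount() const { return mHeld.size(); }

private:
    std::vector<ResourceRef> mBound;
    std::unordered_map<uint32_t, std::vector<ResourceRef>> mHeld;
};
```

Using `operator[]` (or `try_emplace`) performs the find-or-create step in one lookup, whereas the diff uses a find, an insert, and a second find. Whether that simplification suits the real container and value type is left to the author.
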
--- a/filament/backend/src/vulkan/VulkanPipelineCache.h
+++ b/filament/backend/src/vulkan/VulkanPipelineCache.h
@@ -28,9 +28,11 @@
 #include <utils/compiler.h>
 #include <utils/Hash.h>

+#include <list>
 #include <tsl/robin_map.h>
 #include <type_traits>
 #include <vector>
+#include <unordered_map>

 #include "VulkanCommands.h"

@@ -41,6 +43,9 @@ VK_DEFINE_HANDLE(VmaPool)
 namespace filament::backend {

 struct VulkanProgram;
+struct VulkanBufferObject;
+struct VulkanTexture;
+class VulkanResourceAllocator;

 // VulkanPipelineCache manages a cache of descriptor sets and pipelines.
 //
@@ -58,7 +63,11 @@ public:

    static constexpr uint32_t UBUFFER_BINDING_COUNT = Program::UNIFORM_BINDING_COUNT;
    static constexpr uint32_t SAMPLER_BINDING_COUNT = MAX_SAMPLER_COUNT;
-    static constexpr uint32_t TARGET_BINDING_COUNT = MRT::MAX_SUPPORTED_RENDER_TARGET_COUNT;
+
+    // We assume only one possible input attachment between two subpasses. See also the subpasses
+    // definition in VulkanFboCache.
+    static constexpr uint32_t INPUT_ATTACHMENT_COUNT = 1;
+
    static constexpr uint32_t SHADER_MODULE_COUNT = 2;
    static constexpr uint32_t VERTEX_ATTRIBUTE_COUNT = MAX_VERTEX_ATTRIBUTE_COUNT;

@@ -127,7 +136,7 @@ public:

    // Upon construction, the pipeCache initializes some internal state but does not make any Vulkan
    // calls. On destruction it will free any cached Vulkan objects that haven't already been freed.
-    VulkanPipelineCache();
+    VulkanPipelineCache(VulkanResourceAllocator* allocator);
    ~VulkanPipelineCache();
    void setDevice(VkDevice device, VmaAllocator allocator);

@@ -137,7 +146,7 @@ public:

    // Creates a new pipeline if necessary and binds it using vkCmdBindPipeline.
    // Returns false if an error occurred.
-    bool bindPipeline(VkCommandBuffer cmdbuffer) noexcept;
+    bool bindPipeline(VulkanCommandBuffer* commands) noexcept;

    // Sets up a new scissor rectangle if it has been dirtied.
    void bindScissor(VkCommandBuffer cmdbuffer, VkRect2D scissor) noexcept;
@@ -147,9 +156,12 @@ public:
    void bindRasterState(const RasterState& rasterState) noexcept;
    void bindRenderPass(VkRenderPass renderPass, int subpassIndex) noexcept;
    void bindPrimitiveTopology(VkPrimitiveTopology topology) noexcept;
-    void bindUniformBuffer(uint32_t bindingIndex, VkBuffer uniformBuffer,
+    void bindUniformBufferObject(uint32_t bindingIndex, VulkanBufferObject* bufferObject,
            VkDeviceSize offset = 0, VkDeviceSize size = VK_WHOLE_SIZE) noexcept;
-    void bindSamplers(VkDescriptorImageInfo samplers[SAMPLER_BINDING_COUNT], UsageFlags flags) noexcept;
+    void bindUniformBuffer(uint32_t bindingIndex, VkBuffer buffer,
+            VkDeviceSize offset = 0, VkDeviceSize size = VK_WHOLE_SIZE) noexcept;
+    void bindSamplers(VkDescriptorImageInfo samplers[SAMPLER_BINDING_COUNT],
+            VulkanTexture* textures[SAMPLER_BINDING_COUNT], UsageFlags flags) noexcept;
    void bindInputAttachment(uint32_t bindingIndex, VkDescriptorImageInfo imageInfo) noexcept;
    void bindVertexArray(const VertexArray& varray) noexcept;

@@ -181,17 +193,22 @@ public:
        mDummyTargetInfo.imageView = imageView;
    }

+    // Acquires a resource to be bound to the current pipeline. Ownership of the resource
+    // is transferred to the corresponding pipeline when that pipeline is bound.
+    void acquireResource(VulkanResource* resource) {
+        mPipelineBoundResources.acquire(resource);
+    }
+
    inline RasterState getCurrentRasterState() const noexcept {
-	return mCurrentRasterState;
+        return mCurrentRasterState;
    }

    // We need to update this outside of bindRasterState due to VulkanDriver::draw.
    inline void setCurrentRasterState(RasterState const& rasterState) noexcept {
-	mCurrentRasterState = rasterState;
+        mCurrentRasterState = rasterState;
    }

 private:
-
    // PIPELINE LAYOUT CACHE KEY
    // -------------------------

@@ -298,17 +315,17 @@ private:

    // Represents all the Vulkan state that comprises a bound descriptor set.
    struct DescriptorKey {
-        VkBuffer uniformBuffers[UBUFFER_BINDING_COUNT];             //   80     0
-        DescriptorImageInfo samplers[SAMPLER_BINDING_COUNT];        // 1488    80
-        DescriptorImageInfo inputAttachments[TARGET_BINDING_COUNT]; //  192  1568
-        uint32_t uniformBufferOffsets[UBUFFER_BINDING_COUNT];       //   40  1760
-        uint32_t uniformBufferSizes[UBUFFER_BINDING_COUNT];         //   40  1080
+        VkBuffer uniformBuffers[UBUFFER_BINDING_COUNT];               //   80     0
+        DescriptorImageInfo samplers[SAMPLER_BINDING_COUNT];          // 1488    80
+        DescriptorImageInfo inputAttachments[INPUT_ATTACHMENT_COUNT]; //   24  1568
+        uint32_t uniformBufferOffsets[UBUFFER_BINDING_COUNT];         //   40  1592
+        uint32_t uniformBufferSizes[UBUFFER_BINDING_COUNT];           //   40  1632
    };
    static_assert(offsetof(DescriptorKey, samplers)              == 80);
    static_assert(offsetof(DescriptorKey, inputAttachments)      == 1568);
-    static_assert(offsetof(DescriptorKey, uniformBufferOffsets)  == 1760);
-    static_assert(offsetof(DescriptorKey, uniformBufferSizes)    == 1800);
-    static_assert(sizeof(DescriptorKey) == 1840, "DescriptorKey must not have implicit padding.");
+    static_assert(offsetof(DescriptorKey, uniformBufferOffsets)  == 1592);
+    static_assert(offsetof(DescriptorKey, uniformBufferSizes)    == 1632);
+    static_assert(sizeof(DescriptorKey) == 1672, "DescriptorKey must not have implicit padding.");

    using DescHashFn = utils::hash::MurmurHashFn<DescriptorKey>;

@@ -333,7 +350,10 @@ private:
        std::array<VkDescriptorSet, DESCRIPTOR_TYPE_COUNT> handles;
        Timestamp lastUsed;
        PipelineLayoutKey pipelineLayout;
+        uint32_t id;
    };
+    uint32_t mDescriptorCacheEntryCount = 0;
+

    struct PipelineCacheEntry {
        VkPipeline handle;
@@ -368,12 +388,15 @@ private:
            PipelineLayoutKeyHashFn, PipelineLayoutKeyEqual>;
    using PipelineMap = tsl::robin_map<PipelineKey, PipelineCacheEntry,
            PipelineHashFn, PipelineEqual>;
-    using DescriptorMap = tsl::robin_map<DescriptorKey, DescriptorCacheEntry,
-            DescHashFn, DescEqual>;
+    using DescriptorMap
+            = tsl::robin_map<DescriptorKey, DescriptorCacheEntry, DescHashFn, DescEqual>;
+    using DescriptorResourceMap
+            = std::unordered_map<uint32_t, std::unique_ptr<VulkanAcquireOnlyResourceManager>>;

    PipelineLayoutMap mPipelineLayouts;
    PipelineMap mPipelines;
    DescriptorMap mDescriptorSets;
+    DescriptorResourceMap mDescriptorResources;

    // These helpers all return unstable pointers that should not be stored.
    DescriptorCacheEntry* createDescriptorSets() noexcept;
@@ -421,8 +444,8 @@ private:
    // After a growth event (i.e. when the VkDescriptorPool is replaced with a bigger version), all
    // currently used descriptors are moved into the "extinct" sets so that they can be safely
    // destroyed a few frames later.
-    std::vector<VkDescriptorPool> mExtinctDescriptorPools;
-    std::vector<DescriptorCacheEntry> mExtinctDescriptorBundles;
+    std::list<VkDescriptorPool> mExtinctDescriptorPools;
+    std::list<DescriptorCacheEntry> mExtinctDescriptorBundles;

    VkDescriptorBufferInfo mDummyBufferInfo = {};
    VkWriteDescriptorSet mDummyBufferWriteInfo = {};
@@ -431,6 +454,9 @@ private:

    VkBuffer mDummyBuffer;
    VmaAllocation mDummyMemory;
+
+    VulkanResourceAllocator* mResourceAllocator;
+    VulkanAcquireOnlyResourceManager mPipelineBoundResources;
 };

 } // namespace filament::backend
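
Review note: the `DescriptorKey` offsets above can be checked by hand. The sketch below is a hypothetical stand-in, not the Filament type. It assumes 10 uniform bindings and 62 sampler bindings (inferred from 80/8 and 1488/24) and a 24-byte image-info record with an explicit padding field. It illustrates why the header asserts that there is no implicit padding: the key is hashed as raw bytes, so every byte has to be deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 24-byte record: two 64-bit handles, a 32-bit layout, explicit padding.
struct ImageInfo {
    uint64_t sampler;
    uint64_t imageView;
    uint32_t layout;
    uint32_t padding;
};

// Hypothetical key mirroring the layout in the diff.     size  offset
struct Key {
    uint64_t buffers[10];        //   80      0
    ImageInfo samplers[62];      // 1488     80
    ImageInfo inputs[1];         //   24   1568
    uint32_t offsets[10];        //   40   1592
    uint32_t sizes[10];          //   40   1632
};
static_assert(offsetof(Key, samplers) == 80, "unexpected offset");
static_assert(offsetof(Key, inputs) == 1568, "unexpected offset");
static_assert(offsetof(Key, offsets) == 1592, "unexpected offset");
static_assert(offsetof(Key, sizes) == 1632, "unexpected offset");
static_assert(sizeof(Key) == 1672, "Key must not have implicit padding.");

// Byte-wise FNV-1a hash, used here as a stand-in for the Murmur hash in the real code.
inline uint64_t hashKey(const Key& key) {
    const auto* bytes = reinterpret_cast<const unsigned char*>(&key);
    uint64_t h = 0xcbf29ce484222325ull;
    for (size_t i = 0; i < sizeof(Key); i++) {
        h = (h ^ bytes[i]) * 0x100000001b3ull;
    }
    return h;
}
```

Shrinking the input attachment array from 8 entries (192 bytes) to 1 (24 bytes) is what moves the two trailing arrays from 1760/1800 to 1592/1632 and the total size from 1840 to 1672.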
--- a/filament/backend/src/vulkan/VulkanReadPixels.cpp
+++ b/filament/backend/src/vulkan/VulkanReadPixels.cpp
@@ -0,0 +1,345 @@
+/*
+ * Copyright (C) 2023 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "VulkanReadPixels.h"
+
+#include "DataReshaper.h"
+#include "VulkanCommands.h"
+#include "VulkanHandles.h"
+#include "VulkanImageUtility.h"
+#include "VulkanTexture.h"
+
+#include <utils/Log.h>
+
+using namespace bluevk;
+
+namespace filament::backend {
+
+using ImgUtil = VulkanImageUtility;
+using TaskHandler = VulkanReadPixels::TaskHandler;
+using WorkloadFunc = TaskHandler::WorkloadFunc;
+using OnCompleteFunc = TaskHandler::OnCompleteFunc;
+
+TaskHandler::TaskHandler()
+    : mShouldStop(false),
+      mThread(&TaskHandler::loop, this) {}
+
+void TaskHandler::post(WorkloadFunc&& workload, OnCompleteFunc&& oncomplete) {
+    assert_invariant(!mShouldStop);
+    {
+        std::unique_lock<std::mutex> lock(mTaskQueueMutex);
+        mTaskQueue.push(std::make_pair(std::move(workload), std::move(oncomplete)));
+    }
+    mHasTaskCondition.notify_one();
+}
+
+void TaskHandler::drain() {
+    assert_invariant(!mShouldStop);
+
+    std::mutex syncPointMutex;
+    std::condition_variable syncCondition;
+    bool done = false;
+    post([] {},
+            [&syncPointMutex, &syncCondition, &done] {
+                {
+                    std::unique_lock<std::mutex> lock(syncPointMutex);
+                    done = true;
+                    syncCondition.notify_one();
+                }
+            });
+
+    std::unique_lock<std::mutex> lock(syncPointMutex);
+    syncCondition.wait(lock, [&done] { return done; });
+}
+
+void TaskHandler::shutdown() {
+    {
+        std::unique_lock<std::mutex> lock(mTaskQueueMutex);
+        mShouldStop = true;
+    }
+    mHasTaskCondition.notify_one();
+    mThread.join();
+    ASSERT_POSTCONDITION(mTaskQueue.empty(),
+            "ReadPixels handler has tasks in the queue after shutdown");
+}
+
+void TaskHandler::loop() {
+    while (true) {
+        std::unique_lock<std::mutex> lock(mTaskQueueMutex);
+        mHasTaskCondition.wait(lock, [this] { return !mTaskQueue.empty() || mShouldStop; });
+        if (mShouldStop) {
+            break;
+        }
+        auto [workload, oncomplete] = mTaskQueue.front();
+        mTaskQueue.pop();
+        lock.unlock();
+        workload();
+        oncomplete();
+    }
+
+    // Clean-up: we still need to call oncomplete for clients to do clean-up.
+    while (true) {
+        std::unique_lock<std::mutex> lock(mTaskQueueMutex);
+        if (mTaskQueue.empty()) {
+            break;
+        }
+        auto [workload, oncomplete] = mTaskQueue.front();
+        mTaskQueue.pop();
+        lock.unlock();
+        oncomplete();
+    }
+}
+
+void VulkanReadPixels::terminate() noexcept {
+    assert_invariant(mDevice != VK_NULL_HANDLE);
+    if (mCommandPool == VK_NULL_HANDLE) {
+        return;
+    }
+    vkDestroyCommandPool(mDevice, mCommandPool, VKALLOC);
+    mDevice = VK_NULL_HANDLE;
+
+    mTaskHandler->shutdown();
+    mTaskHandler.reset();
+}
+
+VulkanReadPixels::VulkanReadPixels(VkDevice device)
+    : mDevice(device) {}
+
+void VulkanReadPixels::run(VulkanRenderTarget const* srcTarget, uint32_t const x, uint32_t const y,
+        uint32_t const width, uint32_t const height, uint32_t const graphicsQueueFamilyIndex,
+        PixelBufferDescriptor&& pbd, SelecteMemoryFunction const& selectMemoryFunc,
+        OnReadCompleteFunction const& readCompleteFunc) {
+    assert_invariant(mDevice != VK_NULL_HANDLE);
+
+    VkDevice& device = mDevice;
+
+    if (mCommandPool == VK_NULL_HANDLE) {
+        // Create a command pool if one has not been created.
+        VkCommandPoolCreateInfo createInfo = {
+                .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
+                .flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT
+                         | VK_COMMAND_POOL_CREATE_TRANSIENT_BIT,
+                .queueFamilyIndex = graphicsQueueFamilyIndex,
+        };
+        vkCreateCommandPool(device, &createInfo, VKALLOC, &mCommandPool);
+    }
+
+    // We don't create a task handler (start a thread) unless readPixels is called.
+    if (!mTaskHandler) {
+        mTaskHandler = std::make_unique<TaskHandler>();
+    }
+
+    VkCommandPool& cmdpool = mCommandPool;
+
+    VulkanTexture* srcTexture = srcTarget->getColor(0).texture;
+    assert_invariant(srcTexture);
+    VkFormat const srcFormat = srcTexture->getVkFormat();
+    bool const swizzle
+            = srcFormat == VK_FORMAT_B8G8R8A8_UNORM || srcFormat == VK_FORMAT_B8G8R8A8_SRGB;
+
+    // Create a host visible, linearly tiled image as a staging area.
+    VkImageCreateInfo const imageInfo{
+            .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
+            .imageType = VK_IMAGE_TYPE_2D,
+            .format = srcFormat,
+            .extent = {width, height, 1},
+            .mipLevels = 1,
+            .arrayLayers = 1,
+            .samples = VK_SAMPLE_COUNT_1_BIT,
+            .tiling = VK_IMAGE_TILING_LINEAR,
+            .usage = VK_IMAGE_USAGE_TRANSFER_DST_BIT,
+            .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
+    };
+
+    VkImage stagingImage;
+    vkCreateImage(device, &imageInfo, VKALLOC, &stagingImage);
+
+#if FILAMENT_VULKAN_VERBOSE
+    utils::slog.d << "readPixels created image=" << stagingImage
+                  << " to copy from image=" << srcTexture->getVkImage()
+                  << " src-layout=" << srcTexture->getLayout(0, 0) << utils::io::endl;
+#endif
+
+    VkMemoryRequirements memReqs;
+    VkDeviceMemory stagingMemory;
+    vkGetImageMemoryRequirements(device, stagingImage, &memReqs);
+
+    uint32_t memoryTypeIndex = selectMemoryFunc(memReqs.memoryTypeBits,
+            VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
+                    | VK_MEMORY_PROPERTY_HOST_CACHED_BIT);
+
+    // If VK_MEMORY_PROPERTY_HOST_CACHED_BIT is not supported, fall back to
+    // HOST_VISIBLE+HOST_COHERENT only. HOST_CACHED helps a lot with readPixels performance.
+    if (memoryTypeIndex >= VK_MAX_MEMORY_TYPES) {
+        memoryTypeIndex = selectMemoryFunc(memReqs.memoryTypeBits,
+                VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
+        utils::slog.w
+                << "readPixels is slow because VK_MEMORY_PROPERTY_HOST_CACHED_BIT is not available"
+                << utils::io::endl;
+    }
+
+    ASSERT_POSTCONDITION(memoryTypeIndex < VK_MAX_MEMORY_TYPES,
+            "VulkanReadPixels: unable to find a memory type that meets requirements.");
+
+    VkMemoryAllocateInfo const allocInfo = {
+            .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
+            .allocationSize = memReqs.size,
+            .memoryTypeIndex = memoryTypeIndex,
+    };
+
+    vkAllocateMemory(device, &allocInfo, VKALLOC, &stagingMemory);
+    vkBindImageMemory(device, stagingImage, stagingMemory, 0);
+
+    VkCommandBuffer cmdbuffer;
+    VkCommandBufferAllocateInfo const allocateInfo{
+            .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
+            .commandPool = cmdpool,
+            .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
+            .commandBufferCount = 1,
+    };
+    vkAllocateCommandBuffers(device, &allocateInfo, &cmdbuffer);
+
+    VkCommandBufferBeginInfo const binfo{
+            .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
+            .flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,
+    };
+    vkBeginCommandBuffer(cmdbuffer, &binfo);
+
+    ImgUtil::transitionLayout(cmdbuffer, {
+        .image = stagingImage,
+        .oldLayout = VulkanLayout::UNDEFINED,
+        .newLayout = VulkanLayout::TRANSFER_DST,
+        .subresources = {
+            .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
+            .baseMipLevel = 0,
+            .levelCount = 1,
+            .baseArrayLayer = 0,
+            .layerCount = 1,
+        },
+    });
+
+    VulkanAttachment const srcAttachment = srcTarget->getColor(0);
+    const VkImageSubresourceRange srcRange
+            = srcAttachment.getSubresourceRange(VK_IMAGE_ASPECT_COLOR_BIT);
+    srcTexture->transitionLayout(cmdbuffer, srcRange, VulkanLayout::TRANSFER_SRC);
+
+    VkImageCopy const imageCopyRegion = {
+        .srcSubresource = {
+            .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
+            .mipLevel = srcAttachment.level,
+            .baseArrayLayer = srcAttachment.layer,
+            .layerCount = 1,
+        },
+        .srcOffset = {
+            .x = (int32_t)x,
+            .y = (int32_t)(srcTarget->getExtent().height - (height + y)),
+        },
+        .dstSubresource = {
+            .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
+            .layerCount = 1,
+        },
+        .extent = {
+            .width = width,
+            .height = height,
+            .depth = 1,
+        },
+    };
+
+    // Perform the copy into the staging area. At this point we know that the src
+    // layout is TRANSFER_SRC_OPTIMAL and the staging image is TRANSFER_DST_OPTIMAL.
+    UTILS_UNUSED_IN_RELEASE VkExtent2D srcExtent = srcAttachment.getExtent2D();
+    assert_invariant(imageCopyRegion.srcOffset.x + imageCopyRegion.extent.width <= srcExtent.width);
+    assert_invariant(
+            imageCopyRegion.srcOffset.y + imageCopyRegion.extent.height <= srcExtent.height);
+
+    vkCmdCopyImage(cmdbuffer, srcAttachment.getImage(),
+            ImgUtil::getVkLayout(VulkanLayout::TRANSFER_SRC), stagingImage,
+            ImgUtil::getVkLayout(VulkanLayout::TRANSFER_DST), 1, &imageCopyRegion);
+
+    // Restore the source image layout.
+    srcTexture->transitionLayout(cmdbuffer, srcRange, VulkanLayout::COLOR_ATTACHMENT);
+
+    vkEndCommandBuffer(cmdbuffer);
+
+    VkQueue queue;
+    vkGetDeviceQueue(device, graphicsQueueFamilyIndex, 0, &queue);
+    VkFence readCompleteFence;
+    VkFenceCreateInfo const fenceCreateInfo{
+            .sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,
+    };
+    vkCreateFence(device, &fenceCreateInfo, VKALLOC, &readCompleteFence);
+    VkSubmitInfo const submitInfo{
+            .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
+            .waitSemaphoreCount = 0,
+            .pWaitSemaphores = VK_NULL_HANDLE,
+            .pWaitDstStageMask = VK_NULL_HANDLE,
+            .commandBufferCount = 1,
+            .pCommandBuffers = &cmdbuffer,
+            .signalSemaphoreCount = 0,
+            .pSignalSemaphores = VK_NULL_HANDLE,
+    };
+    vkQueueSubmit(queue, 1, &submitInfo, readCompleteFence);
+
+    auto* const pUserBuffer = new PixelBufferDescriptor(std::move(pbd));
+    auto cleanPbdFunc = [pUserBuffer, readCompleteFunc]() {
+        PixelBufferDescriptor& p = *pUserBuffer;
+        readCompleteFunc(std::move(p));
+        delete pUserBuffer;
+    };
+    auto waitFenceFunc = [device, width, height, swizzle, srcFormat, stagingImage, stagingMemory,
+                                 cmdpool, cmdbuffer, pUserBuffer,
+                                 fence = readCompleteFence]() mutable {
+        VkResult status = vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
+        // The wait did not succeed. Log the error and bail out without reading back.
+        if (status != VK_SUCCESS) {
+            utils::slog.e << "Failed to wait for readPixels fence" << utils::io::endl;
+            return;
+        }
+
+        PixelBufferDescriptor& p = *pUserBuffer;
+        VkImageSubresource subResource{.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT};
+        VkSubresourceLayout subResourceLayout;
+        vkGetImageSubresourceLayout(device, stagingImage, &subResource, &subResourceLayout);
+
+        // Map image memory so that we can start copying from it.
+        uint8_t const* srcPixels;
+        vkMapMemory(device, stagingMemory, 0, VK_WHOLE_SIZE, 0, (void**) &srcPixels);
+        srcPixels += subResourceLayout.offset;
+
+        if (!DataReshaper::reshapeImage(&p, getComponentType(srcFormat),
+                    getComponentCount(srcFormat), srcPixels,
+                    static_cast<int>(subResourceLayout.rowPitch), static_cast<int>(width),
+                    static_cast<int>(height), swizzle)) {
+            utils::slog.e << "Unsupported PixelDataFormat or PixelDataType" << utils::io::endl;
+        }
+
+        vkUnmapMemory(device, stagingMemory);
+        vkDestroyImage(device, stagingImage, VKALLOC);
+        vkFreeMemory(device, stagingMemory, VKALLOC);
+        vkDestroyFence(device, fence, VKALLOC);
+        vkFreeCommandBuffers(device, cmdpool, 1, &cmdbuffer);
+    };
+    mTaskHandler->post(std::move(waitFenceFunc), std::move(cleanPbdFunc));
+}
+
+void VulkanReadPixels::runUntilComplete() noexcept {
+    if (!mTaskHandler) {
+        return;
+    }
+    mTaskHandler->drain();
+}
+
+}// namespace filament::backend
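
Review note: `srcOffset.y` in `VulkanReadPixels::run` is computed as `targetHeight - (height + y)`. A minimal sketch of that arithmetic follows, assuming (as the subtraction suggests) that the caller's rectangle uses a bottom-left origin while the image copy uses a top-left origin. `Offset2D` and `toTopLeftOffset` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for VkOffset2D.
struct Offset2D {
    int32_t x;
    int32_t y;
};

// Converts the (x, y) corner of a bottom-left-origin rectangle of the given height
// into the top-left-origin offset of the same rectangle.
constexpr Offset2D toTopLeftOffset(uint32_t x, uint32_t y, uint32_t height,
        uint32_t targetHeight) {
    return { static_cast<int32_t>(x), static_cast<int32_t>(targetHeight - (height + y)) };
}
```

For a 480-pixel-tall target, a 100-pixel-tall read starting at `y = 0` maps to row 380, and one starting at `y = 380` maps to row 0. The subtraction is unsigned, so callers are expected to keep `height + y` within the target height, which the two `assert_invariant` checks in `run` verify in debug builds.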
--- a/filament/backend/src/vulkan/VulkanReadPixels.h
+++ b/filament/backend/src/vulkan/VulkanReadPixels.h
@@ -0,0 +1,93 @@
+/*
+ * Copyright (C) 2023 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef TNT_FILAMENT_BACKEND_VULKANREADPIXELS_H
+#define TNT_FILAMENT_BACKEND_VULKANREADPIXELS_H
+
+#include "private/backend/Driver.h"
+
+#include <bluevk/BlueVK.h>
+#include <math/vec4.h>
+
+#include <condition_variable>
+#include <functional>
+#include <mutex>
+#include <queue>
+#include <set>
+#include <thread>
+#include <vector>
+
+namespace filament::backend {
+
+struct VulkanRenderTarget;
+
+class VulkanReadPixels {
+public:
+    // A helper class that runs tasks on a separate thread.
+    class TaskHandler {
+    public:
+        using WorkloadFunc = std::function<void()>;
+        using OnCompleteFunc = std::function<void()>;
+        using Task = std::pair<WorkloadFunc, OnCompleteFunc>;
+
+        TaskHandler();
+
+        // In addition to the workload, the client must also provide an oncomplete function. The
+        // handler calls it either when the workload completes or when the handler is shut down
+        // (so that we can clean up even when the task was not carried out).
+        void post(WorkloadFunc&& workload, OnCompleteFunc&& oncomplete);
+
+        // This will block until all of the tasks are done.
+        void drain();
+
+        // This will quit without running the workloads, but oncomplete callbacks will still be
+        // called.
+        void shutdown();
+
+    private:
+        void loop();
+
+        bool mShouldStop;
+        std::condition_variable mHasTaskCondition;
+        std::mutex mTaskQueueMutex;
+        std::queue<Task> mTaskQueue;
+        std::thread mThread;
+    };
+
+    using OnReadCompleteFunction = std::function<void(PixelBufferDescriptor&&)>;
+    using SelecteMemoryFunction = std::function<uint32_t(uint32_t, VkFlags)>;
+
+    explicit VulkanReadPixels(VkDevice device);
+
+    void terminate() noexcept;
+
+    void run(VulkanRenderTarget const* srcTarget, uint32_t x, uint32_t y, uint32_t width,
+            uint32_t height, uint32_t graphicsQueueFamilyIndex, PixelBufferDescriptor&& pbd,
+            SelecteMemoryFunction const& selectMemoryFunc,
+            OnReadCompleteFunction const& readCompleteFunc);
+
+    // This method will block until all of the in-flight requests are complete.
+    void runUntilComplete() noexcept;
+
+private:
+    VkDevice mDevice = VK_NULL_HANDLE;
+    VkCommandPool mCommandPool = VK_NULL_HANDLE;
+    std::unique_ptr<TaskHandler> mTaskHandler;
+};
+
+}// namespace filament::backend
+
+#endif// TNT_FILAMENT_BACKEND_VULKANREADPIXELS_H
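
Review note: the contract documented on `TaskHandler` (workloads may be skipped at shutdown, but `oncomplete` always runs) can be sketched in isolation. The class below is a simplified, hypothetical stand-in and not the code in this change. It has no `drain()`, and `countCompletions` exists only to make the guarantee observable.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class MiniTaskHandler {
public:
    using Func = std::function<void()>;

    MiniTaskHandler() : mThread(&MiniTaskHandler::loop, this) {}
    ~MiniTaskHandler() { shutdown(); }

    void post(Func workload, Func onComplete) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mQueue.emplace(std::move(workload), std::move(onComplete));
        }
        mCondition.notify_one();
    }

    // Stops the worker. Pending workloads are skipped; their onComplete callbacks still run.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mShouldStop = true;
        }
        mCondition.notify_one();
        if (mThread.joinable()) {
            mThread.join();
        }
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(mMutex);
        while (true) {
            mCondition.wait(lock, [this] { return mShouldStop || !mQueue.empty(); });
            if (mShouldStop) {
                break;
            }
            auto task = std::move(mQueue.front());
            mQueue.pop();
            lock.unlock();   // never hold the queue lock while running client code
            task.first();
            task.second();
            lock.lock();
        }
        // Clean-up pass: skip the workloads but still notify the clients.
        while (!mQueue.empty()) {
            auto task = std::move(mQueue.front());
            mQueue.pop();
            lock.unlock();
            task.second();
            lock.lock();
        }
    }

    bool mShouldStop = false;
    std::mutex mMutex;
    std::condition_variable mCondition;
    std::queue<std::pair<Func, Func>> mQueue;
    std::thread mThread;   // declared last so the other members exist before the thread starts
};

// Posts n empty tasks, shuts down immediately, and returns how many onComplete callbacks ran.
inline int countCompletions(int n) {
    std::atomic<int> completed{0};
    MiniTaskHandler handler;
    for (int i = 0; i < n; i++) {
        handler.post([] {}, [&completed] { completed++; });
    }
    handler.shutdown();
    return completed.load();
}
```

Whether a given workload runs before shutdown depends on timing, but the number of completion callbacks does not. This is the property `VulkanReadPixels` relies on to hand the `PixelBufferDescriptor` back to the client.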
--- a/filament/backend/src/vulkan/VulkanResourceAllocator.h
+++ b/filament/backend/src/vulkan/VulkanResourceAllocator.h
@@ -0,0 +1,136 @@
+/*
+ * Copyright (C) 2023 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef TNT_FILAMENT_BACKEND_VULKANRESOURCEALLOCATOR_H
+#define TNT_FILAMENT_BACKEND_VULKANRESOURCEALLOCATOR_H
+
+#include "VulkanHandles.h"
+
+#include <private/backend/HandleAllocator.h>
+
+#include <utils/FixedCapacityVector.h>
+#include <utils/Log.h>
+
+#include <type_traits>
+#include <unordered_set>
+
+namespace filament::backend {
+
+// RESOURCE_TYPE_COUNT matches the count of enum VulkanResourceType.
+#define RESOURCE_TYPE_COUNT 12
+#define DEBUG_RESOURCE_LEAKS 0
+
+#if DEBUG_RESOURCE_LEAKS
+    #define TRACK_INCREMENT()                                       \
+    if (!IS_MANAGED_TYPE(obj->mType)) {                             \
+        mDebugOnlyResourceCount[static_cast<size_t>(obj->mType)]++; \
+    }
+    #define TRACK_DECREMENT()                                       \
+    if (!IS_MANAGED_TYPE(obj->mType)) {                             \
+        mDebugOnlyResourceCount[static_cast<size_t>(obj->mType)]--; \
+    }
+#else
+    // No-op
+    #define TRACK_INCREMENT()
+    #define TRACK_DECREMENT()
+#endif
+
+class VulkanResourceAllocator {
+
+public:
+    VulkanResourceAllocator(size_t arenaSize)
+        : mHandleAllocatorImpl("Handles", arenaSize)
+#if DEBUG_RESOURCE_LEAKS
+        , mDebugOnlyResourceCount(RESOURCE_TYPE_COUNT) {
+        std::memset(mDebugOnlyResourceCount.data(), 0, sizeof(size_t) * RESOURCE_TYPE_COUNT);
+    }
+#else
+    {}
+#endif
+
+    template<typename D, typename... ARGS>
+    inline Handle<D> initHandle(ARGS&&... args) noexcept {
+        auto handle = mHandleAllocatorImpl.allocateAndConstruct<D>(std::forward<ARGS>(args)...);
+        auto obj = handle_cast<D*>(handle);
+        obj->initResource(handle.getId());
+        TRACK_INCREMENT();
+        return handle;
+    }
+
+    template<typename D>
+    inline Handle<D> allocHandle() noexcept {
+        return mHandleAllocatorImpl.allocate<D>();
+    }
+
+    template<typename D, typename B, typename... ARGS>
+    inline typename std::enable_if<std::is_base_of<B, D>::value, D>::type* construct(
+            Handle<B> const& handle, ARGS&&... args) noexcept {
+        auto obj = mHandleAllocatorImpl.construct<D, B>(handle, std::forward<ARGS>(args)...);
+        obj->initResource(handle.getId());
+        TRACK_INCREMENT();
+        return obj;
+    }
+
+    template<typename Dp, typename B>
+    inline typename std::enable_if_t<
+            std::is_pointer_v<Dp> && std::is_base_of_v<B, typename std::remove_pointer_t<Dp>>, Dp>
+    handle_cast(Handle<B>& handle) noexcept {
+        return mHandleAllocatorImpl.handle_cast<Dp, B>(handle);
+    }
+
+    template<typename Dp, typename B>
+    inline typename std::enable_if_t<
+            std::is_pointer_v<Dp> && std::is_base_of_v<B, typename std::remove_pointer_t<Dp>>, Dp>
+    handle_cast(Handle<B> const& handle) noexcept {
+        return mHandleAllocatorImpl.handle_cast<Dp, B>(handle);
+    }
+
+    template<typename D, typename B>
+    inline void destruct(Handle<B> handle) noexcept {
+        auto obj = handle_cast<D*>(handle);
+        if constexpr (std::is_base_of_v<VulkanIndexBuffer, D>
+                      || std::is_base_of_v<VulkanBufferObject, D>) {
+            obj->terminate();
+        }
+        TRACK_DECREMENT();
+        mHandleAllocatorImpl.deallocate(handle, obj);
+    }
+
+private:
+    HandleAllocatorVK mHandleAllocatorImpl;
+
+#if DEBUG_RESOURCE_LEAKS
+public:
+    void print() {
+        utils::slog.d << "Resource Allocator state (debug only)" << utils::io::endl;
+        for (size_t i = 0; i < RESOURCE_TYPE_COUNT; i++) {
+            utils::slog.d << "[" << i << "]=" << mDebugOnlyResourceCount[i] << utils::io::endl;
+        }
+        utils::slog.d << "+++++++++++++++++++++++++++++++++++++" << utils::io::endl;
+    }
+private:
+    utils::FixedCapacityVector<size_t> mDebugOnlyResourceCount;
+#endif
+
+};
+
+#undef TRACK_INCREMENT
+#undef TRACK_DECREMENT
+#undef DEBUG_RESOURCE_LEAKS
+
+} // namespace filament::backend
+
+#endif // TNT_FILAMENT_BACKEND_VULKANRESOURCEALLOCATOR_H
--- a/filament/backend/src/vulkan/VulkanResources.cpp
+++ b/filament/backend/src/vulkan/VulkanResources.cpp
@@ -0,0 +1,70 @@
+/*
+ * Copyright (C) 2023 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+#include "VulkanResources.h"
+#include "VulkanHandles.h"
+#include "VulkanResourceAllocator.h"
+
+namespace filament::backend {
+
+void deallocateResource(VulkanResourceAllocator* allocator, VulkanResourceType type,
+        HandleBase::HandleId id) {
+
+    if (IS_HEAP_ALLOC_TYPE(type)) {
+        return;
+    }
+
+    switch (type) {
+        case VulkanResourceType::BUFFER_OBJECT:
+            allocator->destruct<VulkanBufferObject>(Handle<HwBufferObject>(id));
+            break;
+        case VulkanResourceType::INDEX_BUFFER:
+            allocator->destruct<VulkanIndexBuffer>(Handle<HwIndexBuffer>(id));
+            break;
+        case VulkanResourceType::PROGRAM:
+            allocator->destruct<VulkanProgram>(Handle<HwProgram>(id));
+            break;
+        case VulkanResourceType::RENDER_TARGET:
+            allocator->destruct<VulkanRenderTarget>(Handle<HwRenderTarget>(id));
+            break;
+        case VulkanResourceType::SAMPLER_GROUP:
+            allocator->destruct<VulkanSamplerGroup>(Handle<HwSamplerGroup>(id));
+            break;
+        case VulkanResourceType::SWAP_CHAIN:
+            allocator->destruct<VulkanSwapChain>(Handle<HwSwapChain>(id));
+            break;
+        case VulkanResourceType::TEXTURE:
+            allocator->destruct<VulkanTexture>(Handle<HwTexture>(id));
+            break;
+        case VulkanResourceType::TIMER_QUERY:
+            allocator->destruct<VulkanTimerQuery>(Handle<HwTimerQuery>(id));
+            break;
+        case VulkanResourceType::VERTEX_BUFFER:
+            allocator->destruct<VulkanVertexBuffer>(Handle<HwVertexBuffer>(id));
+            break;
+        case VulkanResourceType::RENDER_PRIMITIVE:
+            allocator->destruct<VulkanRenderPrimitive>(Handle<VulkanRenderPrimitive>(id));
+            break;
+        // If the resource is heap allocated, then the resource manager just skips refcounted
+        // destruction.
+        case VulkanResourceType::FENCE:
+        case VulkanResourceType::HEAP_ALLOCATED:
+            break;
+    }
+}
+
+} // namespace filament::backend
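
Review note: `deallocateResource` is the type-dispatched endpoint of the ref-counting scheme described in `VulkanResources.h`. The manager classes themselves are not part of this excerpt, so the sketch below is a hypothetical reduction of the idea rather than the real implementation: a set acquires resources, releases them on `clear()`, and only dispatches to a deallocator when the count reaches zero. Heap-allocated types start at a count of 1, so they never reach zero through this path.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

enum class ResourceType : uint8_t { TEXTURE, BUFFER_OBJECT, HEAP_ALLOCATED };

// Hypothetical ref-counted base. `destroyed` stands in for the real destructor call.
struct Resource {
    explicit Resource(ResourceType t)
        : type(t), refCount(t == ResourceType::HEAP_ALLOCATED ? 1 : 0) {}
    ResourceType type;
    uint32_t refCount;
    bool destroyed = false;
};

// Hypothetical acquire-only set: each resource is counted at most once per set.
class ResourceSet {
public:
    ~ResourceSet() { clear(); }

    void acquire(Resource* resource) {
        if (mResources.insert(resource).second) {
            resource->refCount++;
        }
    }

    void clear() {
        for (Resource* resource : mResources) {
            resource->refCount--;
            if (resource->refCount == 0) {
                deallocate(resource);
            }
        }
        mResources.clear();
    }

private:
    static void deallocate(Resource* resource) {
        switch (resource->type) {
            case ResourceType::TEXTURE:
            case ResourceType::BUFFER_OBJECT:
                resource->destroyed = true;
                break;
            case ResourceType::HEAP_ALLOCATED:
                break;   // owned elsewhere; never destroyed through ref counting
        }
    }

    std::unordered_set<Resource*> mResources;
};
```

With two sets holding the same texture, clearing the first leaves the texture alive and clearing the second destroys it. This is the behaviour the pipeline cache appears to rely on when it keeps one resource set per descriptor bundle.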
--- a/filament/backend/src/vulkan/VulkanResources.h
+++ b/filament/backend/src/vulkan/VulkanResources.h
@@ -0,0 +1,337 @@
+/*
+ * Copyright (C) 2023 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef TNT_FILAMENT_BACKEND_VULKANRESOURCES_H
+#define TNT_FILAMENT_BACKEND_VULKANRESOURCES_H
+
+#include <backend/DriverEnums.h>// For MAX_VERTEX_BUFFER_COUNT
+#include <backend/Handle.h>
+
+#include <tsl/robin_set.h>
+#include <utils/Mutex.h>
+#include <utils/Panic.h>
+
+#include <mutex>
+#include <unordered_set>
+
+namespace filament::backend {
+
+class VulkanResourceAllocator;
+struct VulkanThreadSafeResource;
+
+// Subclasses of VulkanResource must provide this enum in their construction.
+enum class VulkanResourceType : uint8_t {
+    BUFFER_OBJECT,
+    INDEX_BUFFER,
+    PROGRAM,
+    RENDER_TARGET,
+    SAMPLER_GROUP,
+    SWAP_CHAIN,
+    RENDER_PRIMITIVE,
+    TEXTURE,
+    TIMER_QUERY,
+    VERTEX_BUFFER,
+
+    // Below are resources that are managed manually (i.e. not ref counted).
+    FENCE,
+    HEAP_ALLOCATED,
+};
+
+#define IS_HEAP_ALLOC_TYPE(f)                                                                      \
+    (f == VulkanResourceType::FENCE || f == VulkanResourceType::HEAP_ALLOCATED)
+
+
+// This is a ref-counting base class that tracks how many references of this resource exist. This
+// class is paired with VulkanResourceManagerImpl which is responsible for incrementing or
+// decrementing the count. Once mRefCount == 0, VulkanResourceManagerImpl will also call the
+// appropriate destructor. VulkanCommandBuffer, VulkanDriver, and composite structures like
+// VulkanRenderPrimitive are owners of VulkanResourceManagerImpl instances.
+struct VulkanResourceBase {
+protected:
+    explicit VulkanResourceBase(VulkanResourceType type)
+        : mRefCount(IS_HEAP_ALLOC_TYPE(type) ? 1 : 0),
+          mType(type),
+          mHandleId(0) {}
+
+private:
+    inline VulkanResourceType getType() {
+        return mType;
+    }
+
+    inline HandleBase::HandleId getId() {
+        return mHandleId;
+    }
+
+    inline void initResource(HandleBase::HandleId id) noexcept {
+        mHandleId = id;
+    }
+
+    inline void ref() noexcept {
+        if (IS_HEAP_ALLOC_TYPE(mType)) {
+            return;
+        }
+        ++mRefCount;
+    }
+
+    inline void deref() noexcept {
+        if (IS_HEAP_ALLOC_TYPE(mType)) {
+            return;
+        }
+        --mRefCount;
+    }
+
+    inline size_t refcount() noexcept {
+        return mRefCount;
+    }
+
+    size_t mRefCount = 0;
+    VulkanResourceType mType = VulkanResourceType::BUFFER_OBJECT;
+    HandleBase::HandleId mHandleId;
+
+    friend struct VulkanThreadSafeResource;
+    friend class VulkanResourceAllocator;
+
+    template<typename RT, typename ST>
+    friend class VulkanResourceManagerImpl;
+};
+
+struct VulkanThreadSafeResource {
+protected:
+    explicit VulkanThreadSafeResource(VulkanResourceType type)
+        : mImpl(type) {}
+
+private:
+    inline VulkanResourceType getType() {
+        return mImpl.getType();
+    }
+
+    inline HandleBase::HandleId getId() {
+        return mImpl.getId();
+    }
+
+    inline void initResource(HandleBase::HandleId id) noexcept {
+        std::unique_lock<utils::Mutex> lock(mMutex);
+        mImpl.initResource(id);
+    }
+
+    inline void ref() noexcept {
+        std::unique_lock<utils::Mutex> lock(mMutex);
+        mImpl.ref();
+    }
+
+    inline void deref() noexcept {
+        std::unique_lock<utils::Mutex> lock(mMutex);
+        mImpl.deref();
+    }
+
+    inline size_t refcount() noexcept {
+        std::unique_lock<utils::Mutex> lock(mMutex);
+        return mImpl.refcount();
+    }
+
+    utils::Mutex mMutex;
+    VulkanResourceBase mImpl;
+
+    friend class VulkanResourceAllocator;
+    template<typename RT, typename ST>
+    friend class VulkanResourceManagerImpl;
+};
+
+using VulkanResource = VulkanResourceBase;
+
+namespace {
+
+// When the size of the resource set is known to be small (for example, for VulkanRenderPrimitive),
+// we just use a std::array to back the set.
+class FixedCapacityResourceSet {
+private:
+    constexpr static size_t const SIZE = MAX_VERTEX_BUFFER_COUNT;
+    using FixedSizeArray = std::array<VulkanResource*, SIZE>;
+
+public:
+    using const_iterator = FixedSizeArray::const_iterator;
+
+    inline const_iterator begin() {
+        if (mInd == 0) {
+            return mArray.cend();
+        }
+        return mArray.cbegin();
+    }
+
+    inline const_iterator end() {
+        if (mInd == 0) {
+            return mArray.cend();
+        }
+        if (mInd < SIZE) {
+            return mArray.begin() + mInd;
+        }
+        return mArray.cend();
+    }
+
+    inline const_iterator find(VulkanResource* resource) {
+        return std::find(begin(), end(), resource);
+    }
+
+    inline void insert(VulkanResource* resource) {
+        assert_invariant(mInd < SIZE);
+        mArray[mInd++] = resource;
+    }
+
+    inline void erase(VulkanResource* resource) {
+        assert_invariant(false && "FixedCapacityResourceSet::erase should not be called");
+    }
+
+    inline void clear() {
+        if (mInd == 0) {
+            return;
+        }
+        mInd = 0;
+    }
+
+private:
+    FixedSizeArray mArray{nullptr};
+    size_t mInd = 0;
+};
+
+// robin_set/map are useful for sets that are acquire-only, where the set is only iterated when it
+// is cleared.
+using FastIterationResourceSet = tsl::robin_set<VulkanResource*>;
+
+// unordered_set is used in the general case where insert/erase can occur at will. This is useful
+// for the basic object ownership count - i.e. VulkanDriver.
+using ResourceSet = std::unordered_set<VulkanResource*>;
+
+using ThreadSafeResourceSet = std::unordered_set<VulkanThreadSafeResource*>;
+
+} // anonymous namespace
+
+class VulkanResourceAllocator;
+
+#define LOCK_IF_NEEDED()                                                                           \
+    if constexpr (std::is_base_of_v<VulkanThreadSafeResource, ResourceType>) {                     \
+        mMutex->lock();                                                                            \
+    }
+
+#define UNLOCK_IF_NEEDED()                                                                         \
+    if constexpr (std::is_base_of_v<VulkanThreadSafeResource, ResourceType>) {                     \
+        mMutex->unlock();                                                                          \
+    }
+
+void deallocateResource(VulkanResourceAllocator* allocator, VulkanResourceType type,
+        HandleBase::HandleId id);
+
+template<typename ResourceType, typename SetType>
+class VulkanResourceManagerImpl {
+public:
+    explicit VulkanResourceManagerImpl(VulkanResourceAllocator* allocator)
+        : mAllocator(allocator) {
+        if constexpr (std::is_base_of_v<VulkanThreadSafeResource, ResourceType>) {
+            mMutex = std::make_unique<utils::Mutex>();
+        }
+    }
+
+    VulkanResourceManagerImpl(const VulkanResourceManagerImpl& other) = delete;
+    void operator=(const VulkanResourceManagerImpl& other) = delete;
+    VulkanResourceManagerImpl(const VulkanResourceManagerImpl&& other) = delete;
+    void operator=(const VulkanResourceManagerImpl&& other) = delete;
+
+    ~VulkanResourceManagerImpl() {
+        clear();
+    }
+
+    inline void acquire(ResourceType* resource) {
+        if (IS_HEAP_ALLOC_TYPE(resource->getType())) {
+            return;
+        }
+
+        LOCK_IF_NEEDED();
+        if (mResources.find(resource) != mResources.end()) {
+            UNLOCK_IF_NEEDED();
+            return;
+        }
+        mResources.insert(resource);
+        UNLOCK_IF_NEEDED();
+        resource->ref();
+    }
+
+    // Transfers ownership from one resource set to another
+    inline void acquire(VulkanResourceManagerImpl<ResourceType, SetType>* srcResources) {
+        LOCK_IF_NEEDED();
+        for (auto iter = srcResources->mResources.begin(); iter != srcResources->mResources.end();
+                iter++) {
+            acquire(*iter);
+        }
+        UNLOCK_IF_NEEDED();
+        srcResources->clear();
+    }
+
+    inline void release(ResourceType* resource) {
+        if (IS_HEAP_ALLOC_TYPE(resource->getType())) {
+            return;
+        }
+
+        LOCK_IF_NEEDED();
+        auto resItr = mResources.find(resource);
+        if (resItr == mResources.end()) {
+            UNLOCK_IF_NEEDED();
+            return;
+        }
+        mResources.erase(resItr);
+        UNLOCK_IF_NEEDED();
+        derefImpl(resource);
+    }
+
+    inline void clear() {
+        LOCK_IF_NEEDED();
+        for (auto iter = mResources.begin(); iter != mResources.end(); iter++) {
+            derefImpl(*iter);
+        }
+        mResources.clear();
+        UNLOCK_IF_NEEDED();
+    }
+
+    inline size_t size() {
+        return mResources.size();
+    }
+
+private:
+    inline void derefImpl(ResourceType* resource) {
+        resource->deref();
+        if (resource->refcount() != 0) {
+            return;
+        }
+        deallocateResource(mAllocator, resource->getType(), resource->getId());
+    }
+
+    VulkanResourceAllocator* mAllocator;
+    SetType mResources;
+    std::unique_ptr<utils::Mutex> mMutex;
+};
+
+using VulkanAcquireOnlyResourceManager
+        = VulkanResourceManagerImpl<VulkanResource, FastIterationResourceSet>;
+using VulkanResourceManager = VulkanResourceManagerImpl<VulkanResource, ResourceSet>;
+using FixedSizeVulkanResourceManager
+        = VulkanResourceManagerImpl<VulkanResource, FixedCapacityResourceSet>;
+using VulkanThreadSafeResourceManager
+        = VulkanResourceManagerImpl<VulkanThreadSafeResource, ThreadSafeResourceSet>;
+
+#undef LOCK_IF_NEEDED
+#undef UNLOCK_IF_NEEDED
+
+} // namespace filament::backend
+
+#endif // TNT_FILAMENT_BACKEND_VULKANRESOURCES_H
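
The ownership model introduced in VulkanResources.h can be hard to see through the templates and locking macros. The following is a minimal, single-threaded sketch of the same idea: each manager holds at most one reference per resource, and the resource is destroyed when the last manager lets go. `Resource`, `ResourceManager`, and the `destroyed` log are hypothetical stand-ins invented for this illustration; the real code calls `deallocateResource()` through a `VulkanResourceAllocator` rather than recording an id.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for VulkanResourceBase: only an id and a reference count.
struct Resource {
    explicit Resource(int id) : id(id) {}
    int id;
    std::size_t refCount = 0;
};

// Hypothetical stand-in for VulkanResourceManagerImpl, backed by std::unordered_set.
class ResourceManager {
public:
    explicit ResourceManager(std::vector<int>* destroyed) : mDestroyed(destroyed) {}
    ~ResourceManager() { clear(); }

    ResourceManager(ResourceManager const&) = delete;
    ResourceManager& operator=(ResourceManager const&) = delete;

    // Adds one reference on behalf of this manager; repeated calls are no-ops.
    void acquire(Resource* r) {
        if (mResources.insert(r).second) {
            ++r->refCount;
        }
    }

    // Drops this manager's reference, if it holds one.
    void release(Resource* r) {
        if (mResources.erase(r) != 0) {
            deref(r);
        }
    }

    // Drops every reference held by this manager.
    void clear() {
        for (Resource* r : mResources) {
            deref(r);
        }
        mResources.clear();
    }

    std::size_t size() const { return mResources.size(); }

private:
    void deref(Resource* r) {
        assert(r->refCount > 0);
        if (--r->refCount == 0) {
            // The real code would deallocate the resource here; this sketch only logs the id.
            mDestroyed->push_back(r->id);
        }
    }

    std::vector<int>* mDestroyed;
    std::unordered_set<Resource*> mResources;
};
```

In this sketch, a texture acquired by both a "driver" manager and a "command buffer" manager survives the driver's `release()` and is only reported as destroyed once the command buffer's manager is cleared, which mirrors the reason `VulkanTexture` calls `commands.acquire(this)` in this change.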
--- a/filament/backend/src/vulkan/VulkanSamplerCache.cpp
+++ b/filament/backend/src/vulkan/VulkanSamplerCache.cpp
@@ -87,8 +87,7 @@ constexpr inline float getMaxLod(SamplerMinFilter filter) noexcept {
        case SamplerMinFilter::LINEAR_MIPMAP_NEAREST:
        case SamplerMinFilter::NEAREST_MIPMAP_LINEAR:
        case SamplerMinFilter::LINEAR_MIPMAP_LINEAR:
-            // Assuming our maximum texture size is 4k, we'll never need more than 12 miplevels.
-            return 12.0f;
+            return VK_LOD_CLAMP_NONE;
    }
 }

@@ -99,7 +98,7 @@ constexpr inline VkBool32 getCompareEnable(SamplerCompareMode mode) noexcept {
 void VulkanSamplerCache::initialize(VkDevice device) { mDevice = device; }

 VkSampler VulkanSamplerCache::getSampler(SamplerParams params) noexcept {
-    auto iter = mCache.find(params.u);
+    auto iter = mCache.find(params);
    if (UTILS_LIKELY(iter != mCache.end())) {
        return iter->second;
    }
@@ -123,7 +122,7 @@ VkSampler VulkanSamplerCache::getSampler(SamplerParams params) noexcept {
    VkSampler sampler;
    VkResult error = vkCreateSampler(mDevice, &samplerInfo, VKALLOC, &sampler);
    ASSERT_POSTCONDITION(!error, "Unable to create sampler.");
-    mCache.insert({params.u, sampler});
+    mCache.insert({params, sampler});
    return sampler;
 }

--- a/filament/backend/src/vulkan/VulkanSamplerCache.h
+++ b/filament/backend/src/vulkan/VulkanSamplerCache.h
@@ -32,7 +32,7 @@ public:
    void terminate() noexcept;
 private:
    VkDevice mDevice;
-    tsl::robin_map<uint32_t, VkSampler> mCache;
+    tsl::robin_map<SamplerParams, VkSampler, SamplerParams::Hasher, SamplerParams::EqualTo> mCache;
 };

 } // namespace filament::backend
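
The sampler cache change above switches the map key from a packed `uint32_t` to the whole `SamplerParams` struct, which requires a hash functor and an equality functor. A minimal sketch of that pattern follows. It uses `std::unordered_map` instead of `tsl::robin_map`, an `int` instead of `VkSampler`, and a made-up four-field `Params` struct; none of these names or fields come from Filament, and the actual `SamplerParams::Hasher` and `SamplerParams::EqualTo` may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical, much smaller stand-in for SamplerParams.
struct Params {
    uint8_t filterMag = 0;
    uint8_t filterMin = 0;
    uint8_t wrap = 0;
    uint8_t anisotropyLog2 = 0;

    // Hashes all fields by packing them into one 32-bit value.
    struct Hasher {
        std::size_t operator()(Params const& p) const noexcept {
            uint32_t const packed = uint32_t(p.filterMag) | (uint32_t(p.filterMin) << 8)
                    | (uint32_t(p.wrap) << 16) | (uint32_t(p.anisotropyLog2) << 24);
            return std::hash<uint32_t>{}(packed);
        }
    };

    // Field-by-field equality, consistent with Hasher.
    struct EqualTo {
        bool operator()(Params const& a, Params const& b) const noexcept {
            return a.filterMag == b.filterMag && a.filterMin == b.filterMin
                    && a.wrap == b.wrap && a.anisotropyLog2 == b.anisotropyLog2;
        }
    };
};

// Cache keyed by the whole struct; int stands in for a sampler handle.
class SamplerCache {
public:
    int getSampler(Params const& params) {
        auto iter = mCache.find(params);
        if (iter != mCache.end()) {
            return iter->second;
        }
        int const sampler = mNextHandle++;
        bool const inserted = mCache.insert({params, sampler}).second;
        assert(inserted);
        (void) inserted;
        return sampler;
    }

    std::size_t size() const { return mCache.size(); }

private:
    std::unordered_map<Params, int, Params::Hasher, Params::EqualTo> mCache;
    int mNextHandle = 1;
};
```

Keying on the struct avoids relying on a fixed-width integer view of the parameters, so the key type can grow without touching the cache lookup code.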
--- a/filament/backend/src/vulkan/VulkanSwapChain.cpp
+++ b/filament/backend/src/vulkan/VulkanSwapChain.cpp
@@ -28,7 +28,8 @@ namespace filament::backend {
 VulkanSwapChain::VulkanSwapChain(VulkanPlatform* platform, VulkanContext const& context,
        VmaAllocator allocator, VulkanCommands* commands, VulkanStagePool& stagePool,
        void* nativeWindow, uint64_t flags, VkExtent2D extent)
-    : mPlatform(platform),
+    : VulkanResource(VulkanResourceType::SWAP_CHAIN),
+      mPlatform(platform),
      mCommands(commands),
      mAllocator(allocator),
      mStagePool(stagePool),
@@ -67,11 +68,11 @@ void VulkanSwapChain::update() {
    for (auto const color: bundle.colors) {
        mColors.push_back(std::make_unique<VulkanTexture>(device, mAllocator, mCommands, color,
                bundle.colorFormat, 1, bundle.extent.width, bundle.extent.height,
-                TextureUsage::COLOR_ATTACHMENT, mStagePool));
+                TextureUsage::COLOR_ATTACHMENT, mStagePool, true /* heap allocated */));
    }
    mDepth = std::make_unique<VulkanTexture>(device, mAllocator, mCommands, bundle.depth,
            bundle.depthFormat, 1, bundle.extent.width, bundle.extent.height,
-            TextureUsage::DEPTH_ATTACHMENT, mStagePool);
+            TextureUsage::DEPTH_ATTACHMENT, mStagePool, true /* heap allocated */);

    mExtent = bundle.extent;
 }
--- a/filament/backend/src/vulkan/VulkanSwapChain.h
+++ b/filament/backend/src/vulkan/VulkanSwapChain.h
@@ -17,8 +17,10 @@
 #ifndef TNT_FILAMENT_BACKEND_VULKANSWAPCHAIN_H
 #define TNT_FILAMENT_BACKEND_VULKANSWAPCHAIN_H

+#include "DriverBase.h"
+
 #include "VulkanContext.h"
-#include "VulkanDriver.h"
+#include "VulkanResources.h"

 #include <backend/platforms/VulkanPlatform.h>

@@ -35,7 +37,7 @@ struct VulkanHeadlessSwapChain;
 struct VulkanSurfaceSwapChain;

 // A wrapper around the platform implementation of swapchain.
-struct VulkanSwapChain : public HwSwapChain {
+struct VulkanSwapChain : public HwSwapChain, VulkanResource {
    VulkanSwapChain(VulkanPlatform* platform, VulkanContext const& context, VmaAllocator allocator,
            VulkanCommands* commands, VulkanStagePool& stagePool,
            void* nativeWindow, uint64_t flags, VkExtent2D extent = {0, 0});
--- a/filament/backend/src/vulkan/VulkanTexture.cpp
+++ b/filament/backend/src/vulkan/VulkanTexture.cpp
@@ -29,12 +29,16 @@ using namespace bluevk;
 namespace filament::backend {

 using ImgUtil = VulkanImageUtility;
-VulkanTexture::VulkanTexture(VkDevice device, VmaAllocator allocator,
-        VulkanCommands* commands, VkImage image, VkFormat format, uint8_t samples,
-        uint32_t width, uint32_t height, TextureUsage tusage, VulkanStagePool& stagePool)
+VulkanTexture::VulkanTexture(VkDevice device, VmaAllocator allocator, VulkanCommands* commands,
+        VkImage image, VkFormat format, uint8_t samples, uint32_t width, uint32_t height,
+        TextureUsage tusage, VulkanStagePool& stagePool, bool heapAllocated)
    : HwTexture(SamplerType::SAMPLER_2D, 1, samples, width, height, 1, TextureFormat::UNUSED,
-              tusage),
-      mVkFormat(format), mViewType(ImgUtil::getViewType(target)), mSwizzle({}),
+            tusage),
+      VulkanResource(
+              heapAllocated ? VulkanResourceType::HEAP_ALLOCATED : VulkanResourceType::TEXTURE),
+      mVkFormat(format),
+      mViewType(ImgUtil::getViewType(target)),
+      mSwizzle({}),
      mTextureImage(image),
      mPrimaryViewRange{
              .aspectMask = getImageAspect(),
@@ -43,19 +47,28 @@ VulkanTexture::VulkanTexture(VkDevice device, VmaAllocator allocator,
              .baseArrayLayer = 0,
              .layerCount = 1,
      },
-      mStagePool(stagePool), mDevice(device), mAllocator(allocator), mCommands(commands) {}
+      mStagePool(stagePool),
+      mDevice(device),
+      mAllocator(allocator),
+      mCommands(commands) {}

 VulkanTexture::VulkanTexture(VkDevice device, VkPhysicalDevice physicalDevice,
-        VulkanContext const& context, VmaAllocator allocator,
-        VulkanCommands* commands, SamplerType target, uint8_t levels,
-        TextureFormat tformat, uint8_t samples, uint32_t w, uint32_t h, uint32_t depth,
-        TextureUsage tusage, VulkanStagePool& stagePool, VkComponentMapping swizzle)
+        VulkanContext const& context, VmaAllocator allocator, VulkanCommands* commands,
+        SamplerType target, uint8_t levels, TextureFormat tformat, uint8_t samples, uint32_t w,
+        uint32_t h, uint32_t depth, TextureUsage tusage, VulkanStagePool& stagePool,
+        bool heapAllocated, VkComponentMapping swizzle)
    : HwTexture(target, levels, samples, w, h, depth, tformat, tusage),
+      VulkanResource(
+              heapAllocated ? VulkanResourceType::HEAP_ALLOCATED : VulkanResourceType::TEXTURE),
      // Vulkan does not support 24-bit depth, use the official fallback format.
      mVkFormat(tformat == TextureFormat::DEPTH24 ? context.getDepthFormat()
                                                  : backend::getVkFormat(tformat)),
-      mViewType(ImgUtil::getViewType(target)), mSwizzle(swizzle), mStagePool(stagePool),
-      mDevice(device), mAllocator(allocator), mCommands(commands) {
+      mViewType(ImgUtil::getViewType(target)),
+      mSwizzle(swizzle),
+      mStagePool(stagePool),
+      mDevice(device),
+      mAllocator(allocator),
+      mCommands(commands) {

    // Create an appropriately-sized device-only VkImage, but do not fill it yet.
    VkImageCreateInfo imageInfo{.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
@@ -154,7 +167,7 @@ VulkanTexture::VulkanTexture(VkDevice device, VkPhysicalDevice physicalDevice,
            << "handle = " << utils::io::hex << mTextureImage << utils::io::dec << ", "
            << "extent = " << w << "x" << h << "x"<< depth << ", "
            << "mipLevels = " << int(levels) << ", "
-            << "TextureUsage = " << static_cast<int>(usage) << ", "            
+            << "TextureUsage = " << static_cast<int>(usage) << ", "
            << "usage = " << imageInfo.usage << ", "
            << "samples = " << imageInfo.samples << ", "
            << "type = " << imageInfo.imageType << ", "
@@ -167,11 +180,17 @@ VulkanTexture::VulkanTexture(VkDevice device, VkPhysicalDevice physicalDevice,
    // Allocate memory for the VkImage and bind it.
    VkMemoryRequirements memReqs = {};
    vkGetImageMemoryRequirements(mDevice, mTextureImage, &memReqs);
+
+    uint32_t memoryTypeIndex
+            = context.selectMemoryType(memReqs.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
+
+    ASSERT_POSTCONDITION(memoryTypeIndex < VK_MAX_MEMORY_TYPES,
+            "VulkanTexture: unable to find a memory type that meets requirements.");
+
    VkMemoryAllocateInfo allocInfo = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .allocationSize = memReqs.size,
-        .memoryTypeIndex = context.selectMemoryType(memReqs.memoryTypeBits,
-                VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT)
+        .memoryTypeIndex = memoryTypeIndex,
    };
    error = vkAllocateMemory(mDevice, &allocInfo, nullptr, &mTextureImageMemory);
    ASSERT_POSTCONDITION(!error, "Unable to allocate image memory.");
@@ -206,7 +225,11 @@ VulkanTexture::VulkanTexture(VkDevice device, VkPhysicalDevice physicalDevice,
           | VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT)) {
        const uint32_t layers = mPrimaryViewRange.layerCount;
        VkImageSubresourceRange range = { getImageAspect(), 0, levels, 0, layers };
-        VkCommandBuffer cmdbuf = mCommands->get().cmdbuffer;
+
+        VulkanCommandBuffer& commands = mCommands->get();
+        VkCommandBuffer const cmdbuf = commands.cmdbuffer;
+        commands.acquire(this);
+
        transitionLayout(cmdbuf, range, ImgUtil::getDefaultLayout(imageInfo.usage));
    }
 }
@@ -256,7 +279,9 @@ void VulkanTexture::updateImage(const PixelBufferDescriptor& data, uint32_t widt
    vmaUnmapMemory(mAllocator, stage->memory);
    vmaFlushAllocation(mAllocator, stage->memory, 0, hostData->size);

-    const VkCommandBuffer cmdbuf = mCommands->get(true).cmdbuffer;
+    VulkanCommandBuffer& commands = mCommands->get();
+    VkCommandBuffer const cmdbuf = commands.cmdbuffer;
+    commands.acquire(this);

    VkBufferImageCopy copyRegion = {
        .bufferOffset = {},
@@ -317,7 +342,9 @@ void VulkanTexture::updateImageWithBlit(const PixelBufferDescriptor& hostData, u
    vmaUnmapMemory(mAllocator, stage->memory);
    vmaFlushAllocation(mAllocator, stage->memory, 0, hostData.size);

-    const VkCommandBuffer cmdbuf = mCommands->get().cmdbuffer;
+    VulkanCommandBuffer& commands = mCommands->get();
+    VkCommandBuffer const cmdbuf = commands.cmdbuffer;
+    commands.acquire(this);

    // TODO: support blit-based format conversion for 3D images and cubemaps.
    const int layer = 0;
--- a/filament/backend/src/vulkan/VulkanTexture.h
+++ b/filament/backend/src/vulkan/VulkanTexture.h
@@ -17,26 +17,29 @@
 #ifndef TNT_FILAMENT_BACKEND_VULKANTEXTURE_H
 #define TNT_FILAMENT_BACKEND_VULKANTEXTURE_H

-#include "VulkanDriver.h"
+#include "DriverBase.h"
+
 #include "VulkanBuffer.h"
+#include "VulkanResources.h"
 #include "VulkanImageUtility.h"

 #include <utils/RangeMap.h>

 namespace filament::backend {

-struct VulkanTexture : public HwTexture {
+struct VulkanTexture : public HwTexture, VulkanResource {
    // Standard constructor for user-facing textures.
    VulkanTexture(VkDevice device, VkPhysicalDevice physicalDevice, VulkanContext const& context,
            VmaAllocator allocator, VulkanCommands* commands, SamplerType target, uint8_t levels,
            TextureFormat tformat, uint8_t samples, uint32_t w, uint32_t h, uint32_t depth,
-            TextureUsage tusage, VulkanStagePool& stagePool, VkComponentMapping swizzle = {});
+            TextureUsage tusage, VulkanStagePool& stagePool, bool heapAllocated = false,
+            VkComponentMapping swizzle = {});

    // Specialized constructor for internally created textures (e.g. from a swap chain)
    // The texture will never destroy the given VkImage, but it does manages its subresources.
    VulkanTexture(VkDevice device, VmaAllocator allocator, VulkanCommands* commands, VkImage image,
            VkFormat format, uint8_t samples, uint32_t width, uint32_t height, TextureUsage tusage,
-            VulkanStagePool& stagePool);
+            VulkanStagePool& stagePool, bool heapAllocated = false);

    ~VulkanTexture();

--- a/filament/backend/src/vulkan/platform/VulkanPlatformSwapChainImpl.cpp
+++ b/filament/backend/src/vulkan/platform/VulkanPlatformSwapChainImpl.cpp
@@ -57,11 +57,17 @@ std::tuple<VkImage, VkDeviceMemory> createImageAndMemory(VulkanContext const& co
    VkDeviceMemory imageMemory;
    VkMemoryRequirements memReqs;
    vkGetImageMemoryRequirements(device, image, &memReqs);
+
+    uint32_t memoryTypeIndex
+            = context.selectMemoryType(memReqs.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
+
+    ASSERT_POSTCONDITION(memoryTypeIndex < VK_MAX_MEMORY_TYPES,
+            "VulkanPlatformSwapChainImpl: unable to find a memory type that meets requirements.");
+
    VkMemoryAllocateInfo allocInfo = {
            .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
            .allocationSize = memReqs.size,
-            .memoryTypeIndex
-            = context.selectMemoryType(memReqs.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
+            .memoryTypeIndex = memoryTypeIndex,
    };
    result = vkAllocateMemory(device, &allocInfo, nullptr, &imageMemory);
    ASSERT_POSTCONDITION(result == VK_SUCCESS, "Unable to allocate image memory.");
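
Both memory allocation hunks above now assert that `selectMemoryType` returned an index below `VK_MAX_MEMORY_TYPES`. The sketch below shows the usual Vulkan selection loop under the assumption, suggested by that assertion but not confirmed in this diff, that the sentinel for "not found" is `VK_MAX_MEMORY_TYPES`. The constants are redefined locally so the example compiles without the Vulkan headers, and `MemoryProperties` is a hypothetical simplification of `VkPhysicalDeviceMemoryProperties`.

```cpp
#include <cassert>
#include <cstdint>

// Local mirrors of Vulkan constants, so this sketch builds without <vulkan/vulkan.h>.
constexpr uint32_t kMaxMemoryTypes = 32;   // VK_MAX_MEMORY_TYPES
constexpr uint32_t kDeviceLocalBit = 0x1;  // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisibleBit = 0x2;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT

// Hypothetical simplification of VkPhysicalDeviceMemoryProperties.
struct MemoryProperties {
    uint32_t memoryTypeCount;
    uint32_t propertyFlags[kMaxMemoryTypes];
};

// Returns the first memory type that is allowed by typeBits and has every required flag,
// or kMaxMemoryTypes if no such type exists (assumed sentinel).
uint32_t selectMemoryType(MemoryProperties const& props, uint32_t typeBits, uint32_t required) {
    assert(props.memoryTypeCount <= kMaxMemoryTypes);
    for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
        bool const allowed = (typeBits & (1u << i)) != 0;
        bool const matches = (props.propertyFlags[i] & required) == required;
        if (allowed && matches) {
            return i;
        }
    }
    return kMaxMemoryTypes;
}
```

Checking the result before filling in `VkMemoryAllocateInfo`, as the diff does, turns an out-of-range index into an explicit failure message instead of passing an invalid `memoryTypeIndex` to `vkAllocateMemory`.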
--- a/filament/backend/test/BackendTest.cpp
+++ b/filament/backend/test/BackendTest.cpp
@@ -85,13 +85,10 @@ void BackendTest::executeCommands() {
    }
 }

-void BackendTest::flushAndWait(uint64_t timeout) {
+void BackendTest::flushAndWait() {
    auto& api = getDriverApi();
-    auto fence = api.createFence();
    api.finish();
    executeCommands();
-    api.wait(fence, timeout);
-    api.destroyFence(fence);
 }

 Handle<HwSwapChain> BackendTest::createSwapChain() {
--- a/filament/backend/test/BackendTest.h
+++ b/filament/backend/test/BackendTest.h
@@ -43,7 +43,7 @@ protected:

    void initializeDriver();
    void executeCommands();
-    void flushAndWait(uint64_t timeout = 1000);
+    void flushAndWait();

    filament::backend::Handle<filament::backend::HwSwapChain> createSwapChain();

--- a/filament/backend/test/ComputeTest.cpp
+++ b/filament/backend/test/ComputeTest.cpp
@@ -87,11 +87,8 @@ void ComputeTest::executeCommands() {
    }
 }

-void ComputeTest::flushAndWait(uint64_t timeout) {
+void ComputeTest::flushAndWait() {
    auto& api = getDriverApi();
-    auto fence = api.createFence();
    api.finish();
    executeCommands();
-    api.wait(fence, timeout);
-    api.destroyFence(fence);
 }
--- a/filament/backend/test/ComputeTest.h
+++ b/filament/backend/test/ComputeTest.h
@@ -34,7 +34,7 @@ protected:
    void TearDown() override;

    void executeCommands();
-    void flushAndWait(uint64_t timeout = 1000);
+    void flushAndWait();
    filament::backend::DriverApi& getDriverApi() { return *commandStream; }
    filament::backend::Driver& getDriver() { return *driver; }

--- a/filament/include/filament/Camera.h
+++ b/filament/include/filament/Camera.h
@@ -34,10 +34,13 @@ class Entity;
 namespace filament {

 /**
- * Camera represents the eye through which the scene is viewed.
+ * Camera represents the eye(s) through which the scene is viewed.
 *
 * A Camera has a position and orientation and controls the projection and exposure parameters.
 *
+ * For stereoscopic rendering, a Camera maintains two separate "eyes": Eye 0 and Eye 1. These are
+ * arbitrary and don't necessarily need to correspond to "left" and "right".
+ *
 * Creation and destruction
 * ========================
 *
@@ -140,6 +143,18 @@ namespace filament {
 * intensity and the Camera exposure interact to produce the final scene's brightness.
 *
 *
+ * Stereoscopic rendering
+ * ======================
+ *
+ * The Camera's transform (as set by setModelMatrix or via TransformManager) defines a "head" space,
+ * which typically corresponds to the location of the viewer's head. Each eye's transform is set
+ * relative to this head space by setEyeModelMatrix.
+ *
+ * Each eye also maintains its own projection matrix. These can be set with setCustomEyeProjection.
+ * Care must be taken to correctly set the projectionForCulling matrix, as well as its corresponding
+ * near and far values. The projectionForCulling matrix must define a frustum (in head space) that
+ * bounds the frustums of both eyes. Alternatively, culling may be disabled with
+ * View::setFrustumCullingEnabled.
 *
 * \see Frustum, View
 */
@@ -234,6 +249,24 @@ public:
     */
    void setCustomProjection(math::mat4 const& projection, double near, double far) noexcept;

+    /** Sets a custom projection matrix for each eye.
+     *
+     * The projectionForCulling, near, and far parameters establish a "culling frustum" which must
+     * encompass anything either eye can see.
+     *
+     * @param projection an array of projection matrices, only the first
+     *                   CONFIG_STEREOSCOPIC_EYES (2) are read
+     * @param count size of the projection matrix array to set, must be
+     *              >= CONFIG_STEREOSCOPIC_EYES (2)
+     * @param projectionForCulling custom projection matrix for culling, must encompass both eyes
+     * @param near distance in world units from the camera to the culling near plane. \p near > 0.
+     * @param far distance in world units from the camera to the culling far plane. \p far > \p
+     * near.
+     * @see setCustomProjection
+     */
+    void setCustomEyeProjection(math::mat4 const* projection, size_t count,
+            math::mat4 const& projectionForCulling, double near, double far);
+
    /** Sets the projection matrix.
     *
     * The projection matrices must be of one of the following form:
@@ -309,11 +342,14 @@ public:
     * The projection matrix used for rendering always has its far plane set to infinity. This
     * is why it may differ from the matrix set through setProjection() or setLensProjection().
     *
+     * @param eyeId the index of the eye to return the projection matrix for, must be <
+     *              CONFIG_STEREOSCOPIC_EYES (2)
     * @return The projection matrix used for rendering
     *
-     * @see setProjection, setLensProjection, setCustomProjection, getCullingProjectionMatrix
+     * @see setProjection, setLensProjection, setCustomProjection, getCullingProjectionMatrix,
+     * setCustomEyeProjection
     */
-    math::mat4 getProjectionMatrix() const noexcept;
+    math::mat4 getProjectionMatrix(uint8_t eyeId = 0) const;


    /** Returns the projection matrix used for culling (far plane is finite).
@@ -350,6 +386,26 @@ public:
    void setModelMatrix(const math::mat4& model) noexcept;
    void setModelMatrix(const math::mat4f& model) noexcept; //!< @overload

+    /** Set the position of an eye relative to this Camera (head).
+     *
+     * By default, both eyes' model matrices are identity matrices.
+     *
+     * For example, to position Eye 0 3cm leftwards and Eye 1 3cm rightwards:
+     * ~~~~~~~~~~~{.cpp}
+     * const mat4 leftEye  = mat4::translation(double3{-0.03, 0.0, 0.0});
+     * const mat4 rightEye = mat4::translation(double3{ 0.03, 0.0, 0.0});
+     * camera.setEyeModelMatrix(0, leftEye);
+     * camera.setEyeModelMatrix(1, rightEye);
+     * ~~~~~~~~~~~
+     *
+     * This method is not intended to be called every frame. Instead, to update the position of the
+     * head, use Camera::setModelMatrix.
+     *
+     * @param eyeId the index of the eye to set, must be < CONFIG_STEREOSCOPIC_EYES (2)
+     * @param model the model matrix for an individual eye
+     */
+    void setEyeModelMatrix(uint8_t eyeId, math::mat4 const& model);
+
    /** Sets the camera's model matrix
     *
     * @param eye       The position of the camera in world space.
@@ -448,7 +504,9 @@ public:
    //! returns this camera's sensitivity in ISO
    float getSensitivity() const noexcept;

-    //! returns the focal length in meters [m] for a 35mm camera
+    /** Returns the focal length in meters [m] for a 35mm camera.
+     * Eye 0's projection matrix is used to compute the focal length.
+     */
    double getFocalLength() const noexcept;

    /**
--- a/filament/include/filament/Engine.h
+++ b/filament/include/filament/Engine.h
@@ -513,6 +513,14 @@ public:
     */
    size_t getMaxAutomaticInstances() const noexcept;

+    /**
+     * Queries the device and platform for instanced stereo rendering support.
+     *
+     * @return true if stereo rendering is supported, false otherwise
+     * @see View::setStereoscopicOptions
+     */
+    bool isStereoSupported() const noexcept;
+
    /**
     * @return EntityManager used by filament
     */
@@ -676,6 +684,25 @@ public:
    bool destroy(const InstanceBuffer* p);      //!< Destroys an InstanceBuffer object.
    void destroy(utils::Entity e);              //!< Destroys all filament-known components from this entity

+    bool isValid(const BufferObject* p);        //!< Tells whether a BufferObject object is valid
+    bool isValid(const VertexBuffer* p);        //!< Tells whether a VertexBuffer object is valid
+    bool isValid(const Fence* p);               //!< Tells whether a Fence object is valid
+    bool isValid(const IndexBuffer* p);         //!< Tells whether an IndexBuffer object is valid
+    bool isValid(const SkinningBuffer* p);      //!< Tells whether a SkinningBuffer object is valid
+    bool isValid(const MorphTargetBuffer* p);   //!< Tells whether a MorphTargetBuffer object is valid
+    bool isValid(const IndirectLight* p);       //!< Tells whether an IndirectLight object is valid
+    bool isValid(const Material* p);            //!< Tells whether a Material object is valid
+    bool isValid(const Renderer* p);            //!< Tells whether a Renderer object is valid
+    bool isValid(const Scene* p);               //!< Tells whether a Scene object is valid
+    bool isValid(const Skybox* p);              //!< Tells whether a Skybox object is valid
+    bool isValid(const ColorGrading* p);        //!< Tells whether a ColorGrading object is valid
+    bool isValid(const SwapChain* p);           //!< Tells whether a SwapChain object is valid
+    bool isValid(const Stream* p);              //!< Tells whether a Stream object is valid
+    bool isValid(const Texture* p);             //!< Tells whether a Texture object is valid
+    bool isValid(const RenderTarget* p);        //!< Tells whether a RenderTarget object is valid
+    bool isValid(const View* p);                //!< Tells whether a View object is valid
+    bool isValid(const InstanceBuffer* p);      //!< Tells whether an InstanceBuffer object is valid
+
    /**
     * Kicks the hardware thread (e.g. the OpenGL, Vulkan or Metal thread) and blocks until
     * all commands to this point are executed. Note that this does guarantee that the
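
Review note on the new `isValid()` overloads: the sketch below is a toy model of what a validity query of this kind can look like, assuming the engine tracks the objects it created and compares pointer values only. `ToyEngine` and `ToyTexture` are hypothetical names for illustration; this is not Filament's implementation.

```cpp
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an engine-owned resource type.
struct ToyTexture {};

// Hypothetical engine that remembers every object it created.
class ToyEngine {
public:
    ToyTexture* createTexture() {
        auto owned = std::make_unique<ToyTexture>();
        ToyTexture* raw = owned.get();
        mTextures.emplace(raw, std::move(owned));
        return raw;
    }

    // Returns false if p is not a live object created by this engine.
    bool destroy(const ToyTexture* p) {
        return mTextures.erase(p) != 0;
    }

    // Compares the pointer value against the live set; never dereferences p.
    bool isValid(const ToyTexture* p) const {
        return mTextures.find(p) != mTextures.end();
    }

private:
    std::unordered_map<const ToyTexture*, std::unique_ptr<ToyTexture>> mTextures;
};
```

In this model a pointer stops being valid as soon as `destroy()` succeeds, and a pointer the engine never created is never valid.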
--- a/filament/include/filament/Fence.h
+++ b/filament/include/filament/Fence.h
@@ -28,11 +28,7 @@
 namespace filament {

 /**
- * Fence is used to synchronize rendering operations together, with the CPU or with compute.
- *
- * \note
- * Currently Fence only provide client-side synchronization.
- *
+ * Fence is used to synchronize the application main thread with filament's rendering thread.
 */
 class UTILS_PUBLIC Fence : public FilamentAPI {
 public:
--- a/filament/include/filament/Material.h
+++ b/filament/include/filament/Material.h
@@ -166,6 +166,25 @@ public:
     * many previous frames are enqueued in the backend. This also varies by backend. Therefore,
     * it is recommended to only call this method once per material shortly after creation.
     *
+     * If the same variant is scheduled for compilation multiple times, the first scheduling
+     * takes precedence; later schedulings are ignored.
+     *
+     * Caveat: A consequence is that if a variant is scheduled on the low priority queue and later
+     * scheduled again on the high priority queue, the later scheduling is ignored.
+     * Therefore, the second callback could be called before the variant is compiled.
+     * However, the first callback, if specified, will trigger as expected.
+     *
+     * The callback is guaranteed to be called. If the engine is destroyed while some material
+     * variants are still compiling or in the queue, these will be discarded and the corresponding
+     * callback will be called. In that case however the Material pointer passed to the callback
+     * is guaranteed to be invalid (either because it's been destroyed by the user already, or,
+     * because it's been cleaned-up by the Engine).
+     *
+     * UserVariantFilterMask::ALL should be used with caution. Only variants that an application
+     * needs should be included in the variants argument. For example, the STE variant is only used
+     * for stereoscopic rendering. If an application is not planning to render in stereo, this bit
+     * should be turned off to avoid unnecessary material compilations.
+     *
     * @param priority      Which priority queue to use, LOW or HIGH.
     * @param variants      Variants to include to the compile command.
     * @param handler       Handler to dispatch the callback or nullptr for the default handler
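
Review note on the "first scheduling takes precedence" rule documented above: a minimal sketch of that rule, assuming variants are identified by an integer key. `ToyCompileQueue` and `Priority` are hypothetical names; the code only models the documented behavior and is not the scheduler Filament uses.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical priority levels mirroring the LOW/HIGH queues in the comment.
enum class Priority { LOW, HIGH };

// Toy model: the first scheduling of a variant wins, later ones are ignored.
class ToyCompileQueue {
public:
    // Returns true if the variant was newly queued, false if it was already
    // scheduled (in which case the earlier priority is kept).
    bool schedule(std::uint32_t variant, Priority priority) {
        return mScheduled.emplace(variant, priority).second;
    }

    // Priority under which the variant was first scheduled.
    // Throws std::out_of_range if the variant was never scheduled.
    Priority priorityOf(std::uint32_t variant) const {
        return mScheduled.at(variant);
    }

private:
    std::unordered_map<std::uint32_t, Priority> mScheduled;
};
```

Scheduling variant 1 on LOW and then again on HIGH leaves it on LOW in this model, which is the caveat the comment describes: the second request does not promote the variant.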
--- a/filament/include/filament/MaterialInstance.h
+++ b/filament/include/filament/MaterialInstance.h
@@ -52,6 +52,7 @@ class UTILS_PUBLIC MaterialInstance : public FilamentAPI {
 public:
    using CullingMode = filament::backend::CullingMode;
    using TransparencyMode = filament::TransparencyMode;
+    using DepthFunc = filament::backend::SamplerCompareFunc;
    using StencilCompareFunc = filament::backend::SamplerCompareFunc;
    using StencilOperation = filament::backend::StencilOperation;
    using StencilFace = filament::backend::StencilFace;
@@ -367,6 +368,16 @@ public:
     */
    void setDepthCulling(bool enable) noexcept;

+    /**
+     * Overrides the default depth function state that was set on the material.
+     */
+    void setDepthFunc(DepthFunc depthFunc) noexcept;
+
+    /**
+     * Returns the depth function state.
+     */
+    DepthFunc getDepthFunc() const noexcept;
+
    /**
     * Returns whether depth culling is enabled.
     */
--- a/filament/include/filament/Options.h
+++ b/filament/include/filament/Options.h
@@ -133,9 +133,9 @@ struct BloomOptions {
    Texture* dirt = nullptr;                //!< user provided dirt texture %codegen_skip_json% %codegen_skip_javascript%
    float dirtStrength = 0.2f;              //!< strength of the dirt texture %codegen_skip_json% %codegen_skip_javascript%
    float strength = 0.10f;                 //!< bloom's strength between 0.0 and 1.0
-    uint32_t resolution = 360;              //!< resolution of vertical axis (2^levels to 2048)
+    uint32_t resolution = 384;              //!< resolution of vertical axis (2^levels to 2048)
    float anamorphism = 1.0f;               //!< bloom x/y aspect-ratio (1/32 to 32)
-    uint8_t levels = 6;                     //!< number of blur levels (3 to 11)
+    uint8_t levels = 6;                     //!< number of blur levels (1 to 11)
    BlendMode blendMode = BlendMode::ADD;   //!< how the bloom effect is applied
    bool threshold = true;                  //!< whether to threshold the source
    bool enabled = false;                   //!< enable or disable bloom
@@ -541,6 +541,13 @@ struct SoftShadowOptions {
    float penumbraRatioScale = 1.0f;
 };

+/**
+ * Options for stereoscopic (multi-eye) rendering.
+ */
+struct StereoscopicOptions {
+    bool enabled = false;
+};
+
 } // namespace filament

 #endif //TNT_FILAMENT_OPTIONS_H
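
Review note on the `BloomOptions::resolution` default: the commit list on this page gives the reason for moving from 360 to 384 as allowing "up to 7 mipmap levels vertically". The arithmetic can be checked with the sketch below, assuming a level here corresponds to one exact halving of the vertical resolution. `exactHalvings` is a hypothetical helper, not a Filament function.

```cpp
#include <cstdint>

// Hypothetical helper: counts how many times a resolution can be halved
// before it becomes odd (0 is treated as having no halvings).
constexpr std::uint32_t exactHalvings(std::uint32_t resolution) {
    std::uint32_t count = 0;
    while (resolution != 0 && resolution % 2u == 0) {
        resolution /= 2u;
        ++count;
    }
    return count;
}

static_assert(exactHalvings(384) == 7, "384 = 3 * 2^7");
static_assert(exactHalvings(360) == 3, "360 = 45 * 2^3");
```

Under that assumption, 384 halves cleanly down to 3 over seven steps, while 360 reaches an odd value (45) after three.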
--- a/filament/include/filament/RenderTarget.h
+++ b/filament/include/filament/RenderTarget.h
@@ -91,8 +91,6 @@ public:
        /**
         * Sets a texture to a given attachment point.
         *
-         * All RenderTargets must have a non-null COLOR attachment.
-         *
         * When using a DEPTH attachment, it is important to always disable post-processing
         * in the View. Failing to do so will cause the DEPTH attachment to be ignored in most
         * cases.
Author	SHA1	Message	Date
Benjamin Doherty	031cd302dd	Capture command callstacks for debugging	2023-10-30 14:42:24 -07:00
Benjamin Doherty	87351097ad	Adjust NEW_RELEASE_NOTES to reflect cherry-picks	2023-08-30 16:30:10 -07:00
Ben Doherty	694766682f	Update FrameCompletedCallback using directive (#7128 )	2023-08-30 16:26:15 -07:00
Ben Doherty	c946ebd1e6	Make destroyFence asynchronous (#7127 )	2023-08-30 16:22:37 -07:00
Romain Guy	25a8291101	Don't force masked blending for transmission/volume materials (#7126 ) * Don't force masked blending for transmission/volume materials glTF lets you choose your own alpha mode when using the transmission and volume material extensions. We were forcing the masked mode which was incorrect, except to pass the standard tests. * Update release notes	2023-08-30 13:34:49 -07:00
Mathias Agopian	60c689688d	attempt to fix remote ui (#7120 ) fixes #7116	2023-08-30 08:46:53 -07:00
Romain Guy	763bc1f34a	Fix possible NPE when updating fog options (#7123 )	2023-08-30 08:44:39 -07:00
Romain Guy	ef07638eef	Properly apply emissive to masked materials (#7122 ) * Properly apply emissive to masked materials The emissive property should not be multiplied by the color alpha in masked materials. The alpha is treated as a coverage value in that case, not an opacity value. * Update release notes	2023-08-30 08:43:36 -07:00
Ben Doherty	0aa0efe159	Transition setFrameCompletedCallback to take a CallbackHandler (#7103 )	2023-08-28 10:27:38 -07:00
Powei Feng	ef7bcd1e19	vulkan: refactor resource garbage collection (#7110 )	2023-08-28 10:20:37 -07:00
Powei Feng	702ceda82a	vulkan: fix debug marker pop (#7112 )	2023-08-25 21:27:30 -07:00
Mathias Agopian	66b78074de	Revert "workaround another PowerVR compiler bug " This reverts commit `58f96be2c4`. This caused material files to increase in size significantly. It turns out that glslang has to generate a copy for each parameter that is passed to a function as a non-const parameter. This revert will break IMG devices again, but that should be the case only on debug builds. Release builds lose the const qualifier by virtue of going through spirv. We'll try to address this some other way later.	2023-08-25 15:31:00 -07:00
Mathias Agopian	8d440cea17	Update/Improve ViewerGUI - separate out the settings for bloom, ssao and ssr - update webgl binaries - change default bloom resolution to 384 from 360 to have up to 7 mipmap levels vertically	2023-08-25 09:53:52 -07:00
Mathias Agopian	3ab8e4d725	fix lens flare effect we were accessing an uninitialized LOD.	2023-08-25 09:53:33 -07:00
Mathias Agopian	97f20afdd7	remove the anonymous union in SamplerParams - don't rely on it being 32-bits - update the jni code to store SamplerParams in a long (64 bits) instead of a int. This gives us some future-proofing of the java side.	2023-08-24 21:36:42 -07:00
Mathias Agopian	42989e76d7	fix possible NPE crasher in timerquery fixes #7106	2023-08-24 21:36:21 -07:00
Powei Feng	ad45cc9092	Release Filament 1.42.0	2023-08-22 13:36:35 -07:00
Ben Doherty	04669f6ab9	Add Engine query for stereoscopic support (#7086 )	2023-08-22 12:41:56 -07:00
Powei Feng	c3c0dde82f	vulkan: fix crashing Pixel 4xl adreno (#7087 ) Adreno doesn't seem to like defining the size of arrays using a `const int`.	2023-08-22 12:36:45 -07:00
Powei Feng	ecd5b681d0	Update MaterialEnums.h (#7098 )	2023-08-21 10:49:44 -07:00
Romain Guy	66081e6cc1	Add fields used by JNI to proguard rules (#7096 )	2023-08-21 10:39:34 -07:00
Jacob Su	aa6e94a128	Fix Mat cofactor UT error on Mac M2 chip machine.	2023-08-18 10:28:51 -07:00
Mathias Agopian	098be2e115	rework how we initialize the gl context (#7085 ) * rework how we initialize the gl context - early initialization is now implemented with static methods so that it's very clear which state they need. - the version number is no longer used outside of initialization, instead we use the feature level. - ES3.0 Adreno devices are downgraded to feature level 0 * Update filament/backend/src/opengl/OpenGLContext.cpp Co-authored-by: Powei Feng <powei@google.com> --------- Co-authored-by: Powei Feng <powei@google.com>	2023-08-18 10:25:10 -07:00
Mathias Agopian	17caf6cae9	improvements to CompilerThreadPool and OpenGLPlatform CompilerThreadPool: - it now supports a thread cleanup function - some initialization is moved to the setup function OpenGLPlatform: - now cleans-up the thread pool threads upon exit	2023-08-17 20:11:57 -07:00
Mathias Agopian	26952631a3	only attempt to compile shaders in parallel if supported It can be extremely counter productive to attempt to do this if not supported.	2023-08-17 20:10:57 -07:00
Mathias Agopian	fc7b6447b7	make sure to not assert when matdbg is enabled	2023-08-17 20:09:55 -07:00
Powei Feng	6c0db37919	vulkan: fix readPixels selectMemory (#7084 ) readPixels requests staging memory to be host-visible/coherent/cached. But "cached" is not supported on Mali (Pixel 6pro). We make it a preferrable but optional bit.	2023-08-17 15:19:43 -07:00
Mathias Agopian	69f78dbcbe	better fix for calls to eglMakeCurrent turns out that KHR_surfaceless_context is implied for ES3.0 when KHR_create_context is present. However, Adreno 306 fails even if it advertises it. So, we now reset the value of KHR_surfaceless_context based on actually calling eglMakeCurrent(EGL_NO_SURFACE).	2023-08-16 22:46:36 -07:00
Mathias Agopian	c0389ac54c	rework ShaderCompilerService to improve performance - remove support for non-shared contextes parallel compilation. this wasn't used. we can always revive it later if we need to. - rework how callbacks work so that we don't have to use a work list executed at each tick() in the shared context case (common case). this improves performance significantly on low-end devices, by not having to go through the list to check if all programs are compiled, multiple times per frame. The new CallbackManager handles scheduling the callbacks after all previous programs are compiled.	2023-08-16 22:46:13 -07:00
Mathias Agopian	c0db909c13	don't use eglMakeCurrent with EGL_NO_SURFACE unless we're allowed EGL_KHR_surfaceless_context is needed to be able to use eglMakeCurrent without an EGLSurface.	2023-08-16 21:35:29 -07:00
Ben Doherty	46e4e966b9	Fix assert with matdbg enabled (#7079 )	2023-08-16 14:23:24 -07:00
Powei Feng	288b59a348	Fix missing createFence (#7076 ) Continuing from #7072	2023-08-16 12:18:15 -07:00
Mathias Agopian	1c7293db8d	fix fuzzyEqual - the return value was inverted - fuzzyEqual could generate alignment faults - move it out of mat4 and mat2 because it was only used in one place.	2023-08-16 10:40:13 -07:00
Benjamin Doherty	6006b47c44	Release Filament 1.41.0	2023-08-15 17:11:38 -07:00
Ben Doherty	6bb29f6e01	Implement preliminary support for instanced stereo (#6967 )	2023-08-15 17:08:11 -07:00
Mathias Agopian	f1b160db04	remove backend wait(timeout) API The only use of this API was with a timeout 0 to check the fence status. Timeouts other than zero could be very dangerous and since we're not using that feature for now, we just get rid of it. wait() is replaced with getFenceStatus(). It is currently only used by the FrameSkipper. This is not a public API.	2023-08-15 12:17:02 -07:00
Mathias Agopian	3bb52f083b	Remove (unused) support for hardware fences This code hasn't been used for a while and we should not resurrect it.	2023-08-15 12:17:02 -07:00
Mathias Agopian	88337ab358	use whenGpuCommandComplete to emulate platform fences This is more appropriate (and simple) than runEveryNowAndThen because the latter doesn't manage a fence, and therefore is more of a superset. This will allow us to use a shared context implementation in the future.	2023-08-15 12:17:02 -07:00
Mathias Agopian	0935fe3fe3	cleanup the timer query implementations We don't use runEveryNowAndThen for implementations that don't need it (e.g. the EGL fence version, or the fallback version).	2023-08-15 12:17:02 -07:00
Mathias Agopian	96ed19549e	reduce the number of shader compiler thread to two from four more threads also use (much) more memory which can be a problem for lower end devices	2023-08-14 10:20:01 -07:00
Mathias Agopian	945e9a2cb5	don't pin the GL thread on PowerVR	2023-08-14 10:20:01 -07:00
Ben Doherty	4d703e3807	Refactor CompilerThreadPool out of OpenGLDriver (#7067 )	2023-08-11 17:03:36 -07:00
Mathias Agopian	f537f62adf	Use 4 background threads for shader compiler on PowerVR Since powervr supports parallel shader compilation well, we use 4 background threads for shader compilation.	2023-08-11 15:22:19 -07:00
Mathias Agopian	7840404132	Enable read-only feedback loop for PowerVR This is technically forbidden by the GLES 3.x specification but many GPU support it, which saves us a depth buffer copy. Note that this is supported in GL desktop.	2023-08-11 15:22:19 -07:00
Mathias Agopian	afad361cac	Workaround a PowerVR performance issue with destroying FBOs Destroying the FBO target of a blit operation causes a stall similar to calling glFinish(). We workaround this by delaying all FBO destructions to after the GPU is finished with the current frame.	2023-08-11 15:22:19 -07:00
Mathias Agopian	58b23c290c	Work around a PowerVR bug where gl_InstanceID is not initialized	2023-08-11 15:22:19 -07:00
Mathias Agopian	12a73137d7	workaround another PowerVR compiler bug In some situation, functions with const parameter cause the shader compilation to fail without an error message. We remove all the `const` qualifiers on functions, assuming this shouldn't impact code generation a lot.	2023-08-11 15:22:19 -07:00
Powei Feng	7a136eec5d	vulkan: clean-up includes and refactor handle allocator (#7056 )	2023-08-10 16:19:24 -07:00
Mathias Agopian	e9bd9ab3a6	Fix matdbg for 32 bits architectures	2023-08-09 19:19:05 -07:00
Mathias Agopian	083bff62e3	better handle "urgent" shader compilation instead of moving the "urgent" compilation to the head of the queue, we simply remove it from the queue and process it immediately. This has the benefit that on drivers that truly support parallel compilation, the latency will be reduced as we don't need to wait for the current compile to finish.	2023-08-09 15:01:55 -07:00
Mathias Agopian	f713316541	Enable parallel shader compilation on more devices	2023-08-09 15:01:32 -07:00
Powei Feng	bcfdf2f70d	Release Filament 1.40.5	2023-08-09 10:26:17 -07:00
Mathias Agopian	018d6f877f	Workaround for some PowerVR devices The PowerVR compiler systematically crashes on some devices when `gl_Position` is written twice in the vertex shader. fixes #5118, b/190221124	2023-08-08 08:59:59 -07:00
Mathias Agopian	1e4172b820	`RenderTarget` needs not to have a color attachment This was a somewhat arbitrary requirement, some RenderTarget could be depth only for instance.	2023-08-08 08:59:32 -07:00
Mathias Agopian	6b6827b70d	add a GLES compiler unit test (#7050 ) * add a GLES compiler unit test * Update filament/test/compiler_test.cpp Co-authored-by: Ben Doherty <bendoherty@google.com> --------- Co-authored-by: Ben Doherty <bendoherty@google.com>	2023-08-08 08:59:06 -07:00
Romain Guy	96cccc83c6	Update BUILDING.md	2023-08-06 08:44:55 -07:00
Mathias Agopian	2468a3a854	fix a typo causing EXT_color_buffer_float enabled on all ES3 devices b/287126679	2023-08-04 16:51:04 -07:00
Powei Feng	f68825f2ed	vulkan: fix fence initialization (#7038 ) Previously, we have a VulkanSync with a default constructor that allows us to have sync objects that returns error when actual fences are not yet present. We need to replicate that with VulkanFence since sync objects have been removed from the API. Fixes #7034	2023-08-04 11:13:57 -07:00
Mathias Agopian	2a12f71f96	fix a crash when shutting the engine down We were breaking the promise of pending shader compilation jobs by destroying the corresponding std::promise embedded in the job queue. In practice there was no danger of a deadlock by construction, but std::promise throws an exception in that case. On builds without exception enabled, this will be turned into an abort(). We fix this by using our own mechanism for signaling instead of std::promise. This ends up being more lightweight anyways. Fix: #6933	2023-08-04 10:00:37 -07:00
Romain Guy	74b64a5451	Update README.md (#7037 )	2023-08-03 15:03:03 -07:00
조다니엘(Daniel Cho)	7d01e0349b	Fix rendering issue when using DoF	2023-08-03 13:49:57 -07:00
Mathias Agopian	80014bf2b1	fix a possible overflow when picking We don't need to convert the object id to float, instead we can just "reinterpret_cast" it. With the current possible values of Entities, there was a risk of overflow once the age gets to 128 (very rare).	2023-08-03 13:49:14 -07:00
Mathias Agopian	adf3421f4a	Workaround Adreno bug causing picking to fail Adreno drivers don't support precision qualifiers in structs. fixes #6997	2023-08-03 13:49:14 -07:00
Mathias Agopian	51c65ccfdc	add picking to gltf_viewer for debugging	2023-08-02 12:40:16 -07:00
Powei Feng	5d37d08cf8	vulkan: fix TSAN in readpixels (#7023 )	2023-08-01 16:29:27 -07:00
Benjamin Doherty	eb18d75b2e	Release Filament 1.40.4	2023-08-01 15:37:51 -07:00
mackong	549c582287	engine: support setDepthFunc for MaterialInstance (#7004 ) Co-authored-by: Mathias Agopian <mathias@google.com>	2023-07-31 15:49:48 -07:00
Mathias Agopian	6c05029a9f	workaround a crash with some adreno drivers The crashes are triggered by spirv-opt's MergeReturnPass, so we just disable it. This pass also caused issues with AMD drivers on macOS. fixes b/291140208	2023-07-31 15:48:44 -07:00
Mathias Agopian	b2278986dd	Update bug_report.md	2023-07-31 11:47:32 -07:00
Mathias Agopian	98a2b8f159	Improve FrameSkipper performance on GLES/Android We get rid of the backend's HwSync object because on all platforms but GL it was implemented just like a HwFence. We now use HwFence instead. On GL platforms though, HwFence doesn't exist natively it is instead provided by the Platform. In that case, we emulate it as with GLSync objects -- the emulation incurs some latency that can cause frames to be skipped. On Android and platforms that provide the Fence functionality, there is no such issue. This change improves significantly frame pacing on Android.	2023-07-31 11:29:40 -07:00
Mathias Agopian	0ba891fb14	Fix off-by-one in FrameSkipper The frame latency specified was off-by-one, i.e. a value of 1 meant a latency of 2. The default was 2, which meant 3. Also it wasn't possible to specify the max latency of 4, which would OOB.	2023-07-31 11:29:40 -07:00
Mathias Agopian	042cd670aa	Improve fence-based timer query emulation It now uses the fence for the start time and end time, leading to much more accurate timings. We also use a single atomic variable instead of two.	2023-07-31 11:29:40 -07:00
mackong	95c7e4d02b	sample: fix typo	2023-07-27 10:03:11 -07:00
Mathias Agopian	d302525674	Add a way to query the validity of filament objects Engine::isValid() can be used to check the validity of most filament objects.	2023-07-27 10:02:31 -07:00
Mathias Agopian	3dbb7298f8	fix a cleanup of material parallel compilation When the engine is shut down, it's possible for some parallel compilation jobs (and callbacks) to be queued. We need to make sure to clear the queues and call the callbacks before destroying the parallel compilation service. Fixes b/290388359	2023-07-27 10:02:31 -07:00
Mathias Agopian	0ed71ab53b	fix an issue causing callbacks to be called too late We were waiting for programs from both queues to be compiled before calling the callback associated with one queue. In practice this caused the callback associated with high priority programs to be called only after low priority programs were ready. Also cleanup-up "token" so that it doesn't store the priority. Update the documentation and sample to better reflect what the implementation does.	2023-07-27 10:02:31 -07:00
Mathias Agopian	b1491ae5b1	Disable timer queries on all Mali GPUs fixes b/233754398	2023-07-27 09:57:41 -07:00
Mathias Agopian	03b8dc8027	The "back" key will now terminate the gltf_viewer activity This is useful for testing our shutdown code.	2023-07-27 09:57:14 -07:00
Mirsfang	e3568cd89f	fix macos opengl compile process_ARB_shading_language_packing type conversion error (#6994 )	2023-07-27 09:34:41 -07:00
Powei Feng	b35e24daa7	gltfio: exclude unsupported platforms from test (#7000 )	2023-07-26 19:11:50 -07:00
Powei Feng	f506b27a31	gltfio: simple test for asset loading (#6990 )	2023-07-26 16:50:06 -07:00
Benjamin Doherty	9452d5be1d	Release Filament 1.40.3	2023-07-26 12:51:50 -07:00
Romain Guy	ea3f449a08	Update dependencies (#6992 )	2023-07-26 11:05:09 -07:00
Ben Doherty	dc9510fe25	Support EXT_clip_cull_distance for future use (#6965 ) This PR sets up the ability for shaders to use `gl_ClipDistance`, which will be needed in the future. Desktop GL supports this natively. OpenGL ES requires the EXT_clip_cull_distance extension. Unfortunately glslang does not support this extension, so we have to employ a workaround for mobile when going through glslang. We instead write to `filament_gl_ClipDistance`, and then modify the SPIR-V to decorate this as `gl_ClipDistance`. See the comment in SpirvFixup.h. Note this PR does not actually use `gl_ClipDistance` yet, so there should be no change to shaders.	2023-07-26 10:01:28 -07:00
Powei Feng	731dd761d9	vulkan: Implement async readPixels (#6695 ) - Carry out readPixels without blocking and wait for the read to complete on a separate thread. - Add mContext.commands->wait() in finish() - Wait for readPixels to complete in finish() - Remove unused commandBuffer in Context	2023-07-25 14:07:56 -07:00
Powei Feng	6726ccb2fb	vulkan: fix subpass validation (#6980 ) - Before, we supposed that the maximum number of input attachment should match the maximum number of color attachments. But in reality, we've only used one input attachment for the second subpass. - The problem with the above supposition is that the descriptor set layout for the input attachment descriptor set must have the exact number of input attachment specified in the shader. If the layout has more input attachment slots than specified in the shader, then we'd run into a validation error. - In this patch, we fix the number of max input attachment in the descriptor set layout to 1, since we ever only make use of one. Fixes #6513	2023-07-25 11:24:14 -07:00
Powei Feng	5fce0f9ecf	filamentapp: fix vulkan dependency (#6987 ) Fixes #6983	2023-07-24 16:50:51 -07:00
Mathias Agopian	ad03fc4118	fix typo that prevented the shader blob cache to work (#6979 ) fixes b/290670707	2023-07-24 09:59:19 -07:00
Y-way	3f77ff8815	Build error on msvc 2022	2023-07-21 10:00:51 -07:00
Mathias Agopian	26b5fa1e38	fix a crash when using Material::compile with a callback The work queue is sorted by priority but when we insert a notification job we didn't have a priority to use for insertion, in addition the priority was taken from the token, but for the notification job we don't have a token. The fix consists in passing the priority around so we have it when needed.	2023-07-21 10:00:21 -07:00
mackong	24286e6016	gltfio: fix crash when compute morph target without material	2023-07-21 09:08:22 -07:00
Ben Doherty	626577fe3d	Fix ineffective FOG variant filter (#6968 )	2023-07-20 11:54:18 -07:00
Romain Guy	176915b59c	Add missing setParameter variants to MaterialInstance (#6973 ) Adds support for mat3/mat4 parameter types	2023-07-20 11:19:35 -07:00
mackong	0b60933c2a	web: remove const qualifier for getMaterialInstances (#6952 )	2023-07-20 10:43:33 -07:00
Pawan Vimukthi	0e31d6936a	Update android/Windows.md (#6964 ) Updated the flag `filament-skip-samples` in the code snippet to align with the documentation.	2023-07-19 13:23:08 -07:00