CGI Scenes Dataset
Overview
The QoEVAVE CGI Scene Database is a repository of three high-quality audiovisual scenes: The Cave, Cinema, and Mansion. Unlike the 360 Scenes Database, which features three degrees-of-freedom video, the CGI scenes are designed for six degrees-of-freedom VR with interactive and task-based elements. Each scene includes interactive audiovisual objects, static and triggered audio sources, and fully modelled acoustic geometry for advanced audio rendering. For more information, see the publication.
The Cave

- Dark scene with an interactive lantern as the primary light source.
- Walkie-talkie way-finding task: locate source at 1 of 5 positions.
- Complex reverberant acoustic geometry with labyrinth-style corridors.
The Cinema

- Modelled after the Fraunhofer IIS cinema — 70 seats (10×7).
- Import custom audiovisual content via the cinema manager.
- Human avatars, interactable props, and a toggleable lighting button.
The Mansion

- Multi-room mansion with balcony level and diverse surface materials.
- Impact-sound interaction across multiple object types and materials.
- Audio localization task: triggered cues fire only when the source is out of view.
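As a rough illustration of the "out of view" condition in the Mansion localization task, the sketch below tests whether a source falls outside a viewing cone around the camera's forward direction. The function name, the 90-degree field of view, and the cone test itself are assumptions made for illustration; the project's actual visibility check is not described here and may also account for occlusion.

```python
import math

def is_out_of_view(camera_pos, camera_forward, source_pos, fov_deg=90.0):
    """Return True if the source lies outside a viewing cone around camera_forward.

    Hypothetical helper: uses a simple cone of full angle fov_deg and ignores
    occlusion by walls or props.
    """
    to_source = [s - c for s, c in zip(source_pos, camera_pos)]
    distance = math.sqrt(sum(x * x for x in to_source))
    forward_len = math.sqrt(sum(x * x for x in camera_forward))
    if distance == 0.0 or forward_len == 0.0:
        # Degenerate case (source at the camera, or no view direction): treat as in view.
        return False
    dot = sum(a * b for a, b in zip(to_source, camera_forward))
    cos_angle = dot / (distance * forward_len)
    # Outside the cone when the angle to the source exceeds half the field of view.
    return cos_angle < math.cos(math.radians(fov_deg / 2.0))

def should_fire_cue(camera_pos, camera_forward, source_pos):
    """A triggered cue is allowed only while the source is out of view."""
    return is_out_of_view(camera_pos, camera_forward, source_pos)
```

With the camera at the origin looking along +z, a source directly behind the listener would allow the cue to fire, while a source straight ahead would not.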
Visual Design
Future-Proofing with HDRP
The Unity scenes are developed using Unity's High Definition Render Pipeline (HDRP). This pipeline offers several high-fidelity rendering features such as ray tracing, volumetric lighting, and post-processing effects. However, rendering visuals for real-time virtual reality is still computationally expensive, and implementing even a subset of these effects can cause severe frame drops.
A key benefit of HDRP is future-proofing the project. As VR headset capabilities and GPU rendering performance continue to improve, more computationally intensive effects become viable. Volumetric lighting, currently implemented in the Mansion and Cave scenes, is one example of an HDRP feature already in use.
Optimization

Texture Atlas Material
To reduce the number of draw calls, textures across the scenes are combined into a texture atlas. Rather than maintaining individual textures per object, all static-object textures are packed into a single atlas. The figure shows an example composite combining ambient occlusion, normals, and roughness maps into a single material.
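To make the atlas idea concrete, the following minimal sketch shows how a per-object UV coordinate could be remapped into a shared atlas. It assumes a square grid of equally sized tiles; the `AtlasTile` type, the `atlas_uv` function, and the 4x4 layout are hypothetical and do not describe the project's actual atlas layout.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class AtlasTile:
    """Grid cell holding one object's texture inside the atlas (hypothetical layout)."""
    column: int
    row: int

def atlas_uv(u: float, v: float, tile: AtlasTile, tiles_per_side: int) -> tuple[float, float]:
    """Map a per-object UV in [0, 1] to the matching UV inside the shared atlas."""
    if not (0 <= tile.column < tiles_per_side and 0 <= tile.row < tiles_per_side):
        raise ValueError("tile lies outside the atlas grid")
    scale = 1.0 / tiles_per_side  # each tile covers this fraction of the atlas
    return ((tile.column + u) * scale, (tile.row + v) * scale)

# Example: the centre of an object's texture stored in column 1, row 2 of a 4x4 atlas.
lamp_tile = AtlasTile(column=1, row=2)
lamp_centre = atlas_uv(0.5, 0.5, lamp_tile, tiles_per_side=4)
```

Because every static object samples the same atlas, objects can share one material, which is what allows them to be batched into fewer draw calls.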

Levels of Detail (LODs)
Many models in the CGI Scene Database include three levels of detail (LODs) to maintain rendering performance. The figure shows the wall lamps from the Mansion scene at three levels of mesh detail (left to right: lowest to highest). As the camera moves toward or away from a model, the appropriate LOD is swapped in to balance visual fidelity against rendering cost.
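The LOD swap can be sketched as a simple threshold lookup. The example below is illustrative only: the switch distances are hypothetical, and it selects on camera distance for simplicity, whereas Unity's LODGroup component switches on the object's relative screen size.

```python
from bisect import bisect_right

# Hypothetical switch distances in metres:
# closer than 5 m -> LOD 0 (highest detail), 5 to 15 m -> LOD 1, 15 m and beyond -> LOD 2.
LOD_SWITCH_DISTANCES = (5.0, 15.0)

def select_lod(camera_distance: float, switch_distances=LOD_SWITCH_DISTANCES) -> int:
    """Return the LOD index for a model at the given distance from the camera."""
    if camera_distance < 0:
        raise ValueError("distance must be non-negative")
    # bisect_right counts how many switch distances have been reached or passed.
    return bisect_right(switch_distances, camera_distance)
```

With these thresholds, a wall lamp 2 m away would use LOD 0, while the same lamp seen from 30 m across the room would use LOD 2.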
Audio Implementation
For more information on audio rendering and a complete asset catalogue for all scenes, take a look at the Asset Information page.
Object-based Audio
The audio implementation uses an object-based workflow. The project includes the MetaXR Audio SDK to render the audio objects; it can be replaced with another Unity audio spatializer if desired.
Each scene features two types of audio playback:
- Event-based playback: Triggered by a user interaction or scripted event. Represents repeatable occurrences such as impact sounds or doors opening and closing.
- Continuous playback: Begins at scene start and runs for the duration of the scene. Examples include environmental ambience or a radio broadcast — sounds not strictly controlled by user interaction. A user may interact to mute such a source, but the underlying audio continues to play as though it were a live feed.
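The difference between the two playback types can be sketched as two small state models. The class names are hypothetical, and looping of the continuous source is an assumption; the point illustrated is that muting a continuous source hides its output without pausing its playhead, whereas an event-based source restarts from the beginning on every trigger.

```python
class ContinuousSource:
    """Starts with the scene; muting silences it but the playhead keeps advancing."""

    def __init__(self, clip_length_s: float):
        self.clip_length_s = clip_length_s
        self.playhead_s = 0.0
        self.muted = False

    def tick(self, dt_s: float) -> None:
        # Assumption: the clip loops for the duration of the scene.
        self.playhead_s = (self.playhead_s + dt_s) % self.clip_length_s

    def audible_position(self):
        """Playhead the listener hears, or None while muted."""
        return None if self.muted else self.playhead_s


class EventSource:
    """Idle until triggered; every trigger restarts the clip from the beginning."""

    def __init__(self, clip_length_s: float):
        self.clip_length_s = clip_length_s
        self.playhead_s = None  # None means the source is not playing

    def trigger(self) -> None:
        self.playhead_s = 0.0

    def tick(self, dt_s: float) -> None:
        if self.playhead_s is None:
            return
        self.playhead_s += dt_s
        if self.playhead_s >= self.clip_length_s:
            self.playhead_s = None  # clip finished; wait for the next trigger
```

For example, a radio muted for three seconds resumes three seconds further into its broadcast, like a live feed, while a door sound always starts at zero when the door is used again.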
Acoustic Diversity

Rendering Relevance Ratings
A large portion of the scene design was driven by acoustic diversity. The choice of Cave, Cinema, and Mansion provides contrasting acoustic environments in terms of scene tasks, audio stimuli, and the acoustic properties relevant to auralization. To gain an initial impression, five audio experts (N = 5) rated the rendering relevance of a set of acoustic attributes for each scene. See the publication for full attribute descriptions.
Bespoke Acoustic Geometry
Each CGI scene includes bespoke acoustic meshes modelled in Blender. These meshes enable more accurate rendering of acoustic features — including occlusion, diffraction, and early reflections — matched to the visual geometry, as opposed to simplified shoebox-style rooms. Scene-specific geometry and acoustic requirements are described on each scene page and on the Asset Information page.
Requirements
PC Hardware and Unity Version
The CGI Scene Database was developed using Unity v2021.3 LTS (long-term support). Scenes may also be imported into newer Unity versions, but doing so may require resolving compilation errors on import.
Scene performance was tested with the following PC specifications:
- OS: Windows 10 (64-bit)
- CPU: AMD Ryzen 7 5800X 8-core @ 3.80 GHz
- RAM: 32 GB
- GPU: NVIDIA GeForce RTX 3080
VR Input Compatibility
The Unity project ships with the Unity XR Interaction Toolkit, XR Plugin Management, and the Action-based Input System. Both the OpenXR and Oculus XR plugins are provided, making the project compatible with most modern VR headsets.
Version Download
v1.0.0
The Unity project download includes all three scenes along with the required SDKs for VR mechanics, the MetaXR Audio SDK, and the XR Interaction Toolkit. Tested with Unity 2021.3 LTS.
Publication
When using the QoEVAVE CGI Scene Database, please cite the following work:
@inproceedings{robotham2024,
title = {CGI Scenes for Interactive Audio Research and Development: Cave, Cinema, and Mansion},
author = {Robotham, Thomas and Rebmann, Daniela and Fintineanu-Anghelescu, Dominik O. and Raake, Alexander and Habets, Emanuël A. P.},
year = {2024},
booktitle = {6th AES International Conference on Audio for Games},
address = {Tokyo, Japan},
pages = {1--11}
}