Overview

Virtual Acoustics explained in a few words
This content is available under CC BY 4.0

Virtual Acoustics is a real-time auralization framework

Responsive auralization

VA creates audible sound from a purely virtual situation. To do so, it uses digital input data that is pre-recorded, measured, modelled or simulated. However, VA creates dynamic auditory worlds that can be explored interactively, because it accounts for modifications of the virtual situation. In the simplest case, this means that sound sources and listeners can move freely, and the sound changes accordingly. This real-time auralization approach can only be achieved if certain parts of the audio processing are updated continuously and quickly. We call this audio rendering, and the output is an audio stream that represents the virtual situation. For more complex situations like rooms or outdoor worlds, the sound propagation becomes highly relevant and very complex. VA uses real-time simulation backends or simplified models to create a physics-based auditory impression.
If update rates exceed certain perceptual thresholds, this method can readily be used in Virtual Reality applications.

Low latency, efficient real-time processing and flexible resource management

Input-output latency is crucial for any interactive application. VA tries to achieve minimal latency wherever possible, because the latencies of subsequent components add up. As long as latency is kept low, a human listener will not notice small delays during scene updates, resulting in a convincing live system where interaction directly leads to the expected effect (without waiting for the system to process).
VA supports real-time capability by establishing flexible data management and processing modules that are lightweight and handle updates efficiently. For example, the FIR filtering modules use a partitioned block convolution, resulting in update latencies (at least for the early part of the filters) of a single audio block - which usually means a couple of milliseconds. Remotely updating long room impulse responses using Matlab can easily reach update rates of 1000 Hz, which under normal circumstances is about three times the rate a block-based streaming sound card provides - and far more than a dedicated graphics rendering processor achieves, which is often the driving part of scene modifications.
However, this comes at a price: VA does not trade update rates against computational resources. Instead, it takes advantage of the improvements in general-purpose processing power available at present, as well as more efficient software libraries. Limitations are solely imposed by the provided processing capacity, not by the framework. Therefore, VA will plainly produce audio dropouts or complete silence if the computational power is not sufficient for rendering and reproducing the given scene with the configuration used. Simply put, if you request too much, VA will stop auralizing correctly. Usually, the number of paths between a sound source and a sound receiver that are effectively processed can be reduced to an amount where the system can operate in real-time. For example, a single binaural free field rendering can roughly calculate up to 20 paths in real-time on a modern PC, but for room acoustics with long reverberation times, a maximum of 6 sources and one listener is realistic (plus the necessity to simulate the sound propagation filters remotely). If reproduction of the rendered audio stream also requires intensive processing power, these numbers go further down.

Why is VA a framework?

You can download a ready-to-use VA application and configure it individually to reach your target. The combinations of available rendering modules are diverse, and therefore VA is suitable for various purposes. The simpler modules provide free-field spatial processing (e.g. using binaural technology) for precise localization. More sophisticated modules create certain moods by applying directional artificial reverberation. Others try to be as precise as possible by applying physics-based sound propagation simulation for indoor and outdoor scenarios. And there are also possibilities to simply mix ambient sounds that guide or entertain.
To deliver your sound to a human listener, you can use different reproduction modules. The selection depends on the available hardware, the rendering type, and the computational power you can afford. Find below the tables indicating the rendering and reproduction modules shipped with VA.
If what you want to do is not covered by the available modules, you can also extend VA with your own module implementation. You can use generic calls to configure your components without modifying any interface or binding library, which is very helpful for prototyping.
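As a brief sketch of such a generic call, assume a custom module named MyPrototypeModule (hypothetical) and the Python binding introduced later in this document; call_module is assumed to follow the binding's snake_case convention for the CallModule interface method, and the argument keys are purely illustrative:

import va

va.connect()

# Hypothetical module name and argument keys, for illustration only
args = { 'command': 'update_gain', 'gain': 0.5 }
result = va.call_module( 'MyPrototypeModule', args )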

Audio rendering in VA

The current version of VA provides rendering modules as depicted in the table below. In VA, you can instantiate as many rendering modules as you require - including multiple instances of the same class. This makes sense, for example, if you want to use different configurations and evaluate the result by switching between renderings in a fraction of a second.
Rendering modules are connected to reproduction modules, and one renderer can also feed multiple reproductions.
However, the number of instances is limited by the available computational power.

Class name | Output stream | Description
BinauralFreeField | binaural 2-channel | A binaural free field rendering that omits any present geometry. Uses FIR filtering for HRTFs / HRIRs, variable delay lines and filter banks for directivities per source-receiver pair.
BinauralArtificialReverb | binaural 2-channel | Mixes reverberation at the receiver side using reverberation time, room volume and surface area with a binaural approach; applies the effect using FIR filtering.
BinauralRoomAcoustics | binaural 2-channel | Uses a simulation scheduler backend for binaural room impulse responses and applies the effect by efficient convolution of long FIR filters per source-receiver pair.
BinauralOutdoorNoise | binaural 2-channel | Uses the OutdoorNoise base renderer and processes incident waves using binaural technology for spatialization. [BETA]
BinauralAirTrafficNoise | binaural 2-channel | See the binaural free field renderer, but adds a ground reflection and temporal variation of medium dynamics.
AmbisonicsFreeField | configurable | Generates panned signals based on spherical basis functions according to Higher Order Ambisonics (HOA).
VBAPFreeField | variable | Generates panned channel-based sound depending on the output loudspeaker setup, based on Vector-Base Amplitude Panning. Omits any geometry.
PrototypeFreeField | configurable | A free field rendering that omits any present geometry. Uses variable delay lines for propagation, filter banks for source directivities and FIR filtering with a given channel number for multi-channel receiver directivities. Mainly for recording simulations of spatial microphone arrays, sound field microphones or Ambisonics microphones.
PrototypeGenericPath | configurable | Convolves a long FIR filter efficiently for a configurable number of channels per source-receiver pair. The FIR filter can be updated in real-time using the binding interface.
AmbientMixer | variable | Routes sound directly to all channels of the reproductions and applies the gains of the sources.
PrototypeDummy | unspecified | The simplest dummy prototype renderer for developers to build upon.
Table 1: currently available audio rendering module classes in VACore

Audio reproduction in VA

The current version of VA provides reproduction modules as depicted in the table below. In VA, you can instantiate as many reproduction modules as you require - including multiple instances of the same class. This makes sense, for example, if you want to use different configurations and evaluate the result by switching between reproductions in a fraction of a second. A reproduction module can be fed by an arbitrary number of rendering modules, but they have to be compatible concerning the streaming I/O scheme. Also, a reproduction module can forward the final audio stream to any given number of outputs if the physical channels match (e.g. 4 pairs of additional headphones).
However, the number of instances is limited by the available computational power.

Class name | Input stream | Output stream | Description
Talkthrough | channel-based stream | variable | Forwards the incoming stream directly to the audio hardware. Used a lot for plain headphone playback and for channel-based renderings on loudspeaker setups.
Headphones | any two-channel | equalized two-channel | Forwards the incoming stream after applying an FIR deconvolution, for equalization of headphones if a headphone transfer function (HpTF) is available.
LowFrequencyMixer | arbitrary | variable | Mixes all channels or routes a specified channel to a single subwoofer or a subwoofer array. Handy for simple LFE support.
NCTC | binaural two-channel | variable | Uses static or dynamic binaural cross-talk cancellation for an arbitrary number of loudspeakers.
BinauralMixdown | any channel-based | binaural two-channel | Uses dynamic binaural technology with FIR filtering to simulate channel-based sound playback from a virtual loudspeaker setup.
HOA | Ambisonics any order | variable | Calculates and applies gains for a loudspeaker setup using Higher Order Ambisonics (HOA) methods.
BinauralAmbisonicsMixdown | Ambisonics any order | binaural two-channel | Calculates and applies gains for a virtual loudspeaker setup using Higher Order Ambisonics methods, followed by a binaural mixdown.
Table 2: currently available audio reproduction module classes in VACore

Configuring VA

If you want to use VA, you most likely want to change the configuration to match your hardware and activate the rendering and reproduction modules you are interested in.

Configuring VA in a VAServer application

VAServer can only start VA with a configuration file, usually called VACore.ini. You can configure VA for your purpose by modifying the *.ini files in the conf folder and using the provided batch start scripts, which will start the VA server using these configuration files. VACore.ini controls the core parameters; the VASetup.*.ini files describe hardware devices and channel layouts. They are included by a line in the [Files] section of the configuration file and usually represent the static setup of a laboratory or a special setup for an experiment. Use enabled = true or enabled = false to activate or deactivate the instantiation of sections, i.e. rendering or reproduction modules and output groups.
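To give an idea, a minimal sketch of a VACore.ini fragment is shown below. The instance names (after the colon) and the output group are placeholders, and the hardware-related sections are assumed to live in the included VASetup file:

[Files]
VASetup.Desktop.ini

[Renderer:MyBinauralFreeField]
Class = BinauralFreeField
enabled = true
Reproductions = MyTalkthroughHeadphones

[Reproduction:MyTalkthroughHeadphones]
Class = Talkthrough
enabled = true
Outputs = desktop_hp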

Configuring VA in a Redstart application

Redstart offers basic GUI dialogs to create and control common configurations in so-called sessions, but it can also create sessions based on arbitrarily configured INI files for special purposes. The audio settings and network server settings have extra inputs to allow rapid switching between sessions and audio hardware.

Using search paths

Loading files from the hard drive seems trivial, but in practice a lot of time is wasted on paths that cannot be found during runtime - especially if error messages do not indicate this problem.
In VA, we struggle a lot with this, and it is a serious problem. Often, configurations and input data for scenes are created locally and later transferred to a computer in the laboratory. This computer is often not the computer that controls the scene, because a remote network connection is used - which in consequence requires the files to be mirrored on the hard drive of that server PC. If no precautions are taken, this usually leads to a nerve-wracking trial-and-error process until all files are found - and mostly results in using absolute paths as the quick-and-dirty solution, because we are all very lazy and too busy to do it right.
DOING IT RIGHT in this context means: NEVER use absolute paths in the first place. VA provides search path functionality. This means it will find any relative file path with the smallest amount of help: you have to provide one or more base paths where VA should look for your input files.

Search path best practice:

Put all your input data in one base folder, let's say C:/Users/student54/Documents/BachelorThesis/3AFCTest/InputData. In your VACore.ini, add a search path to this folder:
[Paths]
studentlab_pc3_my_data = C:/Users/student54/Documents/BachelorThesis/3AFCTest/InputData
Let us assume you have some subfolders trial1, trial2, ... with WAV files and an HRIR dataset Kemar_individualized.v17.ir.daff in the root folder. You will load them using this pseudo code:
HRIR_1 = va.CreateDirectivityFromFile( 'Kemar_individualized.v17.ir.daff' )
Sample_1_1 = va.CreateSignalSourceBufferFromFile( 'trial1/sample1.wav' )
Sample_1_2 = va.CreateSignalSourceBufferFromFile( 'trial1/sample2.wav' )
Sample_2_1 = va.CreateSignalSourceBufferFromFile( 'trial2/sample1.wav' )
...
When you now move to another computer in the laboratory (to conduct the listening experiment there), copy the entire InputData folder to the computer where the VA server will be running, for example to D:/experiments/BA/student54/3AFCTest/InputData. Now, all you have to do is add another search path to your VACore.ini configuration file, e.g.
[Paths]
studentlab_pc3_my_data = C:/Users/student54/Documents/BachelorThesis/3AFCTest/InputData
hearingboth_pc_my_data = D:/experiments/BA/student54/3AFCTest/InputData
... and you will have no more trouble with paths. If applicable, you can also add search paths over the VA interface during runtime using the AddSearchPath function.
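For example, with the Matlab binding this runtime call could look as follows (assuming the snake_case naming convention used by the bindings):

va.add_search_path( 'D:/experiments/BA/student54/3AFCTest/InputData' );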

Controlling VA

The first question is: what kind of software do you usually use? There are bindings that make the VA interface available in Matlab, Python, Lua, and with rudimentary functionality in C#. While many in the acoustics research community prefer Matlab, Python - especially in combination with Jupyter notebooks - is the open-source way to conveniently use VA. C# is your choice if you are planning to use VA in Unity environments, which is probably the easiest entry point for those who are not familiar with either Matlab or Python scripting.

Let's create a simple example scene for a binaural rendering. It requires a running VA server application on the same PC.

Matlab

va = itaVA;

% Connect to the VA server (assumes a server running on the same PC)
va.connect;
va.reset;

% Create a signal source from a file and start looped playback
X = va.create_signal_source_buffer_from_file( '$(DemoSound)' );
va.set_signal_source_buffer_playback_action( X, 'play' );
va.set_signal_source_buffer_looping( X, true );

% Create a sound source and place it in the virtual scene
S = va.create_sound_source( 'itaVA_Example_Source' );
va.set_sound_source_pose( S, [ -2 1.7 -2 ], [ 0 0 0 1 ] );

% Attach the signal source to the sound source
va.set_sound_source_signal_source( S, X );

% Load an HRIR dataset to be used as receiver directivity
H = va.create_directivity_from_file( '$(DefaultHRIR)' );

% Create a sound receiver (the listener) and place it in the scene
L = va.create_sound_receiver( 'itaVA_Example_Sound_Receiver' );
va.set_sound_receiver_pose( L, [ 0 1.7 0 ], [ 0 0 0 1 ] );

va.set_sound_receiver_directivity( L, H );

va.disconnect;

Python

import va

# Connect to the VA server (assumes a server running on the same PC)
va.connect()
va.reset()

# Create a signal source from a file and start looped playback
signal_source_id = va.create_signal_source_buffer_from_file( '$(DemoSound)' )
va.set_signal_source_buffer_playback_action( signal_source_id, 'play' )
va.set_signal_source_buffer_looping( signal_source_id, True )

# Create a sound source and place it in the virtual scene
sound_source_id = va.create_sound_source( 'VAPy_Example_Source' )
va.set_sound_source_pose( sound_source_id, ( -2, 1.7, -2 ), ( 0, 0, 0, 1 ) )

# Attach the signal source to the sound source
va.set_sound_source_signal_source( sound_source_id, signal_source_id )

# Load an HRIR dataset to be used as receiver directivity
hrir = va.create_directivity_from_file( '$(DefaultHRIR)' )

# Create a sound receiver (the listener) and place it in the scene
sound_receiver_id = va.create_sound_receiver( 'VAPy_Example_Sound_Receiver' )
va.set_sound_receiver_pose( sound_receiver_id, ( 0, 1.7, 0 ), ( 0, 0, 0, 1 ) )

va.set_sound_receiver_directivity( sound_receiver_id, hrir )

va.disconnect()

C#
using VA;
namespace VA {
	class VAExample
	{
		static void Main(string[] args)
		{
            VANet VAConnection = new VANet();
            VAConnection.Connect();
            VAConnection.Reset();

            string SignalSourceID = VAConnection.CreateSignalSourceBufferFromFile("$(DemoSound)");
            VAConnection.SetSignalSourceBufferPlaybackAction(SignalSourceID, "play");
            VAConnection.SetSignalSourceBufferIsLooping(SignalSourceID, true);

            int SoundSourceID = VAConnection.CreateSoundSource("C# example sound source");
            VAConnection.SetSoundSourcePose(SoundSourceID, new VAVec3(-2.0f, 1.7f, -2.0f), new VAQuat(0.0f, 0.0f, 0.0f, 1.0f));

            VAConnection.SetSoundSourceSignalSource(SoundSourceID, SignalSourceID);

            int HRIR = VAConnection.CreateDirectivityFromFile("$(DefaultHRIR)");

            int SoundReceiverID = VAConnection.CreateSoundReceiver("C# example sound receiver");
            VAConnection.SetSoundReceiverPose(SoundReceiverID, new VAVec3(0.0f, 1.7f, 0.0f), new VAQuat(0.0f, 0.0f, 0.0f, 1.0f));
			
            VAConnection.SetSoundReceiverDirectivity(SoundReceiverID, HRIR);

            // do something that suspends the program ...

            VAConnection.Disconnect();
		}
	}
}

C++
#include <VA.h>
#include <VANet.h>
#include <string>

int main( int, char** )
{
	IVANetClient* pVANet = IVANetClient::Create();
	pVANet->Initialize( "localhost" );

	if( !pVANet->IsConnected() )
		return 255;

	IVAInterface* pVA = pVANet->GetCoreInstance();

	pVA->Reset();

	const std::string sSignalSourceID = pVA->CreateSignalSourceBufferFromFile( "$(DemoSound)" );
	pVA->SetSignalSourceBufferPlaybackAction( sSignalSourceID, IVAInterface::VA_PLAYBACK_ACTION_PLAY );
	pVA->SetSignalSourceBufferLooping( sSignalSourceID, true );

	const int iSoundSourceID = pVA->CreateSoundSource( "C++ example sound source" );
	pVA->SetSoundSourcePose( iSoundSourceID, VAVec3( -2.0f, 1.7f, -2.0f ), VAQuat( 0.0f, 0.0f, 0.0f, 1.0f ) );

	pVA->SetSoundSourceSignalSource( iSoundSourceID, sSignalSourceID );

	const int iHRIR = pVA->CreateDirectivityFromFile( "$(DefaultHRIR)" );

	const int iSoundReceiverID = pVA->CreateSoundReceiver( "C++ example sound receiver" );
	pVA->SetSoundReceiverPose( iSoundReceiverID, VAVec3( 0.0f, 1.7f, 0.0f ), VAQuat( 0.0f, 0.0f, 0.0f, 1.0f ) );
	
	pVA->SetSoundReceiverDirectivity( iSoundReceiverID, iHRIR );

	// do something that suspends the program ...

	pVANet->Disconnect();
	delete pVANet;

	return 0;
}

Sound sources, sound receivers and sound portals

In VA, you will find three different virtual entities that represent sound objects.
While the term sound source is self-explanatory, VA uses the term sound receiver instead of listener. The reason is that listener would reduce the receiving entity to living creatures, whereas in VA receivers can also be virtual microphones or have a completely different meaning in other contexts.
Sound portals are entities that pick up sound and transport, transform and/or propagate it to other portals or sound receivers. This concept is helpful for sound transmission handling in Geometrical Acoustics, for example if a door acts as a transmitting object between two rooms.
It depends on the rendering module you use, but portals are mostly relevant in combination with geometry, say for room acoustics.

Auralization mode

Making acoustic effects audible is one of the central aspects of auralization. For research and demonstration purposes, it is helpful to switch certain acoustic phenomena on and off in a fraction of a second. This way, influences can be investigated intuitively.
VA provides a set of phenomena that can be toggled, called auralization modes. Auralization modes can be controlled globally and for each sound source and sound receiver individually. If the respective renderer considers a given auralization mode, the corresponding processing will be enabled or disabled based on the logical AND combination of the auralization modes (only if the auralization modes of source, receiver AND global settings are positive will the phenomenon be made audible).
Most of the auralization modes are only effective for certain rendering modules and are meaningless for others. For example, a free field renderer will only expose direct sound, source directivity and Doppler effect changes. All other phenomena are dismissed.
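As a sketch, toggling auralization modes from the Python binding could look like the following; the setter names mirror the binding's snake_case convention, and the mode strings are assumptions based on the acronyms in the table below:

# Disable early reflections globally, e.g. for a quick A/B comparison
va.set_global_auralization_mode( '-ER' )

# Disable the Doppler effect for one specific sound source only
va.set_sound_source_auralization_mode( sound_source_id, '-DP' )

# Enable all phenomena for a specific receiver
va.set_sound_receiver_auralization_mode( sound_receiver_id, 'all' )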

Name | Acronym | Description
Direct sound | DS | Direct sound path between a sound source and a sound receiver.
Early reflections | ER | Specular reflections off walls that correspond to the early arrival time of a complex source-receiver pair.
Diffuse decay | DD | Diffuse decay part of the arrival time of a complex source-receiver pair. Mostly used in the context of room acoustics.
Source directivity | SD | Sound source directivity function, the angle-dependent radiation pattern of an emitter.
Medium absorption | MA | Acoustic energy attenuation due to the absorbing capability of the medium.
Temporal variation | TV | Statistics-driven fluctuation of sound resulting from turbulence and time-variance of the medium (the atmosphere).
Scattering | SC | Diffuse scattering off non-planar surfaces.
Diffraction | DF | Diffraction off and around obstacles.
Near field | NF | Acoustic phenomena caused by near-field effects (in contrast to far-field assumptions).
Doppler | DP | Doppler frequency shifts based on relative distance changes.
Spreading loss | SL | Distance-dependent spreading loss, e.g. for spherical waves. Also called the 1/r law or (inverse) distance law.
Transmission | TR | Transmission of sound energy through solid structures like walls and flanking paths.
Absorption | AB | Sound absorption by materials.
Table 3: currently recognized auralization modes

Signal sources

VA differentiates between sound sources and signal sources. On the one hand, we have the sound source, an acoustic entity that emits sound. On the other hand, we speak of the signal of a sound, which represents the acoustic information being emitted. Hence, a sound source is always connected with a signal source. For example, a piano is a sound source; the music played on its keys is the source signal, which can vary depending on the piece or the artist's interpretation.
VA provides a set of different signal source types. Most of the time, buffer signal sources are used, which are populated with pre-recorded audio samples by loading WAV files from the hard drive. These buffer signal sources can be started, paused and stopped, and they can be set into loop mode.
Apart from buffers, there is also the possibility to connect a microphone input channel from the audio device. More specialized signal sources are, for example, speech that is synthesized from text input, or machine signal sources with start, idle and stop sounds. Finally, you can connect your own implementation of a signal source by providing a network client that feeds audio samples, or register a signal source using the local interface directly (both are in an experimental stage, though).
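Using the Python binding from the example above, runtime control of a buffer signal source could look like this ('pause' and 'stop' are assumed playback actions analogous to the 'play' action used earlier):

# Pause and later resume the looping sample
va.set_signal_source_buffer_playback_action( signal_source_id, 'pause' )
va.set_signal_source_buffer_playback_action( signal_source_id, 'play' )

# Stop playback entirely
va.set_signal_source_buffer_playback_action( signal_source_id, 'stop' )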

Directivities (including HRTFs and HRIRs)

The sound radiation pattern of a sound source is usually described by a directional function that depends on wavelength or frequency. This function is generally called a directivity and is commonly used in the context of musical instruments. The underlying concept, however, can be diverse. Solutions range from simulated or measured data at sampled directions on a regular or irregular spherical grid all the way to sets of fundamental functions that are weighted by coefficients, such as spherical harmonics. VA supports individual implementations of those directivities, and it is up to the rendering modules to account for the different types (and for near field effects or distance dependencies).
To maintain a general approach to this topic, sound receivers in VA can be assigned directivities, too. Due to the reciprocal nature of acoustic propagation and the fact that one can model sound transmission by means of linear shift-invariant systems for the majority of applications, this approach is equally valid for sound receivers. In the context of binaural technology, a sound receiver translates to a listener, and the assigned directivity is called a head-related transfer function (HRTF) or head-related impulse response (HRIR), depending on the domain of representation. The HRTF or HRIR is applied to the incoming sound at the receiver in the same way a source directivity would be used.

Geometry meshes and acoustic materials

Geometry-aware audio rendering is the holy grail of physics-based real-time auralization using geometrical acoustics simulation. It requires sophisticated algorithms and powerful backend processing to achieve real-time capability. VA tries to support this by providing a simple geometry mesh class and interfaces to load and transmit geometry data. However, it is up to the implementation of the rendering modules what to do with the data. Faces of meshes are assigned acoustic materials such as absorption, scattering and transmission coefficients. These are, for example, used (or transformed and forwarded) by special rendering instances, like the binaural room acoustics audio renderer.

Scenes

A scene is a somewhat unspecified term in VA. Any assembled scene information is passed to the rendering modules, and it is up to the implementation what happens next. The scene interface methods are intended for prototyping and future work. They can, for example, be used to define stratified medium definitions for air traffic noise rendering ... or to load entire cities from a geo information server for a certain geo location, intended to be used by a special rendering implementation.

Active sound receiver concept

In VA, there is not one single sound receiver (or listener, if we speak of a human being). Instead, VA renders sound for all enabled sound receivers, and the actual output stream that is forwarded to the reproduction module(s) can be switched dynamically by configuring the active sound receiver. This works globally for all rendering instances, but can also be controlled for each rendering instance individually, e.g. to use VA for multiple listeners in one scene.

Real-world pose (tracking)

Sound receivers have a pose, a combination of a position and an orientation in 3D space. But they also have a real-world pose, meaning that a receiver can also be positioned in the reference frame of the real physical laboratory environment. This is required for the processing of some reproduction modules, for example the binaural cross-talk cancellation reproduction NCTC, where the dynamic listener pose (this time it is a human being) has to be known very precisely.
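In practice, a tracking loop forwards the measured pose to VA. The Python sketch below assumes a real-world pose setter analogous to set_sound_receiver_pose (the exact function name may differ between VA versions) and a hypothetical tracker helper:

# Hypothetical sketch: feed the tracked listener pose into VA
position, orientation = my_tracker.get_head_pose()  # assumed tracker helper
va.set_sound_receiver_real_world_pose( sound_receiver_id, position, orientation )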

The VA struct class

VA uses a struct class that acts like an associative container. Keys are described by strings, and values can be of basic data types like booleans, integers, floating point numbers and strings. Also, more sophisticated VA types like samples are available, e.g. for audio data or impulse responses. Furthermore, structs can hold other structs, which means that they can be nested to create well-structured formats. Structs behave very much like Matlab structs, Python dicts and the JSON format, and these objects can be forwarded over the remote interfaces, for example to update an impulse response in an FIR convolution engine. It is very convenient to use this concept for prototyping, and it allows changing parameters in almost every corner of the VA core via the parameter setters and getters of modules that can be accessed using the module interface.
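As an illustration, the Python sketch below forwards a dict as a VA struct to a rendering module's parameter setter, e.g. to update an FIR filter of a generic path renderer; the instance name and key names are hypothetical and depend on the targeted module's implementation:

# Hypothetical sketch: hand a nested struct to a rendering module instance
params = {
    'source': sound_source_id,
    'receiver': sound_receiver_id,
    'filepath': 'trial1/new_impulse_response.wav',  # resolved via search paths
}
va.set_rendering_module_parameters( 'MyGenericPathRenderer', params )

# Parameter getters return a struct, too
info = va.get_rendering_module_parameters( 'MyGenericPathRenderer', {} )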