Implement a spatial audio effect (the sound moving around the listener’s head) using the Web Audio API
With the rise of the metaverse and 3D games such as battle royale titles, the demand for immersive audio experiences in virtual environments is growing rapidly. Spatial audio, a technology that allows users to perceive the location and distance of a sound source around them in a virtual scene, is quickly becoming an essential part of creating immersive virtual experiences.
In response to this rapidly growing demand for an immersive audio experience, we’ve added a Proximity Voice module to the ZEGOCLOUD Express Web SDK (since v2.10.0), which provides the following features:
- Proximity voice chat: A form of voice chat in virtual spaces where users can only hear the voices of other users within a certain proximity, and the volume of the sound changes according to the distance between the listener and the sound source.
- Spatial audio: Users in a virtual space can sense the position and distance of a sound source as they do when hearing a sound in the real world.
- Team voice chat: Users can join a team and switch between the Team-only mode (the user’s voice can only be heard by other users in the same team) and the Everyone mode (the user’s voice can be heard by everyone in the room) as they wish.
In this article, we will focus on how we can use the Web Audio API provided by web browsers to implement the spatial audio effect. Here is a simple spatial audio demo page we made using the Web Audio API.
- Click the Play button to start playing the music.
- Click the Turn On/Off Spatial Audio button to turn on or off the spatial audio effect.
- When the spatial audio effect is turned on, you can hear that the music is moving around your head.
(To experience the spatial audio effect, you will need to use stereo headphones or speakers.)
Okay. Let’s dive into more details.
An introduction to Web Audio API
The Web Audio API can be used for many different audio operations. For example, it is often used to replace the <audio> tag to play audio on the web. In addition, it provides other audio-processing capabilities, such as audio volume adjustment, audio mixing, and audio spatialization.
The Web Audio API lets you perform audio operations inside an audio context and has been designed to allow modular routing. Basic audio operations are performed with audio nodes, which are linked together to form an audio routing graph. A very basic audio routing graph looks like this: Inputs → Effects → Destination.
In the graph, the Inputs, Effects, and Destination modules are three AudioNodes representing the audio source, the intermediate processing module, and the audio destination, respectively.
The following describes the basic steps of a simple audio processing workflow:
1. Create an audio context
AudioContext represents an audio-processing graph built from audio modules linked together, each represented by an AudioNode. It acts as the central controller that creates the nodes it contains and executes the audio processing of each node.
2. Create a source node and an effect node inside the created audio context.
3. Connect the source node to the effect node
Call the source node’s connect method to connect it to the specified effect node.
4. Connect the effect node to the destination of the audio context
Call the effect node’s connect method to send the processed audio to the destination of the audio context. In this example, the destination node audioCtx.destination represents the speakers currently being used.
5. Change the audio output by changing the properties of the effect node.
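The five steps above can be sketched as follows. This is a minimal illustration rather than the demo page's actual code: the helper name buildBasicGraph and the choice of a GainNode as the effect node are assumptions, while createMediaElementSource(), createGain(), and connect() are standard Web Audio API calls.

```javascript
// Builds source -> effect -> destination inside an existing audio context.
// `audioCtx` is an AudioContext; `mediaElement` is an <audio> element.
// Hypothetical helper: the name and the use of a GainNode as the effect
// node are illustrative assumptions.
function buildBasicGraph(audioCtx, mediaElement) {
  // Step 2: create a source node and an effect node inside the context.
  const source = audioCtx.createMediaElementSource(mediaElement);
  const effect = audioCtx.createGain();

  // Step 3: connect the source node to the effect node.
  source.connect(effect);

  // Step 4: connect the effect node to the destination of the context.
  effect.connect(audioCtx.destination);

  return { source, effect };
}

// Browser usage (step 1 creates the context, step 5 changes the output):
//   const audioCtx = new AudioContext();
//   const audioEl = document.querySelector('audio');
//   const { effect } = buildBasicGraph(audioCtx, audioEl);
//   effect.gain.value = 0.5; // halve the volume
```

Browsers usually keep a newly created audio context suspended until the user interacts with the page, so it is common to call audioCtx.resume() from a click handler before starting playback.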
Implement a spatial audio effect using the Web Audio API
Now, let’s have a look at how we can implement spatial audio effects using the Web Audio API.
Basically, to add spatial audio effects to an audio source, you need to use the following two interfaces in combination:
- AudioListener: Represents the unique listener in a virtual 3D space. You can get the listener instance of an audio context from the AudioContext.listener property.
- PannerNode: Represents an audio source in a virtual 3D space. You can call the new PannerNode() constructor or the AudioContext.createPanner() method to create a PannerNode.
The following describes how to set up the AudioListener and the PannerNode to achieve the audio spatialization effects you want.
1. Set up the AudioListener
AudioListener describes the position and orientation of the unique person listening to the audio scene used in audio spatialization. A PannerNode can be used to describe the position of the audio source relative to the listener.
The following three properties of an AudioListener define its position in a right-hand Cartesian coordinate system:
- positionX: Represents the horizontal position of the listener. The default value is 0.
- positionY: Represents the vertical position of the listener. The default value is 0.
- positionZ: Represents the longitudinal (back and forth) position of the listener. The default value is 0.
The following three properties define the listener’s forward direction in the same right-hand Cartesian coordinate system as the position values (positionX, positionY, and positionZ):
- forwardX: Represents the horizontal component of the listener’s forward direction. The default value is 0.
- forwardY: Represents the vertical component of the listener’s forward direction. The default value is 0.
- forwardZ: Represents the longitudinal (back and forth) component of the listener’s forward direction. The default value is -1.
The following three properties define the direction of the top of the listener’s head in the same right-hand Cartesian coordinate system as the position values (positionX, positionY, and positionZ):
- upX: Represents the horizontal component of the direction of the top of the listener’s head. The default value is 0.
- upY: Represents the vertical component of the direction of the top of the listener’s head. The default value is 1.
- upZ: Represents the longitudinal (back and forth) component of the direction of the top of the listener’s head. The default value is 0.
By setting up these two orientation vectors, the positions of the listener’s ears can be determined to create the spatial audio effect.
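As a rough sketch of how the listener might be configured, the helper below writes a position, a forward vector, and an up vector to an AudioListener. The names configureListener and rightVector are hypothetical, and the vectors are passed as plain [x, y, z] arrays for brevity. The fallback branch uses the deprecated setPosition() and setOrientation() methods, on the assumption that some older browsers do not expose the AudioParam-based properties.

```javascript
// Applies position and orientation to an AudioListener.
// Hypothetical helper; vectors are plain [x, y, z] arrays.
function configureListener(listener, position, forward, up) {
  if (listener.positionX) {
    // AudioParam-based properties.
    listener.positionX.value = position[0];
    listener.positionY.value = position[1];
    listener.positionZ.value = position[2];
    listener.forwardX.value = forward[0];
    listener.forwardY.value = forward[1];
    listener.forwardZ.value = forward[2];
    listener.upX.value = up[0];
    listener.upY.value = up[1];
    listener.upZ.value = up[2];
  } else {
    // Deprecated setters, kept as a fallback for older implementations.
    listener.setPosition(position[0], position[1], position[2]);
    listener.setOrientation(
      forward[0], forward[1], forward[2],
      up[0], up[1], up[2]
    );
  }
}

// Cross product forward x up. In a right-hand coordinate system this
// vector points toward the listener's right ear.
function rightVector(forward, up) {
  const [fx, fy, fz] = forward;
  const [ux, uy, uz] = up;
  return [fy * uz - fz * uy, fz * ux - fx * uz, fx * uy - fy * ux];
}

// Browser usage with the default pose (at the origin, facing -Z, head up +Y):
//   configureListener(audioCtx.listener, [0, 0, 0], [0, 0, -1], [0, 1, 0]);
```

With the default forward and up vectors, rightVector points along the positive X axis, which is why a source with a positive positionX is heard on the right.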
2. Set up the PannerNode
PannerNode is an audio-processing module describing the position and movement of an audio source signal in a 3D audio space with right-hand Cartesian coordinates. It spatializes an audio source signal using its position and orientation relative to the current AudioListener within an AudioContext.
The following are some of the commonly used properties of a PannerNode:
- panningModel: An enumerated value determining which spatialization algorithm to use to position the audio in 3D space. The default value is equalpower, representing the equal-power panning algorithm. We recommend setting this property to HRTF, which renders a stereo output of higher quality than equalpower.
- positionX/positionY/positionZ: The horizontal/vertical/longitudinal (back and forth) position of the audio source in a right-hand Cartesian coordinate system.
- orientationX/orientationY/orientationZ: The horizontal/vertical/longitudinal (back and forth) component of the direction the audio source is facing in a right-hand Cartesian coordinate system.
- coneInnerAngle: A double value describing the angle, in degrees, of a cone inside of which there will be no volume reduction. The default value is 360.
- rolloffFactor: A double value describing how quickly the volume is reduced as the source moves away from the listener. The default value is 1.
- distanceModel: An enumerated value determining which algorithm to use to reduce the volume of the audio source as it moves away from the listener. The default value is inverse.
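The properties above might be applied as in the following sketch. The helper names createSpatialPanner and inverseDistanceGain are hypothetical. The second function only illustrates the formula that the Web Audio specification gives for the inverse distance model; it also uses refDistance, a PannerNode property not covered above, whose default value is 1. The browser computes this gain internally, so the function is for understanding only.

```javascript
// Creates a PannerNode and applies the properties discussed above.
// Hypothetical helper; `position` is a plain [x, y, z] array.
function createSpatialPanner(audioCtx, position) {
  const panner = audioCtx.createPanner();
  panner.panningModel = 'HRTF';     // higher-quality stereo rendering
  panner.distanceModel = 'inverse'; // the default, set here for clarity
  panner.rolloffFactor = 1;         // the default
  panner.coneInnerAngle = 360;      // the default: no directional reduction
  panner.positionX.value = position[0];
  panner.positionY.value = position[1];
  panner.positionZ.value = position[2];
  return panner;
}

// Gain produced by the 'inverse' distance model for a given distance
// between the source and the listener (illustration only).
function inverseDistanceGain(distance, refDistance = 1, rolloffFactor = 1) {
  // Distances shorter than refDistance are clamped, so the gain never exceeds 1.
  const d = Math.max(distance, refDistance);
  return refDistance / (refDistance + rolloffFactor * (d - refDistance));
}

// Browser usage, inserting the panner between a source node and the speakers:
//   const panner = createSpatialPanner(audioCtx, [0, 0, -1]);
//   source.connect(panner);
//   panner.connect(audioCtx.destination);
```

With the default settings, a source two units away from the listener is attenuated to half its original gain, and a larger rolloffFactor makes the volume fall off faster.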
3. Implement the audio panning effect
The following code snippet shows how you can realize an audio panning effect that makes listeners feel like the audio is moving around their head. It is done simply by changing the position values of the PannerNode while the music is being played.
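A minimal sketch of this idea, not the demo page's original code, is shown below. It assumes the listener stays at the origin with the default orientation and moves the source along a horizontal circle around it. The names orbitPosition and startOrbit, as well as the radius, period, and update interval, are illustrative assumptions.

```javascript
// Position on a horizontal circle around a listener at the origin.
// At elapsedSeconds = 0 the source is straight ahead (negative Z), and it
// then moves toward the listener's right (positive X).
function orbitPosition(elapsedSeconds, radius, periodSeconds) {
  const angle = (2 * Math.PI * elapsedSeconds) / periodSeconds;
  return {
    x: radius * Math.sin(angle),
    y: 0,
    z: -radius * Math.cos(angle),
  };
}

// Periodically updates the position values of a PannerNode so that the
// sound circles the listener. Returns a function that stops the movement.
function startOrbit(panner, radius, periodSeconds, intervalMs = 50) {
  const startedAt = Date.now();
  const timer = setInterval(() => {
    const elapsed = (Date.now() - startedAt) / 1000;
    const p = orbitPosition(elapsed, radius, periodSeconds);
    panner.positionX.value = p.x;
    panner.positionY.value = p.y;
    panner.positionZ.value = p.z;
  }, intervalMs);
  return () => clearInterval(timer);
}

// Browser usage: one full circle of radius 2 every 8 seconds.
//   const stop = startOrbit(panner, 2, 8);
//   // later, for example when spatial audio is turned off:
//   stop();
```

A timer keeps the sketch short. Scheduling the changes on the position AudioParams, for example with linearRampToValueAtTime(), may give smoother movement.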
Note that in the SDK, this function takes effect only on sound captured by the SDK. Developers can dynamically adjust voice changing, reverberation, echo, and virtual stereo effects during a call or live broadcast.
This article gives a basic introduction to the Web Audio API and describes how to implement a spatial audio effect (the sound moving around the listener’s head) using the Web Audio API.
Besides audio spatialization, the Web Audio API has many other powerful audio-processing features. For more details, you can check out the Web Audio API documentation on MDN.
For more details about the Proximity Voice module of the ZEGOCLOUD Express SDK, see the related developer documentation on the ZEGOCLOUD website.
Want to Connect? Visit the ZEGOCLOUD website to learn more about what you can build with real-time audio and video!