An Overview of Modern Audio Technology in Games

3D Sound vs Surround Sound

In game development, sound has traditionally received less attention than it deserves. Developers spend most of their time adding new features and effects to 3D graphics, and convincing them to spend more time and money on high-quality audio can be difficult. On the hardware side, too, players are far more eager to buy the latest 3D graphics accelerators, while new sound cards get a comparatively cool reception.

However, now that graphics cards have reached a peak of development, players are becoming more demanding. They believe a good game needs more than striking visuals and brilliant special effects. The situation appears to be changing sharply: users and developers are paying more attention to audio than ever before. In modern game projects, sound accounts for 40 percent of the budget, time and manpower.

Sound chip makers and 3D sound developers are working hard to convince users and application developers that good 3D sound will be a major part of the modern multimedia computer.

Sound used to mean simply stereo, which is a rather vague notion. With the arrival of 3D sound, we entered a new era of multichannel audio: 4.1, 5.1 and 7.1 channels.

Now let's take a closer look at 3D sound and see how it differs from multichannel solutions.


Figure 1: The concept of 3D sound

The idea of 3D sound is to place each sound source precisely in the 3D space around the listener. In a virtual game world, every object that can make a sound is a source.

Let's use Vivisector: Beast Inside, a typical first-person shooter from Action Forms, to illustrate the issues discussed in this article. The image above shows the listener and the sources. Some of the sources are stereo, such as the background music. In this particular game the rustle of wind and jungle is the main ambient sound. The monsters account for 8 sources. The player's shots, footsteps and so on are further sources. There are also 3 ambient sources (insects, birds, etc.).

To make the scene sound more realistic, the 3D sound of the virtual world undergoes extensive processing that simulates or exaggerates real-world acoustics. This uses techniques such as reverb, reflections, occlusion, obstruction and distance modeling (attenuation with the distance between source and listener).

3D audio technology: positioning

Everyone perceives sound differently, depending on the shape of their ears, their age and their state of mind. So no single sound card or 3D processing technique can be declared the best for everyone. How faithfully sound is reproduced in real time depends mainly on the sound card and speakers, as well as the sound engine used in the game.

Figure 2: 3D space

Now let's look at how 3D sound is produced, starting with 2D panning, the technique used in id Software's Doom. Each mono source is played in stereo, and the relative volume of its left and right channels is adjusted. This gives no vertical positioning. The sound can still be altered, however (e.g. with high-frequency filtering), so that the listener hears it muffled when it comes from behind.
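Here is a minimal sketch of 2D panning with rear muffling. It assumes a constant-power (sin/cos) pan law and a one-pole low-pass filter with a hypothetical 4 kHz cutoff for sources behind the listener. Doom's actual code is not reproduced here.

```python
import math


def pan_gains(azimuth_deg: float) -> tuple[float, float]:
    """Constant-power stereo pan gains (left, right).

    Azimuth convention (an assumption): 0 = straight ahead, +90 = right,
    -90 = left. Two speakers cannot place sound behind the listener, so
    sin() folds rear azimuths onto the front arc.
    """
    lateral = math.sin(math.radians(azimuth_deg))  # -1 (left) .. +1 (right)
    theta = (lateral + 1.0) * math.pi / 4.0        # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """One-pole low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def render_2d(mono, azimuth_deg, sample_rate=44100, rear_cutoff_hz=4000.0):
    """Pan a mono buffer to stereo; muffle it if the source is behind the listener."""
    if math.cos(math.radians(azimuth_deg)) < 0.0:  # rear hemisphere
        mono = one_pole_lowpass(mono, rear_cutoff_hz, sample_rate)
    gain_l, gain_r = pan_gains(azimuth_deg)
    return [s * gain_l for s in mono], [s * gain_r for s in mono]
```

The constant-power law keeps the total loudness steady as a source sweeps from left to right. A naive linear crossfade would dip in the middle.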

Today's hardware can do more: sound cards can use HRTF (Head-Related Transfer Function) processing to simulate the position of a source over two speakers or headphones.

HRTF (Head-Related Transfer Function): a function that describes how sound from a source at a given point in space is transformed on its way to each of the two ears. As sound travels, the head and torso act as obstacles that alter it. Each ear, partly shadowed from the source, picks up these changes, and the brain decodes them to determine the source's position in space.

Figure 3: HRTF (Head-Related Transfer Function)

The figure above shows HRTFs, from the left ear to the right ear, for sources at 135 and 36 degrees. HRTF data is obtained in much the same way everywhere: it is usually recorded with special measurement procedures and equipment. Sensaura uses synthetic HRTFs built according to smoothing rules (for example, a peak at 2500 Hz and a dip at 5000 Hz). Other companies typically use averaged HRTFs.

Each HRTF above is implemented as a pair of FIR filters (one per ear), and the HRTF is their transfer function. Because HRTFs change smoothly with direction, storing a huge HRTF set would be wasteful. Positions between the measured directions can be reached by interpolating between HRTFs.
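The sketch below shows the FIR-pair idea and interpolation. An HRTF applied in the time domain is an impulse response (HRIR), and convolving a mono source with the left and right HRIRs gives the two ear signals. The 3-tap HRIRs and the 30/60 degree "measurements" are toy numbers, not real data. Plain linear blending of impulse responses is also a simplification: real systems often interpolate more carefully to avoid comb-filtering between responses with different delays.

```python
def fir(signal, taps):
    """Full convolution of a signal with FIR taps (length len(signal) + len(taps) - 1)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def interpolate_hrir(hrir_a, hrir_b, t):
    """Linearly blend two measured impulse responses: t=0 gives a, t=1 gives b."""
    if len(hrir_a) != len(hrir_b):
        raise ValueError("HRIRs must have the same length")
    return [(1.0 - t) * a + t * b for a, b in zip(hrir_a, hrir_b)]


def binauralize(mono, hrir_left, hrir_right):
    """Filter one mono source through the left-ear and right-ear FIR filters."""
    return fir(mono, hrir_left), fir(mono, hrir_right)


# Hypothetical 3-tap HRIRs "measured" at 30 and 60 degrees (toy numbers).
left_30, right_30 = [0.9, 0.2, 0.0], [0.0, 0.5, 0.3]
left_60, right_60 = [0.7, 0.1, 0.0], [0.0, 0.3, 0.4]
# A source at 45 degrees: halfway between the two measurements.
left_45 = interpolate_hrir(left_30, left_60, 0.5)
right_45 = interpolate_hrir(right_30, right_60, 0.5)
ear_l, ear_r = binauralize([1.0, 0.0, -1.0], left_45, right_45)
```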

Drawbacks of HRTF

  1. Sound can be noticeably distorted.

  2. Processing is slow (computationally demanding).

  3. Stationary sources cannot be pinpointed precisely. The brain needs movement, either of the source itself or of the listener's head, to determine a source's exact position in space.

People often turn their heads toward a sound instinctively. As the head turns, the brain works out the sound's exact position from the change between the HRTFs before and after the movement. If the source lacks the frequencies that carry these cues, the brain cannot use them.

  4. Headphones give the best results, because they solve the problem of sound meant for one ear reaching the other. However, most people don't like wearing headphones, even wireless ones.

In addition, sound heard over headphones tends to seem too close to the listener. This problem has yet to be solved.

Figure 4: Best listening position (sweet spot) and crosstalk

Speakers avoid these headphone problems but bring new challenges. First, it is not obvious how to deliver binaural sound over speakers. After HRTF processing, each ear should receive only its own signal, yet with speakers both ears hear both channels. The solution is crosstalk cancellation (CC).
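To illustrate the principle of crosstalk cancellation, here is a sketch built on a deliberately simplified acoustic model. Each ear is assumed to hear its own speaker at gain 1 and the opposite speaker at a frequency-independent gain `a`, with no delay. The canceller inverts that 2x2 mixing matrix so that each ear ends up with only its own binaural signal. Real CC filters are frequency-dependent and include inter-ear delays, which is exactly why they work only near the sweet spot.

```python
def crosstalk_cancel(bin_left, bin_right, a):
    """Pre-compensate binaural signals for speaker playback.

    Toy acoustic model (assumption): each ear hears its own speaker at gain 1
    and the opposite speaker at gain a (0 <= a < 1), with no delay or filtering.
    Inverting [[1, a], [a, 1]] gives the speaker feeds.
    """
    if not 0.0 <= a < 1.0:
        raise ValueError("crosstalk gain must be in [0, 1)")
    k = 1.0 / (1.0 - a * a)
    spk_left = [k * (l - a * r) for l, r in zip(bin_left, bin_right)]
    spk_right = [k * (r - a * l) for l, r in zip(bin_left, bin_right)]
    return spk_left, spk_right


def at_ears(spk_left, spk_right, a):
    """What the ears receive from two speakers under the same toy model."""
    ear_left = [l + a * r for l, r in zip(spk_left, spk_right)]
    ear_right = [a * l + r for l, r in zip(spk_left, spk_right)]
    return ear_left, ear_right
```

Feeding the cancelled signals back through `at_ears` recovers the original binaural pair. If the listener moves, the real crosstalk no longer matches the assumed `a`, and the cancellation breaks down.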

In the best listening position, the sweet spot, listeners hear all 3D audio effects as intended, while elsewhere the sound is distorted. Choosing where to listen therefore matters. For a pair of speakers, the sweet spot is where balance, imaging, detail and sense of space are best, and it has always been important in recording and production. It is usually located midway between a pair of stereo speakers, a few feet in front of them. Many experts say that the two tweeters and the tip of the listener's nose should form an imaginary equilateral triangle, with the sweet spot at its apex. In practice, many factors can shift this position, such as reflections from a mixing console. Speakers also differ, and some have a larger sweet spot than others. The exact location is usually found by repeated listening and adjustment. The wider the sweet spot, the better, which is why developers are working to extend it.
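The equilateral-triangle rule of thumb reduces to a little geometry. The apex sits at a distance of spacing x sqrt(3)/2 from the line between the speakers, and each speaker then appears 30 degrees off-center. The 2 m spacing below is just an example value.

```python
import math


def sweet_spot_distance(speaker_spacing_m: float) -> float:
    """Distance from the speaker baseline to the apex of an equilateral triangle."""
    return speaker_spacing_m * math.sqrt(3.0) / 2.0


def speaker_angle_deg(speaker_spacing_m: float, listener_distance_m: float) -> float:
    """Angle between straight ahead and each speaker, seen from a centered listener."""
    return math.degrees(math.atan2(speaker_spacing_m / 2.0, listener_distance_m))


d = sweet_spot_distance(2.0)       # speakers 2 m apart: about 1.73 m back
angle = speaker_angle_deg(2.0, d)  # about 30 degrees to each side
```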


Figure 5: Multi-speaker configuration

In a multi-speaker system (4.1, 5.1), sound is distributed across speakers placed around the listener.

In principle, panning is enough. All speakers play simultaneously (as many streams as there are speakers), each at its own volume level, and this creates the effect. For example, Dolby Digital uses six and eight audio streams in 5.1 and 7.1 configurations, respectively.
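Multi-speaker panning can be sketched as pairwise constant-power panning around a ring of speakers, the idea behind 2D vector-based amplitude panning. The function finds the two adjacent speakers that bracket the source direction and splits the signal between them, leaving every other speaker silent. The quad layout is a hypothetical example. This is a generic panner, not Dolby's method, since Dolby Digital is a codec that carries the already-panned channels.

```python
import math


def ring_pan(source_az_deg, speaker_az_deg):
    """Pairwise constant-power panning over speakers on a horizontal ring.

    Finds the two adjacent speakers that bracket the source azimuth and
    splits the signal between them with a sin/cos pan law; every other
    speaker gets gain 0. Azimuths are in degrees, in any consistent convention.
    """
    n = len(speaker_az_deg)
    if n < 2:
        raise ValueError("need at least two speakers")
    order = sorted(range(n), key=lambda i: speaker_az_deg[i] % 360.0)
    src = source_az_deg % 360.0
    gains = [0.0] * n
    for j in range(n):
        a, b = order[j], order[(j + 1) % n]
        start = speaker_az_deg[a] % 360.0
        span = (speaker_az_deg[b] - speaker_az_deg[a]) % 360.0
        offset = (src - start) % 360.0
        if span > 0.0 and offset <= span:
            t = offset / span  # 0 at speaker a, 1 at speaker b
            gains[a] = math.cos(t * math.pi / 2.0)
            gains[b] = math.sin(t * math.pi / 2.0)
            return gains
    return gains


# Hypothetical quad layout: front-left, front-right, rear-right, rear-left.
QUAD = [315.0, 45.0, 135.0, 225.0]
```

For example, `ring_pan(0.0, QUAD)` splits a source straight ahead equally between the two front speakers.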

Sensaura MultiDrive, an innovative multi-speaker surround technology, reproduces sound using HRTFs on four or more speakers.

Sensaura MultiDrive needs at least four speakers to position 3D sound, and each speaker outputs different content. Creative Multi-Speaker Surround (CMSS) technology turns any mono or stereo source into 360-degree sound.

Each speaker pair covers one hemisphere, front or rear. Because the sound field is based on HRTFs, the sweet spot lets sources on either side of the listener, and along the front-rear axis, be perceived optimally. The wider the coverage angle, the larger the sweet spot.

Without crosstalk cancellation (CC), source positioning is impossible. Since MultiDrive applies HRTFs over four or more speakers, CC algorithms must run for all four speakers, which demands a great deal of processing power from the audio chip.

With HRTF, sources at the rear speakers can be positioned as precisely as at the front. The front speakers are usually placed near the monitor, and the subwoofer can sit on the floor in the center. The rear speakers can go more or less anywhere behind the listener.

Keep in mind that HRTF and CC are very demanding on four-speaker systems, so manufacturers have found many ways around this. Aureal, for example, took an inventive shortcut: it used panning on the rear speakers, because positioning at the rear need not be as precise.

NVIDIA uses Dolby Digital 5.1 for 3D audio. After positioning, the entire audio stream is encoded in real time to AC-3 and delivered digitally to an external decoder (for example, a home theater receiver).

Min/Max Distance, Air Effects, MacroFX

Figure 6: Distance mode

One of the main features of a sound engine is distance modeling: the farther away the source, the quieter it sounds. The simplest approach is to lower the volume with distance. The sound designer assigns each sound a minimum distance, beyond which it starts to fade. Within that distance the sound only changes direction, not volume. Beyond it, the sound keeps weakening up to the maximum distance, past which it is too far away to be heard. When the volume drops to near silence, the engine turns the sound off to free up resources. The larger the distances, the longer the sound takes to fade away.

In most cases, all sounds are stored at the same volume level. Designers distinguish loud sounds from quiet ones by giving them different minimum and maximum distances. A mosquito, for example, cannot be heard from 50 cm away, while an aircraft engine can still be heard clearly several kilometers away.
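A minimal distance model could look like the following. It uses the clamped inverse-distance curve that OpenAL calls `AL_INVERSE_DISTANCE_CLAMPED`: gain is 1.0 inside the minimum distance and halves each time the distance doubles (with rolloff 1). Because that curve stops falling at the maximum distance, the engine also needs a culling rule to switch sounds off. The threshold and the mosquito and engine distances below are hypothetical values.

```python
def distance_gain(distance, min_distance, max_distance, rolloff=1.0):
    """Inverse-distance attenuation with the distance clamped to [min, max].

    Same shape as OpenAL's AL_INVERSE_DISTANCE_CLAMPED model: gain is 1.0
    inside min_distance and stops falling beyond max_distance.
    """
    d = min(max(distance, min_distance), max_distance)
    return min_distance / (min_distance + rolloff * (d - min_distance))


def should_cull(distance, max_distance, gain, threshold=0.001):
    """Hypothetical voice-culling rule: stop the sound when it is out of range
    or quieter than the threshold (0.001 is about -60 dB)."""
    return distance > max_distance or gain < threshold


# Hypothetical settings: a mosquito fades out within half a metre,
# while an aircraft engine carries for kilometres.
mosquito = distance_gain(0.4, min_distance=0.05, max_distance=0.5)     # 0.125
engine = distance_gain(2000.0, min_distance=100.0, max_distance=5000.0)  # 0.05
```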

A3D EAX HF Rolloff

The A3D API extends DirectSound 3D's distance model with modeled high-frequency attenuation. As in the real world, the atmosphere absorbs high frequencies according to physical rules: approximately 0.05 dB per meter at the reference frequency (5000 Hz by default). In foggy weather the air is denser, and high frequencies decay faster. EAX3 offers a simpler model of air absorption in which two reference frequencies, one low and one high, are set according to the environment parameters.
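Using the 0.05 dB per meter figure quoted above, the extra high-frequency loss works out as follows. The `density` factor for fog is a hypothetical knob, not an A3D or EAX parameter. The resulting gain would be applied only to the high band (for example with a high-shelf filter), leaving low frequencies untouched.

```python
def air_absorption_db(distance_m, db_per_m=0.05, density=1.0):
    """Extra high-frequency loss (at the reference frequency, e.g. 5 kHz) caused by air.

    density is a hypothetical scale factor: 1.0 for clear air, larger values
    for thicker air such as fog.
    """
    return db_per_m * density * distance_m


def db_to_gain(attenuation_db):
    """Convert an attenuation in dB to a linear amplitude gain."""
    return 10.0 ** (-attenuation_db / 20.0)


clear = db_to_gain(air_absorption_db(100.0))               # 5 dB of HF loss
foggy = db_to_gain(air_absorption_db(100.0, density=3.0))  # 15 dB of HF loss
```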


Most HRTF measurements are made in the far field, which simplifies calculations. But when a source is within 1 meter (the near field), HRTFs no longer work fully. This is where MacroFX comes in: it reproduces sounds from the near field, making them seem very close to the listener, as if they were traveling from the speaker right up to the listener or even into his or her ears. The effect is based on precise modeling of sound propagation throughout the space around the listener, using efficient processing algorithms.

The algorithm is integrated into the Sensaura engine and runs under DirectSound3D. It is therefore transparent to application developers, who can use it to create a number of new effects.

For example, in a flight simulator, the listener, as the pilot, can hear the air traffic controller as if through headphones.

Doppler, Volumetric Sound Sources (ZoomFX), Multiple Listeners

Doppler effect: the change in the observed frequency of a wave caused by relative motion between the source and the observer, which changes the propagation distance over time. Racing and flight games benefit greatly from the Doppler effect. In shooters it can be used for loud projectiles, laser or plasma shots, i.e. any very fast-moving object.
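The classic Doppler formula is f' = f (c + v_listener) / (c - v_source). Here each velocity is the component along the line between source and listener, taken as positive when that party moves toward the other. A sketch, assuming 343 m/s for the speed of sound. Engines often also expose a scale factor to exaggerate or tame the effect for gameplay.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 degrees C


def doppler_frequency(f_source, source_pos, source_vel, listener_pos, listener_vel,
                      c=SPEED_OF_SOUND):
    """Observed frequency for a moving source and listener (classic Doppler formula).

    Only the velocity components along the source-listener line matter.
    """
    to_source = [s - l for s, l in zip(source_pos, listener_pos)]
    dist = math.sqrt(sum(v * v for v in to_source))
    if dist == 0.0:
        return f_source
    u = [v / dist for v in to_source]  # unit vector, listener -> source
    v_listener = sum(a * b for a, b in zip(listener_vel, u))  # > 0: listener approaches
    v_source = -sum(a * b for a, b in zip(source_vel, u))     # > 0: source approaches
    if v_source >= c:
        raise ValueError("source moving at or above the speed of sound")
    return f_source * (c + v_listener) / (c - v_source)
```

A 1000 Hz source approaching a stationary listener at 10% of the speed of sound is heard at about 1111 Hz. The same source receding is heard at about 909 Hz.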

Volumetric sources

The volumetric source effect lets designers create large sound sources. A person running or a small firearm is a very small source. A cheering crowd, a huge generator or a busy highway, by contrast, emits sound over a wide area. Large, composite sources give more realistic results than point sources.

Point sources work well for large but distant objects, such as a moving train. In real life, once the train comes close, it can no longer be treated as a point. The DS3D model, however, still treats it as a point source, and the result is less realistic: it sounds like a small toy train approaching rather than a huge one.

Aureal was the first to support volumetric sources, in its A3D API 3.0. Sensaura followed with ZoomFX. ZoomFX defines a large object as a set of several sources (for example, a train's sound can be composed of its wheels, engine, couplings and so on).
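The sketch below illustrates why a composite source helps. A large object is split into several point sub-sources, each of which the engine would render separately. As the object approaches, those sub-sources spread across a wide arc around the listener instead of collapsing to one direction. The train decomposition and the 2D coordinate convention are hypothetical, and this is not Sensaura's ZoomFX algorithm.

```python
import math


def subsource_view(listener, points):
    """(azimuth_deg, distance) of each point sub-source as seen by the listener.

    2D, x to the right and y straight ahead (an assumed convention).
    """
    view = []
    for x, y in points:
        dx, dy = x - listener[0], y - listener[1]
        view.append((math.degrees(math.atan2(dx, dy)), math.hypot(dx, dy)))
    return view


def angular_spread_deg(listener, points):
    """Arc covered by the sub-sources (ignores wrap-around behind the listener)."""
    azimuths = [az for az, _ in subsource_view(listener, points)]
    return max(azimuths) - min(azimuths)


def train(y):
    """Hypothetical 40 m train decomposed into engine, couplings and wheels."""
    return [(-20.0, y), (-10.0, y), (0.0, y), (10.0, y), (20.0, y)]


far = angular_spread_deg((0.0, 0.0), train(1000.0))  # a couple of degrees
near = angular_spread_deg((0.0, 0.0), train(5.0))    # well over 90 degrees
```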


Figure 8: Multiple listeners

Multiple Listeners is a new technology for game consoles (PlayStation 2, Xbox, GameCube) that supports two or more players on one screen. For example, the PS2 game "Gran Turismo 3" (Polyphony Digital Inc.) lets several players share one TV. Each player is in a different part of the game world, so each should hear only the sounds around them. The players will of course hear each other's sounds anyway, but the technology simplifies implementation. Unfortunately, no hardware API currently supports multiple listeners. The technology is found only in one commercial sound API, FMOD. We will explain the details later.
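One simple policy a multi-listener engine could use is to attenuate each source by its distance to the closest listener, so a sound near any player's car stays audible. This is an assumption for illustration, not a description of FMOD's internals. The clamped 1/d rolloff matches the distance model discussed earlier.

```python
import math


def nearest_listener_gain(source, listeners, min_distance, max_distance):
    """Hypothetical multi-listener policy: attenuate by the closest listener.

    Returns (index of that listener, gain) using a clamped 1/d rolloff.
    """
    distances = [math.dist(source, pos) for pos in listeners]
    i = min(range(len(distances)), key=distances.__getitem__)
    d = min(max(distances[i], min_distance), max_distance)
    return i, min_distance / d
```

For example, with players at x = 0 and x = 8 and a source at x = 10, the second player is closest. The source is therefore heard at gain 0.5 (2 m away, with a 1 m minimum distance).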

3D Sound Technology: Wavetracing vs Reverb

Figure 9: Multiple sound technologies

In 1997-1998, every chipmaker stepped up development of the audio technologies it considered most promising. Aureal, then the industry leader, bet on maximum realism with its Wavetracing technology. Creative decided that precomputed reverb was the better approach and developed EAX. In 1997 Creative acquired Ensoniq/EMU, specialists in the development and manufacture of sound chips, which is why it already had reverb technology. Sensaura entered the market building on EAX, with its EnvironmentFX technology alongside MultiDrive, ZoomFX and MacroFX. NVIDIA was the last manufacturer to enter the field. It offers the only real-time Dolby Digital 5.1 encoding for 3D sound positioning.


Figure 10: Sound paths (wavetracing)

To integrate sound fully into the game, the acoustic environment and its interaction with each source must be computed. As sound propagates, the waves interact with the environment. Sound can reach the listener's ears by several paths:

  • Direct path

  • First-order reflections

  • Second-order or late reflections

  • Occlusion

Aureal's wavetracing algorithm analyzes the geometry of the 3D scene and determines in real time how sound waves propagate: how they are reflected by, or pass through, passive objects in the 3D environment.

The geometry engine is a unique part of the A3D interface: it models how sound is reflected by obstacles and transmitted through them. It works directly on geometry: lines, triangles and quads (acoustic geometry).

Acoustic polygons have their own position, size, shape and material properties. A polygon's shape, relative to the source, determines whether the listener hears a given sound reflected by it, passing through it, or going around it. The material properties determine how much of the sound is absorbed and how much is reflected.
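A standard way to compute a first-order reflection from a wall polygon is the image-source method. The source is mirrored across the wall's plane, and the path length from that image to the listener gives the reflection's delay and spreading loss. This is a generic geometric-acoustics technique, not Aureal's actual wavetracing code. A full engine would also check that the reflection point lies inside the polygon and that both points are on the same side of it. The material reflection factors below are hypothetical.

```python
import math


def mirror_point(p, plane_point, plane_normal):
    """Reflect point p across a plane given by a point on it and a unit normal."""
    d = sum((pi - qi) * ni for pi, qi, ni in zip(p, plane_point, plane_normal))
    return tuple(pi - 2.0 * d * ni for pi, ni in zip(p, plane_normal))


def first_order_reflection(source, listener, plane_point, plane_normal,
                           reflection_coeff, c=343.0):
    """Image-source estimate of one wall reflection: (path length m, delay s, gain).

    reflection_coeff is a hypothetical per-material amplitude factor; the gain
    also includes 1/r spreading, clamped at 1 m.
    """
    image = mirror_point(source, plane_point, plane_normal)
    length = math.dist(image, listener)
    return length, length / c, reflection_coeff / max(length, 1.0)


# Hypothetical material table (amplitude reflection factors).
MATERIALS = {"concrete": 0.9, "wood": 0.6, "curtain": 0.3}
```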

When a game level loads, a converter can turn the graphics polygons of the level geometry into acoustic polygons. Global reflection and occlusion values can be adjusted through parameters. In advanced mode, the converter can also apply polygon-conversion algorithms, store the acoustic geometry as separate map files, and swap those files in as the game loads.

The end result is a more complete effect: 3D positional sound combined with acoustically modeled rooms and environments, and precise reproduction of the signal at the listener's ears. Even so, Aureal's environment modeling was not ideal, and neither is the latest version of Creative's EAX.

In any case, the hardware resources that wavetracing can devote to computing reflections are very limited, so truly realistic sound is still a long way off. For example, there is not yet enough processing power for late reflections, let alone more elaborate processing. Wavetracing is also inflexible and expensive to implement. That is why EAX-style reverb cannot be dismissed, just as texture-based rendering has not been: 3D graphics do not yet use real-time ray tracing either.


Now let's look at occlusion. In principle, it can be achieved simply by lowering the volume, but a more realistic implementation uses a low-pass filter.

Figure 11: Occlusion

In most cases, one type of occlusion is enough: the source is behind an impenetrable obstacle. The direct path is blocked, and the amount of filtering depends on the geometry (thickness) and material of the wall. Since there is no direct path between source and listener, the source's reverberation is suppressed in the same way.
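The sketch below shows occlusion as low-pass filtering rather than a plain volume cut. The signal is split into low and high bands with a one-pole filter. The high band is attenuated in proportion to wall thickness and material, and the low band loses only a fraction (`lf_ratio`) of that amount. The dB-per-centimeter table, the 0.25 ratio and the 5 kHz split frequency are all hypothetical values.

```python
import math

# Hypothetical high-frequency loss per centimetre of wall, in dB.
MATERIAL_DB_PER_CM = {"wood": 0.6, "brick": 1.5, "metal": 3.0}


def occlusion_gains(thickness_cm, material, lf_ratio=0.25):
    """(high-band gain, low-band gain) for sound passing through a wall.

    Low frequencies lose only lf_ratio of the high-frequency loss, so the wall
    acts like a low-pass filter rather than a plain volume cut.
    """
    hf_db = MATERIAL_DB_PER_CM[material] * thickness_cm
    lf_db = hf_db * lf_ratio
    return 10.0 ** (-hf_db / 20.0), 10.0 ** (-lf_db / 20.0)


def occlude(samples, thickness_cm, material, sample_rate=44100, split_hz=5000.0):
    """Split the signal into two bands with a one-pole low-pass and scale each band."""
    hf_gain, lf_gain = occlusion_gains(thickness_cm, material)
    a = 1.0 - math.exp(-2.0 * math.pi * split_hz / sample_rate)
    out, low = [], 0.0
    for x in samples:
        low += a * (x - low)  # low band
        high = x - low        # remainder is the high band
        out.append(lf_gain * low + hf_gain * high)
    return out
```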

Figure 12: Obstruction

Creative's API developers used a more flexible concept, obstruction: the direct path is blocked, so there is no direct line between source and listener, but both are in the same room. Only the direct sound is filtered, while the room's reflections still reach the listener.


Figure 12: Exclusion

The most widely used type is exclusion. Source and listener are in different rooms but still have a direct line between them (for example, through an open doorway). The direct sound reaches the listener unchanged, while the reflected sound is filtered depending on the thickness, shape and material of the walls.
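The three barrier types differ only in which path they muffle. The table below restates the descriptions above as code: given a barrier's high-frequency gain, it returns the gains for the direct path and for the reflected (reverb) path. The function and its mode names are illustrative, not an EAX API.

```python
def path_gains(mode, barrier_hf_gain):
    """(direct-path HF gain, reflected/reverb HF gain) for each barrier type.

    occlusion:   different rooms, no direct path -> direct and reverb muffled
    obstruction: same room, direct path blocked  -> only the direct sound muffled
    exclusion:   different rooms, direct path through an opening -> only reverb muffled
    """
    filtered = {
        "occlusion": (True, True),
        "obstruction": (True, False),
        "exclusion": (False, True),
    }
    direct, reverb = filtered[mode]
    return (barrier_hf_gain if direct else 1.0,
            barrier_hf_gain if reverb else 1.0)
```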

In short, however realistic the effect, you must trace the geometry (fully or partially) to find out whether there is a direct path to the source. This holds whether you use Aureal A3D, Creative Labs EAX or your own sound engine. Tracing has a large impact on performance, which is why in most cases a simplified geometry is built for sound (to get more realistic effects, especially in shooters, 3D RPGs and similar games). Fortunately, such simplified geometry usually already exists for collision detection, which also avoids tracing against the full detail of the player's room. The same geometry can therefore be reused to add more sound detail.
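A direct-path check against simplified collision geometry can be as small as a segment-versus-box test. The sketch below uses the standard slab method against axis-aligned boxes, on the assumption that the level's collision geometry is available as such boxes.

```python
def segment_hits_box(p0, p1, box_min, box_max):
    """Slab test: does the segment p0 -> p1 pass through an axis-aligned box?"""
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        if abs(d) < 1e-12:
            # Segment parallel to this slab: it must already lie inside it.
            if p0[axis] < box_min[axis] or p0[axis] > box_max[axis]:
                return False
            continue
        t1 = (box_min[axis] - p0[axis]) / d
        t2 = (box_max[axis] - p0[axis]) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_enter, t_exit = max(t_enter, t1), min(t_exit, t2)
        if t_enter > t_exit:
            return False
    return True


def has_direct_path(listener, source, boxes):
    """True if no box in the (simplified) sound geometry blocks the line of sight."""
    return not any(segment_hits_box(listener, source, lo, hi) for lo, hi in boxes)
```

When `has_direct_path` returns False, the engine would switch the source to occlusion or obstruction processing.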

Environment Morphing

Figure 13: Environment morphing

Another Creative Labs solution is EAX3, released in 2001. It includes an algorithm for gradually morphing the parameters of one environment into another. The figure above shows two ways of doing this.

  • The first is positional morphing: the reverb parameters change gradually with the player's position between two very different environments (here, an outdoor space and an indoor space separated by a metal wall). The closer the player gets to the outdoors, the more strongly the outdoor reverb parameters apply, and vice versa.

  • The second is threshold morphing: when the player crosses the border (the BORDER-1 area), the parameters change automatically.

Environment morphing is the most important reverb-related function, although it raises a problem: preset parameters have to be modified. Even without gradual transitions, these functions can create an intermediate environment by setting the morph factor to 0.5 (for example, for a stone corridor open to the outdoors). This yields an average of the two sound fields.
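Both kinds of morphing, and the 0.5 "average environment" trick, can be sketched as interpolation between two parameter sets. The `ReverbParams` fields are a hypothetical subset chosen for illustration; they are not the EAX3 listener structure. Interpolating level parameters in dB, as here, is itself a design choice, since it blends loudness on a logarithmic scale.

```python
from dataclasses import dataclass, fields


@dataclass
class ReverbParams:
    """Hypothetical subset of reverb parameters (not the EAX structure itself)."""
    decay_time_s: float
    reflections_db: float
    reverb_db: float
    diffusion: float


def morph(a: ReverbParams, b: ReverbParams, t: float) -> ReverbParams:
    """Blend two environments: t=0 gives a, t=1 gives b, t=0.5 the average."""
    t = min(max(t, 0.0), 1.0)
    return ReverbParams(**{f.name: (1.0 - t) * getattr(a, f.name) + t * getattr(b, f.name)
                           for f in fields(ReverbParams)})


def positional_factor(player_x, indoor_x, outdoor_x):
    """Positional morphing: 0 at the indoor reference point, 1 at the outdoor one."""
    return (player_x - indoor_x) / (outdoor_x - indoor_x)


def threshold_factor(player_x, border_x):
    """Threshold morphing: switch when the player crosses the border."""
    return 1.0 if player_x >= border_x else 0.0
```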

Before environment morphing existed, games (e.g. Carnivores 2) could not change environments gradually, because only the preset parameters of EAX1 and EAX2 were available. Intermediate environments had to be chosen from the 25 presets. For example, when moving from a cave to a valley, a stone corridor preset was used in between. With environment morphing, much of this complicated handling can be avoided.

Interfaces and APIs


Figure 14: Various popular API technologies

Now let's look at the programming APIs available for an audio engine. There are not many options: Windows Multimedia, DirectSound, OpenAL and Aureal A3D.

Unfortunately, Aureal A3D drivers are still riddled with bugs, and performance is still very poor on today's most popular operating systems, Windows 2000 and XP.

Windows Multimedia is the most basic sound playback system, inherited from Windows 3.1. Its large buffers cause relatively high latency, so it is rarely used in games.
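The link between buffer size and latency is simple arithmetic: sound written now is heard only after everything already queued has played. The buffer sizes below are example values, not WinMM's actual defaults.

```python
def buffer_latency_ms(buffer_frames: int, sample_rate_hz: int, queued_buffers: int = 1) -> float:
    """Worst-case delay added by queued audio buffers, in milliseconds."""
    return 1000.0 * buffer_frames * queued_buffers / sample_rate_hz
```

A single 4410-frame buffer at 44.1 kHz already adds 100 ms, which is clearly noticeable when a gunshot should coincide with the muzzle flash. Two 512-frame buffers add about 23 ms.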

OpenAL is a cross-platform API from Loki Entertainment, similar to OpenGL. Creative promoted it as an alternative to DirectSound. The idea is good, but the reality is harsh, because it works badly. Moreover, Loki Entertainment recently went bankrupt. We hope new sound APIs appear soon, because OpenAL is a nightmare for programmers. However, NVIDIA recently released hardware OpenAL drivers for its nForce chipset, and the results are surprisingly good.

DirectSound and DirectSound3D are by far the best APIs and have no real competitor. That may sound like an exaggeration, but after all, they can reproduce sound effects without any outside help.

The APIs above are hardware APIs: they have hardware drivers. APIs that instead reproduce sound through DirectSound or WinMM are called wrappers, because they build their own application interface on top of a ready-made software/hardware interface.

As a rule, each game uses its own wrapper. There are many such packages (none with hardware support of their own): Miles Sound System, RenderWare Audio, GameCoda, FMOD, Galaxy, BASS, SEAL.

Miles Sound System is one of the best known; 2,700 games use it. It licensed Intel's RSX technology, which is now available as one of its software 3D sound options. It offers a wealth of features, but they do not make up for its drawbacks: it runs only on Win32 and Mac, and its licensing costs are very high.

Galaxy Audio was originally developed for Unreal and is now used in all Unreal Engine-based games.

GameCoda and RenderWare Audio, from Sensaura and RenderWare respectively, are roughly comparable. Both support PC, PS2, GameCube, Xbox and offer many other features, but their licensing costs are also very high.

FMOD, a relatively recent library with a wide range of features and excellent support for the underlying APIs, currently holds the lead.

EAX (Environmental Audio Extensions)

EAX (Environmental Audio Extensions) is an API extension standard that Creative introduced with the Sound Blaster Live! card. It is designed to create sound effects for specific environments, such as concert halls, corridors, rooms and caves. When a game needs such effects, the sound card processes them through DirectX and the drivers. It reproduces how sound behaves in different environments and delivers a surround effect over multiple speakers. EAX started at version 1.0 and is now at version 4.0; it is supported by many games.

EAX Advanced HD (high-quality audio and 3D audio technology)

In 2001, Creative announced the Audigy sound card together with a new version of EAX, EAX Advanced HD. It includes 25 listener parameters that can be fine-tuned and 18 source parameters, two of which are for the new exclusion effect.


Figure 15: EAX Advanced HD mode

  • Optional user settings optimized for headphones, 2-, 4- or 5.1-speaker systems and external A/V amplifier systems

  • Dolby Digital audio decoding to 5.1 speakers, in analog or digital mode

  • Upgradeable 3D audio architecture

  • In-game hardware acceleration of EAX ADVANCED HD

  • Creative Multi-Speaker Surround (CMSS) technology processes any mono or stereo sound source into 360-degree sound

  • Effects: user-selectable DSP modes that simulate acoustic environments

  • Advanced time-scaling technology adjusts playback speed without changing pitch

  • Audio clean-up removes background noise from audio tapes and crackle from CDs

Figure 16:

These effects do not correspond to typical real-world acoustics. They are used to create moods, such as a sense of dizziness or excitement. Two further parameters are available: modulation depth (0...1) and modulation time (0.04...4 seconds).
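The sketch below shows how a depth and a modulation-time parameter can drive such a mood effect: a delay line whose delay swings sinusoidally, producing a wobbling, chorus-like pitch drift. This is a generic modulated delay, not EAX's undocumented implementation. The 2 ms full-depth swing (`max_excursion_s`) is a hypothetical value.

```python
import math


def modulated_delay(samples, sample_rate, base_delay_s, depth, mod_time_s,
                    max_excursion_s=0.002):
    """Delay line whose delay oscillates sinusoidally (vibrato/chorus-style wobble).

    depth (0..1) scales the swing; mod_time_s is the period of one oscillation.
    max_excursion_s is a hypothetical full-depth swing.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    if base_delay_s < depth * max_excursion_s:
        raise ValueError("base delay must cover the modulation swing")
    out = []
    for n in range(len(samples)):
        lfo = math.sin(2.0 * math.pi * (n / sample_rate) / mod_time_s)
        delay = (base_delay_s + depth * max_excursion_s * lfo) * sample_rate
        pos = n - delay  # fractional read position
        i = math.floor(pos)
        frac = pos - i
        a = samples[i] if 0 <= i < len(samples) else 0.0
        b = samples[i + 1] if 0 <= i + 1 < len(samples) else 0.0
        out.append((1.0 - frac) * a + frac * b)  # linear interpolation
    return out
```

With depth 0 the block is a plain delay. Raising the depth, or shortening the modulation time, makes the wobble stronger or faster.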

EAX4 (EAX Advanced HD Version 4)

Creative announced EAX Advanced HD Version 4 in March 2003, with availability expected in late April or early May. Unfortunately, Creative has not published its technical details. The differences between EAX3 and EAX4 are purely conceptual.

EAX Advanced HD Version 4 has the following new elements:

  • Studio Quality Effects

  • Multiple effect slots

  • Multiple Environments and Zoned Effects

Studio Quality Effects

EAX4 offers 11 studio-quality effects, which can be applied to both 2D and 3D sources:

  • AGC Compressor - automatically adjusts the volume level of the source

  • Auto-Wah - an automatic version of the wah pedal

  • Chorus - makes a single instrument sound like several

  • Distortion - simulates an overdriven, clipping amplifier

  • Echo - adds a sense of space to moving and extended sources

  • Equalizer - 4-band equalizer

  • Flanger - produces a sweeping, whooshing effect

  • Frequency Shifter - shifts the frequencies of the input signal

  • Vocal Morpher - transforms the character of a voice

  • Pitch Shifter

  • Ring Modulator

  • Environment Reverb - the core component of EAX
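Two of the listed effects are simple enough to show directly. A ring modulator multiplies the input by a sine carrier, and an echo feeds a delayed copy of the signal back on itself. These are textbook versions, not Creative's implementations. The parameter names are illustrative.

```python
import math


def ring_modulate(samples, sample_rate, carrier_hz):
    """Ring modulator: multiply the input by a sine carrier."""
    return [x * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
            for n, x in enumerate(samples)]


def echo(samples, delay_samples, feedback, mix=0.5):
    """Feedback echo: each repeat is `feedback` times quieter than the last."""
    if delay_samples < 1 or not 0.0 <= feedback < 1.0:
        raise ValueError("need delay >= 1 sample and 0 <= feedback < 1")
    wet = [0.0] * len(samples)
    for n in range(delay_samples, len(samples)):
        wet[n] = samples[n - delay_samples] + feedback * wet[n - delay_samples]
    return [x + mix * w for x, w in zip(samples, wet)]
```

For example, `echo([1.0, 0.0, 0.0, 0.0, 0.0], 2, 0.5, 1.0)` returns the impulse followed by repeats at samples 2 and 4, the second at half level.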

Multiple Effect Slots

Several effects can be combined. For example, you can hear several environments at the same time, or add distortion on top of the environment effect.
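A common way to picture multiple effect slots is parallel sends. One source feeds several slots, each at its own send level, and the outputs are summed with the dry signal. The slot functions below are hypothetical placeholders, not real EAX effects; any callable that maps a list of samples to a list of the same length would do.

```python
def mix_effect_slots(dry, sends, effects):
    """Send one source to several effect slots in parallel and sum the results.

    sends[k] is the send level into slot k; effects[k] is any callable that
    maps a list of samples to a list of the same length.
    """
    if len(sends) != len(effects):
        raise ValueError("one send level per effect slot")
    out = list(dry)
    for level, effect in zip(sends, effects):
        wet = effect([level * x for x in dry])
        out = [o + w for o, w in zip(out, wet)]
    return out


def reverb_slot(s):
    """Hypothetical stand-in for a reverb slot (just a gain, not a real reverb)."""
    return [0.5 * x for x in s]


def distortion_slot(s):
    """Hard clipping as a crude distortion slot."""
    return [max(-0.3, min(0.3, x)) for x in s]
```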

Figure 17: Features specific to EAX Advanced HD v3

In EAX4, each source and the listener can have their own environment. Sound from a source propagates through both its own environment and the listener's, which lets us model how sound interacts across environments on its way to the listener.


Zoned Effects

The concept of a zone is very similar to that of a room or environment.


Zoned effects are an ideal technique, but implementing them is much harder than the theory suggests. The main challenge is determining where each source is located, finding the nearest zone for each source, and tracking the diffusion, occlusion and obstruction parameters for each one. Of course, we don't have to use every effect that EAX4 provides.
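The first step of that bookkeeping, assigning each source to a zone, can be sketched with axis-aligned zones. The function returns the zone that contains a point, or the nearest zone if the point lies in none of them. The `Zone` class and the two example zones are hypothetical; real levels would use whatever room volumes the engine already has.

```python
import math
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical axis-aligned acoustic zone (a room or an outdoor area)."""
    name: str
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]

    def distance_to(self, p):
        """0 inside the box, otherwise the distance to its nearest face."""
        return math.sqrt(sum(max(l - c, 0.0, c - h) ** 2
                             for c, l, h in zip(p, self.lo, self.hi)))


def zone_for(point, zones):
    """The zone containing the point, or the nearest one if it is in none."""
    return min(zones, key=lambda z: z.distance_to(point))


ZONES = [Zone("hall", (0.0, 0.0, 0.0), (10.0, 10.0, 3.0)),
         Zone("cave", (10.0, 0.0, 0.0), (20.0, 10.0, 3.0))]
```

Once each source has a zone, the engine can look up that zone's environment and decide which occlusion or obstruction parameters apply between it and the listener's zone.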