
Masking

by Amélie Bernier-Robert and Ben Duinker
Timbre Lingo | Timbre and Orchestration Writings

Published: November 13, 2023

Suppose you enter a restaurant with a friend, mid-conversation. As you enter, you are greeted by the background noise of other patrons’ conversations. You and your friend begin to speak louder so you can hear one another. You’ve just experienced masking, a familiar yet fascinating phenomenon that many of us encounter every day without even noticing. Masking is “the process by which the threshold of hearing of one sound is raised by the presence of another” [1]. In this restaurant experience, your friend’s voice functions (to you) as the masked sound, known in lab-based research as the test stimulus, and the background noise of the restaurant patrons functions as the masker. This background noise raises your masked detection threshold, the minimum audible intensity of the masked sound (your friend’s voice) in the presence of a masker (the background din of the restaurant), making your friend’s voice harder to detect [1].

Masking can be partial, when the masked sound is still audible (but softer), or total, when the masked sound can’t be heard at all [2]. Additionally, there are several contexts in which masking can occur, including simultaneous masking (with two concurrent sounds), forward masking (when the stimulus occurs after the masker), backward masking (yes, when the stimulus precedes the masker!), central masking (when the masker is presented in one ear and the stimulus in the other) [2], energetic masking (peripheral masking, due to interference) and informational masking (higher-level masking which is not energetic) [1], [3].

Simultaneous masking, as in the restaurant experience above, is by far the best-understood type. It is a direct consequence of competition on the auditory nerve [1], occurring when the masker activates the receptors of the inner ear that would otherwise have been activated by the masked sound, meaning the masked sound does not fully reach higher levels of the auditory system. The activation of these receptors strongly depends on critical bands, a very important concept in psychoacoustics. But before discussing this, it is necessary to take a brief look at the ear’s anatomy.

 

Figure 1: Sound propagates into the ear canal (outer ear) before transmitting the vibration to the eardrum and ossicles of the middle ear (malleus, incus and stapes). The oval window (touching the other tip of the stapes) then receives and sends the oscillations to the cochlea (inner ear) [4].

 

The receptors of our auditory system sit inside the cochlea, a spiraled organ filled with liquid that oscillates with the incoming sound waves. This oscillation then spreads to the basilar membrane, which runs the entire length of the cochlea through its center, and upon which sit the inner hair cells that transmit the auditory information to the cortex. The resonance location on the basilar membrane depends on the incoming frequency: higher frequencies transmit vibration to the base of the membrane (bottom of the spiral, at the oval window where the vibrations enter), while lower frequencies bring resonance to the apex (tip of the spiral, near the helicotrema) [5].

Critical bands act as a series of band-pass filters characterizing the behavior of our basilar membrane. A band-pass filter lets through only a part of a sound’s spectrum. It has a bell-curve shape and is quantified by its center frequency (frequency value at the maximum height of the curve) and bandwidth (frequency distance between the two borders of the filter, where the energy of the curve is half the maximum value). Critical bands thus split the spectrum of an incoming sound into many frequency bands. The width of a critical band increases as the center frequency increases, but always covers the same distance on the basilar membrane [6]. This means that identical-sized sections of basilar membrane cover a wider range of frequencies in the higher register than they do in the lower register.
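To put rough numbers on this widening, here is a minimal sketch that uses the equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore (1990), ERB = 24.7 (4.37 f / 1000 + 1) with f in Hz, as a stand-in for the critical bandwidth. This formula is an assumption brought in for illustration: it does not come from the references cited in this article, and it is only one of several published estimates. The function name is ours.

```python
def erb_hz(center_hz: float) -> float:
    """Approximate auditory-filter bandwidth (Hz) at a center frequency (Hz).

    Assumption: Glasberg and Moore (1990) ERB approximation, used here
    as a proxy for the critical bandwidth discussed in the text.
    """
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


# Bandwidth grows steadily with center frequency.
for f in (100, 500, 1000, 4000, 8000):
    print(f"{f:>5} Hz -> about {erb_hz(f):.0f} Hz wide")
```

Under this approximation the filter centered on 1000 Hz is roughly 133 Hz wide, while the filter centered on 4000 Hz is several times wider, in line with the trend described above.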

 

Figure 2: Unrolled basilar membrane, with the frequency values (in hertz) aligned with the corresponding distances from the oval window. Notice how the range of frequencies covered by a small segment of basilar membrane near the oval window (e.g., from 6000 to 8000 Hz) is much bigger than the range covered by an identical segment near the helicotrema (e.g., from 150 to 200 Hz) [7].
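The frequency-to-place relation in Figure 2 can be approximated numerically. The sketch below assumes the Greenwood (1990) place-frequency function for the human cochlea, f = 165.4 (10^(2.1 x) - 0.88), where x is the fractional distance along the basilar membrane measured from the apex (x = 0 at the helicotrema, x = 1 at the oval window). Note that this is the opposite direction from the figure, which measures from the oval window. The constants are not taken from this article, and the function names are ours.

```python
def place_to_hz(x_from_apex: float) -> float:
    """Characteristic frequency (Hz) at fractional position x along the membrane.

    Assumption: Greenwood (1990) human constants; x = 0 is the apex
    (helicotrema) and x = 1 is the base (oval window).
    """
    return 165.4 * (10.0 ** (2.1 * x_from_apex) - 0.88)


def segment_span_hz(start: float, width: float) -> float:
    """Frequency range (Hz) covered by a segment of the given fractional width."""
    return place_to_hz(start + width) - place_to_hz(start)


# Two segments of identical length (5% of the membrane each).
apex_span = segment_span_hz(0.00, 0.05)  # next to the helicotrema
base_span = segment_span_hz(0.95, 0.05)  # next to the oval window
print(f"apex segment covers about {apex_span:.0f} Hz")
print(f"base segment covers about {base_span:.0f} Hz")
```

The segment at the base covers a far wider frequency range than the equally long segment at the apex, which is the point the figure makes.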

 

Why do critical bands matter for masking? Simply put, critical bands determine when masking will occur, and when it won’t occur. For example, one sound (“sound A”) can mask another (“sound B”) if the distance between the two sounds’ frequencies is less than a critical band’s bandwidth. By contrast, sound A will not mask sound B if the distance between the two sounds’ frequencies is greater than a critical band’s bandwidth. If sound A’s bandwidth is quite narrow (i.e., it only covers a small region of the critical band it occupies), the masked detection threshold for sound B will be quite low (i.e., sound B will still be easy to hear). As sound A’s bandwidth grows wider, eventually becoming coterminous with the critical band, the masked detection threshold for sound B similarly grows (i.e., sound B will become very difficult to hear). Consequently, sound B must become louder (i.e., emit more sonic energy) to remain audible. But if sound A’s bandwidth surpasses the critical band’s bandwidth, the masked detection threshold begins to lower again, because sound A’s energy gets spread over an increasingly large frequency band, and thus less of this energy is distributed within the critical band [6], [8].
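The last step of this reasoning, where a masker wider than the critical band loses masking power, can be sketched with simple arithmetic. The example below is a deliberately idealized model and not the method of [6] or [8]: it assumes a flat noise masker with a fixed total power, centered on the critical band, and it treats the critical band as a rectangular filter. It does not model how the threshold changes while the masker is still narrower than the critical band. The function name and the 160 Hz and 70 dB values are hypothetical.

```python
import math


def in_band_masker_db(total_db: float, masker_bw_hz: float,
                      critical_bw_hz: float) -> float:
    """Masker level (dB) that falls inside the critical band.

    Assumptions: flat noise, fixed total power, rectangular critical band
    centered on the masker. Only the fraction of the masker's bandwidth
    that lies inside the band contributes.
    """
    fraction = min(1.0, critical_bw_hz / masker_bw_hz)
    return total_db + 10.0 * math.log10(fraction)


CRITICAL_BW_HZ = 160.0  # hypothetical critical bandwidth
TOTAL_DB = 70.0         # hypothetical fixed total masker level

for bw in (80.0, 160.0, 320.0, 640.0):
    level = in_band_masker_db(TOTAL_DB, bw, CRITICAL_BW_HZ)
    print(f"masker {bw:>5.0f} Hz wide -> {level:.1f} dB inside the band")
```

In this simplified picture, each doubling of the masker's bandwidth beyond the critical band removes about 3 dB of masker energy from the band, so sound B needs less energy to remain audible.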

By measuring the masked detection thresholds of a tone with variable frequency, we can obtain a masked audiogram (or masking pattern), a curve often used to measure the masking effect generated by a fixed masker as a function of stimulus frequency [6]. Masked audiograms reflect the physical oscillation pattern of the basilar membrane provoked by the masker [2], [9]. In the case of a pure tone masker, their shape depends on the level of stimulation. At very low levels, the shape is symmetrical. As the level increases, the slope of the pattern on the low-frequency side remains constant, but the slope on the high-frequency side becomes progressively shallower because the excitation pattern extends further toward the high frequencies (this is often called “upward spread of masking”) [10]. In other words, depending on their energy levels, the critical bands excited by two sine waves will overlap more when the masked tone is above the masker, making the masking effect more intense. You can hear the effect in action in this demonstration.
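The level dependence described here can be illustrated with a triangular excitation pattern on the Bark (critical-band) scale. The sketch below assumes slopes commonly attributed to Terhardt: a fixed 27 dB per Bark on the low-frequency side, and 24 + 0.23 / f - 0.2 L dB per Bark on the high-frequency side, with f the masker frequency in kHz and L its level in dB. These slopes are not taken from [10]; treat them, and the function name, as illustrative assumptions that only make sense at moderate levels.

```python
def excitation_db(offset_bark: float, masker_level_db: float,
                  masker_khz: float) -> float:
    """Excitation (dB) at a distance from the masker, in Bark.

    Negative offsets lie below the masker frequency, positive above.
    Assumption: triangular pattern with Terhardt-style slopes; valid
    only while the upper slope stays positive (moderate levels).
    """
    lower_slope = 27.0  # dB per Bark, independent of level
    upper_slope = 24.0 + 0.23 / masker_khz - 0.2 * masker_level_db
    if offset_bark < 0:
        return masker_level_db + lower_slope * offset_bark
    return masker_level_db - upper_slope * offset_bark


# A 1 kHz masker at two levels, probed 2 Bark below and 2 Bark above.
for level in (20.0, 80.0):
    below = excitation_db(-2.0, level, 1.0)
    above = excitation_db(2.0, level, 1.0)
    print(f"masker at {level:.0f} dB: {below:.1f} dB below, {above:.1f} dB above")
```

At the higher masker level the pattern falls off much more slowly above the masker than below it, which is the upward spread of masking.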

 

Figure 3: This diagram shows the relation between the basilar membrane excitation pattern for a sine wave signal (S) and a narrow-band noise masker (B) at different frequency and level relations between the two. It demonstrates that lower-frequency maskers have more of a masking effect than do higher-frequency maskers. The overlap between the two excitation patterns is shown by the hashed region. Notice that a masking noise at a lower frequency masks the signal more at higher levels than does a masker at a higher frequency than the signal.

 

The width of a masked audiogram increases with the intensity of the masker, notably due to the biomechanics of the basilar membrane and the fact that the traveling wave begins at the high-frequency end and moves toward the low-frequency end. Additionally, the general shape of the curve depends on the energy distribution of the masker (whether it is a tone, a noise band, a noise, etc.). 

In certain cases, masking will happen even if the test stimulus and the masker do not appear simultaneously. This scenario is often referred to as temporal masking, and results from higher-level processes (e.g., cortical areas) [11]. We can experience temporal masking when the test stimulus occurs before or after the masker (as demonstrated here). Forward masking (or post-masking; [12]) is caused by the ringing of the basilar membrane, which keeps oscillating for a short amount of time after the masker disappears [11]. Therefore, competition on the auditory nerve can still happen for up to around 30 ms after the masker has disappeared. In backward masking [12], where the test stimulus occurs before the masker, the more energetic masker travels faster and catches up with the test tone in the higher levels of the auditory system (their energies are integrated and therefore interfere with one another) [11]. This effect can generally be observed for an interstimulus interval of up to about 10 ms.
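The timing relations above can be summarized in a small sketch. It uses the approximate windows given in the text (about 30 ms after the masker for forward masking, about 10 ms before it for backward masking) as hard cutoffs and treats the test stimulus as an instantaneous event. Both are simplifications made for illustration, since the real effect fades gradually and depends on the sounds involved. The function and constant names are ours.

```python
FORWARD_WINDOW_MS = 30.0   # approximate figure from the text
BACKWARD_WINDOW_MS = 10.0  # approximate figure from the text


def temporal_masking_type(test_ms: float, masker_on_ms: float,
                          masker_off_ms: float) -> str:
    """Classify which kind of masking could affect a test stimulus.

    Assumption: the test stimulus is a single instant in time and the
    windows are hard cutoffs; frequency and level are ignored.
    """
    if masker_on_ms <= test_ms <= masker_off_ms:
        return "simultaneous"
    if masker_off_ms < test_ms <= masker_off_ms + FORWARD_WINDOW_MS:
        return "forward"
    if masker_on_ms - BACKWARD_WINDOW_MS <= test_ms < masker_on_ms:
        return "backward"
    return "none"


# A masker lasting from 100 ms to 300 ms, with test tones at various times.
for t in (50.0, 95.0, 200.0, 320.0, 340.0):
    print(f"test at {t:.0f} ms: {temporal_masking_type(t, 100.0, 300.0)}")
```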

Other types of higher-level masking include central masking, when the masker and the test tone are introduced in opposite ears [2], and informational masking, when the sound is not processed even though it was not energetically masked (i.e., no lower-level interference has occurred) [1]. In central masking, the interference occurs in the medial superior olive, the region of the brain where binaural information is combined [2]. Informational masking, for its part, can be caused by many factors, including similarity between the target and the masker, attention factors, and factors related to Auditory Scene Analysis [3]. We can also reduce most masking effects by using mechanisms of masking release, which include filtering, amplitude modulation of the masker, spatial separation, and many others [1]. This example features a demonstration of masking release using comodulation of the masker with a narrow band of noise.

With all this in mind, it’s no surprise that predicting the masking effect of an auditory scene becomes a very complex task. We not only need to know the contribution of each type of masking phenomenon for each component, but also the interactions that could occur between different maskers. For instance, two maskers could mask each other (which would reduce their total masking effect), or they could together be loud enough to induce distortion inside the cochlea (increasing the total masking effect) [3]. Nevertheless, years of research on masking have led to many applications, including the MP3 compression algorithm for audio files, noise reduction methods ([12], [13]) and compositional tools.

References

[1] Culling, J. F. and Stone, M. A. (2017). Energetic masking and masking release. In Middlebrooks, J. C., Simon, J. Z., Popper, A. N. and Fay, R. R. (Eds.), The auditory system at the cocktail party. Springer.

[2] Zwislocki, J. J. (1978). Masking: Experimental and theoretical aspects of simultaneous, forward, backward, and central masking. In Carterette, E. C. and Friedman, M. P. (Eds.), Handbook of perception. Volume IV, Hearing. Academic Press. 283-332.

[3] Durlach, N. (2006). Auditory masking: Need for improved conceptual structure. Journal of the Acoustical Society of America, 120(4), 1787-1790. https://doi.org/10.1121/1.2335426

[4] How do we hear? (2022, March 16). National Institute on Deafness and Other Communication Disorders. https://www.nidcd.nih.gov/health/how-do-we-hear

[5] Cook, P. R. (1999). Music, Cognition, and Computerized Sound. An Introduction to Psychoacoustics. The MIT Press. 

[6] Greenwood, D. D. (1961). Auditory masking and the critical band. Journal of the Acoustical Society of America, 33(4), 484-502. https://doi.org/10.1121/1.1908699

[7] Wegel, R. L. and Lane, C. E. (1924). The auditory masking of one pure tone by another and its probable relation to the dynamics of the inner ear. Physical Review, 23, 266-285. 

[8] Scharf, B. (1961). Complex sounds and critical bands. Psychological Bulletin, 58(3), 205-217.

[9] Fletcher, H. (1940). Auditory patterns. Reviews of Modern Physics, 12, 47-66.

[10] Moore, B. C. J. and Glasberg, B. R. (1983). Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. The Journal of the Acoustical Society of America, 74(3), 750-753.

[11] Källstrand, J., Montnémery, P., Nielzén, S. and Olsson, O. (2002). Auditory masking experiments in schizophrenia. Psychiatry Research, 113, 115-125. 

[12] Wang, Y. S., Feng, T. P., Wang, X. L., Guo, H., and Qi, H. Z. (2018). An improved LMS algorithm of active sound-quality control of vehicle interior noise based on auditory masking effect. Mechanical Systems and Signal Processing, 108, 292-303.

[13] Nilsson, M. E., Alvarsson, J., Rådsten-Ekman, M. and Bolin, K. (2010). Auditory masking of wanted and unwanted sounds in a city park. Noise Control Engineering Journal, 58(5), 524-531.
