Emmanuel Deruty, Sony CSL, 6 rue Amyot, 75005, Paris, France.
derutycsl at gmail.com

2nd AES Workshop on Intelligent Music Production.
QMUL, London, UK, September 2016


In the field of music production, mixing is the process that aims at converting multi-tracks to music. Automatic mixing is a field of research that aims at performing this task without human intervention. The automatic mixing community is currently focused on producing technically correct mixes. Our position is that the mixing process serves many purposes, and that technical correctness is but one of them. We call this approach goal-oriented mixing.


The state of the art in automatic mixing provides knowledge and tools to perform a technically correct mix. This is useful, insofar as a mixing engine should be able to produce a technically correct rough mix, leaving the interesting, creative stuff for the mixer [1]. We must see beyond correctness, and one way to do so is through goal-oriented mixing. A goal is set, and tools can be designed to reach this goal. We provide a non-exhaustive list of such goals, illustrated by real-life examples.

In many cases, the goal can be the expression of an emotion. The example can therefore be linked to BRECVEM/BRECVEMA mechanisms [2-6].


According to [7], knowledge concerning technically correct mixes can be gathered through three channels: best practices, experiments and machine learning.

2.1 Knowledge through best practices

In this process, expert human mixers list what they consider to be best practices. In [8], best practices are presented in the form of claims that are then found to be either true or wrong.

Examples of wrong claims as found by [8]:

01 All signals should be presented with equal loudness.
06 Dynamic Range Compression affects relative loudness choices.
10 The higher the frequency content the more a track can be panned sideways.
12 Hard panning should be avoided.
15 Panning is mostly done audience-perspective.
16 It is customary to apply temporal cues to panning.
19 High-pass filters should be used in all tracks with no significant low-frequency content.
20 There is a specific low-mid region that can be attenuated to improve clarity.
21 Expert mixers tend to cut more than boost.
23 Equalization use should always be minimized.
25 Reverb time is strongly dependent on song tempo.
38 Compression should not be overused and there are maximum values for it.
39 Compressor attack is set up so that only the transient goes through.
40 Compressor release is set up so that it is over when the next note is about to start.

Examples of true claims as found by [8]:

02 The main element should be up by an understandable amount of loudness units.
03 Vocals should be ridden above the backing track.
04 No element should be able to mask any of the frequency content of the vocals.
05 Track panning affects partial loudness.
07 Low-end frequencies should be centrally panned.
08 The main track is always panned centrally.
09 Remaining tracks are panned out of the center.
11 Frequency balance should be kept between left and right.
13 Sources recorded with close (mono) and far (stereo) techniques simultaneously should have the mono source panned to the same perceived position featured in the stereo source.
28 The pre-delay is timed as a multiple of the subdivided song tempo.
29 The level of the reverb returns is on average set to a specific amount of loudness lower than the direct sound.

30 Low-end frequencies are less tolerant of reverb and delay.
14 Monophonic compatibility should be kept.
17 Equalization is frequently done to avoid inter-track masking effects.
18 Salient resonant frequencies should be subdued.
22 High Q-factors should be used when cutting and low Q-factors when boosting.
24 Every song is unique in its spectral/timbral contour.
26 Reverb time is strongly dependent on an autocorrelation measure.
27 Delay times are typically locked to song tempo.

31 Transients are less tolerant of reverb and delay.
32 The sends into the reverbs should be equalized.
33 Reverbs can be carefully substituted by delays to lessen masking effects.
34 Compression takes place whenever a source track varies too much in loudness.
35 Compression takes place whenever headroom is at stake, and the low-end is usually more critical.
36 Gentle bus/mix compression helps blend things better.
37 There is an optimal amount of compression in terms of dB and it depends on sound source features.
41 It is acceptable to judiciously lop off some micro-burst transients to gain peak-to-RMS space.
42 In deciding a track's dynamic profile, an expert engineer will shift the focus of the listener by enhancing different tracks over time, with volume changes that may sometimes be quite big.
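Claims 27 and 28 tie delay and pre-delay times to the song tempo. The arithmetic behind such tempo-locked values can be sketched as follows. This is a minimal illustration, not a procedure taken from [8]: the function name `note_duration_ms` is hypothetical, and the beat is assumed to be a quarter note.

```python
def note_duration_ms(bpm, beats=1.0):
    """Duration in milliseconds of a note lasting `beats` quarter-note beats.

    Assumption: the tempo is expressed in quarter-note beats per minute.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return 60000.0 / bpm * beats


# Hypothetical settings for a song at 120 BPM.
quarter_delay = note_duration_ms(120)               # one beat
eighth_delay = note_duration_ms(120, 0.5)           # half a beat
dotted_eighth_delay = note_duration_ms(120, 0.75)   # three quarters of a beat
short_pre_delay = note_duration_ms(120, 1 / 16)     # a small subdivision
```

Which subdivision suits a given delay or pre-delay remains a mixing decision; the sketch only converts a chosen subdivision into milliseconds.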

2.2. Knowledge through experiments

In such experiments, a group of human mixers works on multi-track material in a controlled setting, and the results are analyzed along several dimensions. In [9], this process produced knowledge of the following kind: “the loudness of the lead vocals is 3 dB higher than the loudness of the overall mix”; “panning increases until 400 Hz, after which it remains stable”; “a target spectrum can be observed”; “crest factor values increase for all tracks”.
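Two of the quantities mentioned above, crest factor and the level of one track relative to the mix, can be illustrated with a short sketch. It makes simplifying assumptions: RMS level is used as a crude stand-in for perceptual loudness (the measure used in [9] may differ), signals are plain lists of samples, and all function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def crest_factor_db(samples):
    """Peak-to-RMS ratio in decibels."""
    peak = max(abs(x) for x in samples)
    return 20.0 * math.log10(peak / rms(samples))


def level_difference_db(track, mix):
    """RMS level of `track` relative to `mix`, in decibels.

    Assumption: RMS level approximates loudness, which is only roughly true.
    """
    return 20.0 * math.log10(rms(track) / rms(mix))


# One period of a sine wave: its crest factor is about 3 dB.
sine = [math.sin(2.0 * math.pi * k / 1000) for k in range(1000)]
sine_crest = crest_factor_db(sine)
```

A positive result from `level_difference_db(vocals, mix)` would mean the vocal track sits above the overall mix in RMS terms.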

2.3. Knowledge through machine learning

In this process, systematic study of large amounts of data typically provides characterization of elements from the dataset.

In [10] and as illustrated in Figure 1, a study involving machine learning was able to highlight relations between loudness and spectrum in mixed multi-tracks.

Figure 1: relations between loudness and spectrum in mixed multi-tracks are an example of knowledge acquired by machine learning.


The BRECVEM model outlines relations between music and emotion. There are seven BRECVEM mechanisms. We find that six of them are relevant to the mix. The six mechanisms we find to be relevant are, quoting [6]:

  • Brain stem reflex is a hard-wired primordial response that humans have to sudden loud noises and dissonant sounds.
  • Rhythmic entrainment is when the listener’s internal body rhythm adjusts to an external source like a drum beat.
  • Emotional contagion is when the listener perceives an emotional expression in the music and mimics the emotions internally [5].
  • Visual imagery may occur when a piece of music conjures up a particularly strong image. This could potentially have negative or positive valence and it has been linked to feelings of pleasure and deep relaxation [4].
  • Episodic memory is when the music triggers a particular memory from the listener’s past life.
  • Musical expectancy is believed to be activated by an unexpected melodic or harmonic sequence. The listener will be expecting a piece of music to be resolved, but suddenly it violates, delays or changes in an unexpected way [5].

“Aesthetic judgment” was added to BRECVEM, and this resulted in BRECVEMA. We set “aesthetic judgment” apart, on the grounds that any mix involving craftsmanship may involve aesthetic judgment. Quoting [6] again:

  • Aesthetic judgment is the mechanism that induces ‘aesthetic emotion’ such as admiration and awe. This mechanism may play a part in music production quality by enhancing musically induced emotions.

[6] provide a particular interpretation of the BRECVEMA mechanisms in terms of mixing. Our interpretation may differ.

Reference for poster Num. Audio / video Ref. of the recording Description Suggested BRECVEM mechanisms


This section lists cases in which either the mix must not be heard as such, or in which most of the mixing process consists in hiding itself. For comparison, we also list a case in which the mix is literally non-existent.

4.1 Simulated transparency: realism, fidelity, clarity

This particular aspect of the mixing process is reminiscent of the assumption according to which all sources should be heard as well as possible, be it by way of equal loudness [11-13] or masking minimization [11, 14-15].

1 Brahms: Symphony no. 4 in E minor, op. 98, 1885
Orchestre philharmonique de Radio France
Recorded 2014

Mixes of classical music often seek to produce a result that’s as realistic as possible [16]. On the video, we can observe microphones placed directly above the orchestra. As a result, the raw tracks will not sound like a typical live orchestra. The goal of the mix in this case is therefore to recreate a realistic image.

2 Dowland: "Love those beams that breed", 1612
Recorded and mixed by the author, 2015

The goal of this mix is to pretend that 5 recorder parts played separately in a damp studio may have been played in a live, real place. Realism had to be recreated. Distances had to be simulated with equalization and reverberations; acoustic background noise had to be synthesized. The result sounds realistic, yet everything was heavily processed.

3 Sibelius Violin Concerto, 1905
Maxim Vengerov, Daniel Barenboim
Recorded 2011

The production process may place elements at different positions in the stereo image, including at different depths [17]. In this example, the solo violin lies in the foreground. The orchestra is split into different planes, roughly corresponding to the instruments' positions in a real orchestra. For instance, the orchestra strings are on the second plane (behind the soloist), and the timpani are all the way at the back.

4 Crystal Castles - "Not in Love" ft. Robert Smith, 2010

Crystal Castles is an underground band [18], and Robert Smith, as the lead singer and main songwriter of The Cure [19], is a star. Robert Smith starring on a Crystal Castles song is an opportunity for the band to be more successful. One of the purposes of the mix is therefore to make Robert Smith’s voice as recognizable as possible. The mix will have to try to render Robert Smith’s voice as realistically as possible.

5 Carcass - "Incarnated Solvent Abuse", 2013

Heavy metal musicians are often proud of their virtuosity [20]. Therefore, a mix of heavy metal music will tend to be as precise as possible, so that the listener can fully appreciate any virtuoso part that arises.

4.2 Truly transparent mix, absence of mix per se
6 Pierre Schaeffer - "Etude de Bruits", 1948

Available technology in 1948 didn’t make multi-track recording possible, and limited processing to its simplest expression [21]. This resulted in music that was not mixed. We observe from this example that an absence of mix doesn’t necessarily result in a realistic or transparent mix.



This section lists cases in which the mixing process highlights or creates feelings that are conveyed to the listener. As a result, this section contains several subjective observations, in which case they’re signaled as the “author’s interpretation”. In several cases, the example is linked to one or several BRECVEM mechanisms.
7 Rammstein, "Reise, Reise", 2004

Author’s interpretation: “huge, powerful”. In many songs from Rammstein, the mix results in a sound that’s huge and powerful. As illustrated in Figure 2, such mixes are consistent with the band’s public image.

Figure 2: Representation of Rammstein’s guitar player Paul Landers. Rammstein’s mixes correspond to the band’s image: they convey a feeling of power.

8 Dagoba, "I sea Red", 2010

Author’s interpretation: “compact, thick”. In this example, the mix produces an impression of thickness. The result is less powerful than in the case of Rammstein, but thicker and more compact.

9 Nine Inch Nails, "A Warm Place", 1993

Author’s interpretation: “warm, comfortable”. The mix is never aggressive, and takes advantage of lush reverbs, tamed attacks, as well as comfortable-sounding basses.

Emotional contagion
10 The Cure, "At Night", 1980

Author’s interpretation: “cold, empty”. A number of elements from the mix point towards cold and emptiness, in particular: 1) drums with precise attacks, without natural acoustics, 2) a large reverb with no low frequencies on the keyboard, 3) a dry, precise guitar, 4) a compact, compressed, close bass. See [22] for more details concerning the album’s production.

Emotional contagion
11 Pink Floyd, "Empty Spaces", 1979

Author’s interpretation: “empty, abrupt reality”. In this example, the mix sustains the message conveyed by the lyrics and the song’s title. The mix includes a large reverb on the crowd sample, simulating a situation in which everybody else is far away. A sense of abruptness is conveyed by the handling of plosives during the mix, e.g. the “k” at the end of the sentence “where we used to talk” (1'45).

Visual imagery
Emotional contagion
12 Pink Floyd, "Another Brick in the Wall pt. 1", 1979

Author’s interpretation: “solemn”. The vocals are processed so that an impression of solemnity emerges. This is achieved by adding church-like reverberation (which would correspond to the BRECVEM mechanism “visual imagery”) and deep basses. This is particularly obvious on the word “memory” at 0’30.

Visual imagery

Figure 3: The mix for The Cure’s “Pornography” is in harmony with the album cover.

13 The Cure, "Pornography", 1982

Author’s interpretation: “terror, panic”. The drums are hard panned to the left, and appear to come from the back of a large room. The guitar is difficult to follow; it features a highly fluctuating loudness; high-medium frequencies are boosted. The vocals are underlined by a long delay with heavy feedback. Volume automation (see [23]) makes elements unexpectedly jump out of the mix. As shown in Figure 3 (left), the mix is consistent with the album’s cover. This may be linked to the BRECVEM mechanisms “brain stem reflex” and “emotional contagion”.

Brain stem reflex
Emotional contagion

14 Nine Inch Nails, "The Perfect Drug", 1997

The impression conveyed by the mix in the case of examples 7 through 13 could be described using words. This is more difficult in the present case. From 4’55, the mix produces an impression of “hollowness” that may originate from the “oily can”-type reverb that’s added to the drums and the detuning of the piano.


15 Brian Eno, "Slow Water", 1978

Brian Eno, "There is Nobody", 1978

The mix creates unrealistic acoustic spaces, and these spaces can be interpreted as the acoustics of imaginary places. It therefore provides a feeling of travel and dépaysement, with specific feelings attached to each imaginary place and space. This is particularly obvious in tracks such as “Slow Water” or “There is Nobody”. May rely on the BRECVEM mechanism “visual imagery”.

Visual imagery
16 Madonna, "American Life", 2005

The mix can be held responsible for 1) creating a particular feeling, 2) enhancing the original sounds (see Section 8.2) and 3) creating an abstract message (see Section 7). The vocals in the first verse are tightly controlled: they appear to be tightly compressed, as well as tightly processed with a pitch correction plug-in. The attacks are crisp. There is a clean delay under some of the syllables. These operations converge towards a vocal sound that may be described as clean, straight, and tight. Listening to these vocals could be compared to watching an abstract painting: though the actual content is difficult to describe, combined factors converge towards a particular emotion.

17 Tove Lo, "Habits", Hippie Sabotage Remix, 2014

The combination of a heavy round bass and dynamic compression contributes to a murky, unstable atmosphere. In this case also, a resulting impression that originates from processes taking place during the mix can be pointed out, but is difficult to describe using words.

18 The Cure, "Cold", 1982

The drums are mixed so they convey the impression of being loud and noisy. As the album’s producer puts it: « […] to compress the ambience mike to capture the real sound of a drummer. Loud. Noisy. » [24] This is a borderline case between simulated realism (see section 4.1) and the expression of a feeling.

19 Hans Zimmer, "Mountains", 2015

Author’s interpretation: “monumental, scary”. Movie soundtracks may be designed to underline the scene’s ambience. In this case, Zimmer’s use of large acoustic spaces and, at the end of the extract, saturation of the acoustic space, reinforces the feeling of enormousness we find in the movie’s scene. This example may rely on the BRECVEM mechanism “visual imagery”. The process is similar to Eno’s “Music for Films” (example 15). Both composers propose imaginary acoustics for an imaginary place.

Visual imagery


Trends concern the mix as well. Following a trend may include conforming to a particular spectral profile [25]. A given spectral profile may be provided as a target in the process of target mixing [26-28].

6.1 Current trends

20 Metallica, "My Apocalypse", 2008

The peak of the loudness war takes place in 2008 [29]. One trend in 2008 is that music must be loud. Metallica wants to be louder than everybody else – and succeeds [30].

21 Partenaire Particulier, "Partenaire Particulier", 1983

The ‘80s: one of the trends is to wash many instruments with reverb [31]. This track doesn’t stand as an exception; witness the amount of reverb on the lead vocals.

22 Maître Gims, "Sapés comme jamais", 2015

The ‘00s and ‘10s: many use Autotune to modify the lead vocal’s intonation [32]. Maître Gims conforms to the trend and uses Autotune liberally.

6.2 Revival of vintage sound.

The revival of vintage sounds may prompt “episodic memory” BRECVEM mechanisms, by reminding people of their first listening to similar sounds and music.
23 MGMT, "Kids", 2007

‘60s revival. Use of boosted medium frequencies and slow-attack compression during the mix emulates records from the 60’s [25, 33].

Episodic memory
Visual imagery
24 Portishead, "Silence", 2008

‘60s revival. The mix is hard-panned left/center/right. This is reminiscent of the era during which technology only allowed hard panning [34]. Following the same goal, equalization favors medium frequencies, and tube compression is obvious.

Episodic memory
Visual imagery
25 Portishead, "Cowboys", 1997

‘60s revival. The mix is entirely in mono, reminiscent of the era before stereo [34]. Boosted medium frequencies and slow-attack compression also contribute to a vintage sound [25, 33].

Episodic memory
Visual imagery

26 Benoit Carré, "Piano Mécanique", 2013

60’s revival. Quoting the artist: “we recorded the drums as Phil Spector would have done; the entire mix followed this direction.” [35]

Original quote in French: “On avait enregistré les drums dans le style de Phil Spector; tout est parti de là et le reste, y compris le mix, a suivi.”

Episodic memory
Visual imagery
27 Sky Ferreira, "Everything is embarrassing", 2012

‘80s revival. Traits of a mix from the ‘80s: snare with a gated reverb [36], synthetic-sounding reverbs on the lead vocals [37].

Episodic memory
Visual imagery


The examples from this section involve abstract structures. Appreciation of abstract structure, for instance appreciation of the structure’s elegance or novelty, can by definition be associated with the BRECVEMA mechanism “aesthetic judgment”. Therefore, “aesthetic judgment” will not be mentioned in this section, even though it is relevant to all examples.

7.1 Creating sound scenes

One of the purposes of mixing is to create a sound scene [17]. In Section 4.1, these sound scenes were realistic; they could be identified with actual physical setups. In the present section, we show examples of sound scenes that are devoid of realism. Given a set of instruments, there exists a large number of potential sound scenes. Some archetypes exist, such as the scenes illustrated by examples 28 and 29. Each example in the present section is illustrated by a video that shows the evolution of the sound scene over time.
28 Death, "Symbolic", 1995

In the author’s experience, this is an archetypical sound scene: vocals are close and panned at the center. The bass is at the center. The rhythmic guitar is double tracked, and hard panned left and right. Drums are mixed further away and panned gradually, with the kick and snare drums in the center.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

29 Nicki Minaj, "Super Bass", 2011

This is another example of an archetypal sound scene. The lead vocals are mixed in the center, close. Other vocals and choirs are hard panned left/right. The image of harmonic keyboards is wide. Some percussive synthesizers follow a trajectory between left and right. The kick drum, snare drum and hi-hat are panned at the center.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

30 Radiohead, "Exit music for a film", 1997

A guitar is near the center; its attacks are panned center, and its harmonic resonances are panned left; this brings width to the guitar. The lead vocals are in the center, with a reverb more widely panned. As a result, the vocals feature width and depth. Choirs are hard-panned left/right; faraway children noises are hard-panned left/right. In the second part, the bass is close, mingled with a noisy synthesizer track, and devoid of acoustics; the drums are moderately panned. In the author’s experience, this sound scene appears to be both clear and sophisticated.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

31 Nine Inch Nails, "Ruiner", 1993

A noticeable trait is that originally loud vocals (during the verses) are mixed softer and further away than originally soft vocals (during the chorus). This inversion is not common. Additionally, originally soft chorus vocals are panned left/right with a phase shift. A keyboard in the chorus occupies most of the middle of the mix, while other non-rhythmic elements are hard-panned left and right (possibly so that they remain audible). The solo guitar is panned on the left; the bass and drums during the guitar solo are panned on the right. In the author’s experience, this sound scene is highly unusual.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

32 Nine Inch Nails, "Right where it belongs", 2005

This piece is distinctive for its slow shift in sound image during the song (at 3’03), when the panning, width and depth of many elements change gradually.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

33 Radiohead, "2+2=5", 2003

This song presents several shifts of sound scene. We focus on the first verse, at 1’00. A dry lead vocal line and a higher, wet lead vocal line are both panned at the center. A guitar is hard-panned right; reverberated noises are hard-panned left; narrow spectrum, clean percussions are panned at the center. The sound scene changes abruptly at 1’25, with, amongst the differences with the previous scene, two dry lead vocals with different spectra.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.


7.2 Creating abstract movements

A particular track can be processed so it follows a predetermined trajectory inside the sound scene.
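As an illustration of how a mono track might be moved along a predetermined trajectory in the stereo field, here is a minimal sketch. It assumes a constant-power (sine/cosine) pan law and a trajectory given as a function of normalized time; both choices and all names are hypothetical, and none of the examples below is claimed to have been produced this way.

```python
import math


def constant_power_gains(pan):
    """Left/right gains for pan in [-1, 1] (-1 hard left, 0 center, +1 hard right)."""
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie in [-1, 1]")
    angle = (pan + 1.0) * math.pi / 4.0   # maps pan to the range 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def pan_along_trajectory(mono, trajectory):
    """Pan each sample of `mono` according to trajectory(progress).

    `trajectory` maps progress in [0, 1] to a pan position in [-1, 1].
    """
    n = len(mono)
    left, right = [], []
    for i, x in enumerate(mono):
        progress = i / (n - 1) if n > 1 else 0.0
        gain_l, gain_r = constant_power_gains(trajectory(progress))
        left.append(x * gain_l)
        right.append(x * gain_r)
    return left, right


# Hypothetical trajectory: a linear sweep from hard left to hard right.
left, right = pan_along_trajectory([1.0] * 5, lambda p: 2.0 * p - 1.0)
```

With this pan law the summed power of the two channels stays constant along the sweep, which keeps the perceived level roughly stable while the source moves.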
34 Karlheinz Stockhausen, "Oktophonie", 1990-1991

This is the stereo reduction of an eight-channel piece. An important part of the mix consists in recreating the eight-channel sound movements inside a stereo space.

35 Guillamino / Aleix Fabra Roca,
“3D audio sound objects with Blender”, 2009

Thesis of Computer Science studies developed in the audio department of BarcelonaMedia [38]

It is part of the stereo mixing process to create trajectories in the stereo space, in this case using dedicated spatialization tools.

36 Massive Attack, "Mezzanine", 1998

Examples 34 and 35 illustrated trajectories taking place at the time-scale of a musical phrase. In this example, trajectories occur at a smaller time-scale; they become a feature of an objet sonore, i.e. a shorter, unitary sound object [39]. The sample at 0’16 illustrates the mixing process as a way to create spatially dynamic objets sonores.


7.3 Creating space sequences

Section 7.1 described entire sound scenes. Section 7.2 described trajectories of single instruments in space. This section lists examples of composite objects – more specifically, vocals, evolving in a complex way.

The examples are borrowed from [40]. A video illustrates each example. A square represents a part. A part can be made from one or several homorhythmic lines. Each line is represented by a dot. The dot’s abscissa represents the pan, and its ordinate represents the pitch.

37 Lorde, "Royals", 2013

Though there is only one singer, the vocals are composite, following dynamic groupings of homorhythmic lines.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

38 Britney Spears, "Circus", 2008

A different instance of the same type of process.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

39 Brandy, "Human", 2008

In this example, a set of previously homorhythmic lines forming a single part splits into two different parts for a while. This produces an ambiguity concerning the identification of vocal parts.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.


7.4 Creating illusions

Mixing can create auditory illusions. These can recreate well-known illusions such as Risset’s endless glissandi [41], or they can exploit more specific setups such as the relation between a dry sound and its reverb.
40 Pink Floyd, "Sheep", 1977

At 1’38, the lead vocal part progressively turns into an instrumental sound.

41 Pink Floyd, "Echoes pt. 2", 1971

The end of this piece recreates Risset’s endless glissandi, with vocal lines fading into each other.

42 Radiohead, "Backdrifts", 2003

We focus on the vocal part that starts at 1’29. What may at first seem to be a reverb to the lead vocals turns out to be a different part doing a different pattern. A similar trick is observed on the solo piano starting from 3’12. What seems at first a reverb to the piano turns out to be a set of independent sounds doing different patterns.


7.5 Creating / modifying articulations

Modifications of the transition between two consecutive notes can be performed using pitch correction plug-ins such as Autotune.
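To see why pitch correction can alter articulations, consider a toy model in which every value of a pitch contour is snapped to the nearest equal-tempered semitone. A smooth glide between two notes then turns into discrete steps. This sketch is only an assumption-laden illustration (A4 = 440 Hz, instantaneous retuning, hypothetical function names); it does not describe how Autotune or any other plug-in is implemented.

```python
import math

A4_HZ = 440.0  # assumed tuning reference


def nearest_semitone_hz(freq_hz):
    """Frequency of the equal-tempered semitone closest to freq_hz."""
    semitones_from_a4 = round(12.0 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2.0 ** (semitones_from_a4 / 12.0)


def snap_contour(contour_hz):
    """Snap every point of a pitch contour to the nearest semitone."""
    return [nearest_semitone_hz(f) for f in contour_hz]


# Hypothetical glide rising from A4: the snapped version holds each
# semitone and then jumps, instead of sliding continuously.
glide = [440.0 + 5.0 * k for k in range(11)]
stepped = snap_contour(glide)
```

Real plug-ins typically expose a retune speed that smooths these jumps; setting it to its fastest value is what makes the effect audible as a modified articulation.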
43 Cher, "Believe", 1998

This is the song that started the Autotune craze [42]. This massive hit uses Autotune not only to correct the intonation, but also to audibly and purposely modify the articulations.

44 Madonna, "American Life", 2005

We focus on the lead vocals at the beginning of the chorus (1’09). The pitch correction plug-in creates subtle artificial articulations, reminiscent of middle-eastern vocal music.

45 Daft Punk, "One More Time", 2001

Whereas example 44 featured a subtle use of pitch correction to modify the articulations, this example illustrates a particularly drastic use of pitch correction for the same purpose.

7.6 Creating / enhancing structure
46 BT, "Skylarking", 2013

An increase in loudness and a sliding filter are two transformations that can be implemented during the mixing process. We want to check whether these transformations occur near the limits of structural segments. We split the example into structural segments according to [43]. There are 41 segments. As illustrated in Figure 4, sliding filters can be observed 15 times next to segment limits. An increase in loudness can be observed 13 times next to segment limits. These transformations begin a few seconds before the limit, and end at the limit. We can conclude that such transformations are linked to the segmentation. This analysis was adapted from [44].
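The counting step of such an analysis can be sketched as follows. The sketch assumes that segment limits and the end times of the observed transformations are already available as lists of times in seconds; the tolerance, the function name and the data are hypothetical, and the code does not reproduce the procedure of [44].

```python
def count_limits_with_event(limits, event_ends, tolerance=0.5):
    """Count segment limits that have at least one event ending nearby.

    limits      -- segment limit times, in seconds
    event_ends  -- times at which a transformation (e.g. a filter sweep) ends
    tolerance   -- maximum distance in seconds to count as "at the limit"
    """
    hits = 0
    for limit in limits:
        if any(abs(end - limit) <= tolerance for end in event_ends):
            hits += 1
    return hits


# Hypothetical data: four segment limits and three filter sweeps.
limits = [10.0, 20.0, 30.0, 40.0]
sweep_ends = [9.8, 30.2, 55.0]
matched = count_limits_with_event(limits, sweep_ends)
```

Comparing such a count with the total number of limits gives a first indication of how strongly a transformation is tied to the segmentation.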

Figure 4: Segmentation of the audio from example 46. Sliding filters (noted “I”) and loudness increase (noted “II”) can often be found near segment limits.

47 Lady Gaga, "Telephone", 2009

In some recent tracks, it can be observed that loudness changes correspond to section changes. This is less noticeable in older tracks.

Figure 5: Similarity matrices of RMS power for two songs. Left, “Come Together” by the Beatles (1969); level changes do not follow sections. Right, “Telephone” by Lady Gaga (2009); level changes follow sections (adapted from [30]).
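A self-similarity matrix of frame-wise RMS level can be sketched in a few lines. The similarity measure behind Figure 5 is not specified here, so the one below (one minus the normalized absolute difference between frame levels) is an assumption, as are the function names and the non-overlapping framing.

```python
import math


def frame_rms(samples, frame_size):
    """RMS level of consecutive, non-overlapping frames."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return levels


def similarity_matrix(levels):
    """Pairwise similarity in [0, 1]; 1 means identical frame levels."""
    span = max(levels) - min(levels)
    if span == 0:
        return [[1.0] * len(levels) for _ in levels]
    return [[1.0 - abs(a - b) / span for b in levels] for a in levels]


# Hypothetical signal: a quiet section followed by a loud one.
signal = [0.1] * 8 + [1.0] * 8
matrix = similarity_matrix(frame_rms(signal, 4))
```

When level changes follow the song's sections, such a matrix shows homogeneous square blocks along the diagonal, one block per section.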



There exist highly different classes of enhancements; for instance, “creating punctuations” and “polishing a track so it sounds better” do not belong to the same class of processes.
8.1 Punctuations
48 Canibus, "M-Sea Creasy", 2003

Additional vocal lines underline particular words.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

49 Pink Floyd, "Another Brick in the Wall pt. 1", 1979

At 0’30, additional vocal lines underline the word “memory”.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

50 Radiohead, "2+2=5", 2003

At 1’35, a reverb and additional vocal lines emphasize the word “scream”.

Manual transcription and video by the author, 2016.
Manual transcriptions are prone to error.

51 Céline Dion, "Encore un soir", 2016

The vocals are punctuated by different reverbs with different release times.


8.2 Making more lush

Knowledge of the spectrum characteristics for each instrument is useful for enhancing its sound [44]. According to [8], frequency removal as described in [45-46] is a widely recognized sound enhancement technique.
52 Keith Jarrett, "Oct. 17, 1988", 1988

ECM is a label known for its careful attention to the final result in terms of sound [48]. This is an example of the famous “ECM sound”.

53 Buena Vista Social Club, "Chan Chan", 1996

The mix for this acoustic piece goes beyond a realistic rendering and enhances all sounds to provide a lush, generous sound. A distinctive feature of the example is the use of superimposed acoustics that are very different from each other. The lushness is particularly obvious with the guitar punctuations at 1’40. The variety of acoustics is particularly obvious with the trumpet solo near 2’40.

54 Massive Attack, "What your soul sings", 1996

The vocals are particularly shiny. This is probably done using EQs and tube saturation.

55 Radiohead, "Exit music for a film", 1997

The vocals are warm, and their image inside the sound scene is wide. Refer to example 30 for more details.

56 Nine Inch Nails, "The Great Below", 1999

The pads starting near 2’08 are lush and warm. In the author’s experience, achieving such a warm sound requires a lot of craftsmanship.


8.3 Making the music more “dancy”

Dance music is supposed to make people dance. It has to be mixed accordingly. Typically, the drums have to be mixed louder. All examples relate to the BRECVEM mechanism “rhythmic entrainment”.
57 Hardwell feat. Jack Reese, "Run Wild", 2016

Hardwell was the world #1 DJ in 2013 and 2014 [49]. His job is to make people dance, and the kick drum is mixed loudly.

Rhythmic entrainment
58 Britney Spears, "I wanna go", 2011

The lyrics describe an experience that takes place in a club. In accordance with club music standards, the drums are mixed louder than they would otherwise be.

Rhythmic entertainment

8.4 Getting closer to what’s perceived as musical perfection

During the mix, a number of different processes can be used to tweak the recorded parts so that they get closer to musical perfection. Pitch-correction plug-ins are of particular importance. Music producers have used these plug-ins to such an extent that some singers have been observed to sing naturally as if their voice had been processed by a pitch-correction plug-in. Quoting [32]:

“Hugh and I listened to a demo recording on which the studio owner's daughter had provided the vocals, and we both thought that he'd been too heavy‑handed with Auto-Tune or some similar pitch-correction plug‑in. He told us that he hadn't used any pitch-correction processing at all, that was just the way she sung […] I mentioned this to UK producer Steve Levine when we met to sit on a panel earlier this year and he said he'd also come across this development, specifically female vocalists who had learned to pitch very precisely and to move cleanly from one note to another without the normal glides and slides you hear in a typical unprocessed vocal.”
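
The "very precise" pitching described in the quote resembles what pitch correction does in its simplest form: pulling each detected pitch toward the nearest note of a scale. The sketch below illustrates that idea only. It assumes 12-tone equal temperament, a chromatic scale, a tuning reference of A4 = 440 Hz, and a hypothetical `strength` parameter; it is not a description of how Auto-Tune or any other commercial plug-in is implemented.

```python
import math

A4_HZ = 440.0  # assumed tuning reference, not taken from the paper


def hz_to_midi(freq_hz: float) -> float:
    """Map a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(midi_note: float) -> float:
    """Map a (fractional) MIDI note number back to a frequency in Hz."""
    return A4_HZ * 2.0 ** ((midi_note - 69.0) / 12.0)


def correct_pitch(freq_hz: float, strength: float = 1.0) -> float:
    """Pull a detected pitch toward the nearest equal-tempered semitone.

    strength is a hypothetical control: 1.0 snaps fully to the note,
    0.0 leaves the pitch untouched.
    """
    midi = hz_to_midi(freq_hz)
    target = round(midi)  # nearest semitone on the chromatic scale
    return midi_to_hz(midi + strength * (target - midi))


if __name__ == "__main__":
    sung = 446.0  # slightly sharp A4
    print(f"sung: {sung:.1f} Hz, corrected: {correct_pitch(sung):.1f} Hz")
```

With `strength` set to 1.0, every glide between notes collapses into a step, which is the audible signature the quote refers to; intermediate values leave part of the original deviation in place.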

59 Miley Cyrus, "Karen don't be sad", 2015, studio version

Miley Cyrus, "Karen don't be sad", 2015, natural live vocals

This comparison illustrates the difference between a processed version and a (comparatively) unprocessed version of Miley Cyrus’ voice. There is no pitch correction in the live, less processed version.

Pitch-correction plug-ins are not the only way to get closer to perfection. Some plug-ins, such as Vocalign, may be used to correct rhythmic alignment.

60 Crystal Castles, "Baptism & Intimate Live at Jimmy Kimmel", 2010

A more exotic method to make a singer seem less imperfect is to drown his/her voice in various effects. In this example, the effect is obvious when comparing the first song to the second.

9.1 Based on necessary choices

According to Stéphane ‘Alf’ Briat, mixer: “Given that storage price is low and that editing is easy, musicians postpone decisions until the last moment. As a result, I regularly get sessions with an inordinate number of tracks. Therefore, part of my work often consists in toning down tracks, or, even better, in simply removing them.” [50]

Original quote in French: “L’espace disque, ça coûte pas cher, et éditer dans ProTools, c’est facile. Du coup, les musiciens veulent tout décider au dernier moment. Quand la session m’arrive, il y a parfois énormément de tracks. Je dois me débrouiller pour rendre certaines secondaires, ou, si je peux, les muter.”

One goal of mixing is therefore the prioritization, or even the removal, of tracks. This goal contradicts a widely recognized best practice: making all tracks as audible as possible [14-15, 51-52].

9.2 Based on social conventions

The music industry is a competitive environment, and this is reflected in egos: those who have succeeded may demand privileges, such as having one’s voice always mixed at the front.

61 Céline Dion, "Encore un Soir", 2016

In this example, Céline Dion’s vocals are mixed consistently loud. Quoting the mixer for this track: “This is a diva situation: clearly, the singer’s status takes priority over the instrumental tracks, and she must be completely at the forefront. In my initial mix, she was 1 dB softer, and I already thought she was mixed loud.” [53]

From the original French: “parce qu'on est sur un truc 'Diva' où clairement le statut de la chanteuse est largement prépondérant par rapport à l'instrumental et qu'elle doit être largement devant; à la base sur mon mix initial la voix était 1 dB en dessous (et encore je la trouvais forte...)”
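
To put the 1 dB figure from the quote in perspective, the sketch below converts a level offset in decibels into a linear amplitude ratio, using the standard 20·log10 convention for amplitudes. The function names are illustrative and do not come from the cited mixer's workflow.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level change in dB to a linear amplitude (gain) ratio."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio back to a level change in dB."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    boost = db_to_amplitude_ratio(1.0)
    print(f"+1 dB multiplies the vocal amplitude by about {boost:.3f}")
```

A 1 dB increase thus corresponds to raising the vocal amplitude by roughly 12 %, a small step in gain terms that the mixer nevertheless considered significant.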

62 Oasis, "Stand by Me", 2011

The Gallagher brothers are both strong characters [54]. One is the guitar player; the other is the lead singer. This is reflected in the mix, which favors both brothers at the expense of the other instruments (drums, bass).

63 Depeche Mode, "Heaven", 2013

Depeche Mode’s main author/composer is Martin L. Gore, who can be seen live playing diverse instruments and sometimes singing. The front man is lead singer Dave Gahan, who also wrote a few songs for the band [55]. Their respective roles are balanced, and so is the mix: no part is favored over the other.



There are many goals that mixing can reach besides technical correctness:

  • The mix may pretend to be transparent.
  • The mix may help create / convey a particular feeling.
  • The mix may follow trends (current or past).
  • The mix may help create abstract structures (sound scenes, movements, space sequences, illusions, articulations) or help create the musical structure.
  • The mix may enhance and correct sounds (punctuating, making more lush, making more "dancy", perfecting).
  • The mix may prioritize tracks (on musical or social bases).

Automatic mixing currently addresses only a few of these goals (4.1, 6, 8.2). We hope that the community will investigate the technical means to reach more of them, so that automated mixing may offer more perspectives and stand a chance of emulating a human mixer.


