New horizons for music creation

22 July, 2018

Contemporary popular music dominates the world’s music market. Over the past decades, it has evolved towards a music of sounds, heavily conditioned by technology. Digital instruments and software interfaces give everybody unprecedented freedom to express themselves through music creation, whether individually or collaboratively. Another revolution is yet to come. Recent progress in deep learning should make generative Artificial Intelligence the next horizon for music technology, resulting in a wave of innovations that will profoundly change the scope of music production.

Music and technology

Throughout human history, music has been intertwined with technology. Without paper, the complexity of fugue or counterpoint could not have been achieved. Acoustic instruments are the result of several centuries of innovations, such as improved copper smelting to make brass instruments, or refinements in the chemical composition of steel to build piano frames. Since the ’50s, electronic synthesizers have enriched music with a wide range of novel sounds [1–4]. Today, the relations between music and technology are tighter than ever. Without digital technology, popular music in the twenty-first century would be unthinkable. As early as the Beatles’ era, multi-track recording became central to popular music, placing itself at the core of the compositional process. Gradually, the recording medium came to be used for its own creative potential. Nowadays, music is made in Digital Audio Workstations (DAWs), i.e. application software used for multi-track recording, editing and producing audio files. DAWs allow the user to alter and mix multiple tracks into a final produced piece [5–9].

Technology for the Masses: Martin Gore’s studio.

A worldwide language

Contemporary popular music is a worldwide phenomenon, resulting in aesthetic cosmopolitanism. Between 1960 and 2010, pop charts (USA excepted) were observed to contain an increasing share of foreign music. As part of cultural globalization, pop-rock songs have brought people into greater proximity, leading to a shared core of musical and cultural background, albeit expressed in many forms. While English may eventually become the common language of the world, contemporary popular music is already a universal mode of communication [10–13].

A world industry: a US company selling Danish music in China

Unprecedented freedom

Today, musicians enjoy incredible freedom at all levels. For centuries, European musicians were judged according to their understanding of the strict rules of Western harmony. Nowadays, self-taught musicians top the charts. They can combine any elements (notes, chords, samples, effects, acoustics…) in any manner they want, or even forgo pitch entirely. They can choose to use any musical system ever devised, from any time and any culture. Until the 1910s, only the instruments of the symphonic orchestra were considered acceptable sound sources. Today, the number of virtual instruments, audio libraries, effects, etc. that can be inserted into DAWs is staggering. Basically, as long as a sound is audible, it’s OK to use it! Luigi Russolo’s dream of expanding the symphonic orchestra has been fulfilled beyond all expectations. Opera singers are known for “vocal formant”-based singing, which lets them be heard above large ensembles but makes their voices sound like variants of a single instrument. Pop music vocalists take advantage of studio technology to use their natural vocal timbre, whisper, talk, yell, laugh, and yet have every detail of their voice heard with perfect clarity. In line with Pierre Schaeffer’s vision of the studio as the most general instrument, all music is now possible [14–16].

A music of sounds

There is now no clear border between musicianship and sonic craftsmanship. The manipulation of digital audio has become the primary focus of contemporary music production. Great care is dedicated to the design of the right samples, the right sound. As Bill Stephney points out: “Drum programming in rap is incredibly complex. You may get a kid who puts a kick from one record on one track, a kick from another record on another track, a Linn kick on a third track, and a TR-808 kick on a fourth – all to make one kick!”. Instruments are re-created during production. For instance, as documented in the magazine Guitar World, the opening guitar riff for David Bowie’s ‘Little Wonder’ was made by playing three E notes with different vibratos. The notes were sampled, then played on a keyboard, but not in their original octaves. Going to so much trouble to obtain a two-second-long pattern using two notes (E, at two different octaves) suggests that sound has become at least as pertinent as pitch. Individual samples have come to carry the same commercial and aesthetic weight as the melody or lyrics in pop songs. As a result, any system that generates modern popular music must treat sound as part of the music. Input datasets and generated content should not be strictly limited to symbolic representations such as scores [1,5,17–20].
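At the signal level, the layering Stephney describes amounts to summing several sample buffers, each with its own gain. The following is a minimal sketch of that idea, not a description of any particular producer’s workflow. It assumes mono buffers stored as lists of floats in [-1, 1]; the names `synth_kick` and `layer_samples`, the gain values, and the decaying sine waves standing in for real kick recordings are all hypothetical.

```python
import math


def synth_kick(freq_hz, decay, n=2000, sample_rate=44100):
    """Hypothetical stand-in for a recorded kick: an exponentially decaying sine."""
    return [
        math.exp(-decay * i / sample_rate)
        * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


def layer_samples(layers, gains):
    """Sum mono buffers with per-layer gains, then peak-normalize if the mix clips."""
    if len(layers) != len(gains):
        raise ValueError("one gain per layer is required")
    length = max(len(layer) for layer in layers)
    mix = [0.0] * length
    for layer, gain in zip(layers, gains):
        for i, value in enumerate(layer):
            mix[i] += gain * value
    peak = max(abs(v) for v in mix)
    if peak > 1.0:
        # Scale the whole mix down so the loudest sample sits at full scale.
        mix = [v / peak for v in mix]
    return mix


# Four "kicks" of different lengths and characters, layered into a single sound.
kicks = [
    synth_kick(50, 20),
    synth_kick(60, 30, n=1500),
    synth_kick(80, 40),
    synth_kick(45, 15, n=2500),
]
composite = layer_samples(kicks, gains=[0.8, 0.6, 0.5, 0.7])
```

The composite buffer is as long as the longest layer; shorter layers simply stop contributing once they end.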

Finding the right snare drum sound

Production tools for everyone

Two revolutions have changed the role of music in Western society. At the turn of the twentieth century, new media, such as the radio and the phonograph, made music consumption accessible to everyone. But the production of music was still a privilege. Then came the home studio. By the mid-1990s, digital multi-track recorders had become available at modest prices. Home studio technologies have fulfilled the dream of professional-quality home recording. The distinction between what a ‘professional’ or ‘commercial’ project studio can produce and what a ‘personal’ or ‘home’ studio can produce has become increasingly difficult to make. Any personal computer can be used to make music. Both making and listening to music are now accessible to anyone. Today, everybody can be a music producer [5,21].

Collaborative music

From Palestrina to Mahler, the Western classical music tradition is championed by solitary composers who write music single-handedly. Jazz and rock changed this practice, with many songs and albums credited to several members of a band. Contemporary productions show an even greater degree of collaboration. The credits for David Guetta’s album ‘Listen’ include 32 unique composers. To collaborate, people do not even need to work at the same time: many musicians credit people they have sampled but may never have met. Sampling is part of the ‘remix culture’, in which authors re-use somebody else’s content for their own production. Although the practice is not specific to contemporary popular music (see Messiaen’s extensive ‘borrowing’ practices), the literal, sample-by-sample quoting of somebody else’s recording is a source of legal problems. Proofs of authorship stored in a blockchain may be a solution to trace individual contributions. Shared authorship technologies for music production are likely to become a necessity in the near future [22–30].
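The idea of a traceable proof of authorship can be illustrated with a hash-linked list of contribution records. This is only a sketch of the chaining principle, under strong simplifying assumptions: it is not a real blockchain (there is no network, consensus or signature), and the record fields, the function names `add_contribution` and `verify`, and the sample identifiers are all hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash preceding the first record


def _digest(record):
    """Hash the record's content together with the hash of the previous record."""
    payload = {key: record[key] for key in ("author", "sample_id", "prev_hash")}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def add_contribution(chain, author, sample_id):
    """Append a contribution that is cryptographically linked to the previous one."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    record = {"author": author, "sample_id": sample_id, "prev_hash": prev_hash}
    record["hash"] = _digest(record)
    chain.append(record)
    return chain


def verify(chain):
    """Return True only if every record is intact and correctly linked."""
    prev_hash = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(record):
            return False
        prev_hash = record["hash"]
    return True


ledger = []
add_contribution(ledger, "producer_a", "kick_layer_01")
add_contribution(ledger, "producer_b", "vocal_chop_07")
```

Because each record’s hash depends on the previous one, changing an earlier credit after the fact makes `verify` fail for the whole chain.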

Thirty-two single authors, one album

Music A.I.

The general public is becoming more and more aware of the astounding rate of innovation currently taking place in the field of automatic image generation by means of Artificial Intelligence. What about music? Even though two music albums involving A.I. have recently been released, music generation lags behind by comparison. Until recently, direct generation of audio was made difficult by technical limitations. Generated content was therefore limited to symbolic representations such as scores. This situation may change: in the last few months, innovations derived from Google’s WaveNet have been used to generate audio directly from neural networks. This is a major turning point, as it may facilitate the adaptation of some remarkable image generation techniques to music. Examples may include automatic rendering of symbolic content using conditional adversarial networks, production of samples by interpolation using latent space regularization in variational auto-encoders, content-based recommendation using context convolutional networks (including navigation in audio libraries), or context-based generation using inpainting techniques adapted to audio [31–38].
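To make the idea of producing samples by interpolation more concrete, here is a minimal sketch of the interpolation step alone. It assumes that two sounds have already been encoded as latent vectors by some trained auto-encoder, and that a decoder (omitted here) would turn each intermediate vector back into audio. It uses plain linear interpolation rather than the geodesic regularization of [36]; the function name `interpolate_latents` and the example vectors are hypothetical.

```python
def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors linearly spaced from z_start to z_end, inclusive."""
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 2:
        raise ValueError("need at least two steps")
    path = []
    for k in range(steps):
        t = k / (steps - 1)  # goes from 0.0 (start) to 1.0 (end)
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Hypothetical latent codes of two samples (a real model would use many more dimensions).
z_a = [0.0, 1.0, -2.0]
z_b = [1.0, 0.0, 2.0]
path = interpolate_latents(z_a, z_b, steps=5)
```

Decoding each vector in `path` would then yield a sequence of sounds morphing from the first sample to the second, provided the latent space is smooth enough for intermediate points to decode to plausible audio.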

New horizons for music creation

Cosmopolitan, collaborative with traceable authorship, assisted by artificial intelligence: these are new features of music for the years to come. For the musician, the concept of an organic, self-adaptive environment is particularly striking: music composition for everyone, everywhere, with anyone! Interaction with A.I. will extend the scope of human capabilities, enabling strange and wonderful experiences of musical creativity that we cannot yet imagine. For Sony, contributing to this exciting new creative potential of the human mind is both an impressive challenge and a wonderful opportunity.

New horizons: a representation of the latent space of Bach’s music


[1] François Delalande. Le son des musiques. Paris: Buchet/Chastel, 2001.
[2] J. Peter Burkholder and Donald Jay Grout. A History of Western Music: Ninth International Student Edition. WW Norton & Company, 2014.
[3] Edmund Bowles. The impact of technology on musical instruments. Cosmos Journal, 1999.
[4] Martin Russ. Sound synthesis and sampling. Taylor & Francis, 2004.
[5] Paul Théberge. Plugged in: technology and popular music. The Cambridge companion to pop and rock, pages 3–25, 2001.
[6] Greg Milner. Perfecting Sound Forever. Faber & Faber, 2010.
[7] Brian Eno. Pro session: The studio as compositional tool – part I. lecture delivered at New Music New York, the first New Music America Festival sponsored in 1979 by the Kitchen, excerpted by Howard Mandel, Down Beat 50 (July 1983) 56, 1979.
[8] William Moylan. Understanding and crafting the mix: The art of recording. CRC Press, 2014.
[9] Alan P. Kefauver and David Patschke. Fundamentals of digital audio, volume 22. AR Editions, Inc., 2007.
[10] Motti Regev. Pop-rock music: aesthetic cosmopolitanism in late modernity. John Wiley & Sons, 2013.
[11] Global music report 2017: Annual state of the industry. Technical report, International Federation of the Phonographic Industry, 2017.
[12] Marc Verboord and Amanda Brandellero. The globalization of popular music, 1960-2010: a multi-level analysis of music flows. Communication Research, 2016.
[13] David Huron. Science & music: Lost in music. Nature, 453(7194):456, 2008.
[14] Luigi Russolo. L’art des bruits: manifeste futuriste 1913. Introd. de Maurice Lemaitre. Richard Masse, 1954.
[15] Johan Sundberg. The acoustics of the singing voice. Scientific American, 236(3):82–91, 1977.
[16] Pierre Schaeffer. A la recherche d’une musique concrète. Seuil, 1952.
[17] Brian Eno. The Dick Flash interview, 2010.
[18] Bill Stephney. Keyboard 14(11), November 1988. Referenced from Paul Théberge, Any Sound You Can Imagine. Wesleyan University Press, 1997, p. 195.
[19] Guitar World Staff. 22 guitarists, producers and gear makers reveal their tone secrets. Guitar World, April 2017.
[20] Paul Théberge. Any sound you can imagine: Making music/consuming technology. Wesleyan University Press, 1997.
[21] Timothy D Taylor, Mark Katz, and Tony Grajeda. Music, sound, and technology in America: a documentary history of early phonograph, cinema, and radio. Duke University Press, 2012.
[22] David Guetta. Listen. Parlophone, 2014.
[23] Will Fulford-Jones. “Sampling,” Grove Music Online. 2011.
[24] Oshani Seneviratne and Andres Monroy-Hernandez. Remix culture on the web: A survey of content reuse on different user-generated content websites. 2010.
[25] Yves Balmer, Thomas Lacôte, and Christopher Murray. Le modèle et l’invention: Olivier Messiaen et la technique de l’emprunt. Série 20-21, Series editor Nicolas Donin, 2017.
[26] Jeffrey H Brown. They don’t make music the way they used to: The legal implications of sampling in contemporary music. Wis. L. Rev., page 1941, 1992.
[27] Imogen Heap. Blockchain could help musicians make money again. Harvard Business Review, June 2017.
[28] Patrick Goldstein and James Rainey. Hans Zimmer to academy: I’m no liar! Los Angeles Times, December 2008.
[29] Michael Crosby, Pradan Pattanayak, Sanjeev Verma, and Vignesh Kalyanaraman. Blockchain technology: beyond Bitcoin. Applied Innovation, 2:6–10, 2016.
[30] Marcus O’Dair, Zuleika Beaven, David Neilson, Richard Osborne, and Paul Pacifico. Music on the blockchain. 2016.
[31] Dom Galeon. The world’s first album composed and produced by an AI has been unveiled. August 2017.
[32] Alex Marshall. Is this the world’s first good robot album? January 2018.
[33] Aaron Van Den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
[34] Chris Donahue, Julian McAuley, and Miller Puckette. Synthesizing audio with generative adversarial networks. arXiv preprint arXiv:1802.04208, 2018.
[35] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004, 2017.
[36] Gaëtan Hadjeres, Frank Nielsen, and François Pachet. GLSR-VAE: Geodesic latent space regularization for variational autoencoder architectures. arXiv preprint arXiv:1707.04588, 2017.
[37] Aaron Van den Oord, Sander Dieleman, and Benjamin Schrauwen. Deep content-based music recommendation. In Advances in neural information processing systems, pages 2643–2651, 2013.
[38] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2536–2544, 2016.