Cyran Aouameur

Sony Computer Science Laboratories Paris

Recently, music production has become essentially digital, drastically widening the scope of possibilities in terms of sound synthesis and textures. The sound design process has thus grown freer but also more complex, given the overwhelming number of parameters provided by modern synthesizers. Methods that allow easy yet rich fine-tuning of sounds have therefore become a key requirement in music production, especially for non-expert users. Moreover, at a time when home studios are becoming the norm, it is essential to develop simple and lightweight tools for users who create music on a single computer. Passionate about urban music since childhood, I have always looked for music that makes me nod my head; today, I know that rhythm and percussive sounds matter to me in particular. That is why I focus on drums and develop AI-based solutions that help artists easily design original drum sounds and rhythms, with the final goal of enlarging the horizon of possibilities while always trying to avoid constraining artists' creativity.

Learning Latent Spaces for Real-Time Audio Synthesis

Modern synthesizers are increasingly powerful and now provide an overwhelming number of parameters to carve a sound spectrum. This widens creative freedom but can also complicate the sound design process. In parallel, recent generative models have been developed for audio synthesis. Here, we aim to provide intuitive control over sound synthesis with deep learning models, through synthesis by learning. Only a limited number of approaches have addressed this new type of synthesis, which allows a synthesizer to be learned directly from audio examples. One of the most important proposals relies on the framework of variational autoencoders, which can generate sounds from a latent parameter space by simultaneously learning inference and generation networks from existing data. In this work, we develop generative models for audio synthesis that can handle complex temporal information and therefore generate a wide variety of sounds. Our model combines variational autoencoders with convolutional neural networks, and we collected and labeled a dataset covering a variety of percussive sounds to train it.
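The variational autoencoder workflow described above (an inference network mapping a sound to a latent point, sampling via the reparameterization trick, and a generation network mapping the latent point back to audio) can be illustrated with a minimal sketch. This is not the model from this work: the linear NumPy maps, the 64-bin input, and the 2-D latent space are all illustrative stand-ins for the trained convolutional networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a flattened 64-bin spectrogram frame
# encoded into a 2-D latent space (illustrative only).
input_dim, latent_dim = 64, 2

# Randomly initialized linear maps stand in for the trained
# convolutional encoder/decoder described in the text.
W_mu = rng.standard_normal((latent_dim, input_dim)) * 0.1
W_logvar = rng.standard_normal((latent_dim, input_dim)) * 0.1
W_dec = rng.standard_normal((input_dim, latent_dim)) * 0.1

def encode(x):
    """Inference network: map a sound x to a latent mean and log-variance."""
    return W_mu @ x, W_logvar @ x

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps, keeping the sampling differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Generation network: map a latent point z back to a sound."""
    return W_dec @ z

x = rng.standard_normal(input_dim)   # stand-in for one percussive sound frame
mu, logvar = encode(x)
z = reparameterize(mu, logvar)
x_hat = decode(z)
print(z.shape, x_hat.shape)          # (2,) (64,)
```

Once trained, only `decode` is needed at synthesis time: moving `z` through the latent space produces new sounds, which is what makes such a space usable as an intuitive control surface.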