Music and technology have always gone together. Without paper, there is no fugue or counterpoint.
Acoustic instruments are the result of several centuries of innovation. Electronic synthesisers
have enriched music with a wide range of novel sounds. Without digital technology, popular
music in the twenty-first century is unthinkable.
At Sony CSL, we imagine the technology for the music of the next decade. Our researchers
work with musicians and content providers to push the boundaries of creativity and understand
the complexity of modern music production processes. By combining cutting-edge AI research
with strong musical expertise, we develop the artificial intelligence that will extend the scope of
the musician’s capabilities. We pave the way for musical experiences yet to be imagined.
Learning features from data has proven more successful than using hand-crafted features for many machine learning tasks. In music information retrieval (MIR), features learned from windowed spectrograms are highly variant to transformations such as transposition or time shift. Such variances are undesirable when they are irrelevant to the MIR task at hand. We propose an architecture called the Complex Autoencoder (CAE), which learns features invariant to orthogonal transformations. Mapping signals onto complex basis functions learned by the CAE yields a transformation-invariant “magnitude space” and a transformation-variant “phase space”. The phase space is useful for inferring transformations between data pairs. By exploiting the invariance property of the magnitude space, we achieve state-of-the-art results in audio-to-score alignment and in repeated section discovery for audio. A PyTorch implementation of the CAE, including the repeated section discovery method, is available online.
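The CAE learns its complex basis from data, but the principle behind the magnitude/phase split can be illustrated with a fixed complex basis: the discrete Fourier basis. In the minimal sketch below (NumPy only; the DFT stands in for a learned basis), the magnitudes of the projections are invariant to a circular time shift, while the phases encode exactly which shift was applied:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)      # a toy signal
x_shift = np.roll(x, 5)          # a circularly time-shifted copy

# Project both signals onto a complex basis (here the DFT basis;
# the CAE learns such a basis instead of fixing it a priori).
cx, cs = np.fft.fft(x), np.fft.fft(x_shift)

# "Magnitude space": invariant under the transformation.
print(np.allclose(np.abs(cx), np.abs(cs)))          # True

# "Phase space": the per-coefficient phase factor exp(-2*pi*i*k*5/N)
# identifies the shift, so transformations between pairs can be inferred.
k = np.arange(64)
print(np.allclose(cs, cx * np.exp(-2j * np.pi * k * 5 / 64)))   # True
```

The same comparison of magnitude vectors underlies shift-robust matching tasks such as alignment, since two shifted versions of a signal map to the same point in magnitude space.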
Exponential families and mixture families are parametric probability models that can be studied geometrically as smooth statistical manifolds with respect to any statistical divergence, such as the Kullback–Leibler (KL) divergence or the Hellinger divergence. When a statistical manifold is equipped with the KL divergence, the induced manifold structure is dually flat, and the KL divergence between distributions amounts to an equivalent Bregman divergence on their corresponding parameters. In practice, the corresponding Bregman generators of mixture/exponential families require computing definite integrals that are either too time-consuming (for exponentially large discrete supports) or do not even admit a closed-form formula (for continuous supports). In these cases, the dually flat construction remains theoretical and cannot be used by information-geometric algorithms. To bypass this problem, we consider stochastic Monte Carlo (MC) estimation of these integral-based mixture/exponential family Bregman generators. We show that, under natural assumptions, these MC generators are almost surely Bregman generators. We define a series of dually flat information geometries, termed Monte Carlo Information Geometries (MCIGs), that increasingly finely approximate the intractable geometry. The advantage of the MCIG is that it allows practical use of the Bregman algorithmic toolbox on a wide range of probability distribution families. We demonstrate our approach with a clustering task on a mixture family manifold. We then show how to generate MCIGs for an arbitrary separable statistical divergence between distributions belonging to the same parametric family.
Inpainting-based generative modeling enables stimulating human–machine interactions by letting users perform stylistically coherent local edits to an object using a statistical model. We present NONOTO, a new interface for interactive music generation based on inpainting models. It is aimed both at researchers, by offering a simple and flexible API that lets them connect their own models to the interface, and at musicians, by providing industry-standard features such as audio playback, real-time MIDI output, and straightforward synchronization with DAWs using Ableton Link.
Creative industries constantly strive for fame and popularity. Though highly desirable, popularity is not the only achievement an artistic creation can attain. Leaving a long-standing mark on global production and influencing future works is an even more important achievement, usually acknowledged by experts and scholars. ‘Significant’ or ‘influential’ works are not always well known to the public, and some have long been forgotten by the vast majority. In this paper, we focus on the duality between what is successful and what is significant in the musical context. To this end, we consider a user-generated set of tags collected through an online music platform, whose evolving co-occurrence network mirrors the growing conceptual space underlying music production. We define a set of general metrics that characterize music albums throughout history and their relationships with the overall musical production. We show how these metrics allow us to classify albums according to their current popularity or their membership in expert-made lists of important albums. In this way, we provide the scientific community and the public at large with quantitative tools to tell apart popular albums from culturally or aesthetically relevant artworks. The generality of the methodology presented here lends itself to use in all fields where innovation and creativity are in play.
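To make the co-occurrence network concrete, here is a minimal sketch of how such a network can be built from album tag lists. The album names and tags below are entirely hypothetical, not the paper's dataset; nodes are tags and edge weights count the albums on which two tags appear together:

```python
from collections import Counter
from itertools import combinations

# Hypothetical album -> user tags data (illustrative only).
albums = {
    "Album A": ["rock", "psychedelic", "60s"],
    "Album B": ["rock", "blues", "60s"],
    "Album C": ["electronic", "ambient"],
}

# Edge weights: number of albums on which each pair of tags co-occurs.
cooc = Counter()
for tags in albums.values():
    for t1, t2 in combinations(sorted(set(tags)), 2):
        cooc[(t1, t2)] += 1

print(cooc[("60s", "rock")])   # 2: the pair co-occurs on two albums
```

Processing albums in release order and updating the counter incrementally gives the evolving network whose growth mirrors the expanding conceptual space of music production.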
We introduce a Maximum Entropy model able to capture the statistics of melodies in music. The model can be used to generate new melodies that emulate the style of a given musical corpus. Instead of using the n-body interactions of (n−1)-order Markov models, traditionally used in automatic music generation, we use a k-nearest-neighbour model with pairwise interactions only. In this way, we keep the number of parameters low and avoid the over-fitting problems typical of Markov models. We show that long-range musical phrases do not need to be explicitly enforced using high-order Markov interactions, but can instead emerge from multiple, competing pairwise interactions. We validate our Maximum Entropy model by measuring how well the generated sequences capture the style of the original corpus without plagiarizing it. To this end, we use a data-compression approach to discriminate between the levels of borrowing and innovation featured by the artificial sequences. Our modelling scheme outperforms both fixed-order and variable-order Markov models. This shows that, despite being based only on pairwise interactions, our scheme opens the possibility of generating musically sensible alterations of the original phrases, providing a way to generate innovation.
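The structure of such a model can be sketched as follows. This is a generic pairwise-coupling sampler, not the paper's fitted model: the couplings `J` are drawn at random for illustration (in practice they are learned so that the model's pairwise statistics match the corpus), and new sequences are drawn with a standard Metropolis sampler over a k-neighbourhood energy:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, length, k = 8, 32, 4   # pitch alphabet size, sequence length, neighbour range

# Pairwise couplings J[d] between notes at distance d <= k (random here;
# in the actual model they are fitted to reproduce corpus statistics).
J = rng.standard_normal((k + 1, n_states, n_states)) * 0.5

def energy(s):
    """E(s) = -sum of pairwise couplings within the k-nearest-neighbour range."""
    e = 0.0
    for i in range(len(s)):
        for d in range(1, k + 1):
            if i + d < len(s):
                e -= J[d, s[i], s[i + d]]
    return e

def metropolis(steps=5_000):
    """Sample a sequence from p(s) ~ exp(-E(s)) by single-site Metropolis moves."""
    s = rng.integers(n_states, size=length)
    e = energy(s)
    for _ in range(steps):
        prop = s.copy()
        prop[rng.integers(length)] = rng.integers(n_states)
        e_new = energy(prop)
        # Accept with probability min(1, exp(-(E_new - E_old))).
        if rng.random() < np.exp(min(0.0, e - e_new)):
            s, e = prop, e_new
    return s

melody = metropolis()
print(melody.shape)   # (32,)
```

Long-range regularities in the samples arise from chains of competing pairwise couplings rather than from any explicit high-order interaction, which is the point the abstract makes.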