Institute of Computational Perception

Con Espressione

Towards Expressivity-aware Computer Systems in Music

Falling Walls 2021 Science Breakthrough of the Year


2021: Fantastic news towards the end of the project ...

Project Summary

What makes music so important, and what can make a performance so special and stirring? It is the things the music expresses, the emotions it induces, the associations it evokes, the drama and characters it portrays. The sources of this expressivity are manifold: the music itself, its structure, orchestration, personal associations, social settings, but also - and very importantly - the act of performance, the interpretation and expressive intentions made explicit by the musicians through nuances in timing, dynamics, etc.

Thanks to research in fields like Music Information Research (MIR), computers can do many useful things with music, from beat and rhythm detection to song identification and tracking. However, they are still far from grasping the essence of music: they cannot tell whether a performance expresses playfulness or ennui, solemnity or gaiety, determination or uncertainty; they cannot produce music with a desired expressive quality; they cannot interact with human musicians in a truly musical way, recognising and responding to the expressive intentions implied in their playing. 

The project is about developing machines that are aware of certain dimensions of expressivity, specifically in the domain of (classical) music, where expressivity is essential and - at least as far as it relates to the act of performance - can be traced back to well-defined and measurable parametric dimensions (such as timing, dynamics, articulation). We will develop systems that can recognise, characterise, generate, modify, and react to expressive qualities in music. To do so, we will (1) bring together the fields of AI, Machine Learning, Music Information Retrieval (MIR), and Music Performance Research; (2) integrate knowledge from musicology to build more well-founded models of music understanding; and (3) train and validate computational models with massive musical corpora.

In terms of computational methodologies, we will rely on, and improve, methods from Artificial Intelligence - particularly: (deep) machine learning (for learning musical features, abstractions, and representations from musical corpora, and for inducing mappings for expression recognition and prediction); probabilistic modeling (for information fusion, tracking, reasoning and prediction); and audio signal processing and pattern recognition (for extracting musical parameters and patterns relevant to expressivity). This will be combined with models of structure perception from fields like systematic and cognitive musicology, in order to create systems that have a somewhat deeper 'understanding' of music, musical structure, music performance, and musical listening, and the interplay of these factors in making music the expressive art that it is. (A more detailed discussion of how we believe all these things relate to each other can be found in the "Con Espressione Manifesto").
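To make the signal-processing side of this concrete, here is a minimal sketch (not project code) of extracting two expressivity-related parameter curves - local tempo (timing) and loudness (dynamics) - from an audio recording with the librosa library; the file name and the choice of features are illustrative assumptions.

```python
# A minimal sketch (not project code) of signal-level parameter extraction:
# beat times give a local tempo curve (timing), RMS energy a rough loudness
# curve (dynamics). The file name and feature choices are illustrative.
import librosa
import numpy as np

y, sr = librosa.load("performance.wav")   # hypothetical recording

# Timing: beat tracking, then local tempo between consecutive beats.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
local_tempo = 60.0 / np.diff(beat_times)  # BPM values over the performance

# Dynamics: frame-wise RMS energy, expressed in decibels.
rms = librosa.feature.rms(y=y)[0]
rms_db = librosa.amplitude_to_db(rms, ref=np.max)

print("global tempo estimate (BPM):", tempo)
print("local tempo range (BPM):", local_tempo.min(), "-", local_tempo.max())
print("dynamic range (dB):", rms_db.min(), "-", rms_db.max())
```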

With this research, we hope to contribute to a new generation of MIR systems that can support musical services and interactions at a new level of quality, and to inspire expressivity-centered research in other domains of the arts and human-computer interaction (HCI).

Project Details

Call identifier: ERC-2014-AdG

Project Number: 670035

Project Period: Jan 2016 - Dec 2021

Funding Amount: € 2,318,750.00

The Con Espressione Game ...

After about 2 years of data gathering in the Con Espressione Game, we collected, cleaned, and analysed the performance characterisations (thanks to all of you who contributed!), and can now offer the following results:

The Con Espressione Manifesto

Here is our view (2017) on the current research landscape, and what research needs to be done in the coming years (within and beyond Con Espressione):

If you are interested in applying for a research position in the project, please think about how your research ideas and plans would fit into this scheme (or go beyond it, because we may have missed some crucial directions ...).
 

News

People and Cooperations

Matthias Dorfer

Ali Nikrang

Results 1:
Publications, Resources, Presentations, Media Coverage

Want to know more about the scientific work and results of the project?

Here's an up-to-date list of our scientific publications related to the project.

Publishing our methods, experimental software, and data is one of our guiding principles, and we try to do so wherever legal restrictions (e.g., copyright on music data) permit it. Research software and data associated with specific scientific papers are linked from our publications page (so that you have the appropriate papers to go with each resource).


More general resources - such as our computational model of expressive performance (the "Basis Mixer"), the Con Espressione Dataset, and the Partitura score manipulation software - can be found here.

Results 2:
Demonstrators and Prototypes


The ACCompanion (work in progress) is an automatic accompaniment system designed to accompany a pianist in music for two pianists (or two pianos). The ACCompanion will not just follow the soloist in real time and synchronise with her playing, but will also recognise and anticipate the soloist's expressive intentions and playing style, and contribute its own expressive interpretation of the accompaniment part (via the "Basis Mixer" expressive performance model).
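As a rough illustration of the core idea (a deliberately simplified sketch, not the ACCompanion's actual algorithm), the following toy tracker estimates the soloist's tempo from matched note onsets and predicts when the next accompaniment note should sound; the numbers and the smoothing scheme are assumptions.

```python
# A highly simplified sketch of tempo following for accompaniment
# (an assumption for illustration, not the ACCompanion code).
from dataclasses import dataclass

@dataclass
class TempoTracker:
    """Maps score time (beats) to performance time (seconds)."""
    sec_per_beat: float = 0.5      # initial tempo guess: 120 BPM
    last_beat: float = 0.0
    last_sec: float = 0.0
    smoothing: float = 0.7         # how strongly to trust the running estimate

    def update(self, score_beat: float, onset_sec: float) -> None:
        """Update the tempo estimate from a matched solo note."""
        d_beat = score_beat - self.last_beat
        d_sec = onset_sec - self.last_sec
        if d_beat > 0:
            observed = d_sec / d_beat
            self.sec_per_beat = (self.smoothing * self.sec_per_beat
                                 + (1 - self.smoothing) * observed)
        self.last_beat, self.last_sec = score_beat, onset_sec

    def predict_time(self, score_beat: float) -> float:
        """Predict when a future accompaniment note should sound."""
        return self.last_sec + (score_beat - self.last_beat) * self.sec_per_beat

# Example: the soloist slows down; predicted accompaniment onsets stretch too.
tracker = TempoTracker()
for beat, sec in [(1.0, 0.50), (2.0, 1.05), (3.0, 1.65)]:
    tracker.update(beat, sec)
print(round(tracker.predict_time(4.0), 2))   # next downbeat, at the adapted tempo
```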

Aug. 2019: First demonstrations with polyphonic music:

Late 2020 (hopefully): Much more spectacular music (to come - as soon as Covid-19 permits it ...)

The Con Espressione! Exhibit is an interactive system designed for popular science exhibitions. It demonstrates and enables joint human-computer control of expressive performance: the visitor controls the overall tempo and loudness of classical piano pieces (such as Beethoven's "Moonlight" Sonata Op. 27 No. 2) via hand movements, tracked by a LeapMotion sensor. In the background, our "Basis Mixer" expressive performance model adds subtle modifications to the performance, such as articulation and micro-timing (e.g., slight temporal differences in the note onsets when playing chords). The contribution of the Basis Mixer can be controlled and experimented with via a slider.
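The control mapping can be pictured roughly as follows (a hypothetical sketch; the exhibit's actual value ranges and mapping are not reproduced here): a normalised hand position is turned into a tempo factor and a loudness scale, and a slider weight determines how much the performance model contributes.

```python
# Illustrative sketch of a hand-position-to-expression mapping (assumed
# ranges, not the exhibit's actual parameters).

def hand_to_expression(x: float, y: float, model_weight: float = 0.5):
    """
    x, y         -- hand position, each normalised to [0, 1]
    model_weight -- slider value: how much the performance model contributes
    Returns (tempo_factor, velocity_scale, model_weight).
    """
    tempo_factor = 0.5 + 1.0 * x      # left = half tempo, right = 1.5x tempo
    velocity_scale = 0.3 + 0.7 * y    # low hand = soft, high hand = loud
    return tempo_factor, velocity_scale, model_weight

# The visitor's hand sets the broad shape; the model's micro-timing and
# articulation would then be blended in proportionally to model_weight.
print(hand_to_expression(x=0.8, y=0.6, model_weight=0.25))
```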

The exhibit was first shown at the La La Lab Science Exhibition ("The Mathematics of Music") in Heidelberg, Germany (May 2019 - August 2020). The source code is openly available via a GitHub repository.

 

Some videos with and about the Con Espressione Exhibit:


The Basis Mixer is a comprehensive computational model of expressive music performance that predicts musically appropriate patterns for various performance parameters (tempo, timing, dynamics, articulation, ...) as a function of the score of a given piece. It is based on so-called basis functions (feature functions that describe various relevant aspects of the score) and state-of-the-art deep learning methods. A comprehensive description can be found in Carlos Cancino's Ph.D. thesis (Dec. 2018). The model has been used as an experimental tool for studying and verifying various hypotheses related to expressive piano performance, and it can also be used to generate expressive performances for new pieces. An early version of the model is said to have passed a "musical Turing Test" in a recent study, producing a piano performance whose "humanness", as judged by listeners, was indistinguishable from that of a human musician (and ranking best among a number of algorithms in this respect): E. Schubert et al., "Algorithms Can Mimic Human Piano Performance: The Deep Blues of Music", Journal of New Music Research, 2017.
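To illustrate the modelling idea in miniature (a schematic assumption, not the released Basis Mixer code), one can encode each score note as a small vector of basis functions and let a neural network map it to expressive targets:

```python
# Toy sketch of the basis-function idea (assumed features and targets,
# not the actual Basis Mixer).
import torch
import torch.nn as nn

def basis_functions(pitch: int, duration_beats: float, beat_in_bar: float):
    """A few toy basis functions describing one score note."""
    return torch.tensor([
        pitch / 127.0,                        # pitch height
        duration_beats,                       # notated duration
        1.0 if beat_in_bar == 0.0 else 0.0,   # downbeat indicator
        beat_in_bar / 4.0,                    # metrical position (4/4 assumed)
    ])

class ToyBasisMixer(nn.Module):
    """Maps per-note basis vectors to (log tempo factor, velocity, articulation)."""
    def __init__(self, n_basis: int = 4, n_targets: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_basis, 32), nn.ReLU(),
            nn.Linear(32, n_targets),
        )

    def forward(self, x):
        return self.net(x)

model = ToyBasisMixer()
note = basis_functions(pitch=64, duration_beats=1.0, beat_in_bar=0.0)
print(model(note))   # untrained prediction; real models are trained on performance corpora
```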

The Basis Mixer is used as an autonomous expressive performance generator in several of our demonstrators (e.g., the ACCompanion and the Con Espressione! Exhibit).

Resources:

 

Our Two-level Mood Recogniser is a deep neural network that learns to recognise emotional characteristics of a musical piece from audio, together with (and based on) human-interpretable, mid-level perceptual features. This permits it not only to make predictions regarding emotion/mood-related qualities that humans may perceive in a piece of music, but also to provide explanations for its predictions, in terms of general musical concepts (such as "melodiousness" or "rhythmic complexity") that most (Western) music listeners intuitively understand. These "mid-level" concepts can then be further traced back to aspects of the actual audio in a second level of explanation, if the user so wishes.
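Structurally, such a two-level model can be sketched as follows (an assumed toy architecture, not the published network): an audio embedding is first mapped to interpretable mid-level features, and a final linear layer maps those to emotion dimensions, so the linear weights double as explanations.

```python
# Structural sketch of a two-level recogniser (assumed feature and emotion
# names, assumed layer sizes).
import torch
import torch.nn as nn

MID_LEVEL = ["melodiousness", "articulation", "rhythmic complexity",
             "rhythmic stability", "dissonance", "tonal stability", "minorness"]
EMOTIONS = ["valence", "energy", "tension"]

class TwoLevelMoodModel(nn.Module):
    def __init__(self, n_audio_feats: int = 128):
        super().__init__()
        # Level 1: audio embedding -> mid-level perceptual features
        self.to_midlevel = nn.Sequential(
            nn.Linear(n_audio_feats, 64), nn.ReLU(),
            nn.Linear(64, len(MID_LEVEL)),
        )
        # Level 2: a linear layer, so each emotion is an interpretable
        # weighted sum of mid-level features (the "explanation").
        self.to_emotion = nn.Linear(len(MID_LEVEL), len(EMOTIONS))

    def forward(self, audio_embedding):
        mid = self.to_midlevel(audio_embedding)
        return self.to_emotion(mid), mid

model = TwoLevelMoodModel()
emotion, mid = model(torch.randn(1, 128))
# The weights of model.to_emotion indicate how each mid-level concept
# contributes to each predicted emotion dimension.
print(emotion.shape, mid.shape)
```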

Here are two little demo pages with examples:

 


Non Sequitur is an experimental implementation of a tool for generating complex poly-rhythms and micro-timing. It is based on several individual, partly dependent clocks, realised as oscillators, that can influence each other's periodicities by virtue of being connected in a network.
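A small numerical sketch of this idea (an assumption, not the Non Sequitur implementation) uses Kuramoto-style coupled phase oscillators, each triggering an onset whenever its phase wraps around:

```python
# Coupled-clock sketch: three oscillators with different periods, weakly
# coupled, each emitting an onset when its phase completes a cycle.
# All constants are illustrative.
import numpy as np

n_steps, dt = 2000, 0.005                        # ~10 seconds of simulated time
periods = np.array([0.50, 0.75, 0.90])           # seconds per cycle (3 "clocks")
coupling = 0.15 * (np.ones((3, 3)) - np.eye(3))  # each clock nudged by the others

phases = np.zeros(3)
onsets = [[] for _ in range(3)]
for step in range(n_steps):
    # Own frequency plus the influence of the other clocks.
    dphi = dt * (2 * np.pi / periods
                 + (coupling * np.sin(phases[None, :] - phases[:, None])).sum(axis=1))
    new_phases = phases + dphi
    for i in range(3):
        if new_phases[i] // (2 * np.pi) > phases[i] // (2 * np.pi):
            onsets[i].append(step * dt)          # phase wrapped: trigger an onset
    phases = new_phases

# The resulting onset lists form interlocking, slightly irregular rhythmic layers.
print([np.round(o[:5], 2).tolist() for o in onsets])
```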

Resources:


The Listening Machine is an interactive exhibit designed for the permanent exhibition "Understanding AI" at the Ars Electronica Center (AEC), to demonstrate real-time computational sound/music perception to the general public. It is based on a deep neural network that has been trained, via machine learning methods and using thousands of sound examples, to recognise different kinds of sounds, by finding out which patterns in the sound signal are characteristic of certain classes – for example, what distinguishes a flute from a trumpet, or spoken language from singing.
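Generically, the kind of model behind such a system can be sketched as a small convolutional network over spectrogram excerpts (an illustrative assumption; the deployed network and its sound classes are not reproduced here):

```python
# Generic sketch of a spectrogram-based sound classifier (assumed labels
# and layer sizes, not the deployed Listening Machine network).
import torch
import torch.nn as nn

CLASSES = ["flute", "trumpet", "speech", "singing"]   # illustrative labels

class SoundClassifier(nn.Module):
    def __init__(self, n_classes: int = len(CLASSES)):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes),
        )

    def forward(self, spectrogram):          # (batch, 1, freq_bins, time_frames)
        return self.head(self.conv(spectrogram))

model = SoundClassifier()
logits = model(torch.randn(1, 1, 128, 100))  # one mel-spectrogram excerpt
print(CLASSES[int(logits.argmax())])         # untrained, so the label is arbitrary
```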

Here is a video documentary produced by the AEC on the occasion of the opening of the new AEC permanent exhibition (May 2019).


The robod is our little robot drummer that continually listens to its surroundings through a microphone, recognises when music is played, automatically determines the meter and downbeat, and accompanies the musicians in real time, adapting to expressive changes of tempo.
It was built for the BE OPEN Public Science Festival in the city center of Vienna (Sept. 2018), organised by the Austrian Science Fund (FWF) on the occasion of its 50th anniversary.

To come (still in the making ...): Demonstration video.
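In the meantime, here is a rough offline sketch of the processing chain (the real robod works on a live microphone stream; the file input, the 4/4 downbeat heuristic, and the scheduling are simplifying assumptions):

```python
# Offline sketch of beat/downbeat tracking plus tempo-adaptive hit scheduling.
# A hypothetical recording stands in for the live microphone input.
import librosa
import numpy as np

y, sr = librosa.load("room_recording.wav")        # hypothetical captured audio

# 1) Detect beats and the current tempo.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

# 2) Crude downbeat heuristic (assumes 4/4 and a reasonably long excerpt):
#    pick the beat phase that carries the most onset energy.
onset_env = librosa.onset.onset_strength(y=y, sr=sr)
beat_strength = onset_env[beat_frames]
phase = int(np.argmax([beat_strength[p::4].mean() for p in range(4)]))
downbeats = beat_times[phase::4]

# 3) Schedule the next drum hits from the most recent inter-beat interval,
#    so the schedule follows tempo changes.
ibi = np.diff(beat_times)[-1]
next_hits = beat_times[-1] + ibi * np.arange(1, 5)
print("tempo:", tempo, "next hits (s):", np.round(next_hits, 2))
```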

 

Results 3:
Some Special Things

 

Falling Walls 2021 Science Breakthrough of the Year (Art & Science), Falling Walls Science Summit 2021, Berlin


Acknowledgments

This project receives funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 670035).

In addition, we gratefully acknowledge material support for this research (in the form of music, scores, access to musical instruments and performance spaces) from the following institutions: