Spring 2022 – Fall 2025

Mus2Vid

A real-time art project that uses diffusion models to generate video depictions in response to classical music, with recurrent and transformer networks estimating emotion and genre.

Mus2Vid is a real-time art project that generates video in response to classical music. Recurrent and transformer networks analyze the input audio and estimate its emotion and genre. These estimates are converted into text prompts and fed to a text-to-image diffusion model, which generates the images that make up the video.

Statement of Problem

Engaging classical-music listeners, especially audiences with hearing impairment, often relies on textual program notes or pre-rendered visuals that cannot react to live performance. We aim to generate visuals that follow the performance in real time.

Proposed Solution

The pipeline analyzes the live signal across multiple dimensions (rhythmic, harmonic, timbral), classifies emotional resonance via a learned model, converts the classification into text prompts, and feeds those prompts to a generative image model. The output updates continuously as the music evolves.
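The step that turns a classification into a text prompt can be illustrated with a minimal Python sketch. This is not the project's actual code: the `MusicEstimate` fields, the valence/arousal scale, the mood vocabulary, and the prompt template are all assumptions chosen for illustration, and the analysis and image-generation stages are left out.

```python
from dataclasses import dataclass


@dataclass
class MusicEstimate:
    """Hypothetical output of the analysis stage for one audio window."""
    valence: float  # assumed scale: -1 (negative) to 1 (positive)
    arousal: float  # assumed scale: -1 (calm) to 1 (energetic)
    genre: str      # assumed label, e.g. "baroque" or "romantic"


def mood_word(valence: float, arousal: float) -> str:
    """Map a valence/arousal pair to one illustrative mood adjective."""
    if valence >= 0 and arousal >= 0:
        return "joyful"
    if valence >= 0:
        return "serene"
    if arousal >= 0:
        return "turbulent"
    return "melancholic"


def build_prompt(estimate: MusicEstimate) -> str:
    """Compose a text prompt that a text-to-image model could consume."""
    mood = mood_word(estimate.valence, estimate.arousal)
    return f"a {mood} painting in the style of the {estimate.genre} era"


if __name__ == "__main__":
    # One prompt per analysis window; a live system would repeat this
    # each time the estimates are refreshed.
    print(build_prompt(MusicEstimate(valence=0.6, arousal=0.7, genre="romantic")))
```

In a live setting, a function like `build_prompt` would be called whenever new estimates arrive, so the prompt, and therefore the generated imagery, changes as the music does.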

Demo

We are looking for feedback on the prototype. Please fill in the Qualtrics survey.

Impact

Mus2Vid aims to improve accessibility for hearing-impaired audiences, enable personalized visual entertainment, broaden the entertainment industry’s creative palette, and help classical music reach new listeners.

Team

Haichang Li (Lead)
Tim Nadolsky (Contributor)
Brian Ng (Former Lead)
