Artificial Intelligence for Music
A workshop at the 2025 IEEE International Conference on Multimedia and Expo (ICME)
Date: Monday, June 30, 2025
Workshop Summary
Music is an essential component of multimedia content. This workshop explores the dynamic intersection of artificial intelligence and music, investigating how AI is changing the music industry and education, from composition to performance, production, collaboration, and audience experience. Participants will gain insights into the ways AI can enhance creativity and enable musicians and producers to push the boundaries of their art. The workshop will also discuss AI's impacts on music education and the careers of musicians. We will cover topics such as AI-driven music composition, where algorithms generate melodies, harmonies, and even full orchestral arrangements. Computer-generated music may be combined with computer-generated video to create complete multimedia content. The workshop will discuss how AI tools can assist in sound design, remixing, and mastering, allowing for new sonic possibilities and efficiencies in music production. Additionally, the workshop will discuss the legal and ethical implications of AI in music, including questions of authorship, originality, and the role of the human artist in an increasingly automated world. This workshop is designed for AI researchers, musicians, producers, and educators interested in the current status and future of AI in music.
The organizing team will hold a competition on Automatic Music Transcription (AMT). This online competition will accept submissions worldwide, from both academia and industry. The winners will present their solutions at this ICME workshop. The competition is sponsored by the IEEE Technical Community on Multimedia Computing (TCMC) and the IEEE Computer Society. More details about the challenge will be available on the workshop website.
Call for Papers
This one-day workshop will explore the dynamic intersection of artificial intelligence and multimedia, with an emphasis on music and audio technologies. It examines how AI is transforming music creation, recognition, and education, along with the ethical and legal implications and business opportunities. We will investigate how AI is changing the music industry and education, from composition to performance, production, collaboration, and audience experience. Participants will gain insights into the technological challenges in music and how AI can enhance creativity, enabling musicians and producers to push the boundaries of their art. The workshop will cover topics such as AI-driven music composition, where algorithms generate melodies, harmonies, and even full orchestral arrangements. We will discuss how AI tools assist in sound design, remixing, and mastering, allowing for new sonic possibilities and efficiencies in music production. Additionally, we will examine AI's impact on music education and the careers of musicians, exploring advanced learning tools and teaching methods. AI technologies are increasingly adopted in the music and entertainment industry. The workshop will also discuss the legal and ethical implications of AI in music, including questions of authorship, originality, and the evolving role of human artists in an increasingly automated world. This workshop is designed for AI researchers, musicians, producers, and educators interested in the current status and future of AI in music.
Topics of Interest
Topics of interest include, but are not limited to:
- AI-Driven Music Composition and Generation
- AI in Music Practice and Performance
- AI-based Music Recognition and Transcription
- AI Applications in Sound Design
- AI-Generated Videos to Accompany Music
- AI-Generated Lyrics Based on Music
- Legal and Ethical Implications of AI in Music
- AI's Impacts on Musicians' Careers
- AI-Assisted Music Education
- Business Opportunities in AI and Music
- Music Datasets and Data Analysis
Submission Requirements
Please follow the submission requirements of ICME 2025. Papers must be no longer than 6 pages, including all text, figures, and references. This workshop follows the ICME submission process and adopts double-blind review. Authors must not identify themselves in the submitted PDF files.
Work in progress is welcome. Authors are encouraged to include descriptions of their prototype implementations. Additionally, authors are encouraged to interact with workshop attendees by including posters or demonstrations at the end of the workshop. Conceptual designs without any evidence of practical implementation are discouraged.
The authors agree that papers submitted to this workshop have not been previously published or accepted in substantially similar form. Furthermore, authors should not submit papers that have significant overlap with papers currently under review at a conference or journal.
Submit papers to CMT.
Important Dates
- Submission Deadline: April 1, 2025 (11:59PM Pacific Time)
- Notification of Acceptance: April 25, 2025
- Final Version Due: May 15, 2025
Accepted papers will be posted on the workshop website and in IEEE Xplore.
Workshop Schedule
Time | Topic |
---|---|
08:30AM | Welcome by Organizers: Yung-Hsiang Lu and Yeon-Ji Yun |
08:40AM | Keynote Speech by Zhiyao Duan. Moderator: Yeon-Ji Yun |
09:30AM | Invited Speech by Fatemeh Jamshidi. Moderator: Yeon-Ji Yun |
10:10AM | Break |
10:20AM | Invited Speech by Gus Xia. Moderator: Emmanouil Benetos |
11:00AM | Invited Speech by Geoffroy Peeters. Moderator: Emmanouil Benetos |
11:40AM | Discussion with the Morning Speakers. Moderator: Emmanouil Benetos |
12:00PM | Lunch Break |
01:00PM | Invited Speech by Emmanouil Benetos. Moderator: Zhiyao Duan |
01:40PM | Paper Presentations. Moderator: Zhiyao Duan |
03:20PM | Break |
03:30PM | Panel Discussion. Moderator: Gus Xia. Panelists: Geoffroy Peeters, Emmanouil Benetos, Zhiyao Duan, Ziyu Wang |
04:30PM | Winners of the Transcription Challenge. Moderator: Yung-Hsiang Lu |
05:00PM | Adjourn |
Invited Speakers

Geoffroy Peeters
Geoffroy Peeters is a full professor in the S2A team of the LTCI (Laboratoire Traitement et Communication de l'Information) at Télécom Paris. He received his Ph.D. in 2001 and his Habilitation in 2013 from University Paris VI, both on audio signal processing, data analysis, and machine learning. Before joining Télécom Paris, he led research on Music Information Retrieval at IRCAM (Institut de Recherche et Coordination Acoustique/Musique). His current research focuses on signal processing, machine learning, and deep learning applied to audio and music data analysis.
Self-Supervised Learning for Invariant and Equivariant Representations
Abstract: Self-supervised learning aims to apply supervised learning algorithms without the need for annotated data. It can therefore offer a solution for training ML-based systems in music, a domain where annotated data is often scarce. In this talk, we review recent advances in self-supervised learning applied to music, focusing on its two main paradigms: invariance (e.g., contrastive, masking, teacher-student, clustering, information-based, multi-modal) and equivariance. More precisely, we present our contributions: MatPac as a foundation model, Stem-JEPA for generation, PESTO for pitch, PESTO-T for tempo, and CPC for beat detection.
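For readers less familiar with the invariance paradigm mentioned in the abstract, the sketch below shows a generic contrastive (InfoNCE-style) objective on audio-clip embeddings. It is a minimal illustrative example with assumed batch shapes and a placeholder encoder, not the approach used in MatPac, Stem-JEPA, PESTO, PESTO-T, or CPC.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Generic contrastive (InfoNCE) loss for invariance-based self-supervision.

    z1, z2: embeddings of two augmented "views" of the same batch of audio
    clips, shape (batch, dim). Matching rows are positives; all other rows
    in the batch serve as negatives.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature               # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)          # pull positives together, push negatives apart

# Hypothetical usage: embeddings produced by any audio encoder from two
# augmentations (e.g., different crops) of the same clips.
batch, dim = 16, 128
z_view1, z_view2 = torch.randn(batch, dim), torch.randn(batch, dim)
loss = info_nce_loss(z_view1, z_view2)
```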

Zhiyao Duan
Zhiyao Duan is an associate professor in Electrical and Computer Engineering, Computer Science, and Data Science at the University of Rochester. He is also a co-founder of Violy, a company aiming to improve music education through AI. His research interests are in computer audition and its connections with computer vision, natural language processing, and augmented and virtual reality. He received a best paper award at the Sound and Music Computing (SMC) Conference in 2017, a best paper nomination at the International Society for Music Information Retrieval (ISMIR) Conference in 2017, and a CAREER award from the National Science Foundation (NSF). His work has been funded by the NSF, the National Institutes of Health, the National Institute of Justice, the New York State Center of Excellence in Data Science, and University of Rochester internal awards on AR/VR, health analytics, and data science. He is a senior area editor of IEEE Signal Processing Letters, an associate editor for the IEEE Open Journal of Signal Processing, and a guest editor for Transactions of the International Society for Music Information Retrieval. He is the President of ISMIR.

Fatemeh Jamshidi
Fatemeh Jamshidi is an Assistant Professor in the Department of Computer Science at Cal Poly Pomona. Her research spans artificial intelligence, computer science education, computer music, machine learning and deep learning in music, game AI, human-AI collaboration, as well as augmented and mixed reality. She has published in prestigious venues, including ACM SIGCSE, ISMIR, IEEE, and HCII. Fatemeh earned her Ph.D. in Computer Science and Software Engineering and a master's in Music Education from Auburn University in 2024 and 2023, respectively. During her Ph.D., she founded the Computing + Music programs, which have engaged hundreds of participants from underrepresented groups since 2018. From 2020 to 2023, she also served as the Director of the Persian Music Ensemble at Auburn University. Her long-term goal is to establish a music technology center that fosters undergraduate and graduate research in areas such as music therapy, music generation, game music, and mixed reality in music.

Gus Xia
Gus Xia is an assistant professor of Machine Learning at the Mohamed bin Zayed University of Artificial Intelligence in Masdar City, Abu Dhabi. His research includes the design of interactive intelligent systems to extend human musical creation and expression. This research lies at the intersection of machine learning, human-computer interaction, robotics, and computer music. Some representative works include interactive composition via style transfer, human-computer interactive performances, autonomous dancing robots, large-scale content-based music retrieval, haptic guidance for flute tutoring, and bio-music computing using slime mold.

Emmanouil Benetos
Emmanouil Benetos is Reader in Machine Listening and Director of Research at the School of Electronic Engineering and Computer Science of Queen Mary University of London. Within Queen Mary, he is a member of the Centre for Digital Music and the Centre for Multimodal AI, is Deputy Director of the UKRI Centre for Doctoral Training in AI and Music (AIM), and co-leads the School's Machine Listening Lab. His main area of research is computational audio analysis, also referred to as machine listening or computer audition, with applications to music, urban, everyday, and nature sounds.
Machine learning paradigms for music and audio understanding
Abstract: The area of computational audio analysis, also called machine listening, continues to evolve. Starting from methods grounded in digital signal processing and acoustics, followed by supervised machine learning methods that require large amounts of labelled data, recent approaches for learning music audio representations are fueled by advances in the broader field of artificial intelligence. The talk will outline recent research carried out at the Centre for Digital Music of Queen Mary University of London, focusing on emerging learning paradigms for making sense of music and audio data. Topics covered will include learning in the presence of limited audio data, the inclusion of other modalities such as natural language to aid learning music representations, and finally methods for learning from unlabelled audio data, with the latter serving as a first step towards the creation of music foundation models.