What does it mean to teach machines to listen? And how does our understanding of “listening” inform how we “tune” machine ears to listen to the world around us?
In this course, students will learn how to teach machines to listen from the ground up. We will see how design decisions in building these systems shape just what these machines are able to listen for. Beginning with fundamental audio signal processing techniques, students will learn the building blocks to go from machines that respond to simple tones to ones that recognize speech and eventually understand complex sounds in our environment. Complementing these technical exercises are readings and case studies that contextualize this technology within a larger history of teaching machines to understand the world through sound and audio. These examples highlight our own biases and presumptions in building these systems, forcing us to ask: what is the machine listening for, and for whom?
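To give a flavor of the simplest of these building blocks, here is a minimal sketch (illustrative only, not course material) of a "machine that responds to a simple tone": the Goertzel algorithm, a classic signal processing technique for measuring how much energy a recording contains at one target frequency.

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Estimate the signal's power at target_freq using the Goertzel algorithm."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)  # nearest DFT bin to the target
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:                         # one-pass second-order filter
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

# One second of a 440 Hz sine tone sampled at 8 kHz:
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]

# The "machine" hears far more energy at 440 Hz than at 1000 Hz.
print(goertzel_power(tone, sr, 440) > goertzel_power(tone, sr, 1000))  # True
```

Even this tiny listener embodies design decisions, such as the sample rate and the single frequency it is tuned to, that determine what it can and cannot hear.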
This class will be guided primarily by academic readings and in-class/take-home programming exercises. Experience with programming is a prerequisite. This is not simply a technical programming course, however: it can also be thought of as a History of Technology or Science and Technology Studies course, using machine listening, speech recognition, voice interfaces, environmental sound classification, and audio understanding as topics to explore a techno-history that extends from pre-electronic practices of the late 19th century to our contemporary moment, with Alexa, Google Home, and Siri ever present. We will examine this technology alongside papers, articles, and scholarly writings to frame the pursuit of teaching machines to listen within a particular history and context, as though we are archaeologists examining a technological artifact through the lens of the humanities, social science, and anthropology. The intention is to become better-informed technologists, equipped with technical skill, historical context, and critical design approaches to create listening machines responsibly and ethically, mitigating the risks and harms for those they listen to.
Class 1 Presentation: https://docs.google.com/presentation/d/1PK628ZIwQW9GWWvM42FS5txZqehg57U_QFTP6gDYDyI/edit?usp=sharing
Reading
Sterne, Jonathan. “Is Machine Listening Listening?” Preprint, University of Massachusetts Amherst, 2022. https://doi.org/10.7275/ZEQH-EG38.
Napolitano, Domenico, and Renato Grieco. “The Folded Space of Machine Listening.” SoundEffects - An Interdisciplinary Journal of Sound and Sound Experience 10, no. 1 (2021): 173–89. https://doi.org/10.7146/se.v10i1.124205
Homework Assignment
Programming assignment (still TBD)
Class 2 Presentation: TBD
Reading
Li, Xiaochang, and Mara Mills. “Vocal Features: From Voice Identification to Speech Recognition by Machine.” Technology and Culture 60, no. 2S (2019): S129–60. https://doi.org/10.1353/tech.2019.0066.
Homework Assignment
Class 3 Presentation: TBD
Reading
Edward B. Kang. 2023. On the Praxes and Politics of AI Speech Emotion Recognition. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23). Association for Computing Machinery, New York, NY, USA, 455–466. https://doi.org/10.1145/3593013.3594011
Homework Assignment
Class 4 Presentation: TBD
Class 5 Presentation: TBD