
FOCI GenAI/LLM Users Group: "LLMs for Audio Applications" (01 May 2024)

Posted April 26, 2024
FOCI GenAI/LLM Users Group
6p Weds, 01 May 2024
Amos Eaton 214

WHAT: "LLMs for Audio Applications"
LEADER: Abraham Sanders
VIDEO: https://youtu.be/GcwauiJI_Ck
EVENT PAGE: https://bit.ly/foci_llm_users_01may2024
WHEN: 6p, 1 May 2024
WHERE: AE 214
CONTACT: Aaron Green <greena12@rpi.edu>

DESCRIPTION: In this talk, we explore how audio language models work and how to use them. Specifically, we look at how audio waveforms are converted into sequences of discrete tokens that an autoregressive language model can handle, and how those tokens are converted back into raw audio. We then review how such audio language models can be applied to common audio tasks such as Automatic Speech Recognition (ASR), Text-To-Speech (TTS), and Speech-To-Speech Machine Translation. We conclude with a discussion of future-focused applications, including text-guided music generation and full-duplex spoken dialogue agents.
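
To make the tokenization step concrete, here is a minimal sketch of the waveform-to-tokens-to-waveform round trip described above, using the EnCodec neural codec via the Hugging Face transformers library. The checkpoint name and API usage here are illustrative assumptions, not material from the talk itself:

    # Sketch: encode raw audio into discrete codec tokens and decode it back.
    # Assumes the "facebook/encodec_24khz" checkpoint from Hugging Face transformers.
    import torch
    from transformers import EncodecModel, AutoProcessor

    model = EncodecModel.from_pretrained("facebook/encodec_24khz")
    processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

    # One second of 24 kHz mono silence stands in for a real waveform.
    waveform = torch.zeros(24_000).numpy()

    # Encode: raw audio -> sequences of discrete codebook tokens (integers),
    # the kind of sequence an autoregressive language model can consume.
    inputs = processor(raw_audio=waveform, sampling_rate=24_000, return_tensors="pt")
    encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
    audio_tokens = encoded.audio_codes

    # Decode: discrete tokens -> raw audio again.
    reconstructed = model.decode(encoded.audio_codes, encoded.audio_scales,
                                 inputs["padding_mask"])[0]
    print(audio_tokens.shape, reconstructed.shape)

In a full audio language model, token sequences like these are what the autoregressive model is trained to predict; the codec's decoder then turns generated tokens back into a waveform.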

BIO: Abraham is a third-year PhD student in Cognitive Science at RPI, working in the LACAI lab with Dr. Tomek Strzalkowski. His research interests include open-domain and goal-oriented conversational agents, as well as multimodal, natural spoken dialogue systems. Previously, he was a lead software engineer at Nextech Systems, where he worked on electronic health record system interoperability.

Slides and a recording of this talk will be available after 01 May.

Recordings of previous FOCI GenAI Users Group sessions:
