termehtaheri/SAR-LM

Official implementation of “SAR-LM: Symbolic Audio Reasoning with Large Language Models”, a modular framework for interpretable audio reasoning with large language models.


Experimental

SAR-LM makes audio analysis interpretable. It converts speech, music, and specific sound events into readable text descriptions and has a large language model reason over them, so you can see why the system reached a decision. It takes raw audio files as input and returns a detailed, human-readable explanation of what is happening in the sound, not just a final answer. It is aimed at audio researchers, sound engineers, and anyone who needs transparent insight into audio content.
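The idea described above (turn audio into text descriptions, then let a language model reason over them) can be sketched as follows. This is a minimal illustration under stated assumptions, not SAR-LM's actual code: `SymbolicSegment` and `build_reasoning_prompt` are hypothetical names, and the segment descriptions are assumed to come from upstream speech, music, and sound-event modules that are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SymbolicSegment:
    """Hypothetical: one text description produced by an audio front-end module."""
    source: str     # assumed category label, e.g. "speech", "music", "sound_event"
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    text: str       # human-readable description of the segment


def build_reasoning_prompt(segments: List[SymbolicSegment], question: str) -> str:
    """Hypothetical: turn symbolic segments into a text prompt for an LLM.

    Segments are sorted by start time so the model sees events in order,
    and the prompt asks for an explanation rather than only a final answer.
    """
    lines = ["Audio description:"]
    for seg in sorted(segments, key=lambda s: s.start_s):
        lines.append(f"- [{seg.start_s:.1f}-{seg.end_s:.1f}s] {seg.source}: {seg.text}")
    lines.append(f"Question: {question}")
    lines.append("Explain your reasoning step by step, citing the segments above, then answer.")
    return "\n".join(lines)


if __name__ == "__main__":
    segments = [
        SymbolicSegment("sound_event", 2.0, 3.5, "dog barking"),
        SymbolicSegment("speech", 0.0, 2.0, 'a woman says "come here"'),
    ]
    print(build_reasoning_prompt(segments, "Why is the dog barking?"))
```

Keeping the intermediate representation as plain text is what makes the reasoning inspectable: the model's explanation can point back to specific, timestamped segments a person can check.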

Use this if you need to understand the 'why' behind audio analysis results, such as identifying specific sound events or speech patterns, rather than just getting a summarized output.

Not ideal if you only need quick, high-level summaries of audio content without requiring a detailed breakdown of the underlying reasoning.

audio-analysis sound-recognition speech-analysis music-information-retrieval acoustic-research

No package, no dependents

Maintenance 6 / 25

Adoption 3 / 25

Maturity 13 / 25

Community 0 / 25



Language: Python

License: MIT

Higher-rated alternatives

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

fixie-ai/ultravox

A fast multimodal LLM for real-time voice

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
