termehtaheri/SAR-LM
Official implementation of “SAR-LM: Symbolic Audio Reasoning with Large Language Models”, a modular framework for interpretable audio reasoning with large language models.
SAR-LM makes audio analysis interpretable: it converts speech, music, and sound events into readable text descriptions, so you can see why a system reached a given decision. It takes raw audio files as input and returns human-readable explanations of what is happening in the sound, rather than only a final answer. It is aimed at audio researchers, sound engineers, and anyone who needs transparent insight into audio content.
Use this if you need to understand the reasoning behind audio analysis results, such as which sound events or speech patterns were identified, rather than only a summarized output.
Not ideal if you only need quick, high-level summaries of audio content and do not need a detailed breakdown of the underlying reasoning.
Stars: 4
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Nov 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/termehtaheri/SAR-LM"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
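For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library. The URL pattern is taken from the curl example above. The function names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page. How an API key is supplied is also not stated here, so the sketch covers only the keyless tier.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body.

    Assumption: the response is JSON. The schema is not documented on
    this page, so the decoded value is returned unmodified.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to perform the request.
    print(quality_url("termehtaheri", "SAR-LM"))
```

Because the keyless tier allows 100 requests/day, it is worth caching the result of `fetch_quality` locally rather than calling it in a loop.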
Higher-rated alternatives
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.