zhenye234/xcodec
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Combines acoustic encoding with learned semantic representations (HuBERT, WavLM) through dual feature extraction and joint quantization, producing unified codec tokens that preserve both content and acoustic properties. Integrates with Hugging Face Transformers and provides pre-trained checkpoints for speech (LibriSpeech, MLS) and general audio. Existing codecs can be adapted by injecting semantic modules via projection layers and training with an MSE-based semantic preservation loss.
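The MSE-based semantic preservation loss mentioned above can be illustrated with a small, framework-free sketch. It is not the repository's implementation: the function names, the shapes, and the choice to compare projected quantized latents against features from a semantic encoder such as HuBERT are assumptions made for illustration only.

```python
# Minimal sketch of an MSE-based semantic preservation loss.
# Assumption: the codec's quantized latents are mapped by a linear
# projection into the semantic feature space and compared, frame by
# frame, with target features from a semantic encoder. All names here
# are hypothetical and do not come from the xcodec code base.

def linear(frame, weight, bias):
    """Apply y = W x + b to a single feature vector."""
    return [
        sum(w * x for w, x in zip(row, frame)) + b
        for row, b in zip(weight, bias)
    ]


def mse(pred, target):
    """Mean squared error over all frames and feature dimensions."""
    total = 0.0
    count = 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


def semantic_preservation_loss(quantized, semantic_target, weight, bias):
    """Project quantized latents and measure their distance to the target."""
    projected = [linear(frame, weight, bias) for frame in quantized]
    return mse(projected, semantic_target)


# One frame, two dimensions, identity projection: the squared errors
# are 0 and 4, so the mean is 2.0.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
example_loss = semantic_preservation_loss(
    [[1.0, 2.0]], [[1.0, 4.0]], identity, zero_bias
)
```

In a real training setup this term would typically be added to the codec's reconstruction and quantization losses, and the projection would be a learned layer rather than a fixed matrix.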
Stars: 294
Forks: 23
Language: Python
License: MIT
Category:
Last pushed: Oct 12, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zhenye234/xcodec"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.
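The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, and the listing does not state the response format, so decoding the body as JSON is a guess.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner, repo):
    """Build the endpoint URL for a given repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the quality data for a repository.

    Assumption: the endpoint returns a JSON document. The listing does
    not document the response schema, so no fields are accessed here.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("zhenye234", "xcodec")` issues the same GET request as the curl command. Without a key, requests count against the 100 requests/day limit.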
Related tools
zhuhanqing/APOLLO
APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention
Y-Research-SBU/CSRv2
Official Repository for CSRv2 - ICLR 2026
HITESHLPATEL/Mamba-Papers
Awesome Mamba Papers: A Curated Collection of Research Papers, Tutorials & Blogs
psychofict/llm-effective-context-length
Investigating Why the Effective Context Length of LLMs Falls Short (Based on STRING, ICLR 2025)
MouxiaoHuang/PPE
[ICLR 2026] Official code of PPE: Positional Preservation Embedding for Token Compression in...