facebookresearch/large_concept_model
Large Concept Models: Language modeling in a sentence representation space
Operates on language- and modality-agnostic sentence embeddings from the SONAR space (supporting 200+ languages) rather than raw tokens, enabling cross-lingual concept reasoning. Implements multiple generative approaches, including MSE regression and diffusion-based models, with 1.6B-parameter models trained on 1.3T tokens using fairseq2 and Hydra for configuration. Integrates with SONAR embeddings and HuggingFace datasets, and provides data processing pipelines that use SaT for sentence tokenization.
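To make the MSE regression idea concrete, here is a minimal sketch of next-sentence-embedding prediction trained with a mean squared error loss. It is not the repository's code: the random 8-dimensional vectors stand in for SONAR embeddings, a single linear map stands in for the transformer, and the names `mse`, `predict`, and `train` are hypothetical.

```python
import random

def mse(pred, target):
    """Mean squared error between two equal-length embedding vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def predict(w, x):
    """Toy stand-in for the model: next embedding = W x (a linear map)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def train(embeddings, lr=0.1, epochs=100):
    """Fit W by full-batch gradient descent so that W e[t] approximates e[t+1]."""
    dim = len(embeddings[0])
    w = [[0.0] * dim for _ in range(dim)]
    # Each training pair is (current sentence embedding, next sentence embedding).
    pairs = list(zip(embeddings[:-1], embeddings[1:]))
    losses = []
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(dim)]
        total = 0.0
        for x, y in pairs:
            pred = predict(w, x)
            total += mse(pred, y)
            for i in range(dim):
                # Derivative of the per-pair MSE with respect to pred[i].
                err = 2.0 * (pred[i] - y[i]) / dim
                for j in range(dim):
                    grad[i][j] += err * x[j] / len(pairs)
        losses.append(total / len(pairs))
        for i in range(dim):
            for j in range(dim):
                w[i][j] -= lr * grad[i][j]
    return w, losses

# Assumption: a "document" of 16 sentences, each a random 8-dim toy embedding.
random.seed(0)
doc = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
w, losses = train(doc)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The point of the sketch is only the training signal: the model regresses onto the next embedding in a continuous space instead of predicting a distribution over tokens.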
2,341 stars. No commits in the last 6 months.
Stars: 2,341
Forks: 208
Language: Python
License: MIT
Category:
Last pushed: Jan 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/facebookresearch/large_concept_model"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
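The curl request above could also be issued from Python using only the standard library. This is a sketch under assumptions: only the single URL shown above is confirmed, so the idea that the path generalizes to `domain/owner/repo` is a guess, the function names are hypothetical, and because the response format is not documented here the body is returned as raw text rather than parsed.

```python
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def build_quality_url(domain, owner, repo):
    """Build the endpoint URL. The domain/owner/repo layout is an assumption
    inferred from the one example URL on this page."""
    return f"{BASE_URL}/{domain}/{owner}/{repo}"

def fetch_quality(domain, owner, repo, timeout=10.0):
    """Send a keyless GET request and return the response body as text."""
    url = build_quality_url(domain, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("nlp", "facebookresearch", "large_concept_model")` requests the same URL as the curl command and counts against the daily request limit.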
Higher-rated alternatives
yaserkl/RLSeq2Seq
Deep Reinforcement Learning For Sequence to Sequence Models
kefirski/pytorch_RVAE
Recurrent Variational Autoencoder that generates sequential data implemented with pytorch
ctr4si/A-Hierarchical-Latent-Structure-for-Variational-Conversation-Modeling
PyTorch Implementation of "A Hierarchical Latent Structure for Variational Conversation...
georgian-io/Multimodal-Toolkit
Multimodal model for text and tabular data with HuggingFace transformers as building block for text data
nurpeiis/LeakGAN-PyTorch
A simple implementation of LeakGAN in PyTorch