yuanzhoulvpi2017/zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)
Built on PyTorch and Hugging Face Transformers, this framework provides end-to-end training pipelines for Chinese NLP tasks spanning text classification, generation (GPT-2, Dolly, LLaMA, ChatGLM), and multimodal models (CLIP, vision-encoder-decoder). It handles production-scale data processing through memory-mapped I/O and multithreading for datasets of 100 GB and larger, and implements multi-GPU pipeline parallelism and tensor parallelism for models that exceed single-GPU VRAM. The project includes comprehensive tutorials covering data cleaning, model modification (vocabulary pruning/expansion), LoRA fine-tuning, and deployment strategies across 15+ model architectures, including recent additions such as Qwen2, InternLM, and LLaVA.
3,783 stars. No commits in the last 6 months.
Stars: 3,783
Forks: 447
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Aug 05, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/yuanzhoulvpi2017/zero_nlp"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
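The same endpoint can be called from code. The sketch below is a minimal example that only assumes the URL pattern shown in the curl command above; the response schema is not documented here, so the helper simply returns the parsed JSON as-is, and the function names (`quality_url`, `fetch_quality`) are illustrative, not part of the API.

```python
# Minimal sketch of calling the quality endpoint shown above.
# Only the URL pattern from the curl example is assumed; the
# JSON response schema is not documented on this page.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and parse the quality data for one repository."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_quality("yuanzhoulvpi2017", "zero_nlp")` would retrieve this repository's data, subject to the 100-requests/day keyless limit noted above.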
Higher-rated alternatives
minggnim/nlp-models
A repository for training transformer based models
CPJKU/wechsel
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of...
IntelLabs/nlp-architect
A model library for exploring state-of-the-art deep learning topologies and techniques for...
LoicGrobol/zeldarose
Train transformer-based models.
MahmoudWahdan/dialog-nlu
Tensorflow and Keras implementation of the state of the art researches in Dialog System NLU