yuanzhoulvpi2017/zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)

Quality score: 49 / 100 (Emerging)

Built on PyTorch and Hugging Face Transformers, this framework provides end-to-end training pipelines for Chinese NLP tasks spanning text classification, text generation (GPT-2, Dolly, LLaMA, ChatGLM), and multimodal modeling (CLIP, vision-encoder-decoder). It handles production-scale data processing through memory-mapped I/O and multithreading for datasets of 100GB+, and implements multi-GPU pipeline parallelism and tensor parallelism for models exceeding single-GPU VRAM limits. The project includes comprehensive tutorials covering data cleaning, model modification (vocabulary pruning and expansion), LoRA fine-tuning, and deployment strategies across 15+ model architectures, including recent additions such as Qwen2, InternLM, and LLaVA.
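The memory-mapped I/O pattern mentioned above can be sketched with NumPy's `memmap`, which slices a large on-disk file without loading it into RAM. This is a minimal illustration of the general technique, not code from the repository; the file name and sizes are hypothetical.

```python
import numpy as np

# Hypothetical file of pre-tokenized IDs; in practice this could be 100GB+.
path = "tokens.bin"

# Write a small demo file so the example is self-contained.
data = np.arange(1_000_000, dtype=np.int32)
data.tofile(path)

# Open the file memory-mapped and read-only: only the pages actually
# touched are paged into memory, so file size can exceed RAM.
tokens = np.memmap(path, dtype=np.int32, mode="r")

# Random-access a training batch without reading the whole file.
batch = tokens[500_000:500_008]
print(batch.tolist())  # → [500000, 500001, ..., 500007]
```

A training loop would draw batches this way from multiple worker threads, which is the multithreaded variant the description refers to.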

3,783 stars. No commits in the last 6 months.

Flags: Stale (6 months) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 21 / 25


Stars: 3,783
Forks: 447
Language: Jupyter Notebook
License: MIT
Last pushed: Aug 05, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/yuanzhoulvpi2017/zero_nlp"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
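For programmatic access, the endpoint URL from the curl command above can be built in Python. This is a minimal sketch: the helper name is hypothetical, and the shape of the JSON response is not documented here, so the example only constructs the URL rather than assuming response fields.

```python
# Base path taken from the curl example shown above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL (hypothetical helper)."""
    return f"{BASE}/{owner}/{repo}"

url = quality_url("yuanzhoulvpi2017", "zero_nlp")
print(url)
# → https://pt-edge.onrender.com/api/v1/quality/transformers/yuanzhoulvpi2017/zero_nlp
```

An actual request could then be made with any HTTP client, e.g. `json.load(urllib.request.urlopen(url))` from the standard library, subject to the rate limits above.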