zufeshan12/fine-tuning-and-reinforcement-learning-on-llms

Supervised fine-tuning and RLAIF on DeepSeek-math-7b-base using LoRA adapters and the GRPO training objective

/ 100

Experimental

No Package, No Dependents

Maintenance 6 / 25

Adoption 1 / 25

Maturity 9 / 25

Community 0 / 25

Stars

Forks

—

Language

Jupyter Notebook

License

MIT

Category

Last pushed

Nov 19, 2025

Commits (30d)

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/zufeshan12/fine-tuning-and-reinforcement-learning-on-llms"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
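
The same request can be made from Python. The sketch below is illustrative only: it assumes the endpoint path follows the category/owner/repo pattern seen in the curl URL above and that the response body is JSON, neither of which this page confirms. The helper names `build_quality_url` and `fetch_quality` are hypothetical and not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumes the path layout is <category>/<owner>/<repo>, inferred from
    the single example URL shown above.
    """
    parts = (category, owner, repo)
    # Percent-encode each segment so unusual characters cannot alter the path.
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the quality record (hypothetical helper).

    Assumes the API returns a JSON body; adjust the decoding if it does not.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to make a real request.
    print(build_quality_url(
        "generative-ai",
        "zufeshan12",
        "fine-tuning-and-reinforcement-learning-on-llms",
    ))
```

Keeping URL construction separate from the network call makes it easy to check the path without spending any of the daily request quota.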

Higher-rated alternatives

daekeun-ml/genai-ko-LLM

This hands-on lab walks you through a step-by-step approach to efficiently serving and...

GURPREETKAURJETHRA/Llama-3-ORPO-Fine-Tuning

Llama 3 ORPO Fine Tuning on A100 in Colab Pro.

ramalamadingdong/onnx-rubikpi

ONNX LLM runtime on RUBIK-Pi with Gemma 1B and Llama 3.2 1B

keanteng/sesame-csm-elise

Fine-Tuning Sesame CSM With Elise. Enjoy the voice （￣︶￣）↗

sukanyabag/Finetuning-Qwen2-7B-VQA-on-Radiology-Scans

This repository is doing the finetuning of the Qwen2 7B VLM for performing VQA (Visual Question...