zufeshan12/fine-tuning-and-reinforcement-learning-on-llms
Supervised fine-tuning and RLAIF on DeepSeek-math-7b-base using LoRA adapters and the GRPO training objective
Stars: 1
Forks: -
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Nov 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/zufeshan12/fine-tuning-and-reinforcement-learning-on-llms"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
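The same request can be sketched in Python using only the standard library. This is a minimal sketch, not a documented client: the `category/owner/repo` path layout is inferred from the single URL shown in the curl command, the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is returned as raw text because the page does not state the response format.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to follow a category/owner/repo layout (an inference, not documented).
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the body as text.

    The body is not parsed here, since the response format is not stated;
    UTF-8 decoding is an assumption.
    """
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Reproduces the URL used in the curl command (no network call is made here).
URL = quality_url(
    "generative-ai",
    "zufeshan12",
    "fine-tuning-and-reinforcement-learning-on-llms",
)
```

Calling `fetch_quality("generative-ai", "zufeshan12", "fine-tuning-and-reinforcement-learning-on-llms")` would issue the request and count against the daily limit mentioned above.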
Higher-rated alternatives
daekeun-ml/genai-ko-LLM
This hands-on lab walks you through a step-by-step approach to efficiently serving and...
GURPREETKAURJETHRA/Llama-3-ORPO-Fine-Tuning
Llama 3 ORPO Fine Tuning on A100 in Colab Pro.
ramalamadingdong/onnx-rubikpi
ONNX LLM runtime on RUBIK-Pi with Gemma 1B and Llama 3.2 1B
keanteng/sesame-csm-elise
Fine-Tuning Sesame CSM With Elise. Enjoy the voice ( ̄︶ ̄)↗
sukanyabag/Finetuning-Qwen2-7B-VQA-on-Radiology-Scans
This repository is doing the finetuning of the Qwen2 7B VLM for performing VQA (Visual Question...