OpenMOSS/Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
Provides model-specific "I don't know" (Idk) datasets and four fine-tuning approaches (Idk-SFT, Idk-BoN with reward modeling, Idk-DPO, and Idk-PPO) that align LLMs to decline questions they cannot answer while still answering questions on topics they know. Built on the llama-recipes, DeepSpeed-Chat, and DPO frameworks, with FSDP distributed training support for models such as Llama-2, Baichuan2, and Mistral. Includes inference tools for both prompt-based and learned refusal strategies. Using a knowledge quadrant classification, the aim is to turn Unknown Unknowns and Unknown Knowns into calibrated Known states.
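The knowledge quadrant idea mentioned above can be illustrated with a minimal Python sketch. It is not code from the repository: the names `Quadrant`, `classify`, and `is_calibrated` are hypothetical, and the quadrant definitions reflect one reading of the description (whether the model knows the answer, crossed with whether it answers or refuses). How "the model knows" is decided in practice is left as an input here.

```python
from enum import Enum


class Quadrant(Enum):
    # Hypothetical labels; definitions are an assumption based on the description.
    KNOWN_KNOWN = "Known Known"          # knows the answer and gives it
    KNOWN_UNKNOWN = "Known Unknown"      # does not know and says so
    UNKNOWN_KNOWN = "Unknown Known"      # knows the answer but refuses
    UNKNOWN_UNKNOWN = "Unknown Unknown"  # does not know but answers anyway


def classify(model_knows: bool, model_refuses: bool) -> Quadrant:
    """Map one question to a quadrant from two yes/no observations."""
    if model_knows:
        return Quadrant.UNKNOWN_KNOWN if model_refuses else Quadrant.KNOWN_KNOWN
    return Quadrant.KNOWN_UNKNOWN if model_refuses else Quadrant.UNKNOWN_UNKNOWN


def is_calibrated(quadrant: Quadrant) -> bool:
    """The two 'Known' states are the ones alignment is meant to reach."""
    return quadrant in (Quadrant.KNOWN_KNOWN, Quadrant.KNOWN_UNKNOWN)


if __name__ == "__main__":
    for knows in (True, False):
        for refuses in (True, False):
            q = classify(knows, refuses)
            print(f"knows={knows}, refuses={refuses}: {q.value}, calibrated={is_calibrated(q)}")
```

Under this reading, refusal training moves a question from Unknown Unknown to Known Unknown, while avoiding over-refusal moves it from Unknown Known to Known Known.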
No commits in the last 6 months.
Stars: 85
Forks: 10
Language: Python
License: -
Category:
Last pushed: Feb 05, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenMOSS/Say-I-Dont-Know"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
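The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which the page does not state. The URL pattern is taken from the curl command above.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the data; assumes a JSON body, which is not confirmed by the page."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("OpenMOSS", "Say-I-Dont-Know"))
```

With the keyless limit of 100 requests/day, caching the result locally is advisable if the data is polled regularly.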
Higher-rated alternatives
filipnaudot/llmSHAP
llmSHAP: a multi-threaded explainability framework using Shapley values for LLM-based outputs.
microsoft/automated-brain-explanations
Generating and validating natural-language explanations for the brain.
CAS-SIAT-XinHai/CPsyCoun
[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework...
wesg52/universal-neurons
Universal Neurons in GPT2 Language Models
ICTMCG/LLM-for-misinformation-research
Paper list of misinformation research using (multi-modal) large language models, i.e., (M)LLMs.