OpenMOSS/Say-I-Dont-Know

[ICML'2024] Can AI Assistants Know What They Don't Know?


Experimental

Provides model-specific "I don't know" (Idk) datasets and several fine-tuning approaches (Idk-SFT, Idk-BoN with reward modeling, Idk-DPO, and Idk-PPO) for aligning LLMs to refuse questions they cannot answer while still answering the ones they can. Built on llama-recipes, DeepSpeed-Chat, and DPO frameworks, with FSDP distributed training support for models such as Llama-2, Baichuan2, and Mistral. Includes inference tools for both prompt-based and learned refusal strategies. The work is framed around a knowledge quadrant: the aim is to turn Unknown Unknowns and Unknown Knowns into Known Knowns and Known Unknowns.
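As a rough illustration of what "model-specific" could mean for an Idk dataset, the sketch below labels a question as known or unknown from the model's own sampled answers. It is not the repository's code: the substring-match scoring, the 0.5 threshold, the refusal text, and the names Example, accuracy, and build_idk_example are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical refusal text; the repository's actual template may differ.
IDK_RESPONSE = "I don't know."


@dataclass
class Example:
    question: str
    response: str  # target text for fine-tuning
    known: bool    # True if the model is treated as knowing the answer


def accuracy(samples: list[str], gold: str) -> float:
    """Fraction of sampled answers containing the gold answer (case-insensitive)."""
    if not samples:
        return 0.0
    hits = sum(gold.lower() in s.lower() for s in samples)
    return hits / len(samples)


def build_idk_example(
    question: str, samples: list[str], gold: str, threshold: float = 0.5
) -> Example:
    """Keep a correct answer if the model is usually right, else target a refusal.

    The threshold is an assumed knob: a higher value makes the resulting
    dataset more conservative (more questions labeled unknown).
    """
    if accuracy(samples, gold) >= threshold:
        # Reuse the first sampled answer that matched; fall back to the gold text.
        correct = next((s for s in samples if gold.lower() in s.lower()), gold)
        return Example(question, correct, True)
    return Example(question, IDK_RESPONSE, False)
```

Because the labels depend on one model's sampled answers, the same question can end up with a factual target for one model and a refusal target for another.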

No commits in the last 6 months.

No license, stale for 6 months, no package, no dependents.

Maintenance 0 / 25

Adoption 9 / 25

Maturity 1 / 25

Community 13 / 25


Language: Python

License: none listed

Higher-rated alternatives

filipnaudot/llmSHAP

llmSHAP: a multi-threaded explainability framework using Shapley values for LLM-based outputs.

microsoft/automated-brain-explanations

Generating and validating natural-language explanations for the brain.

CAS-SIAT-XinHai/CPsyCoun

[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework...

wesg52/universal-neurons

Universal Neurons in GPT2 Language Models

ICTMCG/LLM-for-misinformation-research

Paper list of misinformation research using (multi-modal) large language models, i.e., (M)LLMs.
