PKU-YuanGroup/Hallucination-Attack
An attack that induces hallucinations in LLMs
This project evaluates how easily large language models (LLMs) can be tricked into generating false information, or "hallucinations." It feeds a standard LLM specially crafted, often nonsensical adversarial prompts to see whether the model can be made to produce fabricated facts or fake news. This is useful for AI safety researchers, red teamers, and anyone responsible for assessing the reliability and potential risks of LLMs before deployment.
164 stars. No commits in the last 6 months.
Use this if you need to rigorously test an LLM's susceptibility to generating false or misleading content when given unusual or adversarial inputs.
Not ideal if you are looking to improve the factual accuracy of an LLM or fine-tune it for a specific task.
Stars
164
Forks
21
Language
Python
License
MIT
Category
Last pushed
May 17, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/PKU-YuanGroup/Hallucination-Attack"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
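The same endpoint can be called programmatically. Below is a minimal Python sketch, assuming the endpoint returns JSON as implied by the curl example; the response field names are not documented here, so the code only fetches and decodes the payload without assuming its shape.

```python
import json
import urllib.request

# Base path of the pt-edge quality API, taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (no API key: 100 requests/day)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (requires network access):
# data = fetch_quality("PKU-YuanGroup", "Hallucination-Attack")
```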
Higher-rated alternatives
THU-BPM/MarkLLM
MarkLLM: An Open-Source Toolkit for LLM Watermarking (EMNLP 2024 System Demonstration)
git-disl/Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large...
zjunlp/Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
NishilBalar/Awesome-LVLM-Hallucination
up-to-date curated list of state-of-the-art Large vision language models hallucinations...
HillZhang1999/ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced...