RobustNLP/CipherChat
A framework to evaluate the generalization capability of safety alignment for LLMs
Systematically evaluates safety alignment robustness by encoding harmful prompts in ciphers (Caesar, substitution, etc.) that bypass natural-language-based safety training. The framework uses in-context learning to teach models cipher comprehension, then applies rule-based decryption to convert encoded outputs back to natural language. Supports multiple LLMs, domains, and languages with pre-computed query-response datasets loadable via PyTorch.
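As an illustration of the encode-then-decrypt idea described above, here is a minimal sketch of a Caesar cipher with rule-based decryption. It is not taken from the CipherChat codebase: the function names (`caesar_shift`, `encode`, `decode`) and the shift of 3 are assumptions made for this example only.

```python
SHIFT = 3  # assumed shift value; the listing does not state which shift is used


def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions; leave everything else unchanged."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def encode(text: str, shift: int = SHIFT) -> str:
    """Encipher a natural-language prompt before it is sent to a model."""
    return caesar_shift(text, shift)


def decode(text: str, shift: int = SHIFT) -> str:
    """Rule-based decryption: apply the inverse shift to the model's output."""
    return caesar_shift(text, -shift)


if __name__ == "__main__":
    encoded = encode("Hello, world!")
    print(encoded)          # Khoor, zruog!
    print(decode(encoded))  # Hello, world!
```

Because decryption is a fixed rule rather than a model call, the round trip is deterministic: `decode(encode(s))` returns `s` for any string.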
Stars: 626
Forks: 68
Language: Python
License: MIT
Category:
Last pushed: Oct 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/RobustNLP/CipherChat"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
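The same request can be sketched in Python using only the standard library. This is a hedged example: the helper names (`build_url`, `fetch_listing`) are hypothetical, the assumption that other repositories follow the same `owner/repo` path pattern is inferred from the single URL above, and the response is returned as raw text because the listing does not document the response format.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming an owner/repo path pattern."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_listing(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the listing and return the raw response body as text.

    The response schema is not documented in the listing, so no parsing
    is attempted here.
    """
    with urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_listing(...) to make a real request.
    print(build_url("RobustNLP", "CipherChat"))
```

Keeping URL construction separate from the network call makes it easy to stay within the daily request limit while testing.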
Related tools
xming521/WeClone
🚀 One-stop solution for creating your AI twin from chat history 💡 Fine-tune LLMs with your chat...
posit-dev/chatlas
Your friendly guide to building LLM chat apps in Python with less effort and more clarity.
ooyinet/WeClone
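🚀 One-stop solution for creating a digital twin from chat history 💡 Fine-tune large language models on your chat logs so the model picks up your personal style, then bind it to a chatbot to create your own digital twin. Digital clone / digital twin / digital immortality / LLM / chatbot / LoRA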
vemonet/libre-chat
🦙 Free and Open Source Large Language Model (LLM) chatbot web UI and API. Self-hosted, offline...
qqqqqf-q/MirrorFlow
From Dialogue Data to Training Closed-Loop: Digital Twin + Model Distillation