RobustNLP/CipherChat

A framework to evaluate the generalization capability of safety alignment for LLMs

50 / 100 (Established)

Systematically evaluates safety alignment robustness by encoding harmful prompts in ciphers (Caesar, substitution, etc.) that bypass natural-language-based safety training. The framework uses in-context learning to teach models cipher comprehension, then applies rule-based decryption to convert encoded outputs back to natural language. Supports multiple LLMs, domains, and languages with pre-computed query-response datasets loadable via PyTorch.
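The encode, demonstrate, decode loop described above can be sketched with a Caesar cipher. This is a minimal illustration, not CipherChat's own code: the function names (`caesar_shift`, `encode`, `decode`, `build_prompt`), the shift of 3, and the prompt wording are all assumptions made for this example.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions; leave everything else unchanged."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def encode(text: str, shift: int = 3) -> str:
    """Encipher natural-language text before it is sent to the model."""
    return caesar_shift(text, shift)


def decode(text: str, shift: int = 3) -> str:
    """Rule-based decryption of the model's enciphered output."""
    return caesar_shift(text, -shift)


def build_prompt(demonstrations: list[str], query: str, shift: int = 3) -> str:
    """Assemble an in-context prompt: a cipher explanation, enciphered
    demonstrations, then the enciphered query (hypothetical wording)."""
    lines = [
        f"We communicate using a Caesar cipher with a shift of {shift}.",
        "Here are some examples:",
    ]
    lines += [encode(demo, shift) for demo in demonstrations]
    lines.append("Now reply in the same cipher to:")
    lines.append(encode(query, shift))
    return "\n".join(lines)


prompt = build_prompt(["Good morning."], "Hello, World!")
```

Decoding is the exact inverse of encoding here, which is why a simple rule can recover natural language from the model's output without another model call.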

No package, no dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25

Stars: 626
Forks: 68
Language: Python
License: MIT
Last pushed: Oct 09, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/RobustNLP/CipherChat"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
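The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the page does not document the response format, so treating the body as JSON is a guess, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the quality record for a repository.

    Assumes the endpoint returns a JSON object; that is not confirmed
    by the page, so adjust the parsing if the body is something else.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# record = fetch_quality("RobustNLP", "CipherChat")
```

With the keyless limit of 100 requests per day, it is worth caching the result locally rather than calling `fetch_quality` on every run.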