Zhaoyi-Li21/creme
[ACL 2024 Findings] "Understanding and Patching Compositional Reasoning in LLMs"
This project helps evaluate and improve how large language models (LLMs) answer complex questions that require several reasoning steps. Given an LLM and a set of multi-hop questions, it identifies where the model's compositional reasoning breaks down. The output explains these failures and provides a method to "patch" the LLM so that it answers such questions more accurately. It is aimed at AI researchers and practitioners who develop and fine-tune LLMs.
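The diagnosis step can be illustrated with a minimal Python sketch. It is not this project's code: the `MultiHopItem` record and the `compositional_failures` and `compositionality_gap` helpers are hypothetical, and the sketch assumes you have already scored the model on each two-hop question and on its two single-hop sub-questions. It flags the cases that suggest a compositional failure rather than missing knowledge: both hops are answered correctly, yet the composed question is answered wrongly. The reported ratio resembles the "compositionality gap" measure used in prior work.

```python
from dataclasses import dataclass


@dataclass
class MultiHopItem:
    """One two-hop question plus correctness of its sub-questions (hypothetical schema)."""
    question: str
    first_hop_correct: bool   # model answered sub-question 1 correctly
    second_hop_correct: bool  # model answered sub-question 2 correctly
    composed_correct: bool    # model answered the full two-hop question correctly


def compositional_failures(items):
    """Items where both single hops are right but the composed question is wrong."""
    return [
        it for it in items
        if it.first_hop_correct and it.second_hop_correct and not it.composed_correct
    ]


def compositionality_gap(items):
    """Share of 'both hops known' items whose composed answer is still wrong."""
    both_known = [it for it in items if it.first_hop_correct and it.second_hop_correct]
    if not both_known:
        return 0.0
    return len(compositional_failures(items)) / len(both_known)


# Toy, made-up scores purely for illustration.
items = [
    MultiHopItem("q1", True, True, True),
    MultiHopItem("q2", True, True, False),
    MultiHopItem("q3", True, False, False),  # knowledge gap, not counted as a reasoning failure
    MultiHopItem("q4", True, True, False),
]
print(compositionality_gap(items))  # 0.6666666666666666
```

Conditioning on both hops being correct is the design choice that matters here: it separates reasoning failures, which a patch to the model's reasoning could target, from plain knowledge gaps such as `q3`.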
No commits in the last 6 months.
Use this if you need to understand, diagnose, and fix compositional reasoning errors in Large Language Models for complex, multi-step questions.
Not ideal if you are looking for a general-purpose LLM evaluation tool or a simple API for common NLP tasks.
Stars: 13
Forks: not listed
Language: Python
License: not listed
Category: not listed
Last pushed: Aug 28, 2024
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Zhaoyi-Li21/creme"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
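The same request can be made from Python using only the standard library. This is a sketch under assumptions: `quality_url` and `fetch_quality` are hypothetical helpers, the response body is assumed to be JSON, and the `{category}/{owner}/{repo}` path pattern is inferred from the single curl example above. How an API key is supplied is not documented here, so the sketch uses anonymous access only.

```python
import json
import urllib.request

# Base path taken from the curl example; the per-repo suffix pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """GET the endpoint anonymously and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("transformers", "Zhaoyi-Li21/creme")` requests the same URL as the curl command. Keeping URL construction in its own function makes it easy to check without a network call.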
Higher-rated alternatives
cvs-health/uqlm
UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models