atlaspolat/llm_harness

llm-harness is a repository designed to streamline testing of various language models on multiple Hugging Face datasets. It supports evaluation across different benchmarks and lets users integrate external tools (such as RAG, calculators, image QA, and image captioning) alongside the main model to enhance functionality and performance.

/ 100

Experimental

No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 1 / 25

Maturity 7 / 25

Community 0 / 25

Stars

Forks

—

Language

Jupyter Notebook

License

—

Higher-rated alternatives

HowieHwong/TrustLLM

[ICML 2024] TrustLLM: Trustworthiness in Large Language Models

Intelligent-CAT-Lab/PLTranslationEmpirical

Artifact repository for the paper "Lost in Translation: A Study of Bugs Introduced by Large...

rishub-tamirisa/tamper-resistance

[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"

tsinghua-fib-lab/ANeurIPS2024_SPV-MIA

[NeurIPS'24] "Membership Inference Attacks against Fine-tuned Large Language Models via...

FudanDISC/ReForm-Eval

A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
