TJ-Neary/AI_Eval
Comprehensive LLM evaluation framework comparing local and cloud models with hardware-aware benchmarking. Evaluate across code generation, document analysis, and structured output using pass@k, LLM-as-Judge, and RAG metrics. Supports Ollama, Google Gemini, Anthropic, and OpenAI.
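
The description mentions pass@k among the evaluation metrics. For reference, the sketch below shows the standard unbiased pass@k estimator (as popularized by the Codex paper), not necessarily this repository's own implementation; the function name and example numbers are illustrative.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k: probability that at least one of k samples drawn
    # from n generations (c of which pass the tests) is correct.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations per problem, 3 pass the unit tests
print(pass_at_k(n=10, c=3, k=1))  # 0.30
print(pass_at_k(n=10, c=3, k=5))  # ~0.917
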
Stars: —
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Mar 06, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/TJ-Neary/AI_Eval"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
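
The same data can be fetched programmatically. A minimal Python sketch, assuming only the GET endpoint shown in the curl command above (the response schema is not documented here, so the script simply prints the returned JSON):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/rag/TJ-Neary/AI_Eval"
resp = requests.get(url, timeout=30)  # no API key needed on the free tier
resp.raise_for_status()
print(resp.json())
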
Related tools
masaakisakamoto/memory-os
Deterministic continuity for AI systems. Detect and repair inconsistencies across sessions — not...
dahlinomine/local-llm-rag-bench
Python tool for benchmarking local LLM performance on specific RAG datasets.
VectoringAI/ai-engineering
Practical tutorials to build AI Engineering skills
priyanshus/evaliphy
E2E RAG Testing Tool
moshe19909090/llm-evaluation-pipeline
End-to-end LLM evaluation pipeline with human and automated judging for e-commerce product descriptions