shubchat/loab

LOAB: A benchmark for evaluating LLM agents on end-to-end mortgage lending operations under real regulatory constraints.

/ 100

Emerging

**Technical Summary** Evaluates LLM agents on mortgage origination through a multi-component scoring rubric that requires both correct outcomes *and* compliant processes—tool calls in sequence, proper inter-agent handoffs, forbidden action avoidance, and evidence chains—across six templated scenarios spanning prime/near-prime/sub-prime borrower profiles and fraud detection. Built on agentic orchestration with policy-bound decision routing (DTI thresholds, credit score gates, KYC sequencing) against Australian mortgage regulation, designed for multi-step workflows including Processing Officer, Underwriter, Credit Manager, and Financial Crime escalation paths. Current benchmark suite (v0.1.0) spans origination tasks with servicing, collections, and compliance modules in active development.

No Package No Dependents

Maintenance 13 / 25

Adoption 4 / 25

Maturity 9 / 25

Community 12 / 25

How are scores calculated?

Stars

Forks

Language

Python

License

MIT

Featured in

You're Shipping AI You Can't Measure

Higher-rated alternatives

StonyBrookNLP/appworld

🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...

qualifire-dev/rogue

AI Agent Evaluator & Red Team Platform

future-agi/ai-evaluation

Evaluation Framework for all your AI related Workflows

microsoft/WindowsAgentArena

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...

agentscope-ai/OpenJudge

OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards

Explore AI Agents

All categories Trending AI Agent directory Insights