Maximilian-Winter/llama-cpp-agent

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It lets users chat with LLMs, execute structured function calls, and get structured output, and it also works with models that were not fine-tuned for JSON output or function calling.

Quality score: 72 / 100 (Verified)

Leverages guided sampling with JSON schema grammars to constrain model outputs, enabling function calling and structured output even on models not fine-tuned for these tasks. Integrates with multiple inference backends including llama.cpp, TGI, and vLLM servers, and supports agentic workflows through conversational, sequential, and mapping chain patterns with tool integration from Pydantic, llama-index, and OpenAI schemas.
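To make the guided-sampling idea concrete: the framework compiles a JSON schema into a grammar so the model can only emit conforming output. The sketch below does not use llama-cpp-agent's actual API; it is a stdlib-only illustration of the contract such a grammar enforces, checking a reply against a schema after the fact. All names (`BOOK_SCHEMA`, `matches_schema`) are hypothetical.

```python
import json

# Illustrative only: llama-cpp-agent constrains generation with a grammar built
# from a JSON schema; this sketch instead validates a finished reply, which
# shows the same contract. Schema and function names are made up for the demo.
BOOK_SCHEMA = {
    "type": "object",
    "required": ["title", "author", "year"],
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "year": {"type": "integer"},
    },
}

def matches_schema(reply: str, schema: dict) -> bool:
    """Return True if `reply` parses as JSON and has the required typed keys."""
    type_map = {"string": str, "integer": int, "object": dict}
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, type_map[schema["type"]]):
        return False
    for key in schema.get("required", []):
        expected = type_map[schema["properties"][key]["type"]]
        if not isinstance(data.get(key), expected):
            return False
    return True

reply = '{"title": "Dune", "author": "Frank Herbert", "year": 1965}'
print(matches_schema(reply, BOOK_SCHEMA))       # True
print(matches_schema("not json", BOOK_SCHEMA))  # False
```

Grammar-constrained decoding moves this check into the sampler itself, which is why it works even on models never fine-tuned to produce JSON.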

620 stars and 8,620 monthly downloads. Maintained, with 1 commit in the last 30 days. Available on PyPI.

Maintenance 16 / 25
Adoption 19 / 25
Maturity 18 / 25
Community 19 / 25
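The four 25-point subscores appear to account for the overall score. Assuming an unweighted sum (an assumption; the weighting is not documented on this page):

```python
# Assumption: the overall quality score is the plain sum of the four
# 25-point subscores listed above.
subscores = {"Maintenance": 16, "Adoption": 19, "Maturity": 18, "Community": 19}
overall = sum(subscores.values())
print(overall)  # 72, matching the 72/100 shown above
```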


Stars: 620
Forks: 69
Language: Python
License: (not listed)
Last pushed: Mar 09, 2026
Monthly downloads: 8,620
Commits (30d): 1
Dependencies: 5

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Maximilian-Winter/llama-cpp-agent"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
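The same request can be made from Python with the standard library. The URL below mirrors the curl example exactly; the response's field layout is not documented on this page, so the fetch is left commented and the parsed structure should be inspected before use.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-API URL for a repo (mirrors the curl example above)."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("llm-tools", "Maximilian-Winter", "llama-cpp-agent")
print(url)

# Uncomment to fetch live data (counts against the daily rate limit):
# with urlopen(url) as resp:
#     data = json.load(resp)
```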