MilosKosRadGit/ClozeTaskEvaluation

This project evaluates continued pre-training of Llama 3.2 3B for the Serbian language using a custom cloze-style benchmark. The benchmark covers grammatical, lexical, semantic, idiomatic, and factual sentence-completion tasks. The evaluation script computes model accuracy by scoring the candidate choices for each masked token by log-likelihood.
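A minimal sketch of how log-likelihood scoring over cloze choices might work is shown here. It is not the repository's code: the names `pick_choice`, `accuracy`, and `toy_score`, the `[MASK]` placeholder, and the item layout are all assumptions made for illustration. A real run would replace `toy_score` with the summed token log-probabilities returned by the language model.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a model: per-word log-probabilities.
# A real evaluation would sum token log-probs from the language model instead.
TOY_LOGPROBS: Dict[str, float] = {
    "Srbije": -1.0, "Francuske": -5.0,
    "istoku": -1.5, "zapadu": -4.0,
}
DEFAULT_LOGPROB = -2.0  # assumed score for any word not in the table


def toy_score(sentence: str) -> float:
    """Sum the toy log-probabilities of the words in a sentence."""
    words = [w.strip(".,") for w in sentence.split()]
    return sum(TOY_LOGPROBS.get(w, DEFAULT_LOGPROB) for w in words)


def pick_choice(sentence: str, choices: Sequence[str],
                score: Callable[[str], float], mask: str = "[MASK]") -> int:
    """Return the index of the choice whose filled-in sentence scores highest."""
    scores = [score(sentence.replace(mask, c)) for c in choices]
    return max(range(len(choices)), key=scores.__getitem__)


def accuracy(items: List[dict], score: Callable[[str], float]) -> float:
    """Fraction of items where the top-scoring choice is the labeled answer."""
    correct = sum(
        pick_choice(it["sentence"], it["choices"], score) == it["answer"]
        for it in items
    )
    return correct / len(items)


# Two illustrative factual items (assumed format, not from the benchmark).
ITEMS = [
    {"sentence": "Beograd je glavni grad [MASK].",
     "choices": ["Srbije", "Francuske"], "answer": 0},
    {"sentence": "Sunce izlazi na [MASK].",
     "choices": ["zapadu", "istoku"], "answer": 1},
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(ITEMS, toy_score):.2f}")
```

Passing the scorer as a function keeps the selection and accuracy logic independent of any particular model, so the same loop could be reused with a different scoring backend.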

/ 100

Experimental

No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents

Maintenance 2 / 25

Adoption 1 / 25

Maturity 7 / 25

Community 0 / 25

Stars

Forks

—

Language

Python

License

—

Featured in

You're Shipping AI You Can't Measure

Higher-rated alternatives

openvinotoolkit/model_server

A scalable inference server for models optimized with OpenVINO™

madroidmaq/mlx-omni-server

MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically...

NVIDIA-NeMo/Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based...

generative-computing/mellea

Mellea is a library for writing generative programs.

rhesis-ai/rhesis

Open-source platform & SDK for testing LLM and agentic apps. Define expected behavior, generate...
