k4black/codebleu
Pip-installable CodeBLEU metric implementation, available for Linux, macOS, and Windows
Combines syntactic AST matching and semantic data-flow analysis with n-gram scoring to evaluate code generation quality across 10 programming languages. Built on tree-sitter for language-agnostic parsing, it decomposes scores into four components (n-gram match, weighted n-gram match, syntax match, dataflow match) that can be individually weighted. Integrates with HuggingFace's evaluate library for streamlined evaluation workflows.
130 stars and 5,089 monthly downloads. Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Stars: 130
Forks: 28
Language: Python
License: MIT
Category:
Last pushed: Mar 31, 2025
Monthly downloads: 5,089
Commits (30d): 0
Dependencies: 2
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/k4black/codebleu"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Related tools
LiveCodeBench/LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of...
EdinburghNLP/code-docstring-corpus
Preprocessed Python functions and docstrings for automated code documentation (code2doc) and...
hendrycks/apps
APPS: Automated Programming Progress Standard (NeurIPS 2021)
alxschwrz/codex_py2cpp
Converts python code into c++ by using OpenAI CODEX.
AS-SiliconMind/SiliconMind-V1
Inference Engine for SiliconMind-V1 Verilog Coding Models