github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.
Archived.
Provides 2 million (comment, code) function pairs across six languages (Go, Java, JavaScript, PHP, Python, and Ruby), split into train and test sets at the repository level to prevent data leakage, along with a dual-encoder baseline model and NDCG evaluation metrics for semantic code search. Includes manually annotated relevance judgments for 99 queries across the languages, enabling rigorous benchmarking of retrieval-based approaches. Integrates with Weights & Biases for experiment tracking and supports community leaderboard submission through containerized training pipelines.
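The three ideas in this description (repository-level splitting, dual-encoder retrieval, and NDCG scoring against relevance judgments) can be illustrated with a small plain-Python sketch. This is not the repository's code: the `repo` field, the function names, the toy embedding vectors, and the use of linear gain in NDCG are all assumptions for illustration. The project's baselines learn their encoders with neural networks, and its evaluation scripts may compute NDCG differently.

```python
import math
import random


def repo_level_split(pairs, test_fraction=0.2, seed=0):
    """Assign whole repositories to train or test so no repo appears in both.

    `pairs` is a list of dicts with a hypothetical "repo" key.
    """
    repos = sorted({p["repo"] for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(repos)
    n_test = max(1, int(len(repos) * test_fraction))
    test_repos = set(repos[:n_test])
    train = [p for p in pairs if p["repo"] not in test_repos]
    test = [p for p in pairs if p["repo"] in test_repos]
    return train, test


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, code_vecs):
    """Dual-encoder style retrieval: the query and each code snippet are
    assumed to be already encoded into the same vector space; rank code
    ids by cosine similarity to the query, highest first."""
    scores = {cid: cosine(query_vec, vec) for cid, vec in code_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def dcg(gains):
    """Discounted cumulative gain with linear gain and log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(ranked_ids, relevance, k=None):
    """NDCG of a ranking given graded relevance judgments.

    Unjudged results count as relevance 0 (an assumption).
    """
    ranked = ranked_ids[:k] if k else ranked_ids
    actual = dcg([relevance.get(cid, 0) for cid in ranked])
    ideal = dcg(sorted(relevance.values(), reverse=True)[: len(ranked)])
    return actual / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    query = [1.0, 0.0]
    codes = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.7]}
    ranking = rank_by_similarity(query, codes)
    print(ranking)
    print(ndcg(ranking, {"c": 3, "a": 1}))
```

With the toy vectors above, the ranking is `a`, `c`, `b`; because the judgment set rates `c` above `a`, the NDCG falls below 1. Splitting by repository rather than by individual function keeps near-duplicate code from the same project from appearing on both sides of the split.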
2,417 stars. No commits in the last 6 months.
Stars: 2,417
Forks: 408
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jan 31, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/github/CodeSearchNet"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
Cloud-CV/EvalAI
☁️ 🚀 📊 📈 Evaluating state of the art in AI
fireindark707/Python-Schema-Matching
A Python tool using XGBoost and sentence-transformers to perform schema matching on tables.
graphbookai/graphbook
Visual AI development framework for training and inference of ML models, scaling pipelines, and...
visual-layer/fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and...
josh-ashkinaze/plurals
Plurals: A System for Guiding LLMs Via Simulated Social Ensembles