platisd/duplicate-code-detection-tool

A simple Python3 tool to detect similarities between files within a repository

Score: 38 / 100 (Emerging)

Leverages gensim's document similarity models to compute semantic similarity between source files in C, C++, Java, Python, and C# codebases. Available as both a CLI tool and a GitHub Action that integrates directly into pull request workflows, with configurable thresholds for reporting and for failing a check. Also supports pre-commit hooks. The analysis is token-based NLP, and it becomes more accurate as project size increases.
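To make the idea of token-based similarity with reporting and failure thresholds concrete, here is a minimal sketch. It is not the tool's implementation and does not use gensim: it swaps in a plain TF-IDF weighting with cosine similarity, built only on the Python standard library. The function names and the `report_above` and `fail_above` parameters are hypothetical, chosen for illustration only.

```python
import math
import re
from collections import Counter

# Identifier-like tokens only; a deliberately crude tokenizer for illustration.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source):
    """Split source text into lowercase identifier/keyword tokens."""
    return [token.lower() for token in TOKEN_RE.findall(source)]


def tfidf_vectors(docs):
    """Map each file name to a sparse {token: weight} TF-IDF vector."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for token_counts in counts.values():
        doc_freq.update(token_counts.keys())
    vectors = {}
    for name, token_counts in counts.items():
        vectors[name] = {
            # Smoothed IDF so tokens present in every file keep a small weight.
            token: tf * (math.log((1 + n_docs) / (1 + doc_freq[token])) + 1.0)
            for token, tf in token_counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_report(docs, report_above=0.0):
    """Return {(file_a, file_b): percent} for pairs scoring >= report_above."""
    vectors = tfidf_vectors(docs)
    names = sorted(vectors)
    report = {}
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            score = 100.0 * cosine(vectors[first], vectors[second])
            if score >= report_above:
                report[(first, second)] = score
    return report


def should_fail(report, fail_above):
    """True if any reported pair exceeds the (hypothetical) failure threshold."""
    return any(score > fail_above for score in report.values())
```

Calling `similarity_report` with a dict of file names to file contents yields pairwise percentages, and `should_fail` turns that report into a pass/fail decision of the kind a pull request check needs. A real similarity model such as gensim's would replace `tfidf_vectors` and `cosine` here.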

203 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 9 / 25
Community 19 / 25


Stars: 203
Forks: 34
Language: Python
License: MIT
Last pushed: Jun 01, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/platisd/duplicate-code-detection-tool"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
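The same request as the curl command above can be sketched in Python with only the standard library. The helper names are hypothetical, and the split of the path into a category (`nlp`), owner, and repository is an assumption inferred from the single URL shown; the response format is not documented here, so the body is returned as raw text.

```python
import urllib.parse
import urllib.request

# Base path taken from the curl example; anything beyond that URL is assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, assuming a /<category>/<owner>/<repo> layout."""
    parts = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the endpoint and return the response body as text (format unverified)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For this project the call would be `fetch_quality("nlp", "platisd", "duplicate-code-detection-tool")`. With the keyless limit of 100 requests/day, caching the result locally is a reasonable choice for anything run repeatedly.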