microsoft/presidio

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

Score: 80 / 100 (Verified)

Presidio is built from modular components: Analyzer for detection, Anonymizer for transformation, and Image-Redactor for visual PII. Detection combines NER, regex patterns, and checksum validation with context-aware logic across multiple languages. It can be deployed as a Python or PySpark library, in Docker containers, or on Kubernetes clusters. It also supports external model integration and handles specialized formats such as DICOM medical images alongside standard text and structured data.
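To make the detection idea concrete, here is a minimal standard-library sketch of how a regex pattern, a checksum check, and a context boost can be combined, followed by a simple replace-style anonymization step. It does not use Presidio's API. The names `Finding`, `detect_cards`, `anonymize`, the score values, and the context word list are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    """A detected span (hypothetical structure, not Presidio's result type)."""
    entity_type: str
    start: int
    end: int
    score: float


# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Illustrative context words; a real deployment would tune these.
CONTEXT_WORDS = ("card", "credit", "visa", "mastercard")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_cards(text, base_score=0.5, context_boost=0.35, window=30):
    """Regex match, then checksum filter, then context-based score boost."""
    findings = []
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if not luhn_valid(digits):
            continue  # pattern matched but checksum failed: drop it
        before = text[max(0, m.start() - window):m.start()].lower()
        score = base_score
        if any(word in before for word in CONTEXT_WORDS):
            score = min(1.0, score + context_boost)
        findings.append(Finding("CREDIT_CARD", m.start(), m.end(), score))
    return findings


def anonymize(text, findings):
    """Replace each finding with a placeholder, working right to left
    so earlier offsets stay valid."""
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text
```

For example, `anonymize(text, detect_cards(text))` replaces only the digit runs that pass the checksum, and a finding preceded by a word such as "card" receives a higher score than one with no supporting context.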

7,198 stars. Actively maintained with 22 commits in the last 30 days. Available on PyPI.

Maintenance 23 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 22 / 25


Stars: 7,198
Forks: 960
Language: Python
License: MIT
Last pushed: Mar 13, 2026
Commits (30d): 22
Dependencies: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/presidio"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
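The same keyless request can be sketched in Python with only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the split of the path into category, owner, and repository segments is an assumption inferred from the single URL shown above. The response format is not documented here, so the sketch returns the raw body as text rather than assuming a schema.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (segment meaning is an assumption)."""
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Perform the unauthenticated GET and return the raw response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_quality("transformers", "microsoft", "presidio")` would issue the same request as the curl command and count against the daily request limit.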