microsoft/presidio
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
Presidio is built from modular components: the Analyzer detects PII, the Anonymizer transforms it, and the Image-Redactor handles visual PII. Detection combines NER, regex patterns, and checksum validation with context-aware logic across multiple languages. It can be used as a Python or PySpark library, or deployed in Docker containers or on Kubernetes. It also supports external model integration and handles specialized formats such as DICOM medical images alongside standard text and structured data.
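As an illustration of how regex patterns, checksum validation, and context-aware scoring can work together, here is a minimal standard-library sketch for credit card numbers. It is not Presidio's code or API: the names `Finding`, `detect_cards`, `redact`, the context word list, and the score values 0.6 and 0.35 are all hypothetical and chosen only for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch, not Presidio's API: every name and score here is an assumption.
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
CONTEXT_WORDS = ("card", "credit", "visa", "mastercard")


@dataclass
class Finding:
    entity_type: str
    start: int
    end: int
    score: float


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_cards(text: str, window: int = 30) -> list:
    """Pattern match, then checksum filter, then context-based score boost."""
    findings = []
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if not luhn_valid(digits):
            continue  # the checksum rejects look-alike digit runs
        score = 0.6  # assumed base score for a pattern plus checksum hit
        before = text[max(0, m.start() - window):m.start()].lower()
        if any(word in before for word in CONTEXT_WORDS):
            score = min(1.0, score + 0.35)  # assumed boost for nearby context words
        findings.append(Finding("CREDIT_CARD", m.start(), m.end(), score))
    return findings


def redact(text: str, findings: list) -> str:
    """Replace each finding with a placeholder, right to left so offsets stay valid."""
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + "<" + f.entity_type + ">" + text[f.end:]
    return text
```

Calling `redact(text, detect_cards(text))` on a sentence containing a checksum-valid card number replaces that number with `<CREDIT_CARD>`, while digit runs that fail the checksum are left untouched. Keeping detection and transformation in separate functions mirrors the detect-then-transform split described above.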
7,198 stars. Actively maintained with 22 commits in the last 30 days. Available on PyPI.
Stars: 7,198
Forks: 960
Language: Python
License: MIT
Category:
Last pushed: Mar 13, 2026
Commits (30d): 22
Dependencies: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/presidio"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
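The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the `owner/repo` URL pattern is inferred from the single curl example above, and the response is returned as raw UTF-8 text because its format is not documented here.

```python
import urllib.request

# Base path copied from the curl example; the owner/repo suffix pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper)."""
    return BASE_URL + "/" + owner + "/" + repo


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the raw response body as text.

    The body is not parsed because the response format is not stated on this page.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (performs a network request and counts against the daily limit):
# print(fetch_quality("microsoft", "presidio"))
```

How an API key is passed is not shown above, so the sketch covers only keyless access.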
Related models
rhnfzl/SqueakyCleanText
Text preprocessing and PII anonymisation for NLP/ML. ONNX NER ensemble, language detection,...
rushilpatel21/Redactify
Redactify is an efficient data redaction tool that secures sensitive text using advanced NLP and...
KasraAhmadi/PII-360
An open-source Chrome Extension that identifies Personally Identifiable Information (PII) in...
romelancheta/AutoRedact
🛡️ Redact sensitive information from images securely in your browser with AutoRedact, featuring...
JuanDiego-10/Privacy_Protection_Redaction_LLM
Privacy_Protection_Redaction_LLM is a machine learning model designed to identify and redact...