databricks-industry-solutions/ocr-phi-masking
Our joint Solution Accelerator with John Snow Labs automates the detection of sensitive information contained within unstructured data using NLP models for healthcare. Extracted data is stored within the Lakehouse, where teams can use the pre-trained models to easily remove, obfuscate or mask data for downstream analytics at massive scale.
No commits in the last 6 months.
Stars
7
Forks
6
Language
Python
License
—
Category
Last pushed
Apr 17, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/databricks-industry-solutions/ocr-phi-masking"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
DataFog/datafog-python
Python SDK for PII detection and redaction in text and images, combining regex + NLP pipelines...
vmenger/deduce
Deduce: de-identification method for Dutch medical text
martincjespersen/DaAnonymization
Simple customizable pipeline tool for anonymizing Danish text.
aphp/eds-pseudo
EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports
seanpedrick-case/doc_redaction
Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface....