titipata/pubmed_parser

A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset

71 / 100

Verified

Extracts structured metadata and full-text content (abstracts, paragraphs, citations, images, tables) from PubMed OA and MEDLINE XML using `lxml` for efficient parsing into Python dictionaries. Supports both compressed and uncompressed XML files, NCBI E-utils API queries, and reference linking via PMIDs and DOIs. Built for biomedical text mining and NLP workflows with specialized parsers for figures, captions, MeSH terms, and chemical entities.
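To illustrate the kind of XML-to-dictionary extraction described above, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` rather than `lxml`, and it is not the library's own implementation. The element names follow the MEDLINE XML format, but the sample record, the `parse_articles` helper, and the output keys are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Made-up sample record using MEDLINE-style element names (not real data).
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <ArticleTitle>An example title</ArticleTitle>
        <Abstract>
          <AbstractText>First sentence.</AbstractText>
          <AbstractText>Second sentence.</AbstractText>
        </Abstract>
      </Article>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName UI="D000000">Example Term</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def parse_articles(xml_text):
    """Hypothetical helper: return one dictionary per MedlineCitation element."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for citation in root.iter("MedlineCitation"):
        # Join all abstract sections into a single string.
        abstract = " ".join(
            (node.text or "").strip() for node in citation.iter("AbstractText")
        )
        # Represent each MeSH descriptor as "UI:name".
        mesh = [
            f"{d.get('UI')}:{d.text}" for d in citation.iter("DescriptorName")
        ]
        records.append(
            {
                "pmid": citation.findtext("PMID", default=""),
                "title": citation.findtext("Article/ArticleTitle", default=""),
                "abstract": abstract,
                "mesh_terms": "; ".join(mesh),
            }
        )
    return records


if __name__ == "__main__":
    for record in parse_articles(SAMPLE_XML):
        print(record)
```

Real MEDLINE records often nest inline markup inside titles and abstracts, which this sketch does not handle. A dedicated parser is the better choice for production use.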

727 stars and 9,126 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale 6m

Maintenance 2 / 25

Adoption 19 / 25

Maturity 25 / 25

Community 25 / 25


Stars: 727

Forks: 178

Language: Python

License: MIT

Category: medical-abstract-segmentation

Last pushed: Jul 31, 2025

Monthly downloads: 9,126


Medical Abstract Segmentation · 31 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/titipata/pubmed_parser"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
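The same request can be made from Python. The sketch below assumes the URL pattern is `/api/v1/quality/<domain>/<owner>/<repo>`, which is inferred from the single curl example above and not confirmed by any documentation here. The helper names `quality_url` and `fetch_quality` are hypothetical. Because the response format is not stated, the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(domain, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (domain, owner, repo)]
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(domain, owner, repo, timeout=10):
    """Fetch the endpoint and return the response body as text (format unverified)."""
    with urlopen(quality_url(domain, owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_quality("nlp", "titipata", "pubmed_parser"))
```

With the keyless limit of 100 requests/day, it is worth caching responses locally when polling more than a handful of repositories.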

Related tools

nfflow/pubmedflow

Data Collection API for pubmed

greenelab/snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

KarelDO/BioDEX

BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance.

jind11/PubMed-PICO-Detection

PubMed PICO Element Detection Dataset

purplepotion/sadrat

Smart Adverse Drug Reaction Assessment Tools.
