danielfrees/scrapemed
ScrapeMed: Data scraping for PubMed Central.
Provides Pythonic, object-oriented access to PubMed Central (PMC) articles by downloading, validating, and parsing raw PMC XML into standardized `Paper` objects with extracted metadata, references, and structured sections. Integrates with ChromaDB and LangChain for semantic vectorization and natural-language querying, supports pandas conversion for data science workflows, and offers advanced search via PMC's search API.
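To illustrate the XML-to-object step described above, here is a minimal sketch using only the Python standard library. It is not ScrapeMed's API: `MiniPaper`, `parse_article`, and the inline sample are hypothetical names and data, and the sample assumes the JATS-style layout (`front`, `article-meta`, `body`, `sec`) that PMC full-text XML generally follows.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Tiny hand-written sample in the JATS style; not a real PMC article.
SAMPLE_XML = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Example Title</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
    <sec><title>Methods</title><p>Step one.</p><p>Step two.</p></sec>
  </body>
</article>"""


@dataclass
class MiniPaper:
    """Hypothetical stand-in for a parsed paper; not ScrapeMed's Paper class."""
    title: str
    sections: dict = field(default_factory=dict)


def parse_article(xml_text: str) -> MiniPaper:
    """Extract the title and top-level body sections from JATS-style XML."""
    root = ET.fromstring(xml_text)
    title_el = root.find("./front/article-meta/title-group/article-title")
    # itertext() also collects text nested in inline markup such as <italic>.
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""
    sections = {}
    for sec in root.findall("./body/sec"):
        heading = (sec.findtext("title") or "").strip()
        paragraphs = ["".join(p.itertext()).strip() for p in sec.findall("p")]
        sections[heading] = "\n".join(paragraphs)
    return MiniPaper(title=title, sections=sections)


paper = parse_article(SAMPLE_XML)
```

A real pipeline would also validate the document and handle nested sections, references, and figures, which this sketch leaves out.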
No commits in the last 6 months. Available on PyPI.
Stars: 15
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Jan 06, 2024
Monthly downloads: 51
Commits (30d): 0
Dependencies: 15
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/danielfrees/scrapemed"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
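The same request can be made from Python. The sketch below uses only the standard library; `build_quality_url` and `fetch_quality` are hypothetical helper names. Reading the path as category, owner, and repository is an assumption drawn from the URL above, and so is the expectation that the response body is JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the segment meanings are an assumption."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record and decode it, assuming the body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_quality_url("embeddings", "danielfrees", "scrapemed")
```

Calling `fetch_quality("embeddings", "danielfrees", "scrapemed")` performs the network request; unauthenticated use counts against the 100 requests/day limit.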
Higher-rated alternatives
docarray/docarray
Represent, send, store and search multimodal data
primeqa/primeqa
The prime repository for state-of-the-art Multilingual Question Answering research and development.
CogStack/CogStack-Pipeline
Distributed, fault tolerant batch processing for Natural Language Applications and Search, using...
ekatraone/Mobius-v1
Ekatra QnA is a student-focused intelligent search engine that enables them to find answers...
algoprog/Quin
An easy to use framework for large-scale fact-checking and question answering