jack-tol/usda-food-data-pipeline

Code for the USDA Branded Food Dataset pipeline and the USDA Food Assistant. The project consolidates USDA FoodData Central data into a structured dataset and provides an interactive tool for conversational exploration of food items, nutrients, and ingredients.

Score: 22 / 100 (Experimental)

The pipeline automates ingestion and transformation of 34 USDA FoodData Central CSV files into a normalized, ML-ready dataset. The Food Assistant uses semantic search via Pinecone vector indexing with multilingual-e5-large embeddings to enable conversational queries, combining retrieval with language generation to answer nutrition and ingredient questions. The cleaned dataset is published on HuggingFace Datasets with a live demo available on HuggingFace Spaces.
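The consolidation step described above can be illustrated with a minimal sketch. It is not the repository's code: the three in-memory tables stand in for a few of the 34 CSV files, the column names (`fdc_id`, `nutrient_id`, `amount`, `unit_name`) are assumptions modeled on the public FoodData Central export, the rows are made-up toy data, and `consolidate` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for three FoodData Central CSV files.
# Column names and rows are illustrative assumptions, not real pipeline data.
FOOD_CSV = """fdc_id,description
1001,Crunchy Peanut Butter
1002,Sparkling Water
"""

NUTRIENT_CSV = """id,name,unit_name
1003,Protein,G
1008,Energy,KCAL
"""

FOOD_NUTRIENT_CSV = """fdc_id,nutrient_id,amount
1001,1003,25.0
1001,1008,590
1002,1008,0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


def consolidate(food_text, nutrient_text, food_nutrient_text):
    """Join the three tables into one nested record per food item."""
    nutrients = {row["id"]: row for row in read_rows(nutrient_text)}

    # Group nutrient amounts by food id, resolving nutrient ids to names.
    by_food = defaultdict(dict)
    for row in read_rows(food_nutrient_text):
        nutrient = nutrients.get(row["nutrient_id"])
        if nutrient is None:
            continue  # skip amounts that reference an unknown nutrient
        label = f'{nutrient["name"]} ({nutrient["unit_name"]})'
        by_food[row["fdc_id"]][label] = float(row["amount"])

    return [
        {
            "fdc_id": int(food["fdc_id"]),
            "description": food["description"],
            "nutrients": by_food.get(food["fdc_id"], {}),
        }
        for food in read_rows(food_text)
    ]


records = consolidate(FOOD_CSV, NUTRIENT_CSV, FOOD_NUTRIENT_CSV)
for record in records:
    print(record)
```

Each resulting record carries its own nutrient map, which is the kind of flat, self-describing row that is convenient to embed for semantic search. The actual pipeline may normalize the files differently.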

No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 4 / 25
Maturity 9 / 25
Community 9 / 25

Stars: 7
Forks: 1
Language: Jupyter Notebook
License: MIT
Last pushed: Nov 07, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/jack-tol/usda-food-data-pipeline"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
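For scripted use, the same request can be made from Python with only the standard library. This is a sketch under assumptions: the URL is copied from the curl command above, `fetch_quality` is a hypothetical helper name, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request

# Endpoint copied verbatim from the curl command above.
API_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/rag/"
    "jack-tol/usda-food-data-pipeline"
)


def fetch_quality(url=API_URL, timeout=10):
    """Return the raw response body as text; no response schema is assumed."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Uncomment to make a live request (counts against the daily limit):
# print(fetch_quality())
```

Keeping the call behind a function makes it easy to add caching or retries later and avoids spending the keyless daily quota on import.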