PolyAI-LDN/conversational-datasets

Large datasets for conversational AI

/ 100

Emerging

Provides reproducible, deterministically split datasets (Reddit: 654M examples; OpenSubtitles: 286M; Amazon QA: 3.6M) structured as context-response pairs, with earlier conversation turns included as additional context, for pre-training conversational models. Apache Beam pipelines running on Google Cloud Dataflow handle the distributed processing, and outputs are serialized as either JSON or TFRecord files for use in TensorFlow training workflows.
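As a rough illustration of the two ideas in the description (context-response pairs with earlier turns as extra context, and a deterministic train/test split), here is a minimal sketch in plain Python. It is not the repository's Beam pipeline: the function names, the `context/<n>` keys, the MD5-based bucketing, and the 10% test share are all assumptions chosen for the example.

```python
import hashlib
import json


def make_examples(turns, max_history=3):
    """Turn an ordered list of utterances into context-response examples."""
    examples = []
    for i in range(1, len(turns)):
        example = {"context": turns[i - 1], "response": turns[i]}
        # Earlier turns, most recent first, stored under hypothetical
        # "context/<n>" keys (an assumed naming scheme for this sketch).
        history = turns[max(0, i - 1 - max_history):i - 1][::-1]
        for n, turn in enumerate(history):
            example[f"context/{n}"] = turn
        examples.append(example)
    return examples


def assign_split(key, test_percent=10):
    """Map a key to 'train' or 'test' using a stable hash.

    The same key always lands in the same split, regardless of run order
    or machine, which is what makes the split reproducible.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < test_percent else "train"


turns = ["Hello, how are you?", "I am fine, thanks.", "Glad to hear it."]
examples = make_examples(turns)

# One JSON object per example, as in a JSON-lines style output.
print(json.dumps(examples[-1], indent=2))
print(assign_split("thread-123"))
```

Hashing a stable identifier, such as a conversation or thread ID, rather than drawing random numbers keeps every example from the same conversation in the same split across reruns.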

1,387 stars. No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 21 / 25

Stars: 1,387

Forks: 177

Language: Python

License: Apache-2.0

Category: question-answering-systems

Last pushed: Nov 16, 2019

Question Answering Systems · 26 frameworks

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/PolyAI-LDN/conversational-datasets"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
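For use from a script, the same request as the curl command above can be sketched in Python with the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the response format is not documented here, so the sketch returns the raw response body as text rather than assuming a particular schema.

```python
import urllib.request

# Base path taken from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"


def quality_url(owner, repo):
    """Build the quality endpoint URL for a given owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10):
    """Fetch the endpoint and return the response body as text.

    No API key is sent, so this falls under the keyless daily limit.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (performs a network request):
# body = fetch_quality("PolyAI-LDN", "conversational-datasets")
```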

Higher-rated alternatives

Pinafore/qb

QANTA Quiz Bowl AI

KristiyanVachev/Question-Generation

Generating multiple choice questions from text using Machine Learning.

wuba/qa_match

A simple effective ToolKit for short text matching

mcQA-suite/mcQA

🔮 Answering multiple choice questions with Language Models.

dapurv5/awesome-question-answering

Resources, datasets, papers on Question Answering
