chatterbot-corpus and self_dialogue_corpus

These are complementary resources. The first provides general-purpose, multilingual conversational training data for building dialogue systems. The second offers topic-specific self-dialogues, which are conversations in which a single author writes both sides, covering music, movies, and sports.

chatterbot-corpus: score 83 (Verified)
Maintenance 13/25, Adoption 20/25, Maturity 25/25, Community 25/25
Stars: 1,411 | Forks: 1,158 | Downloads: 9,123 | Commits (30d): 5
Language: Python | License: BSD-3-Clause
Risk flags: none

self_dialogue_corpus: score 45 (Emerging)
Maintenance 0/25, Adoption 9/25, Maturity 16/25, Community 20/25
Stars: 107 | Forks: 24 | Downloads: n/a | Commits (30d): 0
Language: Python | License: BSD-3-Clause
Risk flags: Stale 6m, No Package, No Dependents
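The page does not say how the headline score is computed, but the figures are consistent with a plain sum of four categories scored out of 25 each (13 + 20 + 25 + 25 = 83 and 0 + 9 + 16 + 20 = 45). The short Python check below encodes that reading; the summation rule is an inference from these numbers, not a documented formula.

```python
# Subscores as listed for each project; each category appears to be out of 25.
SUBSCORES = {
    "chatterbot-corpus": {"Maintenance": 13, "Adoption": 20, "Maturity": 25, "Community": 25},
    "self_dialogue_corpus": {"Maintenance": 0, "Adoption": 9, "Maturity": 16, "Community": 20},
}
HEADLINE = {"chatterbot-corpus": 83, "self_dialogue_corpus": 45}


def total(parts: dict) -> int:
    """Assumed rule: the headline score is the plain sum of the subscores."""
    return sum(parts.values())


for name, parts in SUBSCORES.items():
    # Prints the recomputed total, the maximum possible, and the listed score.
    print(f"{name}: {total(parts)}/{25 * len(parts)} (listed: {HEADLINE[name]})")
```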

About chatterbot-corpus

gunthercox/chatterbot-corpus

A multilingual dialog corpus

Provides YAML-formatted conversation pairs organized by category and language, designed as a primer dataset for ChatterBot's machine learning pipeline. Training data is community-contributed and covers multiple languages with customizable categories. Users can extend the corpus by creating new YAML files in the data directory structure, enabling domain-specific bot training without modifying the core framework.
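As a rough sketch of that extension workflow, the Python below writes a new category file and derives the statement/response pairs a trainer would see. It assumes the `categories`/`conversations` layout used by the published corpus files and a `<language>/<category>.yml` path under the data directory; `to_corpus_yaml`, `to_pairs`, and `write_category` are hypothetical helpers, not part of ChatterBot's API. It avoids PyYAML on purpose: strings are emitted with `json.dumps`, whose output is also a valid YAML double-quoted scalar. Treating each statement as the reply to the one before it is an assumption meant to mirror how ChatterBot's list-based training reads a conversation.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def to_corpus_yaml(categories: List[str], conversations: List[List[str]]) -> str:
    """Render one corpus file in the assumed categories/conversations layout."""
    # json.dumps yields double-quoted strings that YAML also accepts;
    # ensure_ascii=False keeps non-English text readable in the file.
    def q(s: str) -> str:
        return json.dumps(s, ensure_ascii=False)

    lines = ["categories:"]
    lines += [f"- {q(c)}" for c in categories]
    lines.append("conversations:")
    for convo in conversations:
        if not convo:
            continue  # skip empty conversations rather than emit an empty list
        first, *rest = convo
        lines.append(f"- - {q(first)}")  # opens a nested list for this conversation
        lines += [f"  - {q(s)}" for s in rest]
    return "\n".join(lines) + "\n"


def to_pairs(conversations: List[List[str]]) -> List[Tuple[str, str]]:
    """Pair each statement with the one that follows it (assumed training rule)."""
    pairs: List[Tuple[str, str]] = []
    for convo in conversations:
        pairs.extend(zip(convo, convo[1:]))
    return pairs


def write_category(root: Path, language: str, category: str,
                   conversations: List[List[str]]) -> Path:
    """Write root/<language>/<category>.yml (hypothetical layout helper)."""
    path = root / language / f"{category}.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(to_corpus_yaml([category], conversations), encoding="utf-8")
    return path


if __name__ == "__main__":
    convos = [["Hello", "Hi there", "How are you?"], ["Bye", "See you"]]
    with tempfile.TemporaryDirectory() as tmp:
        out = write_category(Path(tmp), "english", "greetings", convos)
        print(out.read_text(encoding="utf-8"))
    print(to_pairs(convos))
```

In practice ChatterBot's own `ChatterBotCorpusTrainer` loads corpus files directly; these helpers only make the file shape and the pairing rule concrete.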

About self_dialogue_corpus

jfainberg/self_dialogue_corpus

The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports

Scores updated daily from GitHub, PyPI, and npm data.