chatterbot-corpus and self_dialogue_corpus

These are complementary resources: the first provides general-purpose, multilingual conversational training data for dialogue systems, while the second offers domain-specific self-dialogues covering music, movies, and sports.

| | chatterbot-corpus | self_dialogue_corpus |
|---|---|---|
| Status | Verified | Emerging |
| Maintenance | 13/25 | 0/25 |
| Adoption | 20/25 | 9/25 |
| Maturity | 25/25 | 16/25 |
| Community | 25/25 | 20/25 |
| Stars | 1,411 | 107 |
| Forks | 1,158 | 24 |
| Downloads | 9,123 | n/a |
| Commits (30d) | 5 | 0 |
| Language | Python | Python |
| License | BSD-3-Clause | BSD-3-Clause |
| Risk flags | None | Stale 6m, No Package, No Dependents |

About chatterbot-corpus

gunthercox/chatterbot-corpus

A multilingual dialog corpus

Provides YAML-formatted conversation pairs organized by category and language, designed as a primer dataset for ChatterBot's machine learning pipeline. Training data is community-contributed and covers multiple languages with customizable categories. Users can extend the corpus by creating new YAML files in the data directory structure, enabling domain-specific bot training without modifying the core framework.
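As a rough illustration of extending the corpus, the sketch below renders a custom category file and derives the statement/response pairs a trainer would see. It assumes the corpus files use top-level `categories` and `conversations` keys, with each conversation written as a nested list of utterances, and that files are grouped in one subdirectory per language; check both against the repository before relying on them. The helper names (`corpus_yaml`, `statement_response_pairs`, `write_corpus_file`) are hypothetical and not part of ChatterBot.

```python
import json
from pathlib import Path


def corpus_yaml(categories, conversations):
    """Render conversations as YAML text in the assumed corpus layout.

    Scalars are quoted with json.dumps, since JSON strings are also
    valid YAML double-quoted scalars.
    """
    lines = ["categories:"]
    lines += [f"- {json.dumps(category)}" for category in categories]
    lines.append("conversations:")
    for conversation in conversations:
        for index, utterance in enumerate(conversation):
            # The first utterance opens a new nested list; the rest continue it.
            prefix = "- - " if index == 0 else "  - "
            lines.append(prefix + json.dumps(utterance))
    return "\n".join(lines) + "\n"


def statement_response_pairs(conversation):
    """Pair each utterance with the one that follows it."""
    return list(zip(conversation, conversation[1:]))


def write_corpus_file(data_dir, language, name, categories, conversations):
    """Write <data_dir>/<language>/<name>.yml (directory layout is an assumption)."""
    target = Path(data_dir) / language / f"{name}.yml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(corpus_yaml(categories, conversations), encoding="utf-8")
    return target
```

Calling `write_corpus_file("my_corpus", "english", "coffee", ["coffee"], [["Do you like coffee?", "Yes, especially espresso."]])` would create `my_corpus/english/coffee.yml` holding a single two-turn conversation. Keeping custom files in a separate directory like this leaves the packaged corpus untouched.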

About self_dialogue_corpus

jfainberg/self_dialogue_corpus

The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports

Scores updated daily from GitHub, PyPI, and npm data.