chatterbot-corpus and self_dialogue_corpus
These are complementary resources—the first provides general-purpose multilingual conversational training data for building dialogue systems, while the second offers domain-specific self-dialogue data for training systems that generate internal reasoning or multi-turn reasoning chains, particularly in entertainment domains.
About chatterbot-corpus
gunthercox/chatterbot-corpus
A multilingual dialog corpus
Provides YAML-formatted conversation pairs organized by category and language, designed as a primer dataset for ChatterBot's machine learning pipeline. Training data is community-contributed and covers multiple languages with customizable categories. Users can extend the corpus by creating new YAML files in the data directory structure, enabling domain-specific bot training without modifying the core framework.
About self_dialogue_corpus
jfainberg/self_dialogue_corpus
The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports
Related comparisons
Scores updated daily from GitHub, PyPI, and npm data. How scores work