gunthercox/chatterbot-corpus
A multilingual dialog corpus
Provides YAML-formatted conversation pairs organized by category and language, designed as a primer dataset for ChatterBot's machine learning pipeline. Training data is community-contributed and covers multiple languages with customizable categories. Users can extend the corpus by creating new YAML files in the data directory structure, enabling domain-specific bot training without modifying the core framework.
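As a rough illustration of extending the corpus, the sketch below renders a few conversation pairs into YAML text with a `categories` list and a `conversations` list of exchanges, then writes the file under a language folder. The helper names (`render_corpus`, `write_corpus`), the `coffee` category, and the `<data_dir>/<language>/<category>.yml` path layout are assumptions made for this example, not part of the package's API. Check the layout against the files in the repository before relying on it.

```python
import json
from pathlib import Path


def render_corpus(category, conversations):
    """Render conversation exchanges as YAML text (hypothetical helper).

    Each exchange is a list of utterances. Strings are emitted as
    JSON-style double-quoted scalars, which YAML parsers also accept.
    """
    lines = ["categories:", f"- {json.dumps(category)}", "conversations:"]
    for exchange in conversations:
        for index, utterance in enumerate(exchange):
            # The first utterance opens a new nested list; the rest continue it.
            prefix = "- - " if index == 0 else "  - "
            lines.append(prefix + json.dumps(utterance))
    return "\n".join(lines) + "\n"


def write_corpus(data_dir, language, category, conversations):
    """Write the file to <data_dir>/<language>/<category>.yml (assumed layout)."""
    path = Path(data_dir) / language / f"{category}.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_corpus(category, conversations), encoding="utf-8")
    return path


if __name__ == "__main__":
    pairs = [
        ["Do you like coffee?", "Yes, espresso."],
        ["How do you take it?", "Black, no sugar."],
    ]
    print(render_corpus("coffee", pairs))
```

Building the text by hand keeps the sketch free of third-party dependencies. With a YAML library available, dumping a dictionary with the same two keys would be the more usual route.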
The repository has 1,411 stars and 9,123 monthly downloads, and 1 other package depends on it. It is actively maintained, with 5 commits in the last 30 days, and is available on PyPI.
Stars: 1,411
Forks: 1,158
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Mar 05, 2026
Monthly downloads: 9,123
Commits (30d): 5
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/gunthercox/chatterbot-corpus"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
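The same request can be made from Python using only the standard library. This is a minimal sketch: the URL pattern is copied from the curl command above, while the helper names (`build_url`, `fetch_quality`) and the assumption that the endpoint returns a JSON body are illustrative and not confirmed by this page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; the response schema is not documented here.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def build_url(owner, repo):
    """Build the endpoint URL for an owner/repo pair (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the record and decode it, assuming the body is JSON."""
    with urlopen(build_url(owner, repo), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Each call counts against the daily request limit.
    print(fetch_quality("gunthercox", "chatterbot-corpus"))
```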
Related tools
EdinburghNLP/awesome-hallucination-detection
List of papers on hallucination detection in LLMs.
jfainberg/self_dialogue_corpus
The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports
jkkummerfeld/irc-disentanglement
Dataset and model for disentangling chat on IRC
Tomiinek/MultiWOZ_Evaluation
Unified MultiWOZ evaluation scripts for the context-to-response task.
tae898/multimodal-datasets
Multimodal datasets.