veezbo/akkadian_english_corpus
Cleaned Akkadian English Corpus for LLMs
This project provides a cleaned and pre-processed collection of Akkadian texts translated into English. It takes raw, expert-translated Akkadian-English materials, removes inconsistencies and irrelevant notes, and adds translation details. The result is a dataset intended for researchers and computational linguists working with ancient languages.
No commits in the last 6 months.
Use this if you are a researcher or computational linguist needing a high-quality, pre-processed dataset of Akkadian-to-English translations for training or analysis.
Not ideal if you need a dataset for a different ancient language or if you require the original, uncleaned text without any modifications.
Stars: 8
Forks: —
Language: Jupyter Notebook
License: MIT
Category: (not listed)
Last pushed: Oct 10, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/veezbo/akkadian_english_corpus"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
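The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes, without confirmation from this page, that the endpoint returns JSON and that the path follows a category/owner/repo pattern as in the curl example; `build_quality_url` and `fetch_quality` are hypothetical helper names, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    single curl example; other path shapes may exist.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record and decode it.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the caller should inspect the raw body instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "veezbo", "akkadian_english_corpus")` issues the same GET request as the curl command. Keyless access is limited to 100 requests/day, so a script that polls repeatedly should cache the result.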
Higher-rated alternatives
PaddlePaddle/ERNIE
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit...
eyurtsev/kor
LLM(😽)
NiuTrans/NLPBook
A comprehensive book on neural networks and large language models in NLP
bigscience-workshop/data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
allenai/TOPICAL
TOPICAL: TOPIC pages AutomagicaLly