veezbo/akkadian_english_corpus

Cleaned Akkadian English Corpus for LLMs


Experimental

This project provides a cleaned, pre-processed collection of Akkadian texts with English translations. It takes raw, expert-translated Akkadian-English materials, removes inconsistencies and irrelevant notes, and annotates them with clear translation details. The resulting dataset is intended for researchers and computational linguists working with ancient languages.
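
As a rough illustration of the kind of cleaning described above, the sketch below strips bracketed editorial notes from a translated line and attaches basic translation metadata. It is not the repository's actual pipeline: the note pattern, the `clean_translation` and `make_record` helpers, and the record fields are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern (an assumption, not the repository's rule): editorial
# remarks in square brackets or parentheses that start with "note" or "see".
NOTE_PATTERN = re.compile(r"\s*[\[(](?:note|see)\b[^\])]*[\])]", re.IGNORECASE)


def clean_translation(text: str) -> str:
    """Remove bracketed editorial notes and collapse runs of whitespace."""
    without_notes = NOTE_PATTERN.sub("", text)
    return " ".join(without_notes.split())


def make_record(source_id: str, english: str) -> dict:
    """Wrap a cleaned translation with simple, illustrative metadata fields."""
    return {
        "source_id": source_id,  # hypothetical identifier for the source text
        "source_language": "Akkadian",
        "target_language": "English",
        "text": clean_translation(english),
    }


record = make_record(
    "example-001",
    "The king  built the temple [note: line broken] for his god.",
)
print(record["text"])
```

The real corpus may use different note conventions and metadata, so any pattern like this would need to be adapted after inspecting the raw files.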

No commits in the last 6 months.

Use this if you are a researcher or computational linguist needing a high-quality, pre-processed dataset of Akkadian-to-English translations for training or analysis.

Not ideal if you need a dataset for a different ancient language or if you require the original, uncleaned text without any modifications.

Tags: ancient-languages, assyriology, computational-linguistics, historical-research, text-analysis

Status: stale (6 months), no package, no dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 16 / 25

Community 0 / 25

Language: Jupyter Notebook

License: MIT

Higher-rated alternatives

PaddlePaddle/ERNIE

The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit...

eyurtsev/kor

LLM(😽)

NiuTrans/NLPBook

A comprehensive book on neural networks and large language models in NLP

bigscience-workshop/data-preparation

Code used for sourcing and cleaning the BigScience ROOTS corpus

allenai/TOPICAL

TOPICAL: TOPIC pages AutomagicaLly
