Text Alignment Systems NLP Tools

Tools for aligning texts across languages, documents, or modalities (word-level, sentence-level, or document-level). Includes cross-lingual alignment, monolingual alignment, and narrative/script synchronization. Does NOT include general translation, similarity matching without explicit alignment output, or semantic parsing.

There are 86 text alignment tools tracked. The highest-rated is sileod/tasksource at 46/100, with 193 stars and 208 monthly downloads.

Get all 86 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=text-alignment-systems&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
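The same request can be made from a script. The following is a minimal Python sketch, assuming only what the curl command above shows: the endpoint URL and the `domain`, `subcategory`, and `limit` query parameters. The helper names `build_url` and `fetch_json` are illustrative, and because the response schema is not described here, the code decodes the JSON body and returns it without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example; nothing else about the API is assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="text-alignment-systems", limit=20):
    """Build the request URL with the query parameters used in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_json(url, timeout=30):
    """Send a GET request and decode the response body as JSON.

    The structure of the returned object is not documented in this listing,
    so it is handed back as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(build_url())` sends one request, which counts against the daily limit mentioned above. Printing the result with `json.dumps(data, indent=2)` is a reasonable first step for inspecting which fields the API actually returns.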

#	Tool	Score	Tier	Stars	Language
1	sileod/tasksource Datasets collection and preprocessings framework for NLP extreme multitask learning	46	Emerging	193	Python
2	luheng/deep_srl Code and pre-trained model for: Deep Semantic Role Labeling: What Works and...	42	Emerging	334	Python
3	CK-Explorer/DuoSubs Semantic subtitle aligner and merger for bilingual subtitle syncing.	40	Emerging	7	Python
4	loomchild/maligna Bilingual sentence aligner	39	Emerging	29	AL
5	coastalcph/lex-glue LexGLUE: A Benchmark Dataset for Legal Language Understanding in English	33	Emerging	244	Python
6	ChineseGLUE/ChineseGLUE Language Understanding Evaluation benchmark for Chinese: datasets,...	33	Emerging	1,786	Python
7	gkiril/benchie Comprehensive evaluation framework for Open Information Extraction.	33	Emerging	40	Python
8	PhilipMay/stsb-multi-mt Machine translated multilingual STS benchmark dataset.	33	Emerging	33	Python
9	naver-ai/korean-safety-benchmarks Official datasets and pytorch implementation repository of SQuARe and KoSBi...	32	Emerging	249	Python
10	scofield7419/HeSyFu Code for the ACL2021 paper: Better Combine Them Together! Integrating...	31	Emerging	14	Python
11	IINemo/isanlp_srl_framebank SRL parser for Russian based on FrameBank corpus	30	Emerging	27	Jupyter Notebook
12	vecto-ai/word-benchmarks Benchmarks for intrinsic word embeddings evaluation.	29	Experimental	66	—
13	UKPLab/eacl2026-abcd-link Repository for reproducing results from ABCD-Link	29	Experimental	2	Python
14	TalSchuster/CrossLingualContextualEmb Cross-Lingual Alignment of Contextual Word Embeddings	29	Experimental	99	Python
15	ardoco/benchmark A benchmark repository for TLR between (textual) Software Architecture...	29	Experimental	3	Python
16	cdli-gh/Semantic-Role-Labeler A semantic role labeling system for the Sumerian language. A Google Summer...	28	Experimental	16	HTML
17	ubisoft/ubisoft-laforge-binaryalign BinaryAlign: Word Alignment as Binary Sequence Labeling	28	Experimental	11	Python
18	Babelscape/ID10M Data and code for the paper "ID10M: Idiom Identification in 10 Languages"...	28	Experimental	8	Python
19	Babelscape/CroCoAlign A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System...	27	Experimental	10	Python
20	SapienzaNLP/gsrl GSRL is a seq2seq model for end-to-end dependency- and span-based SRL (IJCAI2021).	27	Experimental	18	Python
21	GuillaumeDD/dialign Automatic and generic measures of verbal alignment in dyadic dialogue based...	27	Experimental	13	Scala
22	ku-nlp/JKUSea Utilitary tool aligning sentences of texts written in 2 different languages.	26	Experimental	8	Perl
23	thunlp/DictSKB Code and data of the paper "Automatic Construction of Sememe Knowledge Bases...	26	Experimental	4	Python
24	doc-analysis/XFUND XFUND: A Multilingual Form Understanding Benchmark	25	Experimental	217	—
25	rggdmonk/hadal A simple and efficient tool for mining and aligning sentences with pre-trained models.	25	Experimental	6	Python
26	qiyuw/WSPAlign WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span...	25	Experimental	12	Python
27	LaVi-Lab/CLEVA [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"	25	Experimental	64	Shell
28	thespectrewithin/joint_align Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple...	24	Experimental	52	Python
29	scofield7419/LAGCN-SRL Codes for the AAAI 2021 paper: Encoder-Decoder Based Unified Semantic Role...	24	Experimental	4	Python
30	tschomacker/aligned-narrative-documents A collection of scripts to create a Document-aligned corpus of German...	24	Experimental	4	Python
31	orzhan/rusimscore Code for paper "RuSimScore: unsupervised scoring function for Russian...	24	Experimental	3	Python
32	tyjiangU/fido Code for the paper "Exploiting Definitions for Frame Identification"	24	Experimental	3	Python
33	amazon-science/real-world-noisy-benchmarks-for-natural-language-understanding Benchmark test sets for real-world noise phenomena in goal-directed...	24	Experimental	3	—
34	UKPLab/acl2024-ircoder Data creation, training and eval scripts for the IRCoder paper	23	Experimental	20	Python
35	p-lambda/swords The Stanford Word Substitution (Swords) Benchmark	23	Experimental	33	Python
36	strubell/preprocess-conll05 Scripts for preprocessing the CoNLL-2005 SRL dataset.	23	Experimental	24	Shell
37	luciusssss/MiLiC-Eval [ACL'25 Findings] MiLiC-Eval: Benchmarking Multilingual LLMs for China's...	23	Experimental	5	Python
38	google/BEGIN-dataset A benchmark dataset for evaluating dialog system and natural language...	22	Experimental	39	—
39	SapienzaNLP/dsrl Code for "Semantic Role Labeling meets Definition Modeling: using natural...	22	Experimental	7	Perl
40	Tixierae/WECD Code and data for the paper: 'Word Embeddings for the Construction Domain'	21	Experimental	6	Python
41	allenai/multicite MultiCite code and data. Models are available on Huggingface.	21	Experimental	33	Python
42	ryokamoi/wice This repository contains the dataset and code for "WiCE: Real-World...	20	Experimental	42	Python
43	v-hirak/explaining-MT-difficulty Dataset of diverse typological language properties as part of "Assessing the...	20	Experimental	1	—
44	longxudou/multispider MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing	20	Experimental	9	Python
45	lyutyuh/structured-span-selector A Structured Span Selector (NAACL 2022). A structured span selector with a...	19	Experimental	21	Python
46	liutianlin0121/decoding-time-realignment Implementation of "Decoding-time Realignment of Language Models", ICML 2024.	18	Experimental	21	Jupyter Notebook
47	ShiZhengyan/IngredientParsing Dataset and pytorch codes for the paper titled "Attention-based Ingredient...	18	Experimental	8	Python
48	Sam120204/Pluralistic-Alignment-for-Healthcare Code of our paper - "Pluralistic Alignment for Healthcare: A Role-Driven...	18	Experimental	3	Python
49	jacklxc/CORWA CORWA: A Citation-Oriented Related Work Annotation Dataset, NAACL 2022	18	Experimental	17	Jupyter Notebook
50	tsar-workshop/tsar-2025-shared-task Code and data for TSAR 2025 Shared Task	17	Experimental	2	Python
51	cvjena/chiasmus-detector Code for paper "Data-Driven Detection of General Chiasmi Using Lexical and...	17	Experimental	2	Python
52	guilhermevarela/deep_srlbr SRL task using PropBank 1.1	16	Experimental	3	Jupyter Notebook
53	garfieldpigljy/CrowdWSA2019 Crowdsourced Word Sequence Aggregation 2019	16	Experimental	4	Jupyter Notebook
54	joshstephenson/SEAS Tools for extracting and aligning sentences from subtitle language pairs...	16	Experimental	1	Python
55	bMagicLAB/human-alignment-pl-en-codeswitch Human-in-the-Loop alignment dataset for Polish-English code-switching...	15	Experimental	—	—
56	yumoxu/detnet Code and dataset for TACL 19: Weakly Supervised Domain Detection.	15	Experimental	19	Python
57	sampalomad/IKEA-Dataset A dataset for multimodal machine translation	14	Experimental	13	—
58	Botfuel/benchmark-nlp NLP benchmark test sentences and full results	14	Experimental	13	—
59	Toavinarandrianarivo/Scene2Chapter-NLP-Aligner 📖 Align movie scripts with novel chapters seamlessly using advanced NLP...	14	Experimental	—	Python
60	SapienzaNLP/srl-pas-probing Probing for Predicate Argument Structures in Pretrained Language Models (ACL 2022).	13	Experimental	6	Python
61	nikolayVv/MultiParaphrase Comparing and evaluating monolingual paraphrasing of English, German, Czech,...	13	Experimental	6	Jupyter Notebook
62	pranav-ust/cognates ACL SRW paper: Alignment Analysis of Sequential Segmentation of Lexicons to...	13	Experimental	5	Jupyter Notebook
63	DominiqueMercier/ImpactCite ImpactCite: A XLNet-based Solution Enabling Qualitative CitationImpact...	13	Experimental	5	Jupyter Notebook
64	sileod/metaeval Collection of tasks for meta-learning and extreme multitask learning	13	Experimental	5	Python
65	okalai-ai/moimoe Typology-Guided Adaption in Multilingual Models	13	Experimental	2	HTML
66	gling07/Text2DRS System Text2Drs takes English narrative as an input and outputs a discourse...	13	Experimental	8	Assembly
67	SapienzaNLP/conception Code and experiments for the COLING2020 paper "Conception:...	13	Experimental	11	Java
68	multilingual-dataset-survey/multilingual-dataset-survey.github.io The website implementation of Findings of EMNLP 2022, "Beyond Counting...	13	Experimental	—	JavaScript
69	kukas/word-alignment-visualization Word Alignment Visualization is a Python package for visualizing word...	13	Experimental	7	Jupyter Notebook
70	ZurichNLP/ConLoan A Contrastive Multilingual Dataset for Evaluating Loanwords - ACL2025	13	Experimental	2	Python
71	DorinK/Principal-Parts-Detection Multilingual dataset for principal parts detection in inflectional...	12	Experimental	1	—
72	ghomasHudson/muld The Multitask Long Document Benchmark	12	Experimental	42	Python
73	SapienzaNLP/exploring-srl Repository for the paper "Exploring Non-Verbal Predicates in Semantic Role...	12	Experimental	3	—
74	SapienzaNLP/usea Universal Semantic Annotator (LREC 2022)	12	Experimental	18	—
75	mbanon/benchmarks Several benchmarks on sentence splitting and language identification	12	Experimental	3	Mathematica
76	hexuandeng/HExp4UDS Implementation of the paper “Holistic Exploration on Universal...	12	Experimental	4	Python
77	qiyuw/WSPAlign.InferEval Inference library and evaluation script for WSPAlign...	12	Experimental	4	Python
78	maxkagamine/word-alignment-demo Demonstration of AI/neural word alignment of English & Japanese text using...	12	Experimental	4	Python
79	SapienzaNLP/unify-srl Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic...	12	Experimental	17	Python
80	kinit-sk/multiclaim MultiClaim dataset repository	12	Experimental	—	Python
81	SapienzaNLP/united-srl A unified dataset for span- and dependency-based multilingual and...	12	Experimental	3	—
82	zahra-parvizian/PersianLexicalSimplifier Persian text simplification using lexical simplification	11	Experimental	—	Jupyter Notebook
83	INTERACT-LLM/alignment-drift-llms Dataset and analysis code for BEA2025 paper @ ACL: "Alignment Drift in...	11	Experimental	—	HTML
84	williammulianto/cleu Cross-Lingual Embeddings Utility	10	Experimental	1	Jupyter Notebook
85	agneknie/com4520DarwinProject Adjacent code related to the paper prepared for Joint Workshop on Multiword...	10	Experimental	1	Jupyter Notebook
86	hmosousa/professor_heideltime Create a multilingual corpus weakly labeled with HeidelTime.	10	Experimental	1	Python