Text Alignment Systems NLP Tools
Tools for aligning texts across languages, documents, or modalities (word-level, sentence-level, or document-level). Includes cross-lingual alignment, monolingual alignment, and narrative/script synchronization. Does NOT include general translation, similarity matching without explicit alignment output, or semantic parsing.
This category tracks 86 text alignment tools. The highest-rated is sileod/tasksource at 46/100, with 193 stars and 208 monthly downloads.
Get all 86 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=text-alignment-systems&limit=20"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
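The same request can be made from a script. The following is a minimal Python sketch that rebuilds the URL from the curl example and decodes the response as JSON. The helper names `build_url` and `fetch_listing` are hypothetical, and the shape of the JSON response is not documented here, so the code makes no assumptions about its fields. Whether the endpoint accepts a `limit` higher than 20 is also an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="text-alignment-systems", limit=20):
    """Build the listing URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_listing(url, timeout=30):
    """Fetch the URL and decode the body as JSON.

    The response structure is not assumed; the decoded object is returned as-is.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Printing the URL does not touch the network.
print(build_url())
```

Calling `fetch_listing(build_url())` performs the actual request. Passing `limit=86` to `build_url` may return the full list, assuming the API honors larger limits.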
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | sileod/tasksource | Datasets collection and preprocessings framework for NLP extreme multitask learning | | Emerging |
| 2 | luheng/deep_srl | Code and pre-trained model for: Deep Semantic Role Labeling: What Works and... | | Emerging |
| 3 | CK-Explorer/DuoSubs | Semantic subtitle aligner and merger for bilingual subtitle syncing. | | Emerging |
| 4 | loomchild/maligna | Bilingual sentence aligner | | Emerging |
| 5 | coastalcph/lex-glue | LexGLUE: A Benchmark Dataset for Legal Language Understanding in English | | Emerging |
| 6 | ChineseGLUE/ChineseGLUE | Language Understanding Evaluation benchmark for Chinese: datasets,... | | Emerging |
| 7 | gkiril/benchie | Comprehensive evaluation framework for Open Information Extraction. | | Emerging |
| 8 | PhilipMay/stsb-multi-mt | Machine translated multilingual STS benchmark dataset. | | Emerging |
| 9 | naver-ai/korean-safety-benchmarks | Official datasets and pytorch implementation repository of SQuARe and KoSBi... | | Emerging |
| 10 | scofield7419/HeSyFu | Code for the ACL2021 paper: Better Combine Them Together! Integrating... | | Emerging |
| 11 | IINemo/isanlp_srl_framebank | SRL parser for Russian based on FrameBank corpus | | Emerging |
| 12 | vecto-ai/word-benchmarks | Benchmarks for intrinsic word embeddings evaluation. | | Experimental |
| 13 | UKPLab/eacl2026-abcd-link | Repository for reproducing results from ABCD-Link | | Experimental |
| 14 | TalSchuster/CrossLingualContextualEmb | Cross-Lingual Alignment of Contextual Word Embeddings | | Experimental |
| 15 | ardoco/benchmark | A benchmark repository for TLR between (textual) Software Architecture... | | Experimental |
| 16 | cdli-gh/Semantic-Role-Labeler | A semantic role labeling system for the Sumerian language. A Google Summer... | | Experimental |
| 17 | ubisoft/ubisoft-laforge-binaryalign | BinaryAlign: Word Alignment as Binary Sequence Labeling | | Experimental |
| 18 | Babelscape/ID10M | Data and code for the paper "ID10M: Idiom Identification in 10 Languages"... | | Experimental |
| 19 | Babelscape/CroCoAlign | A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System... | | Experimental |
| 20 | SapienzaNLP/gsrl | GSRL is a seq2seq model for end-to-end dependency- and span-based SRL (IJCAI2021). | | Experimental |
| 21 | GuillaumeDD/dialign | Automatic and generic measures of verbal alignment in dyadic dialogue based... | | Experimental |
| 22 | ku-nlp/JKUSea | Utilitary tool aligning sentences of texts written in 2 different languages. | | Experimental |
| 23 | thunlp/DictSKB | Code and data of the paper "Automatic Construction of Sememe Knowledge Bases... | | Experimental |
| 24 | doc-analysis/XFUND | XFUND: A Multilingual Form Understanding Benchmark | | Experimental |
| 25 | rggdmonk/hadal | A simple and efficient tool for mining and aligning sentences with pre-trained models. | | Experimental |
| 26 | qiyuw/WSPAlign | WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span... | | Experimental |
| 27 | LaVi-Lab/CLEVA | [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform" | | Experimental |
| 28 | thespectrewithin/joint_align | Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple... | | Experimental |
| 29 | scofield7419/LAGCN-SRL | Codes for the AAAI 2021 paper: Encoder-Decoder Based Unified Semantic Role... | | Experimental |
| 30 | tschomacker/aligned-narrative-documents | A collection of scripts to create a Document-aligned corpus of German... | | Experimental |
| 31 | orzhan/rusimscore | Code for paper "RuSimScore: unsupervised scoring function for Russian... | | Experimental |
| 32 | tyjiangU/fido | Code for the paper "Exploiting Definitions for Frame Identification" | | Experimental |
| 33 | amazon-science/real-world-noisy-benchmarks-for-natural-language-understanding | Benchmark test sets for real-world noise phenomena in goal-directed... | | Experimental |
| 34 | UKPLab/acl2024-ircoder | Data creation, training and eval scripts for the IRCoder paper | | Experimental |
| 35 | p-lambda/swords | The Stanford Word Substitution (Swords) Benchmark | | Experimental |
| 36 | strubell/preprocess-conll05 | Scripts for preprocessing the CoNLL-2005 SRL dataset. | | Experimental |
| 37 | luciusssss/MiLiC-Eval | [ACL'25 Findings] MiLiC-Eval: Benchmarking Multilingual LLMs for China's... | | Experimental |
| 38 | google/BEGIN-dataset | A benchmark dataset for evaluating dialog system and natural language... | | Experimental |
| 39 | SapienzaNLP/dsrl | Code for "Semantic Role Labeling meets Definition Modeling: using natural... | | Experimental |
| 40 | Tixierae/WECD | Code and data for the paper: 'Word Embeddings for the Construction Domain' | | Experimental |
| 41 | allenai/multicite | MultiCite code and data. Models are available on Huggingface. | | Experimental |
| 42 | ryokamoi/wice | This repository contains the dataset and code for "WiCE: Real-World... | | Experimental |
| 43 | v-hirak/explaining-MT-difficulty | Dataset of diverse typological language properties as part of "Assessing the... | | Experimental |
| 44 | longxudou/multispider | MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing | | Experimental |
| 45 | lyutyuh/structured-span-selector | A Structured Span Selector (NAACL 2022). A structured span selector with a... | | Experimental |
| 46 | liutianlin0121/decoding-time-realignment | Implementation of "Decoding-time Realignment of Language Models", ICML 2024. | | Experimental |
| 47 | ShiZhengyan/IngredientParsing | Dataset and pytorch codes for the paper titled "Attention-based Ingredient... | | Experimental |
| 48 | Sam120204/Pluralistic-Alignment-for-Healthcare | Code of our paper - "Pluralistic Alignment for Healthcare: A Role-Driven... | | Experimental |
| 49 | jacklxc/CORWA | CORWA: A Citation-Oriented Related Work Annotation Dataset, NAACL 2022 | | Experimental |
| 50 | tsar-workshop/tsar-2025-shared-task | Code and data for TSAR 2025 Shared Task | | Experimental |
| 51 | cvjena/chiasmus-detector | Code for paper "Data-Driven Detection of General Chiasmi Using Lexical and... | | Experimental |
| 52 | guilhermevarela/deep_srlbr | SRL task using PropBank 1.1 | | Experimental |
| 53 | garfieldpigljy/CrowdWSA2019 | Crowdsourced Word Sequence Aggregation 2019 | | Experimental |
| 54 | joshstephenson/SEAS | Tools for extracting and aligning sentences from subtitle language pairs... | | Experimental |
| 55 | bMagicLAB/human-alignment-pl-en-codeswitch | Human-in-the-Loop alignment dataset for Polish-English code-switching... | | Experimental |
| 56 | yumoxu/detnet | Code and dataset for TACL 19: Weakly Supervised Domain Detection. | | Experimental |
| 57 | sampalomad/IKEA-Dataset | A dataset for multimodal machine translation | | Experimental |
| 58 | Botfuel/benchmark-nlp | NLP benchmark test sentences and full results | | Experimental |
| 59 | Toavinarandrianarivo/Scene2Chapter-NLP-Aligner | 📖 Align movie scripts with novel chapters seamlessly using advanced NLP... | | Experimental |
| 60 | SapienzaNLP/srl-pas-probing | Probing for Predicate Argument Structures in Pretrained Language Models (ACL 2022). | | Experimental |
| 61 | nikolayVv/MultiParaphrase | Comparing and evaluating monolingual paraphrasing of English, German, Czech,... | | Experimental |
| 62 | pranav-ust/cognates | ACL SRW paper: Alignment Analysis of Sequential Segmentation of Lexicons to... | | Experimental |
| 63 | DominiqueMercier/ImpactCite | ImpactCite: A XLNet-based Solution Enabling Qualitative CitationImpact... | | Experimental |
| 64 | sileod/metaeval | Collection of tasks for meta-learning and extreme multitask learning | | Experimental |
| 65 | okalai-ai/moimoe | Typology-Guided Adaption in Multilingual Models | | Experimental |
| 66 | gling07/Text2DRS | System Text2Drs takes English narrative as an input and outputs a discourse... | | Experimental |
| 67 | SapienzaNLP/conception | Code and experiments for the COLING2020 paper "Conception:... | | Experimental |
| 68 | multilingual-dataset-survey/multilingual-dataset-survey.github.io | The website implementation of Findings of EMNLP 2022, "Beyond Counting... | | Experimental |
| 69 | kukas/word-alignment-visualization | Word Alignment Visualization is a Python package for visualizing word... | | Experimental |
| 70 | ZurichNLP/ConLoan | A Contrastive Multilingual Dataset for Evaluating Loanwords - ACL2025 | | Experimental |
| 71 | DorinK/Principal-Parts-Detection | Multilingual dataset for principal parts detection in inflectional... | | Experimental |
| 72 | ghomasHudson/muld | The Multitask Long Document Benchmark | | Experimental |
| 73 | SapienzaNLP/exploring-srl | Repository for the paper "Exploring Non-Verbal Predicates in Semantic Role... | | Experimental |
| 74 | SapienzaNLP/usea | Universal Semantic Annotator (LREC 2022) | | Experimental |
| 75 | mbanon/benchmarks | Several benchmarks on sentence splitting and language identification | | Experimental |
| 76 | hexuandeng/HExp4UDS | Implementation of the paper “Holistic Exploration on Universal... | | Experimental |
| 77 | qiyuw/WSPAlign.InferEval | Inference library and evaluation script for WSPAlign... | | Experimental |
| 78 | maxkagamine/word-alignment-demo | Demonstration of AI/neural word alignment of English & Japanese text using... | | Experimental |
| 79 | SapienzaNLP/unify-srl | Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic... | | Experimental |
| 80 | kinit-sk/multiclaim | MultiClaim dataset repository | | Experimental |
| 81 | SapienzaNLP/united-srl | A unified dataset for span- and dependency-based multilingual and... | | Experimental |
| 82 | zahra-parvizian/PersianLexicalSimplifier | Persian text simplification using lexical simplification | | Experimental |
| 83 | INTERACT-LLM/alignment-drift-llms | Dataset and analysis code for BEA2025 paper @ ACL: "Alignment Drift in... | | Experimental |
| 84 | williammulianto/cleu | Cross-Lingual Embeddings Utility | | Experimental |
| 85 | agneknie/com4520DarwinProject | Adjacent code related to the paper prepared for Joint Workshop on Multiword... | | Experimental |
| 86 | hmosousa/professor_heideltime | Create a multilingual corpus weakly labeled with HeidelTime. | | Experimental |