VinAIResearch/BERTweet

BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)


Emerging

BERTweet is pre-trained on 850M English Tweets using the RoBERTa pre-training procedure. Variants include models adapted to COVID-19 content and a large 355M-parameter model that supports 512-token sequences. The models integrate with Hugging Face `transformers` and `fairseq`, and the repository includes a tweet normalization utility that converts URLs and user mentions to special tokens so that input text matches the pre-training preprocessing. The authors report strong performance on downstream tasks including POS tagging, NER, sentiment analysis, and irony detection.
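The normalization step described above can be illustrated with a minimal sketch. This is not the repository's own normalizer: the function names are hypothetical, the token strings `@USER` and `HTTPURL` are taken from the BERTweet README rather than from this page, and the sketch ignores emoji and punctuation handling.

```python
# Minimal sketch of tweet normalization, assuming the special tokens
# "@USER" and "HTTPURL" (names per the BERTweet README, not this page).
USER_TOKEN = "@USER"
URL_TOKEN = "HTTPURL"


def normalize_token(token: str) -> str:
    """Map one whitespace-delimited token to its normalized form."""
    lowered = token.lower()
    # A user mention is "@" followed by at least one character.
    if token.startswith("@") and len(token) > 1:
        return USER_TOKEN
    # Treat anything that looks like a web link as a URL.
    if lowered.startswith(("http://", "https://", "www.")):
        return URL_TOKEN
    return token


def normalize_tweet(tweet: str) -> str:
    """Normalize every token in a tweet and rejoin with single spaces."""
    return " ".join(normalize_token(t) for t in tweet.split())


if __name__ == "__main__":
    print(normalize_tweet("@bob check https://t.co/abc now"))
```

For real use, the Hugging Face tokenizer for BERTweet is documented as accepting `normalization=True`, which applies the project's own normalization before tokenizing and is preferable to a hand-rolled version like this one.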

605 stars. No commits in the last 6 months.

Stale (6 months), no package, no dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 16 / 25


Stars: 605
Language: Python
License: MIT

Higher-rated alternatives

yongzhuo/Pytorch-NLU

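A Chinese text classification and sequence labeling toolkit (PyTorch). It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.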

aniass/Product-Categorization-NLP

Multi-Class Text Classification for products based on their description with Machine Learning...

hppRC/bert-classification-tutorial

[2023 edition] Text classification with BERT

zhanlaoban/Transformers_for_Text_Classification

Text classification based on Transformers

maxent-ai/zeroshot_topics

Topic Inference with Zeroshot models
