VinAIResearch/BERTweet
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Pre-trained on 850M English Tweets using RoBERTa's procedure, with variants optimized for COVID-19 content and a large 355M-parameter model supporting 512-token sequences. Integrates with Hugging Face `transformers` and `fairseq`, with included tweet normalization utilities that convert URLs and mentions to special tokens to match pre-training preprocessing. Demonstrates strong performance on downstream tasks including POS tagging, NER, sentiment analysis, and irony detection.
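The tweet normalization described above can be approximated with a short regex sketch. This is a simplified illustration, not the repo's own `TweetNormalizer` (which also tokenizes with NLTK's `TweetTokenizer`); the `@USER` and `HTTPURL` special tokens follow the convention stated in the BERTweet paper:

```python
import re

def normalize_tweet(text: str) -> str:
    """Roughly mimic BERTweet-style preprocessing:
    replace web links with HTTPURL and user mentions with @USER.
    (Simplified sketch; the official normalizer does more,
    e.g. NLTK tweet tokenization.)"""
    text = re.sub(r"https?://\S+|www\.\S+", "HTTPURL", text)
    text = re.sub(r"@\w+", "@USER", text)
    return text

print(normalize_tweet("Check this out @jack https://t.co/abc123"))
# → Check this out @USER HTTPURL
```

Matching this preprocessing at fine-tuning time matters because the pre-training corpus was normalized the same way; in `transformers`, the released tokenizer can apply the official normalization via `AutoTokenizer.from_pretrained("vinai/bertweet-base", normalization=True)`.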
605 stars. No commits in the last 6 months.
Stars: 605
Forks: 54
Language: Python
License: MIT
Category:
Last pushed: Jul 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/VinAIResearch/BERTweet"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
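The same endpoint can be called from Python. A minimal sketch, assuming the path layout `/api/v1/quality/<category>/<owner>/<repo>` generalizes from the example above and that the response is JSON (its schema is not documented here):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Build the endpoint path; "nlp" is the category used in the curl example.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Anonymous access is rate-limited to 100 requests/day.
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)  # assumed JSON; schema not specified

print(quality_url("nlp", "VinAIResearch", "BERTweet"))
# → https://pt-edge.onrender.com/api/v1/quality/nlp/VinAIResearch/BERTweet
```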
Higher-rated alternatives
yongzhuo/Pytorch-NLU
Chinese text classification and sequence-labeling toolkit (PyTorch): supports multi-class and multi-label classification of long and short Chinese texts, plus sequence-labeling tasks such as Chinese named-entity recognition, POS tagging, word segmentation, and extractive summarization...
aniass/Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning...
hppRC/bert-classification-tutorial
[2023 edition] Text classification with BERT
zhanlaoban/Transformers_for_Text_Classification
Text classification based on Transformers
maxent-ai/zeroshot_topics
Topic Inference with Zeroshot models