CarryChang/Real_Time_DataMining_Software
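Real-time review mining software for Ctrip and Zhenguo Minsu (榛果民宿): a comprehensive demo covering real-time data collection, data cleaning, structured storage, topic extraction from UGC data, sentiment analysis, and visualization of the resulting structured data. It is an opinion-mining project built on online homestay UGC data, combining data mining and NLP processing to handle data collection, topic extraction, and sentiment analysis. Its main goal is to overcome mismatches between users' scores and their written reviews: it evaluates satisfaction with Ctrip and Meituan online homestays in real time, visualizes additional data, and mines online UGC along multiple dimensions. See the link for a demo video.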
The pipeline collects Ctrip and Meituan comments automatically with Request-based web scraping. It then splits each multi-topic review into sentences using POS tagging and punctuation rules, assigns topics with a topic dictionary, and scores sentiment with supervised models trained on user ratings. A Python GUI (RealTime_UGC_Analysis_GUI.py) orchestrates the workflow and stores raw data in local TXT files. To reconcile inconsistencies between ratings and comments, it generates comparative visualizations across dimensions such as topic-specific sentiment, repeat booking rates, and booking trends over time.
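The segmentation, topic-assignment, and sentiment steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the keyword lists, the names `TOPIC_KEYWORDS`, `split_clauses`, `topics_of`, `polarity`, and `analyze` are all hypothetical, POS tagging is omitted, and a tiny word-list scorer stands in for the supervised sentiment model.

```python
import re
from collections import defaultdict

# Hypothetical topic dictionary (topic -> keywords); the project's real
# dictionary is not shown in the source.
TOPIC_KEYWORDS = {
    "location": ["位置", "地铁", "location", "subway"],
    "cleanliness": ["干净", "卫生", "脏", "clean", "dirty"],
    "host": ["房东", "host"],
}

# Toy sentiment word lists, standing in for the supervised model.
POSITIVE = {"好", "干净", "方便", "great", "clean", "friendly"}
NEGATIVE = {"差", "脏", "吵", "dirty", "noisy", "rude"}

# Punctuation rule: break on common Chinese and ASCII clause delimiters.
CLAUSE_BREAK = re.compile(r"[,。!?;,.!?;]+")


def split_clauses(review):
    """Split a review into clauses on punctuation, dropping empty pieces."""
    return [c.strip() for c in CLAUSE_BREAK.split(review) if c.strip()]


def topics_of(clause):
    """Return every topic whose keywords occur as substrings of the clause."""
    lowered = clause.lower()
    return [topic for topic, keywords in TOPIC_KEYWORDS.items()
            if any(k in lowered for k in keywords)]


def polarity(clause):
    """Return 1, 0, or -1 from the balance of positive and negative words."""
    lowered = clause.lower()
    score = (sum(w in lowered for w in POSITIVE)
             - sum(w in lowered for w in NEGATIVE))
    return (score > 0) - (score < 0)


def analyze(review):
    """Map each topic mentioned in the review to its clause polarities."""
    result = defaultdict(list)
    for clause in split_clauses(review):
        label = polarity(clause)
        for topic in topics_of(clause):
            result[topic].append(label)
    return dict(result)


if __name__ == "__main__":
    print(analyze("The location is great, but the room was dirty."))
    print(analyze("位置很好,但是房间很脏"))
```

Scoring each clause separately is what lets one review carry a positive label for one topic and a negative label for another, which a single overall rating cannot express. Substring matching is deliberately crude: it ignores negation (for example 不干净) and would need a tokenizer and a trained classifier in any real setting.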
No commits in the last 6 months.
Stars: 82
Forks: 24
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 01, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CarryChang/Real_Time_DataMining_Software"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
Data-Science-Community-SRM/Forecasting-US-Elections
Extract tweets and perform sentiment analysis on the presidential candidature of Donald...
steveee27/Multiclass-Text-Classification-of-Presidential-Campaign-Tweets
Explore the Indonesian presidential campaign of 2024 through advanced text classification. This...
MPKuchciak/Twitter
Sentiment of Polish politicians in tweets
smbanaie/twitter-persian-nlp
A repository for reading and processing Persian tweets for text analysis and natural language processing algorithms
josephpcowell/cowell_proj_4
Metis Project 4: Unsupervised Machine Learning and NLP (exploring Tweets about veganism)