Zhennor/Multimodal-Video-Retrieval-Engine-with-Vision-and-Text
A video search engine that combines OCR, ASR, CLIP, image captioning, and object and color detection. It retrieves video content by text, speech, images, objects, and colors.
No commits in the last 6 months.
Stars: 4
Forks: 4
Language: —
License: —
Category:
Last pushed: Jan 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Zhennor/Multimodal-Video-Retrieval-Engine-with-Vision-and-Text"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
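For scripted use, the curl request above can be sketched in Python using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body. The response schema and the way a free key is passed are not described on this page, so neither is modeled here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL in the same shape as the curl command."""
    # Percent-encode each path segment so unusual characters stay safe.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the body is JSON. The schema is not documented here,
    so the decoded object is returned without further interpretation.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a real request and counts toward the daily quota):
# data = fetch_quality(
#     "voice-ai",
#     "Zhennor",
#     "Multimodal-Video-Retrieval-Engine-with-Vision-and-Text",
# )
# print(json.dumps(data, indent=2))
```

With the keyless limit of 100 requests/day, a script that polls many repositories would likely want to cache results locally rather than refetch on every run.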
Higher-rated alternatives
byjlw/video-analyzer
Analyze videos using LLMs, Computer Vision and Automatic Speech Recognition
XnneHangLab/XnneHangLab
A subtitle extractor that can't chat isn't a good Bilibili downloader~
harry0703/AudioNotes
Quickly extract audio and video content and organize it into structured Markdown notes
bakaburg1/minutemaker
Generate meeting minutes from an audio recording or a transcript using speech-to-text and LLMs.
yufan-aslp/AliMeeting
The project is associated with the recently-launched ICASSP 2022 Multi-channel Multi-party...