luisrui/Modality-Interference-in-MLLMs
The source code for the paper "Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models"
This project helps AI researchers and machine learning engineers improve the reliability of their Multimodal Large Language Models (MLLMs). It provides tools to diagnose why MLLMs are sometimes misled by irrelevant information from one modality (for example, a distracting image paired with a text-only question) and offers training methods for more robust models. The input is an existing MLLM and training data; the output is a fine-tuned MLLM that performs better on tasks that should rely on a single modality.
No commits in the last 6 months.
Use this if you are developing or deploying MLLMs and need to ensure they can accurately distinguish between relevant and irrelevant information across different modalities, especially for tasks that should rely on a single input type.
Not ideal if you are not working with Multimodal Large Language Models, or if robustness against irrelevant modality inputs is not a concern for your use case.
Stars
7
Forks
—
Language
Python
License
—
Category
Last pushed
Sep 24, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/luisrui/Modality-Interference-in-MLLMs"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
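A minimal Python sketch of calling the endpoint above, using only the standard library. The response schema is not documented here, so the example simply decodes and returns the JSON; the helper names (`build_url`, `fetch_quality`) are illustrative, not part of the API.

```python
import json
import urllib.request

# Base path of the quality endpoint shown in the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Construct the quality-data URL for a given GitHub repository."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the JSON quality record (no key needed; 100 requests/day)."""
    with urllib.request.urlopen(build_url(owner, repo)) as resp:
        return json.load(resp)
```

Usage: `fetch_quality("luisrui", "Modality-Interference-in-MLLMs")` returns the same JSON document as the curl command above.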
Higher-rated alternatives
jingyaogong/minimind-v
🚀 Train a 26M-parameter vision-language model (VLM) from scratch in just 1 hour! 🌏
SkyworkAI/Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in...
roboflow/vision-ai-checkup
Take your LLM to the optometrist.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o, an open-source multimodal chat model approaching GPT-4o performance
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video...