MM-IQ Benchmark

🌐 Homepage | 🏆 Leaderboard | 🤗 MM-IQ | 📖 MM-IQ Paper

This repo provides the evaluation code for the MM-IQ benchmark.

Introduction

IQ testing has served as a foundational methodology for evaluating human cognitive capabilities, deliberately decoupling assessment from linguistic background, language proficiency, and domain-specific knowledge in order to isolate core competencies in abstraction and reasoning. Yet artificial intelligence research currently lacks systematic benchmarks that quantify these cognitive dimensions in multimodal systems. To address this gap, we propose MM-IQ, a comprehensive evaluation framework comprising 2,710 meticulously curated test items spanning 8 distinct reasoning paradigms.

Through systematic evaluation of leading open-source and proprietary multimodal models, our benchmark reveals striking limitations: even state-of-the-art architectures perform only marginally better than random chance (27.49% vs. a 25% baseline accuracy). This substantial performance gap highlights the inadequacy of current multimodal systems in approximating fundamental human reasoning capacities, underscoring the need for paradigm-shifting advancements to bridge this cognitive divide.

Dataset Curation

For more detailed information, please refer to MM-IQ Dataset.

We have uploaded a demo to illustrate how to access the MM-IQ dataset on Hugging Face, available at hugging_face_dataset_demo.ipynb.
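The notebook is the authoritative reference. As a rough illustration, a minimal sketch of loading the dataset with the `datasets` library might look like the following; the repo id, split name, and field layout shown here are assumptions and should be checked against the demo notebook.

```python
# Minimal sketch (assumptions: repo id "huanqia/MM-IQ" and a "test" split;
# verify both against hugging_face_dataset_demo.ipynb).
from datasets import load_dataset

dataset = load_dataset("huanqia/MM-IQ", split="test")

# Inspect one item to see which fields (puzzle image, question text, answer,
# reasoning-pattern label, ...) the dataset actually exposes.
sample = dataset[0]
print(sample.keys())
```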

Evaluation

Please refer to MM-IQ Evaluation for more detailed information.

🎯 MM-IQ Evaluation

  • We have released the MM-IQ dataset with 2,710 problems spanning eight reasoning patterns.
  • With our evaluation folder, you can use the LMM's raw response or the parsed prediction as input to compute performance on MM-IQ; see the hypothetical scoring sketch after this list.
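The snippet below is not the official scoring script; it is a hypothetical sketch of how parsed predictions could be compared against ground-truth answers to obtain overall and per-pattern accuracy. The file name and the "category", "answer", and "prediction" fields are assumptions; the evaluation folder defines the actual input format.

```python
# Hypothetical scoring sketch: compare parsed predictions to ground truth
# and report accuracy per reasoning pattern plus overall accuracy.
import json
from collections import defaultdict

# Assumed input: a JSON list of records with "category", "answer",
# and "prediction" fields (hypothetical file name and schema).
with open("parsed_predictions.json") as f:
    records = json.load(f)

correct, total = defaultdict(int), defaultdict(int)
for r in records:
    pattern = r["category"]
    total[pattern] += 1
    correct[pattern] += int(r["prediction"] == r["answer"])

for pattern in sorted(total):
    print(f"{pattern}: {correct[pattern] / total[pattern]:.2%} ({total[pattern]} items)")
print(f"Overall: {sum(correct.values()) / sum(total.values()):.2%}")
```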

Disclaimers

The guidelines for the annotators emphasized strict compliance with copyright and licensing rules from the initial data source, specifically avoiding materials from websites that forbid copying and redistribution. If you encounter any data samples potentially breaching the copyright or licensing regulations of any site, we encourage you to contact us. Upon verification, such samples will be promptly removed.

Contact

Acknowledgment

Our code is based on MMMU. We thank the authors of MMMU for their great work and clean code.

Citation

@article{cai2025mm,
  title={MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models},
  author={Cai, Huanqia and Yang, Yijun and Hu, Winston},
  journal={arXiv preprint arXiv:2502.00698},
  year={2025}
}