MP is a database that curates inorganic materials with computed properties, including thermal, electrical, and mechanical properties, among others.
MatBench is a benchmark that provides a standardized framework for evaluating and comparing the performance of different machine learning models on various materials science tasks. It curates data from multiple sources, with MP as the main one. However, it provides neither machine-learning-ready data preparation nor implemented machine learning models and workflows.
QMOF is a comprehensive database that focuses on metal-organic frameworks (MOFs) with quantum-chemical properties. The MOFs are drawn from both experimental and hypothetical MOF databases and are optimized with DFT.
OMDB is a repository of organic materials. The properties are calculated using DFT for crystal structures contained in the COD database (see the additional data sources in Appendix~\ref{sec:add_data}).
JARVIS is a database that integrates materials data from various sources, including quantum mechanical calculations, materials simulations, machine learning predictions, and high-throughput databases. Our datasets DFT3D, DFT2D, and EDOS-PDOS are all from the JARVIS database.
OC is a database focused on catalytic materials. It includes three tasks: Structure to Energy and Forces (S2EF), Initial Structure to Relaxed Structure (IS2RS), and Initial Structure to Relaxed Energy (IS2RE).
tmQM is a comprehensive database focused on transition metal-based materials. It compiles experimentally derived and computationally predicted data on the structure, composition, and electronic properties of transition metal compounds.
QM9 comprises small organic molecules with up to 9 heavy atoms, each annotated with 12 quantum chemical properties.
Carbon24 is a synthetic dataset of materials composed of carbon atoms but with different structures, obtained by \textit{ab initio} random structure searching.
Perov5 is a synthetic dataset that includes perovskite materials with the same structure but different compositions.