
A collection of papers on performance evaluation.

Topics covered:

  • Transferability Estimation
  • Model/Dataset Vectorization
  • Model/Algorithm/Representation Evaluation
  • Generalization Gap Prediction
  • Out-of-distribution Error Prediction
  • Accuracy Prediction
  • Model Validation
  • Calibration Error Prediction
  • Confidence Calibration

Survey

  • A Survey on Evaluation of Out-of-Distribution Generalization [Paper]
  • Which Model to Transfer? A Survey on Transferability Estimation [Paper]
  • A Survey of Language Model Confidence Estimation and Calibration [Paper]
  • Calibration of Neural Networks [Paper]

2024

  • Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress [Paper]
  • Energy-based Automated Model Evaluation [Paper]
  • Rethinking The Uniformity Metric in Self-Supervised Learning [Paper]
  • Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning [Paper]
  • Revisiting Confidence Estimation: Towards Reliable Failure Prediction [TPAMI] [Code]
    • Conference ver. : Rethinking Confidence Calibration for Failure Prediction [ECCV22]
  • Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets [Paper]
  • Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks [Paper]
  • Online GNN Evaluation Under Test-time Graph Distribution Shifts [ICLR]

2023

  • Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples [ArXiv]
  • K-Means Clustering Based Feature Consistency Alignment for Label-Free Model Evaluation [CVPR Workshop]
  • Predicting Out-of-Domain Generalization with Neighborhood Invariance [TMLR]
  • Test Accuracy vs. Generalization Gap: Model Selection in NLP without Accessing Training or Testing Data [SIGKDD]
  • Analysis of Task Transferability in Large Pre-trained Classifiers [Under Review]
  • A Bag-of-Prototypes Representation for Dataset-Level Applications [CVPR]
  • DataMap: Dataset transferability map for medical image classification [PR]
  • To transfer or not transfer: Unified transferability metric and analysis [ArXiv]
  • Quantifying the impact of data characteristics on the transferability of sleep stage scoring models [Artificial Intelligence in Medicine] [ArXiv]
  • Identification of Negative Transfers in Multitask Learning Using Surrogate Models [TMLR] [ArXiv]
  • Model selection, adaptation, and combination for transfer learning in wind and photovoltaic power forecasts [Energy and AI]
  • Identifying Useful Learnwares for Heterogeneous Label Spaces [ICML]
  • Transferability prediction among classification and regression tasks using optimal transport [Multimedia Tools and Applications]
  • Choosing public datasets for private machine learning via gradient subspace distance [Paper]
  • Learning to Predict Task Transferability via Soft Prompt [Paper]
  • Understanding Few-Shot Learning: Measuring Task Relatedness and Adaptation Difficulty via Attributes [Paper]
  • Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [ICCV]
  • How to Estimate Model Transferability of Pre-Trained Speech Models? [Interspeech]
  • TaskWeb: Selecting Better Source Tasks for Multi-task NLP [Paper](https://arxiv.org/abs/2305.13256)
  • Feasibility and Transferability of Transfer Learning: A Mathematical Framework [ArXiv]
  • Topological Vanilla Transfer Learning [Paper]
  • Model Spider: Learning to Rank Pre-Trained Models Efficiently [ArXiv]
  • Towards Estimating Transferability using Hard Subsets [ArXiv]
  • Pick the Best Pre-trained Model: Towards Transferability Estimation for Medical Image Segmentation [MICCAI]
  • Simple Transferability Estimation for Regression Tasks [UAI]
  • Transferability Metrics for Object Detection [ArXiv]
  • Fast and Accurate Transferability Measurement by Evaluating Intra-class Feature Variance [ArXiv]
  • ETran: Energy-Based Transferability Estimation [ICCV]
  • How Far Pre-trained Models Are from Neural Collapse on the Target Dataset Informs their Transferability [ICCV]
  • Exploring Model Transferability through the Lens of Potential Energy [ICCV]
  • Unleashing the power of Neural Collapse for Transferability Estimation [ArXiv]
  • Foundation Model is Efficient Multimodal Multitask Model Selector [ArXiv]
  • [multi-model] Towards Robust Multi-Modal Reasoning via Model Selection [ArXiv]
  • Graph-based fine-grained model selection for multi-source domain [PAA]
  • Guided Recommendation for Model Fine-Tuning [CVPR]
  • Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How [ArXiv]
  • Estimating the Transfer Learning Ability of a Deep Neural Networks by Means of Representations [NCMLCR]
  • Source Selection based on Diversity for Machine Learning [Patent]
  • Efficient Prediction of Model Transferability in Semantic Segmentation Tasks [ICIP]
  • The Performance of Transferability Metrics Does Not Translate to Medical Tasks [MICCAI workshop]
  • How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey [ArXiv]
  • LOVM: Language-Only Vision Model Selection [NeurIPSW]
  • RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank [ICML Oral]
  • T-Measure: A Measure for Model Transferability [Under Review]
  • Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [TMLR]
  • Domain Adaptation for Network Performance Modeling with and without Labeled Data [NOMS]
  • Content-Based Search for Deep Generative Models [ArXiv]
  • GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels [NeurIPS]
  • Learning inter-task transferability in the absence of target task samples [Paper]
  • Model selection for cross-lingual transfer [Paper]
  • ModelGiF: Gradient Fields for Model Functional Distance [Paper]
  • Predicting Out-of-Distribution Error with Confidence Optimal Transport [Paper]
  • CAME: Contrastive Automated Model Evaluation [ICCV]
  • On the Importance of Feature Separability in Predicting Out-Of-Distribution Error [NeurIPS]
  • Characterizing out-of-distribution error via optimal transport [NeurIPS]
  • What can we Learn by Predicting Accuracy? [WACV]
  • CIFAR-10-Warehouse: Broad and more realistic testbeds in model generalization analysis [Paper]

2022

  • On the Relationship Between Explanation and Prediction: A Causal View [ArXiv]
  • The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs [AIR]
  • Generalization Bounds for Deep Transfer Learning Using Majority Predictor Accuracy [ISITA]
  • Transferability Estimation Based On Principal Gradient Expectation [ArXiv]
  • Transferability-Guided Cross-Domain Cross-Task Transfer Learning [ArXiv]
  • Wasserstein Task Embedding for Measuring Task Similarities [ArXiv][Code]
  • Efficiently tuned parameters are task embeddings [Paper]
  • Fisher task distance and its application in neural architecture search [Paper]
  • Leveraging task transferability to meta-learning for clinical section classification with limited data [Paper]
  • Transferability Between Regression Tasks [Paper]
  • Dataset2Vec: Learning dataset meta-features [Paper]
  • CogTaskonomy: Cognitively Inspired Task Taxonomy Is Beneficial to Transfer Learning in NLP [ACL]
  • Exploring the role of task transferability in large-scale multi-task learning [Paper]
  • Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance [ECML PKDD]
  • Frustratingly Easy Transferability Estimation [ICML] [Slides]
  • Transferability Estimation Using Bhattacharyya Class Separability [CVPR]
  • Transferability Metrics for Selecting Source Model Ensembles [CVPR]
  • How stable are Transferability Metrics evaluations? [ECCV] [TensorFlow]
  • Pre-Trained Model Reusability Evaluation for Small-Data Transfer Learning [NeurIPS] [Codes]
  • Neural Transferability: Current Pitfalls and Striving for Optimal Scores [Paper]
  • Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs [JMLR]
  • PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks [ECCV] [Codes]
  • Which Model to Transfer? Finding the Needle in the Growing Haystack [CVPR]
  • Evidence > Intuition: Transferability Estimation for Encoder Selection [EMNLP]
  • Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space [ECCV]
  • Efficient Semantic Segmentation Backbone Evaluation for Unmanned Surface Vehicles based on Likelihood Distribution Estimation [MSN]
  • ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization [NeurIPS]
  • Predicting Out-of-Distribution Error with the Projection Norm [Paper]
  • Agreement-on-the-line: Predicting the performance of neural networks under distribution shift [NeurIPS]
  • Leveraging unlabeled data to predict out-of-distribution performance [Paper]
  • Estimating and Explaining Model Performance When Both Covariates and Labels Shift [NeurIPS]
  • Unsupervised and semi-supervised bias benchmarking in face recognition [ECCV]
  • On the strong correlation between model invariance and generalization [NeurIPS]
  • Active surrogate estimators: An active learning approach to label-efficient model evaluation [NeurIPS]
  • Predicting out-of-domain generalization with local manifold smoothness [Paper]
  • Predicting the Generalization Gap in Deep Models using Anchoring [ICASSP]

2021

  • A Mathematical Framework for Quantifying Transferability in Multi-source Transfer Learning [NeurIPS]
  • Transferability Estimation for Semantic Segmentation Task
  • OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations [CVPR] [Poster]
  • Practical Transferability Estimation for Image Classification Tasks [ArXiv]
  • What to pre-train on? Efficient intermediate task selection [EMNLP]
  • Efficiently identifying task groupings for multi-task learning [NeurIPS]
  • The information complexity of learning tasks, their structure and their distance [Paper]
  • An information-geometric distance on the space of tasks [Paper](https://proceedings.mlr.press/v139/gao21a.html)
  • ImageDataset2Vec: An image dataset embedding for algorithm selection [Paper]
  • Similarity of classification tasks [Paper]
  • Cats, not CAT scans: a study of dataset similarity in transfer learning for 2D medical image classification [Paper]
  • Analysis and Prediction of NLP models via Task Embeddings [Paper]
  • Inter-task similarity measure for heterogeneous tasks [Paper]
  • Ranking Neural Checkpoints [CVPR]
  • LogME: Practical Assessment of Pre-trained Models for Transfer Learning [ICML] [PyTorch]
  • Scalable Diverse Model Selection for Accessible Transfer Learning [NeurIPS] [PyTorch]
  • A linearized framework and a new benchmark for model selection for fine-tuning [ArXiv]
  • Are Labels Always Necessary for Classifier Accuracy Evaluation? [ICCV]
  • Predicting With Confidence on Unseen Distributions [ICCV]
  • What does rotation prediction tell us about classifier accuracy under varying testing environments? [ICML]
  • Detecting errors and estimating accuracy on unlabeled data with self-training ensembles [NeurIPS]
  • Ranking models in unlabeled new environments [ICCV]

2020

  • Duality diagram similarity: a generic framework for initialization selection in task transfer learning [ECCV]
  • Exploring and Predicting Transferability across NLP Tasks [EMNLP]
  • Geometric Dataset Distances via Optimal Transport [NeurIPS]
  • Similarity of neural networks with gradients [Paper]
  • Measuring and Harnessing Transference in Multi-Task Learning [ArXiv]
  • LEEP: A New Measure to Evaluate Transferability of Learned Representations [ICML] [Slides] [PyTorch]
  • Source Model Selection for Deep Learning in the Time Series Domain [IEEE Access]
  • Ranking and rejecting of pre-trained deep neural networks in transfer learning based on separation index [ArXiv]
  • DEPARA: Deep Attribution Graph for Deep Knowledge Transferability [Paper]
  • Predicting neural network accuracy from weights [Paper]
  • Computing the testing error without a testing set [CVPR]
  • Fantastic generalization measures and where to find them [ICLR]

2019

  • TASK2VEC: Task Embedding for Meta-Learning [ICCV]
  • Finding the Most Transferable Tasks for Brain Image Segmentation [BIBM]
  • Zero-Shot Task Transfer
  • Transferability and Hardness of Supervised Classification Tasks [ICCV]
  • An information-theoretic approach to transferability in task transfer learning [ICIP] [Codes]
  • Model reuse with reduced kernel mean embedding specification [ArXiv]
  • Service Metric Prediction in Clouds using Transfer Learning [DiVA]
  • Predicting the Generalization Gap in Deep Networks with Margin Distributions [ICLR]

2018

  • Taskonomy: Disentangling Task Transfer Learning [CVPR Best Paper]
  • Dynamics and reachability of learning tasks [Paper]
  • Stronger generalization bounds for deep nets via a compression approach [ICML]

2017

  • Exploring generalization in deep learning [NeurIPS]
  • Estimating accuracy from unlabeled data: A probabilistic logic approach [NeurIPS]

2016

  • Learning to Select Pre-trained Deep Representations with Bayesian Evidence Framework [CVPR]
  • Learning with rejection [Paper]
  • Estimating accuracy from unlabeled data: A Bayesian approach [ICML]

2004

  • Using model disagreement on unlabeled data to validate classification algorithms [NeurIPS]