A collection of papers on performance evaluation.
- Transferability Estimation
- Model/Dataset Vectorization
- Model/Algorithm/Representation Evaluation
- Generalization Gap Prediction
- Out-of-distribution Error Prediction
- Accuracy Prediction
- Model Validation
- Calibration Error Prediction
- Confidence Calibration
- A Survey on Evaluation of Out-of-Distribution Generalization [Paper]
- Which Model to Transfer? A Survey on Transferability Estimation [Paper]
- A Survey of Language Model Confidence Estimation and Calibration [Paper]
- Calibration of Neural Networks [Paper]
- Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress [Paper]
- Energy-based Automated Model Evaluation [Paper]
- Rethinking The Uniformity Metric in Self-Supervised Learning [Paper]
- Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning [Paper]
- Revisiting Confidence Estimation: Towards Reliable Failure Prediction [TPAMI] [Code]
- Conference version: Rethinking Confidence Calibration for Failure Prediction [ECCV22]
- Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets [Paper]
- Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks [Paper]
- Online GNN Evaluation Under Test-time Graph Distribution Shifts [ICLR]
- Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples [ArXiv]
- K-Means Clustering Based Feature Consistency Alignment for Label-Free Model Evaluation [CVPR Workshop]
- Predicting Out-of-Domain Generalization with Neighborhood Invariance [TMLR]
- Test Accuracy vs. Generalization Gap: Model Selection in NLP without Accessing Training or Testing Data [SIGKDD]
- Analysis of Task Transferability in Large Pre-trained Classifiers [Under Review]
- A Bag-of-Prototypes Representation for Dataset-Level Applications [CVPR]
- DataMap: Dataset transferability map for medical image classification [PR]
- To transfer or not transfer: Unified transferability metric and analysis [ArXiv]
- Identification of Negative Transfers in Multitask Learning Using Surrogate Models [TMLR] [ArXiv]
- Quantifying the impact of data characteristics on the transferability of sleep stage scoring models [AIM]
- Model selection, adaptation, and combination for transfer learning in wind and photovoltaic power forecasts [Energy and AI]
- Identifying Useful Learnwares for Heterogeneous Label Spaces [ICML]
- Transferability prediction among classification and regression tasks using optimal transport [Multimedia Tools and Applications]
- Choosing public datasets for private machine learning via gradient subspace distance [Paper]
- Learning to Predict Task Transferability via Soft Prompt [Paper]
- Understanding Few-Shot Learning: Measuring Task Relatedness and Adaptation Difficulty via Attributes [Paper]
- Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [ICCV]
- How to Estimate Model Transferability of Pre-Trained Speech Models? [InterSpeech]
- TaskWeb: Selecting Better Source Tasks for Multi-task NLP [[Paper]](https://arxiv.org/abs/2305.13256)
- Feasibility and Transferability of Transfer Learning: A Mathematical Framework [ArXiv]
- Topological Vanilla Transfer Learning [Paper]
- Model Spider: Learning to Rank Pre-Trained Models Efficiently [ArXiv]
- Towards Estimating Transferability using Hard Subsets [ArXiv]
- Pick the Best Pre-trained Model: Towards Transferability Estimation for Medical Image Segmentation [MICCAI]
- Simple Transferability Estimation for Regression Tasks [UAI]
- Transferability Metrics for Object Detection [ArXiv]
- Fast and Accurate Transferability Measurement by Evaluating Intra-class Feature Variance [ArXiv]
- ETran: Energy-Based Transferability Estimation [ICCV]
- How Far Pre-trained Models Are from Neural Collapse on the Target Dataset Informs their Transferability [ICCV]
- Exploring Model Transferability through the Lens of Potential Energy [ICCV]
- Unleashing the power of Neural Collapse for Transferability Estimation [ArXiv]
- Foundation Model is Efficient Multimodal Multitask Model Selector [ArXiv]
- Towards Robust Multi-Modal Reasoning via Model Selection [ArXiv]
- Graph-based fine-grained model selection for multi-source domain [PAA]
- Guided Recommendation for Model Fine-Tuning [CVPR]
- Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How [ArXiv]
- Estimating the Transfer Learning Ability of a Deep Neural Networks by Means of Representations [NCMLCR]
- Source Selection based on Diversity for Machine Learning [Patent]
- Efficient Prediction of Model Transferability in Semantic Segmentation Tasks [ICIP]
- The Performance of Transferability Metrics Does Not Translate to Medical Tasks [MICCAI workshop]
- How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey [ArXiv]
- LOVM: Language-Only Vision Model Selection [NeurIPSW]
- RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank [ICML Oral]
- T-Measure: A Measure for Model Transferability [Under Review]
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [TMLR]
- Domain Adaptation for Network Performance Modeling with and without Labeled Data [NOMS]
- Content-Based Search for Deep Generative Models [ArXiv]
- GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels [NeurIPS]
- Learning inter-task transferability in the absence of target task samples [Paper]
- Model selection for cross-lingual transfer [Paper]
- ModelGiF: Gradient Fields for Model Functional Distance [Paper]
- Predicting Out-of-Distribution Error with Confidence Optimal Transport [Paper]
- CAME: Contrastive Automated Model Evaluation [ICCV]
- On the Importance of Feature Separability in Predicting Out-Of-Distribution Error [NeurIPS]
- Characterizing out-of-distribution error via optimal transport [NeurIPS]
- What can we Learn by Predicting Accuracy? [WACV]
- CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis [Paper]
- On the Relationship Between Explanation and Prediction: A Causal View [ArXiv]
- The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs [AIR]
- Generalization Bounds for Deep Transfer Learning Using Majority Predictor Accuracy [ISITA]
- Transferability Estimation Based On Principal Gradient Expectation [ArXiv]
- Transferability-Guided Cross-Domain Cross-Task Transfer Learning [ArXiv]
- Wasserstein Task Embedding for Measuring Task Similarities [ArXiv] [Code]
- Efficiently tuned parameters are task embeddings [Paper]
- Fisher task distance and its application in neural architecture search [Paper]
- Leveraging task transferability to meta-learning for clinical section classification with limited data [Paper]
- Transferability Between Regression Tasks [Paper]
- Dataset2vec: Learning dataset meta-features [Paper]
- CogTaskonomy: Cognitively Inspired Task Taxonomy Is Beneficial to Transfer Learning in NLP [ACL]
- Exploring the role of task transferability in large-scale multi-task learning [Paper]
- Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance [ECML PKDD]
- Frustratingly Easy Transferability Estimation [ICML] [Slides]
- Transferability Estimation Using Bhattacharyya Class Separability [CVPR]
- Transferability Metrics for Selecting Source Model Ensembles [CVPR]
- How stable are Transferability Metrics evaluations? [ECCV] [TensorFlow]
- Pre-Trained Model Reusability Evaluation for Small-Data Transfer Learning [NeurIPS] [Codes]
- Neural Transferability: Current Pitfalls and Striving for Optimal Scores [Paper]
- Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs [JMLR]
- PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks [ECCV] [Codes]
- Which Model to Transfer? Finding the Needle in the Growing Haystack [CVPR]
- Evidence > Intuition: Transferability Estimation for Encoder Selection [EMNLP]
- Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space [ECCV]
- Efficient Semantic Segmentation Backbone Evaluation for Unmanned Surface Vehicles based on Likelihood Distribution Estimation [MSN]
- ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization [NeurIPS]
- Predicting Out-of-Distribution Error with the Projection Norm [Paper]
- Agreement-on-the-line: Predicting the performance of neural networks under distribution shift [NeurIPS]
- Leveraging unlabeled data to predict out-of-distribution performance [Paper]
- Estimating and Explaining Model Performance When Both Covariates and Labels Shift [NeurIPS]
- Unsupervised and semi-supervised bias benchmarking in face recognition [ECCV]
- On the strong correlation between model invariance and generalization [NeurIPS]
- Active surrogate estimators: An active learning approach to label-efficient model evaluation [NeurIPS]
- Predicting out-of-domain generalization with local manifold smoothness [Paper]
- Predicting the Generalization Gap in Deep Models using Anchoring [ICASSP]
- A Mathematical Framework for Quantifying Transferability in Multi-source Transfer Learning [NeurIPS]
- Transferability Estimation for Semantic Segmentation Task
- OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations [CVPR] [Poster]
- Practical Transferability Estimation for Image Classification Tasks [ArXiv]
- What to pre-train on? efficient intermediate task selection [EMNLP]
- Efficiently identifying task groupings for multi-task learning [NeurIPS]
- The information complexity of learning tasks, their structure and their distance [Paper]
- An information-geometric distance on the space of tasks [[Paper]](https://proceedings.mlr.press/v139/gao21a.html)
- ImageDataset2Vec: An image dataset embedding for algorithm selection [Paper]
- Similarity of classification tasks [Paper]
- Cats, not CAT scans: a study of dataset similarity in transfer learning for 2D medical image classification [Paper]
- Analysis and Prediction of NLP models via Task Embeddings [Paper]
- Inter-task similarity measure for heterogeneous tasks [Paper]
- Ranking Neural Checkpoints [CVPR]
- LogME: Practical Assessment of Pre-trained Models for Transfer Learning [ICML] [PyTorch]
- Scalable Diverse Model Selection for Accessible Transfer Learning [NeurIPS] [PyTorch]
- A linearized framework and a new benchmark for model selection for fine-tuning [ArXiv]
- Are Labels Always Necessary for Classifier Accuracy Evaluation? [ICCV]
- Predicting With Confidence on Unseen Distributions [ICCV]
- What does rotation prediction tell us about classifier accuracy under varying testing environments? [ICML]
- Detecting errors and estimating accuracy on unlabeled data with self-training ensembles [NeurIPS]
- Ranking models in unlabeled new environments [ICCV]
- Duality diagram similarity: a generic framework for initialization selection in task transfer learning [ECCV]
- Exploring and Predicting Transferability across NLP Tasks [EMNLP]
- Geometric Dataset Distances via Optimal Transport [NeurIPS]
- Similarity of neural networks with gradients [Paper]
- Measuring and Harnessing Transference in Multi-Task Learning [Ar]
- LEEP: A New Measure to Evaluate Transferability of Learned Representations [ICML] [Slides] [PyTorch]
- Source Model Selection for Deep Learning in the Time Series Domain [IEEE Access]
- Ranking and rejecting of pre-trained deep neural networks in transfer learning based on separation index [ArXiv]
- DEPARA: Deep Attribution Graph for Deep Knowledge Transferability [Paper]
- Predicting neural network accuracy from weights [Paper]
- Computing the testing error without a testing set [CVPR]
- Fantastic generalization measures and where to find them [ICLR]
- TASK2VEC: Task Embedding for Meta-Learning [ICCV]
- Finding the Most Transferable Tasks for Brain Image Segmentation [BIBM]
- Zero-Shot Task Transfer
- Transferability and Hardness of Supervised Classification Tasks [ICCV]
- An information-theoretic approach to transferability in task transfer learning [ICIP] [Codes]
- Model reuse with reduced kernel mean embedding specification [ArXiv]
- Service Metric Prediction in Clouds using Transfer Learning [DiVA]
- Predicting the Generalization Gap in Deep Networks with Margin Distributions [ICLR]
- Taskonomy: Disentangling Task Transfer Learning [CVPR Best Paper]
- Dynamics and reachability of learning tasks [Paper]
- Stronger generalization bounds for deep nets via a compression approach [ICML]
- Exploring generalization in deep learning [NeurIPS]
- Estimating accuracy from unlabeled data: A probabilistic logic approach [NeurIPS]
- Learning to Select Pre-trained Deep Representations with Bayesian Evidence Framework [CVPR]
- Learning with rejection [Paper]
- Estimating accuracy from unlabeled data: A Bayesian approach [ICML]
- Using model disagreement on unlabeled data to validate classification algorithms [NeurIPS]