GitHub - YuheD/awesome-performance-evaluation

A collection of papers in performance evaluation.

Involved Topics:

Transferability Estimation
Model/Dataset Vectorization
Model/Algorithm/Representation Evaluation
Generalization Gap Prediction
Out-of-distribution Error Prediction
Accuracy Prediction
Model Validation
Calibration Error Prediction
Confidence Calibration

Survey

A Survey on Evaluation of Out-of-Distribution Generalization [Paper]
Which Model to Transfer? A Survey on Transferability Estimation [Paper]
A Survey of Language Model Confidence Estimation and Calibration [Paper]
Calibration of Neural Networks [Paper]

2024

Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress [Paper]
Energy-based Automated Model Evaluation [Paper]
Rethinking The Uniformity Metric in Self-Supervised Learning [Paper]
Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning [Paper]
Revisiting Confidence Estimation: Towards Reliable Failure Prediction [TPAMI] [Code]
- Conference version: Rethinking Confidence Calibration for Failure Prediction [ECCV22]
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets [Paper]
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks [Paper]
Online GNN Evaluation Under Test-time Graph Distribution Shifts [ICLR]

2023

Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples [ArXiv]
K-Means Clustering Based Feature Consistency Alignment for Label-Free Model Evaluation [CVPR Workshop]
Predicting Out-of-Domain Generalization with Neighborhood Invariance [TMLR]
Test Accuracy vs. Generalization Gap: Model Selection in NLP without Accessing Training or Testing Data [SIGKDD]
Analysis of Task Transferability in Large Pre-trained Classifiers [Under Review]
A Bag-of-Prototypes Representation for Dataset-Level Applications [CVPR]
DataMap: Dataset transferability map for medical image classification [PR]
To transfer or not transfer: Unified transferability metric and analysis [ArXiv]
Quantifying the impact of data characteristics on the transferability of sleep stage scoring models [Artificial Intelligence in Medicine]
Identification of Negative Transfers in Multitask Learning Using Surrogate Models [TMLR] [ArXiv]
Model selection, adaptation, and combination for transfer learning in wind and photovoltaic power forecasts [Energy and AI]
Identifying Useful Learnwares for Heterogeneous Label Spaces [ICML]
Transferability prediction among classification and regression tasks using optimal transport [Multimedia Tools and Applications]
Choosing public datasets for private machine learning via gradient subspace distance [Paper]
Learning to Predict Task Transferability via Soft Prompt [Paper]
Understanding Few-Shot Learning: Measuring Task Relatedness and Adaptation Difficulty via Attributes [Paper]
Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [ICCV]
How to Estimate Model Transferability of Pre-Trained Speech Models? [InterSpeech]
TaskWeb: Selecting Better Source Tasks for Multi-task NLP [[Paper]](https://arxiv.org/abs/2305.13256)
Feasibility and Transferability of Transfer Learning: A Mathematical Framework [ArXiv]
Topological Vanilla Transfer Learning [Paper]
Model Spider: Learning to Rank Pre-Trained Models Efficiently [ArXiv]
Towards Estimating Transferability using Hard Subsets [ArXiv]
Pick the Best Pre-trained Model: Towards Transferability Estimation for Medical Image Segmentation [MICCAI]
Simple Transferability Estimation for Regression Tasks [UAI]
Transferability Metrics for Object Detection [ArXiv]
Fast and Accurate Transferability Measurement by Evaluating Intra-class Feature Variance [ArXiv]
ETran: Energy-Based Transferability Estimation [ICCV]
How Far Pre-trained Models Are from Neural Collapse on the Target Dataset Informs their Transferability [ICCV]
Exploring Model Transferability through the Lens of Potential Energy [ICCV]
Unleashing the power of Neural Collapse for Transferability Estimation [ArXiv]
Foundation Model is Efficient Multimodal Multitask Model Selector [ArXiv]
[multi-model] Towards Robust Multi-Modal Reasoning via Model Selection [ArXiv]
Graph-based fine-grained model selection for multi-source domain [PAA]
Guided Recommendation for Model Fine-Tuning [CVPR]
Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How [ArXiv]
Estimating the Transfer Learning Ability of a Deep Neural Networks by Means of Representations [NCMLCR]
Source Selection based on Diversity for Machine Learning [Patent]
Efficient Prediction of Model Transferability in Semantic Segmentation Tasks [ICIP]
The Performance of Transferability Metrics Does Not Translate to Medical Tasks [MICCAI workshop]
How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey [ArXiv]
LOVM: Language-Only Vision Model Selection [NeurIPSW]
RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank [ICML Oral]
T-Measure: A Measure for Model Transferability [Under Review]
Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [TMLR]
Domain Adaptation for Network Performance Modeling with and without Labeled Data [NOMS]
Content-Based Search for Deep Generative Models [ArXiv]
GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels [NeurIPS]
Learning inter-task transferability in the absence of target task samples [Paper]
Model selection for cross-lingual transfer [Paper]
ModelGiF: Gradient Fields for Model Functional Distance [Paper]
Predicting Out-of-Distribution Error with Confidence Optimal Transport [Paper]
CAME: Contrastive Automated Model Evaluation [ICCV]
On the Importance of Feature Separability in Predicting Out-Of-Distribution Error [NeurIPS]
Characterizing out-of-distribution error via optimal transport [NeurIPS]
What can we Learn by Predicting Accuracy? [WACV]
CIFAR-10-Warehouse: Broad and more realistic testbeds in model generalization analysis [Paper]

2022

On the Relationship Between Explanation and Prediction: A Causal View [ArXiv]
The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs [AIR]
Generalization Bounds for Deep Transfer Learning Using Majority Predictor Accuracy [ISITA]
Transferability Estimation Based On Principal Gradient Expectation [ArXiv]
Transferability-Guided Cross-Domain Cross-Task Transfer Learning [ArXiv]
Wasserstein Task Embedding for Measuring Task Similarities [ArXiv][Code]
Efficiently tuned parameters are task embeddings [Paper]
Fisher task distance and its application in neural architecture search [Paper]
Leveraging task transferability to meta-learning for clinical section classification with limited data [Paper]
Transferability Between Regression Tasks [Paper]
Dataset2vec: Learning dataset meta-features [Paper]
CogTaskonomy: Cognitively Inspired Task Taxonomy Is Beneficial to Transfer Learning in NLP [ACL]
Exploring the role of task transferability in large-scale multi-task learning [Paper]
Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance [ECML PKDD]
Frustratingly Easy Transferability Estimation [ICML] [Slides]
Transferability Estimation Using Bhattacharyya Class Separability [CVPR]
Transferability Metrics for Selecting Source Model Ensembles [CVPR]
How stable are Transferability Metrics evaluations? [ECCV] [TensorFlow]
Pre-Trained Model Reusability Evaluation for Small-Data Transfer Learning [NeurIPS] [Codes]
Neural Transferability: Current Pitfalls and Striving for Optimal Scores [Paper]
Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs [JMLR]
PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks [ECCV] [Codes]
Which Model to Transfer? Finding the Needle in the Growing Haystack [CVPR]
Evidence > Intuition: Transferability Estimation for Encoder Selection [EMNLP]
Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space [ECCV]
Efficient Semantic Segmentation Backbone Evaluation for Unmanned Surface Vehicles based on Likelihood Distribution Estimation [MSN]
ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization [NeurIPS]
Predicting Out-of-Distribution Error with the Projection Norm [Paper]
Agreement-on-the-line: Predicting the performance of neural networks under distribution shift [NeurIPS]
Leveraging unlabeled data to predict out-of-distribution performance [Paper]
Estimating and Explaining Model Performance When Both Covariates and Labels Shift [NeurIPS]
Unsupervised and semi-supervised bias benchmarking in face recognition [ECCV]
On the strong correlation between model invariance and generalization [NeurIPS]
Active surrogate estimators: An active learning approach to label-efficient model evaluation [NeurIPS]
Predicting out-of-domain generalization with local manifold smoothness [Paper]
Predicting the Generalization Gap in Deep Models using Anchoring [ICASSP]

2021

A Mathematical Framework for Quantifying Transferability in Multi-source Transfer Learning [NeurIPS]
Transferability Estimation for Semantic Segmentation Task []
OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations [CVPR] [Poster]
Practical Transferability Estimation for Image Classification Tasks [ArXiv]
What to pre-train on? efficient intermediate task selection [EMNLP]
Efficiently identifying task groupings for multi-task learning [NeurIPS]
The information complexity of learning tasks, their structure and their distance [Paper]
An information-geometric distance on the space of tasks [[Paper]](https://proceedings.mlr.press/v139/gao21a.html)
ImageDataset2Vec: An image dataset embedding for algorithm selection [Paper]
Similarity of classification tasks [Paper]
Cats, not CAT scans: a study of dataset similarity in transfer learning for 2D medical image classification [Paper]
Analysis and Prediction of NLP models via Task Embeddings [Paper]
Inter-task similarity measure for heterogeneous tasks [Paper]
Ranking Neural Checkpoints [CVPR]
LogME: Practical Assessment of Pre-trained Models for Transfer Learning [ICML] [PyTorch]
Scalable Diverse Model Selection for Accessible Transfer Learning [NeurIPS] [PyTorch]
A linearized framework and a new benchmark for model selection for fine-tuning [ArXiv]
Are Labels Always Necessary for Classifier Accuracy Evaluation? [ICCV]
Predicting With Confidence on Unseen Distributions [ICCV]
What does rotation prediction tell us about classifier accuracy under varying testing environments? [ICML]
Detecting errors and estimating accuracy on unlabeled data with self-training ensembles [NeurIPS]
Ranking models in unlabeled new environments [ICCV]

2020

Duality diagram similarity: a generic framework for initialization selection in task transfer learning [ECCV]
Exploring and Predicting Transferability across NLP Tasks [EMNLP]
Geometric Dataset Distances via Optimal Transport [NeurIPS]
Similarity of neural networks with gradients [Paper]
Measuring and Harnessing Transference in Multi-Task Learning [ArXiv]
LEEP: A New Measure to Evaluate Transferability of Learned Representations [ICML] [Slides] [PyTorch]
Source Model Selection for Deep Learning in the Time Series Domain [IEEE Access]
Ranking and rejecting of pre-trained deep neural networks in transfer learning based on separation index [ArXiv]
DEPARA: Deep Attribution Graph for Deep Knowledge Transferability [Paper]
Predicting neural network accuracy from weights [Paper]
Computing the testing error without a testing set [CVPR]
Fantastic generalization measures and where to find them [ICLR]

2019

TASK2VEC: Task Embedding for Meta-Learning [ICCV]
Finding the Most Transferable Tasks for Brain Image Segmentation [BIBM]
Wasserstein Task Embedding for Measuring Task Similarities [ArXiv]
Zero-Shot Task Transfer
Transferability and Hardness of Supervised Classification Tasks [ICCV]
An information-theoretic approach to transferability in task transfer learning [ICIP] [Codes]
Model reuse with reduced kernel mean embedding specification [ArXiv]
Service Metric Prediction in Clouds using Transfer Learning [DiVA]
Predicting the Generalization Gap in Deep Networks with Margin Distributions [ICLR]

2018

Taskonomy: Disentangling Task Transfer Learning [CVPR Best Paper]
Dynamics and reachability of learning tasks [Paper]
Stronger generalization bounds for deep nets via a compression approach [ICML]

2017

Exploring generalization in deep learning [NeurIPS]
Estimating accuracy from unlabeled data: A probabilistic logic approach [NeurIPS]

2016

Learning to Select Pre-trained Deep Representations with Bayesian Evidence Framework [CVPR]
Learning with rejection [Paper]
Estimating accuracy from unlabeled data: A bayesian approach [ICML]

2004

Using model disagreement on unlabeled data to validate classification algorithms [NeurIPS]
