[[A Practical Survey on Faster and Lighter Transformers]] : Popular approaches to make Transformers faster and lighter, with a comprehensive explanation of each method's strengths, limitations, and underlying assumptions
[[MS-Shift An Analysis of MS Marco Distribution Shifts on Neural Retrieval]] : Segments the MS Marco dataset to evaluate the three families of BERT-based neural retrievers (sparse, dense, and late interaction) under distribution shift
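For reference, a minimal sketch of how the late-interaction family scores a query against a document (ColBERT-style MaxSim), since it is the least familiar of the three; function and variable names are my own, and the token embeddings are assumed to be L2-normalised so dot products act as cosine similarities:

```python
import torch

def late_interaction_score(query_tok_embs: torch.Tensor, doc_tok_embs: torch.Tensor) -> torch.Tensor:
    """ColBERT-style MaxSim: each query token keeps its best-matching document
    token, and the per-token maxima are summed into a single relevance score."""
    sim = query_tok_embs @ doc_tok_embs.T      # (num_query_tokens, num_doc_tokens)
    return sim.max(dim=1).values.sum()          # best match per query token, then sum
```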
[[Atomised Search Length Beyond User Models]] : Proposes a new IR metric, [[Atomised Search Length]], intended to reflect the quality of retrieved rankings without depending on a specific user model
Improving Language Understanding by Generative Pre-Training : How GPT-1 sparked a major shift in NLP by demonstrating effective transfer learning from generative pre-training
Language Models Are Unsupervised Multitask Learners : How GPT-2 challenged the traditional pre-train -> fine-tune-on-task paradigm by performing tasks directly through auto-regressive prompting
A Comprehensive Overview of Large Language Models : A walkthrough of the key ideas and concepts around large language models
[[ORPO Monolithic Preference Optimization without Reference Model]] : Uses an odds-ratio penalty on the model's own chosen vs. rejected outputs to optimise for preferences directly, without a reference model, combining the SFT and RLHF alignment stages into a single stage
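A minimal sketch of the objective as I understand it: the ordinary SFT loss on the chosen response plus a log-sigmoid penalty on the log-odds gap between chosen and rejected responses. The function name, the `lam` weight, and the use of length-normalised sequence log-probabilities are my assumptions about the details, not the paper's reference code:

```python
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, nll_chosen, lam=0.1):
    # logp_* are length-normalised sequence log-probabilities (strictly negative)
    # log odds(y|x) = log p - log(1 - p), kept in log space for numerical stability
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    # reward the model when the chosen response has higher odds than the rejected one
    odds_ratio_term = F.logsigmoid(log_odds_chosen - log_odds_rejected)
    # single objective: SFT loss on the chosen response plus the odds-ratio penalty
    return (nll_chosen - lam * odds_ratio_term).mean()
```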
[[LoRA Learns Less and Forgets Less]] : An ablation study comparing LoRA against full fine-tuning for instruction fine-tuning and continued pre-training, finding that LoRA learns less new material but also forgets less of the base model's capabilities
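For context, a minimal sketch of what a LoRA layer does (frozen base weight plus a trainable low-rank update, W + (alpha/r) * B @ A); class and parameter names here are my own, not from the paper:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # only the adapter is trained
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```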
Matryoshka Embeddings : Training embedding models whose vectors can be truncated to a range of smaller dimensions while remaining useful
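A minimal sketch of the Matryoshka training idea, assuming an in-batch-negative contrastive setup: the same loss is applied to nested prefixes of the embedding so every truncation stays usable on its own. The dimension list, temperature, and names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def matryoshka_info_nce(query_emb, doc_emb, dims=(64, 128, 256, 768), temperature=0.05):
    """Average a contrastive (InfoNCE) loss over nested prefixes of the embedding
    so each truncated dimension remains a usable representation."""
    total = 0.0
    for d in dims:
        q = F.normalize(query_emb[:, :d], dim=-1)
        k = F.normalize(doc_emb[:, :d], dim=-1)
        logits = q @ k.T / temperature                     # in-batch negatives
        labels = torch.arange(q.size(0), device=q.device)  # matching pairs sit on the diagonal
        total = total + F.cross_entropy(logits, labels)
    return total / len(dims)
```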