llms-learning 📚 🦙

A repository sharing literature and resources about Large Language Models (LLMs) and beyond.

Hope you find this repository handy and helpful for your LLM learning journey! 😊

News 🔥

  • 2025.01.25
    • DeepSeek has unveiled its latest large reasoning model (LRM), DeepSeek-R1, trained with its large-scale reinforcement-learning recipe based on GRPO and built upon its newest pretrained large language model (LLM), DeepSeek-V3. This achievement marks a significant milestone, particularly from two key perspectives:
        1. In the intense AGI competition sparked by OpenAI, it stands out as the first Chinese model to not only match but frequently surpass the performance of the state-of-the-art LRM OpenAI-o1, all while operating at a fraction of the computational cost.
        2. As for the exploration of AGI, it further validates the effectiveness of the inference scaling law and the emergence of reasoning capabilities through the Long-CoT method introduced by OpenAI-o1, which mimics the System-2 slow-thinking pattern of human intelligence.
  • 2025.01.15
    • MiniMax has officially open-sourced MiniMax-01, their latest Mixture-of-Experts (MoE) model featuring Lightning Attention, along with the paper, the code, and the model weights!
    • I’m truly honored to have contributed as one of the authors of this groundbreaking work 😆!
  • 2024.10.24
    • You are welcome to watch our new free online LLMs intro course on Bilibili!
    • We also open-source the course assignments for you to take a deep dive into LLMs.
    • If you like this course or this repository, you can subscribe to the teacher's Bilibili account and maybe ⭐ this GitHub repo 😜.
  • 2024.03.07
    • We offer a comprehensive notebook tutorial on efficient GPU kernel coding with Triton, building upon the official tutorials and extending them with additional hands-on examples, such as the Flash Attention 2 forward/backward kernels (see the minimal sketch after this list).
    • In addition, we provide a step-by-step math derivation of Flash Attention 2, enabling a deeper understanding of its underlying mechanics.
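To give a flavor of the kernel-coding style covered in the Triton notebook, here is a minimal element-wise add kernel in the spirit of the official Triton tutorials. This is an illustrative sketch only, not code taken from the notebook; the names `add_kernel` and `add` are chosen here for the example.

```python
# Minimal Triton kernel sketch: element-wise vector addition.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x and y are assumed to be CUDA tensors of the same shape.
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```

The tutorial's more advanced examples, such as the Flash Attention 2 forward/backward kernels, follow the same pattern of block-wise tiling and masked loads/stores, just with more elaborate indexing and on-chip accumulation.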

Table of Contents


Note:

  • Each markdown file collects papers roughly sorted by publication year in descending order; in other words, newer papers are generally placed at the top. However, this ordering is not guaranteed to be exact, as the publication year is not always clear.

  • The taxonomy is complex and not strictly orthogonal, so don't be surprised if the same paper appears multiple times under different tracks.