R1-Experiments

We use this public project to run RL reasoning experiments for Llama and Qwen models. Source code from willccbb/grpo_demo.py.

Run:

    pip install -r requirements.txt
    pip install flash-attn --no-build-isolation
    pip install git+https://github.com/huggingface/trl.git
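
After the install commands finish, it can help to confirm that the packages are visible to the Python interpreter before starting a run. The following is a minimal sketch, not part of the project. The helper `check_installed` is hypothetical, and the import names `flash_attn` and `trl` are assumed to correspond to the `flash-attn` and `trl` pip packages.

```python
import importlib.util

# Assumed import names for the pip packages installed above:
# flash-attn -> flash_attn, trl -> trl.
REQUIRED_MODULES = ["flash_attn", "trl"]


def check_installed(modules):
    """Return a dict mapping each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Report each module without importing it (importing flash_attn can be slow).
    for name, ok in check_installed(REQUIRED_MODULES).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

The check only locates the modules and does not import them, so it says nothing about whether the installed versions are compatible with each other or with the local CUDA setup.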