The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models, Ke Ji+, arXiv'25 #1813

AkihikoWatanabe · 2025-03-19T11:51:33Z

URL

https://arxiv.org/abs/2503.02875

Authors

Ke Ji
Jiahao Xu
Tian Liang
Qiuzhi Liu
Zhiwei He
Xingyu Chen
Xiaoyuan Liu
Zhijie Wang
Junying Chen
Benyou Wang
Zhaopeng Tu
Haitao Mi
Dong Yu

Abstract

Improving the reasoning capabilities of large language models (LLMs) typically requires supervised fine-tuning with labeled data or computationally expensive sampling. We introduce Unsupervised Prefix Fine-Tuning (UPFT), which leverages the observation of Prefix Self-Consistency -- the shared initial reasoning steps across diverse solution trajectories -- to enhance LLM reasoning efficiency. By training exclusively on the initial prefix substrings (as few as 8 tokens), UPFT removes the need for labeled data or exhaustive sampling. Experiments on reasoning benchmarks show that UPFT matches the performance of supervised methods such as Rejection Sampling Fine-Tuning, while reducing training time by 75% and sampling cost by 99%. Further analysis reveals that errors tend to appear in later stages of the reasoning process and that prefix-based training preserves the model's structural knowledge. This work demonstrates how minimal unsupervised fine-tuning can unlock substantial reasoning gains in LLMs, offering a scalable and resource-efficient alternative to conventional approaches.

Summary (by gpt-4o-mini)

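Proposes Unsupervised Prefix Fine-Tuning (UPFT) to improve the reasoning efficiency of LLMs. The model is trained only on initial prefix substrings, which removes the need for labeled data and exhaustive sampling. UPFT matches the performance of supervised methods while cutting training time by 75% and sampling cost by 99%. Minimal unsupervised fine-tuning yields substantial reasoning gains, offering a resource-efficient alternative to conventional approaches.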

AkihikoWatanabe · 2025-03-19T12:39:21Z

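I only skimmed the paper. The starting intuition is that the opening part of a reasoning trace plays an important role, and that whatever is shared across the reasoning traces of many sampled responses is important (Prefix Self-Consistency). Based on this, the model is fine-tuned so that it generates the opening part of the reasoning trace well. Conventional methods based on Rejection Sampling generate multiple responses and keep only those whose final answer is correct, so they need gold labels. The proposed method only takes a majority vote over the shared subsequence at the start of the reasoning traces, so no labels are needed.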
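The contrast described above can be sketched as follows. This is a minimal illustration of my reading of the comment, not the paper's implementation: the function names `rejection_sample` and `prefix_majority_vote` are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter


def rejection_sample(responses, answers, gold):
    """Label-dependent baseline: keep responses whose final answer equals the gold label."""
    return [r for r, a in zip(responses, answers) if a == gold]


def prefix_majority_vote(responses, k):
    """Label-free selection: return the most common k-token prefix and its vote share.

    Assumption: tokens are whitespace-separated words (a stand-in for model tokens).
    """
    prefixes = [tuple(r.split()[:k]) for r in responses]
    prefix, votes = Counter(prefixes).most_common(1)[0]
    return list(prefix), votes / len(responses)


responses = [
    "First , compute 2 + 3 = 5",
    "First , compute 3 + 2 = 5",
    "We add the numbers to get 6",
]
answers = ["5", "5", "6"]

kept = rejection_sample(responses, answers, gold="5")   # needs the gold answer
prefix, share = prefix_majority_vote(responses, k=3)    # needs no label
```

Only `rejection_sample` takes a `gold` argument, which is the point of the comparison: the prefix vote looks at the sampled text alone.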

A dedicated template is used when training on the reasoning prefix. Because only the prefix span is used for training, training time drops substantially.
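One common way to restrict training to the prefix span is to truncate the response to its first few tokens and mask the prompt positions out of the loss. The sketch below assumes already-tokenized integer ids and does not reproduce the paper's template. `build_prefix_example` is a hypothetical helper, and `-100` is used because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_prefix_example(prompt_ids, response_ids, prefix_len):
    """Build one training example that supervises only the first prefix_len response tokens.

    Prompt positions are masked with IGNORE_INDEX so they contribute no loss;
    response tokens beyond the prefix are dropped entirely, which shortens the sequence.
    """
    prefix_ids = list(response_ids[:prefix_len])
    input_ids = list(prompt_ids) + prefix_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + prefix_ids
    return {"input_ids": input_ids, "labels": labels}


prompt_ids = [1, 2, 3]
response_ids = list(range(10, 20))  # a 10-token sampled response
example = build_prefix_example(prompt_ids, response_ids, prefix_len=4)
```

With a 4-token prefix, the example holds 7 tokens instead of 13, which is where the saving in training time would come from under this setup.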

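Training this way carries a very high risk of catastrophic forgetting, so multi-task learning is used to prevent it. Concretely, for p% of the training data, the full reasoning trace is generated and used for training. These traces are generated and used without checking whether the final answer is correct, which keeps the method label-free. (In other words, I think the purpose of this portion of the data is not to teach good reasoning traces; it only needs to keep the model able to produce base-model-like full traces so that catastrophic forgetting is avoided.)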
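The mixing step could look roughly like this. It is a sketch under assumptions: `build_mixed_dataset`, `full_ratio` (the p% expressed as a fraction), and the per-example random draw are my own illustrative choices, and the paper may select the full-trace subset differently.

```python
import random


def build_mixed_dataset(samples, full_ratio, prefix_len, seed=0):
    """Mix prefix-only and full-trace targets without consulting any answer label.

    samples: list of (question, trace_tokens) pairs sampled from the base model.
    full_ratio: fraction of examples that keep the whole trace (hypothetical parameter).
    """
    rng = random.Random(seed)
    dataset = []
    for question, trace_tokens in samples:
        use_full = rng.random() < full_ratio
        # Correctness of the final answer is never checked in either branch.
        target = list(trace_tokens) if use_full else list(trace_tokens[:prefix_len])
        dataset.append({"question": question, "target": target, "full": use_full})
    return dataset


samples = [(f"q{i}", [f"t{j}" for j in range(20)]) for i in range(100)]
mixed = build_mixed_dataset(samples, full_ratio=0.1, prefix_len=8)
num_full = sum(ex["full"] for ex in mixed)
```

Setting `full_ratio=0.0` recovers pure prefix training, and `full_ratio=1.0` recovers ordinary self-training on unfiltered full traces.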

The Appendix shows 16 responses sampled from Qwen at temperature 0.7, illustrating that the opening parts of the traces are shared.
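To check this kind of observation on one's own samples, the agreement rate at each prefix length can be computed as below. This is only an illustrative measurement, not the paper's analysis: `prefix_agreement` is a hypothetical helper, the responses are toy strings rather than model outputs, and whitespace splitting stands in for real tokenization.

```python
from collections import Counter


def prefix_agreement(responses, max_len):
    """For each prefix length t = 1..max_len, return the fraction of responses
    that share the most common t-token prefix."""
    tokenized = [r.split() for r in responses]
    rates = []
    for t in range(1, max_len + 1):
        counts = Counter(tuple(tokens[:t]) for tokens in tokenized)
        rates.append(counts.most_common(1)[0][1] / len(tokenized))
    return rates


toy_responses = ["a b c", "a b d", "a x y", "a b c"]
rates = prefix_agreement(toy_responses, max_len=3)
# rates == [1.0, 0.75, 0.5]: agreement is high at the start and decays with length
```

In practice the 16 strings would come from sampling the model at temperature 0.7, as in the Appendix described above; the sampling call is left out here because it depends on the inference stack.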

AkihikoWatanabe · 2025-03-19T13:08:12Z

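The paper below compares two sources of long-CoT training data, traces generated by a reasoning model and traces generated by a non-reasoning model, and reports that the former consistently gives better training performance. I wonder whether performance in this work would also improve if the reasoning traces were generated by a much stronger model.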

Demystifying Long Chain-of-Thought Reasoning in LLMs, Edward Yeo+, arXiv'25 #1746

AkihikoWatanabe added the Pocket label Mar 19, 2025

AkihikoWatanabe changed the title あ The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models, Ke Ji+, arXiv'25 Mar 19, 2025

AkihikoWatanabe added Efficiency/SpeedUp Finetuning (SFT) Reasoning NLP Adapter/LoRA labels Mar 19, 2025

AkihikoWatanabe removed the Pocket label Mar 19, 2025
