TL;DR: Min-p sampling is a novel decoding strategy for Large Language Models (LLMs) that dynamically adjusts token selection based on model confidence. This method outperforms traditional top-p sampling, delivering more coherent and creative outputs, particularly at higher temperatures. Min-p sampling has been integrated into Hugging Face Transformers and vLLM, facilitating its adoption in various applications.
Introducing Min-p Sampling: A Smarter Way to Sample from LLMs
Ravid Shwartz-Ziv [X, LinkedIn] & Allen Roush [X, LinkedIn]
Effective token sampling is crucial for balancing coherence and creativity in text generation with Large Language Models (LLMs). Traditional methods like top-p (nucleus) sampling apply a fixed probability cutoff, disregarding the model's confidence level at each step. This can lead to suboptimal performance, especially at higher temperatures intended to enhance creativity.
How Min-p Sampling Works: Min-p sampling offers a dynamic alternative by adjusting the sampling threshold in accordance with the model's confidence. Specifically, it scales the probability threshold by the probability of the most likely token, so the model focuses on high-confidence tokens when it is certain and considers a broader range of tokens when it is not. This adaptive approach strikes a better balance between coherence and diversity in generated text; a minimal sketch of the procedure is given below.
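To make the mechanism concrete, here is a minimal, self-contained sketch of one min-p sampling step in PyTorch. The function name `min_p_filter` and the default `min_p=0.1` are illustrative choices for this post, not the reference implementation from the paper or any library:

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.1, temperature: float = 1.0) -> torch.Tensor:
    """Sample one token id using min-p filtering (illustrative sketch).

    Tokens whose probability falls below min_p * p_max are discarded, where
    p_max is the probability of the most likely token, so the cutoff tightens
    when the model is confident and relaxes when it is not.
    """
    # Temperature scaling, then convert logits to probabilities.
    probs = torch.softmax(logits / temperature, dim=-1)

    # Dynamic threshold: a fixed fraction of the top token's probability.
    threshold = min_p * probs.max()

    # Zero out low-confidence tokens and renormalize the remainder.
    filtered = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    filtered = filtered / filtered.sum()

    # Sample from the truncated distribution.
    return torch.multinomial(filtered, num_samples=1)
```

When the model is highly confident (p_max close to 1), the threshold is close to `min_p` itself and almost everything except the top candidates is pruned; when the distribution is flat, the threshold shrinks proportionally and more tokens remain eligible.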
Key Features of Min-p Sampling:
- Dynamic Threshold Adjustment: The sampling threshold is scaled according to the model's confidence, providing adaptive token selection that enhances text quality.
- Robustness at High Temperatures: Unlike traditional methods, min-p sampling maintains coherence and diversity even at elevated temperature settings, which are typically used to encourage creative outputs (a short numerical illustration follows this list).
- Empirical Validation: Extensive experiments on benchmarks such as GPQA, GSM8K, and AlpacaEval Creative Writing demonstrate that min-p sampling outperforms top-p sampling in both quality and diversity of generated text. Human evaluations further reveal a clear preference for outputs generated using min-p sampling.
- Open-Source Integration: Min-p sampling has been incorporated into leading open-source LLM frameworks, including Hugging Face Transformers and vLLM, making it straightforward to adopt in existing applications.
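To see why the adaptive threshold matters at high temperatures, here is a small experiment on synthetic logits. The vocabulary size, logit values, and the `min_p=0.1` / `top_p=0.9` settings are all illustrative assumptions for this toy setup, not results from the paper:

```python
import torch

torch.manual_seed(0)

# Synthetic next-token logits: a few plausible continuations plus a long, noisy tail.
logits = torch.cat([torch.tensor([8.0, 7.0, 6.5]), torch.randn(997) * 0.5])

def surviving_tokens(logits, temperature, min_p=0.1, top_p=0.9):
    probs = torch.softmax(logits / temperature, dim=-1)
    # min-p: keep tokens within a fixed fraction of the top token's probability.
    min_p_kept = int((probs >= min_p * probs.max()).sum())
    # top-p: keep the smallest set of tokens whose cumulative probability reaches top_p.
    sorted_probs = probs.sort(descending=True).values
    top_p_kept = int((sorted_probs.cumsum(0) < top_p).sum()) + 1
    return min_p_kept, top_p_kept

for t in (0.7, 1.0, 2.0, 3.0):
    print(f"temperature={t}: (min-p kept, top-p kept) = {surviving_tokens(logits, t)}")
```

In this toy setup, the top-p nucleus balloons to a large fraction of the vocabulary as the temperature rises and the distribution flattens, while the min-p threshold shrinks in proportion to the top token's probability and keeps the candidate set small.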
Implementation and Adoption: The integration of min-p sampling into Hugging Face Transformers and vLLM underscores its practical utility. Developers can enable it with a single generation parameter, as shown in the example below, and benefit from its dynamic approach to token sampling.
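As a concrete starting point, here is how min-p sampling can be enabled through the standard `generate` API in recent versions of Hugging Face Transformers. The model id is a placeholder, and the `temperature` and `min_p` values are illustrative rather than recommended settings from the paper:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; substitute any causal LM you use.
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Write a short poem about sampling.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.5,   # higher temperature for more creative output
    min_p=0.1,         # min-p threshold, scaled by the top token's probability
    max_new_tokens=128,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In vLLM, the equivalent is to pass `min_p` to `SamplingParams`, e.g. `SamplingParams(temperature=1.5, min_p=0.1)`.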
For a comprehensive understanding of min-p sampling, including detailed methodologies, experimental results, and implementation guidelines, refer to the full paper.
Note from the team: We are honored that our work on min-p sampling has been recognized with an oral presentation at ICLR 2025. We invite the research community and practitioners to explore this approach and consider its integration into their LLM applications.
Feedback and contributions are welcome to further advance this method.