Ravid Shwartz-Ziv & Allen Roush
Effective token sampling is crucial for balancing coherence and creativity when generating text with Large Language Models (LLMs). Traditional methods like top-p (nucleus) sampling apply a fixed cumulative-probability cutoff, regardless of how confident the model is at each step. This can lead to suboptimal output, especially at the higher temperatures often used to encourage creativity.
Min-p sampling offers a dynamic alternative that adjusts the sampling threshold to the model's confidence. Specifically, the effective cutoff is the base min-p value scaled by the probability of the most likely token, so the model focuses on high-confidence tokens when it is certain and considers a broader range of candidates when it is less sure. This adaptive approach strikes a better balance between coherence and diversity in generated text, as the sketch below illustrates.
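As a rough illustration of the idea (a minimal sketch, not the paper's reference implementation; `min_p_filter` is a hypothetical helper name), the filtering step might look like this in PyTorch:

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    """Keep only tokens whose probability is at least min_p times the
    probability of the most likely token; mask the rest out."""
    probs = torch.softmax(logits, dim=-1)
    top_prob = probs.max(dim=-1, keepdim=True).values
    # Dynamic threshold: scales with the model's confidence in its top token.
    threshold = min_p * top_prob
    # Tokens below the threshold are removed before renormalizing and sampling.
    return logits.masked_fill(probs < threshold, float("-inf"))

# Example: sample one token from the filtered distribution.
logits = torch.randn(1, 32_000)  # stand-in for a model's next-token logits
filtered_probs = torch.softmax(min_p_filter(logits, min_p=0.05), dim=-1)
next_token = torch.multinomial(filtered_probs, num_samples=1)
```

When the model is confident (a high top-token probability), the threshold rises and few tokens survive; when the distribution is flat, the threshold drops and more candidates remain in play.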
The integration of min-p sampling into Hugging Face Transformers and vLLM underscores its practical utility: developers can enable it with a single sampling parameter and benefit from confidence-aware token selection without writing custom decoding code.
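For example, a minimal usage sketch with Transformers (assuming a release recent enough to expose `min_p` as a generation argument; the model below is only a small, openly available placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM works; gpt2 is used here purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.5,   # higher temperatures pair well with min-p
    min_p=0.1,         # keep tokens with prob >= 0.1 * top-token prob
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In vLLM, the same behavior is exposed through the `min_p` field of `SamplingParams`.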
For a comprehensive understanding of min-p sampling, including detailed methodologies, experimental results, and implementation guidelines, refer to the full paper.
We are honored that our work on min-p sampling has been recognized with an oral presentation at ICLR 2025. We invite the research community and practitioners to explore this approach and consider its integration into their LLM applications.
Feedback and contributions are welcome to further advance this method.