Available Parameters
We currently support the sampling parameters that vLLM provides.
Main Sampling Parameters:
temperature (default: 1.0)
Controls randomness in token selection.
Higher values (>1.0) increase randomness.
Lower values (<1.0) make outputs more deterministic.
A value of 0 forces greedy decoding.
top_p (nucleus sampling) (default: 1.0)
Controls the probability mass of token selection.
Only considers tokens whose cumulative probability sums to top_p.
Lower values (for example, 0.9) limit token choices to more likely options.
Higher values (close to 1.0) allow a broader range of tokens.
top_k (default: -1, which means disabled)
Limits token selection to the k most probable tokens.
Lower values (for example, top_k=50) make output more deterministic.
If set to -1, this setting is ignored.
Since top_k is not defined in the OpenAI spec, it should be passed in the extra_body field.
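For example, with the OpenAI Python client, top_k goes into extra_body rather than a named argument. The sketch below assumes an OpenAI-compatible server running locally; the base_url, API key, and model name are placeholders for your own deployment.

```python
from openai import OpenAI

# Placeholder endpoint and credentials; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-model",  # hypothetical model name
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    temperature=0.7,
    # top_k is not part of the OpenAI spec, so it is passed via extra_body.
    extra_body={"top_k": 50},
)
print(response.choices[0].message.content)
```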
max_tokens (default: None)
Sets the maximum number of tokens to generate.
Helps prevent excessively long responses.
repetition_penalty (default: 1.0)
Penalizes repeated tokens to avoid looping responses.
Values >1.0 discourage repetition.
Common values: 1.1 - 1.2.
presence_penalty (default: 0.0)
Increases the likelihood of introducing new tokens.
Useful for making outputs more diverse.
frequency_penalty (default: 0.0)
Penalizes tokens that have appeared frequently.
Helps prevent excessive repetition of common words.
seed (default: None)
Sets a fixed seed for reproducible results.
Useful for debugging or deterministic sampling.
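As a quick illustration, fixing seed while keeping the other sampling settings identical should make repeated requests return the same completion. This is a minimal sketch against the same assumed local endpoint and placeholder model name as above.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def sample(prompt: str) -> str:
    # Same seed and sampling settings on every call -> reproducible output.
    response = client.chat.completions.create(
        model="my-model",  # hypothetical model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,
        seed=1234,
    )
    return response.choices[0].message.content

first = sample("Name three uses for a paperclip.")
second = sample("Name three uses for a paperclip.")
print(first == second)  # expected: True with a fixed seed
```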
How These Work Together:
Setting temperature=0 forces deterministic output (greedy decoding).
Using top_p and top_k together balances diversity and coherence.
repetition_penalty and presence_penalty help avoid repetitive loops.
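Putting it together, a single request can combine these parameters. The sketch below assumes the same local OpenAI-compatible endpoint and placeholder model name as above, and routes the parameters that are not in the OpenAI spec through extra_body.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-model",                      # hypothetical model name
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    temperature=0.7,        # mild randomness
    top_p=0.9,              # keep the most likely 90% of probability mass
    max_tokens=256,         # cap the response length
    presence_penalty=0.5,   # nudge the model toward new tokens
    frequency_penalty=0.5,  # penalize frequently repeated tokens
    seed=42,                # reproducible sampling
    # top_k and repetition_penalty are not in the OpenAI spec,
    # so they are passed via extra_body.
    extra_body={"top_k": 50, "repetition_penalty": 1.1},
)
print(response.choices[0].message.content)
```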