Available Parameters
We currently support the sampling parameters that vLLM supports.
Main Sampling Parameters:
temperature (default: 1.0)
Controls randomness in token selection.
Higher values (>1.0) increase randomness.
Lower values (<1.0) make outputs more deterministic.
A value of 0 forces greedy decoding.
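To make the effect concrete, here is a minimal pure-Python sketch of temperature scaling (softmax over temperature-divided logits, with greedy decoding at 0). It illustrates the idea only; it is not vLLM's actual implementation.

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by temperature and softmax into probabilities.

    temperature -> 0 approaches greedy (argmax); values > 1 flatten
    the distribution. Illustrative sketch, not vLLM's code.
    """
    if temperature == 0:
        # Greedy decoding: all probability mass on the top logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Lower temperature sharpens the distribution around the top token.
probs_sharp = apply_temperature([2.0, 1.0, 0.5], 0.5)
probs_flat = apply_temperature([2.0, 1.0, 0.5], 2.0)
```

With these example logits, the top token's probability is higher under temperature 0.5 than under 2.0.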
top_p (nucleus sampling) (default: 1.0)
Controls the probability mass of token selection.
Only considers the most likely tokens whose probabilities sum up to top_p.
Lower values (e.g., 0.9) limit token choices to more likely options.
Higher values (close to 1.0) allow a broader range of tokens.
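A small sketch of the nucleus filter, assuming probabilities are given as (token, probability) pairs: keep the smallest set of top-ranked tokens whose cumulative probability reaches top_p, then renormalize. This is illustrative, not vLLM's implementation.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p; renormalize what remains.

    probs: list of (token, probability) pairs. Illustrative sketch.
    """
    ranked = sorted(probs, key=lambda tp: tp[1], reverse=True)
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    total = sum(p for _, p in kept)
    return [(tok, p / total) for tok, p in kept]

# With top_p=0.9, the 0.05 tail token is dropped.
nucleus = top_p_filter([("a", 0.5), ("b", 0.3), ("c", 0.15), ("d", 0.05)], 0.9)
```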
top_k (default: -1, which means disabled)
Limits token selection to the top k most probable tokens.
Lower values (e.g., top_k=50) make output more deterministic.
If -1, this setting is ignored.
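The top-k cutoff can be sketched the same way, including the -1 "disabled" convention. Again an illustration, not vLLM's code.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens (k = -1 disables the
    filter); renormalize what remains.

    probs: list of (token, probability) pairs. Illustrative sketch.
    """
    if k == -1:
        return probs  # disabled: pass the distribution through
    kept = sorted(probs, key=lambda tp: tp[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return [(tok, p / total) for tok, p in kept]

dist = [("a", 0.5), ("b", 0.3), ("c", 0.2)]
limited = top_k_filter(dist, 2)      # only "a" and "b" survive
unchanged = top_k_filter(dist, -1)   # -1 leaves the distribution intact
```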
max_tokens (default: None)
Sets the maximum number of tokens to generate.
Helps prevent excessively long responses.
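A toy generation loop shows where the cap applies: decoding stops either at an end-of-sequence token or once max_tokens tokens have been produced. The function and token names here are hypothetical stand-ins, not vLLM internals.

```python
def generate(next_token_fn, max_tokens=None, eos_token="<eos>"):
    """Toy decoding loop: stop at eos_token or after max_tokens.

    next_token_fn: callable taking the tokens generated so far and
    returning the next token (a hypothetical stand-in for a model).
    """
    out = []
    while max_tokens is None or len(out) < max_tokens:
        tok = next_token_fn(out)
        if tok == eos_token:
            break
        out.append(tok)
    return out

# A stub "model" that never emits <eos>: max_tokens bounds the output.
tokens = generate(lambda seq: "word", max_tokens=5)
```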
repetition_penalty (default: 1.0)
Penalizes repeated tokens to avoid looping responses.
Values >1.0 discourage repetition.
Common values: 1.1-1.2.
presence_penalty (default: 0.0)
Increases the likelihood of introducing new tokens.
Useful for making outputs more diverse.
frequency_penalty (default: 0.0)
Penalizes tokens that have appeared frequently.
Helps prevent excessive repetition of common words.
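The three penalties above can be sketched as logit adjustments for tokens that already appeared, following the commonly used formulas (multiplicative repetition penalty as in the CTRL paper; flat presence penalty; count-scaled frequency penalty as in the OpenAI API). This is a sketch of those formulas, not vLLM's exact code.

```python
from collections import Counter

def apply_penalties(logits, generated, repetition_penalty=1.0,
                    presence_penalty=0.0, frequency_penalty=0.0):
    """Adjust per-token logits based on previously generated tokens.

    logits: dict mapping token -> logit. Illustrative sketch of the
    commonly used penalty formulas.
    """
    counts = Counter(generated)
    adjusted = dict(logits)
    for tok, n in counts.items():
        if tok not in adjusted:
            continue
        l = adjusted[tok]
        # repetition_penalty > 1.0 pushes seen tokens' logits down,
        # multiplicatively (sign-aware).
        l = l / repetition_penalty if l > 0 else l * repetition_penalty
        # presence_penalty: flat cost for having appeared at all.
        l -= presence_penalty
        # frequency_penalty: cost scales with how often it appeared.
        l -= frequency_penalty * n
        adjusted[tok] = l
    return adjusted

# "a" appeared twice, so it is penalized; "b" is untouched.
adjusted = apply_penalties({"a": 2.0, "b": 1.0}, ["a", "a"],
                           repetition_penalty=2.0,
                           presence_penalty=0.5,
                           frequency_penalty=0.5)
```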
Advanced Sampling Techniques:
do_sample (default: True)
Enables stochastic sampling when set to True.
If False, uses greedy decoding (choosing the highest-probability token at each step).
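The switch between the two modes amounts to argmax versus a weighted draw, as this sketch shows (illustrative only):

```python
import random

def choose_token(probs, do_sample=True, rng=random):
    """Pick the next token: a weighted random draw when do_sample is
    True, otherwise greedy argmax. probs: (token, probability) pairs.
    Illustrative sketch.
    """
    if not do_sample:
        return max(probs, key=lambda tp: tp[1])[0]
    tokens, weights = zip(*probs)
    return rng.choices(tokens, weights=weights, k=1)[0]

greedy = choose_token([("a", 0.9), ("b", 0.1)], do_sample=False)
sampled = choose_token([("a", 0.9), ("b", 0.1)], do_sample=True,
                       rng=random.Random(0))
```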
random_seed (default: None)
Sets a fixed seed for reproducible results.
Useful for debugging or deterministic sampling.
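Seeded sampling is still random, but repeatable: two runs with the same seed draw the same sequence. A minimal sketch:

```python
import random

def sample_sequence(probs, n, seed=None):
    """Draw n tokens from a fixed distribution; a fixed seed makes
    the draw reproducible. probs: (token, probability) pairs.
    Illustrative sketch of what a sampling seed buys you.
    """
    rng = random.Random(seed)
    tokens, weights = zip(*probs)
    return rng.choices(tokens, weights=weights, k=n)

run_a = sample_sequence([("x", 0.7), ("y", 0.3)], 10, seed=42)
run_b = sample_sequence([("x", 0.7), ("y", 0.3)], 10, seed=42)
# Same seed, same distribution: run_a == run_b.
```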
guided_choice
Allows constraining outputs to predefined options.
Ensures responses align with a given list of acceptable completions.
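As a sketch of how this is typically passed, here is a hypothetical request body for vLLM's OpenAI-compatible server; the model name and prompt are placeholders, and the exact request shape should be checked against your vLLM version's docs.

```python
import json

# Hypothetical request body; "my-model" and the prompt are
# placeholders. guided_choice constrains the completion to one of
# the listed strings.
payload = {
    "model": "my-model",
    "prompt": "Is this review positive or negative?",
    "guided_choice": ["positive", "negative"],
}
body = json.dumps(payload)
```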
How These Work Together:
Setting temperature=0 forces deterministic output (greedy decoding).
Using top_p and top_k together balances diversity and coherence.
repetition_penalty and presence_penalty help avoid repetitive loops.
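The interaction can be sketched as one decoding step that chains the pieces: temperature scaling, then the top-k cutoff, then the nucleus cutoff, then a (optionally seeded) draw. A pure-Python illustration, not vLLM's pipeline:

```python
import math
import random

def next_token(logits, temperature=1.0, top_k=-1, top_p=1.0, seed=None):
    """One decoding step: temperature -> top-k -> top-p -> sample.

    logits: dict mapping token -> logit. Illustrative sketch only.
    """
    items = list(logits.items())
    if temperature == 0:
        return max(items, key=lambda kv: kv[1])[0]  # greedy decoding
    # Softmax over temperature-scaled logits (max-subtracted for
    # numerical stability), sorted most-likely first.
    m = max(v for _, v in items)
    exps = [(t, math.exp((v - m) / temperature)) for t, v in items]
    total = sum(e for _, e in exps)
    probs = sorted(((t, e / total) for t, e in exps),
                   key=lambda kv: kv[1], reverse=True)
    if top_k != -1:
        probs = probs[:top_k]  # top-k cutoff (-1 disables)
    kept, cum = [], 0.0
    for t, p in probs:         # nucleus (top-p) cutoff
        kept.append((t, p))
        cum += p
        if cum >= top_p:
            break
    tokens, weights = zip(*kept)
    return random.Random(seed).choices(tokens, weights=weights, k=1)[0]

greedy_pick = next_token({"a": 2.0, "b": 1.0}, temperature=0)
k1_pick = next_token({"a": 2.0, "b": 1.0}, top_k=1)  # only "a" survives
```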