GPT-OSS 20b and 120b
GPT-OSS Reasoning Control
GPT-OSS models support explicit reasoning control via two parameters:
thinking_budget—limits how many internal thinking tokens the model can usereasoning_effort—qualitative control over how hard the model reasons
These are especially useful for trading off depth vs latency/cost.
Parameters
thinking_budget
Integer
Upper bound on internal reasoning tokens
Higher = deeper reasoning, slower & more expensive
reasoning_effort
"low" | "medium" | "high"Controls how deliberate the reasoning style is
Typically paired with a thinking budget
Python (OpenAI client)
Direct HTTP (curl)
Practical guidance
Fast / cheap:
thinking_budget: 10,reasoning_effort: "low"Balanced default:
thinking_budget: 25,reasoning_effort: "medium"Deep reasoning:
thinking_budget: 40+,reasoning_effort: "high"
These controls affect internal reasoning only—the final response stays concise unless you ask otherwise.
Structured outputs (JSON Schema) with GPT-OSS
Goal: force the model to return only valid JSON that matches your schema. On Parasail, you do this by sending response_format.type = "json_schema" (same shape as OpenAI), and—optionally—adding GPT-OSS reasoning knobs via extra_body.
Minimal example (Parasail + GPT-OSS)
Notes
strict: True+additionalProperties: Falseis the combo that keeps output tight.Always
json.loads(...)the returnedmessage.contentand treat that as the source of truth.If you stream, you’ll still receive JSON text in
delta.content—just concatenate and parse at the end (like yourstream_completion()does).
Last updated