GPT-OSS 20B and 120B

GPT-OSS Reasoning Control

GPT-OSS models support explicit reasoning control via two parameters:

  • thinking_budget – limits how many internal thinking tokens the model can use

  • reasoning_effort – qualitative control over how hard the model reasons

These are especially useful for trading off reasoning depth against latency and cost.


Parameters

thinking_budget

  • Integer

  • Upper bound on internal reasoning tokens

  • Higher = deeper reasoning, slower & more expensive

reasoning_effort

  • "low" | "medium" | "high"

  • Controls how deliberate the reasoning style is

  • Typically paired with a thinking budget


Python (OpenAI client)
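
A minimal sketch using the OpenAI Python client against Parasail's OpenAI-compatible endpoint. The base URL, model ID, and environment-variable name below are placeholders (check your Parasail dashboard for the real values); the reasoning knobs are passed through extra_body so the client forwards them unchanged.

```python
import os
from openai import OpenAI

# Placeholder endpoint and credentials -- substitute your own values.
client = OpenAI(
    base_url="https://api.parasail.io/v1",
    api_key=os.environ["PARASAIL_API_KEY"],
)

response = client.chat.completions.create(
    model="parasail-gpt-oss-120b",    # illustrative model ID
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of quicksort vs. mergesort."}
    ],
    extra_body={
        "thinking_budget": 25,        # cap on internal reasoning tokens
        "reasoning_effort": "medium", # "low" | "medium" | "high"
    },
)

print(response.choices[0].message.content)
```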


Direct HTTP (curl)
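
The same request over raw HTTP. The endpoint URL and model ID are again placeholders; the reasoning parameters ride along as extra fields in the JSON body next to the standard chat-completions fields.

```bash
# Placeholder endpoint and model ID -- substitute your own values.
curl https://api.parasail.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PARASAIL_API_KEY" \
  -d '{
    "model": "parasail-gpt-oss-120b",
    "messages": [
      {"role": "user", "content": "Summarize the trade-offs of quicksort vs. mergesort."}
    ],
    "thinking_budget": 25,
    "reasoning_effort": "medium"
  }'
```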


Practical guidance

  • Fast / cheap: thinking_budget: 10, reasoning_effort: "low"

  • Balanced default: thinking_budget: 25, reasoning_effort: "medium"

  • Deep reasoning: thinking_budget: 40+, reasoning_effort: "high"

These controls affect internal reasoning only — the final response stays concise unless you ask otherwise.

Structured outputs (JSON Schema) with GPT-OSS

Goal: force the model to return only valid JSON that matches your schema. On Parasail, you do this by sending response_format.type = "json_schema" (the same shape as the OpenAI API) and, optionally, adding the GPT-OSS reasoning knobs via extra_body.


Minimal example (Parasail + GPT-OSS)
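
A sketch of the request shape, reusing the placeholder base URL and model ID from the examples above; the schema itself is illustrative.

```python
import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.parasail.io/v1",   # placeholder endpoint
    api_key=os.environ["PARASAIL_API_KEY"],
)

# Illustrative schema: every property is required and no extra keys are allowed.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "confidence": {"type": "number"},
    },
    "required": ["title", "sentiment", "confidence"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="parasail-gpt-oss-120b",           # illustrative model ID
    messages=[
        {"role": "user", "content": "Classify: 'The rollout went better than expected.'"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "sentiment_report",
            "strict": True,                  # reject output that drifts from the schema
            "schema": schema,
        },
    },
    extra_body={                             # optional GPT-OSS reasoning knobs
        "thinking_budget": 25,
        "reasoning_effort": "medium",
    },
)

result = json.loads(response.choices[0].message.content)  # parsed object is the source of truth
print(result)
```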

Notes

  • strict: True + additionalProperties: False is the combo that keeps output tight.

  • Always json.loads(...) the returned message.content and treat that as the source of truth.

  • If you stream, you’ll still receive the JSON as plain text in delta.content; concatenate the chunks and parse at the end, just as your stream_completion() helper does (a minimal sketch follows below).
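
A minimal streaming sketch along those lines, continuing from the client and schema defined in the minimal example above:

```python
# Streaming: the JSON arrives as text fragments in delta.content.
# Collect every fragment, then parse once the stream is finished.
import json

chunks = []
stream = client.chat.completions.create(
    model="parasail-gpt-oss-120b",           # illustrative model ID
    messages=[{"role": "user", "content": "Classify: 'Ship it.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "sentiment_report", "strict": True, "schema": schema},
    },
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks.append(chunk.choices[0].delta.content)

result = json.loads("".join(chunks))         # parse the complete JSON at the end
```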
