GPT-OSS 20b and 120b

GPT-OSS Reasoning Control

GPT-OSS models support explicit reasoning control via two parameters:

thinking_budget: caps how many internal thinking tokens the model can use
reasoning_effort: qualitative control over how hard the model reasons

These are especially useful for trading off reasoning depth against latency and cost.

Parameters

thinking_budget

Integer
Upper bound on internal reasoning tokens
Higher = deeper reasoning, slower & more expensive

reasoning_effort

"low" | "medium" | "high"
Controls how deliberate the reasoning style is
Typically paired with a thinking budget
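
As a minimal sketch of how the two parameters fit together, the helper below builds the extra_body dictionary in the shape used by the request examples on this page. The name build_reasoning_body and its validation rules (a non-negative integer budget, one of the three listed effort levels) are assumptions made for illustration; they are not part of any client library.

```python
ALLOWED_EFFORTS = ("low", "medium", "high")


def build_reasoning_body(thinking_budget: int, reasoning_effort: str) -> dict:
    """Hypothetical helper: validate both knobs and return an extra_body dict."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(thinking_budget, bool) or not isinstance(thinking_budget, int):
        raise TypeError("thinking_budget must be an integer")
    # Assumption: a negative budget is never meaningful.
    if thinking_budget < 0:
        raise ValueError("thinking_budget must not be negative")
    if reasoning_effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {ALLOWED_EFFORTS}")
    return {
        "custom_params": {"thinking_budget": thinking_budget},
        "chat_template_kwargs": {"reasoning_effort": reasoning_effort},
    }


print(build_reasoning_body(40, "high"))
```

The returned dictionary can be passed as extra_body to client.chat.completions.create(...).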

Python (OpenAI client)

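import os
from openai import OpenAI

# Parasail exposes an OpenAI-compatible endpoint; the key is read from the environment.
client = OpenAI(
    api_key=os.environ["PARASAIL_API_KEY"],
    base_url="https://api.parasail.io/v1",
)

# extra_body is merged into the request JSON, so both keys end up at the top level.
extra_params = {
    "extra_body": {
        "custom_params": {"thinking_budget": 40},
        "chat_template_kwargs": {"reasoning_effort": "high"},
    }
}

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "Explain why the sky is blue."}
    ],
    **extra_params,
)

print(resp.choices[0].message.content)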

Direct HTTP (curl)

curl https://api.parasail.io/v1/chat/completions \
  -H "Authorization: Bearer $PARASAIL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      { "role": "user", "content": "Explain why the sky is blue." }
    ],
    "custom_params": {
      "thinking_budget": 40
    },
    "chat_template_kwargs": {
      "reasoning_effort": "high"
    }
  }'
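
The curl body places custom_params and chat_template_kwargs at the top level of the JSON payload, whereas the Python example passes them through extra_body. The OpenAI Python client merges extra_body into the request body, so the two forms describe the same request. A rough standard-library sketch of that merge follows; build_payload is a hypothetical helper for illustration, not a client API.

```python
import json


def build_payload(model, messages, extra_body=None):
    """Hypothetical helper: mimic how extra_body keys land at the top level."""
    payload = {"model": model, "messages": messages}
    # Extra keys are merged next to "model" and "messages", not nested.
    payload.update(extra_body or {})
    return payload


payload = build_payload(
    "openai/gpt-oss-20b",
    [{"role": "user", "content": "Explain why the sky is blue."}],
    extra_body={
        "custom_params": {"thinking_budget": 40},
        "chat_template_kwargs": {"reasoning_effort": "high"},
    },
)

# This JSON has the same shape as the body passed to curl with -d.
print(json.dumps(payload, indent=2))
```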

Practical guidance

Fast / cheap: thinking_budget: 10, reasoning_effort: "low"
Balanced default: thinking_budget: 25, reasoning_effort: "medium"
Deep reasoning: thinking_budget: 40+, reasoning_effort: "high"

These controls affect internal reasoning only; the final response stays concise unless you ask otherwise.
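
The three tiers above can be kept in one place and selected by name. This is a sketch only: the preset names "fast", "balanced", and "deep" and the helper preset_extra_body are assumptions, and the "deep" tier uses 40 as the low end of the "40+" range.

```python
# Values copied from the guidance above; the preset names are hypothetical.
REASONING_PRESETS = {
    "fast": {"thinking_budget": 10, "reasoning_effort": "low"},
    "balanced": {"thinking_budget": 25, "reasoning_effort": "medium"},
    "deep": {"thinking_budget": 40, "reasoning_effort": "high"},
}


def preset_extra_body(name: str = "balanced") -> dict:
    """Hypothetical helper: turn a preset name into an extra_body dict."""
    if name not in REASONING_PRESETS:
        raise KeyError(f"unknown preset {name!r}; choose from {sorted(REASONING_PRESETS)}")
    preset = REASONING_PRESETS[name]
    return {
        "custom_params": {"thinking_budget": preset["thinking_budget"]},
        "chat_template_kwargs": {"reasoning_effort": preset["reasoning_effort"]},
    }


print(preset_extra_body("fast"))
```

A request would then pass extra_body=preset_extra_body("fast") to client.chat.completions.create(...), which keeps the tuning decision out of the call site.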

Structured outputs (JSON Schema) with GPT-OSS

Goal: force the model to return only valid JSON that matches your schema. On Parasail, you do this by sending response_format.type = "json_schema" (same shape as OpenAI) and, optionally, adding the GPT-OSS reasoning knobs via extra_body.

Minimal example (Parasail + GPT-OSS)

import os, json
from openai import OpenAI

MODEL = "parasail-gpt-oss-20b-fast"
client = OpenAI(api_key=os.environ["PARASAIL_API_KEY"], base_url="https://api.parasail.io/v1")

PRODUCT_REVIEW_SCHEMA = {
  "type": "object",
  "properties": {
    "rating": {"type": "integer", "minimum": 1, "maximum": 5},
    "pros":   {"type": "array", "items": {"type": "string"}},
    "cons":   {"type": "array", "items": {"type": "string"}},
    "summary":{"type": "string"},
  },
  "required": ["rating", "pros", "cons", "summary"],
  "additionalProperties": False,
}

resp = client.chat.completions.create(
  model=MODEL,
  messages=[{"role":"user","content":"Write a short review of a flagship smartphone."}],
  response_format={
    "type": "json_schema",
    "json_schema": {"name": "product_review", "strict": True, "schema": PRODUCT_REVIEW_SCHEMA},
  },
  # GPT-OSS / Parasail knobs (optional)
  extra_body={
    "custom_params": {"thinking_budget": 50},
    "chat_template_kwargs": {"reasoning_effort": "medium"},
  },
)

data = json.loads(resp.choices[0].message.content)
print(data)
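
Even with a strict schema, a cheap local check on the parsed object can catch surprises before the data is used downstream. The following is a minimal sketch that mirrors the fields of PRODUCT_REVIEW_SCHEMA; check_review is a hypothetical helper written with the standard library only, and a general-purpose validator such as the jsonschema package would be the broader choice.

```python
import json

REQUIRED_KEYS = {"rating", "pros", "cons", "summary"}


def check_review(data) -> list:
    """Hypothetical helper: return a list of problems (empty means the object looks valid)."""
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    missing = REQUIRED_KEYS - data.keys()
    extra = data.keys() - REQUIRED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    rating = data.get("rating")
    # bool is a subclass of int, so exclude it before the range check.
    if isinstance(rating, bool) or not isinstance(rating, int) or not 1 <= rating <= 5:
        problems.append("rating must be an integer from 1 to 5")
    for key in ("pros", "cons"):
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of strings")
    if not isinstance(data.get("summary"), str):
        problems.append("summary must be a string")
    return problems


sample = json.loads(
    '{"rating": 4, "pros": ["great screen"], "cons": ["pricey"], "summary": "Solid phone."}'
)
print(check_review(sample))
```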

Notes

strict: True together with additionalProperties: False is the combination that restricts the output to exactly the fields in your schema.
Always parse the returned message.content with json.loads(...) and treat the parsed object as the source of truth.
If you stream, the JSON still arrives as text in delta.content; concatenate the chunks and parse once the stream has finished.
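
For the streaming case, a partial chunk is not valid JSON on its own, so parsing has to wait until every fragment has been collected. The sketch below shows that pattern; collect_json and fake_chunk are hypothetical names, and the fake chunks only imitate the choices[0].delta.content shape of streamed chat completion chunks so the example runs without a network call.

```python
import json
from types import SimpleNamespace


def collect_json(stream):
    """Hypothetical helper: join streamed delta.content fragments, then parse once."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or no text; skip them.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
    return json.loads("".join(parts))


def fake_chunk(text):
    """Stand-in for a streamed chunk, used only for this demonstration."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [
    fake_chunk('{"rating": 4, '),
    fake_chunk(None),
    fake_chunk('"summary": "Solid."}'),
]
print(collect_json(fake_stream))
```

With a real client, the stream would come from client.chat.completions.create(..., stream=True) and could be passed to collect_json in place of fake_stream.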


Last updated 1 month ago
