# GPT-OSS 20b and 120b

## GPT-OSS Reasoning Control

GPT-OSS models support **explicit reasoning control** via two parameters:

* **`thinking_budget`**—limits how many internal *thinking tokens* the model can use
* **`reasoning_effort`**—qualitative control over how hard the model reasons

These are especially useful for trading off **depth vs latency/cost**.

***

### Parameters

**`thinking_budget`**

* Integer
* Upper bound on internal reasoning tokens
* Higher = deeper reasoning, slower & more expensive

**`reasoning_effort`**

* `"low" | "medium" | "high"`
* Controls how deliberate the reasoning style is
* Typically paired with a thinking budget

***

### Python (OpenAI client)

```python
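import os
from openai import OpenAI

# Client setup mirrors the structured-output example on this page.
client = OpenAI(
    api_key=os.environ["PARASAIL_API_KEY"],
    base_url="https://api.parasail.io/v1",
)

# Non-standard fields go in extra_body; the client adds them to the request JSON.
extra_params = {
    "extra_body": {
        "custom_params": {"thinking_budget": 40},
        "chat_template_kwargs": {"reasoning_effort": "high"},
    }
}

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "Explain why the sky is blue."}
    ],
    **extra_params,
)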
```

***

### Direct HTTP (curl)

```bash
curl https://api.parasail.io/v1/chat/completions \
  -H "Authorization: Bearer $PARASAIL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      { "role": "user", "content": "Explain why the sky is blue." }
    ],
    "custom_params": {
      "thinking_budget": 40
    },
    "chat_template_kwargs": {
      "reasoning_effort": "high"
    }
  }'
```
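
The Python and curl examples above send the same fields. The OpenAI Python client merges everything under `extra_body` into the top level of the request JSON, which is why the curl body carries `custom_params` and `chat_template_kwargs` as top-level keys. Here is a minimal sketch of that mapping; `to_http_payload` is a hypothetical helper written for illustration and is not part of any SDK:

```python
import json


def to_http_payload(client_kwargs: dict) -> dict:
    """Flatten OpenAI-client keyword arguments into a raw request body.

    Hypothetical helper: it mimics how `extra_body` entries end up as
    top-level JSON fields next to `model` and `messages`.
    """
    # Copy every standard argument except the extra_body wrapper itself.
    payload = {k: v for k, v in client_kwargs.items() if k != "extra_body"}
    # Lift the extra fields to the top level of the body.
    payload.update(client_kwargs.get("extra_body", {}))
    return payload


client_kwargs = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Explain why the sky is blue."}],
    "extra_body": {
        "custom_params": {"thinking_budget": 40},
        "chat_template_kwargs": {"reasoning_effort": "high"},
    },
}

payload = to_http_payload(client_kwargs)
print(json.dumps(payload, indent=2))
```

The printed object has the same shape as the `-d` body in the curl example, which can help when porting a request between the two styles.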

***

### Practical guidance

* **Fast / cheap**: `thinking_budget: 10`, `reasoning_effort: "low"`
* **Balanced default**: `thinking_budget: 25`, `reasoning_effort: "medium"`
* **Deep reasoning**: `thinking_budget: 40+`, `reasoning_effort: "high"`

> These controls affect *internal reasoning only*—the final response stays concise unless you ask otherwise.
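
One way to keep these presets consistent across a codebase is a small lookup that builds the `extra_body` dictionary. The sketch below is only an illustration: the preset names (`fast`, `balanced`, `deep`) and the `reasoning_extra_body` function are assumptions introduced here, while the numeric values come from the list above (with `40` used as the floor for "40+").

```python
from typing import Optional

# Preset values taken from the guidance list; the names are hypothetical.
PRESETS = {
    "fast": {"thinking_budget": 10, "reasoning_effort": "low"},
    "balanced": {"thinking_budget": 25, "reasoning_effort": "medium"},
    "deep": {"thinking_budget": 40, "reasoning_effort": "high"},
}


def reasoning_extra_body(preset: str = "balanced",
                         thinking_budget: Optional[int] = None) -> dict:
    """Build the extra_body dict for a named preset.

    `thinking_budget` optionally overrides the preset's budget,
    for example to go above 40 in the "deep" preset.
    """
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    settings = PRESETS[preset]
    budget = settings["thinking_budget"] if thinking_budget is None else thinking_budget
    return {
        "custom_params": {"thinking_budget": budget},
        "chat_template_kwargs": {"reasoning_effort": settings["reasoning_effort"]},
    }


print(reasoning_extra_body("fast"))
print(reasoning_extra_body("deep", thinking_budget=60))
```

The returned dictionary can be passed as `extra_body=reasoning_extra_body("deep")` to `client.chat.completions.create(...)`.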

## Structured outputs (JSON Schema) with GPT-OSS

**Goal:** force the model to return *only valid JSON* that matches your schema. On Parasail, you do this by sending `response_format.type = "json_schema"` (same shape as OpenAI), and—optionally—adding GPT-OSS reasoning knobs via `extra_body`.

***

**Minimal example (Parasail + GPT-OSS)**

```python
import os, json
from openai import OpenAI

MODEL = "parasail-gpt-oss-20b-fast"
client = OpenAI(api_key=os.environ["PARASAIL_API_KEY"], base_url="https://api.parasail.io/v1")

PRODUCT_REVIEW_SCHEMA = {
  "type": "object",
  "properties": {
    "rating": {"type": "integer", "minimum": 1, "maximum": 5},
    "pros":   {"type": "array", "items": {"type": "string"}},
    "cons":   {"type": "array", "items": {"type": "string"}},
    "summary":{"type": "string"},
  },
  "required": ["rating", "pros", "cons", "summary"],
  "additionalProperties": False,
}

resp = client.chat.completions.create(
  model=MODEL,
  messages=[{"role":"user","content":"Write a short review of a flagship smartphone."}],
  response_format={
    "type": "json_schema",
    "json_schema": {"name": "product_review", "strict": True, "schema": PRODUCT_REVIEW_SCHEMA},
  },
  # GPT-OSS / Parasail knobs (optional)
  extra_body={
    "custom_params": {"thinking_budget": 50},
    "chat_template_kwargs": {"reasoning_effort": "medium"},
  },
)

data = json.loads(resp.choices[0].message.content)
print(data)
```

**Notes**

* `strict: True` + `additionalProperties: False` is the combo that keeps output tight.
* Always `json.loads(...)` the returned `message.content` and treat that as the source of truth.
* If you stream, you'll still receive the JSON as text fragments in `delta.content`. Concatenate the fragments and parse once the stream ends; a partial buffer is usually not valid JSON.
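
The streaming note can be sketched as follows. `collect_streamed_json` and `fake_chunk` are hypothetical names introduced for this illustration, and the stand-in chunks only imitate the `choices[0].delta.content` shape of streamed chat completion chunks so that the example runs without a network call.

```python
import json
from types import SimpleNamespace


def collect_streamed_json(chunks) -> dict:
    """Concatenate delta.content fragments and parse once at the end."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices at all
        piece = chunk.choices[0].delta.content
        if piece:  # content can be None or empty on some chunks
            parts.append(piece)
    # Parse only after the stream is exhausted; partial text is not valid JSON.
    return json.loads("".join(parts))


def fake_chunk(text):
    """Stand-in for a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk('{"rating": 4, "pros": ["camera"], '),
    fake_chunk(None),
    fake_chunk('"cons": [], "summary": "Solid."}'),
]

review = collect_streamed_json(fake_stream)
print(review["rating"], review["summary"])
```

With a real client, the same function should accept the iterator returned by `client.chat.completions.create(..., stream=True)`, assuming the endpoint streams the JSON text in `delta.content` as described above.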

