# Text-to-Speech with Orpheus TTS models

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/parasail-ai/cookbook/blob/main/orpheus_demo.ipynb)

[Orpheus TTS](https://canopylabs.ai/model-releases) is a series of open-source text-to-speech models using the same architecture as Llama 3. As a result, the model can run just like any other Llama-based LLM. In this cookbook, we show how to use Orpheus TTS for text-to-speech applications.

Note that currently the [SNAC audio codec](https://github.com/hubertsiuzdak/snac) has to be run on the client-side.

## Hello world

First, import the needed dependencies and define a helper function to convert tokens into audio data.

```python
!pip install torchaudio snac openai soundfile &> /dev/null

from IPython.display import Audio, display

import torch
import openai
import snac
import re
import torchaudio

AUDIO_TOKENS_REGEX = re.compile(r"<custom_token_(\d+)>")

# Convert tokens into audio data
def convert_to_audio(audio_ids, model):
  audio_ids = torch.tensor(audio_ids, dtype=torch.int32).reshape(-1, 7)
  codes_0 = audio_ids[:, 0].unsqueeze(0)
  codes_1 = torch.stack((audio_ids[:, 1], audio_ids[:, 4])).t().flatten().unsqueeze(0)
  codes_2 = (
    torch.stack((audio_ids[:, 2], audio_ids[:, 3], audio_ids[:, 5], audio_ids[:, 6]))
    .t()
    .flatten()
    .unsqueeze(0)
  )

  with torch.inference_mode():
    audio_hat = model.decode([codes_0, codes_1, codes_2])

  return audio_hat[0]
```

Next, prompt the model to generate audio.

```python
try:
  from google.colab import userdata
  API_KEY = userdata.get('OPENAI_API_KEY')
except:
  import os
  API_KEY = os.getenv('OPENAI_API_KEY')

snac_model = snac.SNAC.from_pretrained("hubertsiuzdak/snac_24khz")
oai_client = openai.OpenAI(
    base_url="https://api.parasail.io/v1",
    api_key=API_KEY)

# Voice options: "tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe"
voice = "tara"

# Available emotive tag: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>
text_prompt = """<chuckle> Hello world!"""

prompt = f"<custom_token_3><|begin_of_text|>{voice}: {text_prompt}<|eot_id|><custom_token_4><custom_token_5><custom_token_1>"

# Tips for prompting [1]:
#  - Sampling parameters 'temperature' and 'top_p' work just like regular LLMs.
#  - 'repetition_penalty' >= 1.1 is required for stable generations.
#  - Increasing 'repetition_penalty' and/or 'temperature' makes the model speak faster.
#
# [1]: https://github.com/canopyai/Orpheus-TTS/tree/main?tab=readme-ov-file#prompting
response = oai_client.completions.create(
    model="parasail-orpheus-3b-01-ft",
    prompt=prompt,
    temperature=0.6,
    top_p=0.9,
    max_tokens=10240,
    extra_body={
        "stop_token_ids": [128258],
        "repetition_penalty": 1.1,
    },
)

text = response.choices[0].text
audio_ids = [
    int(token) - 10 - ((index % 7) * 4096)
    for index, token in enumerate(AUDIO_TOKENS_REGEX.findall(text))
]
audio = convert_to_audio(audio_ids, snac_model)
torchaudio.save("output.wav", audio, sample_rate=24000)

print(f"Audio duration: {audio.shape[1] / 24000:.2f}s")
display(Audio("output.wav"))
```

```
Audio duration: 2.65s
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.parasail.io/parasail-docs/cookbooks/text-to-speech-orpheus.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
