# Pricing

## Serverless Pricing: <a href="#serverless-pricing" id="serverless-pricing"></a>

Serverless pricing is token-based: input and output tokens are billed separately, and rates vary by model. Input and output prices for each model are listed on the "Serverless" page. Prices are quoted per million tokens, so if a model costs $1 per million tokens for both input and output and you use 250,000 input tokens and 250,000 output tokens, you are charged $0.50: $0.25 for the input and $0.25 for the output.
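The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `serverless_cost` and its parameters are hypothetical and not part of any Parasail API, and the $1 rates are the example values from the paragraph above, not real model prices.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def serverless_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the charge in dollars, given per-million-token prices (hypothetical helper)."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * Decimal(input_price_per_m)
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * Decimal(output_price_per_m)
    return input_cost + output_cost


# Example from the text: $1 per million tokens for both input and output.
total = serverless_cost(250_000, 250_000, "1.00", "1.00")
print(f"${total:.2f}")  # prints $0.50
```

Prices are passed as strings and handled with `Decimal` so that small per-token amounts are not distorted by binary floating-point rounding.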

<figure><img src="/files/pFgxtR7TTBGrfGjYQFfg" alt=""><figcaption></figcaption></figure>

## Dedicated Pricing: <a href="#dedicated-pricing" id="dedicated-pricing"></a>

Dedicated instances are billed per graphics processing unit (GPU) per hour. Parasail offers various hardware configurations to meet your cost, performance, and latency targets. Dedicated instances can automatically scale the number of GPUs as your workload fluctuates, and Parasail also offers a configurable scale-down policy, which determines when the server automatically turns off. When you run a model, you select from the available options and the number of replicas you want for that model, with the price displayed on each option:

<figure><img src="/files/1NmQKA8XqcocyS7RCdWP" alt=""><figcaption></figcaption></figure>
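As a rough way to estimate a dedicated bill, the sketch below multiplies an hourly GPU rate by the number of GPUs, replicas, and hours. Everything here is an assumption for illustration: the helper `dedicated_cost` is hypothetical, the $2.00 rate is a placeholder rather than a real Parasail price, and the estimate assumes a fixed replica count with no autoscaling or scale-down during the period.

```python
from decimal import Decimal


def dedicated_cost(gpu_hourly_rate, gpus_per_replica, replicas, hours):
    """Estimate the charge in dollars for a dedicated deployment (hypothetical helper).

    Assumes every replica uses the same number of GPUs and runs for the whole period.
    """
    return Decimal(gpu_hourly_rate) * gpus_per_replica * replicas * Decimal(hours)


# Placeholder values: $2.00 per GPU-hour, 2 GPUs per replica, 2 replicas, 24 hours.
estimate = dedicated_cost("2.00", gpus_per_replica=2, replicas=2, hours=24)
print(f"${estimate:.2f}")  # prints $192.00
```

Use the price displayed on the option you select in place of the placeholder rate.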

## Batch Pricing: <a href="#batch-pricing" id="batch-pricing"></a>

Batch pricing is token-based on the total number of tokens, discounted to reflect the fact that your queries are not processed in real time. Rates vary by model.

Default pricing is based on the model's parameter count, unless the model is a named model.

Batch is billed at a 50% discount off the serverless price. Cached tokens receive an additional 30% off. FP16 models cost 30% more; FP8 models incur no additional cost.
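The batch discount and the FP16 surcharge can be expressed as a short calculation. The sketch below is illustrative only: `batch_price` is a hypothetical helper, and rounding half up to three decimal places is an assumption inferred from the published batch prices, not a documented billing rule.

```python
from decimal import Decimal, ROUND_HALF_UP

BATCH_DISCOUNT = Decimal("0.50")  # batch is 50% off the serverless price
FP16_SURCHARGE = Decimal("0.30")  # FP16 models cost 30% more than FP8


def batch_price(serverless_price, fp16=False):
    """Derive a batch price from a serverless price (hypothetical helper)."""
    price = Decimal(serverless_price) * (1 - BATCH_DISCOUNT)
    if fp16:
        price *= 1 + FP16_SURCHARGE
    # Assumption: prices are rounded half up to three decimal places.
    return price.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


print(batch_price("0.45"))             # prints 0.225
print(batch_price("0.45", fp16=True))  # prints 0.293
```

With a serverless price of $0.45, this reproduces the FP8 and FP16 batch prices listed for the 16.1 B-21 B tier.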

Current prices, per million tokens:

<table><thead><tr><th>Parameter Count</th><th>Size</th><th width="100">Serverless Price</th><th>Batch Price FP8</th><th>Batch Price FP16</th><th>Cache Price FP8</th><th>Cache Price FP16</th></tr></thead><tbody><tr><td>0-4 B</td><td>0-4 B</td><td>$0.05</td><td>$0.025</td><td>$0.033</td><td>$0.013</td><td>$0.016</td></tr><tr><td>4.1-8 B</td><td>4.1-8 B</td><td>$0.08</td><td>$0.040</td><td>$0.052</td><td>$0.020</td><td>$0.026</td></tr><tr><td>LLM_Model_8.1-16 B</td><td>8.1-16 B</td><td>$0.11</td><td>$0.055</td><td>$0.072</td><td>$0.028</td><td>$0.036</td></tr><tr><td>LLM_Model_16.1 B-21 B</td><td>16.1 B-21 B</td><td>$0.45</td><td>$0.225</td><td>$0.293</td><td>$0.113</td><td>$0.146</td></tr><tr><td>LLM_Model_21.1 B-41 B</td><td>21.1 B-41 B</td><td>$0.50</td><td>$0.250</td><td>$0.325</td><td>$0.125</td><td>$0.163</td></tr><tr><td>LLM_Model_41.1 B-80 B</td><td>41.1 B-80 B</td><td>$0.70</td><td>$0.350</td><td>$0.455</td><td>$0.175</td><td>$0.228</td></tr><tr><td>LLM_Model_80.1 B-404 B</td><td>80.1 B-404 B</td><td>$0.80</td><td>$0.400</td><td>$0.520</td><td>$0.200</td><td>$0.260</td></tr><tr><td>LLM_Model_405 B</td><td>405 B</td><td>$1.75</td><td>$0.875</td><td>$1.138</td><td>$0.438</td><td>$0.569</td></tr></tbody></table>


