
Pricing


Serverless Pricing:

Serverless pricing is token-based (tokens are counted separately for input and output), and the amount you owe depends on which model(s) you use. The input and output rates for each model are listed directly on the Serverless page. Prices are per million tokens, so if a model costs $1 per million tokens for both input and output and you use 250,000 input tokens and 250,000 output tokens, you will be charged $0.50 ($0.25 for input and $0.25 for output).
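As a quick sanity check, here is a minimal sketch of the serverless cost arithmetic. The $1.00-per-million rates are the illustrative values from the example above, not a real model's price; look up your model's actual input/output rates on the Serverless page.

```python
# Sketch of the serverless billing math; all prices are USD per million tokens.
INPUT_PRICE_PER_M = 1.00   # example rate from the text above, not a real model price
OUTPUT_PRICE_PER_M = 1.00  # example rate from the text above, not a real model price

def serverless_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost for the given input/output token counts."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# 250,000 input tokens + 250,000 output tokens at $1.00/M each -> $0.50
print(serverless_cost(250_000, 250_000))  # 0.5
```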

Dedicated Pricing:

Our dedicated instances are priced per GPU per hour. We offer various configurations from our hardware fleet to hit your cost, performance, and latency targets. Dedicated instances can automatically scale the number of GPUs as your workload fluctuates, and you can configure a scale-down policy that controls when idle capacity is automatically shut off. When you deploy, you choose the number of replicas for your chosen model, and the hourly price is displayed on each option. A rough cost estimate is sketched below.
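The sketch below shows how a GPU-per-hour bill adds up across replicas. The hourly rate here is a hypothetical placeholder; the real rate for your hardware configuration is the one displayed on the deployment option you select.

```python
# Back-of-the-envelope estimate for a dedicated endpoint.
# GPU_HOURLY_RATE is a placeholder, NOT a real Parasail price; use the rate
# shown on your deployment option.
GPU_HOURLY_RATE = 2.50  # USD per GPU per hour (hypothetical)

def dedicated_cost(gpus_per_replica: int, replica_hours: float) -> float:
    """Cost = GPUs per replica x total replica-hours x hourly GPU rate."""
    return gpus_per_replica * replica_hours * GPU_HOURLY_RATE

# One replica on 2 GPUs running for 8 hours, plus a second replica that
# autoscaled up for 3 hours before the scale-down policy shut it off:
# 2 GPUs x (8 + 3) replica-hours x $2.50 = $55.00
print(dedicated_cost(gpus_per_replica=2, replica_hours=8 + 3))  # 55.0
```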

Batch Pricing:

Batch pricing is token-based (total tokens, discounted to reflect that your queries are not processed in real time), and the amount you owe depends on which model(s) you use.

The default pricing is based on parameter count unless the model is a named model.

Batch is billed at a 50% discount off the serverless price. Cached tokens are billed at half the corresponding batch price. FP16 quantized models cost 30% more than the FP8 batch price; FP8 models incur no additional cost.

Current prices (USD per million tokens):

| Parameter Count | Size | Serverless Price | Batch Price FP8 | Batch Price FP16 | Cache Price FP8 | Cache Price FP16 |
| --- | --- | --- | --- | --- | --- | --- |
| 0-4B | 0-4B | $0.05 | $0.025 | $0.033 | $0.013 | $0.016 |
| 4.1-8B | 4.1-8B | $0.08 | $0.040 | $0.052 | $0.020 | $0.026 |
| LLM_Model_8.1-16B | 8.1-16B | $0.11 | $0.055 | $0.072 | $0.028 | $0.036 |
| LLM_Model_16.1B-21B | 16.1B-21B | $0.45 | $0.225 | $0.293 | $0.113 | $0.146 |
| LLM_Model_21.1B-41B | 21.1B-41B | $0.50 | $0.250 | $0.325 | $0.125 | $0.163 |
| LLM_Model_41.1B-80B | 41.1B-80B | $0.70 | $0.350 | $0.455 | $0.175 | $0.228 |
| LLM_Model_80.1B-404B | 80.1B-404B | $0.80 | $0.400 | $0.520 | $0.200 | $0.260 |
| LLM_Model_405B | 405B | $1.75 | $0.875 | $1.138 | $0.438 | $0.569 |
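The batch and cache columns follow directly from the serverless price using the rules above (batch FP8 = 50% of serverless, FP16 = FP8 batch + 30%, cache = half the batch rate). The sketch below shows that derivation; table values are rounded to three decimals, and all prices are USD per million tokens.

```python
# Derive a tier's batch and cache rates from its serverless price, per the
# rules stated above. Prices are USD per million tokens; the table above
# rounds the results to three decimal places.
def batch_rates(serverless_price: float) -> dict[str, float]:
    batch_fp8 = serverless_price * 0.50   # batch is 50% of serverless
    batch_fp16 = batch_fp8 * 1.30         # FP16 costs 30% more than FP8 batch
    return {
        "batch_fp8": batch_fp8,
        "batch_fp16": batch_fp16,
        "cache_fp8": batch_fp8 * 0.50,    # cached tokens are half the batch rate
        "cache_fp16": batch_fp16 * 0.50,
    }

# Example: the 80.1B-404B tier ($0.80 serverless) works out to
# batch FP8 $0.40, batch FP16 $0.52, cache FP8 $0.20, cache FP16 $0.26,
# matching that row of the table.
print(batch_rates(0.80))
```

To estimate a batch job's cost, multiply your total (non-cached) tokens by the batch rate for your model's tier and any cached tokens by the cache rate, then sum the two.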