Pricing
Pricing for “serverless” is token-based, with input and output tokens counted separately, and the amount you owe depends on which model(s) you use. The input and output prices for each model are listed directly on the "Serverless" page. Prices are quoted per million tokens: if a model costs $1 per million tokens for both input and output, and you use 250,000 input tokens and 250,000 output tokens, you will be charged $0.50 ($0.25 for the input and $0.25 for the output).
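The arithmetic above can be sketched in a few lines of Python. This is an illustration only, not part of any official SDK: the function name `serverless_cost` and its parameters are hypothetical, and prices are passed as strings so that `Decimal` keeps the cents exact.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)

def serverless_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the charge in dollars, given per-million-token prices.

    Hypothetical helper: input and output tokens are priced separately,
    as described in the text above.
    """
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * Decimal(str(input_price))
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * Decimal(str(output_price))
    return input_cost + output_cost

# The example from the text: $1 per million tokens for both input and output.
cost = serverless_cost(250_000, 250_000, "1.00", "1.00")
print(f"${cost:.2f}")
```

For the example in the text, this yields $0.25 for input plus $0.25 for output.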
Our dedicated instances are priced per GPU per hour. We offer various configurations of our hardware fleet to meet your cost, performance, and latency targets. Dedicated instances can auto-scale the number of GPUs as your workload fluctuates, and you can configure a scale-down policy to suit your needs. A scale-down policy specifies when the server should turn off automatically. At run time you choose the number of replicas for your selected model, and the price is displayed on each option.
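As a rough illustration of GPU-hour billing under auto-scaling, the sketch below sums GPU-hours over intervals in which the GPU count is constant. The function name, the interval format, and the $2.00 rate are all hypothetical, and the sketch assumes that billing is linear in GPU-hours and that time spent scaled down to zero GPUs costs nothing.

```python
from decimal import Decimal

def dedicated_cost(usage, rate_per_gpu_hour):
    """Estimate a dedicated-instance charge in dollars.

    usage: iterable of (gpu_count, hours) intervals (hypothetical format).
    rate_per_gpu_hour: price of one GPU for one hour (hypothetical value).
    """
    rate = Decimal(str(rate_per_gpu_hour))
    gpu_hours = sum(Decimal(gpus) * Decimal(str(hours)) for gpus, hours in usage)
    return gpu_hours * rate

# 2 GPUs for 3 hours, a burst to 4 GPUs for 1 hour, then scaled down for 20 hours.
usage = [(2, 3), (4, 1), (0, 20)]
estimate = dedicated_cost(usage, "2.00")
print(f"${estimate:.2f}")
```

Here the workload consumes 10 GPU-hours in total, so the estimate is 10 times the assumed hourly rate.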
Pricing for the “batch” use case is token-based (on the total number of tokens, discounted because your queries are not processed in real time), and the amount you owe depends on which model(s) you use.
Default pricing is based on the model's parameter size, unless the model is a named model.
Batch is billed at a 50% discount off the serverless price. Cached tokens receive an additional 30% off. FP16-quantized models cost 30% more; FP8 models incur no additional cost.
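The batch discount and the FP16 surcharge can be sketched as follows. The helper `batch_price` is hypothetical, and the sketch assumes the two adjustments multiply together. The cached-token discount is deliberately left out, because the text does not say which price it is applied to.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.50")  # batch is 50% off the serverless price
FP16_SURCHARGE = Decimal("0.30")  # FP16 models cost 30% more; FP8 adds nothing

def batch_price(serverless_price, fp16=False):
    """Return the batch price derived from a serverless price.

    Hypothetical helper; assumes the discount and surcharge are multiplicative.
    """
    price = Decimal(str(serverless_price)) * (1 - BATCH_DISCOUNT)
    if fp16:
        price *= 1 + FP16_SURCHARGE
    return price

# Example serverless prices of $0.05 and $0.08.
for base in ("0.05", "0.08"):
    print(base, batch_price(base), batch_price(base, fp16=True))
```

A $0.08 serverless price becomes $0.04 in batch, and $0.052 in batch for an FP16 model.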
Current prices:
0-4B | 0-4B | $0.05 | $0.025 | $0.033 | $0.013 | $0.016
4.1-8B | 4.1-8B | $0.08 | $0.040 | $0.052 | $0.020 | $0.026
LLM_Model_8.1-16B | 8.1-16B | $0.11 | $0.055 | $0.072 | $0.028 | $0.036
LLM_Model_16.1B-21B | 16.1B-21B | $0.45 | $0.225 | $0.293 | $0.113 | $0.146
LLM_Model_21.1B-41B | 21.1B-41B | $0.50 | $0.250 | $0.325 | $0.125 | $0.163
LLM_Model_41.1B-80B | 41.1B-80B | $0.70 | $0.350 | $0.455 | $0.175 | $0.228
LLM_Model_80.1B-404B | 80.1B-404B | $0.80 | $0.400 | $0.520 | $0.200 | $0.260
LLM_Model_405B | 405B | $1.75 | $0.875 | $1.138 | $0.438 | $0.569