Parasail
  • Welcome
  • Serverless
    • Serverless
    • Available Parameters
  • Dedicated
    • Dedicated Endpoints
    • Speeding up Dedicated Models with Speculative Decoding
    • Deploying private models through HuggingFace Repos
    • Dedicated Endpoint Management API
    • Rate Limits and Limitations
    • FP8 Quantization
  • Batch
    • Quick start
    • Batch Processing with Private Models
    • Batch file format
    • API Reference
  • Cookbooks
    • Run and Evaluate Any Model
    • Chat Completions
    • RAG
    • Multi-Modal
    • Text-to-Speech with Orpheus TTS models
    • Multimodal GUI Task Agent
    • Tool/Function Calling
    • Structured Output
  • Billing
    • Pricing
    • Billing And Payments
    • Promotions
    • Batch SLA
  • Security and Account Management
    • Data Privacy and Retention
    • Account Management
    • Compliance
  • Resources
    • Silly Tavern Guide
    • Community Engagement
Powered by GitBook
On this page
  1. Dedicated

Rate Limits and Limitations

Our current Rate Limits and Serverless usage limitations

PreviousDedicated Endpoint Management APINextFP8 Quantization

Last updated 2 months ago

GPU Quota:

As a new customer of our platform, we have a quota of 2 GPU's per user organization. This is set for multiple reasons the biggest reason is that, initial Users sometimes don't realize that their model could be on 1 GPU but choose a larger option before testing and accumulate a large bill.

If you see the message "Insufficient quota" in your deployment page, that indicates you have reached your user quota.

In order to increase your user quota please fill out this form:

Quota Increase