Run and Evaluate Any Model


Last updated 1 month ago

At Parasail, we aim to make it as easy as possible to find the right model for your job. Our Serverless tier is the quickest way to try out a wide range of popular models covering chat, instruct, and multimodal use cases, with model sizes ranging from 7B parameters to the incredibly capable DeepSeek V3 at 671B. There are many times, though, when Serverless is not sufficient, and we built the Dedicated and Batch tiers to help in those cases:

Trying out new models: If a model doesn't exist in Serverless — maybe it's an uncommon model on HuggingFace, or maybe you trained and fine-tuned it yourself — you can easily spin it up in a Dedicated Endpoint. We support most transformers on HuggingFace, both public and private, and we have the lowest-cost on-demand GPUs on the market, from 4090s up to H200s.

Large-scale Evaluations: Effective and automatic LLM evaluation is critical to building a quality product. Our batch processing service is ideal for evals that require a large number of prompts, images, or text. The price is 50% off our serverless endpoints, and prompt caching provides another 50% discount. We also support most transformers on HuggingFace with a single-line change in the code, making it easy to evaluate many models.
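The "single-line change" above refers to swapping the model name in an OpenAI-compatible request. A minimal sketch of looping an evaluation over several models — the base URL here is a placeholder (check your Parasail dashboard for the real endpoint), and the env-var names are assumptions:

```python
import json
import os
import urllib.request

# Placeholder base URL; substitute the endpoint shown in your Parasail dashboard.
BASE_URL = os.environ.get("PARASAIL_BASE_URL", "https://api.example.com/v1")

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request. Swapping `model`
    is the only change needed to evaluate a different model."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('PARASAIL_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

# Evaluating several models is just a loop over model names:
eval_models = ["deepseek-ai/DeepSeek-V3", "meta-llama/Llama-3.1-8B-Instruct"]
requests = [chat_request(m, "Summarize the plot of Hamlet in one sentence.")
            for m in eval_models]
```

Each request can then be sent with `urllib.request.urlopen` (or any HTTP client) and the responses scored by your eval harness.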

Embeddings: We support a range of embedding models that rival proprietary offerings such as those from OpenAI and Voyage, including:

  • parasail-ai/GritLM-7B-vllm
  • Alibaba-NLP/gte-Qwen2-7B-instruct

These embeddings - and any others on HuggingFace - can easily be run using our batch processing service.
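Once the embedding vectors come back from the API, comparing or ranking documents is a cosine-similarity computation. A self-contained sketch — the vectors here are toy values standing in for real model output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for embeddings returned by a model such as
# Alibaba-NLP/gte-Qwen2-7B-instruct.
query = [0.1, 0.9, 0.2]
docs = {"doc_a": [0.1, 0.8, 0.3], "doc_b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query embedding.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
```

The same ranking logic works unchanged whichever embedding model produced the vectors, which is what makes swapping models for comparison straightforward.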

Checking Model Compatibility

The Dedicated UI can be used to verify whether a model is supported by our inference engines. Simply paste the HuggingFace URL into the model entry page: a supported model shows a "Model is supported" message, while an unsupported one shows "Model cannot be found".

Private or Custom Models and LORAs

Fine-tuned models and LORA adapters can easily be hosted on Parasail's dedicated endpoints by uploading them to a public or private repo on HuggingFace. For public models, simply paste the URL into the dedicated endpoint page. For guidelines on hosting private models, including generating the access token, please see this section:

Deploying private models through HuggingFace Repos

Batch Processing of Private Models

Private models can also be processed in batch mode at a 50% discount to the equivalent serverless pricing of the model. For information on how to set this up, please see this section:

Batch Processing with Private Models
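Batch jobs take a JSONL input file with one request per line. A sketch of generating such a file, assuming the OpenAI-compatible batch schema (`custom_id`, `method`, `url`, `body`) — see the Batch file format page for the authoritative layout; the model name is a hypothetical private repo:

```python
import json

def batch_line(custom_id: str, model: str, prompt: str) -> str:
    """One JSONL line in the OpenAI-compatible batch schema (assumed here;
    see the Batch file format page for the authoritative layout)."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,  # hypothetical private-model name
            "messages": [{"role": "user", "content": prompt}],
        },
    })

prompts = ["What is 2 + 2?", "Name a prime number."]
lines = [batch_line(f"req-{i}", "my-org/my-private-model", p)
         for i, p in enumerate(prompts)]

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```

The `custom_id` on each line is what lets you match results back to prompts, since batch output ordering is not guaranteed.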
