Run and Evaluate Any Model


Last updated 1 month ago

At Parasail, we aim to make it as easy as possible to find the right model for your job. Our Serverless tier is the quickest way to try out a wide range of popular models covering chat, instruct, and multimodal use cases, with model sizes ranging from 7B parameters to the incredibly capable DeepSeek V3 at 671B. There are many times, though, when Serverless is not sufficient, and we built the Dedicated and Batch tiers to help in those cases:

Trying out new models: If a model doesn't exist in Serverless — maybe it's an uncommon model on HuggingFace, or maybe you trained and fine-tuned it yourself — you can easily spin it up in a Dedicated Endpoint. We support most transformers on HuggingFace, both public and private, and we have the lowest-cost on-demand GPUs on the market, from 4090s up to H200s.

Large-scale Evaluations: Effective and automatic LLM evaluation is critical to building a quality product. Our batch processing service is ideal for evals that require a large number of prompts, images, or text. The price is 50% off our serverless endpoints, and prompt caching provides another 50% discount. We also support most transformers on HuggingFace with a single-line change in the code, making it easy to evaluate many models.
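The "single-line change" above refers to swapping the model name in an OpenAI-compatible request. A minimal sketch of looping an evaluation over several models — the base URL here is a placeholder (check your Parasail dashboard for the real endpoint), and the env-var names are assumptions:

```python
import json
import os
import urllib.request

# Placeholder base URL; substitute the endpoint shown in your Parasail dashboard.
BASE_URL = os.environ.get("PARASAIL_BASE_URL", "https://api.example.com/v1")

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request. Swapping `model`
    is the only change needed to evaluate a different model."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('PARASAIL_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

# Evaluating several models is just a loop over model names:
eval_models = ["deepseek-ai/DeepSeek-V3", "meta-llama/Llama-3.1-8B-Instruct"]
requests = [chat_request(m, "Summarize the plot of Hamlet in one sentence.")
            for m in eval_models]
```

Each request can then be sent with `urllib.request.urlopen` (or any HTTP client) and the responses scored by your eval harness.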

Embeddings: We support a range of embedding models that rival proprietary offerings such as those from OpenAI and Voyage, including:

  • parasail-ai/GritLM-7B-vllm
  • Alibaba-NLP/gte-Qwen2-7B-instruct

These embeddings - and any others on HuggingFace - can easily be run using our batch processing service.
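Once the embedding vectors come back from the API, comparing or ranking documents is a cosine-similarity computation. A self-contained sketch — the vectors here are toy values standing in for real model output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for embeddings returned by a model such as
# Alibaba-NLP/gte-Qwen2-7B-instruct.
query = [0.1, 0.9, 0.2]
docs = {"doc_a": [0.1, 0.8, 0.3], "doc_b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query embedding.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
```

The same ranking logic works unchanged whichever embedding model produced the vectors, which is what makes swapping models for comparison straightforward.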

Checking Model Compatibility

The Dedicated UI can be used to verify whether a model is supported by our inference engines. Simply paste the HuggingFace URL into the model entry page: a supported model shows a "Model is supported" message, while an unsupported one shows "Model cannot be found".

Private or Custom Models and LORAs

Fine-tuned models and LORA adapters can easily be hosted on Parasail's dedicated endpoints by uploading them to a public or private repo on HuggingFace. For public models, simply paste the URL into the dedicated endpoint page. For guidelines on hosting private models, including generating the access token, please see this section:

Deploying private models through HuggingFace Repos

Batch Processing of Private Models

Private models can also be processed in batch mode at a 50% discount to the equivalent serverless pricing of the model. For information on how to set this up, please see this section:

Batch Processing with Private Models
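Batch jobs take a JSONL input file with one request per line. A sketch of generating such a file, assuming the OpenAI-compatible batch schema (`custom_id`, `method`, `url`, `body`) — see the Batch file format page for the authoritative layout; the model name is a hypothetical private repo:

```python
import json

def batch_line(custom_id: str, model: str, prompt: str) -> str:
    """One JSONL line in the OpenAI-compatible batch schema (assumed here;
    see the Batch file format page for the authoritative layout)."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,  # hypothetical private-model name
            "messages": [{"role": "user", "content": prompt}],
        },
    })

prompts = ["What is 2 + 2?", "Name a prime number."]
lines = [batch_line(f"req-{i}", "my-org/my-private-model", p)
         for i, p in enumerate(prompts)]

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```

The `custom_id` on each line is what lets you match results back to prompts, since batch output ordering is not guaranteed.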
