Quick start
This page gives an introduction to Parasail's Batch Processing Library. For more detailed information, jump to: OpenAI compatibility, Parasail batch helper library.
Parasail's batch processing engine is a simple, inexpensive way to run thousands or millions of LLM inferences. The workflow has four steps: create an input file, start a batch job, wait for it to finish, then download the output.
Getting Started with our Parasail Batch Helper Library
The first step is to create a Parasail API key at https://www.saas.parasail.io/keys. Store the key in your environment, for example in a .bashrc or .env file, or pass it on the command line. Avoid pasting it directly into code: hardcoded keys are a common way for keys to leak.
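As a minimal sketch of keeping the key out of source code, the helper below reads it from the environment and fails early if it is missing. The function name get_parasail_api_key is hypothetical; only the PARASAIL_API_KEY variable name comes from this guide. Python does not read .env files on its own, so a .env file needs a loader or a shell that exports its contents.

```python
import os


def get_parasail_api_key() -> str:
    """Return the Parasail API key from the environment (hypothetical helper)."""
    key = os.environ.get("PARASAIL_API_KEY")
    if not key:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise RuntimeError(
            "PARASAIL_API_KEY is not set; export it in your shell (e.g. .bashrc) "
            "or load it from a .env file"
        )
    return key
```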
Next, install openai-batch, our batch helper library, which is compatible with both Parasail and OpenAI:
pip install openai-batch
Now you're ready to run a batch job in as little as five lines of code! Our batch endpoint supports most transformer models on HuggingFace: just put the HuggingFace model ID in the request. The model does not need to be deployed as a dedicated or serverless endpoint. In this example, we use NousResearch/DeepHermes-3-Mistral-24B-Preview (https://huggingface.co/NousResearch/DeepHermes-3-Mistral-24B-Preview).
# test_batch.py
import random

from openai_batch import Batch

# Create a batch with random prompts
with Batch() as batch:
    objects = ["cat", "robot", "coffee mug", "spaceship", "banana"]
    for _ in range(100):
        batch.add_to_batch(
            model="NousResearch/DeepHermes-3-Mistral-24B-Preview",
            messages=[{"role": "user", "content": f"Tell me a joke about a {random.choice(objects)}"}]
        )

    # Submit, wait for completion, and download results
    result, output_path, error_path = batch.submit_wait_download()
    print(f"Batch completed with status {result.status} and stored in {output_path}")
This code looks for PARASAIL_API_KEY in the environment. An easy way to set it is on the command line:
PARASAIL_API_KEY=<INSERT API KEY> python3 test_batch.py
Running the script produces output like the following:
validating
in_progress
...
in_progress
in_progress
completed
Batch completed with status completed and stored in batch-itvt3wmjs7-output.jsonl
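The first two lines of batch-itvt3wmjs7-output.jsonl contain prompt responses. Each line carries the custom_id of the request it answers, and output order does not necessarily follow input order (here line-1 is followed by line-3):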
{"id":"vllm-51653b28a97e4e67a0d3587f959ffe3a","custom_id":"line-1","response":{"status_code":200,"request_id":"vllm-batch-1c7186c472cd49b1aa12a81760094540","body":{"id":"chatcmpl-01be171caf3b4dc69cfa9ef360d26a4e","object":"chat.completion","created":1743050291,"model":"NousResearch/DeepHermes-3-Mistral-24B-Preview","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Why did the coffee mug get arrested? It had too much to behave!","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":41,"total_tokens":60,"completion_tokens":19,"prompt_tokens_details":null},"prompt_logprobs":null}},"error":null}
{"id":"vllm-58a8ac3f589d4380ad94e00b191da24e","custom_id":"line-3","response":{"status_code":200,"request_id":"vllm-batch-2dd59bd942fa4a25a8792f9b8ee28da1","body":{"id":"chatcmpl-a59bfccd7d134981bf5e1004e77c6854","object":"chat.completion","created":1743050291,"model":"NousResearch/DeepHermes-3-Mistral-24B-Preview","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Why can't you trust an astronaut? Because they're always out of this world!","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":41,"total_tokens":59,"completion_tokens":18,"prompt_tokens_details":null},"prompt_logprobs":null}},"error":null}
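Because results are keyed by custom_id rather than by position, it is safer to index them that way. The following is a standard-library sketch that assumes the output schema shown above; load_results is a hypothetical helper name, and failed requests are simply skipped.

```python
import json
from pathlib import Path


def load_results(output_path: str) -> dict[str, str]:
    """Map each custom_id to the assistant's reply text (hypothetical helper)."""
    results: dict[str, str] = {}
    with Path(output_path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            response = record.get("response")
            # Skip failed requests: they carry an error or a non-200 status code
            if record.get("error") or response is None or response.get("status_code") != 200:
                continue
            body = response["body"]
            results[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return results
```

For example, `load_results(output_path)["line-1"]` would return the first joke from the file above.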
Batch Submission Limitations
Both Parasail and OpenAI limit the maximum size of the batch input files:
Up to 50,000 requests (lines)
Up to 500MB total input file size for Parasail
Up to 250MB total input file size for OpenAI
Max completion tokens for Parasail defaults to 8,192 but can be raised to 16,384 with the max_completion_tokens parameter.
Workloads exceeding these limits must be split into multiple batches. The add_to_batch function in the Parasail Batch Helper Library raises a ValueError exception when the provider's file size or request count limit is exceeded.
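If you build JSONL input yourself instead of using add_to_batch, a sketch like the one below splits requests into chunks that stay under the Parasail limits listed above. The function name and defaults are assumptions; treating 500MB as 500,000,000 bytes (the smaller decimal reading) is a deliberately conservative guess.

```python
import json
from typing import Iterable, Iterator

MAX_REQUESTS = 50_000     # requests (lines) per batch, from the limits above
MAX_BYTES = 500_000_000   # Parasail's 500MB, read as decimal MB to stay conservative


def split_requests(
    requests: Iterable[dict],
    max_requests: int = MAX_REQUESTS,
    max_bytes: int = MAX_BYTES,
) -> Iterator[list[str]]:
    """Yield lists of JSONL lines, each small enough for one batch (hypothetical helper)."""
    chunk: list[str] = []
    size = 0
    for request in requests:
        line = json.dumps(request) + "\n"
        line_bytes = len(line.encode("utf-8"))
        if line_bytes > max_bytes:
            raise ValueError("a single request exceeds the batch file size limit")
        # Close the current chunk if this line would break either limit
        if chunk and (len(chunk) >= max_requests or size + line_bytes > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk
```

Each yielded chunk can be written to its own file and submitted as a separate batch.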
Resuming Batch Jobs
A major convenience of batch processing is that you can submit a job and resume monitoring it from a different process or flow. A developer can upload hundreds of batch jobs and millions of prompts without worrying about program crashes, errors, or resets; the prompts keep processing on Parasail's servers until the batch completes.
Replacing the last two lines of the previous example with the following submits the job, prints the batch ID, and exits:
    # Submit the batch and print its ID without waiting for it to finish
    batch_id = batch.submit()
    print(f"Batch ID: {batch_id}")
With a batch ID of batch-tclfzwczcd, we can now wait for the batch to finish and download the results in a separate script:
import time

from openai_batch import Batch

with Batch(batch_id="batch-tclfzwczcd",
           output_file="mybatch.jsonl") as batch:
    # Check status periodically
    while True:
        status = batch.status()
        print(f"Batch status: {status.status}")
        if status.status in ["completed", "failed", "expired", "cancelled"]:
            break
        time.sleep(60)  # Check every minute

    # Download results once completed
    output_path, error_path = batch.download()
    print(f"Output saved to: {output_path}")
This prints:
Batch status: in_progress
...
Batch status: in_progress
Output saved to: mybatch.jsonl
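The loop above polls forever if a batch never reaches a terminal state. A hedged variant adds an overall timeout. wait_for_batch is a hypothetical helper; it takes the status-fetching function as an argument (for example `lambda: batch.status().status`), so it does not rely on any library call beyond those shown above. The terminal states are the ones checked in the example, and the 24-hour default is an arbitrary assumption.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(
    get_status: Callable[[], str],
    interval: float = 60.0,
    timeout: float = 24 * 3600.0,  # arbitrary default; adjust to your workload
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until a terminal state or the timeout (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        print(f"Batch status: {status}")
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        sleep(interval)
```

Inside the `with Batch(batch_id=...)` block, `final = wait_for_batch(lambda: batch.status().status)` would replace the `while True` loop.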
Broad Model and Parameter Support
OpenAI Models
with Batch() as batch:
    for _ in range(100):
        batch.add_to_batch(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Give me 10 dad jokes"}]
        )
Embedding Models
GritLM (developed by Contextual and hosted on HuggingFace by Parasail) and GTE-Qwen2-7B-Instruct from Alibaba (Alibaba-NLP/gte-Qwen2-7B-instruct) are two excellent open source embedding models that rival proprietary models. To run embedding models like these, pass input instead of messages.
We strongly recommend base64 encoding (encoding_format="base64", as in the example below) to reduce the size of the output files for both Parasail and OpenAI.
with Batch() as batch:
    for i in range(100):
        batch.add_to_batch(
            model="parasail-ai/GritLM-7B-vllm",
            encoding_format="base64",  # base64 keeps output files small
            input=f"This is input #{i}"
        )
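With encoding_format="base64", each embedding comes back as a base64 string instead of a JSON list of floats. In OpenAI's format that string packs little-endian 32-bit floats; assuming Parasail uses the same layout, a standard-library decoder could look like this (decode_embedding is a hypothetical name):

```python
import base64
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 embedding, assuming little-endian float32 packing."""
    raw = base64.b64decode(b64)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # "<" = little-endian, "f" = 4-byte IEEE 754 float
    return list(struct.unpack(f"<{count}f", raw))
```

Assuming the OpenAI embeddings response shape, the encoded vector for an output record would sit at `record["response"]["body"]["data"][0]["embedding"]`.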
Parameters
add_to_batch supports all of the parameters that client.chat.completions.create or client.embeddings.create supports, though open source models on Parasail may not support every parameter.
with Batch() as batch:
    for _ in range(100):
        batch.add_to_batch(
            model="NousResearch/DeepHermes-3-Mistral-24B-Preview",
            max_completion_tokens=1000,
            temperature=0.7,
            top_p=0.1,
            messages=[{"role": "user", "content": "Give me 10 dad jokes"}]
        )
Metadata
You can attach metadata to a batch. This is useful for passing information between separate submit and download processes, and for tracking results on our status UI page: any metadata added at submission is visible in the progress section of the Batch UI. For detailed information about the metadata field, see the OpenAI Batch API Submit Specification.
with Batch() as batch:
    for _ in range(100):
        batch.add_to_batch(
            model="NousResearch/DeepHermes-3-Mistral-24B-Preview",
            messages=[{"role": "user", "content": "Give me 10 dad jokes"}]
        )
    batch.submit(metadata={"Job Name": "Dad jokes"})
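The OpenAI specification caps metadata at 16 key-value pairs, with string keys of up to 64 characters and string values of up to 512 characters. Whether Parasail enforces the same limits is an assumption here. A hypothetical pre-submit check might look like:

```python
MAX_PAIRS = 16        # limits as documented for OpenAI metadata
MAX_KEY_LEN = 64
MAX_VALUE_LEN = 512


def validate_metadata(metadata: dict) -> None:
    """Raise ValueError if metadata breaks the OpenAI-documented limits (hypothetical helper)."""
    if len(metadata) > MAX_PAIRS:
        raise ValueError(f"at most {MAX_PAIRS} metadata pairs are allowed")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError("metadata keys and values must be strings")
        if len(key) > MAX_KEY_LEN:
            raise ValueError(f"metadata key {key!r} is longer than {MAX_KEY_LEN} characters")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"value for {key!r} is longer than {MAX_VALUE_LEN} characters")
```

Calling `validate_metadata({"Job Name": "Dad jokes"})` before `batch.submit(...)` surfaces problems locally rather than at submission time.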
Parasail Batch UI
The Parasail Batch UI can be used to upload new batches, track the status of batches, and download results when they finish.
While a batch is queued or running, you can view its status, download the input, or cancel it:

When a batch job is done, you can download the input and the output file as well as view the total token usage:

Batch Submission
Batches can be submitted directly from the Batch section of our platform by clicking the Create Batch button, which opens a dialog for uploading a JSONL file.

Uploaded files use the OpenAI JSONL format for batch submissions. Unlike the Parasail Batch Helper Library, UI submission does not support OpenAI models; it only supports open source HuggingFace transformers and embedding models.
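A minimal sketch of producing such a file with the standard library follows. The custom_id, method, url, and body fields come from the OpenAI batch input format; the function name, prompts, and max_completion_tokens value are illustrative. OpenAI requires every line in a batch file to target the same endpoint, so embedding requests (url "/v1/embeddings", with input in place of messages) would go in their own file; we assume Parasail behaves the same way.

```python
import json

MODEL = "NousResearch/DeepHermes-3-Mistral-24B-Preview"


def write_batch_file(path: str, prompts: list[str], model: str = MODEL) -> int:
    """Write one OpenAI-format chat completion request per line; return the line count."""
    with open(path, "w", encoding="utf-8") as f:
        for i, prompt in enumerate(prompts, start=1):
            request = {
                "custom_id": f"line-{i}",        # used to match results back to requests
                "method": "POST",
                "url": "/v1/chat/completions",   # every line in one file targets one endpoint
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                    "max_completion_tokens": 1000,
                },
            }
            f.write(json.dumps(request) + "\n")
    return len(prompts)
```

For example, `write_batch_file("batch_input.jsonl", ["Tell me a joke about a cat", "Tell me a joke about a robot"])` produces a two-line file ready to upload through the Create Batch dialog.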