Batch file format
Parasail supports the OpenAI batch API, including the format of the input and output files.
A batch input file is a .jsonl file where each line is a batch request. Parasail processes these requests and returns the results as an output .jsonl file.
Batch input file
The batch input wraps the same data structures used by interactive requests into an offline format. Each line in the input file is one request.
The request is a JSON dictionary with the following keys:
custom_id: A unique value that the user creates so they can later match outputs to inputs.
method: The HTTP method, currently POST.
url: One of /v1/chat/completions or /v1/embeddings.
body: The same as the body of an interactive request.
  Chat Completions: stream must be omitted or false.
  Embeddings: Parasail strongly recommends using "encoding_format": "base64". See below.
Example batch input files
{"custom_id": "b5b938a55cc349d13f08a2586f96807d", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "max_completion_tokens": 10, "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Say pong."}]}}
{"custom_id": "ad95b85d915346e29b7afa52314e94b8", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "max_completion_tokens": 1024, "messages": [{"role": "user", "content": "What is the capital of New York?"}]}}
{"custom_id": "c587d5391f4524068bb1d6a4a00b5177", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "max_completion_tokens": 1024, "messages": [{"role": "user", "content": "Write two very funny dad jokes."}], "temperature": 0.5}}
See OpenAI's request input object documentation.
Best Practices
For embeddings, Parasail strongly recommends using "encoding_format": "base64". This setting asks the server to return each embedding as a Base64-encoded binary array instead of a plain-text float array, which typically reduces the output file size by nearly 4x with no loss of precision. The interactive OpenAI client uses this encoding by default.
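To read Base64 results back, decode each embedding string into floats. A minimal sketch, assuming NumPy is installed and a hypothetical output file named embeddings_batch_output.jsonl; OpenAI-compatible servers encode Base64 embeddings as little-endian float32:

import base64
import json

import numpy as np

with open("embeddings_batch_output.jsonl") as file:
    for line in file:
        result = json.loads(line)
        for item in result["response"]["body"]["data"]:
            # Decode the Base64 payload into a float32 vector.
            vector = np.frombuffer(base64.b64decode(item["embedding"]), dtype=np.float32)
            print(result["custom_id"], vector.shape)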
Limits
Both Parasail and OpenAI limit the size of batch input files:
Up to 50,000 requests (lines)
Up to 100 MB total input file size
Workloads that exceed these limits must be split into multiple batches.
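A minimal sketch of such a split, assuming an oversized input file; the file names and helper are illustrative, and the loop only enforces the request-count limit (a byte-count check works the same way):

MAX_REQUESTS_PER_BATCH = 50_000  # per-batch request limit

def split_batch_input(path: str, prefix: str = "batch_part") -> list[str]:
    """Split a large .jsonl input file into chunks under the request limit."""
    part_paths: list[str] = []
    lines: list[str] = []

    def flush() -> None:
        # Write the accumulated lines out as the next numbered chunk.
        part_path = f"{prefix}-{len(part_paths)}.jsonl"
        with open(part_path, "w") as out:
            out.writelines(lines)
        part_paths.append(part_path)
        lines.clear()

    with open(path) as source:
        for line in source:
            lines.append(line)
            if len(lines) == MAX_REQUESTS_PER_BATCH:
                flush()
    if lines:  # write the final partial chunk
        flush()
    return part_paths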
Batch output file
The batch output file wraps the interactive responses into an offline format. Each line in the output file is the response to one request.
Important: the order of responses may differ from the order of requests in the input file.
The response is a JSON dictionary. The most important keys:
custom_id: The same value as in the request. Use this to match responses to requests.
response: The HTTP response, as a dictionary:
  status_code: The HTTP status code. 200 if the request succeeded; otherwise the same HTTP error code the request would have received interactively.
  body: The same as the body of an interactive response.
See OpenAI's request output object documentation.
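Because response order is not guaranteed, a common pattern is to index the output file by custom_id before joining responses back to inputs. A minimal sketch, assuming a chat-completions output file named example_batch_output.jsonl (the file name is hypothetical, and the custom_id matches the snippet under "Create your own" below):

import json

# Index all responses by custom_id; order in the file is not guaranteed.
responses = {}
with open("example_batch_output.jsonl") as file:
    for line in file:
        result = json.loads(line)
        responses[result["custom_id"]] = result["response"]

# Look up one request's answer by the custom_id assigned in the input file.
response = responses["request-0"]
if response["status_code"] == 200:
    print(response["body"]["choices"][0]["message"]["content"])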
Create your own
Creating a batch input file is straightforward; the exact approach depends on your workflow. The following snippet gets you started.
import json

prompts = ["What is the capital of New York?", "Write two very funny dad jokes.", "Say pong."]

with open("example_batch_input.jsonl", "w") as file:
    for i, prompt in enumerate(prompts):
        file.write(
            json.dumps(
                {
                    "custom_id": f"request-{i}",
                    "method": "POST",
                    "url": "/v1/chat/completions",
                    "body": {
                        # Same request body as for an interactive /v1/chat/completions request.
                        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
                        "max_completion_tokens": 1024,
                        "messages": [
                            {"role": "system", "content": "You are a helpful assistant."},
                            {"role": "user", "content": prompt},
                        ],
                    },
                }
            )
            + "\n"
        )