Batch file format
Parasail supports the OpenAI batch API, including the format of the input and output files.
A batch input file is a .jsonl file where each line is a batch request. Parasail processes these requests and returns the results as an output .jsonl file.
Batch input file
The batch input wraps the same data structures used by interactive requests into an offline format. Each line in the input file is one request.
The request is a JSON dictionary with the following keys:
- custom_id: A unique value that the user creates so they can later match outputs to inputs.
- method: HTTP method, currently POST.
- url: one of /v1/chat/completions, /v1/embeddings.
- body: The same as the body of an interactive request.
  - Chat Completions: stream must be omitted or false.
  - Embeddings: we strongly recommend using "encoding_format": "base64". See Best Practices below.
Example batch input files
{"custom_id": "b5b938a55cc349d13f08a2586f96807d", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "max_completion_tokens": 10, "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Say pong."}]}}
{"custom_id": "ad95b85d915346e29b7afa52314e94b8", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "max_completion_tokens": 1024, "messages": [{"role": "user", "content": "What is the capital of New York?"}]}}
{"custom_id": "c587d5391f4524068bb1d6a4a00b5177", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "max_completion_tokens": 1024, "messages": [{"role": "user", "content": "Write two very funny dad jokes."}], "temperature": 0.5}}
See OpenAI's request input object documentation.
Best Practices
For embeddings, we strongly recommend using "encoding_format": "base64". This setting asks the server to return each embedding as a Base64-encoded binary array instead of a plain-text float array. This typically reduces file size by nearly 4x with no loss of precision, and it is the default behavior of the interactive OpenAI client.
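To read Base64 embeddings back, decode the string and unpack the bytes as little-endian float32 values. A minimal sketch using only the standard library (the packing convention follows the OpenAI client; the demo vector is synthetic):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    The payload is a packed array of little-endian float32 values.
    """
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip demo with a synthetic vector (values exactly representable in float32):
vec = [0.25, -1.5, 3.0]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_embedding(encoded))  # [0.25, -1.5, 3.0]
```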
Limits
Both Parasail and OpenAI limit the maximum size of the batch input files:
Up to 50,000 requests (lines)
Up to 100MB total input file size
Workloads exceeding these limits must be split into multiple batches.
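Splitting can be done mechanically. A sketch that chunks an input .jsonl by both request count and byte size (the limits are parameters so you can adjust them; the output file naming is illustrative):

```python
def split_batch_file(input_path, prefix="batch_part",
                     max_requests=50_000, max_bytes=100 * 1024 * 1024):
    """Split a .jsonl batch input into chunks within the request/size limits."""
    chunks, current, size = [], [], 0
    with open(input_path) as f:
        for line in f:
            nbytes = len(line.encode("utf-8"))
            # Start a new chunk when either limit would be exceeded.
            if current and (len(current) >= max_requests or size + nbytes > max_bytes):
                chunks.append(current)
                current, size = [], 0
            current.append(line)
            size += nbytes
    if current:
        chunks.append(current)

    paths = []
    for i, chunk in enumerate(chunks):
        path = f"{prefix}_{i}.jsonl"
        with open(path, "w") as out:
            out.writelines(chunk)
        paths.append(path)
    return paths
```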
Batch output file
The batch output file wraps the interactive responses into an offline format. Each line in the output file is the response to one request.
Important: The order of responses may differ from the order of requests in the input file.
The response is a JSON dictionary. The most important keys:
- custom_id: The same as the value in the request. Use this to match responses to requests.
- response: The HTTP response, as a dictionary:
  - status_code: HTTP status code. 200 if the request succeeded; otherwise the same HTTP error code the request would return interactively.
  - body: The same as the body of an interactive response.
See OpenAI's request output object documentation.
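Because output order can differ from input order, it is convenient to index responses by custom_id before using them. A minimal sketch (the output filename and the "request-0" id are illustrative):

```python
import json

def load_responses(output_path):
    """Index batch responses by custom_id so they can be matched to requests."""
    by_id = {}
    with open(output_path) as f:
        for line in f:
            record = json.loads(line)
            by_id[record["custom_id"]] = record["response"]
    return by_id

# Usage sketch:
# responses = load_responses("batch_output.jsonl")
# resp = responses["request-0"]
# if resp["status_code"] == 200:
#     print(resp["body"]["choices"][0]["message"]["content"])
```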
Create your own
It is easy to create a batch input file. Exactly how depends on your workflow. The following code snippet gets you started.
import json

prompts = ["What is the capital of New York?", "Write two very funny dad jokes.", "Say pong."]

with open("example_batch_input.jsonl", "w") as file:
    for i, prompt in enumerate(prompts):
        file.write(
            json.dumps(
                {
                    "custom_id": f"request-{i}",
                    "method": "POST",
                    "url": "/v1/chat/completions",
                    "body": {
                        # same request body as for an interactive /v1/chat/completions request
                        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
                        "max_completion_tokens": 1024,
                        "messages": [
                            {"role": "system", "content": "You are a helpful assistant."},
                            {"role": "user", "content": prompt},
                        ],
                    },
                }
            )
            + "\n"
        )
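Since each custom_id must be unique, it is worth sanity-checking a generated file before uploading it. A small validation sketch:

```python
import json

def validate_batch_input(path):
    """Check that every line parses as JSON and custom_id values are unique."""
    seen = set()
    with open(path) as f:
        for n, line in enumerate(f, start=1):
            request = json.loads(line)  # raises if a line is not valid JSON
            cid = request["custom_id"]
            if cid in seen:
                raise ValueError(f"duplicate custom_id {cid!r} on line {n}")
            seen.add(cid)
```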