Use this endpoint to call the model associated with the specified environment asynchronously.
curl --request POST \
--url https://model-{model_id}.api.baseten.co/environments/{env_name}/async_predict \
--header 'Authorization: <authorization>' \
--header 'Content-Type: application/json' \
--data '{
"model_input": "<any>",
"webhook_endpoint": "<string>",
"priority": 123,
"max_time_in_queue_seconds": 123,
"inference_retry_config": "<any>"
}'
A successful call returns the ID of the queued request:
{
  "request_id": "<string>"
}
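If you are calling this endpoint from a script, the request ID can be captured directly from the response. A minimal sketch, assuming curl and jq are available; MODEL_ID, ENV_NAME, and BASETEN_API_KEY are placeholder shell variables, and the model_input shown is hypothetical:

request_id=$(curl -s --request POST \
  --url "https://model-${MODEL_ID}.api.baseten.co/environments/${ENV_NAME}/async_predict" \
  --header "Authorization: Api-Key ${BASETEN_API_KEY}" \
  --header 'Content-Type: application/json' \
  --data '{"model_input": {"prompt": "hello"}}' \
  | jq -r '.request_id')   # extract the request_id field from the JSON response
echo "Queued async request: ${request_id}"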
Authentication: pass your Baseten API key in the Authorization header, using the Api-Key scheme (e.g. {"Authorization": "Api-Key abcd1234.abcd1234"}). Note that a size limit applies to /async_predict request payloads.
Body parameters (a filled-in example follows the list):
- model_input (required): JSON-serializable input to pass to the model.
- webhook_endpoint: URL to which the prediction output is delivered. If webhook_endpoint is empty, your model must save prediction outputs so they can be accessed later.
- priority: priority of the request; must be between 0 and 2, inclusive.
- max_time_in_queue_seconds: maximum time the request may spend in the queue; must be between 10 seconds and 72 hours, inclusive.
- inference_retry_config: retry behavior for failed inference requests, with the following child attributes:
  - max_attempts: must be between 1 and 10, inclusive.
  - initial_delay_ms: must be between 0 and 10,000 milliseconds, inclusive.
  - max_delay_ms: must be between 0 and 60,000 milliseconds, inclusive.
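As a sketch, here is a request that sets every optional field. All values are hypothetical placeholders (MODEL_ID, ENV_NAME, BASETEN_API_KEY, the model_input, and the webhook URL are yours to supply), chosen to sit inside the documented ranges:

curl --request POST \
  --url "https://model-${MODEL_ID}.api.baseten.co/environments/${ENV_NAME}/async_predict" \
  --header "Authorization: Api-Key ${BASETEN_API_KEY}" \
  --header 'Content-Type: application/json' \
  --data '{
    "model_input": {"prompt": "hello"},
    "webhook_endpoint": "https://example.com/webhook",
    "priority": 1,
    "max_time_in_queue_seconds": 600,
    "inference_retry_config": {
      "max_attempts": 3,
      "initial_delay_ms": 1000,
      "max_delay_ms": 10000
    }
  }'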
"request_id": "<string>"
}
Rate limits: calls to the /async_predict endpoint are limited to 200 requests per second. There is also a limit on the number of QUEUED or IN_PROGRESS async requests, summed across all deployments. Once these limits are exceeded, /async_predict requests will receive a 429 status code.
To avoid hitting these rate limits, we advise retrying /async_predict with exponential backoff in response to 429 errors, as sketched below.
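A minimal shell sketch of that backoff loop, assuming curl; MODEL_ID, ENV_NAME, and BASETEN_API_KEY are placeholder variables, and the model_input is hypothetical:

delay=1
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o /tmp/async_response.json -w '%{http_code}' \
    --request POST \
    --url "https://model-${MODEL_ID}.api.baseten.co/environments/${ENV_NAME}/async_predict" \
    --header "Authorization: Api-Key ${BASETEN_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data '{"model_input": {"prompt": "hello"}}')
  if [ "$status" != "429" ]; then
    break                  # success or a non-rate-limit error; stop retrying
  fi
  sleep "$delay"           # back off before the next attempt
  delay=$((delay * 2))     # double the delay: 1s, 2s, 4s, 8s, ...
done
cat /tmp/async_response.json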