How to call a model that has a streaming-capable endpoint.
To stream output with the Truss CLI:

truss predict -d '{"prompt": "What is the Mistral wind?", "stream": true}'

When calling the model with cURL, pass the --no-buffer flag to stream output as it is received.
As with all cURL invocations, you’ll need a model ID and API key.
curl -X POST https://app.baseten.co/models/MODEL_ID/predict \
-H 'Authorization: Api-Key YOUR_API_KEY' \
-d '{"prompt": "What is the Mistral wind?", "stream": true}' \
--no-buffer
import requests
import json
import os

# Model ID for the production deployment
model_id = ""

# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Open a session to enable streaming
s = requests.Session()
with s.post(
    # Endpoint for the production deployment; see the API reference for more
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    # Include "stream": True in the data dict so the model knows to stream
    data=json.dumps({
        "prompt": "What even is AGI?",
        "stream": True,
        "max_new_tokens": 4096
    }),
    # Pass stream=True so the requests library knows to stream the response
    stream=True,
) as resp:
    # Print the generated tokens as they are streamed
    for content in resp.iter_content():
        print(content.decode("utf-8"), end="", flush=True)
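One caveat with the loop above: `iter_content()` yields raw bytes, so a multi-byte UTF-8 character can be split across two chunks and make a plain `decode("utf-8")` raise. A minimal sketch of a safer approach (the `decode_stream` helper is illustrative, not part of the Baseten API) uses an incremental decoder that buffers partial characters between chunks:

```python
import codecs

def decode_stream(chunks):
    """Decode an iterable of byte chunks into text, tolerating
    multi-byte UTF-8 characters split across chunk boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush any bytes still buffered at end of stream
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

# "é" (0xC3 0xA9) split across two chunks still decodes cleanly
print("".join(decode_stream([b"caf", b"\xc3", b"\xa9"])))  # café
```

In the streaming loop, you would wrap `resp.iter_content()` with this helper and print the decoded text it yields instead of decoding each chunk directly.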