Usage
Installation
pip install amazon-polly-streaming
Requirements
- Python 3.13+
- AWS credentials in the default chain (env vars, profile, or IAM role) with the polly:StartSpeechSynthesisStream permission
- A region supporting Polly bidirectional streaming (us-east-1, us-west-2, eu-central-1, eu-west-2, ap-southeast-1, ca-central-1 as of 2026-05)
The polly:StartSpeechSynthesisStream IAM action is distinct from the
classic polly:SynthesizeSpeech: a role granting only the classic action
cannot call the bidirectional endpoint and will fail with
ValidationException.
Basic usage
Instantiate the client once for a region and call
start_speech_synthesis_stream per utterance. The method is an async
generator yielding audio bytes as Polly emits them, with no need to wait for
the full audio to be generated server-side.
import asyncio

from amazon_polly_streaming import PollyStreamingClient


async def main() -> None:
    client = PollyStreamingClient(region="eu-central-1")
    audio = b""
    async for chunk in client.start_speech_synthesis_stream(
        text="hello world, how are you today",
        voice_id="Matthew",
    ):
        audio += chunk
    with open("hello.mp3", "wb") as fh:
        fh.write(audio)


asyncio.run(main())
Configuration
start_speech_synthesis_stream accepts the following keyword arguments:
- text (required): the text to synthesize
- voice_id (required): a Polly voice id that supports generative bidirectional streaming (e.g. "Matthew", "Joanna", "Bianca"); see the Amazon Polly generative voices page for the current list
- engine: defaults to "generative" (the only value supported by the bidirectional streaming API at the time of writing)
- language_code: BCP-47 code, defaults to "en-US"; required only for bilingual voices
- output_format: "mp3" (default), "pcm", or "ogg_vorbis"
- sample_rate: a string with the sample rate in Hz, defaults to "24000"
- use_pool: defaults to True; see Connection pool
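With output_format="pcm" the stream carries raw samples without a header, so most players cannot open the result directly. The sketch below wraps collected PCM bytes in a WAV container using the standard library. It assumes 16-bit signed little-endian mono samples, which is what the classic Polly PCM output uses; this document does not confirm it for the streaming API, so treat it as an assumption. `pcm_to_wav` is a hypothetical helper, not part of the library:

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw PCM samples in a WAV container.

    Assumes 16-bit signed little-endian mono samples (not confirmed by the
    library documentation above).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono (assumption)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Since sample_rate is passed to the client as a string, convert it when calling the helper, e.g. `pcm_to_wav(audio, int("24000"))`.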
Region is bound at client construction time, not per call. To switch region, instantiate a new client.
eu = PollyStreamingClient(region="eu-central-1")
us = PollyStreamingClient(region="us-west-2")
Error handling
Service-side errors are surfaced as typed exceptions in
amazon_polly_streaming.exceptions, mirroring the Polly API exception types
documented for StartSpeechSynthesisStream:
- ServiceException: base class for all service errors
- ServiceFailureException: an unexpected Polly service failure
- ValidationException: invalid input (e.g. unsupported voice id for bidirectional streaming)
- ServiceQuotaExceededException: account quota hit
- ThrottlingException: request rate exceeds the service throttle
All concrete exceptions inherit from ServiceException, so a single
except covers them all:
import asyncio

from amazon_polly_streaming import PollyStreamingClient, ServiceException


async def main() -> None:
    client = PollyStreamingClient(region="eu-central-1")
    try:
        async for chunk in client.start_speech_synthesis_stream(
            text="hello", voice_id="Matthew"
        ):
            ...
    except ServiceException as exc:
        # `type(exc).__name__` carries the specific Polly exception type
        ...


asyncio.run(main())
Catch a specific type for targeted handling, e.g. backoff on throttling:
import asyncio

from amazon_polly_streaming import PollyStreamingClient, ThrottlingException


async def main() -> None:
    client = PollyStreamingClient(region="eu-central-1")
    for attempt in range(3):
        try:
            async for chunk in client.start_speech_synthesis_stream(
                text="hello", voice_id="Matthew"
            ):
                ...
            break  # stream completed, stop retrying
        except ThrottlingException:
            # exponential backoff: 1 s, 2 s, 4 s
            await asyncio.sleep(2**attempt)


asyncio.run(main())
Transport failures (HTTP non-2xx, TLS or HTTP/2 negotiation, missing
credentials) are raised as RuntimeError and are distinct from
ServiceException.
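The retry pattern above can be factored into a reusable helper. The sketch below is one possible shape, not part of the library: `collect_with_retry` is a hypothetical name, it takes a zero-argument callable that opens a fresh stream, and it retries only on the exception type passed in. Anything else, including the RuntimeError raised for transport failures, propagates unchanged. Each attempt starts with an empty buffer so a stream that fails midway does not leave partial audio in the result.

```python
import asyncio
from collections.abc import AsyncIterator, Callable


async def collect_with_retry(
    make_stream: Callable[[], AsyncIterator[bytes]],
    retry_on: type[Exception],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> bytes:
    """Collect a whole stream, restarting from scratch when `retry_on` is raised."""
    for attempt in range(attempts):
        audio = bytearray()  # reset so a failed attempt leaves no partial audio
        try:
            async for chunk in make_stream():
                audio += chunk
            return bytes(audio)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2**attempt)  # exponential backoff
    raise ValueError("attempts must be at least 1")
```

With the client from the examples above, a call might look like `await collect_with_retry(lambda: client.start_speech_synthesis_stream(text="hello", voice_id="Matthew"), ThrottlingException)`.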
Connection pool
Each PollyStreamingClient instance owns an HTTP/2 connection pool that
reuses connections across calls, amortizing the TLS handshake and HTTP/2
SETTINGS exchange. In the common pattern of caching one client per process
(boto3 convention), there is one pool per process too. Each lease holds the
connection for the duration of one Polly stream; concurrent calls (e.g.
broadcast fan-out to multiple target languages) get distinct connections
from the pool up to the configured size.
The pool size is set via the pool_size constructor parameter (default 8):
# default: up to 8 concurrent Polly streams without queueing
client = PollyStreamingClient(region="eu-central-1")
# custom: up to 16 concurrent streams (e.g. fan-out to 16 target languages)
client = PollyStreamingClient(region="eu-central-1", pool_size=16)
Beyond the configured size, additional concurrent calls wait until an active lease is released. Queued calls do not raise errors, but each one is delayed by the time remaining until an in-flight stream finishes and frees its connection.
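When fan-out can exceed the pool size, one option is to cap concurrency on the caller side so the waiting happens in your own code, where it can be observed or prioritized. The sketch below uses an asyncio.Semaphore for that; `bounded_gather` is a hypothetical helper, not part of the library, and `limit` is assumed to be set to the same value as pool_size.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def bounded_gather(
    jobs: list[Callable[[], Awaitable[T]]], limit: int = 8
) -> list[T]:
    """Run jobs concurrently, at most `limit` at a time; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def run(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # excess jobs wait here rather than inside the pool
            return await job()

    return await asyncio.gather(*(run(job) for job in jobs))
```

Each job is a zero-argument callable returning an awaitable, for example a lambda wrapping a coroutine that synthesizes one target language.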
Disable the pool with use_pool=False for one-shot scripts where the
pool’s bookkeeping is overhead without benefit:
async for chunk in client.start_speech_synthesis_stream(
    text="hello", voice_id="Matthew", use_pool=False
):
    ...
For latency-sensitive workloads doing many short utterances (e.g.
streaming captions), keep use_pool=True: the first call opens a
connection, subsequent calls lease idle connections from the pool and
return them on completion.
AWS credentials
Credentials are resolved via the AWS SDK default credential chain:
- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (with optional AWS_SESSION_TOKEN) environment variables
- AWS profile via the AWS_PROFILE env var or ~/.aws/credentials
- IAM instance role (EC2) or container role (ECS, Fargate, EKS)
- SSO via ~/.aws/sso
The resolved identity must have an IAM policy granting
polly:StartSpeechSynthesisStream on * (Polly does not support
resource-level permissions for this action). The classic
polly:SynthesizeSpeech action does not authorize the bidirectional
streaming endpoint.
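As an illustration, the identity policy described above could look like the following. This is a minimal sketch built as a Python dict and printed as JSON; it contains only the one action and the `*` resource named in this section, and a real policy may need further statements for other services.

```python
import json

# Minimal identity policy sketch: one statement allowing the streaming action.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "polly:StartSpeechSynthesisStream",
            "Resource": "*",  # no resource-level permissions for this action
        }
    ],
}

print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the role or user whose credentials the default chain resolves.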
Caching the client
Each PollyStreamingClient instance is light: construction does not open
any connection or perform any I/O. Still, instantiating it once and reusing
it mirrors the boto3 client pattern and keeps region configuration in one
place. A functools.cache decorator on a factory function is a common
idiom:
import os
from functools import cache

from amazon_polly_streaming import PollyStreamingClient


@cache
def get_polly_client() -> PollyStreamingClient:
    return PollyStreamingClient(region=os.environ["AWS_REGION"])


async def synthesize(text: str, voice_id: str) -> bytes:
    client = get_polly_client()
    audio = b""
    async for chunk in client.start_speech_synthesis_stream(
        text=text, voice_id=voice_id
    ):
        audio += chunk
    return audio
Because each client instance owns its own HTTP/2 connection pool (see Connection pool), caching the client also keeps a single pool per process, so connections are reused across calls instead of being re-established for every new instance.