Usage

Installation

pip install amazon-polly-streaming

Requirements

Python 3.13+
AWS credentials in the default chain (env vars, profile, or IAM role) with polly:StartSpeechSynthesisStream permission
A region supporting Polly bidirectional streaming (us-east-1, us-west-2, eu-central-1, eu-west-2, ap-southeast-1, ca-central-1 as of 2026-05)

The polly:StartSpeechSynthesisStream IAM action is distinct from the classic polly:SynthesizeSpeech: a role granting only the classic action cannot call the bidirectional endpoint, and the request is rejected by IAM with an access-denied (HTTP 403) error rather than a ValidationException.

Basic usage

Instantiate the client once for a region and call start_speech_synthesis_stream per utterance. The method is an async generator yielding audio bytes as Polly emits them, with no need to wait for the full audio to be generated server-side.

import asyncio
from amazon_polly_streaming import PollyStreamingClient

async def main() -> None:
    client = PollyStreamingClient(region="eu-central-1")
    # Write each chunk as it arrives instead of buffering the whole clip in memory.
    with open("hello.mp3", "wb") as fh:
        async for chunk in client.start_speech_synthesis_stream(
            text="hello world, how are you today",
            voice_id="Matthew",
        ):
            fh.write(chunk)

asyncio.run(main())
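Because the method is an async generator, a caller can start playing or writing audio as soon as the first chunk lands. The sketch below measures time-to-first-chunk against total time. fake_polly_stream is a hypothetical stand-in for client.start_speech_synthesis_stream(...) so the example runs without the library or AWS credentials; pass the real generator in practice.

```python
import asyncio
import time
from collections.abc import AsyncIterator


async def fake_polly_stream(n_chunks: int = 3, delay: float = 0.01) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for client.start_speech_synthesis_stream(...)."""
    for i in range(n_chunks):
        await asyncio.sleep(delay)  # simulate per-chunk synthesis/network latency
        yield bytes([i]) * 4


async def consume_with_timing(stream: AsyncIterator[bytes]) -> tuple[bytes, float, float]:
    """Collect a stream, returning (audio, seconds to first chunk, total seconds)."""
    start = time.perf_counter()
    first: float | None = None
    parts: list[bytes] = []
    async for chunk in stream:
        if first is None:
            first = time.perf_counter() - start  # audio is usable from this point on
        parts.append(chunk)
    total = time.perf_counter() - start
    return b"".join(parts), (first if first is not None else total), total


audio, ttfb, total = asyncio.run(consume_with_timing(fake_polly_stream()))
print(f"{len(audio)} bytes, first chunk after {ttfb:.3f}s, done after {total:.3f}s")
```

With real synthesis, the gap between the two numbers is the latency saved by not waiting for the full clip.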

Configuration

start_speech_synthesis_stream accepts the following keyword arguments:

text (required): the text to synthesize
voice_id (required): a Polly voice id that supports generative bidirectional streaming (e.g. "Matthew", "Joanna", "Bianca"); see the Amazon Polly generative voices page for the current list
engine: defaults to "generative" (the only value supported by the bidirectional streaming API at the time of writing)
language_code: BCP-47 code, defaults to "en-US"; needs to be set explicitly only for bilingual voices
output_format: "mp3" (default), "pcm", or "ogg_vorbis"
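sample_rate: the sample rate in Hz as a string, defaults to "24000"; the classic SynthesizeSpeech API accepts only "8000" or "16000" for PCM, so set it explicitly with output_format="pcm" (assuming the streaming API follows the same rule)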
use_pool: defaults to True; see Connection pool
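With output_format="pcm" the chunks are headerless raw samples, which the SynthesizeSpeech documentation describes as signed 16-bit, mono, little-endian. A minimal sketch for turning them into a playable file with the standard-library wave module, assuming the bidirectional streaming API uses the same PCM layout; pcm_chunks_to_wav is a hypothetical helper, not part of amazon_polly_streaming. Since sample_rate is passed to Polly as a string, convert it with int() for wave.

```python
import io
import wave
from collections.abc import Iterable
from typing import BinaryIO


def pcm_chunks_to_wav(chunks: Iterable[bytes], sample_rate: int, dest: BinaryIO) -> None:
    """Wrap raw PCM chunks (assumed 16-bit signed little-endian mono) in a WAV container."""
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)         # mono
        wav.setsampwidth(2)         # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)  # header sizes are patched as data is written


# Example: two fake chunks of silence at 16 kHz (600 + 400 samples).
buf = io.BytesIO()
pcm_chunks_to_wav([b"\x00\x00" * 600, b"\x00\x00" * 400], int("16000"), buf)
buf.seek(0)
with wave.open(buf, "rb") as wav:
    print(wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes())  # 1 2 16000 1000
```

For a real stream, pass an open file (e.g. open("hello.wav", "wb")) as dest and feed it the async chunks collected into a list.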

Region is bound at client construction time, not per call. To switch region, instantiate a new client.

eu = PollyStreamingClient(region="eu-central-1")
us = PollyStreamingClient(region="us-west-2")

Error handling

Service-side errors are surfaced as typed exceptions in amazon_polly_streaming.exceptions, mirroring the Polly API exception types documented for StartSpeechSynthesisStream:

ServiceException: base class for all service errors
ServiceFailureException: an unexpected Polly service failure
ValidationException: invalid input (e.g. unsupported voice id for bidirectional streaming)
ServiceQuotaExceededException: account quota hit
ThrottlingException: request rate exceeds the service throttle

All concrete exceptions inherit from ServiceException, so a single except covers them all:

import asyncio
from amazon_polly_streaming import PollyStreamingClient, ServiceException

async def main() -> None:
    client = PollyStreamingClient(region="eu-central-1")
    try:
        async for chunk in client.start_speech_synthesis_stream(
            text="hello", voice_id="Matthew"
        ):
            ...
    except ServiceException as exc:
        # type(exc).__name__ carries the specific Polly exception type
        print(f"Polly error: {type(exc).__name__}: {exc}")

asyncio.run(main())

Catch a specific type for targeted handling, e.g. backoff on throttling:

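import asyncio
from amazon_polly_streaming import PollyStreamingClient, ThrottlingException

async def synthesize_with_backoff(client: PollyStreamingClient, text: str) -> bytes:
    attempts = 3
    for attempt in range(attempts):
        chunks: list[bytes] = []  # fresh buffer: drop partial audio from a throttled attempt
        try:
            async for chunk in client.start_speech_synthesis_stream(
                text=text, voice_id="Matthew"
            ):
                chunks.append(chunk)
            return b"".join(chunks)
        except ThrottlingException:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error instead of failing silently
            await asyncio.sleep(2**attempt)  # exponential backoff: 1 s, then 2 s

client = PollyStreamingClient(region="eu-central-1")
audio = asyncio.run(synthesize_with_backoff(client, "hello"))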

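Transport failures (non-2xx HTTP responses, TLS or HTTP/2 negotiation errors, missing credentials) are raised as RuntimeError, not as a ServiceException subclass, so an except ServiceException clause does not catch them; handle RuntimeError separately where needed.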

Connection pool

Each PollyStreamingClient instance owns an HTTP/2 connection pool that reuses connections across calls, amortizing the TLS handshake and HTTP/2 SETTINGS exchange. In the common pattern of caching one client per process (boto3 convention), there is one pool per process too. Each lease holds the connection for the duration of one Polly stream; concurrent calls (e.g. broadcast fan-out to multiple target languages) get distinct connections from the pool up to the configured size.

The pool size is set via the pool_size constructor parameter, default 8:

# default: up to 8 concurrent Polly streams without queueing
client = PollyStreamingClient(region="eu-central-1")

# custom: up to 16 concurrent streams (e.g. fan-out to 16 target languages)
client = PollyStreamingClient(region="eu-central-1", pool_size=16)

Beyond the configured size, additional concurrent calls wait for an active lease to be released. Queued calls do not fail, but each one waits until an in-flight stream finishes and frees its connection, which adds latency of up to the remaining duration of that stream.
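A toy model of this queueing behavior, using asyncio.Semaphore as a stand-in for pool leases. It is not the library's implementation, and simulate_pool is a hypothetical name; it only illustrates that with a pool of 8 and 10 concurrent streams, the last two calls wait rather than fail, so wall-clock time roughly doubles.

```python
import asyncio


async def simulate_pool(pool_size: int, n_calls: int, stream_seconds: float) -> tuple[int, float]:
    """Toy model: a semaphore plays the role of pool leases; returns (peak concurrency, elapsed)."""
    leases = asyncio.Semaphore(pool_size)
    active = 0
    peak = 0

    async def one_stream() -> None:
        nonlocal active, peak
        async with leases:  # waits here, without raising, when every lease is taken
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(stream_seconds)  # the stream holds its lease throughout
            active -= 1

    loop = asyncio.get_running_loop()
    start = loop.time()
    await asyncio.gather(*(one_stream() for _ in range(n_calls)))
    return peak, loop.time() - start


peak, elapsed = asyncio.run(simulate_pool(pool_size=8, n_calls=10, stream_seconds=0.05))
print(peak, round(elapsed, 2))  # peak is 8; elapsed is about two stream durations
```

Sizing pool_size to the expected fan-out (e.g. the number of target languages) keeps every call in the first "round".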

Disable the pool with use_pool=False for one-shot scripts where the pool’s bookkeeping is overhead without benefit:

async for chunk in client.start_speech_synthesis_stream(
    text="hello", voice_id="Matthew", use_pool=False
):
    ...

For latency-sensitive workloads doing many short utterances (e.g. streaming captions), keep use_pool=True: the first call opens a connection, and subsequent calls lease idle connections from the pool and return them on completion, skipping the TLS handshake and HTTP/2 SETTINGS exchange.
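To see why this pays off, here is a toy model of lease-and-return. ToyPool, speak and captions are hypothetical names and the connection is just an integer id; the point is only that sequential short calls keep reusing the one connection the first call opened, so the setup cost is paid once.

```python
import asyncio


class ToyPool:
    """Toy model of lease/return: reuse an idle connection if any, else open a new one."""

    def __init__(self) -> None:
        self.opened = 0             # connections opened (each would cost a TLS + HTTP/2 setup)
        self._idle: list[int] = []  # ids of connections not currently leased

    def lease(self) -> int:
        if self._idle:
            return self._idle.pop()
        self.opened += 1
        return self.opened

    def release(self, conn: int) -> None:
        self._idle.append(conn)


async def speak(pool: ToyPool, text: str) -> int:
    conn = pool.lease()
    try:
        await asyncio.sleep(0)  # stand-in for streaming `text` over `conn`
        return conn
    finally:
        pool.release(conn)  # return the connection for the next call


async def captions(pool: ToyPool, lines: list[str]) -> list[int]:
    return [await speak(pool, line) for line in lines]  # sequential short utterances


pool = ToyPool()
used = asyncio.run(captions(pool, ["Hello.", "Next line.", "And another."]))
print(used, pool.opened)  # [1, 1, 1] 1
```

With use_pool=False, the equivalent model would open a new connection for each of the three lines.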

AWS credentials

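Credentials are resolved via the AWS SDK default credential chain, which tries sources in roughly this order:

AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (with optional AWS_SESSION_TOKEN) environment variables
A profile from ~/.aws/credentials or ~/.aws/config, selected by the AWS_PROFILE env var (otherwise the default profile), including IAM Identity Center (SSO) profiles, whose cached tokens live under ~/.aws/sso/cache
Container role (ECS, Fargate, EKS) or IAM instance role (EC2)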

The resolved identity must have an IAM policy granting polly:StartSpeechSynthesisStream on * (Polly does not support resource-level permissions for this action). The classic polly:SynthesizeSpeech action does not authorize the bidirectional streaming endpoint.
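A minimal identity-based policy granting that action, written as a Python dict so it can be serialized with json.dumps and attached through the IAM console, CLI, or an SDK. The Sid is an arbitrary label chosen for this sketch; add "polly:SynthesizeSpeech" to the Action list only if the same identity also uses the classic API.

```python
import json

# Minimal identity-based policy allowing the bidirectional streaming action.
# Resource is "*" because, per the text above, this action has no resource-level permissions.
POLLY_STREAMING_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPollyBidirectionalStreaming",
            "Effect": "Allow",
            "Action": ["polly:StartSpeechSynthesisStream"],
            "Resource": "*",
        }
    ],
}

print(json.dumps(POLLY_STREAMING_POLICY, indent=2))
```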

Caching the client

Each PollyStreamingClient instance is lightweight: construction does not open any connection or perform any I/O. Still, instantiating it once and reusing it mirrors the boto3 client pattern and keeps region configuration in one place. A functools.cache decorator on a factory function is a common idiom:

import os
from functools import cache
from amazon_polly_streaming import PollyStreamingClient

@cache
def get_polly_client() -> PollyStreamingClient:
    # Region comes from the environment; raises KeyError if AWS_REGION is unset.
    return PollyStreamingClient(region=os.environ["AWS_REGION"])

async def synthesize(text: str, voice_id: str) -> bytes:
    client = get_polly_client()
    chunks: list[bytes] = []
    async for chunk in client.start_speech_synthesis_stream(
        text=text, voice_id=voice_id
    ):
        chunks.append(chunk)
    return b"".join(chunks)  # join once instead of repeated bytes concatenation

Because each PollyStreamingClient instance owns its own HTTP/2 connection pool (see Connection pool), caching the client is also what enables connection reuse: a client created per call starts with an empty pool and pays the TLS and HTTP/2 setup again.
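When one process talks to several regions, the factory can take the region as its argument so functools.cache keeps exactly one client per region. A sketch: StandInClient is a hypothetical placeholder for PollyStreamingClient so it runs without the library. functools.cache keys on the arguments exactly as passed, so get_client("eu-central-1") and get_client(region="eu-central-1") can end up as two separate cache entries (and two clients); call it one way consistently.

```python
from functools import cache


class StandInClient:
    """Hypothetical placeholder for PollyStreamingClient so the sketch runs anywhere."""

    def __init__(self, region: str) -> None:
        self.region = region  # region is fixed for the client's lifetime


@cache
def get_client(region: str) -> StandInClient:
    """Return the process-wide client for `region`, creating it on first use."""
    return StandInClient(region)


eu = get_client("eu-central-1")
us = get_client("us-west-2")
print(eu is get_client("eu-central-1"), eu is us)  # True False
```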