Introduction
Wfloat lets you ship text-to-speech that runs inference inside your app rather than sending text to a hosted inference endpoint.
Wfloat currently ships three packages:
- @wfloat/wfloat-web for browser applications
- @wfloat/react-native-wfloat for React Native applications on iOS and Android
- wfloat for local Python applications and scripts
At a high level, the Web and React Native packages are built around the same product flow:
- You get a modelId from your Wfloat account.
- A device loads the model. The model is downloaded if the device does not already have it cached.
- Speech is generated locally in the app.
The Python package also runs speech locally, but it does not require a Wfloat model credential. Instead, it loads the public wfloat/wfloat-tts model directly.
Your modelId
modelId means your Wfloat model credential.
You can find it on your account page after you sign up. The UI labels it as Model Credential.
What happens on first load
The first time a device loads a model, the package downloads the model assets it needs onto the device. The model stays cached on the device, so when that user returns, the package reuses the local copy instead of downloading it again. Speech is then generated locally in the app rather than by sending text to a hosted inference endpoint.
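The download-once, reuse-afterwards behavior described above can be sketched roughly as follows. This is an illustration of the caching pattern only, not the Wfloat implementation: the function name `ensure_model_cached`, the `.bin` file layout, and the `download` callback are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_model_cached(
    model_id: str,
    cache_dir: Path,
    download: Callable[[str], bytes],
) -> tuple[Path, bool]:
    """Return (path, downloaded). Downloads only when the file is not cached."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    model_path = cache_dir / f"{model_id}.bin"  # hypothetical file layout
    if model_path.exists():
        return model_path, False  # cache hit: no download needed
    model_path.write_bytes(download(model_id))  # cache miss: fetch and store
    return model_path, True


# Stand-in downloader that records how often it is called.
calls: list[str] = []


def fake_download(model_id: str) -> bytes:
    calls.append(model_id)
    return b"model-bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_model_cached("demo-model", Path(tmp), fake_download)
    second = ensure_model_cached("demo-model", Path(tmp), fake_download)
```

In this sketch the first call downloads and the second call finds the cached file, so the stand-in downloader runs only once.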
Packages
The Web package and React Native package are intentionally close to each other at a product level so teams can work with the same model, voices, and overall integration pattern across platforms. The Python package uses the same voice and emotion set with a Python-native API and CLI.
If you are ready to integrate, continue to the package-specific quick starts.
Voice IDs
The current model exposes these voice IDs:
- skilled_hero_man
- skilled_hero_woman
- fun_hero_man
- fun_hero_woman
- strong_hero_man
- strong_hero_woman
- mad_scientist_man
- mad_scientist_woman
- clever_villain_man
- clever_villain_woman
- narrator_man
- narrator_woman
- wise_elder_man
- wise_elder_woman
- outgoing_anime_man
- outgoing_anime_woman
- scary_villain_man
- scary_villain_woman
- news_reporter_man
- news_reporter_woman
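The voice IDs above follow a persona-plus-gender naming pattern, so an app can build and check them before passing them to a package. The helper below is a minimal sketch under that assumption. `PERSONAS`, `GENDERS`, `VOICE_IDS`, and `voice_id` are hypothetical names for illustration and are not part of any Wfloat package.

```python
# Personas and genders inferred from the voice ID list in this document.
PERSONAS = (
    "skilled_hero", "fun_hero", "strong_hero", "mad_scientist",
    "clever_villain", "narrator", "wise_elder", "outgoing_anime",
    "scary_villain", "news_reporter",
)
GENDERS = ("man", "woman")

# Every documented voice ID is "<persona>_<gender>".
VOICE_IDS = frozenset(f"{p}_{g}" for p in PERSONAS for g in GENDERS)


def voice_id(persona: str, gender: str) -> str:
    """Build a voice ID and reject combinations the model does not list."""
    candidate = f"{persona}_{gender}"
    if candidate not in VOICE_IDS:
        raise ValueError(f"unknown voice ID: {candidate!r}")
    return candidate


selected = voice_id("narrator", "woman")
```

Checking the ID up front lets a typo surface as a clear error in your own code rather than at synthesis time.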
Emotions
The current model supports these emotions:
- neutral
- joy
- sadness
- anger
- fear
- surprise
- dismissive
- confusion
Intensity
Intensity controls how strongly the selected emotion is expressed. It is a value between 0 and 1.
Speed
Speed controls the speaking rate. 1.0 is the default speed, 0.75 is slower, and 1.25 is faster.
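As a rough illustration of how the emotion, intensity, and speed settings fit together, the sketch below validates them before they would be handed to a package. `SpeechOptions` is a hypothetical container, not a Wfloat type. Treating the 0 to 1 intensity range as inclusive and requiring a positive speed are assumptions, since this page does not state exact bounds.

```python
from dataclasses import dataclass

# Emotion names taken from the list in this document.
EMOTIONS = frozenset({
    "neutral", "joy", "sadness", "anger",
    "fear", "surprise", "dismissive", "confusion",
})


@dataclass(frozen=True)
class SpeechOptions:
    """Hypothetical settings bundle; validates values on construction."""
    emotion: str
    intensity: float
    speed: float = 1.0  # 1.0 is the documented default speed

    def __post_init__(self) -> None:
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unsupported emotion: {self.emotion!r}")
        # Assumption: the documented 0..1 range includes both endpoints.
        if not 0.0 <= self.intensity <= 1.0:
            raise ValueError("intensity must be between 0 and 1")
        # Assumption: any positive rate is allowed; no upper bound is documented.
        if self.speed <= 0:
            raise ValueError("speed must be positive")


options = SpeechOptions(emotion="joy", intensity=0.8)
slower = SpeechOptions(emotion="sadness", intensity=0.4, speed=0.75)
```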
Pricing
Wfloat pricing tiers are tied to your app's monthly active users.
For this product, an MAU is counted as someone who has loaded the text-to-speech model onto their device in the last 30 days.
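To estimate which tier an app might fall into, the MAU definition above can be approximated from your own analytics. The sketch below assumes you log a (user, timestamp) pair each time the model is loaded. `count_mau` and the event format are hypothetical, and how Wfloat itself identifies users or handles window boundaries is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def count_mau(
    load_events: Iterable[tuple[str, datetime]],
    now: datetime,
    window_days: int = 30,
) -> int:
    """Count distinct users with at least one model load in the last window_days."""
    cutoff = now - timedelta(days=window_days)
    return len({user for user, loaded_at in load_events if cutoff <= loaded_at <= now})


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
events = [
    ("user-a", now - timedelta(days=1)),
    ("user-a", now - timedelta(days=2)),   # repeat loads count once
    ("user-b", now - timedelta(days=29)),
    ("user-c", now - timedelta(days=31)),  # outside the 30-day window
]
mau = count_mau(events, now)
```

Here `user-a` and `user-b` are counted and `user-c` is not, because that user's only load falls outside the window.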