Introduction

Wfloat lets you ship text-to-speech that runs inference inside your app rather than sending text to a hosted inference endpoint.

Wfloat currently ships three packages: Web, React Native, and Python.

At a high level, the Web and React Native packages are built around the same product flow:

  1. You get a modelId from your Wfloat account.
  2. A device loads the model. The model is downloaded if the device does not already have it cached.
  3. Speech is generated locally in the app.

The Python package also runs speech locally, but it does not require a Wfloat model credential. Instead, it loads the public wfloat/wfloat-tts model directly.

Your modelId

modelId means your Wfloat model credential.

You can find it on your account page after you sign up. The UI labels it Model Credential.

What happens on first load

The first time a device loads a model, the package downloads the model assets it needs onto the device. The model stays cached on the device, so when that user comes back later the package reuses the local copy instead of downloading it again. Speech is then generated locally in the app; the text is not sent to a hosted inference endpoint.
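
The download-once, reuse-later behavior described above can be sketched as a cache-first lookup. This is only an illustration of the pattern, not the Wfloat packages' actual implementation: the function name load_model_assets, the download callback, the cache directory, and the file layout are all hypothetical, and the real packages manage their own cache.

```python
import hashlib
from pathlib import Path
from typing import Callable


def load_model_assets(
    model_id: str,
    cache_dir: Path,
    download: Callable[[str], bytes],
) -> tuple[Path, bool]:
    """Return (path to the cached assets, whether a download happened).

    Hypothetical helper for illustration only.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Derive a file name from the credential instead of writing it to disk as-is.
    cache_key = hashlib.sha256(model_id.encode("utf-8")).hexdigest()[:16]
    asset_path = cache_dir / f"{cache_key}.bin"  # hypothetical file layout

    if asset_path.exists():
        # Later loads on this device: reuse the local copy, no network call.
        return asset_path, False

    # First load on this device: fetch once and keep the result.
    asset_path.write_bytes(download(model_id))
    return asset_path, True
```

Calling the helper twice with the same arguments triggers the download callback only on the first call; the second call returns the same path with the download flag set to False.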

Packages

The Web package and React Native package are intentionally close to each other at a product level so teams can work with the same model, voices, and overall integration pattern across platforms. The Python package uses the same voice and emotion set with a Python-native API and CLI.

If you are ready to integrate, continue to the quick start for your package.

Voice IDs

The current model exposes these voice IDs:

  • skilled_hero_man
  • skilled_hero_woman
  • fun_hero_man
  • fun_hero_woman
  • strong_hero_man
  • strong_hero_woman
  • mad_scientist_man
  • mad_scientist_woman
  • clever_villain_man
  • clever_villain_woman
  • narrator_man
  • narrator_woman
  • wise_elder_man
  • wise_elder_woman
  • outgoing_anime_man
  • outgoing_anime_woman
  • scary_villain_man
  • scary_villain_woman
  • news_reporter_man
  • news_reporter_woman
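
Every voice ID in the list above follows the pattern persona_man or persona_woman, so an app can build the list from ten personas and validate user input before requesting speech. The sketch below is app-side helper code, not part of any Wfloat package; the names PERSONAS, VARIANTS, VOICE_IDS, and split_voice_id are hypothetical.

```python
# The ten personas and two variants are taken from the voice ID list above.
PERSONAS = (
    "skilled_hero", "fun_hero", "strong_hero", "mad_scientist",
    "clever_villain", "narrator", "wise_elder", "outgoing_anime",
    "scary_villain", "news_reporter",
)
VARIANTS = ("man", "woman")

# Same order as the documented list: each persona, man first, then woman.
VOICE_IDS = tuple(f"{persona}_{variant}" for persona in PERSONAS for variant in VARIANTS)


def split_voice_id(voice_id: str) -> tuple[str, str]:
    """Split a known voice ID into (persona, variant); reject unknown IDs."""
    if voice_id not in VOICE_IDS:
        raise ValueError(f"unknown voice ID: {voice_id!r}")
    # The variant is always the part after the last underscore.
    persona, _, variant = voice_id.rpartition("_")
    return persona, variant
```

A voice picker could, for example, group its options by persona and offer the two variants under each one.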

Emotions

The current model supports these emotions:

  • neutral
  • joy
  • sadness
  • anger
  • fear
  • surprise
  • dismissive
  • confusion

Intensity

Intensity controls how strongly the selected emotion is expressed. It is a value between 0 and 1.
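
An app may want to check the emotion and intensity before passing them on. The following is a minimal sketch with a hypothetical helper name (check_emotion); it assumes the 0 to 1 range is inclusive at both ends, which this page does not state explicitly.

```python
# Emotion names taken from the list above.
EMOTIONS = (
    "neutral", "joy", "sadness", "anger",
    "fear", "surprise", "dismissive", "confusion",
)


def check_emotion(emotion: str, intensity: float) -> tuple[str, float]:
    """Validate an (emotion, intensity) pair and return it unchanged."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")
    # Assumption: both 0 and 1 are accepted values.
    if not 0.0 <= intensity <= 1.0:
        raise ValueError(f"intensity must be between 0 and 1, got {intensity}")
    return emotion, float(intensity)
```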

Speed

Speed controls the speaking rate. 1.0 is the default speed, 0.75 is slower, and 1.25 is faster.
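
To get a feel for what a speed value means, the sketch below estimates clip length under the assumption that speed acts as a plain rate multiplier, so duration scales as 1 / speed. That linear relationship is an assumption for illustration, not documented model behavior, and estimated_duration is a hypothetical helper.

```python
DEFAULT_SPEED = 1.0  # documented default speaking rate


def estimated_duration(base_seconds: float, speed: float = DEFAULT_SPEED) -> float:
    """Estimate clip length at a given speed from its length at speed 1.0.

    Assumes speed is a simple rate multiplier (an illustrative assumption).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed
```

Under that assumption, a line that takes 10 seconds at the default speed would take about 8 seconds at 1.25 and about 13.3 seconds at 0.75.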

Pricing

Wfloat pricing tiers are tied to your app's monthly active users.

For this product, an MAU is counted as someone who has loaded the text-to-speech model onto their device in the last 30 days.
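
To estimate which tier an app falls into, the MAU definition above can be approximated from your own analytics. The sketch below counts distinct users with at least one model load in a trailing 30-day window. The event shape (user ID, load time), the function name count_mau, and the exact window boundaries are assumptions; how Wfloat itself identifies users and measures the window is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def count_mau(
    load_events: Iterable[tuple[str, datetime]],
    now: datetime,
    window_days: int = 30,
) -> int:
    """Count distinct users who loaded the model within the trailing window."""
    cutoff = now - timedelta(days=window_days)
    # A set keeps each user once, no matter how many times they loaded the model.
    active_users = {
        user_id
        for user_id, loaded_at in load_events
        if cutoff <= loaded_at <= now
    }
    return len(active_users)


if __name__ == "__main__":
    now = datetime(2025, 6, 30, tzinfo=timezone.utc)
    events = [
        ("alice", now - timedelta(days=1)),
        ("alice", now - timedelta(days=2)),   # repeat load, still one user
        ("bob", now - timedelta(days=29)),
        ("carol", now - timedelta(days=45)),  # outside the window
    ]
    print(count_mau(events, now))
```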