Text to speech generation
Generate speech from text with controls for emotion, pacing, and model parameters. The product pages also describe low-latency and real-time generation for speech output.
Fish Audio is an AI voice platform for text to speech, voice cloning, and speech to text. It offers emotionally controllable voice generation, multilingual output, and developer access through APIs and SDKs.
The site focuses on emotionally controllable voice generation, with tools for creators, developers, and teams that need narration, character voices, or voice agents.
The product pages describe a web app and developer platform built around real-time and low-latency speech generation, plus APIs and SDKs for integrating voice features into applications. Fish Audio also highlights a large voice library, multilingual output, and voice cloning from short audio samples.
Create a digital voice from a short sample and reuse it across outputs. Fish Audio says it can clone a voice from as little as 10 seconds of audio.
Use emotion tags and special tags to shape delivery, including options such as whispering, laughing, sighing, pause, and emphasis. The site positions this as part of its emotional control workflow.
Access the product through a web experience, REST API, Python SDK, and JavaScript SDK. The developer page also shows streaming support and example requests.
Work with multiple languages and native accents. Fish Audio lists support for English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
Use the platform for speech to text as well as voice generation. The developer and home pages present speech to text alongside TTS and cloning.
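As a rough illustration of how the emotion and special tags listed above might be attached to a script before it is sent for generation, the sketch below joins text segments and prefixes tagged ones with a cue. The tag names come from the site; the parenthesized `(tag)` syntax, the `Segment` type, and `render_script` are assumptions made for this example, not Fish Audio's documented format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Delivery cues named on the site. The "(tag) text" syntax used below is an
# assumption for illustration; check the official docs for the real markup.
KNOWN_TAGS = {"whispering", "laughing", "sighing", "pause", "emphasis"}


@dataclass
class Segment:
    """Hypothetical unit of script text with an optional delivery cue."""
    text: str
    tag: Optional[str] = None


def render_script(segments: List[Segment]) -> str:
    """Join segments into one string, prefixing tagged ones with "(tag) "."""
    parts = []
    for seg in segments:
        if seg.tag is None:
            parts.append(seg.text)
            continue
        if seg.tag not in KNOWN_TAGS:
            raise ValueError(f"unknown tag: {seg.tag!r}")
        parts.append(f"({seg.tag}) {seg.text}")
    return " ".join(parts)


script = render_script([
    Segment("The door was already open."),
    Segment("Don't make a sound.", tag="whispering"),
])
print(script)
```

Keeping the tag separate from the text until render time makes it easy to validate cues or swap the markup syntax in one place if the real format differs.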
Turn scripts into voiceovers for videos, explainers, and social content. The site emphasizes scene-matched narration with emotion tags and tone control for YouTube-style production.
Produce chapter narration, audiobooks, and long-form spoken content with pacing and emotion control. Fish Audio also references publish-ready output and audiobook-oriented workflows.
Clone a signature voice or design a branded persona for games, animation, and interactive stories. The product pages mention dynamic emotions and easy-to-use API access for voice character work.
Add a natural voice layer to support agents and chatbots. The home page specifically calls out conversational chatbots and low-latency speech for customer support and virtual agents.
Integrate speech generation or transcription into applications through the API and SDKs. The developer page shows REST, Python, and JavaScript options for teams that want programmatic control.
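For teams weighing the programmatic route, a request to a text-to-speech REST endpoint could look roughly like the sketch below. The endpoint URL, the JSON field names (`text`, `voice_id`, `format`), and the bearer-token header are placeholders chosen for illustration, not Fish Audio's documented API; the developer page is the authority on the real request shape.

```python
import json
import urllib.request

# Hypothetical endpoint and field names, for illustration only.
API_URL = "https://api.example.com/v1/tts"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for speech generation."""
    payload = {"text": text, "voice_id": voice_id, "format": "mp3"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_streamed_audio(request: urllib.request.Request, path: str,
                        chunk_size: int = 4096) -> None:
    """Send the request and write the response body to disk chunk by chunk."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(chunk_size):
            out.write(chunk)


req = build_tts_request("Hello there.", voice_id="demo-voice", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it lets the payload be inspected or logged first. `save_streamed_audio` reads the response in chunks so audio can be written as it arrives, which is the general idea behind streaming output.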
Fish Audio supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish, and says it is continuously adding more languages.
Fish Audio says its voice cloning can work from as little as 10 seconds of audio. The cloned voice can then be used for text to speech and can speak in multiple languages.
The product pages describe Fish Audio as supporting text to speech, voice cloning, and speech to text through a web app, APIs, and SDKs.
The pricing page shows a free tier, paid plans, and an enterprise option with contact sales. It also says API access is available for premium subscribers.
Fish Audio says the free plan allows free generations for personal, non-commercial use, while premium subscribers can use verified voices they own for commercial purposes.
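蓝藻AI is an online AI voiceover and speech synthesis product that converts text to speech and supports self-service voice cloning. Its page indicates that it targets content that needs voiceover, such as short videos and audiobooks.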
Noiz AI is an AI text-to-speech, voice cloning, and voice design tool for creating lifelike speech from text. It also lets users shape voice delivery, including emotion, within the same workflow.
Gemini 3.1 Flash TTS is Google’s preview text-to-speech model for generating expressive AI speech with fine-grained control over style and delivery. It is available across the Gemini API, Google AI Studio, Vertex AI, and Google Vids.
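Ondoku is a browser-based text-to-speech tool that converts text into downloadable .mp3 audio and offers a free allowance alongside paid plans. It supports multilingual reading, reading text from images, and commercial use subject to its rules.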
Typecast is an online AI voice generator that turns text into lifelike speech with emotional delivery and a selection of hyper-realistic voices. It is a browser-based tool for creating spoken audio from written content.
魔音工坊 (Moying Gongfang) is an intelligent online text-to-speech (TTS) platform that converts written text into high-quality voiceovers using realistic human voices with a range of accents.