speech-core is a C++17 library for on-device speech orchestration, including VAD, streaming and batch STT, diarization, TTS, and a voice-agent pipeline. It runs locally and uses optional ONNX Runtime or LiteRT backends for model inference.
Voiser AI Voiceover turns text into spoken audio for voiceovers, with multilingual voice options and style controls for different narration needs. It offers a web studio workflow, and the site lists free, paid, and enterprise options.
Tico is an AI assistant for Windows that follows the cursor, understands what is on the screen, and guides the user by voice. The page indicates free use with a daily limit and paid plans with more uses and priority support.
Yeta AI is a browser-based tool that translates and dubs public YouTube videos in real time using AI voices. It is designed for watching tutorials, lectures, and other long-form videos in more than 10 languages without relying on subtitles.
Morph is a web-based reading platform for public-domain classics that combines text, synced narration, and an AI assistant. It helps readers switch between reading and listening, browse a curated library, and get book-specific help without leaving the page.
FlowSpeech is a context-aware text-to-speech studio that turns scripts and uploaded files into human-like audio. It offers multiple generation modes, pause and emotion control, and a free plan alongside paid tiers.
xAI’s Grok Speech to Text and Text to Speech APIs let developers add transcription and speech generation to apps through REST and WebSocket endpoints. The product supports multilingual STT, expressive TTS, and usage-based pricing.
Gemini 3.1 Flash TTS is Google’s preview text-to-speech model for generating expressive AI speech with fine-grained control over style and delivery. It is available across the Gemini API, Google AI Studio, Vertex AI, and Google Vids.
Guardrails 2.0 is ElevenLabs’ control layer for ElevenAgents, designed to keep AI voice agents on-topic, policy-aligned, and safer to deploy in production. It is built for teams using voice agents in support, sales, marketing, reception, and internal workflows.
Official HeyGen API documentation for building AI avatar videos, translations, lipsync, and interactive video-agent sessions. It supports direct API use plus MCP and CLI-style workflows for developers and AI agents.
Smallest.ai Lightning TTS is a text-to-speech API for generating spoken audio from text with low latency, multilingual support, and fast voice cloning. It is aimed at developers and product teams building voice agents, narrated content, and other production speech workflows.
Voxtral TTS is Mistral’s text-to-speech model for generating lifelike, multilingual speech for voice agents and enterprise voice workflows. It supports short-reference voice adaptation, low-latency output, and access through Mistral Studio, Le Chat, the API, and open weights on Hugging Face.
Gemini 3.1 Flash Live is Google’s real-time audio and voice model for natural dialogue across developer, enterprise, and consumer surfaces. It is available in preview for developers through Google AI Studio and powers experiences in Gemini Live and Search Live.
listen. turns article URLs, PDFs, or pasted text into audio that can be played in a podcast app. It also offers a daily article feed for chosen topics and a Chrome extension for faster capture while browsing.
Voizematic is an AI voice agent platform that automates inbound and outbound phone calls, appointment booking, and follow-up actions. It is designed for businesses that want to deploy phone agents without code and connect them to Google Workspace.
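Clipchamp's AI voiceover generator is an online text-to-speech feature that generates narration and voiceovers for videos. It supports multilingual voice selection and speed and tone adjustment, and it can be used directly in the browser.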
CAMB.AI Streams dubs live audio in multiple languages in real time for broadcasts on platforms like YouTube, Twitch, and X. It plugs into existing live workflows using common streaming protocols and avoids a post-production step.
TADA is Hume AI's open-source speech-language model for generating speech with one-to-one text-acoustic synchronization. It is aimed at developers and researchers who need fast, reliable text-to-speech that can also fit on-device or long-form use cases.
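Ondoku is an online text-to-speech tool that converts text, or the content of images, into speech and supports downloading the result as .mp3. It offers a free quota, tiered paid plans, and multilingual voice options, and it is suited to personal, educational, and commercial use.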
Coursebox’s AI Training Video Generator creates training videos from scripts or slide decks, with avatar-led narration, multilingual output, and delivery options for LMS or SCORM use. It is aimed at course builders and training teams that need a faster way to produce instructional video content.