Krisp Voice Translation API

Krisp Voice Translation API is a self-serve, real-time speech translation API for developers. It translates live speech and returns translated audio and transcripts, with built-in background voice cancellation and custom vocabulary support.

Tags: Translate, AI Speech Recognition, AI Noise Cancellation, Speech-to-Text

Overview

Krisp Voice Translation API is a real-time speech-to-speech translation API for developers building accuracy-critical voice applications. The page positions it as the same translation engine used in Krisp CX Enterprise, but offered as a self-serve API.

A session starts from a short-lived session key. The client then streams audio into the session and receives translated audio, source transcripts, translated transcripts, and flow-control events in return. The page also shows built-in background voice cancellation, custom vocabulary support, and a translation dictionary for domain-specific terms.
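
Streaming audio into a session usually means cutting a continuous capture into small, fixed-duration frames and sending them one at a time. The page does not state the audio format Krisp expects, so the sketch below assumes 16 kHz, 16-bit mono PCM and 20 ms frames; `frame_bytes` and `frame_pcm` are hypothetical helpers, not part of Krisp's SDK.

```python
from typing import Iterator


def frame_bytes(sample_rate: int = 16_000, frame_ms: int = 20,
                sample_width: int = 2, channels: int = 1) -> int:
    """Bytes per frame of raw PCM (the defaults are assumptions, not Krisp's spec)."""
    return sample_rate * frame_ms // 1000 * sample_width * channels


def frame_pcm(pcm: bytes, sample_rate: int = 16_000, frame_ms: int = 20,
              sample_width: int = 2, channels: int = 1) -> Iterator[bytes]:
    """Yield consecutive fixed-size frames; the last frame is zero-padded to full length."""
    size = frame_bytes(sample_rate, frame_ms, sample_width, channels)
    for start in range(0, len(pcm), size):
        chunk = pcm[start:start + size]
        if len(chunk) < size:
            # Zero bytes are silence for signed 16-bit PCM, so padding keeps frames uniform.
            chunk += b"\x00" * (size - len(chunk))
        yield chunk
```

Under these assumptions each frame is 16,000 × 0.02 × 2 = 640 bytes, so one second of audio (32,000 bytes) becomes 50 frames. Padding the tail is a design choice; a client could instead send a short final frame if the service accepts one.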

Krisp says the API supports 61 languages and any-to-any language pairs, including locale variants such as US Spanish, French Canadian, and Egyptian Arabic, as well as regional languages such as Catalan, Basque, and Galician. The developer page also mentions free sign-up credit and claims 96% accuracy on live calls with real accents and real noise.

Core capabilities

Real-time speech-to-speech translation

Translates live speech from one language to another in real time; Krisp positions it for accuracy-critical use cases.

Broad language coverage

Supports 61 languages and any-to-any language pairs, including locale variants such as US Spanish, French Canadian, and Egyptian Arabic, as well as regional languages such as Catalan, Basque, and Galician.

Background voice cancellation

Uses built-in background voice cancellation to handle noise, competing voices, and reverberation without requiring audio preprocessing.

Custom vocabulary and dictionary

Accepts custom vocabulary and a translation dictionary so domain terms can be recognized and translated consistently.

Developer callbacks and session events

Exposes source and translated transcripts, translated audio, flow-control events, and error callbacks through the SDK/session flow.
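
A common way to consume this kind of event stream is a small dispatcher that routes each incoming message to a callback by type. The sketch below is not Krisp's SDK: it assumes every message is a JSON text object with a `type` field, and the event names (`source_transcript`, `translated_transcript`, `error`) are hypothetical. A real WebSocket stream may deliver translated audio as binary frames instead.

```python
import json
from typing import Any, Callable, Dict, Optional

Message = Dict[str, Any]
Handler = Callable[[Message], None]


class SessionDispatcher:
    """Route decoded session messages to callbacks keyed by a hypothetical "type" field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event_type: str, handler: Handler) -> "SessionDispatcher":
        """Register one callback per event type; returns self so calls can be chained."""
        self._handlers[event_type] = handler
        return self

    def dispatch(self, raw: str) -> None:
        """Decode one text message and invoke the matching callback."""
        try:
            message = json.loads(raw)
        except json.JSONDecodeError as exc:
            self._report_error(f"malformed message: {exc.msg}")
            return
        if not isinstance(message, dict):
            self._report_error("message is not a JSON object")
            return
        event_type = message.get("type")
        handler: Optional[Handler] = (
            self._handlers.get(event_type) if isinstance(event_type, str) else None
        )
        if handler is None:
            self._report_error(f"unhandled message type: {event_type!r}")
        else:
            handler(message)

    def _report_error(self, detail: str) -> None:
        """Forward client-side problems to the "error" callback, if one is registered."""
        error_handler = self._handlers.get("error")
        if error_handler is not None:
            error_handler({"type": "error", "detail": detail})
```

Routing unknown or malformed messages to the same error callback keeps a UI from silently dropping events it did not anticipate.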

SDK and WebSocket integration

Provides Python and Node.js SDK examples, plus a WebSocket flow configured with a single JSON session object for direct integration.
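
The page shows custom vocabulary and a translation dictionary inside the session config. A minimal sketch of assembling such a config in Python follows; every field name is a hypothetical placeholder rather than Krisp's documented schema, and the BCP 47 language tags (for example `es-US` for US Spanish) are an assumption about how locales are identified.

```python
import json
from typing import Dict, List, Optional


def build_session_config(
    source_language: str,
    target_language: str,
    custom_vocabulary: Optional[List[str]] = None,
    translation_dictionary: Optional[Dict[str, str]] = None,
    noise_cancellation: bool = True,
) -> str:
    """Serialize one JSON session config. Every field name here is a hypothetical placeholder."""
    # Drop blank vocabulary entries so stray whitespace never reaches the session.
    vocabulary = [term.strip() for term in (custom_vocabulary or []) if term.strip()]
    # The dictionary maps a source-language term to its fixed target-language rendering.
    dictionary = {
        src.strip(): dst.strip() for src, dst in (translation_dictionary or {}).items()
    }
    if any(not src or not dst for src, dst in dictionary.items()):
        raise ValueError("dictionary entries need non-empty source and target terms")
    config = {
        "source_language": source_language,  # assumed BCP 47 tag, e.g. "en-US"
        "target_language": target_language,  # assumed BCP 47 tag, e.g. "es-US"
        "noise_cancellation": noise_cancellation,
        "custom_vocabulary": vocabulary,
        "translation_dictionary": dictionary,
    }
    return json.dumps(config, ensure_ascii=False)
```

For example, `build_session_config("en-US", "es-US", ["metformin"], {"metformin": "metformina"})` pins a medication name so it is both recognized and rendered consistently. Validating entries on the client surfaces typos before a session is opened.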

Practical uses

Live multilingual conversation apps
Build an application that translates live spoken conversations as they happen, while still surfacing source and translated transcripts for the UI or audit trail.
Contact-center voice workflows
Add translation to customer-support or call-center workflows where background noise, accents, and live-call audio quality matter, relying on the built-in noise handling rather than a separate cleanup step.
Domain-specific terminology
Handle specialized terms such as medication names, product names, or internal jargon by defining a custom vocabulary and translation dictionary.
Real-time SDK integrations
Stream audio from a client to the API and receive callbacks for translated audio, event state, and errors, making it suitable for interactive voice experiences.
Developer prototyping and testing
Prototype translation flows quickly with the playground and the documented Python or Node.js examples before moving into a full product implementation.

Pros and Cons

Pros

Self-serve access is explicitly called out, with a path to get an API key and start in a playground.
The API is built for live speech and returns both translated audio and transcript data during a session.
Background voice cancellation is built in, reducing the need for separate audio cleanup steps.
Custom vocabulary and a translation dictionary support domain-specific terminology.
The page shows SDK examples for Python and Node.js, which lowers integration friction.

Cons

The developer page does not publish full API limits, authentication details, or a complete setup guide.
Pricing is not fully transparent on the public page, so teams may need to sign up or contact sales for exact commercial terms.
The page says little about supported deployment environments beyond the Python and Node.js SDK examples.

FAQ

How do developers get started with the Voice Translation API?

The page describes the Voice Translation API as self-serve: you sign up, get an API key, and start translating without a sales call or procurement cycle.

What integration patterns does the API support?

The source shows Python and Node.js SDK examples, plus a single JSON configuration for a WebSocket session. It also shows callbacks for source text, translated text, translated audio, events, and error handling.

What does the API return during a session?

The page describes real-time speech-to-speech translation, source and translated transcripts, translated audio output, and background voice cancellation. It also shows custom vocabulary and a translation dictionary in the session config.

How many languages are supported?

The developer page says the API supports 61 languages with any-to-any language pairs. Krisp's contact-center page describes Krisp AI Voice Translation as supporting 80+ languages for call-center use, but 61 is the figure stated for the API.

Is there public pricing for the API?

Krisp lists the Voice Translation API under its Developers plans with self-serve access, and the page highlights 60 minutes of free sign-up credit. The pricing page does not show a public per-minute or per-seat rate for this API.

Quick Facts

Category: Developer tool / voice translation API
Primary use: Real-time speech-to-speech translation
Language support: 61 languages, any-to-any pairs
Access model: Self-serve API key and playground
SDK examples: Python and Node.js
Source domain: krisp.ai

Krisp Voice Translation API Alternatives

Sanota

Sanota turns spoken memories and interviews into clear written stories for personal storytelling, family history and shared memories, with guided prompts and subscriptions.

Carbon Voice

Carbon Voice is an async voice messaging app for teams and individuals, with transcripts, AI catch-up, and cross-device access.

QuickQuill

QuickQuill is a local-first macOS dictation and transcription app to record meetings, summarize audio, and export notes without the cloud.

Speech to Text Converter

Speech to Text Converter is a browser-based transcription tool for live dictation and uploaded audio or video files. Free for short tasks, Pro offers unlimited transcription, AI summaries, translation, speaker ID, and advanced exports.

Dictato

Dictato is a Mac dictation app that transcribes speech to text in any app using an offline, on-device workflow. It includes cleanup and translation and is sold as a one-time purchase.

Tavus

Tavus is an AI video platform for real-time face-to-face agents, digital twins, and AI companions, with APIs and multilingual workflows for developers.