Discover tools that power great ideas
Gemma 4 is an open model family for advanced reasoning and agentic workflows, available in multiple sizes for local and edge multimodal use.
MulmoChat is a research prototype for multimodal AI chat on a canvas, combining conversational text with rich visual and interactive content; it also offers APIs and ComfyUI integration.
UNI-1, Luma’s multimodal reasoning model, generates pixels and supports directable, reference-guided image creation for scene completion and transformations.
Agentset is an open-source platform for production-ready RAG apps with reliable search and Q&A, including citations, multimodal ingestion, and metadata filtering.
MiniCPM-o 4.5 is a 9B omni-modal model for full-duplex live interaction with vision, speech, and text, delivering real-time concurrent streaming output.
Gemini Embedding 2 maps text, images, video, audio, and documents into one embedding space for multimodal retrieval and classification.
A compact, open-weight multimodal AI model that excels at vision-language tasks, math, and UI understanding, with high accuracy and low latency.
Build low-latency, multimodal voice and realtime audio experiences with the OpenAI Realtime API, from in-browser voice agents to realtime transcription.
GLM-5 is the next-generation large language model from Zhipu AI, designed for superior reasoning, coding, and multimodal capabilities, setting a new standard for open-source LLMs.
Seedance 2.0 is a unified multimodal architecture for joint audio-video generation, supporting text, image, audio, and video inputs for comprehensive content reference and editing.
MiniCPM-o 4.5 is a highly capable multimodal AI model designed for vision, speech, and full-duplex live streaming, offering advanced visual understanding, speech synthesis, and real-time interactive capabilities in a compact 9B parameter architecture.
Modaal uses AI agents to build native iOS apps with real iOS architecture, helping you ship solo to the App Store or hand off to a team.
MiniCPM-o 4.5 is an advanced multimodal AI model designed to process and understand visual, speech, and textual data simultaneously. Built on state-of-the-art components including SigLip2, Whisper-medium, CosyVoice2, and Qwen3-8B, it totals 9 billion parameters. The model is engineered for full-duplex multimodal live streaming, enabling real-time, fluid interactions that see, hear, and speak concurrently, making it a versatile tool for applications requiring integrated vision, speech, and language understanding.
Beni AI is a multimodal AI companion that uses real-time voice, video, and text with persistent memory to create emotionally aware, presence-native interactions and scalable content from living IP.
Agentset is an open-source platform for building production-grade AI chat and search applications, with reliable RAG, multimodal support, and a developer-friendly SDK.
模袋云 is an accessible online tool for villa modeling.
墨刀AI helps product managers turn text or images into prototype pages, then generates structured documents and matching HTML/CSS code.
Skywork turns simple input into multimodal content: docs, slides, and sheets with deep research, plus podcasts and webpages.
墨刀 is an online product design collaboration platform that supports AI-generated prototypes and compatibility with various design files.
Stability AI offers multimodal image and video generation and editing tools, including DreamStudio for creative workflows and enterprise integration.