Sprachmodelle

423 Produkte

Argmin AI

Argmin AI helps teams turn their rules, docs, and examples into an AI evaluation they can run before release. It is positioned for product and engineering teams that need quality checks without building custom evaluation code or hiring an ML team.

BaseRT

BaseRT is an LLM runtime for Apple Silicon Macs that runs local models on your own device. It is positioned for on-device inference and a local coding-agent workflow, with public materials emphasizing speed and privacy by keeping execution on-machine.

Kimi K3

Kimi K3 is Moonshot AI’s frontier model for coding, knowledge work, and reasoning. It is available across Kimi.com, Kimi Work, Kimi Code, and the Kimi API, with a 1-million-token context window and native vision support.

Zro

Zro is a private inference endpoint for coding agents that runs on EU infrastructure and supports OpenAI-compatible and Anthropic-compatible access. It is designed for developers and teams that want open-model inference with zero request retention.

derouter.ai

derouter.ai is a web API for Claude and GPT models that aims to mirror the official Anthropic and OpenAI interfaces while offering fixed, discounted token pricing. It also supports CLI workflows such as Claude Code and Codex CLI, plus GPT Image 2 image generation.

SuperCompress

SuperCompress is a query-aware context compressor for LLM applications that reduces input tokens before inference while preserving answer-critical evidence. It is open source, runs on CPU, and is available as a hosted API and Python package.

Muse Spark 1.1

Muse Spark 1.1 is a multimodal reasoning model from Meta Superintelligence Labs for agentic tasks, coding, computer use, and multimodal understanding. It is available in public preview through the Meta Model API and in Thinking mode in the Meta AI app and on meta.ai.

Auriko

Auriko is an LLM inference routing API that lets developers access multiple model providers through one integration. It focuses on cache-aware cost optimization, routing control, and reliability for AI applications.

Opper AI

Opper AI is an EU-hosted AI gateway for accessing 300+ models through an OpenAI SDK-compatible API, with optional control-plane features for routing, observability, guardrails, and compliance. It is aimed at developers and teams that want to run AI applications and agents with pay-as-you-go billing and production governance.

Constellation Gate AI

Constellation Gate AI is a gateway for AI agents that screens requests, redacts sensitive data, records activity in a tamper-evident audit trail, and can reduce token usage. It supports desktop tools, CLI routing, and SDK-based setup without code changes.

LongCat-2.0

LongCat-2.0 is a LongCat AI model announcement highlighting a 1.6 trillion-parameter system trained entirely on domestic chips. The available pages confirm the product’s scale and a separate pricing page, but not usage details or plan structure.

TuneLLM

TuneLLM is an enterprise platform that distills recurring Claude- or GPT-style workflows into smaller fine-tuned models inside your infrastructure. It is aimed at teams that want benchmarked quality on narrow LLM tasks at lower inference cost.

Alvoff Inference

Alvoff Inference is an OpenAI-compatible API for speech-to-text, text-to-speech, embeddings, and chat/code generation. It is built for developers who want to swap in a different base URL, use familiar SDKs, and pay per request.

RunInfra

RunInfra helps teams turn open-source models into production inference stacks by benchmarking GPUs, tuning supported runtime paths, and either deploying a managed API or exporting the stack for self-hosting.

ClinePass

ClinePass is a paid subscription offer for accessing curated open weight models in Cline, with a Product Hunt first-month promotion. It is aimed at developers who want a simpler setup for IDE and CLI coding workflows.

discode.ai

discode.ai is a browser-based AI chat product that routes prompts across many models and adds controls for eco impact, local privacy, and multi-model verification. It helps users choose how each answer should balance cost, confidentiality, and confidence.

Heron

Heron is a passive observability tool for AI agents and LLM APIs. It reconstructs agent turns, tool calls, and LLM interactions from network traffic without requiring SDK changes or an in-path proxy.

Oxlo.ai

Oxlo.ai is an AI inference API with OpenAI-compatible access and request-based monthly pricing. It is designed for developers and AI teams that want predictable costs for assistants, document workflows, and other production inference workloads.

Crewdle Chat

Crewdle Chat brings GPT, Claude, Gemini, and Grok into one chat workspace for business teams. It supports web search, uploaded documents, and token-based billing with no per-seat fees.

TruthAgent

TruthAgent is a web app that runs a question through multiple AI models and surfaces consensus, disagreements, and confidence. It offers a free tier plus credit-based Pro and Pro+ plans for deeper research and decision support.

...