Perceptron Mk1

Perceptron Mk1 is a closed-source vision model for video understanding and embodied reasoning, with API access and structured outputs for robotics and other physical-world workflows. It also supports image reasoning tasks such as pointing, counting, OCR, and document extraction.

Overview

Perceptron Mk1 is a closed-source model from Perceptron built for video understanding and embodied reasoning. The company describes it as a layer of intelligence for the physical world, aimed at workloads where perception, timing, and spatial grounding matter more than text-only generation.

The model is positioned for physical AI and robotics workflows, with support for image, video, and embodied reasoning, plus structured outputs such as points, boxes, polygons, tracks, clips, HTML, JSON, and Markdown. The source pages also show developer examples for detection, pointing, counting, OCR, captioning, and promptable visual analysis through APIs.

Features

Video and embodied reasoning

Mk1 is described as purpose-built for video understanding and embodied reasoning, with an emphasis on temporal reasoning across continuous streams instead of isolated snapshots.

Temporal reasoning with thinking traces

The model can reason over time and produce structured breakdowns of events. Reasoning can optionally be turned off when it is not needed.

Temporal grounding in long videos

It analyzes video at a dynamic frame rate of up to 2 FPS in a 32K-token context window and can return structured timecodes for specific moments.
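
The frame-rate cap and context window together bound how many frames of a long video can be considered at once. The following is a minimal client-side sketch of that arithmetic, not a description of how Mk1 samples frames internally. The function `plan_frames`, the `tokens_per_frame` and `reserved_tokens` parameters, and the reading of "32K" as 32,768 tokens are all assumptions made for illustration; the source does not publish a per-frame token cost.

```python
def plan_frames(duration_s, tokens_per_frame, reserved_tokens=0,
                context_tokens=32_768, max_fps=2.0):
    """Return (n_frames, fps) for uniform sampling that fits a token budget.

    All parameters are hypothetical planning inputs, not published Mk1 values,
    except the 2 FPS cap and the 32K-token window quoted in the text above.
    """
    if duration_s <= 0 or tokens_per_frame <= 0:
        raise ValueError("duration_s and tokens_per_frame must be positive")

    # Tokens left for frames after reserving room for prompt and output.
    budget = context_tokens - reserved_tokens
    max_frames_by_tokens = budget // tokens_per_frame
    if max_frames_by_tokens < 1:
        raise ValueError("token budget too small for even one frame")

    # Frames allowed by the frame-rate cap (at least one frame).
    max_frames_by_rate = max(1, int(duration_s * max_fps))

    n_frames = min(max_frames_by_tokens, max_frames_by_rate)
    fps = min(max_fps, n_frames / duration_s)
    return n_frames, fps


if __name__ == "__main__":
    # Assumed cost of 256 tokens per frame, 2,048 tokens reserved for text.
    print(plan_frames(60, tokens_per_frame=256, reserved_tokens=2048))
    print(plan_frames(600, tokens_per_frame=256, reserved_tokens=2048))
```

Under these assumed numbers, a one-minute clip fits at the full 2 FPS, while a ten-minute clip has to drop to 0.2 FPS to stay inside the window. The real trade-off depends on the actual per-frame token cost.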

In-context multimodal matching

The site says one reference image or video can be used to find matching instances across new media, and two pieces of media can be compared without fine-tuning or a labeled dataset.

Advanced image understanding

Mk1 supports pointing, counting, OCR, document extraction, and other image-reasoning tasks, including messy text, analog gauges, and tables with preserved structure.

Structured outputs for robotics workflows

The model is trained to emit spatial primitives such as point, box, polygon, track, and clip, which can be consumed directly by downstream systems.
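
To make "consumed directly by downstream systems" concrete, here is one way a client might model the five primitives as typed records. This is a sketch under stated assumptions: the class layout, the field names (`x_min`, `start_s`, `samples`, and so on), the `type` discriminator, and the `parse_primitive` helper are hypothetical and do not reflect the actual Mk1 response schema, which the source text does not specify.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self) -> Point:
        return Point((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


@dataclass(frozen=True)
class Polygon:
    vertices: Tuple[Point, ...]


@dataclass(frozen=True)
class Clip:
    start_s: float  # clip start, in seconds
    end_s: float    # clip end, in seconds


@dataclass(frozen=True)
class Track:
    # (timestamp in seconds, box) pairs for one object over time
    samples: Tuple[Tuple[float, Box], ...]


Primitive = Union[Point, Box, Polygon, Clip, Track]


def parse_primitive(obj: dict) -> Primitive:
    """Convert one decoded JSON object (hypothetical schema) into a typed record."""
    kind = obj["type"]
    if kind == "point":
        return Point(obj["x"], obj["y"])
    if kind == "box":
        return Box(obj["x_min"], obj["y_min"], obj["x_max"], obj["y_max"])
    if kind == "polygon":
        return Polygon(tuple(Point(x, y) for x, y in obj["vertices"]))
    if kind == "clip":
        return Clip(obj["start_s"], obj["end_s"])
    if kind == "track":
        return Track(tuple((s["t"], Box(*s["box"])) for s in obj["samples"]))
    raise ValueError(f"unknown primitive type: {kind!r}")
```

Typed records like these let a robotics or automation pipeline validate model output once at the boundary, instead of passing loosely structured dictionaries through the rest of the system.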

Use Cases

Robotics data preparation

Use Mk1 to interpret teleoperation footage, label subtask boundaries, extract success or failure signals, and turn raw episodes into supervised data for downstream policy training.

Robotics runtime assistance

Apply the model during inference to return grasp affordances, constraint checks, relational targets, and cross-camera tracking for manipulation or navigation systems.

Industrial inspection and safety

Run the model on factory, warehouse, or construction imagery and video to detect defects, flag safety issues, and read instruments during inspection rounds.

Media search and clipping

Use temporal grounding and structured outputs to clip sports moments, search film and TV libraries, or moderate AI-generated content at scale.

Geospatial monitoring

Analyze satellite, drone, and fixed-camera footage for infrastructure monitoring, construction progress, vegetation encroachment, or post-disaster assessment.
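
For the robotics data preparation use case above, the step from labeled subtask boundaries to supervised data can be sketched as a conversion from time-stamped clips to per-frame labels. This is an illustrative post-processing step on the client side, not part of the Mk1 API. The `LabeledClip` type, the `frame_labels` function, and the subtask names are hypothetical, and the sketch assumes clips are given in seconds with half-open intervals.

```python
import math
from typing import List, NamedTuple, Optional, Sequence


class LabeledClip(NamedTuple):
    start_s: float  # inclusive start, in seconds
    end_s: float    # exclusive end, in seconds
    label: str      # subtask name, e.g. "reach" or "grasp"


def frame_labels(clips: Sequence[LabeledClip], n_frames: int, fps: float,
                 default: Optional[str] = None) -> List[Optional[str]]:
    """Assign a subtask label to each frame of an episode.

    Frame i is taken to sit at time i / fps. A frame gets a clip's label when
    start_s <= i / fps < end_s. Later-starting clips overwrite earlier ones
    where they overlap; uncovered frames keep `default`.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    labels: List[Optional[str]] = [default] * n_frames
    for clip in sorted(clips, key=lambda c: c.start_s):
        first = max(0, math.ceil(clip.start_s * fps))
        stop = min(n_frames, math.ceil(clip.end_s * fps))
        for i in range(first, stop):
            labels[i] = clip.label
    return labels


if __name__ == "__main__":
    episode = [LabeledClip(0.0, 1.0, "reach"), LabeledClip(1.0, 2.5, "grasp")]
    print(frame_labels(episode, n_frames=6, fps=2.0))
```

Per-frame labels of this kind can then be paired with the recorded actions of a teleoperation episode to form training examples for a downstream policy.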

Pros and Cons

Pros

Built specifically for video understanding and embodied reasoning rather than only static image tasks.
Supports structured outputs that can plug into robotics and automation pipelines, including spatial primitives and document formats.
Can handle several practical vision workflows from a single model call, including matching, counting, OCR, and comparison.
Offers temporal reasoning and video grounding features that are useful for long or continuous visual streams.

Cons

The collected source text does not include full pricing, plan limits, or licensing terms.
The model is presented as closed-source, so it is not an open-weight option; per the site, access to the weights requires a commercial license.

FAQ

What is Perceptron Mk1 designed for?

Perceptron Mk1 is built for video understanding and embodied reasoning, with additional support for image reasoning and structured document extraction. The site positions it for physical-world applications rather than general chat.

What kinds of tasks can it handle?

The developer page shows Python-style examples for tasks such as focus/zoom and crop, conversational pointing, in-context learning, object detection, counting, OCR, and captioning. The demo also shows a mode for segmenting one or multiple classes in an image.

How does it work with video and structured outputs?

The site says Mk1 analyzes video at up to 2 FPS within a 32K-token context window and can return structured timecodes and clips, along with spatial outputs such as points, boxes, polygons, and tracks.

Is it open source or available through a commercial license?

The homepage describes Mk1 as a closed-source model family. The site also says developers can use the model through APIs or contact the company for a commercial license to the weights.

What does Perceptron Mk1 cost?

The collected text from the pricing page does not include plan details, so exact pricing, tiers, and limits are not available from the source pages used here.

Quick Facts

Product: Perceptron Mk1
Category: AI Developer Tool
Primary use: Video understanding and embodied reasoning
Platform: API-based model
Company: Perceptron Inc.
Source domain: perceptron.inc

Perceptron Mk1 Alternatives

AakarDev AI

AakarDev AI helps teams manage AI provider access, project-level setups, logs, and analytics from one dashboard. It supports BYOK workflows and lists providers including OpenAI, Google Gemini, Anthropic, Groq, Mistral AI, and Perplexity AI.

Arduino VENTUNO Q

Arduino VENTUNO Q is an edge AI computer for AI and robotics applications. It combines AI inference and deterministic control on a single board and is designed to work with Arduino App Lab.

Benchspan

Benchspan is an AI agent security platform that discovers agents, blocks prompt injection and data exfiltration in real time, and supports pre-launch red teaming. It is aimed at teams running agents in production and includes Python and TypeScript SDKs.

Edgee

Edgee is an AI gateway for coding agents and LLM-powered apps. It compresses token traffic, routes requests across models, and provides observability and team controls to help reduce cost and keep sessions running.

CreateOS Sandbox

CreateOS Sandbox is an isolated compute environment for running code and agent workloads inside Firecracker micro-VMs. It is designed for workflows that need machine-level isolation, private networking between sandboxes, and programmatic control through SDK, CLI, or MCP.

Codex Plugins

Codex Plugins bundle reusable skills, app integrations, and MCP servers into workflows you can install in the Codex app or use from Codex CLI. They help extend Codex with connected-service tasks, reusable instructions, and shared team workflows.