Kimi-K2.7-Code icon

Kimi-K2.7-Code

Kimi-K2.7-Code is a coding-focused agentic model from Moonshot AI on Hugging Face with thinking mode, long context, tool use, and official API access.

Kimi-K2.7-Code

Overview

Kimi-K2.7-Code is a coding-focused agentic model from Moonshot AI available on Hugging Face. It is presented as an update to Kimi-K2.6 with stronger performance on real-world, long-horizon coding tasks and improved token efficiency.

The model summary describes a Mixture-of-Experts architecture with 1T total parameters, 32B activated parameters, a 256K context length, and support for thinking mode, tool calling, and image/video inputs through the official API. The deployment guide says the same architecture as Kimi-K2.5/K2.6 can be reused and provides examples for vLLM, SGLang, and KTransformers.

For teams building software engineering assistants or internal coding workflows, the documentation emphasizes end-to-end task completion, reasoning-focused usage, and deployment on common inference engines. The model also exposes OpenAI/Anthropic-compatible API access through Moonshot AI’s platform.

Key features

Agentic coding focus

Built as a coding-focused agentic model on top of Kimi K2.6, with improved support for long-horizon software engineering tasks and end-to-end task completion.

Reduced thinking-token usage

The model page reports roughly 30% lower thinking-token usage than Kimi K2.6, which points to more token-efficient reasoning during coding workflows.

Large MoE architecture

It uses a Mixture-of-Experts architecture with 1T total parameters, 32B activated parameters, 384 experts, and 8 selected experts per token.

Long context window

The context length is listed as 256K, which supports long codebase interactions and extended task context.

Multiple deployment paths

The deployment guide recommends official support for vLLM, SGLang, and KTransformers, and the usage examples show OpenAI/Anthropic-compatible APIs.

Multimodal and tool-use support

The model documentation includes tool calling, thinking-mode reasoning, and image/video input examples in the official API.

Common use cases

  • End-to-end coding tasks

    Use the model as a coding assistant for multi-step software engineering work that benefits from long context, reasoning, and tool use across a repository or project plan.

  • API integration for developer tools

    Deploy it behind an internal API for teams that want OpenAI- or Anthropic-compatible access to a coding model without changing client-side request patterns.

  • Self-hosted inference

    Run it with vLLM, SGLang, or KTransformers when you need a self-hosted inference setup and want to follow the deployment patterns documented by Moonshot AI.

  • Multimodal assistant workflows

    Use the official API examples to process text prompts together with images or video for workflows that need visual understanding alongside coding-oriented reasoning.

  • Long-running agent workflows

    Apply it to persistent agent-style jobs where the model needs to keep working through long-horizon tasks rather than answer a single isolated prompt.

Pros and Cons

Pros

  • Focused on coding and agentic task completion rather than general chat.
  • Long 256K context window is useful for extended repository and workflow context.
  • Official API examples cover text, images, and video inputs.
  • Deployment guidance is available for vLLM, SGLang, and KTransformers.
  • The model page reports lower thinking-token usage than Kimi K2.6.

Cons

  • The documentation says the model supports thinking mode only, and instant mode is not supported.
  • The collected evidence does not include a public model-specific pricing table or usage limits.
  • Some deployment details are example-based and the guide notes that inference engines are changing quickly, so configurations may need adjustment.

FAQ

How can I deploy Kimi-K2.7-Code?

Kimi-K2.7-Code is a coding-focused agentic model on Hugging Face. The deployment guide says the same architecture as Kimi-K2.5/K2.6 can be reused, and example deployments are provided for vLLM, SGLang, and KTransformers.

Does Kimi-K2.7-Code support instant mode?

The model is documented as supporting thinking mode only. The usage notes also say instant mode is not supported, and third-party deployments should keep the reasoning parser set appropriately.

Can Kimi-K2.7-Code work with images or video?

Yes. The usage examples and deployment guide show both text chat and visual inputs, and note that image and video input are supported in the official API.

How do I access the official API?

The model page says you can access the API on platform.moonshot.ai, with OpenAI-compatible and Anthropic-compatible API options.

What does it cost to use the model?

The source pages do not provide a full public pricing breakdown for this model. The Hugging Face pricing page is linked, but no model-specific price or quota is listed in the collected evidence.

Quick Facts

Category
Developer Tool
Model family
Moonshot AI Kimi K2.7 Code
Platform
Hugging Face
Source domain
huggingface.co
API access
platform.moonshot.ai
Context length
256K