UStackUStack
MiniMax-AI/cli icon

MiniMax-AI/cli

MiniMax-AI/cli is the official MiniMax AI Platform CLI to generate text, images, video, speech, music, plus vision & web search.

MiniMax-AI/cli

What is MiniMax-AI/cli?

MiniMax-AI/cli is the official command-line interface (CLI) for the MiniMax AI Platform. It lets you generate and process multiple media types—text, images, video, speech, and music—directly from an agent workflow, terminal, or automation pipeline.

The CLI is designed to be usable across agent environments (“from any agent or terminal”) and supports both global and CN regions via different API endpoints.

Key Features

  • Multi-modal generation in one CLI: Generate text, images, video, speech (TTS), and music from command-line prompts and inputs.
  • Text chat with streaming and structured output: Supports multi-turn chat, streaming, system prompts, and JSON output using the mmx text chat command.
  • Image generation controls: Create images with aspect ratio settings and batch generation (--n), and save results to an output directory.
  • Async video generation with progress tracking: Start video jobs asynchronously (--async) and later download results using task/file identifiers.
  • Speech synthesis with voice, speed, and streaming: Generate TTS with 30+ voices, adjust speed, and stream audio output to a media player.
  • Music generation features: Produce lyrics-based songs, generate auto lyrics from prompts (--lyrics-optimizer), create instrumental tracks, and generate covers from reference audio.
  • Vision and search from the command line: Use mmx vision to describe images and mmx search for web search, including JSON output mode.
  • Authentication and region configuration: Login with an API key and manage region settings (example includes setting region to cn).

How to Use MiniMax-AI/cli

  1. Install the CLI.
    • For AI agents (OpenClaw, Cursor, Claude Code, etc.): add the skill using npx skills add MiniMax-AI/cli -y -g.
    • For terminal use: install globally with npm install -g mmx-cli.
  2. Authenticate with your MiniMax token plan API key:
    • mmx auth login --api-key sk-xxxxx
  3. Run a media command. For example:
    • Text: mmx text chat --message "What is MiniMax?"
    • Image: mmx image "A cat in a spacesuit"
    • Speech: mmx speech synthesize --text "Hello!" --out hello.mp3
    • Video: mmx video generate --prompt "Ocean waves at sunset"
    • Music: mmx music generate --prompt "Upbeat pop" --lyrics "[verse] La da dee, sunny day"
  4. Use JSON mode when needed: pipe input (e.g., cat messages.json) into the chat command and request --output json.

Use Cases

  • Agent workflows (coding assistants): Add this CLI as a “skill” to an AI agent so the agent can call commands like mmx text chat, mmx image, or mmx video generate while following agent conventions.
  • Terminal-based content creation: Generate images, speech, or music from scripts without building a separate UI (for example, creating assets and saving them to an output path).
  • Streaming text responses for interactive work: Use mmx text chat --stream to handle incremental output in terminal sessions when you want to observe responses as they generate.
  • Async media pipelines: Start a video generation job with --async, then retrieve and download results later using mmx video task get --task-id ... and mmx video download --file-id ....
  • Media transformation and music covers: Generate instrumental tracks or create cover versions from a reference audio file using mmx music cover with --audio-file or --audio.

FAQ

  • What media types can the CLI generate? The README lists support for text, images, video, speech (TTS), and music, plus vision (image understanding/description) and web search.

  • How do I authenticate? Use mmx auth login --api-key sk-xxxxx. The CLI also provides commands like mmx auth status, mmx auth refresh, and mmx auth logout.

  • Can I use streaming output? Yes. Text chat includes a --stream option, and speech synthesis supports a --stream mode (example pipes output to mpv -).

  • How do I work with JSON outputs for chat/search? The CLI examples show --output json for commands like text chat (including piping messages from a file/STDIN) and for search.

  • Is there support for both Global and CN endpoints? The project notes “Seamless Global (api.minimax.io) and CN (api.minimaxi.com) support,” and includes an example command to set the region to cn (mmx config set --key region --value cn).

Alternatives

  • HTTP API clients for the MiniMax Platform: If you prefer direct integration, you can call the platform endpoints from your own scripts instead of using this CLI. This offers more control but requires handling authentication and request logic.
  • Other agent “tool/skill” CLIs: Many AI agents support attaching tools/skills; you could use a different tool connector for agent-driven media generation. The difference is how the tool is surfaced to the agent and how commands are invoked.
  • Dedicated UI-based media generators: For non-developer workflows, browser-based tools may simplify prompt-to-output interaction. Compared with a CLI, they typically trade automation and scripting flexibility for a guided interface.