Gemma 4

Gemma 4 is an open model family for advanced reasoning and agentic workflows, with multiple sizes for local and edge multimodal use.

What is Gemma 4?

Gemma 4 is an open model family designed to run on a range of developer and edge hardware. It targets advanced reasoning and “agentic workflows,” extending beyond basic chat to support tasks that require multi-step logic and tool use.

Gemma 4 is released under an Apache 2.0 license and is positioned to complement Google’s Gemini models by giving developers an open-model option that can be run locally and fine-tuned for their own tasks.

Key Features

  • Multiple model sizes for different hardware: Gemma 4 is released in four sizes—Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense—so developers can choose capacity versus runtime needs.
  • Agentic workflow support: Built-in function calling, structured JSON output, and native system instructions to help build agents that interact with tools and APIs.
  • Advanced reasoning: Demonstrated improvements on math and instruction-following benchmarks that require multi-step planning and deeper logic.
  • Code generation for local use: Supports high-quality offline code generation, enabling a local-first AI code assistant workflow.
  • Multimodal input (video, images, and audio on edge sizes): All models natively process video and images for tasks such as OCR and chart understanding; the E2B and E4B models also include native audio input for speech recognition and understanding.
  • Long-context processing: Edge models support a 128K context window, and larger models support up to 256K, enabling prompts that include long documents or repositories.
  • Multilingual capability: Natively trained on over 140 languages for broad-language application development.
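The function-calling and structured-JSON features above can be sketched as a minimal agent step: the model is prompted to reply with JSON naming a tool and its arguments, and the application parses that reply and dispatches to the tool. Everything here is illustrative: the tool registry, the JSON shape, and the `generate()` stub stand in for a real Gemma 4 inference call and are assumptions, not the model's actual API.

```python
import json

# Hypothetical tool registry; the tool name and its implementation are
# placeholders for this sketch, not part of Gemma 4 itself.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # placeholder implementation

TOOLS = {"get_weather": get_weather}

def generate(prompt: str) -> str:
    # Stand-in for a real local inference call; a function-calling-tuned
    # model would be instructed to reply with JSON in this shape.
    return '{"tool": "get_weather", "arguments": {"city": "Oslo"}}'

def run_agent_step(prompt: str) -> str:
    reply = json.loads(generate(prompt))   # parse structured JSON output
    tool = TOOLS[reply["tool"]]            # look up the requested tool
    return tool(**reply["arguments"])      # execute with model-chosen args

print(run_agent_step("What's the weather in Oslo?"))
```

In a real agent loop, the tool's return value would be fed back to the model for the next reasoning step.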

How to Use Gemma 4

  1. Choose a size that matches your hardware and latency needs (E2B/E4B for edge/local multimodal use; 26B/31B for more capable reasoning on suitable GPUs/workstations).
  2. Run the model locally and integrate it into your application workflow.
  3. Fine-tune on your own tasks when you need task-specific performance; Gemma 4 is sized to run and fine-tune efficiently on developer hardware.
  4. Use model capabilities such as function-calling and structured JSON outputs when building agent-like flows that call tools and produce machine-readable results.
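Step 1 above can be sketched as a simple lookup that maps a deployment target to one of the four variants named in this article. The target categories and the default choice are assumptions for illustration, not official sizing guidance.

```python
# Illustrative size picker over the variants named above; the category
# names and the fallback default are assumptions for this sketch.
VARIANTS = {
    "edge-audio": "E2B",         # smallest size with native audio input
    "edge-multimodal": "E4B",    # edge/local multimodal use
    "gpu-efficient": "26B-MoE",  # mixture-of-experts on capable GPUs
    "workstation-max": "31B",    # dense model for maximum capability
}

def pick_variant(target: str) -> str:
    # Fall back to a balanced edge size when the target is unrecognized.
    return VARIANTS.get(target, "E4B")

print(pick_variant("edge-audio"))
```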

Use Cases

  • Build an autonomous tool-using agent: Use function-calling plus structured JSON output to let the model execute multi-step workflows that interact with external tools or APIs.
  • Local-first coding assistant: Run Gemma 4 offline on a workstation for code generation without relying on remote inference, and structure responses to fit developer workflows.
  • OCR and chart understanding in documents: Send images or video to any model variant to extract text via OCR or interpret charts.
  • Speech-enabled edge applications: Use E2B or E4B with native audio input for speech recognition and understanding in a low-latency context.
  • Long-form document analysis: Feed long documents or repository context into models with up to a 256K context window to support tasks that require sustained reasoning.
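For the long-form analysis case, it helps to check that a document fits the chosen variant's context window before sending it. The window sizes below follow the figures stated in this article (128K for edge models, up to 256K for larger ones); the four-characters-per-token heuristic and the output reserve are rough assumptions for the sketch, not Gemma 4 tokenizer behavior.

```python
# Rough context-budget check before submitting a long document.
# Window sizes follow this article; the chars-to-tokens estimate is a
# crude assumption and should be replaced by the real tokenizer.
CONTEXT_WINDOWS = {
    "E2B": 128_000,
    "E4B": 128_000,
    "26B-MoE": 256_000,
    "31B": 256_000,
}

def fits_in_context(text: str, variant: str, reserve_for_output: int = 4_000) -> bool:
    approx_tokens = len(text) // 4  # ~4 chars per token, a common rough estimate
    budget = CONTEXT_WINDOWS[variant] - reserve_for_output
    return approx_tokens <= budget

print(fits_in_context("word " * 1_000, "E2B"))  # a short document easily fits
```

Documents that exceed the budget would need chunking or summarization before analysis.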

FAQ

  • Is Gemma 4 open source? Yes. Gemma 4 is released under the Apache 2.0 license.

  • What model sizes are available? The family is released in Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense.

  • Does Gemma 4 support tool use for agents? Yes. It provides native function calling, structured JSON output, and native system instructions for agentic workflows.

  • What kinds of inputs can Gemma 4 handle? All models natively process video and images. The E2B and E4B models also support native audio input for speech recognition and understanding.

  • How much context can it process? Edge models provide a 128K context window, and larger models offer up to 256K.

Alternatives

  • Other open-weight LLM families: If you primarily need an open model you can run locally, you can compare Gemma 4 against other open-weight language model families that offer different size tiers and context lengths.
  • Proprietary cloud-based agent platforms: If you prefer managed services for agent execution and tool orchestration rather than local inference, cloud-based offerings may reduce infrastructure effort, at the cost of running models remotely.
  • Multimodal models from other vendors: For OCR/video/chart + speech needs, compare against multimodal model families that explicitly support the modalities you plan to use (image/video and audio).
  • Model orchestration frameworks (agent runtimes): If your main goal is reliable tool-calling and structured outputs, consider agent orchestration libraries/frameworks that can run with multiple underlying model providers, including open models.