xAI API

What is xAI API?

The xAI API is a developer-facing way to use xAI’s Grok models from your application code. The core purpose of the API is to accept prompts (and, for some models, images) and return generated responses that you can display, process, or structure for downstream use.

The quickstart walks through the end-to-end workflow: create an xAI account and credits, generate an API key, install an SDK, and send your first request to a Grok model using supported endpoints and examples.

Key Features

API-key authentication via environment variable: Configure your code with XAI_API_KEY, which the xAI SDKs read automatically.
SDK support for common languages: Install the xAI SDK for Python or JavaScript to call Grok models without writing raw HTTP requests.
Chat-style text generation: Send system and user messages and sample the model’s output for text responses.
Responses endpoint compatibility: Call https://api.x.ai/v1/responses directly with an API key for model inference.
Multimodal inputs (text + image): For models that support it, include an image URL alongside text in a single request.
Structured Outputs (for supported models): Some models allow enforcing an output schema to control the shape of generated results.

How to Use xAI API

Create an xAI account at accounts.x.ai, then add credits so you can use the API.
Create an API key in the xAI Console under API Keys.
Set XAI_API_KEY either by exporting it in your terminal or adding it to a .env file:
- export XAI_API_KEY="your_api_key"
- XAI_API_KEY=your_api_key
Install an SDK based on your language:
- Python: pip install xai-sdk
- JavaScript: npm install ai @ai-sdk/xai zod
Send a request to a Grok model (example shows grok-4.20-reasoning for text, and grok-4 for image+text). Use either the SDK examples or the direct responses HTTP request.

Use Cases

Build a chat interface for Grok: Create an application that sends user questions and optional system instructions, then displays response.content or completion.output_text.
Generate text with a known model endpoint: Use the POST https://api.x.ai/v1/responses workflow to integrate Grok into services where you prefer direct HTTP calls.
Add image understanding to a Q&A flow: Submit an image URL with a prompt like “What’s in this image?” using the multimodal request format shown in the quickstart.
Enforce output formatting for downstream processing: When using a Grok model that supports it, apply Structured Outputs so results follow a schema you define.
Run quick experiments across runtimes: Switch between Python and JavaScript examples while keeping the same environment variable setup (XAI_API_KEY).

FAQ

How do I authenticate requests to the xAI API?

Create an API key in the xAI Console and set it as XAI_API_KEY (e.g., via export XAI_API_KEY="..." or a .env file). The xAI SDK is configured to read this environment variable automatically.

Which Grok model can I use for my first request?

The quickstart examples use grok-4.20-reasoning for text-only chat-style generation and grok-4 for image+text input.

Can I call the API without an SDK?

Yes. The quickstart includes a direct curl example that posts to https://api.x.ai/v1/responses with a JSON body containing model and input.

How do I send images to Grok?

For models that accept images, include an image URL in the input alongside text (the example uses an input_image / input_text structure in the SDK or a typed content structure in the responses call).

What is Structured Outputs?

The quickstart notes that certain models support Structured Outputs, which lets you enforce a schema for the LLM output. The page references a dedicated “Text Generation Guide” for deeper usage.

Alternatives

Use another LLM provider’s chat/assistants API: If your workflow is “prompt in, generated text out,” you can swap in another vendor’s API using a similar key-based authentication and request format.
Use a framework-agnostic text generation approach: Instead of a vendor-specific SDK, build requests directly against a “completions/responses” style endpoint to keep your integration consistent across languages.
Use multimodal-capable model APIs: If your primary need is image+text understanding, look for providers that explicitly support image inputs in their API request schema, then adapt the request payload accordingly.