Agentset

Agentset is an open-source platform for building production-grade AI chat and search applications, with reliable RAG, multimodal support, and a developer-friendly SDK.

What is Agentset?

Agentset is an open-source infrastructure platform for developers building production-grade Retrieval-Augmented Generation (RAG) applications. It powers AI chat and search experiences that deliver reliable, cited answers over your own documents and data, without requiring you to design, tune, and maintain a complex RAG pipeline from scratch.

Most RAG demos look impressive in controlled environments but break down when faced with real users, large document volumes, and messy, multimodal data. Agentset is built specifically for those real-world, production conditions. It combines robust ingestion, hybrid search, agentic reasoning, and automatic citations into a single system that works out of the box, so teams can ship accurate AI search and Q&A inside their products in minutes instead of months.

Key Features

  • Production-grade RAG out of the box
    Agentset provides an end-to-end RAG stack—ingestion, indexing, retrieval, reasoning, and answer generation—designed for production workloads. It is optimized for reliability and consistency as data volume, usage, and complexity scale.

  • Accurate answers with benchmark-level performance
    The platform is optimized for high-accuracy responses on your own data before any custom tuning. Agentset targets industry-standard benchmarks such as MultiHopQA and FinanceBench, making it well-suited to complex, multi-step, and domain-specific question answering.

  • Multimodal support (text, images, tables, graphs, and more)
    Agentset natively supports images, tables, and graphs alongside traditional text. This allows you to answer questions across the full breadth of your knowledge base—PDFs, presentations, spreadsheets, image-heavy documents, and structured artifacts—rather than being limited to plain text.

  • Automatic citations for trustworthy answers
    Every answer generated by Agentset includes citations back to the underlying sources. Users can inspect exactly which documents and passages were used, increasing trust, debuggability, and compliance in sensitive domains such as healthcare and finance.

  • Metadata filters and granular retrieval control
    Agentset supports metadata-based filtering so you can constrain answers to the right subset of documents (by customer, project, region, date, permission level, etc.). This is essential for multi-tenant products and role-based access control scenarios.

  • Hybrid search with reranking
    The retrieval layer combines vector search with traditional keyword and metadata-based approaches, plus reranking to maximize precision. This improves both recall and relevance, reducing hallucinations and missed results.

  • Agentic reasoning built in
    Agentset brings agentic reasoning capabilities into the stack, enabling multi-step analysis, multi-document synthesis, and complex Q&A without having to build your own orchestration logic.

  • Extensive file type support
    With 22+ file formats supported, Agentset can ingest documents in formats such as:
    .PDF, .DOCX, .PPT, .PPTX, .XLSX, .ODT, .TXT, .MD, .CSV, .TSV, .HTML, .XML, .EML, .MSG, .JPEG, .PNG, .BMP, .HEIC and more. This breadth of support simplifies bringing your existing knowledge repositories into one searchable, AI-ready index.

  • Developer-first SDKs (JavaScript & Python)
    Agentset offers JavaScript and Python SDKs that make it simple to ingest data, configure namespaces, and query your AI agents. A typical workflow involves a few lines of code to create a namespace, upload documents (by file or URL), and start answering questions.

  • Model agnostic and infrastructure-flexible
    You are not locked into a single model or vendor. Agentset allows you to choose your own:

    • Vector database (e.g., Pinecone, Qdrant)
    • Embedding model
    • LLM (e.g., OpenAI, Anthropic Claude, Google AI, xAI Grok, Mistral, Qwen, DeepSeek, and others)

    This flexibility lets you optimize for cost, latency, data residency, and compliance.

  • MCP Server integration
    Through its Model Context Protocol (MCP) server, Agentset can bring your knowledge base into external applications that support MCP, allowing AI agents in other environments to query your documents securely and efficiently.

  • AI SDK integration
    Agentset integrates with the AI SDK ecosystem, making it straightforward to embed RAG-powered chat and search widgets into your own applications, dashboards, or customer-facing products.

  • External preview links & chat interface
    Quickly capture feedback from stakeholders and users using customizable chat interfaces and preview links. This enables rapid iteration on prompts, retrieval configurations, and answer formatting before going fully live.

  • Trusted by real-world teams
    Agentset is used by teams in high-stakes domains such as healthcare, public sector, and fintech. Testimonials highlight improved reliability, support for complex image search, and the ability to replace legacy search solutions (like Algolia) with better results in under an hour of work.
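
The hybrid search feature above merges results from vector and keyword retrieval before reranking. The source does not say which fusion method Agentset uses, so the following is only a minimal sketch of one common technique, reciprocal rank fusion (RRF). The function name, the document IDs, and the constant k = 60 are illustrative assumptions, not part of the Agentset SDK.

```typescript
// Reciprocal rank fusion (RRF): merge several ranked lists of document IDs.
// Each document scores sum(1 / (k + rank)) over the lists it appears in.
// This is a generic illustration, not Agentset's documented fusion method.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// Hypothetical result lists from a vector index and a keyword index.
const vectorHits = ["doc-a", "doc-b", "doc-c"];
const keywordHits = ["doc-b", "doc-d", "doc-a"];
const fusedHits = reciprocalRankFusion([vectorHits, keywordHits]);
// fusedHits is ["doc-b", "doc-a", "doc-d", "doc-c"]
```

Documents that rank well in both lists (doc-b and doc-a here) rise to the top, which is the intuition behind combining retrieval methods before a reranker makes the final ordering.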

How to Use Agentset

Using Agentset typically follows a straightforward developer workflow, from setup to production deployment:

  1. Set up your project and obtain an API key

    • Sign up for Agentset and generate an API key.
    • Install the SDK in your application:
      • JavaScript/TypeScript: npm install agentset
      • Python: install the corresponding Python package (e.g., via pip).
  2. Create a namespace for your data
    Namespaces logically isolate collections of documents, tenants, or environments (e.g., production, staging, or per-customer).

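    import { Agentset } from "agentset";

    // "agentset_xxx" is a placeholder; use your own API key here.
    const agentset = new Agentset({ apiKey: "agentset_xxx" });
    // "ns_1234" is a placeholder namespace ID.
    const namespace = agentset.namespace("ns_1234");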
    
  3. Ingest your documents
    Upload files directly or by URL, along with optional metadata for later filtering.

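    // Create an ingestion job for a file hosted at a URL.
    const ingestJob = await namespace.ingestion.create({
      payload: {
        type: "FILE",
        fileUrl: "https://example.com/document.pdf",
        fileName: "my-document.pdf"
      },
      config: {
        // Optional key-value metadata, available later for filtering.
        metadata: { foo: "bar" }
      }
    });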
    
    • Use supported formats such as PDFs, Office docs, emails, images, markdown, and more.
    • Attach metadata (e.g., customer ID, department, access level, tags) to control retrieval later.
  4. Configure retrieval and models (optional)

    • Select your preferred vector database, embedding model, and LLM.
    • Enable hybrid search and reranking as needed.
    • Define filters to ensure tenant isolation and access control.
  5. Embed chat or search into your app

    • Use the AI SDK to create chat or search endpoints that call Agentset.
    • Build UI components (chat widgets, search bars, side panels) that query Agentset and render answers with citations.
    • Optionally use the MCP server integration to expose your knowledge base to external AI tools.
  6. Test, preview, and iterate

    • Share preview links with stakeholders to validate answer quality.
    • Evaluate performance on your own test sets, particularly multi-hop and domain-specific questions.
    • Adjust retrieval parameters, filters, and prompts based on feedback.
  7. Monitor and scale in production

    • As usage grows, adjust infrastructure choices (databases, models) to match cost and latency requirements.
    • Continuously ingest new documents to keep your knowledge base current.
    • Use metadata and namespaces to manage multi-tenant or multi-product deployments.
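
For multi-tenant deployments, it can help to attach the tenant identifier to every ingested document in one place. The sketch below builds a request object in the same shape as the ingestion example in step 3. The helper name, the tenantId key, and the string-only metadata type are assumptions made for illustration; they are not defined by the Agentset SDK.

```typescript
// Request shape copied from the ingestion example in step 3.
// The helper and the "tenantId" metadata key are hypothetical.
type IngestionMetadata = Record<string, string>;

interface FileIngestionRequest {
  payload: { type: "FILE"; fileUrl: string; fileName: string };
  config: { metadata: IngestionMetadata };
}

function buildTenantIngestion(
  tenantId: string,
  fileUrl: string,
  extraMetadata: IngestionMetadata = {}
): FileIngestionRequest {
  // Derive a file name from the last URL path segment.
  const fileName = new URL(fileUrl).pathname.split("/").pop() || "document";
  return {
    payload: { type: "FILE", fileUrl, fileName },
    // tenantId is written last so extra metadata cannot override it.
    config: { metadata: { ...extraMetadata, tenantId } },
  };
}

const ingestionRequest = buildTenantIngestion(
  "customer-42",
  "https://example.com/document.pdf",
  { department: "support" }
);
// With the client from step 2, this could then be passed as:
// await namespace.ingestion.create(ingestionRequest);
```

Centralizing the metadata in one helper keeps the tenant key consistent across uploads, so later filters can rely on it.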

Use Cases

1. In-product AI search and chat for SaaS platforms

SaaS products with large help centers, technical documentation, and customer-specific configurations can embed Agentset-powered search to deliver accurate, contextual answers. Instead of static FAQ pages and brittle keyword search, users can ask natural language questions and receive cited, trustworthy responses drawn from release notes, configuration guides, and support tickets.

2. Healthcare and medical knowledge assistants

In medicine, reliability and traceability are critical. Agentset can power internal tools that help clinicians, researchers, or medical operations teams query guidelines, research papers, and internal protocols. Automatic citations and grounded answers reduce the risk of hallucinations, helping teams validate that answers are backed by evidence while keeping workflows efficient.
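
To make the citation idea concrete, here is a minimal sketch of rendering an answer together with its numbered sources so a reviewer can check each claim. The source does not show Agentset's response format, so the CitedAnswer and CitedSource shapes, the function name, and the sample data are all invented for illustration.

```typescript
// Hypothetical shapes: the actual Agentset response format is not shown in the source.
interface CitedSource {
  title: string;
  passage: string;
}

interface CitedAnswer {
  text: string;
  sources: CitedSource[];
}

// Render the answer text followed by a numbered list of its sources.
function formatAnswerWithCitations(answer: CitedAnswer): string {
  const lines: string[] = [answer.text];
  if (answer.sources.length > 0) {
    lines.push("", "Sources:");
    answer.sources.forEach((source, index) => {
      lines.push(`[${index + 1}] ${source.title}: "${source.passage}"`);
    });
  }
  return lines.join("\n");
}

// Made-up sample data.
const exampleAnswer: CitedAnswer = {
  text: "The escalation window is 30 minutes [1].",
  sources: [
    { title: "Internal protocol", passage: "Escalate within 30 minutes." },
  ],
};
const renderedAnswer = formatAnswerWithCitations(exampleAnswer);
```

Keeping the quoted passage next to each source title lets a reader verify the answer without opening the full document.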

3. Public sector and municipal information portals

Organizations working with municipalities or governments often manage hundreds or thousands of pages of regulations, policies, and public documents, many of which contain images, charts, and tables. Agentset’s multimodal capabilities support complex image and document search, enabling staff or citizens to find precise information quickly across long, heterogeneous documents.

4. Financial research, compliance, and analysis tools

Financial teams must answer complex, multi-hop questions spanning filings, internal reports, and market data. Agentset’s emphasis on benchmark-level performance for tasks like FinanceBench makes it well-suited for powering research assistants, compliance checkers, and analyst tools that require precise answers over dense, technical documents.

5. Enterprise knowledge bases and internal copilots

Large enterprises with fragmented knowledge (wikis, PDFs, email archives, intranets, and file shares) can use Agentset to unify search across departments. Hybrid search, metadata filtering, and model-agnostic infrastructure support allow IT teams to maintain control over where data lives, which models are used, and how access is governed, while employees get a single, powerful AI assistant for internal knowledge.

FAQ

What is Agentset?

Agentset is an open-source platform and infrastructure layer for building production-ready RAG applications. It provides ingestion, indexing, retrieval, reasoning, and answer generation capabilities so developers can embed accurate AI chat and search into their products without building the entire RAG pipeline in-house.

Who is Agentset for?

Agentset is built for developers and product teams who want to ship reliable AI features—such as chatbots, internal copilots, or advanced search—over their own data. It is suitable for startups, mid-sized companies, and large enterprises that need production-grade performance, multi-tenant support, and flexibility in model and infrastructure choices.

Can large enterprises use Agentset?

Yes. Agentset is designed to handle real-world, large-scale document sets, complex data types, and high usage. Its support for metadata filters, namespaces, and model-agnostic infrastructure makes it a strong fit for enterprise environments that require strict data separation, compliance, and integration with existing stacks.
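
As an illustration of how metadata filters enforce data separation, the sketch below applies an exact-match filter to a small in-memory document list. It shows the concept only: the filter syntax Agentset actually accepts is not described in the source, and the type names, function name, and sample documents are assumptions.

```typescript
type DocMetadata = Record<string, string>;

interface IndexedDoc {
  id: string;
  metadata: DocMetadata;
}

// Returns true when every key in the filter matches the document's metadata exactly.
function matchesMetadataFilter(metadata: DocMetadata, filter: DocMetadata): boolean {
  return Object.keys(filter).every((key) => metadata[key] === filter[key]);
}

// Made-up documents belonging to two tenants in two regions.
const indexedDocs: IndexedDoc[] = [
  { id: "doc-1", metadata: { tenantId: "acme", region: "eu" } },
  { id: "doc-2", metadata: { tenantId: "acme", region: "us" } },
  { id: "doc-3", metadata: { tenantId: "globex", region: "eu" } },
];

// Only documents for tenant "acme" in region "eu" remain eligible for retrieval.
const acmeEuDocIds = indexedDocs
  .filter((doc) => matchesMetadataFilter(doc.metadata, { tenantId: "acme", region: "eu" }))
  .map((doc) => doc.id);
// acmeEuDocIds is ["doc-1"]
```

Applying the filter before retrieval, rather than after answer generation, is what keeps one tenant's documents from ever reaching another tenant's answers.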

Is Agentset a framework like LangChain or LlamaIndex?

Agentset is not just a client-side orchestration framework. While frameworks like LangChain or LlamaIndex help you assemble RAG workflows in code, Agentset provides a managed, production-ready backend for ingestion, retrieval, and reasoning. You can integrate Agentset with those frameworks if you like, but its goal is to reduce the need to build and operate your own retrieval infrastructure.

Can Agentset work with my existing stack and infrastructure?

Yes. Agentset is model agnostic and supports popular vector databases, LLM providers, and embedding models. You can choose components such as Pinecone or Qdrant for vector storage and models from providers like OpenAI, Anthropic, Google AI, xAI, Mistral, Qwen, DeepSeek, and others. Integration through the JavaScript/TypeScript and Python SDKs, the MCP server, and the AI SDK makes it straightforward to embed Agentset into existing services and frontends.

Why should I use Agentset instead of building my own RAG system?

Building a robust RAG system from scratch involves designing ingestion pipelines, handling many file types, tuning retrieval, implementing hybrid search and reranking, managing citations, and maintaining infrastructure as requirements change. This can take months of engineering time and ongoing maintenance. Agentset gives you these capabilities out of the box, allowing your team to focus on product features and user experience instead of low-level retrieval plumbing.

How does Agentset handle real-world documents?

Agentset is optimized for messy, real-world data. It supports 22+ file formats, including PDFs, Office documents, emails, images, and HTML, and performs the parsing, chunking, and indexing needed for effective retrieval. Multimodal support ensures that images, graphs, and tables are also surfaced appropriately during search and Q&A, rather than being ignored.
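
For readers unfamiliar with the chunking step, the sketch below shows one of the simplest strategies: fixed-size character windows with overlap. The source does not describe how Agentset chunks documents, so this is a generic illustration; the function name and parameters are assumptions.

```typescript
// Split text into fixed-size character windows that overlap by `overlap` characters.
// A generic illustration of chunking, not Agentset's actual strategy.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const stride = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

const sampleChunks = chunkText("abcdefghij", 4, 1);
// sampleChunks is ["abcd", "defg", "ghij"]
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of indexing some text twice.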

What happens as requirements change over time?

As your product evolves, you can adjust which vector databases, models, and retrieval strategies you use without redesigning everything from scratch. Agentset’s model-agnostic architecture and rich metadata filtering make it easier to adapt to new compliance needs, geographies, data types, or performance constraints while keeping a consistent developer interface.
