edit-mind

What is Edit Mind?

edit-mind is a local-first AI video intelligence platform. It indexes videos by extracting transcription, face, object, and on-screen text metadata, then stores that metadata together with semantic embeddings so the library can be searched in natural language.

The core purpose is to turn an existing set of videos into searchable knowledge, covering whole videos and, where applicable, specific scenes. It runs via Docker, so it can work on any computer or server that has Docker installed.

Key Features

Background video indexing service: Watches for new video files and queues them for AI-powered analysis so your library stays up to date.
Multi-model video analysis: Extracts metadata including face recognition, transcription, object & text detection, and scene-level analysis.
Vector-based semantic search (ChromaDB): Supports natural-language search over video content using embeddings stored in ChromaDB.
Local execution with Docker: Runs as containerized services using Docker Compose to keep setup modular and deployable on different machines.
Model options for AI/NLP processing: Uses Whisper for transcription and lets you choose, via configuration, between Google Gemini and a local model served by Ollama.
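
The background indexing behavior described above (notice new video files, then queue them for analysis) can be illustrated with a small polling sketch. This is not edit-mind's actual implementation: the function name scan_for_new_videos, the extension list, and the in-memory queue are all assumptions made for illustration.

```python
import tempfile
from collections import deque
from pathlib import Path

# Assumed set of video extensions; the real service may recognize others.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}


def scan_for_new_videos(media_root, seen, queue):
    """Queue every video under media_root that has not been seen yet.

    Returns the number of files newly added to the queue.
    """
    added = 0
    for path in sorted(Path(media_root).rglob("*")):
        is_video = path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS
        if is_video and path not in seen:
            seen.add(path)      # remember the file so it is queued only once
            queue.append(path)  # hand it to the (hypothetical) analysis stage
            added += 1
    return added


if __name__ == "__main__":
    # Demo against a throwaway folder standing in for the media library.
    with tempfile.TemporaryDirectory() as media_root:
        (Path(media_root) / "holiday.mp4").write_bytes(b"")
        seen, queue = set(), deque()
        scan_for_new_videos(media_root, seen, queue)
        print([p.name for p in queue])
```

Calling the function repeatedly on a timer gives a simple watcher. A production service would more likely rely on filesystem events and a persistent job queue.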

How to Use Edit Mind

Install and run Docker Desktop (or ensure Docker is available on your server).
Clone the repository and run through the provided setup flow.
Expose your media folder to Docker by configuring Docker Desktop file sharing (macOS/Windows). On Linux, file sharing is typically enabled by default.
Create environment files: Download/copy .env.example and .env.system.example into .env and .env.system, then configure required settings.
Set the video folder path (HOST_MEDIA_PATH) and choose your AI model option:
- Ollama: set USE_OLLAMA_MODEL, plus OLLAMA_HOST, OLLAMA_PORT, and OLLAMA_MODEL. Start ollama serve and pull the model before launching the stack.
- Gemini: set USE_GEMINI and provide GEMINI_API_KEY.
Generate security keys: Set ENCRYPTION_KEY and SESSION_SECRET using the commands shown in the setup guide.
Start the Docker Compose stack (the repo provides both a standard compose file and a CUDA-oriented one for NVIDIA GPUs).
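
Before starting the stack, it can help to check that the settings named in the steps above are filled in. The following is a minimal sketch, not part of edit-mind: the helper names (parse_env, is_enabled, missing_settings) are hypothetical, the accepted "enabled" values are an assumption, and the sketch does not assume which of .env or .env.system holds each key, so pass it the merged contents.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def is_enabled(values, key):
    # Assumption: flags are switched on with values such as "true" or "1".
    return values.get(key, "").lower() in {"1", "true", "yes"}


def missing_settings(values):
    """Return the names of settings that still need a value."""
    required = ("HOST_MEDIA_PATH", "ENCRYPTION_KEY", "SESSION_SECRET")
    missing = [key for key in required if not values.get(key)]
    if is_enabled(values, "USE_OLLAMA_MODEL"):
        ollama_keys = ("OLLAMA_HOST", "OLLAMA_PORT", "OLLAMA_MODEL")
        missing += [key for key in ollama_keys if not values.get(key)]
    elif is_enabled(values, "USE_GEMINI"):
        if not values.get("GEMINI_API_KEY"):
            missing.append("GEMINI_API_KEY")
    else:
        missing.append("USE_OLLAMA_MODEL or USE_GEMINI")
    return missing


if __name__ == "__main__":
    # Illustrative values only; replace with the contents of your own files.
    sample = """
    HOST_MEDIA_PATH=/path/to/videos
    USE_OLLAMA_MODEL=true
    OLLAMA_HOST=localhost
    OLLAMA_PORT=11434
    """
    print(missing_settings(parse_env(sample)))
```

An empty result means every setting the sketch knows about has a value. It does not verify that the values themselves are valid.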

Use Cases

Search by spoken words: Query your library with phrases you remember from audio, relying on transcription extracted from videos.
Find videos containing specific objects or on-screen text: Use natural-language queries tied to object & text detection outputs generated during indexing.
Locate scenes with known faces: Use face recognition-derived metadata to narrow results to videos or scenes where specific people appear.
Curate and navigate large personal archives: Automatically keep metadata refreshed as new video files are added, then search without manual tagging.
Run on a privacy-focused local environment: Index and search entirely on your own machine (or server) through Docker rather than requiring a hosted workflow.
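
The search-oriented use cases above share one pattern: embed the query, compare it with per-scene metadata, and optionally filter by a recognized face. The sketch below illustrates that pattern with a toy word-count "embedding" and cosine similarity. It is an assumption-laden stand-in, not edit-mind's code: the real system stores learned embeddings in ChromaDB, and the Scene fields and the search function here are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Scene:
    video: str
    start_seconds: float
    transcript: str
    objects: list = field(default_factory=list)
    faces: list = field(default_factory=list)


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(scenes, query, face=None, top_k=3):
    """Rank scenes by similarity to the query, optionally requiring a face."""
    query_vector = embed(query)
    candidates = [s for s in scenes if face is None or face in s.faces]
    scored = []
    for scene in candidates:
        # Searchable text combines the transcript with detected object labels.
        text = " ".join([scene.transcript, *scene.objects])
        score = cosine(query_vector, embed(text))
        if score > 0:
            scored.append((score, scene))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    library = [
        Scene("trip.mp4", 12.0, "look at the waves on the beach", ["surfboard"], ["Alice"]),
        Scene("party.mp4", 3.0, "happy birthday to you", ["cake"], ["Alice", "Bob"]),
    ]
    for score, scene in search(library, "birthday cake", face="Bob"):
        print(scene.video, scene.start_seconds, round(score, 2))
```

Word counts only match exact terms. A learned embedding model would also match paraphrases, which is the main reason to use a vector database for this kind of search.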

FAQ

Is Edit Mind ready for production? No. The repository states that it is in active development and not yet production-ready, so expect incomplete features and occasional bugs.
Does Edit Mind require Docker? Yes. The setup instructions specify Docker Compose to run everything in containers.
What AI options are supported for processing? The documentation names Whisper for transcription and either Google Gemini or Ollama for NLP-related tasks, selected via environment variables.
How do I connect the system to my video files? Configure Docker to access your media folder (Docker Desktop file sharing on macOS/Windows) and set HOST_MEDIA_PATH in the .env file to match that folder path.
Where does semantic search data live? Embeddings for vector-based semantic search are stored in ChromaDB. The stack also includes PostgreSQL (via Prisma ORM) as the relational database.

Alternatives

Cloud-hosted video search platforms: These typically centralize processing on hosted infrastructure. Compared to edit-mind’s local-first Docker approach, they may trade privacy/control for simpler setup.
Desktop media management tools with manual tagging: Some tools let you organize videos via user-entered tags and metadata. They differ in that they don’t perform AI-based transcription/object/face extraction for semantic search.
Self-hosted transcription + search pipelines: You can build a workflow that transcribes video and then indexes text for search. This would differ from edit-mind by focusing more narrowly on audio/text rather than multi-modal analysis (faces/objects/scenes) and integrated semantic querying.
General vector database search apps: You could use embeddings and a vector database to implement semantic search, but you would need to handle video ingestion, multi-modal extraction, and scene-level linkage yourself, which is work that edit-mind packages in its pipeline.
