
Gello

Gello is an Android app that runs a Hugging Face language model locally and exposes it as a Discord bot for always-on on-device AI chats.


What is Gello?

Gello is an Android app that runs a Hugging Face language model fully on-device and connects it to Discord as a bot. It is designed so people in a Discord channel can talk to the bot directly, while the replies are generated locally on the phone rather than through a cloud API.

The project is built around a single APK and a persistent Discord connection: the phone runs a foreground service that receives incoming messages, assembles prompts, and generates replies on the device. The repository reports tested support for Gemma 4 E2B packaged as a .litertlm model from the litert-community Hugging Face organization, and states that .task models are not supported.

Key Features

  • On-device model inference on Android: Gello runs the language model locally on the phone, so responses are generated without sending prompts to an external LLM service.
  • Discord bot integration: it connects directly to Discord and can reply in channels on servers where the bot has been added, making it suitable for group chat interactions.
  • Foreground service architecture: the app runs as an Android foreground service that keeps a persistent connection to the Discord Gateway WebSocket open; running in the foreground makes Android much less likely to stop the process, which an always-on bot running from a phone depends on.
  • Rolling channel context buffer: incoming messages update a per-channel buffer, with a default of 20 messages, so replies can use recent conversation history.
  • Automatic speculative decoding support: when the loaded .litertlm model includes an MTP (multi-token prediction) drafter, Gello automatically enables speculative decoding to speed up replies.
  • Single-phone deployment: the repository emphasizes that the full stack fits into one Android app, avoiding Termux, a laptop, or a separate model server.
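The foreground-service design exists to keep one Discord Gateway session alive. As a rough illustration of what that session involves, here is a minimal, network-free Python sketch of client-side Gateway payload handling, based on Discord's public Gateway protocol (Hello, Identify, Heartbeat, Heartbeat ACK, Dispatch). The class and method names are hypothetical, the token and identify properties are placeholders, and this is not Gello's code; the repository does not describe how the app implements its connection.

```python
import json
import random

# Discord Gateway opcodes (from Discord's public Gateway documentation).
OP_DISPATCH = 0
OP_HEARTBEAT = 1
OP_IDENTIFY = 2
OP_HELLO = 10
OP_HEARTBEAT_ACK = 11

# Gateway intent bits needed to receive the text of server messages.
INTENT_GUILDS = 1 << 0
INTENT_GUILD_MESSAGES = 1 << 9
INTENT_MESSAGE_CONTENT = 1 << 15  # privileged intent: must be enabled for the bot in the Developer Portal


class GatewaySession:
    """Hypothetical, network-free model of one Gateway session's state."""

    def __init__(self, token: str):
        self.token = token
        self.heartbeat_interval_ms = None
        self.last_sequence = None  # echoed back in every heartbeat
        self.awaiting_ack = False

    def first_heartbeat_delay_ms(self) -> float:
        # Discord asks clients to delay the first heartbeat by a random fraction of the interval.
        return self.heartbeat_interval_ms * random.random()

    def heartbeat(self) -> str:
        self.awaiting_ack = True
        return json.dumps({"op": OP_HEARTBEAT, "d": self.last_sequence})

    def identify(self) -> str:
        return json.dumps({
            "op": OP_IDENTIFY,
            "d": {
                "token": self.token,
                "intents": INTENT_GUILDS | INTENT_GUILD_MESSAGES | INTENT_MESSAGE_CONTENT,
                # Placeholder client properties (hypothetical values).
                "properties": {"os": "android", "browser": "gello-sketch", "device": "gello-sketch"},
            },
        })

    def handle(self, raw: str):
        """Process one received payload; return (payloads_to_send, dispatched_event)."""
        payload = json.loads(raw)
        op = payload["op"]
        if payload.get("s") is not None:
            self.last_sequence = payload["s"]
        if op == OP_HELLO:
            self.heartbeat_interval_ms = payload["d"]["heartbeat_interval"]
            return [self.identify()], None
        if op == OP_HEARTBEAT:
            # The server may request an immediate heartbeat.
            return [self.heartbeat()], None
        if op == OP_HEARTBEAT_ACK:
            self.awaiting_ack = False
            return [], None
        if op == OP_DISPATCH:
            # Events such as MESSAGE_CREATE arrive here as (event name, data).
            return [], (payload["t"], payload["d"])
        return [], None
```

A real client would open the WebSocket, pass every frame to `handle()`, send whatever it returns, and schedule `heartbeat()` every `heartbeat_interval_ms` (the first one after `first_heartbeat_delay_ms()`). If `awaiting_ack` is still true when the next heartbeat is due, Discord's documentation recommends treating the connection as dead and reconnecting.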

How to Use Gello

Install the APK on a compatible Android phone, set it up with a Discord bot account, and load a supported .litertlm model such as the tested Gemma 4 E2B build. Once running, the app keeps a foreground service active, listens for Discord messages, builds prompts from recent channel context, and posts the generated replies back to the channel.
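The "builds prompts from recent channel context" step can be pictured with a minimal Python sketch. It assumes a hypothetical plain-text format (a system line first, one `author: content` line per buffered message, and the bot's name last as a cue); the format Gello actually uses, and any chat template applied by the model or LiteRT-LM, are not described in the source. The character budget is also an assumption, standing in for the model's token limit.

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    author: str
    content: str


def build_prompt(history, bot_name, system_prompt, max_chars=4000):
    """Hypothetical prompt assembly: keep the newest messages that fit the budget."""
    lines = []
    used = 0
    # Walk backwards from the newest message so recent turns survive truncation.
    for msg in reversed(history):
        line = f"{msg.author}: {msg.content}"
        if used + len(line) + 1 > max_chars:  # +1 for the joining newline
            break
        lines.append(line)
        used += len(line) + 1
    lines.reverse()  # restore chronological order
    # End with the bot's name so the model continues speaking as the bot.
    return "\n".join([system_prompt, *lines, f"{bot_name}:"])
```

Walking the history from newest to oldest means that, when the budget is tight, the oldest messages are dropped first; they usually matter least for the next reply.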

Use Cases

  • Group chat assistant: use Gello to place a local AI participant inside a Discord channel so multiple people can ask questions and receive replies in the same channel.
  • Repurposing an old Android phone: run a 3-to-5-year-old spare phone as a dedicated, always-on local AI box instead of leaving it unused in a drawer.
  • Self-contained inference setup: keep model execution on the device for users who want to avoid a hosted LLM endpoint or a separate server machine. Inference needs no external service, but the phone still needs a network connection to reach Discord.
  • Lightweight edge deployment experiment: test how a small on-device model behaves as a chat bot when paired with Android, Discord, and LiteRT-LM.
  • Local model benchmarking and iteration: explore how speculative decoding and different .litertlm models affect reply latency on mobile hardware.

FAQ

Does Gello run the model in the cloud?
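No. The repository describes Gello as an on-device bot: the model runs locally on the Android phone through LiteRT-LM, so prompts are never sent to a cloud LLM. The messages and the bot's replies still travel through Discord, as with any Discord bot.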

What model formats does it support?
The repository lists litert-community/gemma-4-E2B-it-litert-lm as the tested model and says that any .litertlm model from the litert-community Hugging Face organization should work. It explicitly states that .task models are not supported.
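To make the stated support rules concrete, here is a small hypothetical Python helper that classifies a model file the way the repository describes: the one tested repo, other `.litertlm` files from `litert-community` that "should work", and `.task` files that are explicitly unsupported. The function name and the `"unknown"` category for everything else are assumptions; the source says nothing about other formats or organizations.

```python
TESTED_REPO = "litert-community/gemma-4-E2B-it-litert-lm"
SUPPORTED_ORG = "litert-community"


def classify_model(repo_id: str, filename: str) -> str:
    """Hypothetical check: 'tested', 'expected', 'unsupported', or 'unknown'."""
    name = filename.lower()
    if name.endswith(".task"):
        return "unsupported"  # explicitly not supported by the repository
    if not name.endswith(".litertlm"):
        return "unknown"  # other formats are not mentioned at all
    if repo_id == TESTED_REPO:
        return "tested"
    if repo_id.split("/", 1)[0] == SUPPORTED_ORG:
        return "expected"  # "should work", but not tested
    return "unknown"  # .litertlm from another organization: not covered by the source
```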

Does it require a laptop or separate server?
No. The project is presented as a single Android APK that talks to Discord directly, without Termux, a laptop, or a separate model server.

How does it handle conversation context?
Gello maintains a per-channel rolling buffer of recent messages, with a default size of 20 messages, and uses that context when generating a response.
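A per-channel rolling buffer maps naturally onto a bounded double-ended queue. The Python sketch below, with a hypothetical `ChannelContext` class, keeps at most 20 messages per channel by default, matching the stated default; it illustrates the data structure and is not Gello's implementation.

```python
from collections import defaultdict, deque

DEFAULT_BUFFER_SIZE = 20  # default per-channel size stated by the repository


class ChannelContext:
    """Hypothetical per-channel rolling buffer of recent messages."""

    def __init__(self, size: int = DEFAULT_BUFFER_SIZE):
        # Each channel gets its own deque; maxlen makes it drop the oldest entry once full.
        self._buffers = defaultdict(lambda: deque(maxlen=size))

    def add(self, channel_id: str, author: str, content: str) -> None:
        self._buffers[channel_id].append((author, content))

    def recent(self, channel_id: str) -> list:
        # Oldest first, newest last; an unseen channel yields an empty list.
        return list(self._buffers[channel_id])
```

Because each channel has its own queue, a busy channel cannot push another channel's history out of context, and appending stays constant-time regardless of buffer size.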

Why is speculative decoding mentioned?
The repository explains that Gemma 4’s MTP heads and LiteRT-LM’s speculative decoding path help make on-device reply generation faster by producing more than one token per decoding step when supported.
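The "more than one token per step" idea can be illustrated with a generic greedy speculative-decoding loop. This Python sketch uses toy functions in place of real models: `draft` plays the role of the fast drafter (in Gello's case, reportedly Gemma 4's MTP heads) and `target` the full model. All names are hypothetical and this is not LiteRT-LM's API. Real runtimes verify every drafted position in one batched forward pass; the sketch counts each verification round as one step to show the effect.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # toy greedy model: token sequence -> next token


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 3) -> tuple:
    """Greedy draft-and-verify decoding; returns (new_tokens, target_steps)."""
    seq = list(prompt)
    steps = 0
    while len(seq) - len(prompt) < max_new:
        # 1. The cheap drafter proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks every drafted position (one batched pass on real hardware).
        steps += 1
        accepted = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # the target's correction still yields one token
                break
            accepted.append(tok)
        else:
            # Every draft matched, so the same pass also yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + max_new], steps
```

With greedy acceptance the output is identical to decoding with the target model alone; only the number of target steps changes. With a perfect drafter and `k=3`, eight tokens take two steps instead of eight, while a drafter that is always wrong falls back to one token per step. Sampling-based decoding needs a probabilistic acceptance rule instead of the exact-match check used here.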

Alternatives

  • OpenClaw: the closest comparable project mentioned in the repository. It also exposes a local AI through chat apps, but it is framed as a desktop product for macOS, Windows, and Linux rather than a phone-first Android app.
  • Hosted chatbot integrations: traditional Discord bots powered by cloud LLM APIs. These are easier to deploy if you want managed inference, but they do not keep generation on the phone or avoid external API keys.
  • Self-hosted local model servers: setups that run a model on a separate machine and connect that model to chat apps. They offer more general-purpose infrastructure than Gello, but they require more components than a single Android app.
  • Other on-device Android AI apps: mobile apps that run models locally without Discord integration. These may run the same model families, but they are not necessarily designed to participate in a group chat as a bot.