Fluent AI: Offline & Cloud LLM

7.3K installs
Ratings not yet available
1.6K monthly active users
$<10K monthly revenue est.
IAP 72% · Ad 28%
Install Trends
Weekly +772
Trending
Monthly +2.9K
Trending

Fluent AI: Offline & Cloud LLM Summary

Fluent AI: Offline & Cloud LLM is a ad-supported, with in-app purchases Android app in Productivity by ReadHeights Technologies Private Limited. Released in Oct 2025 (7 months ago). It has about 7.3K+ installs Based on AppGoblin estimates, it reaches roughly 1.6K monthly active users and generates around $<10K monthly revenue (72% IAP / 28% ads). Store metadata: updated May 19, 2026.

Recent activity: 772 installs this week (2.9K over 4 weeks) showing exceptional growth View trends →

Store info: Last updated on Google Play on May 19, 2026 .


0★

Ratings: 0

5★
4★
3★
2★
1★

Screenshots

App screenshot
App screenshot
App screenshot
App screenshot

App Description

AI chat assistant with offline models - private, customizable, multimodal

🤖 Fluent AI — Private Offline LLM + Claude, GPT-4 & Gemini

Run AI entirely on your device — no cloud, no account, no data sent anywhere. Then switch to Claude, GPT-4 or Gemini when you need more power. One app. Every AI. Always private.

✨ WHAT'S NEW IN v1.3

🏥 MEDICAL AI (MedGemma)
• Google's MedGemma 4B — clinical Q&A and biomedical text, 100% on-device
• Requires accepting Google's Health AI Developer Foundation Terms
• Not a substitute for professional medical advice

🤖 AGENTIC MODE
• On-device AI agent with 12 built-in skills
• Runs tasks autonomously: calendar events, web research, document digest, trip planning
• Agent Task Inspector — see every reasoning step in real time
• 3 free agent runs/day — no subscription needed to start
• Scheduled tasks available with Premium

⚡ LITERT MTP — UP TO 2× FASTER
• Gemma 4n E2B/E4B with Multi-Token Prediction on Android GPU
• Speculative decoding — more tokens per step, same quality
• Tok/s display measures decode-phase speed only for accurate results

👁️ ON-DEVICE VISION (Android)
• Attach photos using Gemma 4n — processed entirely on-device
• No image uploaded to any server, ever

🔒 PRIVACY FIRST
• Conversations stay on your device
• Optional local models = zero cloud data
• API keys encrypted with AES — never stored in plain text
• No mandatory account required

🧠 LOCAL AI MODELS
• GGUF / llama.cpp: Gemma 3/4, Qwen 3.5, Phi-4, Llama, DeepSeek R1, Nemotron, MedGemma
• LiteRT (Android GPU/NPU): Gemma 4n E2B/E4B — vision + MTP speculative decoding
• Apple MLX: Native Metal on Apple Silicon and iOS 18+ (A17 Pro+)
• Q5_K GPU acceleration on Qualcomm Adreno (alongside Q4_0)
• Device-aware model recommendations based on your RAM and chipset
• Browse, download, and manage models in-app — no sideloading needed
• Import custom GGUF from HuggingFace URL or device storage

☁️ CLOUD AI
• Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google)
• OpenRouter — 200+ models via a single API key
• Streaming, vision, and tool calling across all providers

🌐 ONLINE SERVERS
• Ollama Cloud and self-hosted Ollama
• LM Studio, vLLM, LocalAI, and any OpenAI-compatible /v1 API
• Multiple server profiles with per-profile encrypted auth headers

🎤 VOICE MODE
• 5 conversation modes: Normal, Interview, Learning, Storytelling, Translation
• Animated waveform, voice commands (speed, repeat, stop)
• Quick-capture mic button directly in the chat input bar

📚 KNOWLEDGE BASES (RAG)
• Import PDFs, TXT, and Markdown — AI references your docs when answering
• Semantic search for relevant context, topic and project organisation

🔧 POWER FEATURES
• Tool calling: Calculator, DateTime, Weather, Web Search, mem0 Memory
• MCP servers: GitHub, Slack, Notion, Supabase, and 20+ presets
• Code execution: Python, Bash, Node.js from code blocks (desktop + mobile JS)
• Model benchmarking: tok/s, TTFT, MMLU-50 quality score, shareable PNG cards
• Slash commands: /agent, /clear, /export, /voice, /template and more
• Per-chat thinking toggle for Qwen3, DeepSeek R1, Nemotron reasoning models
• URL context injection — paste a link, AI reads the page for context
• Polish Before Send — AI rewrites your draft before you hit send
• Continue button — resumes responses cut off at the token limit

📁 CHAT ORGANISATION
• Folders, tags, and cross-chat full-text search across every message
• HuggingFace model browser with bookmarks and memory fitness badges
• Conversation branching and message reactions

🌟 PREMIUM (OPTIONAL)
• Ad-free experience
• Scheduled agent tasks (recurring or one-time)
• Priority feature access and advanced analytics

📱 PERFECT FOR
✓ Privacy-focused users — local models, zero cloud data
✓ Android power users — LiteRT GPU/NPU with MTP acceleration
✓ Developers — benchmark GGUF, LiteRT, and MLX side-by-side
✓ Healthcare researchers — MedGemma on-device, no upload needed
✓ Students — knowledge bases for study documents and materials
✓ Professionals — agentic tasks, document Q&A, and tool calling