Fluent AI: Offline & Cloud LLM

Developer: ReadHeights Technologies Private

Category: Productivity

Add App to Comparison

7.3K installs

Ratings not yet available

1.6K monthly active users

$<10K monthly revenue est.

IAP 72% · Ad 28%

Install Trends

Weekly +772

Trending

Monthly +2.9K

Trending

Fluent AI: Offline & Cloud LLM Summary

Fluent AI: Offline & Cloud LLM is a ad-supported, with in-app purchases Android app in Productivity by ReadHeights Technologies Private Limited. Released in Oct 2025 (7 months ago). It has about 7.3K+ installs Based on AppGoblin estimates, it reaches roughly 1.6K monthly active users and generates around $<10K monthly revenue (72% IAP / 28% ads). Store metadata: updated May 19, 2026.

Recent activity: 772 installs this week (2.9K over 4 weeks) showing exceptional growth View trends →

Store info: Last updated on Google Play on May 19, 2026 .

SDKs, Trackers & Permissions

App not yet scanned for SDKs.

App Details

Store ID: com.readheights.fluentai

First Released: 2025-10-29

Store Last Updated: 2026-05-19

In-App Purchases: Yes

Ads: Yes

Website: readheights.com

App Store

Crawl Status: Success

AppGoblin First Crawled: 2025-11-15

AppGoblin Last Crawled: 2026-05-27

Ads & App-Ads.txt

Ads.txt Last Crawled: 2025-10-20

Ads.txt Crawl Status: Success

AppGoblin SDK Scans

App not yet analyzed for SDKs.

The download and scan process is automated but can require manual troubleshooting. Feel free to reach out if your request does not complete in 24hrs or if you have any questions.

0★

Ratings: 0

5★

4★

3★

2★

1★

Screenshots

App Description

AI chat assistant with offline models - private, customizable, multimodal

🤖 Fluent AI — Private Offline LLM + Claude, GPT-4 & Gemini

Run AI entirely on your device — no cloud, no account, no data sent anywhere. Then switch to Claude, GPT-4 or Gemini when you need more power. One app. Every AI. Always private.

✨ WHAT'S NEW IN v1.3

🏥 MEDICAL AI (MedGemma)
• Google's MedGemma 4B — clinical Q&A and biomedical text, 100% on-device
• Requires accepting Google's Health AI Developer Foundation Terms
• Not a substitute for professional medical advice

🤖 AGENTIC MODE
• On-device AI agent with 12 built-in skills
• Runs tasks autonomously: calendar events, web research, document digest, trip planning
• Agent Task Inspector — see every reasoning step in real time
• 3 free agent runs/day — no subscription needed to start
• Scheduled tasks available with Premium

⚡ LITERT MTP — UP TO 2× FASTER
• Gemma 4n E2B/E4B with Multi-Token Prediction on Android GPU
• Speculative decoding — more tokens per step, same quality
• Tok/s display measures decode-phase speed only for accurate results

👁️ ON-DEVICE VISION (Android)
• Attach photos using Gemma 4n — processed entirely on-device
• No image uploaded to any server, ever

🔒 PRIVACY FIRST
• Conversations stay on your device
• Optional local models = zero cloud data
• API keys encrypted with AES — never stored in plain text
• No mandatory account required

🧠 LOCAL AI MODELS
• GGUF / llama.cpp: Gemma 3/4, Qwen 3.5, Phi-4, Llama, DeepSeek R1, Nemotron, MedGemma
• LiteRT (Android GPU/NPU): Gemma 4n E2B/E4B — vision + MTP speculative decoding
• Apple MLX: Native Metal on Apple Silicon and iOS 18+ (A17 Pro+)
• Q5_K GPU acceleration on Qualcomm Adreno (alongside Q4_0)
• Device-aware model recommendations based on your RAM and chipset
• Browse, download, and manage models in-app — no sideloading needed
• Import custom GGUF from HuggingFace URL or device storage

☁️ CLOUD AI
• Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google)
• OpenRouter — 200+ models via a single API key
• Streaming, vision, and tool calling across all providers

🌐 ONLINE SERVERS
• Ollama Cloud and self-hosted Ollama
• LM Studio, vLLM, LocalAI, and any OpenAI-compatible /v1 API
• Multiple server profiles with per-profile encrypted auth headers

🎤 VOICE MODE
• 5 conversation modes: Normal, Interview, Learning, Storytelling, Translation
• Animated waveform, voice commands (speed, repeat, stop)
• Quick-capture mic button directly in the chat input bar

📚 KNOWLEDGE BASES (RAG)
• Import PDFs, TXT, and Markdown — AI references your docs when answering
• Semantic search for relevant context, topic and project organisation

🔧 POWER FEATURES
• Tool calling: Calculator, DateTime, Weather, Web Search, mem0 Memory
• MCP servers: GitHub, Slack, Notion, Supabase, and 20+ presets
• Code execution: Python, Bash, Node.js from code blocks (desktop + mobile JS)
• Model benchmarking: tok/s, TTFT, MMLU-50 quality score, shareable PNG cards
• Slash commands: /agent, /clear, /export, /voice, /template and more
• Per-chat thinking toggle for Qwen3, DeepSeek R1, Nemotron reasoning models
• URL context injection — paste a link, AI reads the page for context
• Polish Before Send — AI rewrites your draft before you hit send
• Continue button — resumes responses cut off at the token limit

📁 CHAT ORGANISATION
• Folders, tags, and cross-chat full-text search across every message
• HuggingFace model browser with bookmarks and memory fitness badges
• Conversation branching and message reactions

🌟 PREMIUM (OPTIONAL)
• Ad-free experience
• Scheduled agent tasks (recurring or one-time)
• Priority feature access and advanced analytics

📱 PERFECT FOR
✓ Privacy-focused users — local models, zero cloud data
✓ Android power users — LiteRT GPU/NPU with MTP acceleration
✓ Developers — benchmark GGUF, LiteRT, and MLX side-by-side
✓ Healthcare researchers — MedGemma on-device, no upload needed
✓ Students — knowledge bases for study documents and materials
✓ Professionals — agentic tasks, document Q&A, and tool calling