100% On-Device • Zero Cloud Dependencies

Privacy-First AI Assistant for Android

Run powerful LLMs and Stable Diffusion completely offline. On-device intelligence with enterprise-grade encryption, RAG document understanding, and sophisticated memory management.

Android 8.0+ Apache 2.0 Version 1.1.2 Discord
500+ Downloads • 4.8★ Rating • Zero Telemetry

Complete On-Device Intelligence

Enterprise-grade AI capabilities that run entirely on your Android device. No cloud dependencies, no subscriptions, complete digital sovereignty.

Text Generation

Run any GGUF model locally: Llama, Mistral, Gemma, Phi. 8-15 tokens/sec on flagship devices, with streaming output.

  • 500MB to 20GB+ models supported
  • All GGUF quantizations (Q2 to F16)
  • Function calling with grammar enforcement
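
A minimal sketch of how streaming output can be exposed to the UI, assuming a hypothetical JNI bridge into llama.cpp (the function names here are illustrative, not the app's actual binding):

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

// Hypothetical JNI bridge into llama.cpp; the app's real binding may differ.
object LlamaBridge {
    external fun loadModel(path: String, contextSize: Int): Long  // returns native handle
    external fun startGeneration(handle: Long, prompt: String)
    external fun nextToken(handle: Long): String?                 // null when generation ends
}

// Expose token-by-token generation as a cold Flow so the UI can stream output.
fun generate(handle: Long, prompt: String): Flow<String> = flow {
    LlamaBridge.startGeneration(handle, prompt)
    while (true) {
        val token = LlamaBridge.nextToken(handle) ?: break
        emit(token) // each emission reaches the UI as soon as the token is decoded
    }
}
```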

Image Generation

Stable Diffusion 1.5 with censored and uncensored variants. 30-90 second generation with inpainting support.

  • Text-to-image with custom parameters
  • Inpainting with mask support
  • NPU/CPU optimized backends

RAG System

Ingest documents (PDF, Word, Excel, EPUB) into encrypted knowledge bases with semantic search.

  • Semantic search with embeddings
  • Multi-RAG support & graph traversal
  • Encrypted RAG sharing (.neuron)

Memory Vault

Hardware-backed AES-256-GCM encryption with crash-recoverable WAL and LZ4 compression.

  • Content deduplication with SHA-256
  • Three-tier caching system
  • ACID-compliant transactions
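
A minimal sketch of SHA-256 content deduplication; the in-memory `seenHashes` set and the `write` callback stand in for the vault's persistent index and encrypted writer:

```kotlin
import java.security.MessageDigest

// Stand-in for the vault's persistent digest index (illustrative).
val seenHashes = mutableSetOf<String>()

fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }

// Persist a blob only if its SHA-256 digest has not been seen before.
fun storeIfNew(content: ByteArray, write: (ByteArray) -> Unit) {
    val digest = sha256Hex(content)
    if (seenHashes.add(digest)) write(content) // first occurrence: store it
    // duplicates are skipped; only the existing digest reference is reused
}
```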

Document Processing

Parse PDF, Word, Excel, EPUB with automatic chunking and metadata extraction.

  • Multi-sheet Excel support
  • Table structure preservation
  • MIME type auto-detection

Model Store

Browse and download models from HuggingFace directly in-app with concurrent downloads.

  • In-app HuggingFace integration
  • Resume interrupted downloads
  • Model categories & search
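
Resumable downloads typically work via HTTP Range requests; here is a minimal sketch with OkHttp (the app's actual download stack is not documented here, so treat this as an assumption):

```kotlin
import okhttp3.OkHttpClient
import okhttp3.Request
import java.io.File
import java.io.FileOutputStream

// Resume an interrupted download by asking the server for the missing tail.
fun resumeDownload(url: String, target: File) {
    val offset = if (target.exists()) target.length() else 0L
    val request = Request.Builder()
        .url(url)
        .header("Range", "bytes=$offset-") // request only the remaining bytes
        .build()
    OkHttpClient().newCall(request).execute().use { response ->
        check(response.code == 206 || offset == 0L) { "server did not honor Range" }
        FileOutputStream(target, /* append = */ offset > 0L).use { out ->
            response.body!!.byteStream().copyTo(out)
        }
    }
}
```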
100% Offline RAG System

Document Intelligence Without Internet. Ever.

Transform your documents into queryable knowledge bases with on-device semantic understanding. Zero cloud dependency, zero API calls, zero internet required. Perfect for medical professionals, lawyers, researchers, and anyone handling sensitive information.

On-Device Processing

Parse PDF, Word, Excel, EPUB files locally. Documents never leave your device—process everything offline with Apache POI and PDFBox engines.
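
As an illustration of the parsing layer, extracting text from a PDF with PDFBox looks roughly like this (on Android this is usually the PdfBox-Android port, which exposes the same classes):

```kotlin
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper
import java.io.File

// Extract plain text from a PDF entirely on-device with PDFBox.
fun extractPdfText(file: File): String =
    PDDocument.load(file).use { doc ->
        PDFTextStripper().getText(doc)
    }
```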

Local Embedding Engine

all-MiniLM-L6-v2 model runs entirely on-device. Generate 768-dimensional embeddings with cosine similarity search—no external APIs.

Hardware-Backed Encryption

AES-256-GCM with Android KeyStore. Admin passwords, read-only users, and encrypted .neuron packets—all secured locally on your device.
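
A minimal sketch of hardware-backed key generation and AES-256-GCM encryption with the Android KeyStore; the key alias is illustrative:

```kotlin
import android.security.keystore.KeyGenParameterSpec
import android.security.keystore.KeyProperties
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey

// Generate an AES-256 key that never leaves the hardware-backed KeyStore.
fun createVaultKey(alias: String = "memory_vault_key"): SecretKey {
    val keyGen = KeyGenerator.getInstance(KeyProperties.KEY_ALGORITHM_AES, "AndroidKeyStore")
    keyGen.init(
        KeyGenParameterSpec.Builder(
            alias,
            KeyProperties.PURPOSE_ENCRYPT or KeyProperties.PURPOSE_DECRYPT
        )
            .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
            .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
            .setKeySize(256)
            .build()
    )
    return keyGen.generateKey()
}

// Encrypt with AES-256-GCM; the random IV must be stored with the ciphertext.
fun encrypt(key: SecretKey, plaintext: ByteArray): Pair<ByteArray, ByteArray> {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key)
    return cipher.iv to cipher.doFinal(plaintext)
}
```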

Multi-RAG Queries

Load multiple knowledge bases simultaneously. Top-K retrieval with automatic context injection—all processed locally in under 100ms.
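
A minimal sketch of cosine-similarity Top-K retrieval; the `Chunk` type and its field names are illustrative:

```kotlin
import kotlin.math.sqrt

// One embedded document chunk (illustrative shape).
data class Chunk(val text: String, val embedding: FloatArray)

fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (sqrt(na) * sqrt(nb))
}

// Rank every chunk against the query vector and keep the K best matches.
fun topK(query: FloatArray, chunks: List<Chunk>, k: Int = 5): List<Chunk> =
    chunks.sortedByDescending { cosine(query, it.embedding) }.take(k)
```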

No Internet Required

Your documents, embeddings, and queries never leave your device. Complete RAG pipeline runs 100% offline.

RAG System Architecture

On-Device Processing Pipeline

1. User Input: PDF • Word • Excel • EPUB • Text
2. Document Parser (offline): Apache POI • PDFBox • EpubLib
3. Smart Chunking: semantic segmentation with overlap
4. Embedding Engine (on-device): all-MiniLM-L6-v2 • 768D vectors
5. Memory Vault (encrypted): AES-256-GCM • LZ4 • deduplication
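
To make the Smart Chunking step concrete, here is a minimal sliding-window sketch with overlap; the app's semantic segmentation is more sophisticated than this fixed-size version:

```kotlin
// Fixed-size chunking with overlapping windows, sized in characters.
fun chunk(text: String, size: Int = 1000, overlap: Int = 200): List<String> {
    require(overlap < size)
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + size, text.length)
        chunks += text.substring(start, end)
        if (end == text.length) break
        start = end - overlap // step back so adjacent chunks share context
    }
    return chunks
}
```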
Query Flow

1. User Query: natural language question
2. Semantic Search: cosine similarity • Top-K retrieval • <100ms
3. Context Injection: augmented prompt → LLM

768D embeddings • AES-256 encryption • 100% offline
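
A minimal sketch of the final context-injection step; the prompt template wording is an assumption, not the app's actual format:

```kotlin
// Prepend the retrieved chunks to the user's question as an augmented prompt.
fun augmentPrompt(question: String, retrieved: List<String>): String = buildString {
    appendLine("Answer using only the context below. Cite the source when possible.")
    retrieved.forEachIndexed { i, chunk ->
        appendLine("[Context ${i + 1}] $chunk")
    }
    appendLine()
    append("Question: $question")
}
```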

Supported Document Formats (All Processed Locally)

  • PDF: PDFBox
  • Word: Apache POI
  • Excel: Apache POI
  • EPUB: EpubLib
  • Plain Text: native

Zero Data Collection. Complete Digital Sovereignty.

Your data never leaves your device. No telemetry, no analytics, no cloud dependencies. Open source for full transparency.

Offline-First

Works completely offline after model downloads. No internet required for AI inference.

AES-256-GCM

Military-grade encryption with hardware-backed key storage in Android KeyStore.

Zero Telemetry

No analytics, crash reporting, or tracking. What happens on your device stays on your device.

Open Source

Apache 2.0 license. Audit the code yourself or review community security assessments.

Trusted by Privacy-Critical Professionals

  • 🏥 Healthcare: HIPAA-compliant patient data handling
  • ⚖️ Legal: confidential document analysis
  • 🔬 Research: sensitive data processing

Technical Architecture

Enterprise-grade AI processing, entirely on-device

System Architecture

UI Layer (Jetpack Compose)
  • ChatScreen
  • ImageScreen
  • RAG Manager
  • Model Browser

AI Engines (Native Performance)
  • llama.cpp (GGUF)
  • LocalDream (SD 1.5)
  • Embedding Engine
  • RAG Query Engine

Memory Vault (AES-256-GCM)
  • Encrypted Storage
  • LZ4 Compression
  • WAL Recovery
  • Deduplication

Data flows bidirectionally between all three layers.

Text Generation Pipeline

1. User Input: chat message received via the Jetpack Compose UI
2. RAG Query (optional): semantic search across enabled RAGs • cosine similarity • Top-K retrieval (<100ms)
3. GGUF Inference (llama.cpp): load model • context injection • token generation • 8-15 tokens/sec on flagship devices
4. Real-Time Streaming: token-by-token UI updates • Kotlin Flow • Compose recomposition
5. Memory Vault Storage: AES-256-GCM encryption • LZ4 compression • full-text indexing • WAL crash recovery

Performance: first token 1-3s • generation 8-15 tokens/sec • RAG query <100ms
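
A minimal sketch of the streaming step, assuming the token Flow from the earlier inference sketch; state updates drive Compose recomposition when observed via collectAsState():

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

// Accumulate streamed tokens into observable state for the chat UI.
// `generate` stands in for the inference layer's token Flow (hypothetical).
class ChatViewModel(private val generate: (String) -> Flow<String>) : ViewModel() {
    private val _reply = MutableStateFlow("")
    val reply: StateFlow<String> = _reply // collected in Compose via collectAsState()

    fun send(prompt: String) {
        _reply.value = ""
        viewModelScope.launch {
            generate(prompt).collect { token ->
                _reply.value += token // each token update recomposes observers
            }
        }
    }
}
```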

Image Generation Pipeline

1. Text Prompt Input: user prompt + optional negative prompt + generation parameters
2. Model Loading: Stable Diffusion 1.5 • LocalDream engine • NPU/CPU backend selection
3. DDPM Sampling: 10-50 iterative refinement steps • real-time preview • CFG guidance
4. Image Output: 512×512 to 1024×1024 • optional NSFW filter • Memory Vault storage

Generation Parameters
  • Resolution: 512×512 to 1024×1024
  • Inference steps: 10-50
  • CFG scale: 1.0-20.0
  • Seed: reproducible

Advanced Features
  • Inpainting: mask-based regeneration
  • Pony model support (anime/cartoon)
  • Safety checker (optional NSFW filter)
  • Intermediate result streaming

Generation time: 30-50s (flagship) • 60-90s (mid-range)
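
For illustration only, the documented ranges map naturally onto a request type like this; every name and default here is an assumption rather than the app's real API:

```kotlin
// Illustrative parameter bundle mirroring the documented ranges above.
data class GenerationRequest(
    val prompt: String,
    val negativePrompt: String = "",
    val width: Int = 512,       // 512..1024
    val height: Int = 512,
    val steps: Int = 25,        // 10..50 DDPM refinement steps
    val cfgScale: Float = 7.5f, // 1.0..20.0 classifier-free guidance
    val seed: Long = -1L        // assumed convention: -1 = random, else reproducible
) {
    init {
        require(width in 512..1024 && height in 512..1024)
        require(steps in 10..50)
        require(cfgScale in 1.0f..20.0f)
    }
}
```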

RAG System Workflow

RAG Creation Pipeline

1. Document Ingestion: PDF • Word • Excel • EPUB • plain text
2. Intelligent Chunking: preserve structure • extract metadata • auto-segmentation
3. Embedding Generation: all-MiniLM-L6-v2 • 768 dimensions • batch processing
4. Encrypted Storage: AES-256-GCM • LZ4 compression • deduplication

RAG Query & Augmentation

1. Query Embedding: convert the user query to a 768D vector
2. Semantic Search: cosine similarity • multi-RAG aggregation • <100ms
3. Top-K Retrieval: return the 3-5 most relevant chunks, ranked by score
4. Context Injection: augment the LLM prompt • source attribution • enhanced generation

Performance: <100ms per RAG (1,000 chunks)
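
A minimal sketch of multi-RAG aggregation, reusing the `Chunk` type and `cosine` helper from the retrieval sketch earlier; the `Hit` type and map layout are illustrative:

```kotlin
// One scored search result, tagged with its source knowledge base.
data class Hit(val text: String, val score: Float, val source: String)

// Query every enabled RAG, merge the scored hits, keep the global Top-K.
fun queryAll(
    query: FloatArray,
    rags: Map<String, List<Chunk>>, // RAG name -> its embedded chunks
    k: Int = 5
): List<Hit> = rags.flatMap { (name, chunks) ->
    chunks.map { Hit(it.text, cosine(query, it.embedding), name) }
}.sortedByDescending { it.score }.take(k)
```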

Model Loading & Inference Pipeline

Model Source
  • HuggingFace browser
  • Manual file picker
  • Pre-installed models

Device Detection
  • RAM tier classification
  • CPU core detection
  • Optimal parameter selection

Model Loading
  • Memory-mapped I/O
  • JNI → llama.cpp
  • KV cache initialization

Inference
  • Tokenization (BPE)
  • Token generation
  • Real-time streaming

Performance Metrics
  • Load time (8B model): 5-15s
  • First token: 1-3s
  • Generation: 8-15 tok/s
  • Context processing: 500+ tok/s

Inference Parameters
  • Temperature (0.0-2.0)
  • Top-k & top-p sampling
  • Min-p sampling
  • Repeat penalty
  • System prompt
  • Reproducible seed

Device Tiers
  • LOW: 6GB RAM • 1-3B models at Q4
  • MID: 8GB RAM • 7-8B models at Q4
  • HIGH: 12GB+ RAM • 8B models at Q6
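
A minimal sketch of RAM-tier classification with Android's ActivityManager; the thresholds mirror the tiers above, while the mapping itself is an assumption about how the app classifies devices:

```kotlin
import android.app.ActivityManager
import android.content.Context

enum class DeviceTier { LOW, MID, HIGH }

// Classify the device by total RAM to pick a model size and quantization.
fun detectTier(context: Context): DeviceTier {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val info = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
    val totalGb = info.totalMem / (1024.0 * 1024.0 * 1024.0)
    return when {
        totalGb >= 12 -> DeviceTier.HIGH // 8B models at Q6
        totalGb >= 8  -> DeviceTier.MID  // 7-8B models at Q4
        else          -> DeviceTier.LOW  // 1-3B models at Q4
    }
}
```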
  • 8-15 tokens/second on flagship devices (8B Q4)
  • <100ms RAG query time (1,000 chunks per RAG)
  • 768D embeddings (all-MiniLM-L6-v2-Q5_K_M)
  • AES-256-GCM encryption with WAL recovery

Ready to Own Your AI?

Join users running AI completely on their terms

Minimum Requirements

  • Android 8.0+ (API 26)
  • 6GB RAM
  • 4GB free storage
  • ARM64 or x86_64 processor

Recommended

  • Android 10+
  • 12GB RAM
  • 10GB free storage
  • Snapdragon 8 Gen 1 or newer

Frequently Asked Questions

Does this really work completely offline?

Yes. After you download your models (including the embedding model), all AI processing (text generation, image generation, RAG queries, document parsing) happens entirely on your device with zero internet dependency.

Is my data actually private?

Yes. Nothing leaves your device; all processing is local. The code is open source, so you can verify this yourself or review community audits. We collect zero telemetry, analytics, or tracking data.

How much storage do I need?

Minimum 4GB for a single 7B model. Recommended 10GB for multiple models, SD 1.5, and RAGs. Large setups with many models can use 20GB+.

Can I use custom models?

Yes. Any GGUF text model works. For image generation, Stable Diffusion 1.5 checkpoints are supported (.safetensors or .ckpt).

What's the performance like?

Text: 8-15 tokens/sec on flagship devices (12GB RAM) with 8B Q4_K_M models. Image: 30-50s on a Snapdragon 8 Gen 3 flagship, 60-90s on mid-range devices. Model load time: 5-15 seconds.