Ankahi (अनकही) — AAC Voice for Indian Children with Cerebral Palsy

"The unspoken" — giving a voice to 2.5 million Indian children with CP

Hackathon: Gemma 4 Good (Kaggle × Google DeepMind)
Tracks: Digital Equity (primary) · Health & Sciences · Google AI Edge · Unsloth Special Mention
License: Apache 2.0

What is Ankahi?

Ankahi is an offline, per-child-personalised AAC (Augmentative & Alternative Communication) app for Indian children with cerebral palsy. It runs Gemma 4 E4B fully on-device on a ~₹10,000 Android tablet.

A child taps pictograms, makes sounds, or points the camera at objects. Ankahi predicts the full sentence they intended — then speaks it in the parent's own voice — in Hindi, Punjabi, Bengali, Tamil, Telugu, Marathi, or English, with natural code-switching.

The critical differentiator: per-child LoRA adapters (about 30 MB each) that learn each child's vocabulary, syntax, and language preferences over time. Adapters are trained on an H100 GPU and then deployed to the tablet.
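To make the adapter idea concrete, here is a minimal pure-Python sketch of the standard LoRA update, in which a frozen weight W is adjusted by a low-rank product: W_eff = W + (alpha / r) · B · A. The matrix sizes, the `alpha` value, and the helper names are illustrative assumptions, not Ankahi's actual configuration or code. The point is that only the small A and B matrices need to be stored per child, which is why an adapter can be far smaller than the base model.

```python
# Toy illustration of a LoRA update (tiny sizes, not Gemma's real shapes).
# Only A and B are trained per child; the base weight W stays frozen.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), with r taken from A's row count."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters for one adapted layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Hypothetical example: a 2x3 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # r x d_in  = 1 x 3
B = [[0.5], [0.0]]      # d_out x r = 2 x 1

W_eff = lora_effective_weight(W, A, B, alpha=2.0)
print(W_eff)
```

The same formula is what a merge step applies when an adapter is folded back into the base weights: once W_eff is computed, A and B are no longer needed at inference time.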

Why this matters

~2.5 million Indian children have cerebral palsy
Commercial AAC devices cost ₹20,000–₹2,00,000 and speak no Indian language naturally
Most of these children go through life without ever communicating a full sentence
At ~₹10,000 for the tablet, Ankahi costs half as much as the cheapest commercial device and one-twentieth as much as the most expensive, and it runs 100% offline

Quick start (H100 training)

git clone https://github.com/your-org/ankahi
cd ankahi
bash scripts/00_env_setup.sh

# Sanity check first — never skip this
python scripts/07_overfit_sanity_check.py

# Then data
python scripts/01_download_arasaac.py
python scripts/02_download_mulberry.py
python scripts/05_generate_personas.py
python scripts/04_generate_synth_dialogues.py
python scripts/06_build_eval_sets.py

# Training
python src/ankahi/training/stage1_base.py --config configs/train_stage1_base.yaml
python src/ankahi/training/stage2_persona.py --persona ananya --config configs/train_stage2_persona_ananya.yaml
# ... repeat for all 5 personas
python src/ankahi/training/stage3_audio.py --config configs/train_stage3_audio.yaml
python src/ankahi/training/stage4_safety.py --config configs/train_stage4_safety.yaml

# Deploy
python src/ankahi/deploy/merge_and_quantize.py
python src/ankahi/deploy/convert_to_litertlm.py

System architecture

Child input (pictogram tap / camera / mic)
        ↓
Gemma 4 E4B INT8 + per-child LoRA adapter (30MB)
        ↓
Full sentence prediction (multilingual, code-switching aware)
        ↓
Parent-voice TTS (AI4Bharat / svara-TTS, zero-shot cloned)
        ↓
Speaker output + large-text screen display

Everything runs on the tablet. Zero bytes to the cloud.
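
The flow above can be read as a two-stage pipeline: predict a sentence from the child's input, then synthesize it, and hand both text and audio to the output layer. The sketch below shows only that shape. All class and function names are hypothetical, and the stub stages stand in for the on-device model and TTS engine; none of this is taken from the Ankahi codebase.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChildInput:
    """One interaction: tapped pictogram labels plus the target language code."""
    pictograms: List[str]
    language: str


@dataclass
class Utterance:
    """What the tablet finally emits: on-screen text and synthesized audio."""
    text: str
    audio: bytes


@dataclass
class AacPipeline:
    # Both callables are placeholders for the on-device model and TTS engine.
    predict_sentence: Callable[[ChildInput], str]
    synthesize: Callable[[str, str], bytes]

    def run(self, child_input: ChildInput) -> Utterance:
        sentence = self.predict_sentence(child_input)
        audio = self.synthesize(sentence, child_input.language)
        return Utterance(text=sentence, audio=audio)


# Stub stages so the sketch runs without any model: the "predictor" just joins
# the pictogram labels, and the "TTS" returns tagged bytes instead of audio.
def stub_predict(child_input: ChildInput) -> str:
    return " ".join(child_input.pictograms)


def stub_tts(text: str, language: str) -> bytes:
    return f"[{language}] {text}".encode("utf-8")


pipeline = AacPipeline(predict_sentence=stub_predict, synthesize=stub_tts)
result = pipeline.run(ChildInput(pictograms=["I", "want", "water"], language="en"))
print(result.text)
```

Keeping the two stages behind plain callables means either one can be swapped or mocked in tests without touching the other.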

5 Personas trained

Name	Age	Languages	Disability profile	City
Ananya	6	Tamil + English	Spastic quadriplegia	Chennai
Arjun	9	Punjabi + Hindi + English	Dyskinetic CP + mild ID	Ludhiana
Priya	4	Bengali + English	CP + CVI	Kolkata
Rohan	11	Hindi + English	Athetoid CP	Delhi
Zara	7	Marathi + English	Spastic CP	Pune

Repository layout

ankahi/
├── scripts/          Data download, env setup, sanity checks
├── configs/          YAML configs for each training stage
├── src/ankahi/       Core Python package
│   ├── data/         Collators, prompts, persona, augmentation
│   ├── training/     Stage 1–4 training scripts
│   ├── eval/         Metrics, specificity, latency evaluation
│   ├── deploy/       Merge, quantise, convert to .litertlm
│   └── tts/          Parent-voice cloning + synthesis
├── mobile/           Flutter Android app
├── notebooks/        Colab/Jupyter notebooks for each stage
├── writeup/          Technical report (markdown → PDF)
├── demo/             Video script and shot list
└── tests/            Unit tests

H100 compute budget

Stage	Hours	Output
Stage 0: Sanity check	0.5h	Go/no-go
Stage 1: Base multimodal SFT	12h	base adapter (rank 16)
Stage 2: 5× persona LoRA (rank 8)	10h	persona adapters
Stage 3: Audio/dysarthric adapter	10h	audio adapter
Stage 4: Safety tuning	4h	safety-merged base
Stage 5: Merge + quantise + convert	4h	ankahi.litertlm (2.5GB)
Stage 6: TTS voice cloning × 3	3h	voice models
Buffer	10h	—
Total	53.5h	—
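
The total can be checked by summing the rows. The snippet below copies the hours from the table; the dictionary name and the short stage labels are only for illustration.

```python
# Hours per stage, copied from the compute budget table.
BUDGET_HOURS = {
    "Stage 0: Sanity check": 0.5,
    "Stage 1: Base multimodal SFT": 12,
    "Stage 2: 5x persona LoRA": 10,
    "Stage 3: Audio/dysarthric adapter": 10,
    "Stage 4: Safety tuning": 4,
    "Stage 5: Merge + quantise + convert": 4,
    "Stage 6: TTS voice cloning": 3,
    "Buffer": 10,
}

total = sum(BUDGET_HOURS.values())
planned = total - BUDGET_HOURS["Buffer"]
print(f"planned: {planned}h, with buffer: {total}h")
# prints "planned: 43.5h, with buffer: 53.5h"
```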

Evaluation highlights

BLEU-4 + chrF++ on 500-sample held-out set per persona
5×5 adapter-specificity heatmap (each persona's adapter scored on every persona's eval set) to show that personalisation works
Audio disambiguation accuracy (with vs. without audio adapter)
On-device benchmarks across 3 phone tiers (flagship / mid / budget)
Ablations: rank, vision FT on/off, data scaling curve
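
One way to read an adapter-specificity matrix is to check, for each child's eval set, whether that child's own adapter scores highest and by how much. The sketch below assumes `scores[i][j]` holds the metric for adapter i on persona j's held-out set. The function name is hypothetical, and the 3×3 numbers are toy placeholders chosen to exercise the code, not measured results.

```python
def specificity_report(scores, personas):
    """Summarize a square matrix where scores[i][j] = adapter i on eval set j."""
    report = {}
    for j, persona in enumerate(personas):
        column = [scores[i][j] for i in range(len(personas))]
        best = max(range(len(column)), key=column.__getitem__)
        others = [s for i, s in enumerate(column) if i != j]
        report[persona] = {
            "best_adapter": personas[best],
            "own_adapter_wins": best == j,
            # Gap between the child's own adapter and the best other adapter.
            "margin": column[j] - max(others),
        }
    return report


# Toy placeholder scores (e.g. a 0-100 metric); NOT real evaluation results.
toy_personas = ["child_1", "child_2", "child_3"]
toy_scores = [
    [90, 40, 30],
    [50, 80, 20],
    [40, 30, 70],
]

report = specificity_report(toy_scores, toy_personas)
for name, row in report.items():
    print(name, row)
```

A positive margin on every persona corresponds to a heatmap whose diagonal dominates each column; a negative margin flags a child whose adapter is beaten by someone else's.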

Partners / data credits

ARASAAC (pictograms, CC-BY-NC-SA)
Mulberry Symbols (CC-BY-SA)
AI4Bharat (Indic-TTS, 13 languages)
TORGO + UA-Speech (dysarthric speech corpora)
Unsloth (Gemma 4 E4B fine-tuning kernels)

Potential clinical partners: Ummeed Child Development Center, CP Guild India, SPASTN

Ethical statement

Ankahi does not make medical diagnoses. It is a communication aid. No data leaves the device. The app stores only what the family explicitly provides. All demo footage of children was recorded with full written parental consent; where footage of a child was not used, adult collaborators stood in, with clear disclosure.

bhriguverma/ankahi